Why does "delete" need an agent restart? I think operators can just delete the CNI network configuration file from "--network_cni_config_dir" whenever they want. Later, when a framework tries to launch a container on that deleted CNI network, the CNI isolator will find that the network is in its cache but not on disk, so it can fail the framework's request and remove that CNI network from its cache. In effect, it is a lazy delete from the cache.
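To make the lazy-delete idea concrete, here is a rough sketch. It is Python purely for illustration and is not the actual `network/cni` isolator code; the class `CniNetworkCache` and its methods are hypothetical names I made up. It assumes each file in the config directory is a JSON document with a "name" field, and that only the name (mapped to its file path) is kept in memory, with the config itself read from disk on every launch.

```python
import json
import os


class CniNetworkCache:
    """Hypothetical stand-in for the isolator's in-memory map of network names."""

    def __init__(self, config_dir):
        self.config_dir = config_dir
        self.paths = {}  # network name -> config file path
        self.rescan()

    def rescan(self):
        """Rebuild the name -> path map from the files currently on disk."""
        self.paths = {}
        for entry in sorted(os.listdir(self.config_dir)):
            path = os.path.join(self.config_dir, entry)
            if not os.path.isfile(path):
                continue
            try:
                with open(path) as f:
                    name = json.load(f)["name"]
            except (OSError, ValueError, KeyError, TypeError):
                continue  # skip files that are not usable network configs
            self.paths[name] = path

    def load_for_launch(self, name):
        """Read the config for a container launch, or raise LookupError."""
        path = self.paths.get(name)
        if path is None:
            raise LookupError(f"unknown CNI network: {name}")
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            # Lazy delete: the file is gone from disk, so drop the stale
            # cache entry and fail this launch request.
            del self.paths[name]
            raise LookupError(f"CNI network was deleted: {name}") from None
```

With something like this, the first launch against a deleted network fails and evicts the entry, and any later launch fails as an unknown network, all without restarting the agent.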

Thanks,
Qian Zhang

On Thu, Dec 8, 2016 at 8:12 AM, Avinash Sridharan <[email protected]> wrote:

> On Wed, Dec 7, 2016 at 4:07 PM, Daniel Osborne <[email protected]> wrote:
>
> > For the record, we already support a). Qian explains it here:
> > https://issues.apache.org/jira/browse/MESOS-6567?focusedCommentId=15652501&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15652501
> >
>
> You are correct. We don't store the config in-memory just the `name`. So
> we will be reading the config every time we launch a new container. So
> looks like "delete" is the only operation that will need an agent restart.
>
> > On Wed, Dec 7, 2016 at 4:02 PM, Avinash Sridharan <[email protected]>
> > wrote:
> >
> > > Thinking about the solution of treating the CNI config as an in-memory
> > > cache and doing disk reads on failures I see two problems:
> > > a) We won't be able to support modifications to CNI networks. Since
> > > modification to existing networks won't generate a miss.
> > > b) We won't be able to support deletion of CNI networks.
> > >
> > > The two operations above will still need an agent restart.
> > >
> > > On Wed, Dec 7, 2016 at 3:40 PM, Avinash Sridharan <[email protected]>
> > > wrote:
> > >
> > > > On Wed, Dec 7, 2016 at 3:31 PM, Avinash Sridharan <[email protected]>
> > > > wrote:
> > > >
> > > >> On Wed, Dec 7, 2016 at 3:17 PM, Daniel Osborne <[email protected]> wrote:
> > > >>
> > > >>> Chiming in since I raised an identical issue a few weeks back:
> > > >>> https://issues.apache.org/jira/browse/MESOS-6567
> > > >>>
> > > >>> The proposed endpoint solution sounds plausible. However I'd like to
> > > >>> explore if it solves the use case I raised my issue for. I was trying to
> > > >>> create a Mesos framework that adds new CNI networks. But [IIRC] the Agent
> > > >>> API can't be reached from a Mesos Executor instance since the Agent could
> > > >>> be listening on a non-default port, or on any of its IPs. The executor
> > > >>> instance doesn't know that information, so after it installs the plugin, it
> > > >>> won't know how to reach that new reload endpoint.
> > > >>>
> > > >>
> > > >> Just trying to understand the problem you are alluding to here. The
> > > >> executor needs to register with the agent in order to launch the container,
> > > >> so it should have reachability to the agent, and hence the endpoint?
> > > >>
> > > >>> - Is there a reliable way to reach the reload endpoint from a default
> > > >>> executor instance?
> > > >>> - Why not scan the config directory every time? Are you trying to avoid the
> > > >>> speed hit from disk reads?
> > > >>>
> > > >> By scan the config directory every time, do you mean run a timer that
> > > >> will periodically scan the config directory and keep updating the configs.
> > > >> This is feasible. The only problem is that the point at which the operator
> > > >> write the config and the point at which the network will be available for
> > > >> container launch will not be deterministic. The behavior would be much
> > > >> cleaner if we can make it deterministic.
> > > >>
> > > >
> > > > Daniel, ignore this comment. I think you were referring to using the disc
> > > > as a cache as Vinod had pointed out. I misread your suggestion.
> > > >
> > > >> Best,
> > > >>> -Dan
> > > >>>
> > > >>> On Wed, Dec 7, 2016 at 3:01 PM, Avinash Sridharan <[email protected]>
> > > >>> wrote:
> > > >>>
> > > >>> > @adam @vinod Starting to work on
> > > >>> > https://issues.apache.org/jira/browse/MESOS-6679 . Need some inputs.
> > > >>> >
> > > >>> > The goal is to allow the `network/cni` isolator to load CNI configs without
> > > >>> > the need for agent restarts. Had a discussion with @jieyu and the solution
> > > >>> > we came up with was for the `network/cni` isolator to expose an endpoint
> > > >>> > called `reload`. The endpoint will accept `POST` requests (with an empty
> > > >>> > body), which will trigger the `network/cni` isolator to reload the CNI
> > > >>> > configs present in the `network_cni_config_dir`. On a successful `reload`
> > > >>> > the `network/cni` isolator will respond with an empty HTTP response. Wanted
> > > >>> > to run this by you guys to understand the implications on authn/authz and
> > > >>> > if this is the right place (the `network/cni` isolator) to host this
> > > >>> > endponit?
> > > >>> >
> > > >>> > --
> > > >>> > Avinash Sridharan, Mesosphere
> > > >>> > +1 (323) 702 5245
