> On March 12, 2016, 4:02 a.m., Jie Yu wrote:
> > src/slave/containerizer/mesos/isolators/network/cni.hpp, line 85
> > <https://reviews.apache.org/r/44514/diff/2/?file=1294397#file1294397line85>
> >
> >     I would suggest we have a `Info` for each container.
> >     
> >     ```
> >     struct Info
> >     {
> >       ...
> >     };
> >     ```
> >     
> >     Remember that for any field you put into the Info struct, you need to 
> > be able to 'recover' it in recover function. It's also possible that 
> > ExecutorInfo/TaskInfo are not available during recover (orphans due to 
> > wiped meta data). You need to think about how to recover (e.g., the name of 
> > the network) that the container has joined.
> 
> Avinash sridharan wrote:
>     We also need to recover the IP address associated with the container. 
> Since this has to be returned as part of `NetworkINfo` . Need to figure out 
> how to recover this?
> 
> Jie Yu wrote:
>     I am not sure. Why do we need to store/recover IP address? When destroy a 
> container, the CNI plugin only needs container ID (which can be the same as 
> our container Id), the namespace handle and the networkconfig file. The rest 
> can all be fixed.
>     
>     I think the trick part is: when there is an orphan container (no 
> executorinfo/taskinfo), how can we figure out its network namespace handle 
> and networconfig file so that we can destroy it (call CNI plugin DEL). We 
> need to know the name of the network to get the networkcofig file, and the 
> pid to get the network namespace handle.
> 
> Qian Zhang wrote:
>     Yes, that is the tricky part. For orphan container (e.g., the container 
> created by the framework without checkpoint enabled), when agent restarts, in 
> the recover() we only have container ID which is not enough to invoke CNI 
> plugin DEL. Totally we need:
>     1) Container ID.
>     2) Network namespace path.
>     3) Network configuration.
>     4) Name of the interface inside the container.
>     
>     I think currently we only have 1).
>     For 2), my idea is, in `isolate()`, before calling CNI plugin, we do a 
> bind mount (`/var/run/netns/[pid]` -> `proc/[pid]/ns/net`), and then a symbol 
> link (`/var/run/mesos/cni/netns/[containerId]` -> `/var/run/netns/[pid]`). 
> And then in `recover()`, since we have `containerId` for orphan containers, 
> we should be able to figure out the `pid` based on the symbol link, and then 
> get the network namespace path `/proc/[pid]/ns/net`.
>     For 3), actually when we call CNI plugin for a container in `isolate()`, 
> the CNI plugin (actually the CNI `host-local` IPAM plugin) will create a file 
> under `/var/lib/cni/networks/[neworkName]/`, the name of the file is the IP 
> assigned to the container, and the content of file is the container ID. So if 
> a container is assigned an IP from a CNI network, it must have a file under 
> `/var/lib/cni/networks/[neworkName]/`, we may leverage this information to 
> figure out the name of CNI networks the orphan container joins. But this may 
> only work for the CNI network which uses `host-local` IPAM plugin, for `dhcp` 
> IPAM plugin, I think it will not write any files for containers.
>     For 4), I do not have idea for it yet :-(
>     
>     Or another solution would be, when we create the symbol link in 
> `isolate()`, we may encode more info into the name of symbol link, e.g., 
> `/var/run/mesos/cni/netns/[containerId]-[net1]_[ifName in 
> net1]-[net2]_[ifName in net2]...`, however this seems ugly to me.
> 
> Avinash sridharan wrote:
>     For (2) why not bind mount directly to 
> /var/run/mesos/cni/netns/[continaerId] and create a simlink to 
> /var/run/netns/. Also, Jie mentioned it would be a good idea to prefix the 
> `[containerId]` with a conanical string like mesos-cni. For (3) I don't think 
> we should rely on files created by CNI plugins. There is no guarantee that 
> the IPAM plugin would maintain those files. Instead if we have already bind 
> moutned the network namespace, the namespace will persist in the kernel even 
> after the container has exited (became an orphan). At recover we just need to 
> walk to list of `containerID` stored under /var/run/mesos/cni/netns and for 
> each network namespace we can run the `ip netns comamnd` to retrieve the link 
> and IP address for that network namespace.
> 
> Qian Zhang wrote:
>     For 2), what symlink do you want to create under `/var/run/netns`? 
> `/var/run/netns/[pid] -> /var/run/mesos/cni/netns/[containerId]`? Or 
> `/var/run/netns/[containerId] -> /var/run/mesos/cni/netns/[contiainerId]`? 
> And you can take a look at 
> https://github.com/apache/mesos/blob/0.27.2/src/slave/containerizer/mesos/isolators/network/port_mapping.cpp#L2205:L2243,
>  What it does is same as what I proposed above.
>     
>     And good point for 3), I agree in `recover()`, we can run `ip netns exec 
> [namespace] ip addr` to retrieve the link and IP, but the problem is how we 
> can know which CNI networks that these links/IPs belong to? E.g., after run 
> `ip netns exec [namespace] ip addr` for a container, we get eth0/10.0.0.3 and 
> eth1/10.0.1.3, then how can we know the CNI networks they belong to? Maybe we 
> can encode CNI network name in the interface name when calling CNI plugin in 
> `isolate()`, e.g., name the interface name as `net0-eth0` rather than `eth0`, 
> but this seems a bit strange to me and not consistent with the interface 
> naming convention in Linux.
> 
> Jie Yu wrote:
>     I don't think we need the symlink. We can just use a bind mount 
> /var/run/netns/mesos_cni_<ContainerID>. You can specify the namespace handle 
> using the bind mount directly (no need to get the pid). I think the symlink 
> in port mapping isolator is mainly for legacy reason (because initially, we 
> only have the pid bind mount). But turns out that we definitely need to 
> checkpoint some data under /var/run (for ifname and network name). The layout 
> might be something like the following:
>     ```
>     /var/run/mesos/isolators/network/cni/<ContainerID>/
>       |-- ns -> /proc/<pid>/ns/net (bind mount)
>       |-- <net1>/
>             |-- ifname
>             |-- ipv4
>       |-- <net2>/
>             |-- ifname
>             |-- ipv4
>     ```

Thank Jie. I agree that we only need a bind mount and no need the symlink.

And for the layout under `/var/run/mesos/isolators/network/cni/<ContainerID>/`, 
I have a few questions:
1. If we create `|-- ns -> /proc/<pid>/ns/net (bind mount)`, then I think we do 
not need the bind mount `/var/run/netns/mesos_cni_<ContainerID> -> 
/proc/<pid>/ns/net`, right?
2. Do we need a sub-dir (e.g., `<net1>/` in your layout) for each CNI network 
that the container joins? Or what about just a file with network name as its 
file name and the line in the file is `ifname:ipv4`.
3. Do you think that we need to introduce an agent flag for operator to specify 
the directory `/var/run/mesos/isolators/network/cni/` when starting agent? Or 
we can just keep it internal?

So basically the layout in my mind is like:
  /var/run/mesos/isolators/network/cni/<ContainerID>/
    |-- ns -> /proc/<pid>/ns/net (bind mount)
    |-- <net1>: ifname1:ip1  
                ifname2:ip2
    |-- <net2>: ifname3:ip3
                ifname4:ip4


- Qian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44514/#review123200
-----------------------------------------------------------


On March 10, 2016, 10:20 p.m., Qian Zhang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44514/
> -----------------------------------------------------------
> 
> (Updated March 10, 2016, 10:20 p.m.)
> 
> 
> Review request for mesos, Avinash sridharan, Gilbert Song, and Jie Yu.
> 
> 
> Bugs: MESOS-4759
>     https://issues.apache.org/jira/browse/MESOS-4759
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Implemented prepare() method of "network/cni" isolator.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/mesos/isolators/network/cni.hpp PRE-CREATION 
>   src/slave/containerizer/mesos/isolators/network/cni.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/44514/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Qian Zhang
> 
>

Reply via email to