Hi,

Over the last few weeks several people have talked to me about various issues relating to devices in containers, so I thought I'd send out a general survey of the work I know of that relates, in one way or another, to devices in containers. Different people have different goals, and several are doing their own thing to achieve them. I wanted to start by making everyone aware of what others are doing, in the hope that, over the next few years, we can work toward a comprehensive solution.
So here goes.

Some people (mwarfield, Michael Coss) want to send uevents into specific containers, e.g. for consoles or X displays. Michael Warfield (AIUI) does this by moving devices into /dev/lxc/$container/. At the containers track at Plumbers a few weeks ago, Michael Coss presented a solution developed at Lucent where uevents are sent only to the initial netns, and a userspace daemon checks a database and forwards uevents directly into containers so that containers can hotplug as needed:

http://www.linuxplumbersconf.org/2014/ocw/sessions/2157

Several people have wanted to use iscsi in containers. AIUI containers (at least non-userns ones) can use iscsi devices if they are moved into the container's namespace; however, Clint wanted to go further and actually be able to create iscsi devices inside containers. My memory may fail me, but I believe that to solve that we'd need to extend the current netlink backend, which (IIRC) only accepts connections from the initial netns. More realistically, I'd envision the answer being a userspace daemon on the host which containers can talk to in order to make requests. OTOH it also feels similar to the loop device namespacing issues, which would be far more elegantly solved in the kernel (imo). Does anyone know of existing work to this end (either way) for iscsi?

A few people have worked at the device driver level to actually namespace the devices themselves. For instance, Cellrox supports switching the active display between multiple containers. When c2 is the active container, its writes go to the real display. When it is not the active container, its writes are buffered. This allows the devices to be namespaced without any general "device namespace" support in the kernel. ISTR there was another company doing something similar (I don't recall for which devices), but I can't remember who/what at the moment.
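To make the active/inactive display behaviour described above a bit more concrete, here is a toy userspace model in Python. It is only a sketch of the idea, not Cellrox's code: the real thing lives in the device driver, and the class name DisplayMux, its methods, and the choice to replay buffered writes when a container becomes active are all assumptions made for illustration.

```python
from collections import defaultdict


class DisplayMux:
    """Toy model of per-container display switching (hypothetical names)."""

    def __init__(self):
        self.active = None                # container that currently owns the display
        self.screen = []                  # stands in for the real display
        self.pending = defaultdict(list)  # buffered writes of inactive containers

    def write(self, container, data):
        if container == self.active:
            # Active container: the write reaches the real display.
            self.screen.append(data)
        else:
            # Inactive container: the write is buffered instead.
            self.pending[container].append(data)

    def switch_to(self, container):
        self.active = container
        # Assumption: output buffered while in the background is replayed
        # once the container becomes active.
        self.screen.extend(self.pending.pop(container, []))


mux = DisplayMux()
mux.switch_to("c1")
mux.write("c1", "c1 frame 1")
mux.write("c2", "c2 frame 1")  # buffered, c2 is not the active container
mux.switch_to("c2")            # c2's buffered output is replayed
mux.write("c2", "c2 frame 2")  # goes straight to the display
```

The point of the model is that each container only ever sees "its" display, so no general device namespace is needed; the cost is that every driver has to carry this kind of switching logic itself.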
As I alluded to earlier, Seth had previously done a bit of work on namespacing loop so that containers could create and use their own loop devices. For the moment that's been shelved and he's focused on fuse inside containers instead. However, at the kernel summit this year Ted Ts'o said that at least mounting ext4 inside a container "should" work, meaning that any issues (e.g. the ability of a malicious container admin to corrupt the superblock reader with trash data) would be considered a bug (rather than "don't do that" misuse). I hope to follow up on that with some simple tests, and of course loopdev in containers will become far more compelling if we can actually mount ext4 in a container (which we can't right now).

There's probably more, and I apologize to anyone whose work I neglected to mention here. I think we're at a point where collaboration would be useful.

thanks,
-serge

PS - I certainly got some details wrong. I'm gonna lie and claim that I did so on purpose to encourage responses/discussion :)

_______________________________________________
lxc-devel mailing list
http://lists.linuxcontainers.org/listinfo/lxc-devel
