"Serge E. Hallyn" writes:
> Hi,
>
> So we've been over this many times... but unfortunately there is more
> breakage to report. Regular privileged and unprivileged containers
> work all right for us. But running an unprivileged container inside a
> privileged container is blocked.
>
> When creating privileged containers, lxc by default does a few things:
> it mounts some fuse.lxcfs files over procfiles include /proc/meminfo and
> /proc/uptime. It mounts proc rw but /proc/sysrq-trigger ro as well as
> moves /proc/sys/net out of the way, bind-mounts /proc/sys readonly
> (because this container is not in a user namespace) then moves
> /proc/sys/net back. Finally it mounts sys ro but bind-mounts
> /sys/devices/virtual/net as writeable.
>
> If any of these are left enabled, unprivileged containers can't be
> started. If all are disabled, then they can be.
>
> Can we find a way to make these not block remounts in child user
> namespaces? A boot flag, a procfs and sysfs mount option, a sysctl?
Are any of these overmounts done for the purpose of security? It appears that the /proc/sys and /sys mounts are made read-only for that purpose.

If none of the mounts are for security, the easy solution that works today is to also mount /proc and /sys somewhere else in your container, so that the permission check for mounting a new copy (which requires an existing, fully visible mount of the same filesystem) passes.

That said, /proc/sys appears to be a showstopper in this scheme. Because the root of your privileged container can enter your unprivileged container, it could bypass your read-only /proc/sys by mounting a new copy of proc if we allowed the relaxation you are requesting.

Therefore the only option on the table (and I don't have a clue how realistic it is) is a variant of proc containing only the files that describe processes. Call it processfs. That would not need the current restrictions.

As for sysfs, I am drawing a blank about what might be possible.

Eric
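The permission check discussed in this thread can be illustrated with a small Python model. This is a simplified, hypothetical sketch, not the kernel's code: it assumes the rule is roughly that a fresh proc or sysfs mount in a user namespace is allowed only when some existing mount of the same filesystem is fully visible (no overmounts hiding its contents) and the new mount is no more permissive than that copy (for example, not read-write where the visible copy is read-only). The names `Mount`, `fully_visible` and `may_mount_in_userns`, and the paths /mnt/proc and /mnt/sys, are invented for illustration, and the overmount lists only approximate the lxc layout described in the quoted message.

```python
from dataclasses import dataclass, field


@dataclass
class Mount:
    """One mount of a procfs/sysfs-like filesystem (hypothetical, simplified)."""
    fstype: str
    mountpoint: str
    read_only: bool = False
    # Paths inside this mount that something else is mounted over, hiding
    # the filesystem's real content (e.g. lxcfs files, ro bind mounts).
    covered: list[str] = field(default_factory=list)


def fully_visible(m: Mount) -> bool:
    # In this model a mount is fully visible if nothing is mounted over it.
    return not m.covered


def may_mount_in_userns(existing: list[Mount], fstype: str, read_only: bool) -> bool:
    """Hypothetical model: allow a fresh mount of `fstype` only if some
    existing mount of it is fully visible and the new mount would not be
    read-write where that visible copy is read-only."""
    for m in existing:
        if m.fstype != fstype or not fully_visible(m):
            continue
        if m.read_only and not read_only:
            continue  # new mount would be more permissive than this copy
        return True
    return False


# Roughly the layout described for a privileged lxc container.
lxc_default = [
    Mount("proc", "/proc", covered=["/proc/meminfo", "/proc/uptime",
                                    "/proc/sysrq-trigger", "/proc/sys"]),
    Mount("sysfs", "/sys", read_only=True,
          covered=["/sys/devices/virtual/net"]),
]

print(may_mount_in_userns(lxc_default, "proc", read_only=False))   # False
print(may_mount_in_userns(lxc_default, "sysfs", read_only=True))   # False

# Workaround: an additional, unobstructed mount elsewhere in the container.
with_extra = lxc_default + [Mount("proc", "/mnt/proc"), Mount("sysfs", "/mnt/sys")]
print(may_mount_in_userns(with_extra, "proc", read_only=False))    # True
print(may_mount_in_userns(with_extra, "sysfs", read_only=False))   # True
```

The last two cases model the workaround, and they also show why /proc/sys is the sticking point: the extra copy at /mnt/proc exposes a writable /proc/sys tree, so the read-only bind mount over /proc/sys no longer protects anything.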

