Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On 04/26/2011 01:54 PM, Lennart Poettering wrote:
> On Mon, 25.04.11 20:51, microcai (micro...@fedoraproject.org) wrote:
>
>> On 2011-04-25 20:43, Daniel J Walsh wrote:
>>> SELinux would be a good start.
>>
>> No, root inside can still change SE-Linux policy.
>
> No. The SELinux policy can forbid reloading the SELinux policy for
> certain users/processes.
>
> SELinux should work fine to secure nspawn containers.
>
> Lennart
>

Right, the idea would be to run all processes within the nspawn container with the same process label, and then allow only the access required for the container.

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On Mon, 25.04.11 20:51, microcai (micro...@fedoraproject.org) wrote:

> On 2011-04-25 20:43, Daniel J Walsh wrote:
> > SELinux would be a good start.
>
> No, root inside can still change SE-Linux policy.

No. The SELinux policy can forbid reloading the SELinux policy for certain users/processes.

SELinux should work fine to secure nspawn containers.

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On 2011-04-25 20:43, Daniel J Walsh wrote:
> SELinux would be a good start.

No, root inside can still change SE-Linux policy.
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On 04/22/2011 07:42 PM, Josh Triplett wrote:
> The systemd-nspawn manpage lists the various mechanisms used to isolate
> the container, and then says "Note that even though these security
> precautions are taken systemd-nspawn is not suitable for secure
> container setups. Many of the security features may be circumvented and
> are hence primarily useful to avoid accidental changes to the host
> system from the container."
>
> How can a process in a systemd-nspawn container circumvent the container
> setup? What additional steps would systemd-nspawn need to take to
> provide a secure container setup?
>
> - Josh Triplett

SELinux would be a good start.
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
]] Lennart Poettering

[...]

| (Consider the container blocking all ports > 6000 thus making it
| impossible to run X on the host). But this one is actually not a big
| issue in the end I guess, so let's ignore it here.

X doesn't listen on TCP by default these days, so this shouldn't be a problem in this specific case.

--
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On Sat, 23.04.11 13:29, microcai (micro...@fedoraproject.org) wrote:

> > Ah, good point. So, root inside the container can trivially circumvent
> > the container that way. Any way to prevent that with current kernel
> > support, or would fixing this require additional kernel changes to lock
> > down other /proc and /sys mounts?
> >
> OpenVZ is what you need for that. OpenVZ is much like systemd-nspawn,
> but more secure, so it can be used to provide VPS ;)

I never looked into OpenVZ in much detail, but quite honestly I have my doubts that it is completely sealed off and really doesn't suffer from any of the vulnerabilities I pointed out in my other mail. OpenVZ is probably in a better spot than the vanilla kernel with container virtualization, but I think they define "secure" much more loosely than some folks are aware of.

Lennart

--
Lennart Poettering - Red Hat, Inc.
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On Fri, 22.04.11 21:16, Josh Triplett (j...@joshtriplett.org) wrote:

> On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
> > On 2011-04-23 10:55, Josh Triplett wrote:
> > > The systemd-nspawn manpage lists the various mechanisms used to isolate
> > > the container, and then says "Note that even though these security
> > > precautions are taken systemd-nspawn is not suitable for secure
> > > container setups. Many of the security features may be circumvented and
> > > are hence primarily useful to avoid accidental changes to the host
> > > system from the container."
> > >
> > > How can a process in a systemd-nspawn container circumvent the container
> >
> > remount /proc and /sys
>
> Ah, good point. So, root inside the container can trivially circumvent
> the container that way. Any way to prevent that with current kernel
> support, or would fixing this require additional kernel changes to lock
> down other /proc and /sys mounts?

Yes, by dropping CAP_SYS_ADMIN for the container. As mentioned, we could probably do that, but a lot of other problems would remain.

> That particular problem only applies if running code within the
> container as root. How about if running code as an unprivileged user?
> With that addition, does systemd-nspawn provide a secure container
> (modulo local privilege escalation vulnerabilities)?

You cannot boot a full system without handing out root access to a container. But one of the advantages of nspawn is precisely that it allows you to boot a full OS inside it just like that.

Lennart

--
Lennart Poettering - Red Hat, Inc.
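To make "dropping CAP_SYS_ADMIN" a little more concrete, here is a minimal Python sketch, not nspawn's actual code. It models a capability set as the kind of bitmask the kernel reports as CapBnd in /proc/<pid>/status. The retained list is the one given in Lennart's longer reply in this thread, and the bit numbers follow <linux/capability.h>. The helper names mask_for, names_in and without are invented for this illustration. Actually dropping a capability from a running process would need something like prctl(PR_CAPBSET_DROP), which this sketch does not do.

```python
# Capability bit numbers as defined in <linux/capability.h> (bits 0..31).
CAP_BITS = {
    "CAP_CHOWN": 0, "CAP_DAC_OVERRIDE": 1, "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3, "CAP_FSETID": 4, "CAP_KILL": 5, "CAP_SETGID": 6,
    "CAP_SETUID": 7, "CAP_SETPCAP": 8, "CAP_LINUX_IMMUTABLE": 9,
    "CAP_NET_BIND_SERVICE": 10, "CAP_NET_BROADCAST": 11, "CAP_NET_ADMIN": 12,
    "CAP_NET_RAW": 13, "CAP_IPC_LOCK": 14, "CAP_IPC_OWNER": 15,
    "CAP_SYS_MODULE": 16, "CAP_SYS_RAWIO": 17, "CAP_SYS_CHROOT": 18,
    "CAP_SYS_PTRACE": 19, "CAP_SYS_PACCT": 20, "CAP_SYS_ADMIN": 21,
    "CAP_SYS_BOOT": 22, "CAP_SYS_NICE": 23, "CAP_SYS_RESOURCE": 24,
    "CAP_SYS_TIME": 25, "CAP_SYS_TTY_CONFIG": 26, "CAP_MKNOD": 27,
    "CAP_LEASE": 28, "CAP_AUDIT_WRITE": 29, "CAP_AUDIT_CONTROL": 30,
    "CAP_SETFCAP": 31,
}

# Capabilities the thread says nspawn keeps when entering the container.
RETAINED = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_IPC_OWNER", "CAP_KILL", "CAP_LEASE",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_RAW", "CAP_SETGID", "CAP_SETFCAP", "CAP_SETPCAP", "CAP_SETUID",
    "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_SYS_NICE", "CAP_SYS_PTRACE",
    "CAP_SYS_TTY_CONFIG",
]


def mask_for(names):
    """Return the bitmask with one bit set per named capability."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_BITS[name]
    return mask


def names_in(mask):
    """Return the capability names present in a bitmask, lowest bit first."""
    ordered = sorted(CAP_BITS.items(), key=lambda item: item[1])
    return [name for name, bit in ordered if mask & (1 << bit)]


def without(mask, *names):
    """Return a copy of mask with the named capabilities cleared."""
    return mask & ~mask_for(names)


if __name__ == "__main__":
    current = mask_for(RETAINED)
    # Hypothetical stricter set: the two capabilities the thread flags as
    # the most dangerous are removed.
    hardened = without(current, "CAP_SYS_ADMIN", "CAP_NET_RAW")
    print(f"retained: {current:016x}")
    print(f"hardened: {hardened:016x}")
    print("dropped: ", names_in(current & ~hardened))
```

The printed masks can be compared by eye with the CapBnd line of a process inside a container to see which of these bits it really holds.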
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On Fri, 22.04.11 19:55, Josh Triplett (j...@joshtriplett.org) wrote:

> The systemd-nspawn manpage lists the various mechanisms used to isolate
> the container, and then says "Note that even though these security
> precautions are taken systemd-nspawn is not suitable for secure
> container setups. Many of the security features may be circumvented and
> are hence primarily useful to avoid accidental changes to the host
> system from the container."
>
> How can a process in a systemd-nspawn container circumvent the container
> setup? What additional steps would systemd-nspawn need to take to
> provide a secure container setup?

Well, the question is of course what "secure" actually means... But here's why I put this sentence in the man page:

First of all, we don't virtualize AF_UNIX abstract namespace sockets. That is part of the network virtualization, and I explicitly decided not to virtualize that, to simplify things: otherwise containers would need specific network configuration and would be much harder to use than chroots, and the ease of use of chroot is what I was aiming for. Ideally AF_UNIX virtualization would not be part of CLONE_NEWNET but of CLONE_NEWIPC, since it is a local IPC interface and has nothing to do with the network, but I guess that's too late now.

Fortunately not many services use abstract namespace sockets, since they are insecure and mostly unnecessary these days. There are a few exceptions though: some services use randomly named unix sockets. And there's udev. Since we don't want to run a second udev in the container we actually benefit from this here: only the host udev can bind the socket, hence the container udev will immediately fail.

The missing virtualization of the abstract namespace means processes can talk to services outside of the namespace. This has obvious problems. And a couple of non-obvious ones on top: SCM_CREDENTIALS will be weird due to the non-matching users and stuff.
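The udev remark above can be illustrated with a short Python sketch (Linux only). It is an illustration under assumptions, not anything nspawn or udev does: both binds happen in one process, standing in for a host service and a container service that share a network namespace, and the socket name is a made-up placeholder. Abstract socket names are scoped by network namespace, so without CLONE_NEWNET the second bind fails with EADDRINUSE.

```python
import errno
import os
import socket


def bind_abstract(name):
    """Bind a stream socket in the abstract AF_UNIX namespace (Linux only).

    Returns the bound socket, or None if the name is already taken.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A leading NUL byte selects the abstract namespace; nothing is
        # created in the file system.
        sock.bind(b"\0" + name.encode())
    except OSError as exc:
        sock.close()
        if exc.errno == errno.EADDRINUSE:
            return None
        raise
    return sock


if __name__ == "__main__":
    # Hypothetical name, made unique per run so reruns do not collide.
    name = f"nspawn-thread-demo-{os.getpid()}"
    first = bind_abstract(name)   # plays the role of the host service
    second = bind_abstract(name)  # plays the role of the container service
    print("first bind succeeded: ", first is not None)
    print("second bind succeeded:", second is not None)
    if first is not None:
        first.close()
```

The same sharing that makes the second bind fail is what lets a container process connect to an abstract socket owned by a host service.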
When we enter the container we drop all capabilities, except the following: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID, CAP_IPC_OWNER, CAP_KILL, CAP_LEASE, CAP_LINUX_IMMUTABLE, CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW, CAP_SETGID, CAP_SETFCAP, CAP_SETPCAP, CAP_SETUID, CAP_SYS_ADMIN, CAP_SYS_CHROOT, CAP_SYS_NICE, CAP_SYS_PTRACE, CAP_SYS_TTY_CONFIG. Due to the PID, fs and IPC namespacing a couple of these capabilities should not be much of a problem. Except for a few cases:

- We don't virtualize the network for simplicity reasons. That means CAP_NET_BIND_SERVICE allows processes in the container to bind to any port, thus blocking stuff outside of the container from working. Now, it would be easy to remove this capability too, but of course the container could then still DoS high-port services on the host. (Consider the container blocking all ports > 6000 thus making it impossible to run X on the host). But this one is actually not a big issue in the end I guess, so let's ignore it here.

- CAP_NET_RAW means that the container can sniff the host's traffic.

- CAP_SYS_ADMIN is a grab bag of things, and is the biggie here. With this the container can remount /sys, /selinux and /proc/sys read-write and thus influence the host massively. It can disable swap partitions, too, and lots and lots of other things.

- A couple of the FS related operations might be problematic since the abstract namespace sockets are not virtualized, and thus you could do privileged operations on fds from outside the container.

There's also currently no virtualization of the users. That means RLIMIT_NPROC and stuff when applied in the container will also affect the same user outside of the container. That's pretty bad...

Some of these issues require kernel support to fix properly (for example the RLIMIT_NPROC issue). Others we could probably fix in userspace.
For example, we might be able to make CAP_SYS_ADMIN unnecessary if we premount really everything in the container that it might need. systemd is already smart enough to be happy with pre-mounted directories; not entirely sure about sysvinit though.

With a bit of work we could probably even add CLONE_NEWNET support, and automatically set up a valid virtualized net interface for the container that could not be reconfigured by the container and is always forwarded to the host, but which buys us AF_UNIX abstract namespace virtualization and fixes the CAP_NET_BIND_SERVICE issue.

With CLONE_NEWUSER in place and these changes we could probably make things reasonably secure. But especially figuring out a way to virtualize the network in an elegant way so that things will continue to "just work" is not going to be easy.

Lennart

--
Lennart Poettering - Red Hat, Inc.
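As a rough illustration of what CLONE_NEWNET gives a process, the following Python sketch forks a child, moves it into a fresh network namespace with unshare(2) via ctypes, and reports the interfaces the child can see. It is a sketch under assumptions, not how nspawn would implement this: the function name is invented here, the call normally needs root (or CAP_SYS_ADMIN), and no virtual interface is set up or forwarded to the host, which is exactly the hard part the mail describes.

```python
import ctypes
import os
import socket

CLONE_NEWNET = 0x40000000  # value from <linux/sched.h>


def interfaces_in_new_netns():
    """Fork a child, unshare its network namespace, list its interfaces.

    Returns a list of interface names, or None if the child could not
    enter a new namespace (typically EPERM when not running as root).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: never returns into the caller's code
        os.close(read_fd)
        status = 1
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            if libc.unshare(CLONE_NEWNET) == 0:
                names = [name for _, name in socket.if_nameindex()]
                os.write(write_fd, "\n".join(names).encode())
                status = 0
        finally:
            os._exit(status)
    # parent: collect whatever the child managed to report
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        return None
    return data.decode().split("\n") if data else []


if __name__ == "__main__":
    result = interfaces_in_new_netns()
    if result is None:
        print("unshare(CLONE_NEWNET) was refused; try running as root")
    else:
        print("interfaces visible in the new namespace:", result)
```

When the call succeeds, the host's interfaces are no longer visible to the child, and abstract AF_UNIX names bound on the host are out of its reach as well.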
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On 2011-04-23 12:16, Josh Triplett wrote:
> On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
>> On 2011-04-23 10:55, Josh Triplett wrote:
>>> The systemd-nspawn manpage lists the various mechanisms used to isolate
>>> the container, and then says "Note that even though these security
>>> precautions are taken systemd-nspawn is not suitable for secure
>>> container setups. Many of the security features may be circumvented and
>>> are hence primarily useful to avoid accidental changes to the host
>>> system from the container."
>>>
>>> How can a process in a systemd-nspawn container circumvent the container
>>
>> remount /proc and /sys
>
> Ah, good point. So, root inside the container can trivially circumvent
> the container that way. Any way to prevent that with current kernel
> support, or would fixing this require additional kernel changes to lock
> down other /proc and /sys mounts?

OpenVZ is what you need for that. OpenVZ is much like systemd-nspawn, but more secure, so it can be used to provide VPS ;)

>
> That particular problem only applies if running code within the
> container as root. How about if running code as an unprivileged user?
> With that addition, does systemd-nspawn provide a secure container
> (modulo local privilege escalation vulnerabilities)?
>
> Thanks,
> Josh Triplett
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
> On 2011-04-23 10:55, Josh Triplett wrote:
> > The systemd-nspawn manpage lists the various mechanisms used to isolate
> > the container, and then says "Note that even though these security
> > precautions are taken systemd-nspawn is not suitable for secure
> > container setups. Many of the security features may be circumvented and
> > are hence primarily useful to avoid accidental changes to the host
> > system from the container."
> >
> > How can a process in a systemd-nspawn container circumvent the container
>
> remount /proc and /sys

Ah, good point. So, root inside the container can trivially circumvent the container that way. Any way to prevent that with current kernel support, or would fixing this require additional kernel changes to lock down other /proc and /sys mounts?

That particular problem only applies if running code within the container as root. How about if running code as an unprivileged user? With that addition, does systemd-nspawn provide a secure container (modulo local privilege escalation vulnerabilities)?

Thanks,
Josh Triplett
Re: [systemd-devel] What makes systemd-nspawn "not suitable for secure container setups"?
On 2011-04-23 10:55, Josh Triplett wrote:
> The systemd-nspawn manpage lists the various mechanisms used to isolate
> the container, and then says "Note that even though these security
> precautions are taken systemd-nspawn is not suitable for secure
> container setups. Many of the security features may be circumvented and
> are hence primarily useful to avoid accidental changes to the host
> system from the container."
>
> How can a process in a systemd-nspawn container circumvent the container

remount /proc and /sys

> setup? What additional steps would systemd-nspawn need to take to
> provide a secure container setup?
>
> - Josh Triplett