Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-26 Thread Lennart Poettering
On Mon, 25.04.11 20:51, microcai (micro...@fedoraproject.org) wrote:

> On 2011-04-25 20:43, Daniel J Walsh wrote:
> > SELinux would be a good start.
>
> No, root inside can still change the SELinux policy.

No. The SELinux policy can forbid reloading the SELinux policy for
certain users/processes.

SELinux should work fine to secure nspawn containers.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-26 Thread Daniel J Walsh
On 04/26/2011 01:54 PM, Lennart Poettering wrote:
> On Mon, 25.04.11 20:51, microcai (micro...@fedoraproject.org) wrote:
>
> > On 2011-04-25 20:43, Daniel J Walsh wrote:
> > > SELinux would be a good start.
> >
> > No, root inside can still change the SELinux policy.
>
> No. The SELinux policy can forbid reloading the SELinux policy for
> certain users/processes.
>
> SELinux should work fine to secure nspawn containers.
>
> Lennart
 
Right, the idea would be to run all processes within the nspawn container
with the same process label, and then allow only the access the
container requires.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-25 Thread Daniel J Walsh
On 04/22/2011 07:42 PM, Josh Triplett wrote:
> The systemd-nspawn manpage lists the various mechanisms used to isolate
> the container, and then says "Note that even though these security
> precautions are taken systemd-nspawn is not suitable for secure
> container setups. Many of the security features may be circumvented and
> are hence primarily useful to avoid accidental changes to the host
> system from the container."
>
> How can a process in a systemd-nspawn container circumvent the container
> setup?  What additional steps would systemd-nspawn need to take to
> provide a secure container setup?
>
> - Josh Triplett

SELinux would be a good start.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-25 Thread microcai
On 2011-04-25 20:43, Daniel J Walsh wrote:
> SELinux would be a good start.

No, root inside can still change the SELinux policy.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-24 Thread Lennart Poettering
On Fri, 22.04.11 19:55, Josh Triplett (j...@joshtriplett.org) wrote:

> The systemd-nspawn manpage lists the various mechanisms used to isolate
> the container, and then says "Note that even though these security
> precautions are taken systemd-nspawn is not suitable for secure
> container setups. Many of the security features may be circumvented and
> are hence primarily useful to avoid accidental changes to the host
> system from the container."
>
> How can a process in a systemd-nspawn container circumvent the container
> setup?  What additional steps would systemd-nspawn need to take to
> provide a secure container setup?

Well, the question is of course what "secure" actually means...

But here's why I put this sentence in the man page:

First of all, we don't virtualize AF_UNIX abstract namespace sockets.
That is part of network virtualization, and I explicitly decided not to
virtualize the network, to keep things simple: otherwise containers would
need specific network configuration and would be much harder to use than
chroots, and chroot's ease of use is what I was aiming for.

Ideally AF_UNIX virtualization would not be part of CLONE_NEWNET but of
CLONE_NEWIPC, since it is a local IPC interface and has nothing to do
with the network, but I guess that's too late now.

Fortunately not many services use abstract namespace sockets, since they
are insecure and mostly unnecessary these days. There are a few
exceptions though: some services use randomly named unix sockets. And
there's udev. Since we don't want to run a second udev in the container
we actually benefit from this here: only the host udev can bind the
socket, hence the container udev will immediately fail.
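
The udev point above rests on one property: a name in the abstract AF_UNIX namespace can be bound only once per network namespace, so a second bind of the same name fails. The following is a minimal Python sketch of that behaviour, assuming Linux. The socket name is made up for illustration and is not the name udev uses.

```python
import errno
import os
import socket


def bind_abstract(name: bytes) -> socket.socket:
    """Bind a datagram socket to a name in the abstract AF_UNIX namespace.

    A leading NUL byte selects the abstract namespace (Linux only). No
    file system entry is created, so file system isolation does not
    separate these names.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.bind(b"\0" + name)
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_errno(name: bytes) -> int:
    """Bind `name` twice and return the errno raised by the second bind.

    Returns 0 if the second bind unexpectedly succeeds.
    """
    first = bind_abstract(name)
    try:
        second = bind_abstract(name)
    except OSError as exc:
        return exc.errno
    else:
        second.close()
        return 0
    finally:
        first.close()


if __name__ == "__main__":
    # Hypothetical name, made unique per process to avoid clashes.
    demo_name = b"nspawn-demo-%d" % os.getpid()
    print(errno.errorcode.get(second_bind_errno(demo_name), "no error"))
```

Because host and container share one network namespace here, the "first" binder can be a host service and the "second" a process in the container.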

The missing virtualization of the abstract namespace means processes in
the container can talk to services outside of it. This has obvious
problems. And a couple of non-obvious ones on top: SCM_CREDENTIALS will
be weird due to the non-matching users, among other things.

When we enter the container we drop all capabilities, except the following:

CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER,
CAP_FSETID, CAP_IPC_OWNER, CAP_KILL, CAP_LEASE, CAP_LINUX_IMMUTABLE,
CAP_NET_BIND_SERVICE, CAP_NET_BROADCAST, CAP_NET_RAW, CAP_SETGID,
CAP_SETFCAP, CAP_SETPCAP, CAP_SETUID, CAP_SYS_ADMIN, CAP_SYS_CHROOT,
CAP_SYS_NICE, CAP_SYS_PTRACE, CAP_SYS_TTY_CONFIG.
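
For readers who want to compare the list above with what a process actually holds, here is a small Python sketch that turns the retained set into a bitmask and decodes the `CapBnd` line of `/proc/<pid>/status`. The capability numbers are the ones from `<linux/capability.h>`. The helper names are made up for this illustration and this is not how nspawn itself does it.

```python
# Capability numbers from <linux/capability.h>. Only the retained
# capabilities plus a few dropped ones (for comparison) are listed.
CAP_NUMBERS = {
    "CAP_CHOWN": 0, "CAP_DAC_OVERRIDE": 1, "CAP_DAC_READ_SEARCH": 2,
    "CAP_FOWNER": 3, "CAP_FSETID": 4, "CAP_KILL": 5, "CAP_SETGID": 6,
    "CAP_SETUID": 7, "CAP_SETPCAP": 8, "CAP_LINUX_IMMUTABLE": 9,
    "CAP_NET_BIND_SERVICE": 10, "CAP_NET_BROADCAST": 11,
    "CAP_NET_ADMIN": 12, "CAP_NET_RAW": 13, "CAP_IPC_OWNER": 15,
    "CAP_SYS_MODULE": 16, "CAP_SYS_RAWIO": 17, "CAP_SYS_CHROOT": 18,
    "CAP_SYS_PTRACE": 19, "CAP_SYS_ADMIN": 21, "CAP_SYS_BOOT": 22,
    "CAP_SYS_NICE": 23, "CAP_SYS_TIME": 25, "CAP_SYS_TTY_CONFIG": 26,
    "CAP_MKNOD": 27, "CAP_LEASE": 28, "CAP_SETFCAP": 31,
}

# The set the mail says is kept when entering the container.
RETAINED = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_IPC_OWNER", "CAP_KILL", "CAP_LEASE",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_RAW", "CAP_SETGID", "CAP_SETFCAP", "CAP_SETPCAP",
    "CAP_SETUID", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT", "CAP_SYS_NICE",
    "CAP_SYS_PTRACE", "CAP_SYS_TTY_CONFIG",
]


def capability_mask(names):
    """Return the bitmask with one bit set per capability name."""
    mask = 0
    for name in names:
        mask |= 1 << CAP_NUMBERS[name]
    return mask


def has_capability(mask, name):
    """Return True if the bit for `name` is set in `mask`."""
    return bool(mask & (1 << CAP_NUMBERS[name]))


def read_bounding_set(status_text):
    """Extract the CapBnd mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return int(line.split()[1], 16)
    raise ValueError("no CapBnd line found")


if __name__ == "__main__":
    print("retained mask: %#x" % capability_mask(RETAINED))
```

Inside a container, `read_bounding_set(open("/proc/self/status").read())` could be compared against `capability_mask(RETAINED)` to see which capabilities are still available.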

Due to the PID, fs and IPC namespacing a couple of these capabilities
should not be much of a problem. Except for a few cases:

- We don't virtualize the network for simplicity reasons, which means
  CAP_NET_BIND_SERVICE allows processes in the container to bind to any
  port, thus preventing services outside of the container from using it.
  Now, it would be easy to remove this capability too, but that would
  still allow the container to DoS high-port services on the host.
  (Consider the container blocking all ports > 6000, thus making it
  impossible to run X on the host.) But this one is actually not a big
  issue in the end I guess, so let's ignore it here.

- CAP_NET_RAW means that the container can sniff into the host's traffic.

- CAP_SYS_ADMIN is a grab bag of things, and is the biggie here. With this
  the container can remount /sys, /selinux and /proc/sys read-write
  and thus influence the host massively. It can disable swap
  partitions, too, and lots and lots of other things.

- A couple of the FS related operations might be problematic since the
  abstract namespace sockets are not virtualized, and thus you could do
  privileged operations on fds from outside the container.
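
The port-blocking point from the first item in the list above can be sketched in a few lines of Python. This is an illustration only: it uses the loopback address and lets the kernel pick a free high port, so it needs no privileges. The function names are hypothetical.

```python
import errno
import socket


def squat_port(host="127.0.0.1", port=0):
    """Bind and listen on a TCP port so nobody else can bind it.

    Port 0 asks the kernel for any free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen(1)
    return sock


def try_bind(host, port):
    """Return 0 if the port could be bound, otherwise the errno."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        return exc.errno
    finally:
        sock.close()
    return 0


if __name__ == "__main__":
    squatter = squat_port()
    taken = squatter.getsockname()[1]
    result = try_bind("127.0.0.1", taken)
    print(taken, errno.errorcode.get(result, "bound"))
    squatter.close()
```

With a shared network namespace it makes no difference whether the squatting process runs on the host or in the container. The host service that tries to bind later simply fails.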

There's also currently no virtualization of the users. That means
RLIMIT_NPROC and similar per-user limits, when applied in the container,
will also affect the same user outside of the container. That's pretty
bad...
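
To make the RLIMIT_NPROC issue concrete: the kernel counts processes per UID, and without user namespaces a UID in the container is the same UID on the host. The Python sketch below reads the limit and counts the processes visible in /proc for one UID. It assumes Linux with /proc mounted, and the helper names are made up for illustration.

```python
import os
import resource


def nproc_limit():
    """Return the (soft, hard) RLIMIT_NPROC pair for this process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def describe_limit(value):
    """Render a limit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def count_visible_processes(uid):
    """Count the processes owned by `uid` that are visible in /proc."""
    count = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            if os.stat("/proc/" + entry).st_uid == uid:
                count += 1
        except OSError:
            # The process exited between listdir() and stat().
            continue
    return count


if __name__ == "__main__":
    soft, hard = nproc_limit()
    uid = os.getuid()
    print("uid %d: soft=%s hard=%s visible=%d" % (
        uid, describe_limit(soft), describe_limit(hard),
        count_visible_processes(uid)))
```

In a PID-namespaced container the /proc count covers only the container's own processes, while the limit is charged against every process of that UID on the machine, so the two numbers can diverge.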

Some of these issues require kernel support to fix properly (for example
the RLIMIT_NPROC issue). Others we could probably fix in userspace. For
example, we might be able to make CAP_SYS_ADMIN unnecessary if we
premount really everything in the container that it might need. systemd
is already smart enough to be happy with pre-mounted directories; I am
not entirely sure about sysvinit though. With a bit of work we probably
could even add CLONE_NEWNET support, and automatically set up a valid
virtualized net interface for the container that the container cannot
reconfigure and that is always forwarded to the host, but which buys us
AF_UNIX abstract namespace virtualization and fixes the
CAP_NET_BIND_SERVICE issue.

With CLONE_NEWUSER in place and these changes we could probably make
things reasonably secure. But especially figuring out a way to
virtualize the network in an elegant way so that things will continue to
just work is not going to be easy.
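
As a rough summary of the namespace flags involved, here is a small Python sketch. The flag values are the ones from `<linux/sched.h>`. Reading "PID, fs and IPC namespacing" as CLONE_NEWPID, CLONE_NEWNS and CLONE_NEWIPC is an assumption based on this mail, not taken from the nspawn source.

```python
# Namespace flags from <linux/sched.h>.
CLONE_NEWNS = 0x00020000    # mount (file system) namespace
CLONE_NEWUTS = 0x04000000   # hostname / domain name
CLONE_NEWIPC = 0x08000000   # System V IPC
CLONE_NEWUSER = 0x10000000  # user and group IDs
CLONE_NEWPID = 0x20000000   # process IDs
CLONE_NEWNET = 0x40000000   # network, including abstract AF_UNIX names

FLAG_NAMES = {
    CLONE_NEWNS: "CLONE_NEWNS",
    CLONE_NEWUTS: "CLONE_NEWUTS",
    CLONE_NEWIPC: "CLONE_NEWIPC",
    CLONE_NEWUSER: "CLONE_NEWUSER",
    CLONE_NEWPID: "CLONE_NEWPID",
    CLONE_NEWNET: "CLONE_NEWNET",
}

# Assumption: the "PID, fs and IPC namespacing" described in the mail.
CURRENT_FLAGS = CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWIPC

# The additions the mail suggests would make things reasonably secure.
PROPOSED_FLAGS = CURRENT_FLAGS | CLONE_NEWNET | CLONE_NEWUSER


def flag_names(flags):
    """Return the sorted names of all namespace flags set in `flags`."""
    return sorted(name for bit, name in FLAG_NAMES.items() if flags & bit)


if __name__ == "__main__":
    print("current: ", flag_names(CURRENT_FLAGS))
    print("missing: ", flag_names(PROPOSED_FLAGS & ~CURRENT_FLAGS))
```

The difference between the two sets is exactly the pair named in the text: CLONE_NEWNET for the socket and port problems, CLONE_NEWUSER for the per-user limits.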

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-24 Thread Lennart Poettering
On Fri, 22.04.11 21:16, Josh Triplett (j...@joshtriplett.org) wrote:

> On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
> > On 2011-04-23 10:55, Josh Triplett wrote:
> > > The systemd-nspawn manpage lists the various mechanisms used to isolate
> > > the container, and then says "Note that even though these security
> > > precautions are taken systemd-nspawn is not suitable for secure
> > > container setups. Many of the security features may be circumvented and
> > > are hence primarily useful to avoid accidental changes to the host
> > > system from the container."
> > >
> > > How can a process in a systemd-nspawn container circumvent the container
> >
> > remount /proc and /sys
>
> Ah, good point.  So, root inside the container can trivially circumvent
> the container that way.  Any way to prevent that with current kernel
> support, or would fixing this require additional kernel changes to lock
> down other /proc and /sys mounts?

Yes, by dropping CAP_SYS_ADMIN for the container. 

As mentioned, we could probably do that, but a lot of other problems
would remain.
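
The remount problem is about mounts such as /sys and /proc/sys that are meant to stay read-only inside the container. The Python sketch below checks the state of a mount point from the text of /proc/self/mounts. The sample data is made up for illustration, and the check only observes the current state: a process that still holds CAP_SYS_ADMIN can change it at any time.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list of the last mount at `mount_point`.

    `mounts_text` is the content of /proc/self/mounts, whose fields are:
    device, mount point, fs type, options, dump, pass. The last matching
    line wins because later mounts shadow earlier ones. Returns None if
    nothing is mounted there.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            options = fields[3].split(",")
    return options


def is_read_only(mounts_text, mount_point):
    """Return True if `mount_point` is currently mounted read-only."""
    options = mount_options(mounts_text, mount_point)
    if options is None:
        raise KeyError(mount_point)
    return "ro" in options


# Hypothetical sample, not captured from a real container.
SAMPLE_MOUNTS = """\
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,nosuid,nodev,noexec,relatime 0 0
"""

if __name__ == "__main__":
    for point in ("/proc", "/proc/sys", "/sys"):
        print(point, "ro" if is_read_only(SAMPLE_MOUNTS, point) else "rw")
```

On a live system the text would come from `open("/proc/self/mounts").read()`.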

> That particular problem only applies if running code within the
> container as root.  How about if running code as an unprivileged user?
> With that addition, does systemd-nspawn provide a secure container
> (modulo local privilege escalation vulnerabilities)?

You cannot boot a full system without handing out root access to a
container. But one of the advantages of nspawn is actually that it
allows you to boot a full OS inside it just like that.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-24 Thread Lennart Poettering
On Sat, 23.04.11 13:29, microcai (micro...@fedoraproject.org) wrote:

> > Ah, good point.  So, root inside the container can trivially circumvent
> > the container that way.  Any way to prevent that with current kernel
> > support, or would fixing this require additional kernel changes to lock
> > down other /proc and /sys mounts?
>
> OpenVZ is what you need for that. OpenVZ is much like systemd-nspawn,
> but more secure, so it can be used to provide VPS hosting ;)

I never looked into OpenVZ in much detail, but quite honestly I have my
doubts that it is completely sealed off and really doesn't suffer from
any of the vulnerabilities I pointed out in my other mail.

OpenVZ is probably in a better spot than the vanilla kernel with
container virtualization, but I think they define "secure" much more
loosely than some folks are aware of.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-24 Thread Tollef Fog Heen
]] Lennart Poettering 

[...]

| (Consider the container blocking all ports > 6000 thus making it
| impossible to run X on the host). But this one is actually not a big
| issue in the end I guess, so let's ignore it here.

X doesn't listen on TCP by default these days, so this shouldn't be a
problem in this specific case.

-- 
Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-22 Thread Josh Triplett
On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
> On 2011-04-23 10:55, Josh Triplett wrote:
> > The systemd-nspawn manpage lists the various mechanisms used to isolate
> > the container, and then says "Note that even though these security
> > precautions are taken systemd-nspawn is not suitable for secure
> > container setups. Many of the security features may be circumvented and
> > are hence primarily useful to avoid accidental changes to the host
> > system from the container."
> >
> > How can a process in a systemd-nspawn container circumvent the container
>
> remount /proc and /sys

Ah, good point.  So, root inside the container can trivially circumvent
the container that way.  Any way to prevent that with current kernel
support, or would fixing this require additional kernel changes to lock
down other /proc and /sys mounts?

That particular problem only applies if running code within the
container as root.  How about if running code as an unprivileged user?
With that addition, does systemd-nspawn provide a secure container
(modulo local privilege escalation vulnerabilities)?

Thanks,
Josh Triplett


Re: [systemd-devel] What makes systemd-nspawn not suitable for secure container setups?

2011-04-22 Thread microcai
On 2011-04-23 12:16, Josh Triplett wrote:
> On Sat, Apr 23, 2011 at 11:28:58AM +0800, microcai wrote:
> > On 2011-04-23 10:55, Josh Triplett wrote:
> > > The systemd-nspawn manpage lists the various mechanisms used to isolate
> > > the container, and then says "Note that even though these security
> > > precautions are taken systemd-nspawn is not suitable for secure
> > > container setups. Many of the security features may be circumvented and
> > > are hence primarily useful to avoid accidental changes to the host
> > > system from the container."
> > >
> > > How can a process in a systemd-nspawn container circumvent the container
> >
> > remount /proc and /sys
>
> Ah, good point.  So, root inside the container can trivially circumvent
> the container that way.  Any way to prevent that with current kernel
> support, or would fixing this require additional kernel changes to lock
> down other /proc and /sys mounts?


OpenVZ is what you need for that. OpenVZ is much like systemd-nspawn,
but more secure, so it can be used to provide VPS hosting ;)

 
> That particular problem only applies if running code within the
> container as root.  How about if running code as an unprivileged user?
> With that addition, does systemd-nspawn provide a secure container
> (modulo local privilege escalation vulnerabilities)?
>
> Thanks,
> Josh Triplett
