Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Michał Zegan
Well, you can read user_namespaces(7), at least the beginning of it. It
probably says something about keyrings. So either that information is
incorrect, or I am misunderstanding it.
Also, when you say that containers currently have holes and so are still
not really secure, I don't actually see any example of that, except the
small number of things you cannot do there at all (for example,
using/accessing audit, or using fuse or file capabilities), and things
like cgroups that are a work in progress at this very moment. File caps
are also a work in progress at the moment, I believe; I saw some patches
lately. I probably don't see such problems because I am not a security
expert and I am not working with any kind of servers/containers in
production; this technology is just extremely interesting to me.

On 11.11.2016 at 19:41, Lennart Poettering wrote:
> On Fri, 11.11.16 19:36, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
> 
>> Why do you turn off keyrings? At least the man pages say that userns
>> virtualizes keyrings, or something similar...
> 
> That'd be a new feature then...
> 
> Lennart
> 



___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Michał Zegan
Why do you turn off keyrings? At least the man pages say that userns
virtualizes keyrings, or something similar...

On 11.11.2016 at 19:24, Lennart Poettering wrote:
> On Fri, 11.11.16 19:21, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
> 
>> audit/autofs are not properly virtualized, I know. But I thought
>> keyrings and cgroups are.
> 
> most container managers turn off keyrings entirely (as we do in nspawn
> actually).
> 
> delegating controllers in cgroupsv1 is unsafe, if you do it the
> container can make the system hang easily.
> 
> delegating controllers in cgroupsv2 is safe, but cgroupsv2 are
> incomplete as of now, the most relevant controller (cpu) is not
> available for it yet.
> 
> Lennart
> 





Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Lennart Poettering
On Fri, 11.11.16 19:36, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Why do you turn off keyrings? At least the man pages say that userns
> virtualizes keyrings, or something similar...

That'd be a new feature then...

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Lennart Poettering
On Fri, 11.11.16 19:21, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> audit/autofs are not properly virtualized, I know. But I thought
> keyrings and cgroups are.

most container managers turn off keyrings entirely (as we do in nspawn
actually).

delegating controllers in cgroupsv1 is unsafe, if you do it the
container can make the system hang easily.

delegating controllers in cgroupsv2 is safe, but cgroupsv2 are
incomplete as of now, the most relevant controller (cpu) is not
available for it yet.
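
As a rough illustration of the cgroupsv2 point: the unified hierarchy lists
the controllers it offers in a "cgroup.controllers" file at its root. The
sketch below assumes the v2 hierarchy is mounted directly at /sys/fs/cgroup
(on hybrid setups it may live elsewhere); the function names are made up
for this example.

```python
from pathlib import Path

# Assumption: the unified (v2) hierarchy is mounted at /sys/fs/cgroup.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def parse_controllers(text):
    """Split the contents of a cgroup.controllers file into a set of names."""
    return set(text.split())


def available_v2_controllers(root=CGROUP_ROOT):
    """Return the controllers offered by the v2 hierarchy at `root`,
    or None if no readable cgroup.controllers file exists there."""
    path = root / "cgroup.controllers"
    try:
        return parse_controllers(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    controllers = available_v2_controllers()
    if controllers is None:
        print("no unified cgroup hierarchy found at", CGROUP_ROOT)
    else:
        print("cpu controller available:", "cpu" in controllers)
```

Checking for "cpu" in the returned set is one way to see whether a given
kernel already exposes the controller discussed above.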

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Michał Zegan
audit/autofs are not properly virtualized, I know. But I thought
keyrings and cgroups are.

On 11.11.2016 at 18:28, Lennart Poettering wrote:
> On Fri, 11.11.16 16:41, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
> 
>> Thank you for your answers!
>>
>> What I meant by secure containers is mostly, containers that are or will
>> be secure enough to use them for things like virtual private server
>> hosting. Is nspawn intended to be usable for such things in the future,
>> or maybe it already is, or whatever?
> 
> I run my own server this way, already as an exercise of dogfooding.
> 
> So, yes, running a VPS like this certainly works, but do note that
> nspawn doesn't do orchestration or anything. It's good enough for me,
> but if you need fancy orchestration tools then nspawn won't be
> sufficient.
> 
>> What kernel limitations do you mean when you talk about security?
> 
> Well, a lot of subsystems cannot be locked down properly for use in
> containers yet. You can lock down a lot, in particular if you use
> userns, but there are still a lot of holes in there, and in particular
> userns itself has been a major source of CVEs alone in the most recent
> kernels.
> 
> Right now, "containers" in general are not about security. Some
> companies claim they are secure, but they really aren't. And that's
> not a bug in nspawn, or docker, or lxc for that matter, it's simply a
> limitation of the kernel.
> 
> Or to say this differently: we'll do in nspawn everything we can to
> lock things down properly, but there are limits based on what the
> kernel provides... As the kernel gets improved in this area, we'll
> update nspawn to make use of it. We are sitting in the same boat in
> this regard as other container managers, and they have more or less
> the same limits we have.
> 
>> For now I know that in full containers with userns file capabilities do
>> not work (I think), you have no virtualized /proc/meminfo and friends
>> (do cgroup namespaces give a chance to change that?), you cannot mknod
>> devices (no whitelist possible at this level), no fuse support, no
>> automatic uid shifting kernel level, no possibility to mount physical
>> filesystems in userns, and no possibility to have selinux/etc per
>> container. Do you mean such limitations or something else?
> 
> Well, devices are not virtualized at all (with the exception of
> network devices), that means no udev, no hotplug events and so
> on. Some container managers ignore this, and provide access to
> selected device nodes anyway, but we don't do something like that in
> nspawn, since it's pretty broken (as /sys wouldn't match what you see
> in /dev). In general, I think people should just accept that
> containers mean "you don't get physical device access". And if you
> want physical device access, then don't use containers...
> 
>> I am interested in this topic but it is quite hard for me to track
>> progress in that area (kernel side) even though I subscribe in some
>> kernel ml's and know at least about submitted patches, or some of
>> them. What else is missing that I didn't say about that would be
>> good to have?
> 
> Well, a lot of stuff is still not properly virtualized. To mind come
> audit, autofs, keyring, cgroups, …
> 
>> Also what about setting cgroup parameters per container? nspawn does not
>> allow doing that, and you probably do not intend it to be done by
>> overriding container's scope unit settings, for example?
> 
> You can actually do that just fine. Simply set it in the nspawn service
> file. Or, if you run nspawn from the cmdline, use the "--property=" switch. Or
> make your changes dynamically via "systemctl set-property". It's all
> supported and works well.
> 
> Lennart
> 





Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Lennart Poettering
On Fri, 11.11.16 16:41, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Thank you for your answers!
> 
> What I meant by secure containers is mostly, containers that are or will
> be secure enough to use them for things like virtual private server
> hosting. Is nspawn intended to be usable for such things in the future,
> or maybe it already is, or whatever?

I run my own server this way, already as an exercise of dogfooding.

So, yes, running a VPS like this certainly works, but do note that
nspawn doesn't do orchestration or anything. It's good enough for me,
but if you need fancy orchestration tools then nspawn won't be
sufficient.

> What kernel limitations do you mean when you talk about security?

Well, a lot of subsystems cannot be locked down properly for use in
containers yet. You can lock down a lot, in particular if you use
userns, but there are still a lot of holes in there, and in particular
userns itself has been a major source of CVEs alone in the most recent
kernels.

Right now, "containers" in general are not about security. Some
companies claim they are secure, but they really aren't. And that's
not a bug in nspawn, or docker, or lxc for that matter, it's simply a
limitation of the kernel.

Or to say this differently: we'll do in nspawn everything we can to
lock things down properly, but there are limits based on what the
kernel provides... As the kernel gets improved in this area, we'll
update nspawn to make use of it. We are sitting in the same boat in
this regard as other container managers, and they have more or less
the same limits we have.

> For now I know that in full containers with userns file capabilities do
> not work (I think), you have no virtualized /proc/meminfo and friends
> (do cgroup namespaces give a chance to change that?), you cannot mknod
> devices (no whitelist possible at this level), no fuse support, no
> automatic uid shifting kernel level, no possibility to mount physical
> filesystems in userns, and no possibility to have selinux/etc per
> container. Do you mean such limitations or something else?

Well, devices are not virtualized at all (with the exception of
network devices), that means no udev, no hotplug events and so
on. Some container managers ignore this, and provide access to
selected device nodes anyway, but we don't do something like that in
nspawn, since it's pretty broken (as /sys wouldn't match what you see
in /dev). In general, I think people should just accept that
containers mean "you don't get physical device access". And if you
want physical device access, then don't use containers...

> I am interested in this topic but it is quite hard for me to track
> progress in that area (kernel side) even though I subscribe in some
> kernel ml's and know at least about submitted patches, or some of
> them. What else is missing that I didn't say about that would be
> good to have?

Well, a lot of stuff is still not properly virtualized. To mind come
audit, autofs, keyring, cgroups, …

> Also what about setting cgroup parameters per container? nspawn does not
> allow doing that, and you probably do not intend it to be done by
> overriding container's scope unit settings, for example?

You can actually do that just fine. Simply set it in the nspawn service
file. Or, if you run nspawn from the cmdline, use the "--property=" switch. Or
make your changes dynamically via "systemctl set-property". It's all
supported and works well.
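
To make the "systemctl set-property" route concrete, here is a minimal
sketch that only builds the command line; it does not run anything. The
helper name, the container name "mycontainer" and the values are
placeholders. CPUShares= and MemoryLimit= are the cgroupsv1-era property
names; newer systemd versions prefer CPUWeight= and MemoryMax=. The unit
name assumes the container was started through systemd-nspawn@.service; a
container started from the command line is registered under a scope unit
instead.

```python
import shlex


def set_property_argv(unit, properties, runtime=False):
    """Build the argv for `systemctl set-property UNIT KEY=VALUE...`."""
    if not properties:
        raise ValueError("at least one property is required")
    argv = ["systemctl", "set-property"]
    if runtime:
        # --runtime makes the change non-persistent (lost on reboot).
        argv.append("--runtime")
    argv.append(unit)
    argv.extend(f"{key}={value}" for key, value in properties.items())
    return argv


if __name__ == "__main__":
    # Placeholder unit name and values, for illustration only.
    argv = set_property_argv(
        "systemd-nspawn@mycontainer.service",
        {"CPUShares": 512, "MemoryLimit": "1G"},
        runtime=True,
    )
    print(shlex.join(argv))
```

The resulting list can be handed to subprocess.run() as is, which avoids
any shell quoting issues.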

Lennart

-- 
Lennart Poettering, Red Hat


Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Michał Zegan
Thank you for your answers!

What I meant by secure containers is mostly, containers that are or will
be secure enough to use them for things like virtual private server
hosting. Is nspawn intended to be usable for such things in the future,
or maybe it already is, or whatever?
What kernel limitations do you mean when you talk about security?
For now I know that in full containers with userns file capabilities do
not work (I think), you have no virtualized /proc/meminfo and friends
(do cgroup namespaces give a chance to change that?), you cannot mknod
devices (no whitelist possible at this level), no fuse support, no
automatic uid shifting kernel level, no possibility to mount physical
filesystems in userns, and no possibility to have selinux/etc per
container. Do you mean such limitations or something else? I am
interested in this topic but it is quite hard for me to track progress
in that area (kernel side) even though I subscribe in some kernel ml's
and know at least about submitted patches, or some of them. What else is
missing that I didn't say about that would be good to have?
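
For context on the uid shifting mentioned above: a user namespace exposes
its ID mapping in /proc/<pid>/uid_map, one "inside outside count" triple
per line, as described in user_namespaces(7). The following is a minimal
sketch for reading such a map; the function names are made up for this
example, and which namespace the "outside" IDs refer to depends on who
reads the file.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map content into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, count = (int(field) for field in line.split())
            entries.append((inside, outside, count))
    return entries


def is_identity_map(entries):
    """True for the full 1:1 mapping seen in the initial user namespace."""
    return entries == [(0, 0, 4294967295)]


def to_outside(entries, inside_id):
    """Translate an ID from inside the namespace, or return None if unmapped."""
    for inside, outside, count in entries:
        if inside <= inside_id < inside + count:
            return outside + (inside_id - inside)
    return None


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        entries = parse_id_map(f.read())
    print("entries:", entries)
    print("identity mapping:", is_identity_map(entries))
```

A process whose map is not the identity mapping is running in a user
namespace with shifted or restricted IDs.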

Also what about setting cgroup parameters per container? nspawn does not
allow doing that, and you probably do not intend it to be done by
overriding container's scope unit settings, for example?

On 11.11.2016 at 13:52, Lennart Poettering wrote:
> On Wed, 09.11.16 18:24, Michał Zegan (webczat_...@poczta.onet.pl) wrote:
> 
>> Hello.
>>
>> Does systemd-nspawn intend to be a full secure container technology? or
>> it maybe already is? what is missing?
> 
> I am not sure what "full secure container technology" really is
> supposed to mean.
> 
> nspawn right now is great for two things:
> 
> a) full OS containers (think VMs, except based on container
>    technology. This means that inside the container you have a proper
>    PID 1 running, and a network configuration daemon and most other
>    things that would run on a normal, physical system, except one
>    thing: no device manager, as the kernel does not virtualize
>    devices)
> 
> b) as a building block for whatever you want it to be. It's a pretty
>    generic tool, you can use it as a base for anything you like. The "rkt"
>    container manager makes use of this facet.
> 
> There are a number of things nspawn is better at than other container
> managers, for example in conjunction with networkd networking happens
> pretty much entirely automatically out of the box. It also ships
> userns support that is relatively usable without much manual
> intervention. OTOH it clearly doesn't do a lot of stuff that other
> container managers do and we have no intention to ever do: do IP level
> configuration in the manager itself, support for ZFS and other exotic
> (possibly out-of-tree) storage technology, and so on.
> 
> So it really depends what you mean by "full secure container
> technology". We do a lot, we will add more, but there are also things
> I don't see on our list at all.
> 
> (And "secure" is a difficult thing anyway, currently security of
> containers on Linux is pretty limited in general, due to kernel
> limitations.)
> 
> Lennart
> 





Re: [systemd-devel] systemd-nspawn containers

2016-11-11 Thread Lennart Poettering
On Wed, 09.11.16 18:24, Michał Zegan (webczat_...@poczta.onet.pl) wrote:

> Hello.
> 
> Does systemd-nspawn intend to be a full secure container technology? or
> it maybe already is? what is missing?

I am not sure what "full secure container technology" really is
supposed to mean.

nspawn right now is great for two things:

a) full OS containers (think VMs, except based on container
   technology. This means that inside the container you have a proper
   PID 1 running, and a network configuration daemon and most other
   things that would run on a normal, physical system, except one
   thing: no device manager, as the kernel does not virtualize
   devices)

b) as a building block for whatever you want it to be. It's a pretty
   generic tool, you can use it as a base for anything you like. The "rkt"
   container manager makes use of this facet.

There are a number of things nspawn is better at than other container
managers, for example in conjunction with networkd networking happens
pretty much entirely automatically out of the box. It also ships
userns support that is relatively usable without much manual
intervention. OTOH it clearly doesn't do a lot of stuff that other
container managers do and we have no intention to ever do: do IP level
configuration in the manager itself, support for ZFS and other exotic
(possibly out-of-tree) storage technology, and so on.

So it really depends what you mean by "full secure container
technology". We do a lot, we will add more, but there are also things
I don't see on our list at all.

(And "secure" is a difficult thing anyway, currently security of
containers on Linux is pretty limited in general, due to kernel
limitations.)

Lennart

-- 
Lennart Poettering, Red Hat