Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Tue, Aug 30, 2016 at 6:36 PM, Alexei Starovoitov
 wrote:
> On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
>>
>> One might argue that landlock shouldn't be tied to seccomp (in theory,
>> attached progs could be given access to syscall_get_xyz()), but I
>
> proposed lsm is way more powerful than syscall_get_xyz.
> no need to dumb it down.

I think you're misunderstanding me.

Mickaël's code allows one to make the LSM hook filters depend on the
syscall using SECCOMP_RET_LANDLOCK.  I'm suggesting that a similar
effect could be achieved by allowing the eBPF LSM hook to call
syscall_get_xyz() if it wants to.

>
>> think that the seccomp attachment mechanism is the right way to
>> install unprivileged filters.  It handles the no_new_privs stuff, it
>> allows TSYNC, it's totally independent of systemwide policy, etc.
>>
>> Trying to use cgroups or similar for this is going to be much nastier.
>> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
>> putting cgroupfs in their containers, so requiring cgroups or similar
>> would be a mess for that type of application.
>
> I don't see why it is a 'mess'. cgroups are already used by majority
> of the systems, so I don't see why requiring a cgroup is such a big deal.

Requiring cgroup to be configured in isn't a big deal.  Requiring

> But let's say we don't do them. How implementation is going to look like
> for task based hierarchy? Note that we need an array of bpf_prog pointers.
> One for each lsm hook. Where this array is going to be stored?
> We cannot put in task_struct, since it's too large. Cannot put it
> into 'struct seccomp' directly either, unless it will become a pointer.
> Is that the proposal?

It would go in struct seccomp_filter or in something pointed to from there.

> So now we will be wasting extra 1kbyte of memory per task. Not great.
> We'd want to optimize it by sharing this such struct seccomp with prog array
> across threads of the same task? Or dynimically allocating it when
> landlock is in use? May sound nice, but how to account for that kernel
> memory? I guess also solvable by charging memlock.
> With cgroup based approach we don't need to worry about all that.
>

The considerations are essentially identical either way.

With cgroups, if you want to share the memory between multiple
separate sandboxes (Firejail instances, Sandstorm grains, Chromium
instances, xdg-apps, etc), you'd need to get them to all coordinate to
share a cgroup.  With a seccomp-like interface, you'd need to get them
to coordinate to share an installed layer (using my FD idea or
similar).

There would *not* be any duplication of this memory just because a
sandboxed process called fork().

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Tue, Aug 30, 2016 at 6:36 PM, Alexei Starovoitov
 wrote:
> On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
>>
>> One might argue that landlock shouldn't be tied to seccomp (in theory,
>> attached progs could be given access to syscall_get_xyz()), but I
>
> proposed lsm is way more powerful than syscall_get_xyz.
> no need to dumb it down.

I think you're misunderstanding me.

Mickaël's code allows one to make the LSM hook filters depend on the
syscall using SECCOMP_RET_LANDLOCK.  I'm suggesting that a similar
effect could be achieved by allowing the eBPF LSM hook to call
syscall_get_xyz() if it wants to.

>
>> think that the seccomp attachment mechanism is the right way to
>> install unprivileged filters.  It handles the no_new_privs stuff, it
>> allows TSYNC, it's totally independent of systemwide policy, etc.
>>
>> Trying to use cgroups or similar for this is going to be much nastier.
>> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
>> putting cgroupfs in their containers, so requiring cgroups or similar
>> would be a mess for that type of application.
>
> I don't see why it is a 'mess'. cgroups are already used by majority
> of the systems, so I don't see why requiring a cgroup is such a big deal.

Requiring cgroup to be configured in isn't a big deal.  Requiring

> But let's say we don't do them. How implementation is going to look like
> for task based hierarchy? Note that we need an array of bpf_prog pointers.
> One for each lsm hook. Where this array is going to be stored?
> We cannot put in task_struct, since it's too large. Cannot put it
> into 'struct seccomp' directly either, unless it will become a pointer.
> Is that the proposal?

It would go in struct seccomp_filter or in something pointed to from there.

> So now we will be wasting extra 1kbyte of memory per task. Not great.
> We'd want to optimize it by sharing this such struct seccomp with prog array
> across threads of the same task? Or dynimically allocating it when
> landlock is in use? May sound nice, but how to account for that kernel
> memory? I guess also solvable by charging memlock.
> With cgroup based approach we don't need to worry about all that.
>

The considerations are essentially identical either way.

With cgroups, if you want to share the memory between multiple
separate sandboxes (Firejail instances, Sandstorm grains, Chromium
instances, xdg-apps, etc), you'd need to get them to all coordinate to
share a cgroup.  With a seccomp-like interface, you'd need to get them
to coordinate to share an installed layer (using my FD idea or
similar).

There would *not* be any duplication of this memory just because a
sandboxed process called fork().

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Alexei Starovoitov
On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
> 
> One might argue that landlock shouldn't be tied to seccomp (in theory,
> attached progs could be given access to syscall_get_xyz()), but I

proposed lsm is way more powerful than syscall_get_xyz.
no need to dumb it down.

> think that the seccomp attachment mechanism is the right way to
> install unprivileged filters.  It handles the no_new_privs stuff, it
> allows TSYNC, it's totally independent of systemwide policy, etc.
> 
> Trying to use cgroups or similar for this is going to be much nastier.
> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
> putting cgroupfs in their containers, so requiring cgroups or similar
> would be a mess for that type of application.

I don't see why it is a 'mess'. cgroups are already used by majority
of the systems, so I don't see why requiring a cgroup is such a big deal.
But let's say we don't do them. How implementation is going to look like
for task based hierarchy? Note that we need an array of bpf_prog pointers.
One for each lsm hook. Where this array is going to be stored?
We cannot put in task_struct, since it's too large. Cannot put it
into 'struct seccomp' directly either, unless it will become a pointer.
Is that the proposal?
So now we will be wasting extra 1kbyte of memory per task. Not great.
We'd want to optimize it by sharing this such struct seccomp with prog array
across threads of the same task? Or dynimically allocating it when
landlock is in use? May sound nice, but how to account for that kernel
memory? I guess also solvable by charging memlock.
With cgroup based approach we don't need to worry about all that.



Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Alexei Starovoitov
On Tue, Aug 30, 2016 at 02:45:14PM -0700, Andy Lutomirski wrote:
> 
> One might argue that landlock shouldn't be tied to seccomp (in theory,
> attached progs could be given access to syscall_get_xyz()), but I

proposed lsm is way more powerful than syscall_get_xyz.
no need to dumb it down.

> think that the seccomp attachment mechanism is the right way to
> install unprivileged filters.  It handles the no_new_privs stuff, it
> allows TSYNC, it's totally independent of systemwide policy, etc.
> 
> Trying to use cgroups or similar for this is going to be much nastier.
> Some tighter sandboxes (Sandstorm, etc) aren't even going to dream of
> putting cgroupfs in their containers, so requiring cgroups or similar
> would be a mess for that type of application.

I don't see why it is a 'mess'. cgroups are already used by majority
of the systems, so I don't see why requiring a cgroup is such a big deal.
But let's say we don't do them. How implementation is going to look like
for task based hierarchy? Note that we need an array of bpf_prog pointers.
One for each lsm hook. Where this array is going to be stored?
We cannot put in task_struct, since it's too large. Cannot put it
into 'struct seccomp' directly either, unless it will become a pointer.
Is that the proposal?
So now we will be wasting extra 1kbyte of memory per task. Not great.
We'd want to optimize it by sharing this such struct seccomp with prog array
across threads of the same task? Or dynimically allocating it when
landlock is in use? May sound nice, but how to account for that kernel
memory? I guess also solvable by charging memlock.
With cgroup based approach we don't need to worry about all that.



Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Aug 30, 2016 1:56 PM, "Alexei Starovoitov"
 wrote:
>
> On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> >
> >
> > On 30/08/2016 22:23, Andy Lutomirski wrote:
> > > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
> > >>
> > >> On 30/08/2016 20:55, Andy Lutomirski wrote:
> > >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  
> > >>> wrote:
> > 
> > 
> >  On 28/08/2016 10:13, Andy Lutomirski wrote:
> > > On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
> > >>
> > >>
> > >> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> > >>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >  On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> > >> As said above, Landlock will not run an eBPF programs when not 
> > >> strictly
> > >> needed. Attaching to a cgroup will have the same performance 
> > >> impact as
> > >> attaching to a process hierarchy.
> > >
> > > Having a prog per cgroup per lsm_hook is the only scalable way I
> > > could come up with. If you see another way, please propose.
> > > current->seccomp.landlock_prog is not the answer.
> > 
> >  Hum, I don't see the difference from a performance point of view 
> >  between
> >  a cgroup-based or a process hierarchy-based system.
> > 
> >  Maybe a better option should be to use an array of pointers with N
> >  entries, one for each supported hook, instead of a unique pointer 
> >  list?
> > >>>
> > >>> yes, clearly array dereference is faster than link list walk.
> > >>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> > >>> Since we cannot keep it inside task_struct, we have to allocate it.
> > >>> Every time the task is creted then. What to do on the fork? That
> > >>> will require changes all over. Then the obvious optimization would 
> > >>> be
> > >>> to share this allocated array of prog pointers across multiple 
> > >>> tasks...
> > >>> and little by little this new facility will look like cgroup.
> > >>> Hence the suggestion to put this array into cgroup from the start.
> > >>
> > >> I see your point :)
> > >>
> > >>>
> >  Anyway, being able to attach an LSM hook program to a cgroup 
> >  thanks to
> >  the new BPF_PROG_ATTACH seems a good idea (while keeping the 
> >  possibility
> >  to use a process hierarchy). The downside will be to handle an LSM 
> >  hook
> >  program which is not triggered by a seccomp-filter, but this 
> >  should be
> >  needed anyway to handle interruptions.
> > >>>
> > >>> what do you mean 'not triggered by seccomp' ?
> > >>> You're not suggesting that this lsm has to enable seccomp to be 
> > >>> functional?
> > >>> imo that's non starter due to overhead.
> > >>
> > >> Yes, for now, it is triggered by a new seccomp filter return value
> > >> RET_LANDLOCK, which can take a 16-bit value called cookie. This must 
> > >> not
> > >> be needed but could be useful to bind a seccomp filter security 
> > >> policy
> > >> with a Landlock one. Waiting for Kees's point of view…
> > >>
> > >
> > > I'm not Kees, but I'd be okay with that.  I still think that doing
> > > this by process hierarchy a la seccomp will be easier to use and to
> > > understand (which is quite important for this kind of work) than doing
> > > it by cgroup.
> > >
> > > A feature I've wanted to add for a while is to have an fd that
> > > represents a seccomp layer, the idea being that you would set up your
> > > seccomp layer (with syscall filter, landlock hooks, etc) and then you
> > > would have a syscall to install that layer.  Then an unprivileged
> > > sandbox manager could set up its layer and still be able to inject new
> > > processes into it later on, no cgroups needed.
> > 
> >  A nice thing I didn't highlight about Landlock is that a process can
> >  prepare a layer of rules (arraymap of handles + Landlock programs) and
> >  pass the file descriptors of the Landlock programs to another process.
> >  This process could then apply this programs to get sandboxed. However,
> >  for now, because a Landlock program is only triggered by a seccomp
> >  filter (which do not follow the Landlock programs as a FD), they will 
> >  be
> >  useless.
> > 
> >  The FD referring to an arraymap of handles can also be used to update a
> >  map and change the behavior of a Landlock program. A master process can
> >  then add or remove restrictions to another process hierarchy on the 
> >  fly.
> 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Aug 30, 2016 1:56 PM, "Alexei Starovoitov"
 wrote:
>
> On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> >
> >
> > On 30/08/2016 22:23, Andy Lutomirski wrote:
> > > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
> > >>
> > >> On 30/08/2016 20:55, Andy Lutomirski wrote:
> > >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  
> > >>> wrote:
> > 
> > 
> >  On 28/08/2016 10:13, Andy Lutomirski wrote:
> > > On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
> > >>
> > >>
> > >> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> > >>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >  On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> > >> As said above, Landlock will not run an eBPF programs when not 
> > >> strictly
> > >> needed. Attaching to a cgroup will have the same performance 
> > >> impact as
> > >> attaching to a process hierarchy.
> > >
> > > Having a prog per cgroup per lsm_hook is the only scalable way I
> > > could come up with. If you see another way, please propose.
> > > current->seccomp.landlock_prog is not the answer.
> > 
> >  Hum, I don't see the difference from a performance point of view 
> >  between
> >  a cgroup-based or a process hierarchy-based system.
> > 
> >  Maybe a better option should be to use an array of pointers with N
> >  entries, one for each supported hook, instead of a unique pointer 
> >  list?
> > >>>
> > >>> yes, clearly array dereference is faster than link list walk.
> > >>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> > >>> Since we cannot keep it inside task_struct, we have to allocate it.
> > >>> Every time the task is creted then. What to do on the fork? That
> > >>> will require changes all over. Then the obvious optimization would 
> > >>> be
> > >>> to share this allocated array of prog pointers across multiple 
> > >>> tasks...
> > >>> and little by little this new facility will look like cgroup.
> > >>> Hence the suggestion to put this array into cgroup from the start.
> > >>
> > >> I see your point :)
> > >>
> > >>>
> >  Anyway, being able to attach an LSM hook program to a cgroup 
> >  thanks to
> >  the new BPF_PROG_ATTACH seems a good idea (while keeping the 
> >  possibility
> >  to use a process hierarchy). The downside will be to handle an LSM 
> >  hook
> >  program which is not triggered by a seccomp-filter, but this 
> >  should be
> >  needed anyway to handle interruptions.
> > >>>
> > >>> what do you mean 'not triggered by seccomp' ?
> > >>> You're not suggesting that this lsm has to enable seccomp to be 
> > >>> functional?
> > >>> imo that's non starter due to overhead.
> > >>
> > >> Yes, for now, it is triggered by a new seccomp filter return value
> > >> RET_LANDLOCK, which can take a 16-bit value called cookie. This must 
> > >> not
> > >> be needed but could be useful to bind a seccomp filter security 
> > >> policy
> > >> with a Landlock one. Waiting for Kees's point of view…
> > >>
> > >
> > > I'm not Kees, but I'd be okay with that.  I still think that doing
> > > this by process hierarchy a la seccomp will be easier to use and to
> > > understand (which is quite important for this kind of work) than doing
> > > it by cgroup.
> > >
> > > A feature I've wanted to add for a while is to have an fd that
> > > represents a seccomp layer, the idea being that you would set up your
> > > seccomp layer (with syscall filter, landlock hooks, etc) and then you
> > > would have a syscall to install that layer.  Then an unprivileged
> > > sandbox manager could set up its layer and still be able to inject new
> > > processes into it later on, no cgroups needed.
> > 
> >  A nice thing I didn't highlight about Landlock is that a process can
> >  prepare a layer of rules (arraymap of handles + Landlock programs) and
> >  pass the file descriptors of the Landlock programs to another process.
> >  This process could then apply this programs to get sandboxed. However,
> >  for now, because a Landlock program is only triggered by a seccomp
> >  filter (which do not follow the Landlock programs as a FD), they will 
> >  be
> >  useless.
> > 
> >  The FD referring to an arraymap of handles can also be used to update a
> >  map and change the behavior of a Landlock program. A master process can
> >  then add or remove restrictions to another process hierarchy on the 
> >  fly.
> > >>>
> > >>> Maybe this could be extended a little bit.  The fd could hold the
> > 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Alexei Starovoitov
On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> 
> 
> On 30/08/2016 22:23, Andy Lutomirski wrote:
> > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
> >>
> >> On 30/08/2016 20:55, Andy Lutomirski wrote:
> >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
> 
> 
>  On 28/08/2016 10:13, Andy Lutomirski wrote:
> > On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
> >>
> >>
> >> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> >>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>  On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >> As said above, Landlock will not run an eBPF programs when not 
> >> strictly
> >> needed. Attaching to a cgroup will have the same performance 
> >> impact as
> >> attaching to a process hierarchy.
> >
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
>  Hum, I don't see the difference from a performance point of view 
>  between
>  a cgroup-based or a process hierarchy-based system.
> 
>  Maybe a better option should be to use an array of pointers with N
>  entries, one for each supported hook, instead of a unique pointer 
>  list?
> >>>
> >>> yes, clearly array dereference is faster than link list walk.
> >>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> >>> Since we cannot keep it inside task_struct, we have to allocate it.
> >>> Every time the task is creted then. What to do on the fork? That
> >>> will require changes all over. Then the obvious optimization would be
> >>> to share this allocated array of prog pointers across multiple 
> >>> tasks...
> >>> and little by little this new facility will look like cgroup.
> >>> Hence the suggestion to put this array into cgroup from the start.
> >>
> >> I see your point :)
> >>
> >>>
>  Anyway, being able to attach an LSM hook program to a cgroup thanks 
>  to
>  the new BPF_PROG_ATTACH seems a good idea (while keeping the 
>  possibility
>  to use a process hierarchy). The downside will be to handle an LSM 
>  hook
>  program which is not triggered by a seccomp-filter, but this should 
>  be
>  needed anyway to handle interruptions.
> >>>
> >>> what do you mean 'not triggered by seccomp' ?
> >>> You're not suggesting that this lsm has to enable seccomp to be 
> >>> functional?
> >>> imo that's non starter due to overhead.
> >>
> >> Yes, for now, it is triggered by a new seccomp filter return value
> >> RET_LANDLOCK, which can take a 16-bit value called cookie. This must 
> >> not
> >> be needed but could be useful to bind a seccomp filter security policy
> >> with a Landlock one. Waiting for Kees's point of view…
> >>
> >
> > I'm not Kees, but I'd be okay with that.  I still think that doing
> > this by process hierarchy a la seccomp will be easier to use and to
> > understand (which is quite important for this kind of work) than doing
> > it by cgroup.
> >
> > A feature I've wanted to add for a while is to have an fd that
> > represents a seccomp layer, the idea being that you would set up your
> > seccomp layer (with syscall filter, landlock hooks, etc) and then you
> > would have a syscall to install that layer.  Then an unprivileged
> > sandbox manager could set up its layer and still be able to inject new
> > processes into it later on, no cgroups needed.
> 
>  A nice thing I didn't highlight about Landlock is that a process can
>  prepare a layer of rules (arraymap of handles + Landlock programs) and
>  pass the file descriptors of the Landlock programs to another process.
>  This process could then apply this programs to get sandboxed. However,
>  for now, because a Landlock program is only triggered by a seccomp
>  filter (which do not follow the Landlock programs as a FD), they will be
>  useless.
> 
>  The FD referring to an arraymap of handles can also be used to update a
>  map and change the behavior of a Landlock program. A master process can
>  then add or remove restrictions to another process hierarchy on the fly.
> >>>
> >>> Maybe this could be extended a little bit.  The fd could hold the
> >>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> >>> the ability to install it and FMODE_WRITE could give the ability to
> >>> modify it.
> >>>
> >>
> >> This is interesting! It should be possible to append the seccomp 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Alexei Starovoitov
On Tue, Aug 30, 2016 at 10:33:31PM +0200, Mickaël Salaün wrote:
> 
> 
> On 30/08/2016 22:23, Andy Lutomirski wrote:
> > On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
> >>
> >> On 30/08/2016 20:55, Andy Lutomirski wrote:
> >>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
> 
> 
>  On 28/08/2016 10:13, Andy Lutomirski wrote:
> > On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
> >>
> >>
> >> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> >>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>  On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >> As said above, Landlock will not run an eBPF programs when not 
> >> strictly
> >> needed. Attaching to a cgroup will have the same performance 
> >> impact as
> >> attaching to a process hierarchy.
> >
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
>  Hum, I don't see the difference from a performance point of view 
>  between
>  a cgroup-based or a process hierarchy-based system.
> 
>  Maybe a better option should be to use an array of pointers with N
>  entries, one for each supported hook, instead of a unique pointer 
>  list?
> >>>
> >>> yes, clearly array dereference is faster than link list walk.
> >>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> >>> Since we cannot keep it inside task_struct, we have to allocate it.
> >>> Every time the task is creted then. What to do on the fork? That
> >>> will require changes all over. Then the obvious optimization would be
> >>> to share this allocated array of prog pointers across multiple 
> >>> tasks...
> >>> and little by little this new facility will look like cgroup.
> >>> Hence the suggestion to put this array into cgroup from the start.
> >>
> >> I see your point :)
> >>
> >>>
>  Anyway, being able to attach an LSM hook program to a cgroup thanks 
>  to
>  the new BPF_PROG_ATTACH seems a good idea (while keeping the 
>  possibility
>  to use a process hierarchy). The downside will be to handle an LSM 
>  hook
>  program which is not triggered by a seccomp-filter, but this should 
>  be
>  needed anyway to handle interruptions.
> >>>
> >>> what do you mean 'not triggered by seccomp' ?
> >>> You're not suggesting that this lsm has to enable seccomp to be 
> >>> functional?
> >>> imo that's non starter due to overhead.
> >>
> >> Yes, for now, it is triggered by a new seccomp filter return value
> >> RET_LANDLOCK, which can take a 16-bit value called cookie. This must 
> >> not
> >> be needed but could be useful to bind a seccomp filter security policy
> >> with a Landlock one. Waiting for Kees's point of view…
> >>
> >
> > I'm not Kees, but I'd be okay with that.  I still think that doing
> > this by process hierarchy a la seccomp will be easier to use and to
> > understand (which is quite important for this kind of work) than doing
> > it by cgroup.
> >
> > A feature I've wanted to add for a while is to have an fd that
> > represents a seccomp layer, the idea being that you would set up your
> > seccomp layer (with syscall filter, landlock hooks, etc) and then you
> > would have a syscall to install that layer.  Then an unprivileged
> > sandbox manager could set up its layer and still be able to inject new
> > processes into it later on, no cgroups needed.
> 
>  A nice thing I didn't highlight about Landlock is that a process can
>  prepare a layer of rules (arraymap of handles + Landlock programs) and
>  pass the file descriptors of the Landlock programs to another process.
>  This process could then apply this programs to get sandboxed. However,
>  for now, because a Landlock program is only triggered by a seccomp
>  filter (which do not follow the Landlock programs as a FD), they will be
>  useless.
> 
>  The FD referring to an arraymap of handles can also be used to update a
>  map and change the behavior of a Landlock program. A master process can
>  then add or remove restrictions to another process hierarchy on the fly.
> >>>
> >>> Maybe this could be extended a little bit.  The fd could hold the
> >>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> >>> the ability to install it and FMODE_WRITE could give the ability to
> >>> modify it.
> >>>
> >>
> >> This is interesting! It should be possible to append the seccomp stack
> >> of a source process to the seccomp stack of 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Mickaël Salaün


On 30/08/2016 22:23, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
>>
>> On 30/08/2016 20:55, Andy Lutomirski wrote:
>>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:


 On 28/08/2016 10:13, Andy Lutomirski wrote:
> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>
>>
>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
 On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>> As said above, Landlock will not run an eBPF programs when not 
>> strictly
>> needed. Attaching to a cgroup will have the same performance impact 
>> as
>> attaching to a process hierarchy.
>
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

 Hum, I don't see the difference from a performance point of view 
 between
 a cgroup-based or a process hierarchy-based system.

 Maybe a better option should be to use an array of pointers with N
 entries, one for each supported hook, instead of a unique pointer list?
>>>
>>> yes, clearly array dereference is faster than link list walk.
>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>> Every time the task is creted then. What to do on the fork? That
>>> will require changes all over. Then the obvious optimization would be
>>> to share this allocated array of prog pointers across multiple tasks...
>>> and little by little this new facility will look like cgroup.
>>> Hence the suggestion to put this array into cgroup from the start.
>>
>> I see your point :)
>>
>>>
 Anyway, being able to attach an LSM hook program to a cgroup thanks to
 the new BPF_PROG_ATTACH seems a good idea (while keeping the 
 possibility
 to use a process hierarchy). The downside will be to handle an LSM hook
 program which is not triggered by a seccomp-filter, but this should be
 needed anyway to handle interruptions.
>>>
>>> what do you mean 'not triggered by seccomp' ?
>>> You're not suggesting that this lsm has to enable seccomp to be 
>>> functional?
>>> imo that's non starter due to overhead.
>>
>> Yes, for now, it is triggered by a new seccomp filter return value
>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>> be needed but could be useful to bind a seccomp filter security policy
>> with a Landlock one. Waiting for Kees's point of view…
>>
>
> I'm not Kees, but I'd be okay with that.  I still think that doing
> this by process hierarchy a la seccomp will be easier to use and to
> understand (which is quite important for this kind of work) than doing
> it by cgroup.
>
> A feature I've wanted to add for a while is to have an fd that
> represents a seccomp layer, the idea being that you would set up your
> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> would have a syscall to install that layer.  Then an unprivileged
> sandbox manager could set up its layer and still be able to inject new
> processes into it later on, no cgroups needed.

 A nice thing I didn't highlight about Landlock is that a process can
 prepare a layer of rules (arraymap of handles + Landlock programs) and
 pass the file descriptors of the Landlock programs to another process.
 This process could then apply this programs to get sandboxed. However,
 for now, because a Landlock program is only triggered by a seccomp
 filter (which do not follow the Landlock programs as a FD), they will be
 useless.

 The FD referring to an arraymap of handles can also be used to update a
 map and change the behavior of a Landlock program. A master process can
 then add or remove restrictions to another process hierarchy on the fly.
>>>
>>> Maybe this could be extended a little bit.  The fd could hold the
>>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>>> the ability to install it and FMODE_WRITE could give the ability to
>>> modify it.
>>>
>>
>> This is interesting! It should be possible to append the seccomp stack
>> of a source process to the seccomp stack of the target process when a
>> Landlock program is passed and then activated through seccomp(2).
>>
>> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
>> permission of the eBPF program FD in a specific way?
>>
> 
> This wouldn't be an eBPF program FD -- it 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Mickaël Salaün


On 30/08/2016 22:23, Andy Lutomirski wrote:
> On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
>>
>> On 30/08/2016 20:55, Andy Lutomirski wrote:
>>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:


 On 28/08/2016 10:13, Andy Lutomirski wrote:
> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>
>>
>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
 On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>> As said above, Landlock will not run an eBPF programs when not 
>> strictly
>> needed. Attaching to a cgroup will have the same performance impact 
>> as
>> attaching to a process hierarchy.
>
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

 Hum, I don't see the difference from a performance point of view 
 between
 a cgroup-based or a process hierarchy-based system.

 Maybe a better option should be to use an array of pointers with N
 entries, one for each supported hook, instead of a unique pointer list?
>>>
>>> yes, clearly array dereference is faster than link list walk.
>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>> Every time the task is creted then. What to do on the fork? That
>>> will require changes all over. Then the obvious optimization would be
>>> to share this allocated array of prog pointers across multiple tasks...
>>> and little by little this new facility will look like cgroup.
>>> Hence the suggestion to put this array into cgroup from the start.
>>
>> I see your point :)
>>
>>>
 Anyway, being able to attach an LSM hook program to a cgroup thanks to
 the new BPF_PROG_ATTACH seems a good idea (while keeping the 
 possibility
 to use a process hierarchy). The downside will be to handle an LSM hook
 program which is not triggered by a seccomp-filter, but this should be
 needed anyway to handle interruptions.
>>>
>>> what do you mean 'not triggered by seccomp' ?
>>> You're not suggesting that this lsm has to enable seccomp to be 
>>> functional?
>>> imo that's non starter due to overhead.
>>
>> Yes, for now, it is triggered by a new seccomp filter return value
>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>> be needed but could be useful to bind a seccomp filter security policy
>> with a Landlock one. Waiting for Kees's point of view…
>>
>
> I'm not Kees, but I'd be okay with that.  I still think that doing
> this by process hierarchy a la seccomp will be easier to use and to
> understand (which is quite important for this kind of work) than doing
> it by cgroup.
>
> A feature I've wanted to add for a while is to have an fd that
> represents a seccomp layer, the idea being that you would set up your
> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> would have a syscall to install that layer.  Then an unprivileged
> sandbox manager could set up its layer and still be able to inject new
> processes into it later on, no cgroups needed.

 A nice thing I didn't highlight about Landlock is that a process can
 prepare a layer of rules (arraymap of handles + Landlock programs) and
 pass the file descriptors of the Landlock programs to another process.
 This process could then apply this programs to get sandboxed. However,
 for now, because a Landlock program is only triggered by a seccomp
 filter (which do not follow the Landlock programs as a FD), they will be
 useless.

 The FD referring to an arraymap of handles can also be used to update a
 map and change the behavior of a Landlock program. A master process can
 then add or remove restrictions to another process hierarchy on the fly.
>>>
>>> Maybe this could be extended a little bit.  The fd could hold the
>>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>>> the ability to install it and FMODE_WRITE could give the ability to
>>> modify it.
>>>
>>
>> This is interesting! It should be possible to append the seccomp stack
>> of a source process to the seccomp stack of the target process when a
>> Landlock program is passed and then activated through seccomp(2).
>>
>> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
>> permission of the eBPF program FD in a specific way?
>>
> 
> This wouldn't be an eBPF program FD -- it would be an FD encapsulating
> an entire configuration 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
>
> On 30/08/2016 20:55, Andy Lutomirski wrote:
>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>>>
>>>
>>> On 28/08/2016 10:13, Andy Lutomirski wrote:
 On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>
>
> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
 On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> As said above, Landlock will not run an eBPF programs when not 
> strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

 Having a prog per cgroup per lsm_hook is the only scalable way I
 could come up with. If you see another way, please propose.
 current->seccomp.landlock_prog is not the answer.
>>>
>>> Hum, I don't see the difference from a performance point of view between
>>> a cgroup-based or a process hierarchy-based system.
>>>
>>> Maybe a better option should be to use an array of pointers with N
>>> entries, one for each supported hook, instead of a unique pointer list?
>>
>> yes, clearly array dereference is faster than link list walk.
>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>> Since we cannot keep it inside task_struct, we have to allocate it.
>> Every time the task is creted then. What to do on the fork? That
>> will require changes all over. Then the obvious optimization would be
>> to share this allocated array of prog pointers across multiple tasks...
>> and little by little this new facility will look like cgroup.
>> Hence the suggestion to put this array into cgroup from the start.
>
> I see your point :)
>
>>
>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>> program which is not triggered by a seccomp-filter, but this should be
>>> needed anyway to handle interruptions.
>>
>> what do you mean 'not triggered by seccomp' ?
>> You're not suggesting that this lsm has to enable seccomp to be 
>> functional?
>> imo that's non starter due to overhead.
>
> Yes, for now, it is triggered by a new seccomp filter return value
> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> be needed but could be useful to bind a seccomp filter security policy
> with a Landlock one. Waiting for Kees's point of view…
>

 I'm not Kees, but I'd be okay with that.  I still think that doing
 this by process hierarchy a la seccomp will be easier to use and to
 understand (which is quite important for this kind of work) than doing
 it by cgroup.

 A feature I've wanted to add for a while is to have an fd that
 represents a seccomp layer, the idea being that you would set up your
 seccomp layer (with syscall filter, landlock hooks, etc) and then you
 would have a syscall to install that layer.  Then an unprivileged
 sandbox manager could set up its layer and still be able to inject new
 processes into it later on, no cgroups needed.
>>>
>>> A nice thing I didn't highlight about Landlock is that a process can
>>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>>> pass the file descriptors of the Landlock programs to another process.
>>> This process could then apply this programs to get sandboxed. However,
>>> for now, because a Landlock program is only triggered by a seccomp
>>> filter (which do not follow the Landlock programs as a FD), they will be
>>> useless.
>>>
>>> The FD referring to an arraymap of handles can also be used to update a
>>> map and change the behavior of a Landlock program. A master process can
>>> then add or remove restrictions to another process hierarchy on the fly.
>>
>> Maybe this could be extended a little bit.  The fd could hold the
>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>> the ability to install it and FMODE_WRITE could give the ability to
>> modify it.
>>
>
> This is interesting! It should be possible to append the seccomp stack
> of a source process to the seccomp stack of the target process when a
> Landlock program is passed and then activated through seccomp(2).
>
> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
> permission of the eBPF program FD in a specific way?
>

This wouldn't be an eBPF program FD -- it would be an FD encapsulating
an entire configuration including seccomp BPF program, whatever
landlock stuff is associated, and eventual seccomp monitor
configuration (once I write 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Tue, Aug 30, 2016 at 1:20 PM, Mickaël Salaün  wrote:
>
> On 30/08/2016 20:55, Andy Lutomirski wrote:
>> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>>>
>>>
>>> On 28/08/2016 10:13, Andy Lutomirski wrote:
 On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>
>
> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
 On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> As said above, Landlock will not run an eBPF programs when not 
> strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

 Having a prog per cgroup per lsm_hook is the only scalable way I
 could come up with. If you see another way, please propose.
 current->seccomp.landlock_prog is not the answer.
>>>
>>> Hum, I don't see the difference from a performance point of view between
>>> a cgroup-based or a process hierarchy-based system.
>>>
>>> Maybe a better option should be to use an array of pointers with N
>>> entries, one for each supported hook, instead of a unique pointer list?
>>
>> yes, clearly array dereference is faster than link list walk.
>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>> Since we cannot keep it inside task_struct, we have to allocate it.
>> Every time the task is creted then. What to do on the fork? That
>> will require changes all over. Then the obvious optimization would be
>> to share this allocated array of prog pointers across multiple tasks...
>> and little by little this new facility will look like cgroup.
>> Hence the suggestion to put this array into cgroup from the start.
>
> I see your point :)
>
>>
>>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>>> to use a process hierarchy). The downside will be to handle an LSM hook
>>> program which is not triggered by a seccomp-filter, but this should be
>>> needed anyway to handle interruptions.
>>
>> what do you mean 'not triggered by seccomp' ?
>> You're not suggesting that this lsm has to enable seccomp to be 
>> functional?
>> imo that's non starter due to overhead.
>
> Yes, for now, it is triggered by a new seccomp filter return value
> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> be needed but could be useful to bind a seccomp filter security policy
> with a Landlock one. Waiting for Kees's point of view…
>

 I'm not Kees, but I'd be okay with that.  I still think that doing
 this by process hierarchy a la seccomp will be easier to use and to
 understand (which is quite important for this kind of work) than doing
 it by cgroup.

 A feature I've wanted to add for a while is to have an fd that
 represents a seccomp layer, the idea being that you would set up your
 seccomp layer (with syscall filter, landlock hooks, etc) and then you
 would have a syscall to install that layer.  Then an unprivileged
 sandbox manager could set up its layer and still be able to inject new
 processes into it later on, no cgroups needed.
>>>
>>> A nice thing I didn't highlight about Landlock is that a process can
>>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>>> pass the file descriptors of the Landlock programs to another process.
>>> This process could then apply this programs to get sandboxed. However,
>>> for now, because a Landlock program is only triggered by a seccomp
>>> filter (which do not follow the Landlock programs as a FD), they will be
>>> useless.
>>>
>>> The FD referring to an arraymap of handles can also be used to update a
>>> map and change the behavior of a Landlock program. A master process can
>>> then add or remove restrictions to another process hierarchy on the fly.
>>
>> Maybe this could be extended a little bit.  The fd could hold the
>> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
>> the ability to install it and FMODE_WRITE could give the ability to
>> modify it.
>>
>
> This is interesting! It should be possible to append the seccomp stack
> of a source process to the seccomp stack of the target process when a
> Landlock program is passed and then activated through seccomp(2).
>
> For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
> permission of the eBPF program FD in a specific way?
>

This wouldn't be an eBPF program FD -- it would be an FD encapsulating
an entire configuration including seccomp BPF program, whatever
landlock stuff is associated, and eventual seccomp monitor
configuration (once I write that code), etc.

You wouldn't say "attach this 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Mickaël Salaün

On 30/08/2016 20:55, Andy Lutomirski wrote:
> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>>
>>
>> On 28/08/2016 10:13, Andy Lutomirski wrote:
>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:


 On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
 As said above, Landlock will not run an eBPF programs when not strictly
 needed. Attaching to a cgroup will have the same performance impact as
 attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
>
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is creted then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

 I see your point :)

>
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
>
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be 
> functional?
> imo that's non starter due to overhead.

 Yes, for now, it is triggered by a new seccomp filter return value
 RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
 be needed but could be useful to bind a seccomp filter security policy
 with a Landlock one. Waiting for Kees's point of view…

>>>
>>> I'm not Kees, but I'd be okay with that.  I still think that doing
>>> this by process hierarchy a la seccomp will be easier to use and to
>>> understand (which is quite important for this kind of work) than doing
>>> it by cgroup.
>>>
>>> A feature I've wanted to add for a while is to have an fd that
>>> represents a seccomp layer, the idea being that you would set up your
>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>>> would have a syscall to install that layer.  Then an unprivileged
>>> sandbox manager could set up its layer and still be able to inject new
>>> processes into it later on, no cgroups needed.
>>
>> A nice thing I didn't highlight about Landlock is that a process can
>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>> pass the file descriptors of the Landlock programs to another process.
>> This process could then apply this programs to get sandboxed. However,
>> for now, because a Landlock program is only triggered by a seccomp
>> filter (which do not follow the Landlock programs as a FD), they will be
>> useless.
>>
>> The FD referring to an arraymap of handles can also be used to update a
>> map and change the behavior of a Landlock program. A master process can
>> then add or remove restrictions to another process hierarchy on the fly.
> 
> Maybe this could be extended a little bit.  The fd could hold the
> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> the ability to install it and FMODE_WRITE could give the ability to
> modify it.
> 

This is interesting! It should be possible to append the seccomp stack
of a source process to the seccomp stack of the target process when a
Landlock program is passed and then activated through seccomp(2).

For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
permission of the eBPF program FD in a specific way?



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Mickaël Salaün

On 30/08/2016 20:55, Andy Lutomirski wrote:
> On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>>
>>
>> On 28/08/2016 10:13, Andy Lutomirski wrote:
>>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:


 On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
 As said above, Landlock will not run an eBPF programs when not strictly
 needed. Attaching to a cgroup will have the same performance impact as
 attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
>
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is creted then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

 I see your point :)

>
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
>
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be 
> functional?
> imo that's non starter due to overhead.

 Yes, for now, it is triggered by a new seccomp filter return value
 RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
 be needed but could be useful to bind a seccomp filter security policy
 with a Landlock one. Waiting for Kees's point of view…

>>>
>>> I'm not Kees, but I'd be okay with that.  I still think that doing
>>> this by process hierarchy a la seccomp will be easier to use and to
>>> understand (which is quite important for this kind of work) than doing
>>> it by cgroup.
>>>
>>> A feature I've wanted to add for a while is to have an fd that
>>> represents a seccomp layer, the idea being that you would set up your
>>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>>> would have a syscall to install that layer.  Then an unprivileged
>>> sandbox manager could set up its layer and still be able to inject new
>>> processes into it later on, no cgroups needed.
>>
>> A nice thing I didn't highlight about Landlock is that a process can
>> prepare a layer of rules (arraymap of handles + Landlock programs) and
>> pass the file descriptors of the Landlock programs to another process.
>> This process could then apply this programs to get sandboxed. However,
>> for now, because a Landlock program is only triggered by a seccomp
>> filter (which do not follow the Landlock programs as a FD), they will be
>> useless.
>>
>> The FD referring to an arraymap of handles can also be used to update a
>> map and change the behavior of a Landlock program. A master process can
>> then add or remove restrictions to another process hierarchy on the fly.
> 
> Maybe this could be extended a little bit.  The fd could hold the
> seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
> the ability to install it and FMODE_WRITE could give the ability to
> modify it.
> 

This is interesting! It should be possible to append the seccomp stack
of a source process to the seccomp stack of the target process when a
Landlock program is passed and then activated through seccomp(2).

For the FMODE_EXECUTE/FMODE_WRITE, are you suggesting to manage
permission of the eBPF program FD in a specific way?



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>
>
> On 28/08/2016 10:13, Andy Lutomirski wrote:
>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>>
>>>
>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
 On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>> As said above, Landlock will not run an eBPF programs when not strictly
>>> needed. Attaching to a cgroup will have the same performance impact as
>>> attaching to a process hierarchy.
>>
>> Having a prog per cgroup per lsm_hook is the only scalable way I
>> could come up with. If you see another way, please propose.
>> current->seccomp.landlock_prog is not the answer.
>
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
>
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

 yes, clearly array dereference is faster than link list walk.
 Now the question is where to keep this prog_array[num_lsm_hooks] ?
 Since we cannot keep it inside task_struct, we have to allocate it.
 Every time the task is creted then. What to do on the fork? That
 will require changes all over. Then the obvious optimization would be
 to share this allocated array of prog pointers across multiple tasks...
 and little by little this new facility will look like cgroup.
 Hence the suggestion to put this array into cgroup from the start.
>>>
>>> I see your point :)
>>>

> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> to use a process hierarchy). The downside will be to handle an LSM hook
> program which is not triggered by a seccomp-filter, but this should be
> needed anyway to handle interruptions.

 what do you mean 'not triggered by seccomp' ?
 You're not suggesting that this lsm has to enable seccomp to be functional?
 imo that's non starter due to overhead.
>>>
>>> Yes, for now, it is triggered by a new seccomp filter return value
>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>>> be needed but could be useful to bind a seccomp filter security policy
>>> with a Landlock one. Waiting for Kees's point of view…
>>>
>>
>> I'm not Kees, but I'd be okay with that.  I still think that doing
>> this by process hierarchy a la seccomp will be easier to use and to
>> understand (which is quite important for this kind of work) than doing
>> it by cgroup.
>>
>> A feature I've wanted to add for a while is to have an fd that
>> represents a seccomp layer, the idea being that you would set up your
>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>> would have a syscall to install that layer.  Then an unprivileged
>> sandbox manager could set up its layer and still be able to inject new
>> processes into it later on, no cgroups needed.
>
> A nice thing I didn't highlight about Landlock is that a process can
> prepare a layer of rules (arraymap of handles + Landlock programs) and
> pass the file descriptors of the Landlock programs to another process.
> This process could then apply this programs to get sandboxed. However,
> for now, because a Landlock program is only triggered by a seccomp
> filter (which do not follow the Landlock programs as a FD), they will be
> useless.
>
> The FD referring to an arraymap of handles can also be used to update a
> map and change the behavior of a Landlock program. A master process can
> then add or remove restrictions to another process hierarchy on the fly.

Maybe this could be extended a little bit.  The fd could hold the
seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
the ability to install it and FMODE_WRITE could give the ability to
modify it.


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-30 Thread Andy Lutomirski
On Sun, Aug 28, 2016 at 2:42 AM, Mickaël Salaün  wrote:
>
>
> On 28/08/2016 10:13, Andy Lutomirski wrote:
>> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>>
>>>
>>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
 On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>> As said above, Landlock will not run an eBPF programs when not strictly
>>> needed. Attaching to a cgroup will have the same performance impact as
>>> attaching to a process hierarchy.
>>
>> Having a prog per cgroup per lsm_hook is the only scalable way I
>> could come up with. If you see another way, please propose.
>> current->seccomp.landlock_prog is not the answer.
>
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
>
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

 yes, clearly array dereference is faster than link list walk.
 Now the question is where to keep this prog_array[num_lsm_hooks] ?
 Since we cannot keep it inside task_struct, we have to allocate it.
 Every time the task is creted then. What to do on the fork? That
 will require changes all over. Then the obvious optimization would be
 to share this allocated array of prog pointers across multiple tasks...
 and little by little this new facility will look like cgroup.
 Hence the suggestion to put this array into cgroup from the start.
>>>
>>> I see your point :)
>>>

> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> to use a process hierarchy). The downside will be to handle an LSM hook
> program which is not triggered by a seccomp-filter, but this should be
> needed anyway to handle interruptions.

 what do you mean 'not triggered by seccomp' ?
 You're not suggesting that this lsm has to enable seccomp to be functional?
 imo that's non starter due to overhead.
>>>
>>> Yes, for now, it is triggered by a new seccomp filter return value
>>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>>> be needed but could be useful to bind a seccomp filter security policy
>>> with a Landlock one. Waiting for Kees's point of view…
>>>
>>
>> I'm not Kees, but I'd be okay with that.  I still think that doing
>> this by process hierarchy a la seccomp will be easier to use and to
>> understand (which is quite important for this kind of work) than doing
>> it by cgroup.
>>
>> A feature I've wanted to add for a while is to have an fd that
>> represents a seccomp layer, the idea being that you would set up your
>> seccomp layer (with syscall filter, landlock hooks, etc) and then you
>> would have a syscall to install that layer.  Then an unprivileged
>> sandbox manager could set up its layer and still be able to inject new
>> processes into it later on, no cgroups needed.
>
> A nice thing I didn't highlight about Landlock is that a process can
> prepare a layer of rules (arraymap of handles + Landlock programs) and
> pass the file descriptors of the Landlock programs to another process.
> This process could then apply this programs to get sandboxed. However,
> for now, because a Landlock program is only triggered by a seccomp
> filter (which do not follow the Landlock programs as a FD), they will be
> useless.
>
> The FD referring to an arraymap of handles can also be used to update a
> map and change the behavior of a Landlock program. A master process can
> then add or remove restrictions to another process hierarchy on the fly.

Maybe this could be extended a little bit.  The fd could hold the
seccomp filter *and* the LSM hook filters.  FMODE_EXECUTE could give
the ability to install it and FMODE_WRITE could give the ability to
modify it.


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-28 Thread Mickaël Salaün


On 28/08/2016 10:13, Andy Lutomirski wrote:
> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>
>>
>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
 On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
>
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

 Hum, I don't see the difference from a performance point of view between
 a cgroup-based or a process hierarchy-based system.

 Maybe a better option should be to use an array of pointers with N
 entries, one for each supported hook, instead of a unique pointer list?
>>>
>>> yes, clearly array dereference is faster than link list walk.
>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>> Every time the task is creted then. What to do on the fork? That
>>> will require changes all over. Then the obvious optimization would be
>>> to share this allocated array of prog pointers across multiple tasks...
>>> and little by little this new facility will look like cgroup.
>>> Hence the suggestion to put this array into cgroup from the start.
>>
>> I see your point :)
>>
>>>
 Anyway, being able to attach an LSM hook program to a cgroup thanks to
 the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
 to use a process hierarchy). The downside will be to handle an LSM hook
 program which is not triggered by a seccomp-filter, but this should be
 needed anyway to handle interruptions.
>>>
>>> what do you mean 'not triggered by seccomp' ?
>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>> imo that's non starter due to overhead.
>>
>> Yes, for now, it is triggered by a new seccomp filter return value
>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>> be needed but could be useful to bind a seccomp filter security policy
>> with a Landlock one. Waiting for Kees's point of view…
>>
> 
> I'm not Kees, but I'd be okay with that.  I still think that doing
> this by process hierarchy a la seccomp will be easier to use and to
> understand (which is quite important for this kind of work) than doing
> it by cgroup.
> 
> A feature I've wanted to add for a while is to have an fd that
> represents a seccomp layer, the idea being that you would set up your
> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> would have a syscall to install that layer.  Then an unprivileged
> sandbox manager could set up its layer and still be able to inject new
> processes into it later on, no cgroups needed.

A nice thing I didn't highlight about Landlock is that a process can
prepare a layer of rules (arraymap of handles + Landlock programs) and
pass the file descriptors of the Landlock programs to another process.
This process could then apply this programs to get sandboxed. However,
for now, because a Landlock program is only triggered by a seccomp
filter (which do not follow the Landlock programs as a FD), they will be
useless.

The FD referring to an arraymap of handles can also be used to update a
map and change the behavior of a Landlock program. A master process can
then add or remove restrictions to another process hierarchy on the fly.

However, I think it would make more sense to use cgroups if we want to
move an existing (unwilling) unsandoxed process into a sandboxed
environment. Of course, some more no_new_privs checks would be needed.



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-28 Thread Mickaël Salaün


On 28/08/2016 10:13, Andy Lutomirski wrote:
> On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>>
>>
>> On 27/08/2016 22:43, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
 On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
>
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

 Hum, I don't see the difference from a performance point of view between
 a cgroup-based or a process hierarchy-based system.

 Maybe a better option should be to use an array of pointers with N
 entries, one for each supported hook, instead of a unique pointer list?
>>>
>>> yes, clearly array dereference is faster than link list walk.
>>> Now the question is where to keep this prog_array[num_lsm_hooks] ?
>>> Since we cannot keep it inside task_struct, we have to allocate it.
>>> Every time the task is creted then. What to do on the fork? That
>>> will require changes all over. Then the obvious optimization would be
>>> to share this allocated array of prog pointers across multiple tasks...
>>> and little by little this new facility will look like cgroup.
>>> Hence the suggestion to put this array into cgroup from the start.
>>
>> I see your point :)
>>
>>>
 Anyway, being able to attach an LSM hook program to a cgroup thanks to
 the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
 to use a process hierarchy). The downside will be to handle an LSM hook
 program which is not triggered by a seccomp-filter, but this should be
 needed anyway to handle interruptions.
>>>
>>> what do you mean 'not triggered by seccomp' ?
>>> You're not suggesting that this lsm has to enable seccomp to be functional?
>>> imo that's non starter due to overhead.
>>
>> Yes, for now, it is triggered by a new seccomp filter return value
>> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
>> be needed but could be useful to bind a seccomp filter security policy
>> with a Landlock one. Waiting for Kees's point of view…
>>
> 
> I'm not Kees, but I'd be okay with that.  I still think that doing
> this by process hierarchy a la seccomp will be easier to use and to
> understand (which is quite important for this kind of work) than doing
> it by cgroup.
> 
> A feature I've wanted to add for a while is to have an fd that
> represents a seccomp layer, the idea being that you would set up your
> seccomp layer (with syscall filter, landlock hooks, etc) and then you
> would have a syscall to install that layer.  Then an unprivileged
> sandbox manager could set up its layer and still be able to inject new
> processes into it later on, no cgroups needed.

A nice thing I didn't highlight about Landlock is that a process can
prepare a layer of rules (arraymap of handles + Landlock programs) and
pass the file descriptors of the Landlock programs to another process.
This process could then apply this programs to get sandboxed. However,
for now, because a Landlock program is only triggered by a seccomp
filter (which do not follow the Landlock programs as a FD), they will be
useless.

The FD referring to an arraymap of handles can also be used to update a
map and change the behavior of a Landlock program. A master process can
then add or remove restrictions to another process hierarchy on the fly.

However, I think it would make more sense to use cgroups if we want to
move an existing (unwilling) unsandoxed process into a sandboxed
environment. Of course, some more no_new_privs checks would be needed.



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-28 Thread Andy Lutomirski
On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>
>
> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> >>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>  As said above, Landlock will not run an eBPF programs when not strictly
>  needed. Attaching to a cgroup will have the same performance impact as
>  attaching to a process hierarchy.
> >>>
> >>> Having a prog per cgroup per lsm_hook is the only scalable way I
> >>> could come up with. If you see another way, please propose.
> >>> current->seccomp.landlock_prog is not the answer.
> >>
> >> Hum, I don't see the difference from a performance point of view between
> >> a cgroup-based or a process hierarchy-based system.
> >>
> >> Maybe a better option should be to use an array of pointers with N
> >> entries, one for each supported hook, instead of a unique pointer list?
> >
> > yes, clearly array dereference is faster than link list walk.
> > Now the question is where to keep this prog_array[num_lsm_hooks] ?
> > Since we cannot keep it inside task_struct, we have to allocate it.
> > Every time the task is creted then. What to do on the fork? That
> > will require changes all over. Then the obvious optimization would be
> > to share this allocated array of prog pointers across multiple tasks...
> > and little by little this new facility will look like cgroup.
> > Hence the suggestion to put this array into cgroup from the start.
>
> I see your point :)
>
> >
> >> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> >> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> >> to use a process hierarchy). The downside will be to handle an LSM hook
> >> program which is not triggered by a seccomp-filter, but this should be
> >> needed anyway to handle interruptions.
> >
> > what do you mean 'not triggered by seccomp' ?
> > You're not suggesting that this lsm has to enable seccomp to be functional?
> > imo that's non starter due to overhead.
>
> Yes, for now, it is triggered by a new seccomp filter return value
> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> be needed but could be useful to bind a seccomp filter security policy
> with a Landlock one. Waiting for Kees's point of view…
>

I'm not Kees, but I'd be okay with that.  I still think that doing
this by process hierarchy a la seccomp will be easier to use and to
understand (which is quite important for this kind of work) than doing
it by cgroup.

A feature I've wanted to add for a while is to have an fd that
represents a seccomp layer, the idea being that you would set up your
seccomp layer (with syscall filter, landlock hooks, etc) and then you
would have a syscall to install that layer.  Then an unprivileged
sandbox manager could set up its layer and still be able to inject new
processes into it later on, no cgroups needed.

--Andy


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-28 Thread Andy Lutomirski
On Aug 27, 2016 11:14 PM, "Mickaël Salaün"  wrote:
>
>
> On 27/08/2016 22:43, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> >> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> >>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>  As said above, Landlock will not run an eBPF programs when not strictly
>  needed. Attaching to a cgroup will have the same performance impact as
>  attaching to a process hierarchy.
> >>>
> >>> Having a prog per cgroup per lsm_hook is the only scalable way I
> >>> could come up with. If you see another way, please propose.
> >>> current->seccomp.landlock_prog is not the answer.
> >>
> >> Hum, I don't see the difference from a performance point of view between
> >> a cgroup-based or a process hierarchy-based system.
> >>
> >> Maybe a better option should be to use an array of pointers with N
> >> entries, one for each supported hook, instead of a unique pointer list?
> >
> > yes, clearly array dereference is faster than link list walk.
> > Now the question is where to keep this prog_array[num_lsm_hooks] ?
> > Since we cannot keep it inside task_struct, we have to allocate it.
> > Every time the task is creted then. What to do on the fork? That
> > will require changes all over. Then the obvious optimization would be
> > to share this allocated array of prog pointers across multiple tasks...
> > and little by little this new facility will look like cgroup.
> > Hence the suggestion to put this array into cgroup from the start.
>
> I see your point :)
>
> >
> >> Anyway, being able to attach an LSM hook program to a cgroup thanks to
> >> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
> >> to use a process hierarchy). The downside will be to handle an LSM hook
> >> program which is not triggered by a seccomp-filter, but this should be
> >> needed anyway to handle interruptions.
> >
> > what do you mean 'not triggered by seccomp' ?
> > You're not suggesting that this lsm has to enable seccomp to be functional?
> > imo that's non starter due to overhead.
>
> Yes, for now, it is triggered by a new seccomp filter return value
> RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
> be needed but could be useful to bind a seccomp filter security policy
> with a Landlock one. Waiting for Kees's point of view…
>

I'm not Kees, but I'd be okay with that.  I still think that doing
this by process hierarchy a la seccomp will be easier to use and to
understand (which is quite important for this kind of work) than doing
it by cgroup.

A feature I've wanted to add for a while is to have an fd that
represents a seccomp layer, the idea being that you would set up your
seccomp layer (with syscall filter, landlock hooks, etc) and then you
would have a syscall to install that layer.  Then an unprivileged
sandbox manager could set up its layer and still be able to inject new
processes into it later on, no cgroups needed.

--Andy


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
 As said above, Landlock will not run an eBPF programs when not strictly
 needed. Attaching to a cgroup will have the same performance impact as
 attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
> 
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is creted then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

I see your point :)

> 
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
> 
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be functional?
> imo that's non starter due to overhead.

Yes, for now, it is triggered by a new seccomp filter return value
RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
be needed but could be useful to bind a seccomp filter security policy
with a Landlock one. Waiting for Kees's point of view…



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 22:43, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
>> On 27/08/2016 20:06, Alexei Starovoitov wrote:
>>> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
 As said above, Landlock will not run an eBPF programs when not strictly
 needed. Attaching to a cgroup will have the same performance impact as
 attaching to a process hierarchy.
>>>
>>> Having a prog per cgroup per lsm_hook is the only scalable way I
>>> could come up with. If you see another way, please propose.
>>> current->seccomp.landlock_prog is not the answer.
>>
>> Hum, I don't see the difference from a performance point of view between
>> a cgroup-based or a process hierarchy-based system.
>>
>> Maybe a better option should be to use an array of pointers with N
>> entries, one for each supported hook, instead of a unique pointer list?
> 
> yes, clearly array dereference is faster than link list walk.
> Now the question is where to keep this prog_array[num_lsm_hooks] ?
> Since we cannot keep it inside task_struct, we have to allocate it.
> Every time the task is creted then. What to do on the fork? That
> will require changes all over. Then the obvious optimization would be
> to share this allocated array of prog pointers across multiple tasks...
> and little by little this new facility will look like cgroup.
> Hence the suggestion to put this array into cgroup from the start.

I see your point :)

> 
>> Anyway, being able to attach an LSM hook program to a cgroup thanks to
>> the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
>> to use a process hierarchy). The downside will be to handle an LSM hook
>> program which is not triggered by a seccomp-filter, but this should be
>> needed anyway to handle interruptions.
> 
> what do you mean 'not triggered by seccomp' ?
> You're not suggesting that this lsm has to enable seccomp to be functional?
> imo that's non starter due to overhead.

Yes, for now, it is triggered by a new seccomp filter return value
RET_LANDLOCK, which can take a 16-bit value called cookie. This must not
be needed but could be useful to bind a seccomp filter security policy
with a Landlock one. Waiting for Kees's point of view…



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> 
> >
> > - I don't think such 'for' loop can scale. The solution needs to work
> > with thousands of containers and thousands of cgroups.
> > In the patch 06/10 the proposal is to use 'current' as holder of
> > the programs:
> > +   for (prog = current->seccomp.landlock_prog;
> > +   prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> > +   break;
> > +   }
> > +   }
> > imo that's the root of scalability issue.
> > I think to be able to scale the bpf programs have to be attached to
> > cgroups instead of tasks.
> > That would be very different api. seccomp doesn't need to be touched.
> > But that is the only way I see to be able to scale.
> 
>  Landlock is inspired from seccomp which also use a BPF program per
>  thread. For seccomp, each BPF programs are executed for each syscall.
>  For Landlock, some BPF programs are executed for some LSM hooks. I don't
>  see why it is a scale issue for Landlock comparing to seccomp. I also
>  don't see why storing the BPF program list pointer in the cgroup struct
>  instead of the task struct change a lot here. The BPF programs execution
>  will be the same anyway (for each LSM hook). Kees should probably have a
>  better opinion on this.
> >>>
> >>> seccomp has its own issues and copying them doesn't make this lsm any 
> >>> better.
> >>> Like seccomp bpf programs are all gigantic switch statement that looks
> >>> for interesting syscall numbers. All syscalls of a task are paying
> >>> non-trivial seccomp penalty due to such design. If bpf was attached per
> >>> syscall it would have been much cheaper. Of course doing it this way
> >>> for seccomp is not easy, but for lsm such facility is already there.
> >>> Blank call of a single bpf prog for all lsm hooks is unnecessary
> >>> overhead that can and should be avoided.
> >>
> >> It's probably a misunderstanding. Contrary to seccomp which run all the
> >> thread's BPF programs for any syscall, Landlock only run eBPF programs
> >> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> >> multiple eBPF program types and contrary to seccomp, Landlock only run
> >> an eBPF program when needed. Landlock will have almost no performance
> >> overhead if the syscalls do not trigger the watched LSM hooks for the
> >> current process.
> > 
> > that's not what I see in the patch 06/10:
> > all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> > (which eventually means all lsm hooks) will call
> > static inline int landlock_hook_##NAME
> > which will call landlock_run_prog()
> > which does:
> > + for (landlock_ret = current->seccomp.landlock_ret;
> > +  landlock_ret; landlock_ret = landlock_ret->prev) {
> > +if (landlock_ret->triggered) {
> > +   ctx.cookie = landlock_ret->cookie;
> > +   for (prog = current->seccomp.landlock_prog;
> > +prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> > +   break;
> > +   }
> > +   }
> > 
> > that is unacceptable overhead and not a scalable design.
> > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> > as soon as more lsm hooks are added.
> 
> Good catch! I forgot to check the program (sub)type in the loop to only
> run the needed programs for the current hook. I will fix this.
> 
> 
> > 
> >> As said above, Landlock will not run an eBPF programs when not strictly
> >> needed. Attaching to a cgroup will have the same performance impact as
> >> attaching to a process hierarchy.
> > 
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
> 
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

yes, clearly array dereference is faster than link list walk.
Now the question is where to keep this prog_array[num_lsm_hooks] ?
Since we cannot keep it inside task_struct, we have to allocate it.
Every time the task is creted then. What to do on the fork? That
will require changes all over. Then the obvious optimization would be
to share this allocated array of prog 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 09:35:14PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 20:06, Alexei Starovoitov wrote:
> > On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> >>
> >> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> >>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> 
> >
> > - I don't think such 'for' loop can scale. The solution needs to work
> > with thousands of containers and thousands of cgroups.
> > In the patch 06/10 the proposal is to use 'current' as holder of
> > the programs:
> > +   for (prog = current->seccomp.landlock_prog;
> > +   prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> > +   break;
> > +   }
> > +   }
> > imo that's the root of scalability issue.
> > I think to be able to scale the bpf programs have to be attached to
> > cgroups instead of tasks.
> > That would be very different api. seccomp doesn't need to be touched.
> > But that is the only way I see to be able to scale.
> 
>  Landlock is inspired from seccomp which also use a BPF program per
>  thread. For seccomp, each BPF programs are executed for each syscall.
>  For Landlock, some BPF programs are executed for some LSM hooks. I don't
>  see why it is a scale issue for Landlock comparing to seccomp. I also
>  don't see why storing the BPF program list pointer in the cgroup struct
>  instead of the task struct change a lot here. The BPF programs execution
>  will be the same anyway (for each LSM hook). Kees should probably have a
>  better opinion on this.
> >>>
> >>> seccomp has its own issues and copying them doesn't make this lsm any 
> >>> better.
> >>> Like seccomp bpf programs are all gigantic switch statement that looks
> >>> for interesting syscall numbers. All syscalls of a task are paying
> >>> non-trivial seccomp penalty due to such design. If bpf was attached per
> >>> syscall it would have been much cheaper. Of course doing it this way
> >>> for seccomp is not easy, but for lsm such facility is already there.
> >>> Blank call of a single bpf prog for all lsm hooks is unnecessary
> >>> overhead that can and should be avoided.
> >>
> >> It's probably a misunderstanding. Contrary to seccomp which run all the
> >> thread's BPF programs for any syscall, Landlock only run eBPF programs
> >> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> >> multiple eBPF program types and contrary to seccomp, Landlock only run
> >> an eBPF program when needed. Landlock will have almost no performance
> >> overhead if the syscalls do not trigger the watched LSM hooks for the
> >> current process.
> > 
> > that's not what I see in the patch 06/10:
> > all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> > (which eventually means all lsm hooks) will call
> > static inline int landlock_hook_##NAME
> > which will call landlock_run_prog()
> > which does:
> > + for (landlock_ret = current->seccomp.landlock_ret;
> > +  landlock_ret; landlock_ret = landlock_ret->prev) {
> > +if (landlock_ret->triggered) {
> > +   ctx.cookie = landlock_ret->cookie;
> > +   for (prog = current->seccomp.landlock_prog;
> > +prog; prog = prog->prev) {
> > +   if (prog->filter == landlock_ret->filter) {
> > +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> > +   break;
> > +   }
> > +   }
> > 
> > that is unacceptable overhead and not a scalable design.
> > It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> > as soon as more lsm hooks are added.
> 
> Good catch! I forgot to check the program (sub)type in the loop to only
> run the needed programs for the current hook. I will fix this.
> 
> 
> > 
> >> As said above, Landlock will not run an eBPF programs when not strictly
> >> needed. Attaching to a cgroup will have the same performance impact as
> >> attaching to a process hierarchy.
> > 
> > Having a prog per cgroup per lsm_hook is the only scalable way I
> > could come up with. If you see another way, please propose.
> > current->seccomp.landlock_prog is not the answer.
> 
> Hum, I don't see the difference from a performance point of view between
> a cgroup-based or a process hierarchy-based system.
> 
> Maybe a better option should be to use an array of pointers with N
> entries, one for each supported hook, instead of a unique pointer list?

yes, clearly array dereference is faster than link list walk.
Now the question is where to keep this prog_array[num_lsm_hooks] ?
Since we cannot keep it inside task_struct, we have to allocate it.
Every time the task is creted then. What to do on the fork? That
will require changes all over. Then the obvious optimization would be
to share this allocated array of prog 

Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:

>
> - I don't think such 'for' loop can scale. The solution needs to work
> with thousands of containers and thousands of cgroups.
> In the patch 06/10 the proposal is to use 'current' as holder of
> the programs:
> +   for (prog = current->seccomp.landlock_prog;
> +   prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> imo that's the root of scalability issue.
> I think to be able to scale the bpf programs have to be attached to
> cgroups instead of tasks.
> That would be very different api. seccomp doesn't need to be touched.
> But that is the only way I see to be able to scale.

 Landlock is inspired from seccomp which also use a BPF program per
 thread. For seccomp, each BPF programs are executed for each syscall.
 For Landlock, some BPF programs are executed for some LSM hooks. I don't
 see why it is a scale issue for Landlock comparing to seccomp. I also
 don't see why storing the BPF program list pointer in the cgroup struct
 instead of the task struct change a lot here. The BPF programs execution
 will be the same anyway (for each LSM hook). Kees should probably have a
 better opinion on this.
>>>
>>> seccomp has its own issues and copying them doesn't make this lsm any 
>>> better.
>>> Like seccomp bpf programs are all gigantic switch statement that looks
>>> for interesting syscall numbers. All syscalls of a task are paying
>>> non-trivial seccomp penalty due to such design. If bpf was attached per
>>> syscall it would have been much cheaper. Of course doing it this way
>>> for seccomp is not easy, but for lsm such facility is already there.
>>> Blank call of a single bpf prog for all lsm hooks is unnecessary
>>> overhead that can and should be avoided.
>>
>> It's probably a misunderstanding. Contrary to seccomp which run all the
>> thread's BPF programs for any syscall, Landlock only run eBPF programs
>> for the triggered LSM hooks, if their type match. Indeed, thanks to the
>> multiple eBPF program types and contrary to seccomp, Landlock only run
>> an eBPF program when needed. Landlock will have almost no performance
>> overhead if the syscalls do not trigger the watched LSM hooks for the
>> current process.
> 
> that's not what I see in the patch 06/10:
> all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> (which eventually means all lsm hooks) will call
> static inline int landlock_hook_##NAME
> which will call landlock_run_prog()
> which does:
> + for (landlock_ret = current->seccomp.landlock_ret;
> +  landlock_ret; landlock_ret = landlock_ret->prev) {
> +if (landlock_ret->triggered) {
> +   ctx.cookie = landlock_ret->cookie;
> +   for (prog = current->seccomp.landlock_prog;
> +prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> 
> that is unacceptable overhead and not a scalable design.
> It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> as soon as more lsm hooks are added.

Good catch! I forgot to check the program (sub)type in the loop to only
run the needed programs for the current hook. I will fix this.


> 
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
> 
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

Hum, I don't see the difference from a performance point of view between
a cgroup-based or a process hierarchy-based system.

Maybe a better option should be to use an array of pointers with N
entries, one for each supported hook, instead of a unique pointer list?

Anyway, being able to attach an LSM hook program to a cgroup thanks to
the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
to use a process hierarchy). The downside will be to handle an LSM hook
program which is not triggered by a seccomp-filter, but this should be
needed anyway to handle interruptions.



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:

>
> - I don't think such 'for' loop can scale. The solution needs to work
> with thousands of containers and thousands of cgroups.
> In the patch 06/10 the proposal is to use 'current' as holder of
> the programs:
> +   for (prog = current->seccomp.landlock_prog;
> +   prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> imo that's the root of scalability issue.
> I think to be able to scale the bpf programs have to be attached to
> cgroups instead of tasks.
> That would be very different api. seccomp doesn't need to be touched.
> But that is the only way I see to be able to scale.

 Landlock is inspired from seccomp which also use a BPF program per
 thread. For seccomp, each BPF programs are executed for each syscall.
 For Landlock, some BPF programs are executed for some LSM hooks. I don't
 see why it is a scale issue for Landlock comparing to seccomp. I also
 don't see why storing the BPF program list pointer in the cgroup struct
 instead of the task struct change a lot here. The BPF programs execution
 will be the same anyway (for each LSM hook). Kees should probably have a
 better opinion on this.
>>>
>>> seccomp has its own issues and copying them doesn't make this lsm any 
>>> better.
>>> Like seccomp bpf programs are all gigantic switch statement that looks
>>> for interesting syscall numbers. All syscalls of a task are paying
>>> non-trivial seccomp penalty due to such design. If bpf was attached per
>>> syscall it would have been much cheaper. Of course doing it this way
>>> for seccomp is not easy, but for lsm such facility is already there.
>>> Blank call of a single bpf prog for all lsm hooks is unnecessary
>>> overhead that can and should be avoided.
>>
>> It's probably a misunderstanding. Contrary to seccomp which run all the
>> thread's BPF programs for any syscall, Landlock only run eBPF programs
>> for the triggered LSM hooks, if their type match. Indeed, thanks to the
>> multiple eBPF program types and contrary to seccomp, Landlock only run
>> an eBPF program when needed. Landlock will have almost no performance
>> overhead if the syscalls do not trigger the watched LSM hooks for the
>> current process.
> 
> that's not what I see in the patch 06/10:
> all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> (which eventually means all lsm hooks) will call
> static inline int landlock_hook_##NAME
> which will call landlock_run_prog()
> which does:
> + for (landlock_ret = current->seccomp.landlock_ret;
> +  landlock_ret; landlock_ret = landlock_ret->prev) {
> +if (landlock_ret->triggered) {
> +   ctx.cookie = landlock_ret->cookie;
> +   for (prog = current->seccomp.landlock_prog;
> +prog; prog = prog->prev) {
> +   if (prog->filter == landlock_ret->filter) {
> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> +   break;
> +   }
> +   }
> 
> that is unacceptable overhead and not a scalable design.
> It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
> as soon as more lsm hooks are added.

Good catch! I forgot to check the program (sub)type in the loop to only
run the needed programs for the current hook. I will fix this.


> 
>> As said above, Landlock will not run an eBPF programs when not strictly
>> needed. Attaching to a cgroup will have the same performance impact as
>> attaching to a process hierarchy.
> 
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

Hum, I don't see the difference from a performance point of view between
a cgroup-based or a process hierarchy-based system.

Maybe a better option should be to use an array of pointers with N
entries, one for each supported hook, instead of a unique pointer list?

Anyway, being able to attach an LSM hook program to a cgroup thanks to
the new BPF_PROG_ATTACH seems a good idea (while keeping the possibility
to use a process hierarchy). The downside will be to handle an LSM hook
program which is not triggered by a seccomp-filter, but this should be
needed anyway to handle interruptions.



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>
> >>>
> >>> - I don't think such 'for' loop can scale. The solution needs to work
> >>> with thousands of containers and thousands of cgroups.
> >>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>> the programs:
> >>> +   for (prog = current->seccomp.landlock_prog;
> >>> +   prog; prog = prog->prev) {
> >>> +   if (prog->filter == landlock_ret->filter) {
> >>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> >>> +   break;
> >>> +   }
> >>> +   }
> >>> imo that's the root of scalability issue.
> >>> I think to be able to scale the bpf programs have to be attached to
> >>> cgroups instead of tasks.
> >>> That would be very different api. seccomp doesn't need to be touched.
> >>> But that is the only way I see to be able to scale.
> >>
> >> Landlock is inspired from seccomp which also use a BPF program per
> >> thread. For seccomp, each BPF programs are executed for each syscall.
> >> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >> see why it is a scale issue for Landlock comparing to seccomp. I also
> >> don't see why storing the BPF program list pointer in the cgroup struct
> >> instead of the task struct change a lot here. The BPF programs execution
> >> will be the same anyway (for each LSM hook). Kees should probably have a
> >> better opinion on this.
> > 
> > seccomp has its own issues and copying them doesn't make this lsm any 
> > better.
> > Like seccomp bpf programs are all gigantic switch statement that looks
> > for interesting syscall numbers. All syscalls of a task are paying
> > non-trivial seccomp penalty due to such design. If bpf was attached per
> > syscall it would have been much cheaper. Of course doing it this way
> > for seccomp is not easy, but for lsm such facility is already there.
> > Blank call of a single bpf prog for all lsm hooks is unnecessary
> > overhead that can and should be avoided.
> 
> It's probably a misunderstanding. Contrary to seccomp which run all the
> thread's BPF programs for any syscall, Landlock only run eBPF programs
> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> multiple eBPF program types and contrary to seccomp, Landlock only run
> an eBPF program when needed. Landlock will have almost no performance
> overhead if the syscalls do not trigger the watched LSM hooks for the
> current process.

that's not what I see in the patch 06/10:
all lsm_hooks in 'static struct security_hook_list landlock_hooks'
(which eventually means all lsm hooks) will call
static inline int landlock_hook_##NAME
which will call landlock_run_prog()
which does:
+ for (landlock_ret = current->seccomp.landlock_ret;
+  landlock_ret; landlock_ret = landlock_ret->prev) {
+if (landlock_ret->triggered) {
+   ctx.cookie = landlock_ret->cookie;
+   for (prog = current->seccomp.landlock_prog;
+prog; prog = prog->prev) {
+   if (prog->filter == landlock_ret->filter) {
+   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
+   break;
+   }
+   }

that is unacceptable overhead and not a scalable design.
It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
as soon as more lsm hooks are added.

> As said above, Landlock will not run an eBPF programs when not strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

Having a prog per cgroup per lsm_hook is the only scalable way I
could come up with. If you see another way, please propose.
current->seccomp.landlock_prog is not the answer.



Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Alexei Starovoitov
On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
> 
> On 27/08/2016 01:05, Alexei Starovoitov wrote:
> > On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
> >>
> >>>
> >>> - I don't think such 'for' loop can scale. The solution needs to work
> >>> with thousands of containers and thousands of cgroups.
> >>> In the patch 06/10 the proposal is to use 'current' as holder of
> >>> the programs:
> >>> +   for (prog = current->seccomp.landlock_prog;
> >>> +   prog; prog = prog->prev) {
> >>> +   if (prog->filter == landlock_ret->filter) {
> >>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
> >>> +   break;
> >>> +   }
> >>> +   }
> >>> imo that's the root of scalability issue.
> >>> I think to be able to scale the bpf programs have to be attached to
> >>> cgroups instead of tasks.
> >>> That would be very different api. seccomp doesn't need to be touched.
> >>> But that is the only way I see to be able to scale.
> >>
> >> Landlock is inspired from seccomp which also use a BPF program per
> >> thread. For seccomp, each BPF programs are executed for each syscall.
> >> For Landlock, some BPF programs are executed for some LSM hooks. I don't
> >> see why it is a scale issue for Landlock comparing to seccomp. I also
> >> don't see why storing the BPF program list pointer in the cgroup struct
> >> instead of the task struct change a lot here. The BPF programs execution
> >> will be the same anyway (for each LSM hook). Kees should probably have a
> >> better opinion on this.
> > 
> > seccomp has its own issues and copying them doesn't make this lsm any 
> > better.
> > Like seccomp bpf programs are all gigantic switch statement that looks
> > for interesting syscall numbers. All syscalls of a task are paying
> > non-trivial seccomp penalty due to such design. If bpf was attached per
> > syscall it would have been much cheaper. Of course doing it this way
> > for seccomp is not easy, but for lsm such facility is already there.
> > Blank call of a single bpf prog for all lsm hooks is unnecessary
> > overhead that can and should be avoided.
> 
> It's probably a misunderstanding. Contrary to seccomp which run all the
> thread's BPF programs for any syscall, Landlock only run eBPF programs
> for the triggered LSM hooks, if their type match. Indeed, thanks to the
> multiple eBPF program types and contrary to seccomp, Landlock only run
> an eBPF program when needed. Landlock will have almost no performance
> overhead if the syscalls do not trigger the watched LSM hooks for the
> current process.

that's not what I see in the patch 06/10:
all lsm_hooks in 'static struct security_hook_list landlock_hooks'
(which eventually means all lsm hooks) will call
static inline int landlock_hook_##NAME
which will call landlock_run_prog()
which does:
+ for (landlock_ret = current->seccomp.landlock_ret;
+  landlock_ret; landlock_ret = landlock_ret->prev) {
+if (landlock_ret->triggered) {
+   ctx.cookie = landlock_ret->cookie;
+   for (prog = current->seccomp.landlock_prog;
+prog; prog = prog->prev) {
+   if (prog->filter == landlock_ret->filter) {
+   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
+   break;
+   }
+   }

that is unacceptable overhead and not a scalable design.
It kinda works for 3 lsm_hooks as in patch 6, but doesn't scale
as soon as more lsm hooks are added.

> As said above, Landlock will not run an eBPF programs when not strictly
> needed. Attaching to a cgroup will have the same performance impact as
> attaching to a process hierarchy.

Having a prog per cgroup per lsm_hook is the only scalable way I
could come up with. If you see another way, please propose.
current->seccomp.landlock_prog is not the answer.



Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>
>>>
>>> - I don't think such 'for' loop can scale. The solution needs to work
>>> with thousands of containers and thousands of cgroups.
>>> In the patch 06/10 the proposal is to use 'current' as holder of
>>> the programs:
>>> +   for (prog = current->seccomp.landlock_prog;
>>> +   prog; prog = prog->prev) {
>>> +   if (prog->filter == landlock_ret->filter) {
>>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
>>> +   break;
>>> +   }
>>> +   }
>>> imo that's the root of scalability issue.
>>> I think to be able to scale the bpf programs have to be attached to
>>> cgroups instead of tasks.
>>> That would be very different api. seccomp doesn't need to be touched.
>>> But that is the only way I see to be able to scale.
>>
>> Landlock is inspired from seccomp which also use a BPF program per
>> thread. For seccomp, each BPF programs are executed for each syscall.
>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
>> see why it is a scale issue for Landlock comparing to seccomp. I also
>> don't see why storing the BPF program list pointer in the cgroup struct
>> instead of the task struct change a lot here. The BPF programs execution
>> will be the same anyway (for each LSM hook). Kees should probably have a
>> better opinion on this.
> 
> seccomp has its own issues and copying them doesn't make this lsm any better.
> Like seccomp bpf programs are all gigantic switch statement that looks
> for interesting syscall numbers. All syscalls of a task are paying
> non-trivial seccomp penalty due to such design. If bpf was attached per
> syscall it would have been much cheaper. Of course doing it this way
> for seccomp is not easy, but for lsm such facility is already there.
> Blank call of a single bpf prog for all lsm hooks is unnecessary
> overhead that can and should be avoided.

It's probably a misunderstanding. Contrary to seccomp which run all the
thread's BPF programs for any syscall, Landlock only run eBPF programs
for the triggered LSM hooks, if their type match. Indeed, thanks to the
multiple eBPF program types and contrary to seccomp, Landlock only run
an eBPF program when needed. Landlock will have almost no performance
overhead if the syscalls do not trigger the watched LSM hooks for the
current process.


> 
>>> May be another way of thinking about it is 'lsm cgroup controller'
>>> that Sargun is proposing.
>>> The lsm hooks will provide stable execution points and the programs
>>> will be called like:
>>> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
>>> BPF_PROG_RUN(prog, ctx);
>>> The delegation functionality and 'prog_effective' logic that
>>> Daniel Mack is proposing will be fully reused here.
>>> External container management software will be able to apply bpf
>>> programs to control tasks under cgroup and such
>>> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
>>> The user will be able to register different programs for different lsm 
>>> hooks.
>>> If I understand the patch 6/10 correctly, there is one (or a list) prog for
>>> all lsm hooks per task which is not flexible enough.
>>
>> For each LSM hook triggered by a thread, all of its Landlock eBPF
>> programs (dedicated for this kind of hook) will be evaluated (if
>> needed). This is the same behavior as seccomp (list of BPF programs
>> attached to a process hierarchy) except the BPF programs are not
>> evaluated for syscall but for LSM hooks. There is no way to make it more
>> fine-grained :)
> 
> There is a way to attach different bpf program per cgroup
> and per lsm hook. Such approach drastically reduces overhead
> of sandboxed application.

As said above, Landlock will not run an eBPF programs when not strictly
needed. Attaching to a cgroup will have the same performance impact as
attaching to a process hierarchy.



signature.asc
Description: OpenPGP digital signature


Re: [RFC v2 09/10] landlock: Handle cgroups (performance)

2016-08-27 Thread Mickaël Salaün

On 27/08/2016 01:05, Alexei Starovoitov wrote:
> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>
>>>
>>> - I don't think such 'for' loop can scale. The solution needs to work
>>> with thousands of containers and thousands of cgroups.
>>> In the patch 06/10 the proposal is to use 'current' as holder of
>>> the programs:
>>> +   for (prog = current->seccomp.landlock_prog;
>>> +   prog; prog = prog->prev) {
>>> +   if (prog->filter == landlock_ret->filter) {
>>> +   cur_ret = BPF_PROG_RUN(prog->prog, (void *));
>>> +   break;
>>> +   }
>>> +   }
>>> imo that's the root of scalability issue.
>>> I think to be able to scale the bpf programs have to be attached to
>>> cgroups instead of tasks.
>>> That would be very different api. seccomp doesn't need to be touched.
>>> But that is the only way I see to be able to scale.
>>
>> Landlock is inspired from seccomp which also use a BPF program per
>> thread. For seccomp, each BPF programs are executed for each syscall.
>> For Landlock, some BPF programs are executed for some LSM hooks. I don't
>> see why it is a scale issue for Landlock comparing to seccomp. I also
>> don't see why storing the BPF program list pointer in the cgroup struct
>> instead of the task struct change a lot here. The BPF programs execution
>> will be the same anyway (for each LSM hook). Kees should probably have a
>> better opinion on this.
> 
> seccomp has its own issues and copying them doesn't make this lsm any better.
> Like seccomp bpf programs are all gigantic switch statement that looks
> for interesting syscall numbers. All syscalls of a task are paying
> non-trivial seccomp penalty due to such design. If bpf was attached per
> syscall it would have been much cheaper. Of course doing it this way
> for seccomp is not easy, but for lsm such facility is already there.
> Blank call of a single bpf prog for all lsm hooks is unnecessary
> overhead that can and should be avoided.

It's probably a misunderstanding. Contrary to seccomp which run all the
thread's BPF programs for any syscall, Landlock only run eBPF programs
for the triggered LSM hooks, if their type match. Indeed, thanks to the
multiple eBPF program types and contrary to seccomp, Landlock only run
an eBPF program when needed. Landlock will have almost no performance
overhead if the syscalls do not trigger the watched LSM hooks for the
current process.


> 
>>> May be another way of thinking about it is 'lsm cgroup controller'
>>> that Sargun is proposing.
>>> The lsm hooks will provide stable execution points and the programs
>>> will be called like:
>>> prog = task_css_set(current)->dfl_cgrp->bpf.prog_effective[lsm_hook_id];
>>> BPF_PROG_RUN(prog, ctx);
>>> The delegation functionality and 'prog_effective' logic that
>>> Daniel Mack is proposing will be fully reused here.
>>> External container management software will be able to apply bpf
>>> programs to control tasks under cgroup and such
>>> bpf_landlock_cmp_cgroup_beneath() helper won't be necessary.
>>> The user will be able to register different programs for different lsm 
>>> hooks.
>>> If I understand the patch 6/10 correctly, there is one (or a list) prog for
>>> all lsm hooks per task which is not flexible enough.
>>
>> For each LSM hook triggered by a thread, all of its Landlock eBPF
>> programs (dedicated for this kind of hook) will be evaluated (if
>> needed). This is the same behavior as seccomp (list of BPF programs
>> attached to a process hierarchy) except the BPF programs are not
>> evaluated for syscall but for LSM hooks. There is no way to make it more
>> fine-grained :)
> 
> There is a way to attach different bpf program per cgroup
> and per lsm hook. Such approach drastically reduces overhead
> of sandboxed application.

As said above, Landlock will not run an eBPF programs when not strictly
needed. Attaching to a cgroup will have the same performance impact as
attaching to a process hierarchy.



signature.asc
Description: OpenPGP digital signature