Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-08 Thread Ingo Molnar

* Alexei Starovoitov  wrote:

> As far as sysctl we can look at two with similar purpose:
> sysctl_perf_event_paranoid and modules_disabled.
> First one is indeed multi level, but not because of the fear of bugs,
> but because of real security implications.

It serves both purposes flexibly, and note that most people and distros will 
use 
the default value.

> [...] Like raw events on hyperthreaded cpu or uncore events can extract data 
> from other user processes. So it controls these extra privileges.

It also controls the generally increased risk caused by a larger attack 
surface, 
which some users may not want to carry and which they can thus shrink.

With a static keys approach there would be no runtime overhead worth speaking 
of, 
so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the 
default value set to permissive.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-08 Thread Alexei Starovoitov

On 10/7/15 11:21 PM, Ingo Molnar wrote:

so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the
default value set to permissive.


agreed. sent out v2 follows 'modules_disabled' style.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-08 Thread Kees Cook
On Wed, Oct 7, 2015 at 4:49 PM, Alexei Starovoitov  wrote:
> On 10/7/15 3:22 PM, Kees Cook wrote:
>>>
>>> Yes, I agree with you that there would be a CVE regardless. I still
>>> >like the option of configurable access, not a big fan of the sysctl
>>> >either. Thinking out loudly, what about a Kconfig option? We started
>>> >out like this on bpf(2) itself (initially under expert settings, now
>>> >afaik not anymore), and depending on usage scenarios, a requirement
>>> >could be to have immutable cap_sys_admin-only, for other use-cases a
>>> >requirement on the kernel might instead be to have unprivileged users
>>> >as well.
>>
>> It'd be nice to have it just be a Kconfig, but this shoots
>> distro-users in the foot if a distro decides to include unpriv bpf and
>> the user doesn't want it. I think it's probably a good idea to keep
>> the sysctl.
>
>
> I don't like introducing Kconfig for no clear reason. It only adds
> to the testing matrix and makes it harder to hack around.
> Paranoid distros can disable bpf via single config already,
> there is no reason to go more fine grained here.
> Unpriv checks add minimal amount of code, so even for tinification
> purpose there is no need to chop of few bytes. tiny kernels would
> disable bpf all together.
>
> As far as sysctl we can look at two with similar purpose:
> sysctl_perf_event_paranoid and modules_disabled.
> First one is indeed multi level, but not because of the fear of bugs,
> but because of real security implications. Like raw events on
> hyperthreaded cpu or uncore events can extract data from other
> user processes. So it controls these extra privileges.
> For bpf there are no hw implications to deal with.
> If we make seccomp+bpf in the future it shouldn't need another knob
> or extra bit. There are no extra privileges to grant, so not needed.
>
> modules_disabled is off by default and can be toggled on once.
> I think for paranoid distro users that "don't want bpf" that is
> the better model.
> So I'm thinking to do sysctl_unprivileged_bpf_disabled that will be
> 0=off by default (meaning that users can load unpriv socket filter
> programs and seccomp in the future) and that can be switched
> to 1=on once and stay that way until reboot.
> I think that's the best balance that avoids adding checks to all
> apps that want to use bpf and admins can still act on it.
> From app point of view it's no different than bpf syscall
> was not compiled in. So single feature test for bpf syscall will
> be enough.

I think this would be great. :)

-Kees

-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Alexei Starovoitov

On 10/6/15 5:45 AM, Daniel Borkmann wrote:

Should instead something similar be adapted on bpf(2) as well? Or, would
that even be more painful for application developers shipping their stuff
through distros in the end (where they might then decide to just setup
everything BPF-related and then drop privs)?


I think loading as root and then dropping privs won't work in many
cases, since apps still need to access maps even after dropping privs
and today it's not possible, since cap_sys_admin is tested for every
bpf syscall.


I'm also wondering with regards to seccomp, which could adapt to eBPF at
some point and be used by unprivileged programs. Perhaps then, a single
paranoia alike setting might not suit to all eBPF subsystem users. Any
ideas?


There is no such paranoid sysctl for cBPF, so there is no reason to
add one for eBPF other than fear.
Adding multiple sysctl knobs for seccomp, socket, tracing is only
reflection of even higher fear.
What sysadmins suppose to do with such sysctl when kernel is kinda
saying 'may be something unsafe here you're on your own' ?
Also the presence of this sysctl_bpf_enable_unprivileged or any other
one doesn't help with CVEs. Any bug with security implications will
be a CVE regardless, so I think the better course of action is to
avoid introducing this sysctl.

We've discussed adding something like CAP_BPF to control it,
but then again, do we want this because of fear of bugs or because
it's actually needed. I think the design of all CAP_* is to give
unprivileged users permissions to do something beyond normal that
can potentially be harmful for other users or the whole system.
In this case it's not the case. One user can load eBPF programs
and maps up to its MEMLOCK limit and they cannot interfere with
other users or affect the host, so CAP_BPF is not necessary either.

Thoughts?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Kees Cook
On Wed, Oct 7, 2015 at 3:07 PM, Daniel Borkmann  wrote:
> On 10/07/2015 11:20 PM, Alexei Starovoitov wrote:
>>
>> On 10/6/15 5:45 AM, Daniel Borkmann wrote:
>>>
>>> Should instead something similar be adapted on bpf(2) as well? Or, would
>>> that even be more painful for application developers shipping their stuff
>>> through distros in the end (where they might then decide to just setup
>>> everything BPF-related and then drop privs)?
>>
>>
>> I think loading as root and then dropping privs won't work in many
>> cases, since apps still need to access maps even after dropping privs
>> and today it's not possible, since cap_sys_admin is tested for every
>> bpf syscall.
>
>
> Yep, maps-only would then need to be made accessible in some way.
>
>>> I'm also wondering with regards to seccomp, which could adapt to eBPF at
>>> some point and be used by unprivileged programs. Perhaps then, a single
>>> paranoia alike setting might not suit to all eBPF subsystem users. Any
>>> ideas?
>>
>>
>> There is no such paranoid sysctl for cBPF, so there is no reason to
>> add one for eBPF other than fear.
>> Adding multiple sysctl knobs for seccomp, socket, tracing is only
>> reflection of even higher fear.
>> What sysadmins suppose to do with such sysctl when kernel is kinda
>> saying 'may be something unsafe here you're on your own' ?
>> Also the presence of this sysctl_bpf_enable_unprivileged or any other
>> one doesn't help with CVEs. Any bug with security implications will
>> be a CVE regardless, so I think the better course of action is to
>> avoid introducing this sysctl.
>
>
> Yes, I agree with you that there would be a CVE regardless. I still
> like the option of configurable access, not a big fan of the sysctl
> either. Thinking out loudly, what about a Kconfig option? We started
> out like this on bpf(2) itself (initially under expert settings, now
> afaik not anymore), and depending on usage scenarios, a requirement
> could be to have immutable cap_sys_admin-only, for other use-cases a
> requirement on the kernel might instead be to have unprivileged users
> as well.

It'd be nice to have it just be a Kconfig, but this shoots
distro-users in the foot if a distro decides to include unpriv bpf and
the user doesn't want it. I think it's probably a good idea to keep
the sysctl.

-Kees

>
>> We've discussed adding something like CAP_BPF to control it,
>> but then again, do we want this because of fear of bugs or because
>> it's actually needed. I think the design of all CAP_* is to give
>> unprivileged users permissions to do something beyond normal that
>> can potentially be harmful for other users or the whole system.
>> In this case it's not the case. One user can load eBPF programs
>> and maps up to its MEMLOCK limit and they cannot interfere with
>> other users or affect the host, so CAP_BPF is not necessary either.
>
>
> Thanks,
> Daniel



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Daniel Borkmann

On 10/07/2015 11:20 PM, Alexei Starovoitov wrote:

On 10/6/15 5:45 AM, Daniel Borkmann wrote:

Should instead something similar be adapted on bpf(2) as well? Or, would
that even be more painful for application developers shipping their stuff
through distros in the end (where they might then decide to just setup
everything BPF-related and then drop privs)?


I think loading as root and then dropping privs won't work in many
cases, since apps still need to access maps even after dropping privs
and today it's not possible, since cap_sys_admin is tested for every
bpf syscall.


Yep, maps-only would then need to be made accessible in some way.


I'm also wondering with regards to seccomp, which could adapt to eBPF at
some point and be used by unprivileged programs. Perhaps then, a single
paranoia alike setting might not suit to all eBPF subsystem users. Any
ideas?


There is no such paranoid sysctl for cBPF, so there is no reason to
add one for eBPF other than fear.
Adding multiple sysctl knobs for seccomp, socket, tracing is only
reflection of even higher fear.
What sysadmins suppose to do with such sysctl when kernel is kinda
saying 'may be something unsafe here you're on your own' ?
Also the presence of this sysctl_bpf_enable_unprivileged or any other
one doesn't help with CVEs. Any bug with security implications will
be a CVE regardless, so I think the better course of action is to
avoid introducing this sysctl.


Yes, I agree with you that there would be a CVE regardless. I still
like the option of configurable access, not a big fan of the sysctl
either. Thinking out loudly, what about a Kconfig option? We started
out like this on bpf(2) itself (initially under expert settings, now
afaik not anymore), and depending on usage scenarios, a requirement
could be to have immutable cap_sys_admin-only, for other use-cases a
requirement on the kernel might instead be to have unprivileged users
as well.


We've discussed adding something like CAP_BPF to control it,
but then again, do we want this because of fear of bugs or because
it's actually needed. I think the design of all CAP_* is to give
unprivileged users permissions to do something beyond normal that
can potentially be harmful for other users or the whole system.
In this case it's not the case. One user can load eBPF programs
and maps up to its MEMLOCK limit and they cannot interfere with
other users or affect the host, so CAP_BPF is not necessary either.


Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet  wrote:
> > On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
> >
> >> was also thinking that we can do it only in paths that actually
> >> have multiple protocol layers, since today bpf is mainly used with
> >> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> >> on RX, because it just came out of alloc_skb and no layers were called,
> >> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> >> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> >> but it feels overly fragile to save a branch in sk_filter,
> >> so planning to go with
> >> if(unlikely(prog->cb_access)) memset in sk_filter().
> >>
> >
> > This will break TCP use of sk_filter().
> > skb->cb[] contains useful data in TCP layer.
> >
> >
> 
> Since I don't know too much about the networking details:
> 
> 1. Does "skb->cb" *ever* contain anything useful for an unprivileged user?
> 
> 2. Does sbk->cb form a stable ABI?
> 
> Unless both answers are solid yesses, then maybe the right solution is
> to just deny access entirely to unprivileged users.

So this kind of instrumentation data is not an ABI in a similar fashion as 
tracing 
information is not an ABI either.

I.e. tracepoints can (and sometimes do) change 'semantics' - in that the 
interpretation of the implementational details behind that data changes as the 
implementation changes. That's not something that can ever be an ABI, just like 
the contents of /proc/kcore or /proc/slabinfo can not be an ABI.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Alexei Starovoitov

On 10/7/15 3:22 PM, Kees Cook wrote:

Yes, I agree with you that there would be a CVE regardless. I still
>like the option of configurable access, not a big fan of the sysctl
>either. Thinking out loudly, what about a Kconfig option? We started
>out like this on bpf(2) itself (initially under expert settings, now
>afaik not anymore), and depending on usage scenarios, a requirement
>could be to have immutable cap_sys_admin-only, for other use-cases a
>requirement on the kernel might instead be to have unprivileged users
>as well.

It'd be nice to have it just be a Kconfig, but this shoots
distro-users in the foot if a distro decides to include unpriv bpf and
the user doesn't want it. I think it's probably a good idea to keep
the sysctl.


I don't like introducing Kconfig for no clear reason. It only adds
to the testing matrix and makes it harder to hack around.
Paranoid distros can disable bpf via single config already,
there is no reason to go more fine grained here.
Unpriv checks add minimal amount of code, so even for tinification
purpose there is no need to chop of few bytes. tiny kernels would
disable bpf all together.

As far as sysctl we can look at two with similar purpose:
sysctl_perf_event_paranoid and modules_disabled.
First one is indeed multi level, but not because of the fear of bugs,
but because of real security implications. Like raw events on
hyperthreaded cpu or uncore events can extract data from other
user processes. So it controls these extra privileges.
For bpf there are no hw implications to deal with.
If we make seccomp+bpf in the future it shouldn't need another knob
or extra bit. There are no extra privileges to grant, so not needed.

modules_disabled is off by default and can be toggled on once.
I think for paranoid distro users that "don't want bpf" that is
the better model.
So I'm thinking to do sysctl_unprivileged_bpf_disabled that will be
0=off by default (meaning that users can load unpriv socket filter
programs and seccomp in the future) and that can be switched
to 1=on once and stay that way until reboot.
I think that's the best balance that avoids adding checks to all
apps that want to use bpf and admins can still act on it.
From app point of view it's no different than bpf syscall
was not compiled in. So single feature test for bpf syscall will
be enough.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-07 Thread Alexei Starovoitov

On 10/5/15 1:48 PM, Alexei Starovoitov wrote:

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
- bpf_trace_printk (for debugging)


while reviewing everything for Nth time realized that
bpf_trace_printk is useless for unprivileged users, since
trace_pipe is root only.
So going to drop it in V2.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Alexei Starovoitov

On 10/6/15 1:39 AM, Daniel Borkmann wrote:

[...] Also classic BPF would then need to test for it, since a socket
filter
doesn't really know whether native eBPF is loaded there or a
classic-to-eBPF
transformed one, and classic never makes use of this. Anyway, it
could be done
by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
verification phase, and test it inside sk_filter() if I see it
correctly.


That could also be done in an unlikely() branch, to keep the cost to
the non-eBPF
case near zero.


Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway and the unlikely() could still be changed later on
if it should not be the case anymore. The flag and bpf_func would share the
same cacheline as well.


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Daniel Borkmann

On 10/06/2015 07:50 PM, Alexei Starovoitov wrote:

On 10/6/15 1:39 AM, Daniel Borkmann wrote:

[...] Also classic BPF would then need to test for it, since a socket
filter
doesn't really know whether native eBPF is loaded there or a
classic-to-eBPF
transformed one, and classic never makes use of this. Anyway, it
could be done
by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
verification phase, and test it inside sk_filter() if I see it
correctly.


That could also be done in an unlikely() branch, to keep the cost to
the non-eBPF
case near zero.


Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway and the unlikely() could still be changed later on
if it should not be the case anymore. The flag and bpf_func would share the
same cacheline as well.


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().


I was also thinking that for dev_queue_xmit_nit(), since we do the skb_clone()
there, to have a clone version (w/o affecting performance of the current one)
that instead of copying cb[] over, it would just do a memset(). But that would
just be limited to AF_PACKET, and doesn't catch all sk_filter() users.

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Alexei Starovoitov

On 10/6/15 10:56 AM, Eric Dumazet wrote:

On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:


was also thinking that we can do it only in paths that actually
have multiple protocol layers, since today bpf is mainly used with
tcpdump(raw_socket) and new af_packet fanout both have cb cleared
on RX, because it just came out of alloc_skb and no layers were called,
and on TX we can clear 20 bytes in dev_queue_xmit_nit().
af_unix/netlink also have clean skb. Need to analyze tun and sctp...
but it feels overly fragile to save a branch in sk_filter,
so planning to go with
if(unlikely(prog->cb_access)) memset in sk_filter().



This will break TCP use of sk_filter().
skb->cb[] contains useful data in TCP layer.


oops. thanks for catching. In case of sk_filter on top of tcp sock,
it shouldn't be looking at cb at all.
I'm thinking to send a patch to get rid of cb access for socket filters
all together until better solution found.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Andy Lutomirski
On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet  wrote:
> On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:
>
>> was also thinking that we can do it only in paths that actually
>> have multiple protocol layers, since today bpf is mainly used with
>> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
>> on RX, because it just came out of alloc_skb and no layers were called,
>> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
>> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
>> but it feels overly fragile to save a branch in sk_filter,
>> so planning to go with
>> if(unlikely(prog->cb_access)) memset in sk_filter().
>>
>
> This will break TCP use of sk_filter().
> skb->cb[] contains useful data in TCP layer.
>
>

Since I don't know too much about the networking details:

1. Does "skb->cb" *ever* contain anything useful for an unprivileged user?

2. Does sbk->cb form a stable ABI?

Unless both answers are solid yesses, then maybe the right solution is
to just deny access entirely to unprivileged users.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Eric Dumazet
On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote:

> was also thinking that we can do it only in paths that actually
> have multiple protocol layers, since today bpf is mainly used with
> tcpdump(raw_socket) and new af_packet fanout both have cb cleared
> on RX, because it just came out of alloc_skb and no layers were called,
> and on TX we can clear 20 bytes in dev_queue_xmit_nit().
> af_unix/netlink also have clean skb. Need to analyze tun and sctp...
> but it feels overly fragile to save a branch in sk_filter,
> so planning to go with
> if(unlikely(prog->cb_access)) memset in sk_filter().
> 

This will break TCP use of sk_filter().
skb->cb[] contains useful data in TCP layer.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Ingo Molnar

* Alexei Starovoitov  wrote:

> On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> >One scenario that comes to mind ... what happens when there are kernel
> >pointers stored in skb->cb[] (either from the current layer or an old
> >one from a different layer that the skb went through previously, but
> >which did not get overwritten)?
> >
> >Socket filters could read a portion of skb->cb[] also when unprived and
> >leak that out through maps. I think the verifier doesn't catch that,
> >right?
> 
> grrr. indeed. previous layer before sk_filter() can leave junk in there.

Could this be solved by activating zeroing/sanitizing of this data if there's 
an 
active BPF function around that can access that socket?

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Ingo Molnar

* Daniel Borkmann  wrote:

> On 10/06/2015 09:13 AM, Ingo Molnar wrote:
> >
> >* Alexei Starovoitov  wrote:
> >
> >>On 10/5/15 3:14 PM, Daniel Borkmann wrote:
> >>>One scenario that comes to mind ... what happens when there are kernel
> >>>pointers stored in skb->cb[] (either from the current layer or an old
> >>>one from a different layer that the skb went through previously, but
> >>>which did not get overwritten)?
> >>>
> >>>Socket filters could read a portion of skb->cb[] also when unprived and
> >>>leak that out through maps. I think the verifier doesn't catch that,
> >>>right?
> >>
> >>grrr. indeed. previous layer before sk_filter() can leave junk in there.
> >
> >Could this be solved by activating zeroing/sanitizing of this data if 
> >there's an
> >active BPF function around that can access that socket?
> 
> I think this check could only be done in sk_filter() for testing these
> conditions (unprivileged user + access to cb area), so it would need to
> happen from outside a native eBPF program. :/

Yes, the kernel (with code running outside of any eBPF program) would guarantee 
that those data fields are zeroed/sanitized, if there's an eBPF program that is 
attached to that socket.

> [...] Also classic BPF would then need to test for it, since a socket filter 
> doesn't really know whether native eBPF is loaded there or a classic-to-eBPF 
> transformed one, and classic never makes use of this. Anyway, it could be 
> done 
> by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF 
> verification phase, and test it inside sk_filter() if I see it correctly.

That could also be done in an unlikely() branch, to keep the cost to the 
non-eBPF 
case near zero.

> The reason is that this sanitizing must only be done in the 'top-level' 
> program 
> that is run from sk_filter() _directly_, because a user at any time could 
> decide 
> to put an already loaded eBPF fd into a tail call map. And cb[] is then used 
> to 
> pass args/state around between two programs, thus it cannot be 
> unconditionally 
> cleared from within the program. The association to a socket filter 
> (SO_ATTACH_BPF) happens at a later time after a native eBPF program has 
> already 
> been loaded via bpf(2).

So zeroing tends to be very cheap and it could also be beneficial to 
performance 
in terms of bringing the cacheline into the CPU cache. But I really don't know 
the 
filter code so I'm just handwaving.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Daniel Borkmann

On 10/06/2015 10:20 AM, Ingo Molnar wrote:


* Daniel Borkmann  wrote:


On 10/06/2015 09:13 AM, Ingo Molnar wrote:


* Alexei Starovoitov  wrote:


On 10/5/15 3:14 PM, Daniel Borkmann wrote:

One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?


grrr. indeed. previous layer before sk_filter() can leave junk in there.


Could this be solved by activating zeroing/sanitizing of this data if there's an
active BPF function around that can access that socket?


I think this check could only be done in sk_filter() for testing these
conditions (unprivileged user + access to cb area), so it would need to
happen from outside a native eBPF program. :/


Yes, the kernel (with code running outside of any eBPF program) would guarantee
that those data fields are zeroed/sanitized, if there's an eBPF program that is
attached to that socket.


[...] Also classic BPF would then need to test for it, since a socket filter
doesn't really know whether native eBPF is loaded there or a classic-to-eBPF
transformed one, and classic never makes use of this. Anyway, it could be done
by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF
verification phase, and test it inside sk_filter() if I see it correctly.


That could also be done in an unlikely() branch, to keep the cost to the 
non-eBPF
case near zero.


Yes, agreed. For the time being, the majority of users are coming from the
classic BPF side anyway and the unlikely() could still be changed later on
if it should not be the case anymore. The flag and bpf_func would share the
same cacheline as well.


The reason is that this sanitizing must only be done in the 'top-level' program
that is run from sk_filter() _directly_, because a user at any time could decide
to put an already loaded eBPF fd into a tail call map. And cb[] is then used to
pass args/state around between two programs, thus it cannot be unconditionally
cleared from within the program. The association to a socket filter
(SO_ATTACH_BPF) happens at a later time after a native eBPF program has already
been loaded via bpf(2).


So zeroing tends to be very cheap and it could also be beneficial to performance
in terms of bringing the cacheline into the CPU cache. But I really don't know 
the
filter code so I'm just handwaving.

Thanks,

Ingo

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Daniel Borkmann

On 10/06/2015 09:13 AM, Ingo Molnar wrote:


* Alexei Starovoitov  wrote:


On 10/5/15 3:14 PM, Daniel Borkmann wrote:

One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?


grrr. indeed. previous layer before sk_filter() can leave junk in there.


Could this be solved by activating zeroing/sanitizing of this data if there's an
active BPF function around that can access that socket?


I think this check could only be done in sk_filter() for testing these
conditions (unprivileged user + access to cb area), so it would need to
happen from outside a native eBPF program. :/ Also classic BPF would
then need to test for it, since a socket filter doesn't really know
whether native eBPF is loaded there or a classic-to-eBPF transformed one,
and classic never makes use of this. Anyway, it could be done by adding
a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification
phase, and test it inside sk_filter() if I see it correctly.

The reason is that this sanitizing must only be done in the 'top-level'
program that is run from sk_filter() _directly_, because a user at any
time could decide to put an already loaded eBPF fd into a tail call map.
And cb[] is then used to pass args/state around between two programs,
thus it cannot be unconditionally cleared from within the program. The
association to a socket filter (SO_ATTACH_BPF) happens at a later time
after a native eBPF program has already been loaded via bpf(2).

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-06 Thread Daniel Borkmann

On 10/06/2015 02:51 AM, Alexei Starovoitov wrote:

On 10/5/15 3:14 PM, Daniel Borkmann wrote:

One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?

...

Please keep poking.


;)

I'm still wondering whether sysctl_bpf_enable_unprivileged is a good
way to go with regards to controlling capabilties of bpf(2), hmm, but
don't really have a good idea at the moment.

So, the rationale of this is to give it some soaking time before flipping
the switch that then defaults to on, and later on to still have a
possibility for an admin to turn it off (if not silently overwritten by
some system application later on ;)).

I think only having a Kconfig doesn't really make sense as distros
will blindly turn lots of stuff on anyway. A hidden Kconfig entry
that is not exposed into menuconfig might allow for sorting everything
out first, but with the issue of getting only little testing exposure.

If I see this correctly, perf_event_open(2) has a number of paranoia
levels with some helpers wrapped around it, f.e.:

/*
 * perf event paranoia level:
 *  -1 - not paranoid at all
 *   0 - disallow raw tracepoint access for unpriv
 *   1 - disallow cpu events for unpriv
 *   2 - disallow kernel profiling for unpriv
 */
int sysctl_perf_event_paranoid __read_mostly = 1;

Should instead something similar be adapted on bpf(2) as well? Or, would
that even be more painful for application developers shipping their stuff
through distros in the end (where they might then decide to just setup
everything BPF-related and then drop privs)?

I'm also wondering with regards to seccomp, which could adapt to eBPF at
some point and be used by unprivileged programs. Perhaps then, a single
paranoia alike setting might not suit to all eBPF subsystem users. Any
ideas?

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Alexei Starovoitov
In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
  (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int foo(struct __sk_buff *skb)
{
  char fmt[] = "hello %d\n";
  bpf_trace_printk(fmt, sizeof(fmt), skb->len);
  return 0;
}

but the following program is not:
int foo(struct __sk_buff *skb)
{
  char fmt[] = "hello %p\n";
  bpf_trace_printk(fmt, sizeof(fmt), fmt);
  return 0;
}
since it would leak the kernel stack address via bpf_trace_printk().

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
- bpf_trace_printk (for debugging)

The feature is controlled by sysctl kernel.bpf_enable_unprivileged
which is off by default.

New tests were added to test_verifier:
 unpriv: return pointer OK
 unpriv: add const to pointer OK
 unpriv: add pointer to pointer OK
 unpriv: neg pointer OK
 unpriv: cmp pointer with const OK
 unpriv: cmp pointer with pointer OK
 unpriv: pass pointer to printk OK
 unpriv: pass pointer to helper function OK
 unpriv: indirectly pass pointer on stack to helper function OK
 unpriv: mangle pointer on stack 1 OK
 unpriv: mangle pointer on stack 2 OK
 unpriv: read pointer from stack in small chunks OK
 unpriv: write pointer into ctx OK
 unpriv: write pointer into map elem value OK
 unpriv: partial copy of pointer OK

Signed-off-by: Alexei Starovoitov 
---
 include/linux/bpf.h |3 +
 kernel/bpf/syscall.c|   11 +-
 kernel/bpf/verifier.c   |  114 +++--
 kernel/sysctl.c |   10 ++
 kernel/trace/bpf_trace.c|3 +
 samples/bpf/libbpf.h|8 ++
 samples/bpf/test_verifier.c |  239 ++-
 7 files changed, 371 insertions(+), 17 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index f57d7fed9ec3..acf97d66b681 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -66,6 +66,7 @@ enum bpf_arg_type {
 
ARG_PTR_TO_CTX, /* pointer to context */
ARG_ANYTHING,   /* any (initialized) argument is ok */
+   ARG_VARARG, /* optional argument */
 };
 
 /* type of values returned from helper functions */
@@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog);
 struct bpf_map *bpf_map_get(struct fd f);
 void bpf_map_put(struct bpf_map *map);
 
+extern int sysctl_bpf_enable_unprivileged;
+
 /* verify correctness of eBPF program */
 int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
 #else
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 5f35f420c12f..9a2098da2da9 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+int sysctl_bpf_enable_unprivileged __read_mostly;
+
 static LIST_HEAD(bpf_map_types);
 
 static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
@@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr)
attr->kern_version != LINUX_VERSION_CODE)
return -EINVAL;
 
+   if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
+   return -EPERM;
+
/* plain bpf_prog allocation */
prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
if (!prog)
@@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, 
uattr, unsigned int, siz
union bpf_attr attr = {};
int err;
 
-   /* the syscall is limited to root temporarily. This restriction will be
-* lifted when security audit is clean. Note that eBPF+tracing must have
-* this restriction, since it may pass kernel data to user space
-*/
-   if (!capable(CAP_SYS_ADMIN))
+   if (!capable(CAP_SYS_ADMIN) && !sysctl_bpf_enable_unprivileged)

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Kees Cook
On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov  wrote:
> In order to let unprivileged users load and execute eBPF programs
> teach verifier to prevent pointer leaks.
> Verifier will prevent
> - any arithmetic on pointers
>   (except R10+Imm which is used to compute stack addresses)
> - comparison of pointers
> - passing pointers to helper functions
> - indirectly passing pointers in stack to helper functions
> - returning pointer from bpf program
> - storing pointers into ctx or maps

Does the arithmetic restriction include using a pointer as an index to
a maps-based tail call? I'm still worried about pointer-based
side-effects.

-Kees

>
> Spill/fill of pointers into stack is allowed, but mangling
> of pointers stored in the stack or reading them byte by byte is not.
>
> Within bpf programs the pointers do exist, since programs need to
> be able to access maps, pass skb pointer to LD_ABS insns, etc
> but programs cannot pass such pointer values to the outside
> or obfuscate them.
>
> Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
> so that socket filters (tcpdump), af_packet (quic acceleration)
> and future kcm can use it.
> tracing and tc cls/act program types still require root permissions,
> since tracing actually needs to be able to see all kernel pointers
> and tc is for root only.
>
> For example, the following unprivileged socket filter program is allowed:
> int foo(struct __sk_buff *skb)
> {
>   char fmt[] = "hello %d\n";
>   bpf_trace_printk(fmt, sizeof(fmt), skb->len);
>   return 0;
> }
>
> but the following program is not:
> int foo(struct __sk_buff *skb)
> {
>   char fmt[] = "hello %p\n";
>   bpf_trace_printk(fmt, sizeof(fmt), fmt);
>   return 0;
> }
> since it would leak the kernel stack address via bpf_trace_printk().
>
> Unprivileged socket filter bpf programs have access to the
> following helper functions:
> - map lookup/update/delete (but they cannot store kernel pointers into them)
> - get_random (it's already exposed to unprivileged user space)
> - get_smp_processor_id
> - tail_call into another socket filter program
> - ktime_get_ns
> - bpf_trace_printk (for debugging)
>
> The feature is controlled by sysctl kernel.bpf_enable_unprivileged
> which is off by default.
>
> New tests were added to test_verifier:
>  unpriv: return pointer OK
>  unpriv: add const to pointer OK
>  unpriv: add pointer to pointer OK
>  unpriv: neg pointer OK
>  unpriv: cmp pointer with const OK
>  unpriv: cmp pointer with pointer OK
>  unpriv: pass pointer to printk OK
>  unpriv: pass pointer to helper function OK
>  unpriv: indirectly pass pointer on stack to helper function OK
>  unpriv: mangle pointer on stack 1 OK
>  unpriv: mangle pointer on stack 2 OK
>  unpriv: read pointer from stack in small chunks OK
>  unpriv: write pointer into ctx OK
>  unpriv: write pointer into map elem value OK
>  unpriv: partial copy of pointer OK
>
> Signed-off-by: Alexei Starovoitov 
> ---
>  include/linux/bpf.h |3 +
>  kernel/bpf/syscall.c|   11 +-
>  kernel/bpf/verifier.c   |  114 +++--
>  kernel/sysctl.c |   10 ++
>  kernel/trace/bpf_trace.c|3 +
>  samples/bpf/libbpf.h|8 ++
>  samples/bpf/test_verifier.c |  239 
> ++-
>  7 files changed, 371 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index f57d7fed9ec3..acf97d66b681 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -66,6 +66,7 @@ enum bpf_arg_type {
>
> ARG_PTR_TO_CTX, /* pointer to context */
> ARG_ANYTHING,   /* any (initialized) argument is ok */
> +   ARG_VARARG, /* optional argument */
>  };
>
>  /* type of values returned from helper functions */
> @@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog);
>  struct bpf_map *bpf_map_get(struct fd f);
>  void bpf_map_put(struct bpf_map *map);
>
> +extern int sysctl_bpf_enable_unprivileged;
> +
>  /* verify correctness of eBPF program */
>  int bpf_check(struct bpf_prog **fp, union bpf_attr *attr);
>  #else
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 5f35f420c12f..9a2098da2da9 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -18,6 +18,8 @@
>  #include 
>  #include 
>
> +int sysctl_bpf_enable_unprivileged __read_mostly;
> +
>  static LIST_HEAD(bpf_map_types);
>
>  static struct bpf_map *find_and_alloc_map(union bpf_attr *attr)
> @@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr)
> attr->kern_version != LINUX_VERSION_CODE)
> return -EINVAL;
>
> +   if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN))
> +   return -EPERM;
> +
> /* plain bpf_prog allocation */
> prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER);
> if (!prog)
> @@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union 

Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Alexei Starovoitov

On 10/5/15 3:14 PM, Daniel Borkmann wrote:

One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?


grrr. indeed. previous layer before sk_filter() can leave junk in there.
Would need to disable cb[0-5] for unpriv, but that will make tail_call
much harder to use, since cb[0-5] is a way to pass arguments from
one prog to another and clearing them is not an option, since it's
too expensive. Like samples/bpf/sockex3_kern.c usage of cb[0]
won't work anymore. I guess that's the price of unpriv.
Will fix this, add few tail_call specific tests and respin.
Please keep poking.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Alexei Starovoitov

On 10/5/15 2:00 PM, Kees Cook wrote:

On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov  wrote:

>In order to let unprivileged users load and execute eBPF programs
>teach verifier to prevent pointer leaks.
>Verifier will prevent
>- any arithmetic on pointers
>   (except R10+Imm which is used to compute stack addresses)
>- comparison of pointers
>- passing pointers to helper functions
>- indirectly passing pointers in stack to helper functions
>- returning pointer from bpf program
>- storing pointers into ctx or maps

Does the arithmetic restriction include using a pointer as an index to
a maps-based tail call? I'm still worried about pointer-based
side-effects.


the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
from the program side, so programs cannot see or manipulate
those pointers.
For the former only bpf_tail_call() is allowed that takes integer
index and jumps to it. And the latter map accessed with
bpf_perf_event_read() that also takes index only (this helper
is not available to socket filters anyway).
Also bpf_tail_call() can only jump to the program of the same type.
So I'm quite certain it's safe.

Yes, please ask questions and try to poke holes. Now it is time.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Alexei Starovoitov

On 10/5/15 2:16 PM, Andy Lutomirski wrote:

On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov  wrote:

On 10/5/15 2:00 PM, Kees Cook wrote:


On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov
wrote:



In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
   (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps


Does the arithmetic restriction include using a pointer as an index to
a maps-based tail call? I'm still worried about pointer-based
side-effects.



the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
from the program side, so programs cannot see or manipulate
those pointers.
For the former only bpf_tail_call() is allowed that takes integer
index and jumps to it. And the latter map accessed with
bpf_perf_event_read() that also takes index only (this helper
is not available to socket filters anyway).
Also bpf_tail_call() can only jump to the program of the same type.
So I'm quite certain it's safe.


At some point there will be an unprivileged way to create a map,
though, and we don't want to let pointers get poked into the map.


yes. exactly. With these two patches non-root can create a map
against memlock user limit and have a program store bytes into it
(like data from network packet), but it cannot store pointers into it.
That's covered by test "unpriv: write pointer into map elem value"
I've added new tests for all cases that can 'leak pointer':
 unpriv: return pointer OK
 unpriv: add const to pointer OK
 unpriv: add pointer to pointer OK
 unpriv: neg pointer OK
 unpriv: cmp pointer with const OK
 unpriv: cmp pointer with pointer OK
 unpriv: pass pointer to printk OK
 unpriv: pass pointer to helper function OK
 unpriv: indirectly pass pointer on stack to helper function OK
 unpriv: mangle pointer on stack 1 OK
 unpriv: mangle pointer on stack 2 OK
 unpriv: read pointer from stack in small chunks OK
 unpriv: write pointer into ctx OK
 unpriv: write pointer into map elem value OK
 unpriv: partial copy of pointer OK

the most interesting one is 'indirectly pass pointer'.
It checks the case where user stores a pointer into a stack
and then uses that stack region either as a key for lookup or
as part of format string for printk.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Kees Cook
On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov  wrote:
> On 10/5/15 2:00 PM, Kees Cook wrote:
>>
>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov
>> wrote:
>>>
>>> >In order to let unprivileged users load and execute eBPF programs
>>> >teach verifier to prevent pointer leaks.
>>> >Verifier will prevent
>>> >- any arithmetic on pointers
>>> >   (except R10+Imm which is used to compute stack addresses)
>>> >- comparison of pointers
>>> >- passing pointers to helper functions
>>> >- indirectly passing pointers in stack to helper functions
>>> >- returning pointer from bpf program
>>> >- storing pointers into ctx or maps
>>
>> Does the arithmetic restriction include using a pointer as an index to
>> a maps-based tail call? I'm still worried about pointer-based
>> side-effects.
>
>
> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
> from the program side, so programs cannot see or manipulate
> those pointers.
> For the former only bpf_tail_call() is allowed that takes integer
> index and jumps to it. And the latter map accessed with

Okay, so I can't take a pointer, put it on the stack, take it back any
part of it as an integer and use it for a tail call?

Sounds like this is shaping up nicely! Thanks for adding all these checks.

-Kees

> bpf_perf_event_read() that also takes index only (this helper
> is not available to socket filters anyway).
> Also bpf_tail_call() can only jump to the program of the same type.
> So I'm quite certain it's safe.
>
> Yes, please ask questions and try to poke holes. Now it is time.
>



-- 
Kees Cook
Chrome OS Security
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Daniel Borkmann

On 10/05/2015 10:48 PM, Alexei Starovoitov wrote:

In order to let unprivileged users load and execute eBPF programs
teach verifier to prevent pointer leaks.
Verifier will prevent
- any arithmetic on pointers
   (except R10+Imm which is used to compute stack addresses)
- comparison of pointers
- passing pointers to helper functions
- indirectly passing pointers in stack to helper functions
- returning pointer from bpf program
- storing pointers into ctx or maps

Spill/fill of pointers into stack is allowed, but mangling
of pointers stored in the stack or reading them byte by byte is not.

Within bpf programs the pointers do exist, since programs need to
be able to access maps, pass skb pointer to LD_ABS insns, etc
but programs cannot pass such pointer values to the outside
or obfuscate them.

Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs,
so that socket filters (tcpdump), af_packet (quic acceleration)
and future kcm can use it.
tracing and tc cls/act program types still require root permissions,
since tracing actually needs to be able to see all kernel pointers
and tc is for root only.

For example, the following unprivileged socket filter program is allowed:
int foo(struct __sk_buff *skb)
{
   char fmt[] = "hello %d\n";
   bpf_trace_printk(fmt, sizeof(fmt), skb->len);
   return 0;
}

but the following program is not:
int foo(struct __sk_buff *skb)
{
   char fmt[] = "hello %p\n";
   bpf_trace_printk(fmt, sizeof(fmt), fmt);
   return 0;
}
since it would leak the kernel stack address via bpf_trace_printk().

Unprivileged socket filter bpf programs have access to the
following helper functions:
- map lookup/update/delete (but they cannot store kernel pointers into them)
- get_random (it's already exposed to unprivileged user space)
- get_smp_processor_id
- tail_call into another socket filter program
- ktime_get_ns
- bpf_trace_printk (for debugging)

The feature is controlled by sysctl kernel.bpf_enable_unprivileged
which is off by default.

New tests were added to test_verifier:
  unpriv: return pointer OK
  unpriv: add const to pointer OK
  unpriv: add pointer to pointer OK
  unpriv: neg pointer OK
  unpriv: cmp pointer with const OK
  unpriv: cmp pointer with pointer OK
  unpriv: pass pointer to printk OK
  unpriv: pass pointer to helper function OK
  unpriv: indirectly pass pointer on stack to helper function OK
  unpriv: mangle pointer on stack 1 OK
  unpriv: mangle pointer on stack 2 OK
  unpriv: read pointer from stack in small chunks OK
  unpriv: write pointer into ctx OK
  unpriv: write pointer into map elem value OK
  unpriv: partial copy of pointer OK

Signed-off-by: Alexei Starovoitov 


One scenario that comes to mind ... what happens when there are kernel
pointers stored in skb->cb[] (either from the current layer or an old
one from a different layer that the skb went through previously, but
which did not get overwritten)?

Socket filters could read a portion of skb->cb[] also when unprived and
leak that out through maps. I think the verifier doesn't catch that,
right?

Thanks,
Daniel
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Andy Lutomirski
On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitov  wrote:
> On 10/5/15 2:00 PM, Kees Cook wrote:
>>
>> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov
>> wrote:
>>>
>>> >In order to let unprivileged users load and execute eBPF programs
>>> >teach verifier to prevent pointer leaks.
>>> >Verifier will prevent
>>> >- any arithmetic on pointers
>>> >   (except R10+Imm which is used to compute stack addresses)
>>> >- comparison of pointers
>>> >- passing pointers to helper functions
>>> >- indirectly passing pointers in stack to helper functions
>>> >- returning pointer from bpf program
>>> >- storing pointers into ctx or maps
>>
>> Does the arithmetic restriction include using a pointer as an index to
>> a maps-based tail call? I'm still worried about pointer-based
>> side-effects.
>
>
> the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
> BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
> from the program side, so programs cannot see or manipulate
> those pointers.
> For the former only bpf_tail_call() is allowed that takes integer
> index and jumps to it. And the latter map accessed with
> bpf_perf_event_read() that also takes index only (this helper
> is not available to socket filters anyway).
> Also bpf_tail_call() can only jump to the program of the same type.
> So I'm quite certain it's safe.

At some point there will be an unprivileged way to create a map,
though, and we don't want to let pointers get poked into the map.

Or am I misunderstanding?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs

2015-10-05 Thread Alexei Starovoitov

On 10/5/15 3:02 PM, Kees Cook wrote:

the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and
>BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors
>from the program side, so programs cannot see or manipulate
>those pointers.
>For the former only bpf_tail_call() is allowed that takes integer
>index and jumps to it. And the latter map accessed with

Okay, so I can't take a pointer, put it on the stack, take it back any
part of it as an integer and use it for a tail call?


not quite.
you can store a pointer to stack and read it as 8 byte load back into
another register, but reading <8 byte of it will be rejected.
That's the test:
unpriv: read pointer from stack in small chunks
we obviously want to avoid hiding pointer in integers.
After reading it back from stack as a pointer you cannnot use
this register to pass as index into bpf_tail_call().
That's the test:
unpriv: pass pointer to helper function

please keep shooting everything that comes to mind.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html