Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
* Alexei Starovoitovwrote: > As far as sysctl we can look at two with similar purpose: > sysctl_perf_event_paranoid and modules_disabled. > First one is indeed multi level, but not because of the fear of bugs, > but because of real security implications. It serves both purposes flexibly, and note that most people and distros will use the default value. > [...] Like raw events on hyperthreaded cpu or uncore events can extract data > from other user processes. So it controls these extra privileges. It also controls the generally increased risk caused by a larger attack surface, which some users may not want to carry and which they can thus shrink. With a static keys approach there would be no runtime overhead worth speaking of, so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the default value set to permissive. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/7/15 11:21 PM, Ingo Molnar wrote: so I see no reason why unprivileged eBPF couldn't have a sysctl too - with the default value set to permissive. agreed. sent out v2 follows 'modules_disabled' style. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Wed, Oct 7, 2015 at 4:49 PM, Alexei Starovoitovwrote: > On 10/7/15 3:22 PM, Kees Cook wrote: >>> >>> Yes, I agree with you that there would be a CVE regardless. I still >>> >like the option of configurable access, not a big fan of the sysctl >>> >either. Thinking out loudly, what about a Kconfig option? We started >>> >out like this on bpf(2) itself (initially under expert settings, now >>> >afaik not anymore), and depending on usage scenarios, a requirement >>> >could be to have immutable cap_sys_admin-only, for other use-cases a >>> >requirement on the kernel might instead be to have unprivileged users >>> >as well. >> >> It'd be nice to have it just be a Kconfig, but this shoots >> distro-users in the foot if a distro decides to include unpriv bpf and >> the user doesn't want it. I think it's probably a good idea to keep >> the sysctl. > > > I don't like introducing Kconfig for no clear reason. It only adds > to the testing matrix and makes it harder to hack around. > Paranoid distros can disable bpf via single config already, > there is no reason to go more fine grained here. > Unpriv checks add minimal amount of code, so even for tinification > purpose there is no need to chop of few bytes. tiny kernels would > disable bpf all together. > > As far as sysctl we can look at two with similar purpose: > sysctl_perf_event_paranoid and modules_disabled. > First one is indeed multi level, but not because of the fear of bugs, > but because of real security implications. Like raw events on > hyperthreaded cpu or uncore events can extract data from other > user processes. So it controls these extra privileges. > For bpf there are no hw implications to deal with. > If we make seccomp+bpf in the future it shouldn't need another knob > or extra bit. There are no extra privileges to grant, so not needed. > > modules_disabled is off by default and can be toggled on once. > I think for paranoid distro users that "don't want bpf" that is > the better model. > So I'm thinking to do sysctl_unprivileged_bpf_disabled that will be > 0=off by default (meaning that users can load unpriv socket filter > programs and seccomp in the future) and that can be switched > to 1=on once and stay that way until reboot. > I think that's the best balance that avoids adding checks to all > apps that want to use bpf and admins can still act on it. > From app point of view it's no different than bpf syscall > was not compiled in. So single feature test for bpf syscall will > be enough. I think this would be great. :) -Kees -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/6/15 5:45 AM, Daniel Borkmann wrote: Should instead something similar be adapted on bpf(2) as well? Or, would that even be more painful for application developers shipping their stuff through distros in the end (where they might then decide to just setup everything BPF-related and then drop privs)? I think loading as root and then dropping privs won't work in many cases, since apps still need to access maps even after dropping privs and today it's not possible, since cap_sys_admin is tested for every bpf syscall. I'm also wondering with regards to seccomp, which could adapt to eBPF at some point and be used by unprivileged programs. Perhaps then, a single paranoia alike setting might not suit to all eBPF subsystem users. Any ideas? There is no such paranoid sysctl for cBPF, so there is no reason to add one for eBPF other than fear. Adding multiple sysctl knobs for seccomp, socket, tracing is only reflection of even higher fear. What sysadmins suppose to do with such sysctl when kernel is kinda saying 'may be something unsafe here you're on your own' ? Also the presence of this sysctl_bpf_enable_unprivileged or any other one doesn't help with CVEs. Any bug with security implications will be a CVE regardless, so I think the better course of action is to avoid introducing this sysctl. We've discussed adding something like CAP_BPF to control it, but then again, do we want this because of fear of bugs or because it's actually needed. I think the design of all CAP_* is to give unprivileged users permissions to do something beyond normal that can potentially be harmful for other users or the whole system. In this case it's not the case. One user can load eBPF programs and maps up to its MEMLOCK limit and they cannot interfere with other users or affect the host, so CAP_BPF is not necessary either. Thoughts? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Wed, Oct 7, 2015 at 3:07 PM, Daniel Borkmannwrote: > On 10/07/2015 11:20 PM, Alexei Starovoitov wrote: >> >> On 10/6/15 5:45 AM, Daniel Borkmann wrote: >>> >>> Should instead something similar be adapted on bpf(2) as well? Or, would >>> that even be more painful for application developers shipping their stuff >>> through distros in the end (where they might then decide to just setup >>> everything BPF-related and then drop privs)? >> >> >> I think loading as root and then dropping privs won't work in many >> cases, since apps still need to access maps even after dropping privs >> and today it's not possible, since cap_sys_admin is tested for every >> bpf syscall. > > > Yep, maps-only would then need to be made accessible in some way. > >>> I'm also wondering with regards to seccomp, which could adapt to eBPF at >>> some point and be used by unprivileged programs. Perhaps then, a single >>> paranoia alike setting might not suit to all eBPF subsystem users. Any >>> ideas? >> >> >> There is no such paranoid sysctl for cBPF, so there is no reason to >> add one for eBPF other than fear. >> Adding multiple sysctl knobs for seccomp, socket, tracing is only >> reflection of even higher fear. >> What sysadmins suppose to do with such sysctl when kernel is kinda >> saying 'may be something unsafe here you're on your own' ? >> Also the presence of this sysctl_bpf_enable_unprivileged or any other >> one doesn't help with CVEs. Any bug with security implications will >> be a CVE regardless, so I think the better course of action is to >> avoid introducing this sysctl. > > > Yes, I agree with you that there would be a CVE regardless. I still > like the option of configurable access, not a big fan of the sysctl > either. Thinking out loudly, what about a Kconfig option? We started > out like this on bpf(2) itself (initially under expert settings, now > afaik not anymore), and depending on usage scenarios, a requirement > could be to have immutable cap_sys_admin-only, for other use-cases a > requirement on the kernel might instead be to have unprivileged users > as well. It'd be nice to have it just be a Kconfig, but this shoots distro-users in the foot if a distro decides to include unpriv bpf and the user doesn't want it. I think it's probably a good idea to keep the sysctl. -Kees > >> We've discussed adding something like CAP_BPF to control it, >> but then again, do we want this because of fear of bugs or because >> it's actually needed. I think the design of all CAP_* is to give >> unprivileged users permissions to do something beyond normal that >> can potentially be harmful for other users or the whole system. >> In this case it's not the case. One user can load eBPF programs >> and maps up to its MEMLOCK limit and they cannot interfere with >> other users or affect the host, so CAP_BPF is not necessary either. > > > Thanks, > Daniel -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/07/2015 11:20 PM, Alexei Starovoitov wrote: On 10/6/15 5:45 AM, Daniel Borkmann wrote: Should instead something similar be adapted on bpf(2) as well? Or, would that even be more painful for application developers shipping their stuff through distros in the end (where they might then decide to just setup everything BPF-related and then drop privs)? I think loading as root and then dropping privs won't work in many cases, since apps still need to access maps even after dropping privs and today it's not possible, since cap_sys_admin is tested for every bpf syscall. Yep, maps-only would then need to be made accessible in some way. I'm also wondering with regards to seccomp, which could adapt to eBPF at some point and be used by unprivileged programs. Perhaps then, a single paranoia alike setting might not suit to all eBPF subsystem users. Any ideas? There is no such paranoid sysctl for cBPF, so there is no reason to add one for eBPF other than fear. Adding multiple sysctl knobs for seccomp, socket, tracing is only reflection of even higher fear. What sysadmins suppose to do with such sysctl when kernel is kinda saying 'may be something unsafe here you're on your own' ? Also the presence of this sysctl_bpf_enable_unprivileged or any other one doesn't help with CVEs. Any bug with security implications will be a CVE regardless, so I think the better course of action is to avoid introducing this sysctl. Yes, I agree with you that there would be a CVE regardless. I still like the option of configurable access, not a big fan of the sysctl either. Thinking out loudly, what about a Kconfig option? We started out like this on bpf(2) itself (initially under expert settings, now afaik not anymore), and depending on usage scenarios, a requirement could be to have immutable cap_sys_admin-only, for other use-cases a requirement on the kernel might instead be to have unprivileged users as well. We've discussed adding something like CAP_BPF to control it, but then again, do we want this because of fear of bugs or because it's actually needed. I think the design of all CAP_* is to give unprivileged users permissions to do something beyond normal that can potentially be harmful for other users or the whole system. In this case it's not the case. One user can load eBPF programs and maps up to its MEMLOCK limit and they cannot interfere with other users or affect the host, so CAP_BPF is not necessary either. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
* Andy Lutomirskiwrote: > On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazet wrote: > > On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote: > > > >> was also thinking that we can do it only in paths that actually > >> have multiple protocol layers, since today bpf is mainly used with > >> tcpdump(raw_socket) and new af_packet fanout both have cb cleared > >> on RX, because it just came out of alloc_skb and no layers were called, > >> and on TX we can clear 20 bytes in dev_queue_xmit_nit(). > >> af_unix/netlink also have clean skb. Need to analyze tun and sctp... > >> but it feels overly fragile to save a branch in sk_filter, > >> so planning to go with > >> if(unlikely(prog->cb_access)) memset in sk_filter(). > >> > > > > This will break TCP use of sk_filter(). > > skb->cb[] contains useful data in TCP layer. > > > > > > Since I don't know too much about the networking details: > > 1. Does "skb->cb" *ever* contain anything useful for an unprivileged user? > > 2. Does sbk->cb form a stable ABI? > > Unless both answers are solid yesses, then maybe the right solution is > to just deny access entirely to unprivileged users. So this kind of instrumentation data is not an ABI in a similar fashion as tracing information is not an ABI either. I.e. tracepoints can (and sometimes do) change 'semantics' - in that the interpretation of the implementational details behind that data changes as the implementation changes. That's not something that can ever be an ABI, just like the contents of /proc/kcore or /proc/slabinfo can not be an ABI. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/7/15 3:22 PM, Kees Cook wrote: Yes, I agree with you that there would be a CVE regardless. I still >like the option of configurable access, not a big fan of the sysctl >either. Thinking out loudly, what about a Kconfig option? We started >out like this on bpf(2) itself (initially under expert settings, now >afaik not anymore), and depending on usage scenarios, a requirement >could be to have immutable cap_sys_admin-only, for other use-cases a >requirement on the kernel might instead be to have unprivileged users >as well. It'd be nice to have it just be a Kconfig, but this shoots distro-users in the foot if a distro decides to include unpriv bpf and the user doesn't want it. I think it's probably a good idea to keep the sysctl. I don't like introducing Kconfig for no clear reason. It only adds to the testing matrix and makes it harder to hack around. Paranoid distros can disable bpf via single config already, there is no reason to go more fine grained here. Unpriv checks add minimal amount of code, so even for tinification purpose there is no need to chop of few bytes. tiny kernels would disable bpf all together. As far as sysctl we can look at two with similar purpose: sysctl_perf_event_paranoid and modules_disabled. First one is indeed multi level, but not because of the fear of bugs, but because of real security implications. Like raw events on hyperthreaded cpu or uncore events can extract data from other user processes. So it controls these extra privileges. For bpf there are no hw implications to deal with. If we make seccomp+bpf in the future it shouldn't need another knob or extra bit. There are no extra privileges to grant, so not needed. modules_disabled is off by default and can be toggled on once. I think for paranoid distro users that "don't want bpf" that is the better model. So I'm thinking to do sysctl_unprivileged_bpf_disabled that will be 0=off by default (meaning that users can load unpriv socket filter programs and seccomp in the future) and that can be switched to 1=on once and stay that way until reboot. I think that's the best balance that avoids adding checks to all apps that want to use bpf and admins can still act on it. From app point of view it's no different than bpf syscall was not compiled in. So single feature test for bpf syscall will be enough. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/5/15 1:48 PM, Alexei Starovoitov wrote: Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns - bpf_trace_printk (for debugging) while reviewing everything for Nth time realized that bpf_trace_printk is useless for unprivileged users, since trace_pipe is root only. So going to drop it in V2. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/6/15 1:39 AM, Daniel Borkmann wrote: [...] Also classic BPF would then need to test for it, since a socket filter doesn't really know whether native eBPF is loaded there or a classic-to-eBPF transformed one, and classic never makes use of this. Anyway, it could be done by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification phase, and test it inside sk_filter() if I see it correctly. That could also be done in an unlikely() branch, to keep the cost to the non-eBPF case near zero. Yes, agreed. For the time being, the majority of users are coming from the classic BPF side anyway and the unlikely() could still be changed later on if it should not be the case anymore. The flag and bpf_func would share the same cacheline as well. was also thinking that we can do it only in paths that actually have multiple protocol layers, since today bpf is mainly used with tcpdump(raw_socket) and new af_packet fanout both have cb cleared on RX, because it just came out of alloc_skb and no layers were called, and on TX we can clear 20 bytes in dev_queue_xmit_nit(). af_unix/netlink also have clean skb. Need to analyze tun and sctp... but it feels overly fragile to save a branch in sk_filter, so planning to go with if(unlikely(prog->cb_access)) memset in sk_filter(). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/06/2015 07:50 PM, Alexei Starovoitov wrote: On 10/6/15 1:39 AM, Daniel Borkmann wrote: [...] Also classic BPF would then need to test for it, since a socket filter doesn't really know whether native eBPF is loaded there or a classic-to-eBPF transformed one, and classic never makes use of this. Anyway, it could be done by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification phase, and test it inside sk_filter() if I see it correctly. That could also be done in an unlikely() branch, to keep the cost to the non-eBPF case near zero. Yes, agreed. For the time being, the majority of users are coming from the classic BPF side anyway and the unlikely() could still be changed later on if it should not be the case anymore. The flag and bpf_func would share the same cacheline as well. was also thinking that we can do it only in paths that actually have multiple protocol layers, since today bpf is mainly used with tcpdump(raw_socket) and new af_packet fanout both have cb cleared on RX, because it just came out of alloc_skb and no layers were called, and on TX we can clear 20 bytes in dev_queue_xmit_nit(). af_unix/netlink also have clean skb. Need to analyze tun and sctp... but it feels overly fragile to save a branch in sk_filter, so planning to go with if(unlikely(prog->cb_access)) memset in sk_filter(). I was also thinking that for dev_queue_xmit_nit(), since we do the skb_clone() there, to have a clone version (w/o affecting performance of the current one) that instead of copying cb[] over, it would just do a memset(). But that would just be limited to AF_PACKET, and doesn't catch all sk_filter() users. Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/6/15 10:56 AM, Eric Dumazet wrote: On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote: was also thinking that we can do it only in paths that actually have multiple protocol layers, since today bpf is mainly used with tcpdump(raw_socket) and new af_packet fanout both have cb cleared on RX, because it just came out of alloc_skb and no layers were called, and on TX we can clear 20 bytes in dev_queue_xmit_nit(). af_unix/netlink also have clean skb. Need to analyze tun and sctp... but it feels overly fragile to save a branch in sk_filter, so planning to go with if(unlikely(prog->cb_access)) memset in sk_filter(). This will break TCP use of sk_filter(). skb->cb[] contains useful data in TCP layer. oops. thanks for catching. In case of sk_filter on top of tcp sock, it shouldn't be looking at cb at all. I'm thinking to send a patch to get rid of cb access for socket filters all together until better solution found. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Tue, Oct 6, 2015 at 10:56 AM, Eric Dumazetwrote: > On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote: > >> was also thinking that we can do it only in paths that actually >> have multiple protocol layers, since today bpf is mainly used with >> tcpdump(raw_socket) and new af_packet fanout both have cb cleared >> on RX, because it just came out of alloc_skb and no layers were called, >> and on TX we can clear 20 bytes in dev_queue_xmit_nit(). >> af_unix/netlink also have clean skb. Need to analyze tun and sctp... >> but it feels overly fragile to save a branch in sk_filter, >> so planning to go with >> if(unlikely(prog->cb_access)) memset in sk_filter(). >> > > This will break TCP use of sk_filter(). > skb->cb[] contains useful data in TCP layer. > > Since I don't know too much about the networking details: 1. Does "skb->cb" *ever* contain anything useful for an unprivileged user? 2. Does sbk->cb form a stable ABI? Unless both answers are solid yesses, then maybe the right solution is to just deny access entirely to unprivileged users. --Andy -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Tue, 2015-10-06 at 10:50 -0700, Alexei Starovoitov wrote: > was also thinking that we can do it only in paths that actually > have multiple protocol layers, since today bpf is mainly used with > tcpdump(raw_socket) and new af_packet fanout both have cb cleared > on RX, because it just came out of alloc_skb and no layers were called, > and on TX we can clear 20 bytes in dev_queue_xmit_nit(). > af_unix/netlink also have clean skb. Need to analyze tun and sctp... > but it feels overly fragile to save a branch in sk_filter, > so planning to go with > if(unlikely(prog->cb_access)) memset in sk_filter(). > This will break TCP use of sk_filter(). skb->cb[] contains useful data in TCP layer. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
* Alexei Starovoitovwrote: > On 10/5/15 3:14 PM, Daniel Borkmann wrote: > >One scenario that comes to mind ... what happens when there are kernel > >pointers stored in skb->cb[] (either from the current layer or an old > >one from a different layer that the skb went through previously, but > >which did not get overwritten)? > > > >Socket filters could read a portion of skb->cb[] also when unprived and > >leak that out through maps. I think the verifier doesn't catch that, > >right? > > grrr. indeed. previous layer before sk_filter() can leave junk in there. Could this be solved by activating zeroing/sanitizing of this data if there's an active BPF function around that can access that socket? Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
* Daniel Borkmannwrote: > On 10/06/2015 09:13 AM, Ingo Molnar wrote: > > > >* Alexei Starovoitov wrote: > > > >>On 10/5/15 3:14 PM, Daniel Borkmann wrote: > >>>One scenario that comes to mind ... what happens when there are kernel > >>>pointers stored in skb->cb[] (either from the current layer or an old > >>>one from a different layer that the skb went through previously, but > >>>which did not get overwritten)? > >>> > >>>Socket filters could read a portion of skb->cb[] also when unprived and > >>>leak that out through maps. I think the verifier doesn't catch that, > >>>right? > >> > >>grrr. indeed. previous layer before sk_filter() can leave junk in there. > > > >Could this be solved by activating zeroing/sanitizing of this data if > >there's an > >active BPF function around that can access that socket? > > I think this check could only be done in sk_filter() for testing these > conditions (unprivileged user + access to cb area), so it would need to > happen from outside a native eBPF program. :/ Yes, the kernel (with code running outside of any eBPF program) would guarantee that those data fields are zeroed/sanitized, if there's an eBPF program that is attached to that socket. > [...] Also classic BPF would then need to test for it, since a socket filter > doesn't really know whether native eBPF is loaded there or a classic-to-eBPF > transformed one, and classic never makes use of this. Anyway, it could be > done > by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF > verification phase, and test it inside sk_filter() if I see it correctly. That could also be done in an unlikely() branch, to keep the cost to the non-eBPF case near zero. > The reason is that this sanitizing must only be done in the 'top-level' > program > that is run from sk_filter() _directly_, because a user at any time could > decide > to put an already loaded eBPF fd into a tail call map. And cb[] is then used > to > pass args/state around between two programs, thus it cannot be > unconditionally > cleared from within the program. The association to a socket filter > (SO_ATTACH_BPF) happens at a later time after a native eBPF program has > already > been loaded via bpf(2). So zeroing tends to be very cheap and it could also be beneficial to performance in terms of bringing the cacheline into the CPU cache. But I really don't know the filter code so I'm just handwaving. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/06/2015 10:20 AM, Ingo Molnar wrote: * Daniel Borkmannwrote: On 10/06/2015 09:13 AM, Ingo Molnar wrote: * Alexei Starovoitov wrote: On 10/5/15 3:14 PM, Daniel Borkmann wrote: One scenario that comes to mind ... what happens when there are kernel pointers stored in skb->cb[] (either from the current layer or an old one from a different layer that the skb went through previously, but which did not get overwritten)? Socket filters could read a portion of skb->cb[] also when unprived and leak that out through maps. I think the verifier doesn't catch that, right? grrr. indeed. previous layer before sk_filter() can leave junk in there. Could this be solved by activating zeroing/sanitizing of this data if there's an active BPF function around that can access that socket? I think this check could only be done in sk_filter() for testing these conditions (unprivileged user + access to cb area), so it would need to happen from outside a native eBPF program. :/ Yes, the kernel (with code running outside of any eBPF program) would guarantee that those data fields are zeroed/sanitized, if there's an eBPF program that is attached to that socket. [...] Also classic BPF would then need to test for it, since a socket filter doesn't really know whether native eBPF is loaded there or a classic-to-eBPF transformed one, and classic never makes use of this. Anyway, it could be done by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification phase, and test it inside sk_filter() if I see it correctly. That could also be done in an unlikely() branch, to keep the cost to the non-eBPF case near zero. Yes, agreed. For the time being, the majority of users are coming from the classic BPF side anyway and the unlikely() could still be changed later on if it should not be the case anymore. The flag and bpf_func would share the same cacheline as well. The reason is that this sanitizing must only be done in the 'top-level' program that is run from sk_filter() _directly_, because a user at any time could decide to put an already loaded eBPF fd into a tail call map. And cb[] is then used to pass args/state around between two programs, thus it cannot be unconditionally cleared from within the program. The association to a socket filter (SO_ATTACH_BPF) happens at a later time after a native eBPF program has already been loaded via bpf(2). So zeroing tends to be very cheap and it could also be beneficial to performance in terms of bringing the cacheline into the CPU cache. But I really don't know the filter code so I'm just handwaving. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/06/2015 09:13 AM, Ingo Molnar wrote: * Alexei Starovoitovwrote: On 10/5/15 3:14 PM, Daniel Borkmann wrote: One scenario that comes to mind ... what happens when there are kernel pointers stored in skb->cb[] (either from the current layer or an old one from a different layer that the skb went through previously, but which did not get overwritten)? Socket filters could read a portion of skb->cb[] also when unprived and leak that out through maps. I think the verifier doesn't catch that, right? grrr. indeed. previous layer before sk_filter() can leave junk in there. Could this be solved by activating zeroing/sanitizing of this data if there's an active BPF function around that can access that socket? I think this check could only be done in sk_filter() for testing these conditions (unprivileged user + access to cb area), so it would need to happen from outside a native eBPF program. :/ Also classic BPF would then need to test for it, since a socket filter doesn't really know whether native eBPF is loaded there or a classic-to-eBPF transformed one, and classic never makes use of this. Anyway, it could be done by adding a bit flag cb_access:1 to the bpf_prog, set it during eBPF verification phase, and test it inside sk_filter() if I see it correctly. The reason is that this sanitizing must only be done in the 'top-level' program that is run from sk_filter() _directly_, because a user at any time could decide to put an already loaded eBPF fd into a tail call map. And cb[] is then used to pass args/state around between two programs, thus it cannot be unconditionally cleared from within the program. The association to a socket filter (SO_ATTACH_BPF) happens at a later time after a native eBPF program has already been loaded via bpf(2). Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/06/2015 02:51 AM, Alexei Starovoitov wrote: On 10/5/15 3:14 PM, Daniel Borkmann wrote: One scenario that comes to mind ... what happens when there are kernel pointers stored in skb->cb[] (either from the current layer or an old one from a different layer that the skb went through previously, but which did not get overwritten)? Socket filters could read a portion of skb->cb[] also when unprived and leak that out through maps. I think the verifier doesn't catch that, right? ... Please keep poking. ;) I'm still wondering whether sysctl_bpf_enable_unprivileged is a good way to go with regards to controlling capabilties of bpf(2), hmm, but don't really have a good idea at the moment. So, the rationale of this is to give it some soaking time before flipping the switch that then defaults to on, and later on to still have a possibility for an admin to turn it off (if not silently overwritten by some system application later on ;)). I think only having a Kconfig doesn't really make sense as distros will blindly turn lots of stuff on anyway. A hidden Kconfig entry that is not exposed into menuconfig might allow for sorting everything out first, but with the issue of getting only little testing exposure. If I see this correctly, perf_event_open(2) has a number of paranoia levels with some helpers wrapped around it, f.e.: /* * perf event paranoia level: * -1 - not paranoid at all * 0 - disallow raw tracepoint access for unpriv * 1 - disallow cpu events for unpriv * 2 - disallow kernel profiling for unpriv */ int sysctl_perf_event_paranoid __read_mostly = 1; Should instead something similar be adapted on bpf(2) as well? Or, would that even be more painful for application developers shipping their stuff through distros in the end (where they might then decide to just setup everything BPF-related and then drop privs)? I'm also wondering with regards to seccomp, which could adapt to eBPF at some point and be used by unprivileged programs. Perhaps then, a single paranoia alike setting might not suit to all eBPF subsystem users. Any ideas? Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next 1/2] bpf: enable non-root eBPF programs
In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Spill/fill of pointers into stack is allowed, but mangling of pointers stored in the stack or reading them byte by byte is not. Within bpf programs the pointers do exist, since programs need to be able to access maps, pass skb pointer to LD_ABS insns, etc but programs cannot pass such pointer values to the outside or obfuscate them. Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, so that socket filters (tcpdump), af_packet (quic acceleration) and future kcm can use it. tracing and tc cls/act program types still require root permissions, since tracing actually needs to be able to see all kernel pointers and tc is for root only. For example, the following unprivileged socket filter program is allowed: int foo(struct __sk_buff *skb) { char fmt[] = "hello %d\n"; bpf_trace_printk(fmt, sizeof(fmt), skb->len); return 0; } but the following program is not: int foo(struct __sk_buff *skb) { char fmt[] = "hello %p\n"; bpf_trace_printk(fmt, sizeof(fmt), fmt); return 0; } since it would leak the kernel stack address via bpf_trace_printk(). Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns - bpf_trace_printk (for debugging) The feature is controlled by sysctl kernel.bpf_enable_unprivileged which is off by default. New tests were added to test_verifier: unpriv: return pointer OK unpriv: add const to pointer OK unpriv: add pointer to pointer OK unpriv: neg pointer OK unpriv: cmp pointer with const OK unpriv: cmp pointer with pointer OK unpriv: pass pointer to printk OK unpriv: pass pointer to helper function OK unpriv: indirectly pass pointer on stack to helper function OK unpriv: mangle pointer on stack 1 OK unpriv: mangle pointer on stack 2 OK unpriv: read pointer from stack in small chunks OK unpriv: write pointer into ctx OK unpriv: write pointer into map elem value OK unpriv: partial copy of pointer OK Signed-off-by: Alexei Starovoitov--- include/linux/bpf.h |3 + kernel/bpf/syscall.c| 11 +- kernel/bpf/verifier.c | 114 +++-- kernel/sysctl.c | 10 ++ kernel/trace/bpf_trace.c|3 + samples/bpf/libbpf.h|8 ++ samples/bpf/test_verifier.c | 239 ++- 7 files changed, 371 insertions(+), 17 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f57d7fed9ec3..acf97d66b681 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -66,6 +66,7 @@ enum bpf_arg_type { ARG_PTR_TO_CTX, /* pointer to context */ ARG_ANYTHING, /* any (initialized) argument is ok */ + ARG_VARARG, /* optional argument */ }; /* type of values returned from helper functions */ @@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog); struct bpf_map *bpf_map_get(struct fd f); void bpf_map_put(struct bpf_map *map); +extern int sysctl_bpf_enable_unprivileged; + /* verify correctness of eBPF program */ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); #else diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index 5f35f420c12f..9a2098da2da9 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -18,6 +18,8 @@ #include #include +int sysctl_bpf_enable_unprivileged __read_mostly; + static LIST_HEAD(bpf_map_types); static struct bpf_map *find_and_alloc_map(union bpf_attr *attr) @@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr) attr->kern_version != LINUX_VERSION_CODE) return -EINVAL; + if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN)) + return -EPERM; + /* plain bpf_prog allocation */ prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER); if (!prog) @@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz union bpf_attr attr = {}; int err; - /* the syscall is limited to root temporarily. This restriction will be -* lifted when security audit is clean. Note that eBPF+tracing must have -* this restriction, since it may pass kernel data to user space -*/ - if (!capable(CAP_SYS_ADMIN)) + if (!capable(CAP_SYS_ADMIN) && !sysctl_bpf_enable_unprivileged)
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitovwrote: > In order to let unprivileged users load and execute eBPF programs > teach verifier to prevent pointer leaks. > Verifier will prevent > - any arithmetic on pointers > (except R10+Imm which is used to compute stack addresses) > - comparison of pointers > - passing pointers to helper functions > - indirectly passing pointers in stack to helper functions > - returning pointer from bpf program > - storing pointers into ctx or maps Does the arithmetic restriction include using a pointer as an index to a maps-based tail call? I'm still worried about pointer-based side-effects. -Kees > > Spill/fill of pointers into stack is allowed, but mangling > of pointers stored in the stack or reading them byte by byte is not. > > Within bpf programs the pointers do exist, since programs need to > be able to access maps, pass skb pointer to LD_ABS insns, etc > but programs cannot pass such pointer values to the outside > or obfuscate them. > > Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, > so that socket filters (tcpdump), af_packet (quic acceleration) > and future kcm can use it. > tracing and tc cls/act program types still require root permissions, > since tracing actually needs to be able to see all kernel pointers > and tc is for root only. > > For example, the following unprivileged socket filter program is allowed: > int foo(struct __sk_buff *skb) > { > char fmt[] = "hello %d\n"; > bpf_trace_printk(fmt, sizeof(fmt), skb->len); > return 0; > } > > but the following program is not: > int foo(struct __sk_buff *skb) > { > char fmt[] = "hello %p\n"; > bpf_trace_printk(fmt, sizeof(fmt), fmt); > return 0; > } > since it would leak the kernel stack address via bpf_trace_printk(). > > Unprivileged socket filter bpf programs have access to the > following helper functions: > - map lookup/update/delete (but they cannot store kernel pointers into them) > - get_random (it's already exposed to unprivileged user space) > - get_smp_processor_id > - tail_call into another socket filter program > - ktime_get_ns > - bpf_trace_printk (for debugging) > > The feature is controlled by sysctl kernel.bpf_enable_unprivileged > which is off by default. > > New tests were added to test_verifier: > unpriv: return pointer OK > unpriv: add const to pointer OK > unpriv: add pointer to pointer OK > unpriv: neg pointer OK > unpriv: cmp pointer with const OK > unpriv: cmp pointer with pointer OK > unpriv: pass pointer to printk OK > unpriv: pass pointer to helper function OK > unpriv: indirectly pass pointer on stack to helper function OK > unpriv: mangle pointer on stack 1 OK > unpriv: mangle pointer on stack 2 OK > unpriv: read pointer from stack in small chunks OK > unpriv: write pointer into ctx OK > unpriv: write pointer into map elem value OK > unpriv: partial copy of pointer OK > > Signed-off-by: Alexei Starovoitov > --- > include/linux/bpf.h |3 + > kernel/bpf/syscall.c| 11 +- > kernel/bpf/verifier.c | 114 +++-- > kernel/sysctl.c | 10 ++ > kernel/trace/bpf_trace.c|3 + > samples/bpf/libbpf.h|8 ++ > samples/bpf/test_verifier.c | 239 > ++- > 7 files changed, 371 insertions(+), 17 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index f57d7fed9ec3..acf97d66b681 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -66,6 +66,7 @@ enum bpf_arg_type { > > ARG_PTR_TO_CTX, /* pointer to context */ > ARG_ANYTHING, /* any (initialized) argument is ok */ > + ARG_VARARG, /* optional argument */ > }; > > /* type of values returned from helper functions */ > @@ -168,6 +169,8 @@ void bpf_prog_put_rcu(struct bpf_prog *prog); > struct bpf_map *bpf_map_get(struct fd f); > void bpf_map_put(struct bpf_map *map); > > +extern int sysctl_bpf_enable_unprivileged; > + > /* verify correctness of eBPF program */ > int bpf_check(struct bpf_prog **fp, union bpf_attr *attr); > #else > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index 5f35f420c12f..9a2098da2da9 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -18,6 +18,8 @@ > #include > #include > > +int sysctl_bpf_enable_unprivileged __read_mostly; > + > static LIST_HEAD(bpf_map_types); > > static struct bpf_map *find_and_alloc_map(union bpf_attr *attr) > @@ -542,6 +544,9 @@ static int bpf_prog_load(union bpf_attr *attr) > attr->kern_version != LINUX_VERSION_CODE) > return -EINVAL; > > + if (type != BPF_PROG_TYPE_SOCKET_FILTER && !capable(CAP_SYS_ADMIN)) > + return -EPERM; > + > /* plain bpf_prog allocation */ > prog = bpf_prog_alloc(bpf_prog_size(attr->insn_cnt), GFP_USER); > if (!prog) > @@ -597,11 +602,7 @@ SYSCALL_DEFINE3(bpf, int, cmd, union
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/5/15 3:14 PM, Daniel Borkmann wrote: One scenario that comes to mind ... what happens when there are kernel pointers stored in skb->cb[] (either from the current layer or an old one from a different layer that the skb went through previously, but which did not get overwritten)? Socket filters could read a portion of skb->cb[] also when unprived and leak that out through maps. I think the verifier doesn't catch that, right? grrr. indeed. previous layer before sk_filter() can leave junk in there. Would need to disable cb[0-5] for unpriv, but that will make tail_call much harder to use, since cb[0-5] is a way to pass arguments from one prog to another and clearing them is not an option, since it's too expensive. Like samples/bpf/sockex3_kern.c usage of cb[0] won't work anymore. I guess that's the price of unpriv. Will fix this, add few tail_call specific tests and respin. Please keep poking. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/5/15 2:00 PM, Kees Cook wrote: On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitovwrote: >In order to let unprivileged users load and execute eBPF programs >teach verifier to prevent pointer leaks. >Verifier will prevent >- any arithmetic on pointers > (except R10+Imm which is used to compute stack addresses) >- comparison of pointers >- passing pointers to helper functions >- indirectly passing pointers in stack to helper functions >- returning pointer from bpf program >- storing pointers into ctx or maps Does the arithmetic restriction include using a pointer as an index to a maps-based tail call? I'm still worried about pointer-based side-effects. the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors from the program side, so programs cannot see or manipulate those pointers. For the former only bpf_tail_call() is allowed that takes integer index and jumps to it. And the latter map accessed with bpf_perf_event_read() that also takes index only (this helper is not available to socket filters anyway). Also bpf_tail_call() can only jump to the program of the same type. So I'm quite certain it's safe. Yes, please ask questions and try to poke holes. Now it is time. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/5/15 2:16 PM, Andy Lutomirski wrote: On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitovwrote: On 10/5/15 2:00 PM, Kees Cook wrote: On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov wrote: In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Does the arithmetic restriction include using a pointer as an index to a maps-based tail call? I'm still worried about pointer-based side-effects. the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors from the program side, so programs cannot see or manipulate those pointers. For the former only bpf_tail_call() is allowed that takes integer index and jumps to it. And the latter map accessed with bpf_perf_event_read() that also takes index only (this helper is not available to socket filters anyway). Also bpf_tail_call() can only jump to the program of the same type. So I'm quite certain it's safe. At some point there will be an unprivileged way to create a map, though, and we don't want to let pointers get poked into the map. yes. exactly. With these two patches non-root can create a map against memlock user limit and have a program store bytes into it (like data from network packet), but it cannot store pointers into it. That's covered by test "unpriv: write pointer into map elem value" I've added new tests for all cases that can 'leak pointer': unpriv: return pointer OK unpriv: add const to pointer OK unpriv: add pointer to pointer OK unpriv: neg pointer OK unpriv: cmp pointer with const OK unpriv: cmp pointer with pointer OK unpriv: pass pointer to printk OK unpriv: pass pointer to helper function OK unpriv: indirectly pass pointer on stack to helper function OK unpriv: mangle pointer on stack 1 OK unpriv: mangle pointer on stack 2 OK unpriv: read pointer from stack in small chunks OK unpriv: write pointer into ctx OK unpriv: write pointer into map elem value OK unpriv: partial copy of pointer OK the most interesting one is 'indirectly pass pointer'. It checks the case where user stores a pointer into a stack and then uses that stack region either as a key for lookup or as part of format string for printk. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitovwrote: > On 10/5/15 2:00 PM, Kees Cook wrote: >> >> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov >> wrote: >>> >>> >In order to let unprivileged users load and execute eBPF programs >>> >teach verifier to prevent pointer leaks. >>> >Verifier will prevent >>> >- any arithmetic on pointers >>> > (except R10+Imm which is used to compute stack addresses) >>> >- comparison of pointers >>> >- passing pointers to helper functions >>> >- indirectly passing pointers in stack to helper functions >>> >- returning pointer from bpf program >>> >- storing pointers into ctx or maps >> >> Does the arithmetic restriction include using a pointer as an index to >> a maps-based tail call? I'm still worried about pointer-based >> side-effects. > > > the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and > BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors > from the program side, so programs cannot see or manipulate > those pointers. > For the former only bpf_tail_call() is allowed that takes integer > index and jumps to it. And the latter map accessed with Okay, so I can't take a pointer, put it on the stack, take it back any part of it as an integer and use it for a tail call? Sounds like this is shaping up nicely! Thanks for adding all these checks. -Kees > bpf_perf_event_read() that also takes index only (this helper > is not available to socket filters anyway). > Also bpf_tail_call() can only jump to the program of the same type. > So I'm quite certain it's safe. > > Yes, please ask questions and try to poke holes. Now it is time. > -- Kees Cook Chrome OS Security -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/05/2015 10:48 PM, Alexei Starovoitov wrote: In order to let unprivileged users load and execute eBPF programs teach verifier to prevent pointer leaks. Verifier will prevent - any arithmetic on pointers (except R10+Imm which is used to compute stack addresses) - comparison of pointers - passing pointers to helper functions - indirectly passing pointers in stack to helper functions - returning pointer from bpf program - storing pointers into ctx or maps Spill/fill of pointers into stack is allowed, but mangling of pointers stored in the stack or reading them byte by byte is not. Within bpf programs the pointers do exist, since programs need to be able to access maps, pass skb pointer to LD_ABS insns, etc but programs cannot pass such pointer values to the outside or obfuscate them. Only allow BPF_PROG_TYPE_SOCKET_FILTER unprivileged programs, so that socket filters (tcpdump), af_packet (quic acceleration) and future kcm can use it. tracing and tc cls/act program types still require root permissions, since tracing actually needs to be able to see all kernel pointers and tc is for root only. For example, the following unprivileged socket filter program is allowed: int foo(struct __sk_buff *skb) { char fmt[] = "hello %d\n"; bpf_trace_printk(fmt, sizeof(fmt), skb->len); return 0; } but the following program is not: int foo(struct __sk_buff *skb) { char fmt[] = "hello %p\n"; bpf_trace_printk(fmt, sizeof(fmt), fmt); return 0; } since it would leak the kernel stack address via bpf_trace_printk(). Unprivileged socket filter bpf programs have access to the following helper functions: - map lookup/update/delete (but they cannot store kernel pointers into them) - get_random (it's already exposed to unprivileged user space) - get_smp_processor_id - tail_call into another socket filter program - ktime_get_ns - bpf_trace_printk (for debugging) The feature is controlled by sysctl kernel.bpf_enable_unprivileged which is off by default. New tests were added to test_verifier: unpriv: return pointer OK unpriv: add const to pointer OK unpriv: add pointer to pointer OK unpriv: neg pointer OK unpriv: cmp pointer with const OK unpriv: cmp pointer with pointer OK unpriv: pass pointer to printk OK unpriv: pass pointer to helper function OK unpriv: indirectly pass pointer on stack to helper function OK unpriv: mangle pointer on stack 1 OK unpriv: mangle pointer on stack 2 OK unpriv: read pointer from stack in small chunks OK unpriv: write pointer into ctx OK unpriv: write pointer into map elem value OK unpriv: partial copy of pointer OK Signed-off-by: Alexei StarovoitovOne scenario that comes to mind ... what happens when there are kernel pointers stored in skb->cb[] (either from the current layer or an old one from a different layer that the skb went through previously, but which did not get overwritten)? Socket filters could read a portion of skb->cb[] also when unprived and leak that out through maps. I think the verifier doesn't catch that, right? Thanks, Daniel -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On Mon, Oct 5, 2015 at 2:12 PM, Alexei Starovoitovwrote: > On 10/5/15 2:00 PM, Kees Cook wrote: >> >> On Mon, Oct 5, 2015 at 1:48 PM, Alexei Starovoitov >> wrote: >>> >>> >In order to let unprivileged users load and execute eBPF programs >>> >teach verifier to prevent pointer leaks. >>> >Verifier will prevent >>> >- any arithmetic on pointers >>> > (except R10+Imm which is used to compute stack addresses) >>> >- comparison of pointers >>> >- passing pointers to helper functions >>> >- indirectly passing pointers in stack to helper functions >>> >- returning pointer from bpf program >>> >- storing pointers into ctx or maps >> >> Does the arithmetic restriction include using a pointer as an index to >> a maps-based tail call? I'm still worried about pointer-based >> side-effects. > > > the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and > BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors > from the program side, so programs cannot see or manipulate > those pointers. > For the former only bpf_tail_call() is allowed that takes integer > index and jumps to it. And the latter map accessed with > bpf_perf_event_read() that also takes index only (this helper > is not available to socket filters anyway). > Also bpf_tail_call() can only jump to the program of the same type. > So I'm quite certain it's safe. At some point there will be an unprivileged way to create a map, though, and we don't want to let pointers get poked into the map. Or am I misunderstanding? --Andy -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 1/2] bpf: enable non-root eBPF programs
On 10/5/15 3:02 PM, Kees Cook wrote: the array maps that hold FDs (BPF_MAP_TYPE_PROG_ARRAY and >BPF_MAP_TYPE_PERF_EVENT_ARRAY) don't have lookup/update accessors >from the program side, so programs cannot see or manipulate >those pointers. >For the former only bpf_tail_call() is allowed that takes integer >index and jumps to it. And the latter map accessed with Okay, so I can't take a pointer, put it on the stack, take it back any part of it as an integer and use it for a tail call? not quite. you can store a pointer to stack and read it as 8 byte load back into another register, but reading <8 byte of it will be rejected. That's the test: unpriv: read pointer from stack in small chunks we obviously want to avoid hiding pointer in integers. After reading it back from stack as a pointer you cannnot use this register to pass as index into bpf_tail_call(). That's the test: unpriv: pass pointer to helper function please keep shooting everything that comes to mind. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html