Re: XDP question: best API for returning/setting egress port?
On Wed, Apr 19, 2017 at 11:42:30PM +0200, Daniel Borkmann wrote: > On 04/19/2017 10:02 PM, Andy Gospodarek wrote: > > On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote: > > > > > > As I argued in NetConf presentation[1] (from slide #9) we need a port > > > mapping table (instead of using ifindex'es). Both for supporting > > > other "port" types than net_devices (think sockets), and for > > > sandboxing what XDP can bypass. > > > > > > I want to create a new XDP action called XDP_REDIRECT, that instruct > > > XDP to send the xdp_buff to another "port" (get translated into a > > > net_device, or something else depending on internal port type). > > > > > > Looking at the userspace/eBPF interface, I'm wondering what is the > > > best API for "returning" this port number from eBPF? > > > > > > The options I see is: > > > > > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the > > > port number and lower-16bit the (existing) action verdict. > > > > > > Pros: Simple API > > > Cons: Number of ports limited to 64K > > > > Practically speaking this may be seem reserving 64k for the port number > > might be enough space, but I would also like to see a new return option > > for flags so I start to get concerned about space. Daniel also > > hightlights the fact that encoding the port in the action may does not > > leave room for flags and could get confusing. > > > > One unfortunate side-effect of dropping or transmitting frames with XDP > > is that we lose the opportunity to statistically sample in netfilter > > since the frames were dropped so early and I'd like to bring that back > > with a call to parse flags and possibly call psample_sample_packet() > > after the xdp action. Packet sampling cannot simply be an action > > since there are times when a frame should be dropped but should also be > > sampled, so it seems logical to add this as a flag. > > Hmm, you can do that already today with bpf_xdp_event_output() helper > and the program decides when to sample and what custom meta data to > prepend along with the (partial or full) xdp packet, see also [1], slide > 11 for the "XDP dump" part. > > No need to change drivers for this, psample_sample_packet() would also > be slower. > > [1] http://netdevconf.org/2.1/slides/apr6/zhou-netdev-xdp-2017.pdf Thanks, Daniel. Not sure how I missed this, but I appreciate the pointer. I'll add coding up a program to demo this to my TODO list.
Re: XDP question: best API for returning/setting egress port?
On Thu, 20 Apr 2017 00:51:31 +0200 Daniel Borkmann wrote: > On 04/19/2017 10:02 PM, Andy Gospodarek wrote: > [...] > > and then lookup this dest in a table we have the option to make that > > dest an ifindex/socket/other. > > > > I did also look at JohnF's patch and I do like the simplicity of the > > redirect > > action and new ndo_xdp_xmit and how it moves towards a way to transmit the > > frame. The downside is that it presumes an ifindex, so it might not be > > ideal > > we want the lookup to return something other than an ifindex. > > > [...] > > would be handled. If we are ultimately going to need a new netdev op to > > handle the redirect then what may be the issue with not providing the > > destination port the return code and the option proposed by JohnF looks > > good to me with maybe a small tweak to not presume ifindex in some manner. > > Is there a concrete reason that all the proposed future cases like sockets > have to be handled within the very same XDP_REDIRECT return code? F.e. why > not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future > ones would get a different return code f.e. XDP_TX_SK only handling sockets > when we get there implementation-wise? The concrete reason for an "unified" return code is to simplify changes needed in the drivers. The purpose is moving code out of the drivers into the core. IMHO we should keep drivers as simple as possible and stop adding more return action codes. XDP_REDIRECT combined with a port-table should be the last action-code needed in the drivers. As the port table will know the "type" of the port. Keeping action XDP_DROP and XDP_TX inside the drivers make sense, because it allow for driver level optimizations. Actions that want to transfer the packet/xdp_buff "outside" the driver should call a core function (given it will be called with the xdp_buff struct, you can hide all your extra extensions there without changing the driver code). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: XDP question: best API for returning/setting egress port?
[...] > JohnF, any test results with this you can share? Presumably you tested with > virtio-net, right? > For my patches using the xdp_redirect with virtio-net showed no measurable difference from running just the straight XDP_TX in the PPS column. However virtio_net XDP_TX was quite low to start with (and I was mostly just looking at functionality so I don't recall if there was a CPU usage hit) so I expect on ixgbe to see a difference. However I have plenty of cpu overhead to consume on the 10Gbps card so it will likely just be a CPU % cost not a PPS cost. I'll bother Alex about the 40Gbps cards ;) Anyways, I'll put it on my list to rerun. But like the xdp generic pieces it wont be for a couple days until I get around to it. Sorry need to take care of a few things first. Thanks, John
Re: XDP question: best API for returning/setting egress port?
On 04/19/2017 10:02 PM, Andy Gospodarek wrote: [...] and then lookup this dest in a table we have the option to make that dest an ifindex/socket/other. I did also look at JohnF's patch and I do like the simplicity of the redirect action and new ndo_xdp_xmit and how it moves towards a way to transmit the frame. The downside is that it presumes an ifindex, so it might not be ideal we want the lookup to return something other than an ifindex. [...] would be handled. If we are ultimately going to need a new netdev op to handle the redirect then what may be the issue with not providing the destination port the return code and the option proposed by JohnF looks good to me with maybe a small tweak to not presume ifindex in some manner. Is there a concrete reason that all the proposed future cases like sockets have to be handled within the very same XDP_REDIRECT return code? F.e. why not XDP_TX_NIC that only assumes ifindex as proposed in the patch, and future ones would get a different return code f.e. XDP_TX_SK only handling sockets when we get there implementation-wise?
Re: XDP question: best API for returning/setting egress port?
On 04/19/2017 10:02 PM, Andy Gospodarek wrote: On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote: As I argued in NetConf presentation[1] (from slide #9) we need a port mapping table (instead of using ifindex'es). Both for supporting other "port" types than net_devices (think sockets), and for sandboxing what XDP can bypass. I want to create a new XDP action called XDP_REDIRECT, that instruct XDP to send the xdp_buff to another "port" (get translated into a net_device, or something else depending on internal port type). Looking at the userspace/eBPF interface, I'm wondering what is the best API for "returning" this port number from eBPF? The options I see is: 1) Split-up the u32 action code, and e.g let the high-16-bit be the port number and lower-16bit the (existing) action verdict. Pros: Simple API Cons: Number of ports limited to 64K Practically speaking this may be seem reserving 64k for the port number might be enough space, but I would also like to see a new return option for flags so I start to get concerned about space. Daniel also hightlights the fact that encoding the port in the action may does not leave room for flags and could get confusing. One unfortunate side-effect of dropping or transmitting frames with XDP is that we lose the opportunity to statistically sample in netfilter since the frames were dropped so early and I'd like to bring that back with a call to parse flags and possibly call psample_sample_packet() after the xdp action. Packet sampling cannot simply be an action since there are times when a frame should be dropped but should also be sampled, so it seems logical to add this as a flag. Hmm, you can do that already today with bpf_xdp_event_output() helper and the program decides when to sample and what custom meta data to prepend along with the (partial or full) xdp packet, see also [1], slide 11 for the "XDP dump" part. No need to change drivers for this, psample_sample_packet() would also be slower. [1] http://netdevconf.org/2.1/slides/apr6/zhou-netdev-xdp-2017.pdf
Re: XDP question: best API for returning/setting egress port?
On Tue, Apr 18, 2017 at 09:58:56PM +0200, Jesper Dangaard Brouer wrote: > > As I argued in NetConf presentation[1] (from slide #9) we need a port > mapping table (instead of using ifindex'es). Both for supporting > other "port" types than net_devices (think sockets), and for > sandboxing what XDP can bypass. > > I want to create a new XDP action called XDP_REDIRECT, that instruct > XDP to send the xdp_buff to another "port" (get translated into a > net_device, or something else depending on internal port type). > > Looking at the userspace/eBPF interface, I'm wondering what is the > best API for "returning" this port number from eBPF? > > The options I see is: > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the >port number and lower-16bit the (existing) action verdict. > > Pros: Simple API > Cons: Number of ports limited to 64K Practically speaking this may be seem reserving 64k for the port number might be enough space, but I would also like to see a new return option for flags so I start to get concerned about space. Daniel also hightlights the fact that encoding the port in the action may does not leave room for flags and could get confusing. One unfortunate side-effect of dropping or transmitting frames with XDP is that we lose the opportunity to statistically sample in netfilter since the frames were dropped so early and I'd like to bring that back with a call to parse flags and possibly call psample_sample_packet() after the xdp action. Packet sampling cannot simply be an action since there are times when a frame should be dropped but should also be sampled, so it seems logical to add this as a flag. > > 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow >eBPF to update xdp_md->port. > > Pros: Larger number of ports. > Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md. >(see xdp_convert_ctx_access) I think I would lean towards this based on the fact that I'd like to see a flags field added to the u32 return (maybe the top 8 bits) as mentioned above. So if we follow down this path and add a 'dest' field to xdp_buff and xdp_md like this: struct xdp_buff { void *data; void *data_end; void *data_hard_start; __u32 dest; }; struct xdp_md { __u32 data; __u32 data_end; __u32 dest; }; and then lookup this dest in a table we have the option to make that dest an ifindex/socket/other. I did also look at JohnF's patch and I do like the simplicity of the redirect action and new ndo_xdp_xmit and how it moves towards a way to transmit the frame. The downside is that it presumes an ifindex, so it might not be ideal we want the lookup to return something other than an ifindex. Before aligning on a direction for the return values from exiting xdp call, it seems like we should also think about the tx side and how that would be handled. If we are ultimately going to need a new netdev op to handle the redirect then what may be the issue with not providing the destination port the return code and the option proposed by JohnF looks good to me with maybe a small tweak to not presume ifindex in some manner. JohnF, any test results with this you can share? Presumably you tested with virtio-net, right? > > 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. > > Pros: Hides impl details, and allows helper to give eBPF code feedback >(on e.g. if port doesn't exist any longer) > Cons: Helper function call likely slower? > > > (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?) > -- > Best regards, > Jesper Dangaard Brouer > MSc.CS, Principal Kernel Engineer at Red Hat > LinkedIn: http://www.linkedin.com/in/brouer > > [1] > http://people.netfilter.org/hawk/presentations/NetConf2017/xdp_work_ahead_NetConf_April_2017.pdf
Re: XDP question: best API for returning/setting egress port?
On Wed, 19 Apr 2017 14:33:27 +0200 Daniel Borkmann wrote: > On 04/19/2017 02:00 PM, Jesper Dangaard Brouer wrote: > > On Tue, 18 Apr 2017 13:54:45 -0700 > > John Fastabend wrote: > >> On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote: > >>> > >>> As I argued in NetConf presentation[1] (from slide #9) we need a port > >>> mapping table (instead of using ifindex'es). Both for supporting > >>> other "port" types than net_devices (think sockets), and for > >>> sandboxing what XDP can bypass. > >>> > >>> I want to create a new XDP action called XDP_REDIRECT, that instruct > >>> XDP to send the xdp_buff to another "port" (get translated into a > >>> net_device, or something else depending on internal port type). > >>> > >>> Looking at the userspace/eBPF interface, I'm wondering what is the > >>> best API for "returning" this port number from eBPF? > >>> > >>> The options I see is: > >>> > >>> 1) Split-up the u32 action code, and e.g let the high-16-bit be the > >>> port number and lower-16bit the (existing) action verdict. > >>> > >>> Pros: Simple API > >>> Cons: Number of ports limited to 64K > >>> > >>> 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow > >>> eBPF to update xdp_md->port. > >>> > >>> Pros: Larger number of ports. > >>> Cons: This require some ebpf translation steps between xdp_buff <-> > >>> xdp_md. > >>> (see xdp_convert_ctx_access) > >>> > >>> 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. > >>> > >>> Pros: Hides impl details, and allows helper to give eBPF code feedback > >>> (on e.g. if port doesn't exist any longer) > >>> Cons: Helper function call likely slower? > >> > >> How about doing this the same way redirect is done in the tc case? I have > >> this > >> patch under test, > >> > >> > >> https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee > >> > > > > I have been looking at this approach, which is close to option #3 above. > > > > The problem with your implementation that you use a per-cpu store. > > This creates the problem of storing state between packets. First packet > > can call helper bpf_xdp_redirect() setting an ifindex, but program can > > still return XDP_PASS. Next packet can call XDP_REDIRECT and use the > > ifindex set from the first packet. IMHO this is a problematic API to > > expose. > > > > I do see that the TC interface that uses the same approach, via helper > > bpf_redirect(). Maybe it have the same API problem? Looking at > > sch_handle_ingress() I don't see this is handled (e.g. by always > > clearing this_cpu_ptr(redirect_info)->ifindex = 0). > > It's cleared in {skb,xdp}_do_redirect() right after fetching the > ifindex. I think this approach is just fine. The example described > above is a misuse of the API by a buggy program calling bpf_xdp_redirect() > and returning XDP_PASS while another time it returns XDP_REDIRECT > without the bpf_xdp_redirect() helper, sounds very exotic, but it's > as buggy as, say, a program doing the csum update wrong, a program > writing the wrong data to the packet, doing adjust head on the wrong > header offset, jumping into the wrong tail call entry and other things. For TC I guess it is fine to keep it as is, because it is needed to avoid extending skb. IHMO for XDP I see no reason to keep a per-cpu-store (which besides will be slower), simply update xdp_buff.port should be sufficient (which is only relevant for this packet). As noted in option#3, my concern is that calling a helper function call will be slower, than simply returning the needed port info? Maybe some bpf experts can tell me if such helper call could be optimized out with some bpf magic? > I think encoding this into an action code is rather limiting, f.e. > where would we place a flags argument if needed in future? Would > that mean, we need a XDP_REDIRECT2 return code that also allows for > encoding flags? Nope, it will be extensible. We can start with: struct xdp_ret { union { __u32 act; struct { __u16 action; __u16 port; }; }; And later change it to: struct xdp_ret { union { __u32 act; struct { __u8 action; __u8 flags; __u16 port; }; }; If actions does not go above 255. I would prefer that we start with the latter, else people would argue that we need to extend the structure like: struct xdp_ret { union { __u32 act; struct { union { __u16 action; struct { __u8 action2; __u8 flags; }; }; __u16 port; }; }; -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: XDP question: best API for returning/setting egress port?
On 04/19/2017 02:00 PM, Jesper Dangaard Brouer wrote: On Tue, 18 Apr 2017 13:54:45 -0700 John Fastabend wrote: On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote: As I argued in NetConf presentation[1] (from slide #9) we need a port mapping table (instead of using ifindex'es). Both for supporting other "port" types than net_devices (think sockets), and for sandboxing what XDP can bypass. I want to create a new XDP action called XDP_REDIRECT, that instruct XDP to send the xdp_buff to another "port" (get translated into a net_device, or something else depending on internal port type). Looking at the userspace/eBPF interface, I'm wondering what is the best API for "returning" this port number from eBPF? The options I see is: 1) Split-up the u32 action code, and e.g let the high-16-bit be the port number and lower-16bit the (existing) action verdict. Pros: Simple API Cons: Number of ports limited to 64K 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow eBPF to update xdp_md->port. Pros: Larger number of ports. Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md. (see xdp_convert_ctx_access) 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. Pros: Hides impl details, and allows helper to give eBPF code feedback (on e.g. if port doesn't exist any longer) Cons: Helper function call likely slower? How about doing this the same way redirect is done in the tc case? I have this patch under test, https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee I have been looking at this approach, which is close to option #3 above. The problem with your implementation that you use a per-cpu store. This creates the problem of storing state between packets. First packet can call helper bpf_xdp_redirect() setting an ifindex, but program can still return XDP_PASS. Next packet can call XDP_REDIRECT and use the ifindex set from the first packet. IMHO this is a problematic API to expose. I do see that the TC interface that uses the same approach, via helper bpf_redirect(). Maybe it have the same API problem? Looking at sch_handle_ingress() I don't see this is handled (e.g. by always clearing this_cpu_ptr(redirect_info)->ifindex = 0). It's cleared in {skb,xdp}_do_redirect() right after fetching the ifindex. I think this approach is just fine. The example described above is a misuse of the API by a buggy program calling bpf_xdp_redirect() and returning XDP_PASS while another time it returns XDP_REDIRECT without the bpf_xdp_redirect() helper, sounds very exotic, but it's as buggy as, say, a program doing the csum update wrong, a program writing the wrong data to the packet, doing adjust head on the wrong header offset, jumping into the wrong tail call entry and other things. I think encoding this into an action code is rather limiting, f.e. where would we place a flags argument if needed in future? Would that mean, we need a XDP_REDIRECT2 return code that also allows for encoding flags? Thanks, Daniel
Re: XDP question: best API for returning/setting egress port?
Hi, On 18.04.2017 21:58, Jesper Dangaard Brouer wrote: > > As I argued in NetConf presentation[1] (from slide #9) we need a port > mapping table (instead of using ifindex'es). Both for supporting > other "port" types than net_devices (think sockets), and for > sandboxing what XDP can bypass. > > I want to create a new XDP action called XDP_REDIRECT, that instruct > XDP to send the xdp_buff to another "port" (get translated into a > net_device, or something else depending on internal port type). > > Looking at the userspace/eBPF interface, I'm wondering what is the > best API for "returning" this port number from eBPF? > > The options I see is: > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the >port number and lower-16bit the (existing) action verdict. > > Pros: Simple API > Cons: Number of ports limited to 64K > > 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow >eBPF to update xdp_md->port. > > Pros: Larger number of ports. > Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md. >(see xdp_convert_ctx_access) > > 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. > > Pros: Hides impl details, and allows helper to give eBPF code feedback >(on e.g. if port doesn't exist any longer) > Cons: Helper function call likely slower? > > > (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?) I am not sure how the socket interface should look like, it seems to be too far away to me right now. Regarding having stable ifindexes, I wonder if we could do something: int ifindexes_in_use_by_ebpf_program[] __section("ifindex") = { 1,2,3,8,9,10 }; and we can make sure that the ifindexes automatically stay stable for the lifetime while the ebpf program is loaded? Bye, Hannes
Re: XDP question: best API for returning/setting egress port?
On Tue, 18 Apr 2017 13:54:45 -0700 John Fastabend wrote: > On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote: > > > > As I argued in NetConf presentation[1] (from slide #9) we need a port > > mapping table (instead of using ifindex'es). Both for supporting > > other "port" types than net_devices (think sockets), and for > > sandboxing what XDP can bypass. > > > > I want to create a new XDP action called XDP_REDIRECT, that instruct > > XDP to send the xdp_buff to another "port" (get translated into a > > net_device, or something else depending on internal port type). > > > > Looking at the userspace/eBPF interface, I'm wondering what is the > > best API for "returning" this port number from eBPF? > > > > The options I see is: > > > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the > >port number and lower-16bit the (existing) action verdict. > > > > Pros: Simple API > > Cons: Number of ports limited to 64K > > > > 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow > >eBPF to update xdp_md->port. > > > > Pros: Larger number of ports. > > Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md. > >(see xdp_convert_ctx_access) > > > > 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. > > > > Pros: Hides impl details, and allows helper to give eBPF code feedback > >(on e.g. if port doesn't exist any longer) > > Cons: Helper function call likely slower? > > > > > > How about doing this the same way redirect is done in the tc case? I have this > patch under test, > > > https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee I have been looking at this approach, which is close to option #3 above. The problem with your implementation that you use a per-cpu store. This creates the problem of storing state between packets. First packet can call helper bpf_xdp_redirect() setting an ifindex, but program can still return XDP_PASS. Next packet can call XDP_REDIRECT and use the ifindex set from the first packet. IMHO this is a problematic API to expose. I do see that the TC interface that uses the same approach, via helper bpf_redirect(). Maybe it have the same API problem? Looking at sch_handle_ingress() I don't see this is handled (e.g. by always clearing this_cpu_ptr(redirect_info)->ifindex = 0). > that should give you some idea. It just needs a port mapping table in the > bpf_tx_xdp() call. I'll take a closer look. I don't think we need the per-cpu-store approach for XDP, as we might as well store the port info in xdp_buff, or return it directly option #1. (TC redirect need the per-cpu-store to avoid extending the SKB). > > (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?) I would still like people to comment on the above options? -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: XDP question: best API for returning/setting egress port?
On 17-04-18 12:58 PM, Jesper Dangaard Brouer wrote: > > As I argued in NetConf presentation[1] (from slide #9) we need a port > mapping table (instead of using ifindex'es). Both for supporting > other "port" types than net_devices (think sockets), and for > sandboxing what XDP can bypass. > > I want to create a new XDP action called XDP_REDIRECT, that instruct > XDP to send the xdp_buff to another "port" (get translated into a > net_device, or something else depending on internal port type). > > Looking at the userspace/eBPF interface, I'm wondering what is the > best API for "returning" this port number from eBPF? > > The options I see is: > > 1) Split-up the u32 action code, and e.g let the high-16-bit be the >port number and lower-16bit the (existing) action verdict. > > Pros: Simple API > Cons: Number of ports limited to 64K > > 2) Extend both xdp_buff + xdp_md to contain a (u32) port number, allow >eBPF to update xdp_md->port. > > Pros: Larger number of ports. > Cons: This require some ebpf translation steps between xdp_buff <-> xdp_md. >(see xdp_convert_ctx_access) > > 3) Extend only xdp_buff and create bpf_helper that set port in xdp_buff. > > Pros: Hides impl details, and allows helper to give eBPF code feedback >(on e.g. if port doesn't exist any longer) > Cons: Helper function call likely slower? > > How about doing this the same way redirect is done in the tc case? I have this patch under test, https://github.com/jrfastab/linux/commit/e78f5425d5e3c305b4170ddd85c61c2e15359fee that should give you some idea. It just needs a port mapping table in the bpf_tx_xdp() call. > (Cc'ed xdp-newbies as end-users might have an opinion on UAPI?) >