Re: Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
From: Andy Gospodarek Date: Tue, 25 Apr 2017 13:25:49 -0400 > On Mon, Apr 24, 2017 at 06:26:43PM -0400, David Miller wrote: >> Andy, how goes it? :) > > Sorry for the delayed response. I was AFK yesterday, but based on > testing from Friday and what I wrapped up today all looks good to me. > > On my system (i7-6700 CPU @ 3.40GHz) the reported and actual TX > throughput for xdp_tx_iptunnel is 4.6Mpps for the optimized XDP. > > For generic XDP the reported throughput of xdp_tx_iptunnel is 4.6Mpps > but only ~880kpps actually on the wire. It seems to me that can be > fixed with a follow-up for offending drivers or the stack if deemed that > there is a real error there. Ok, I'll commit the latest version with tested-by tags for you, Jesper, and David Ahern added. Thanks everyone.
Re: Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
On Mon, Apr 24, 2017 at 06:26:43PM -0400, David Miller wrote: > From: Jesper Dangaard Brouer > Date: Mon, 24 Apr 2017 16:24:05 +0200 > > > I've done a very detailed evaluation of this patch, and I've created a > > blogpost like report here: > > > > > > https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html > > Thanks for doing this Jesper. Yes, this is excellent. I'm not all the way thru it, but I looked at the data and corroborate the results you are seeing. My results for both optimized and generic XDP for xdp_bench01_mem_access_cost --action XDP_DROP --readmem are quite similar to yours (11.7Mpps and 7.8Mpps, respectively for me 11.7Mpps and 8.4Mpps for you). I also noted (as you did) that there is no discernible difference running xdp_bench01_mem_access_cost with or without the --readmem option since the packet data is already being accessed that late it the stack. > > > I didn't evaluate the adjust_head part, so I hope Andy is still > > planning to validate that part? > > I was hoping he would post some results today as well. > > Andy, how goes it? :) Sorry for the delayed response. I was AFK yesterday, but based on testing from Friday and what I wrapped up today all looks good to me. On my system (i7-6700 CPU @ 3.40GHz) the reported and actual TX throughput for xdp_tx_iptunnel is 4.6Mpps for the optimized XDP. For generic XDP the reported throughput of xdp_tx_iptunnel is 4.6Mpps but only ~880kpps actually on the wire. It seems to me that can be fixed with a follow-up for offending drivers or the stack if deemed that there is a real error there. > Once the basic patch is ready and integrated in we can try to do > xmit_more in generic XDP and see what that does for XDP_TX > performance. Agreed.
Re: Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
On Mon, 24 Apr 2017 18:26:43 -0400 (EDT) David Miller wrote: > From: Jesper Dangaard Brouer > Date: Mon, 24 Apr 2017 16:24:05 +0200 > > > I've done a very detailed evaluation of this patch, and I've created a > > blogpost like report here: > > > > > > https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html > > > > Thanks for doing this Jesper. > > > I didn't evaluate the adjust_head part, so I hope Andy is still > > planning to validate that part? > > I was hoping he would post some results today as well. > > Andy, how goes it? :) > > Once the basic patch is ready and integrated in we can try to do > xmit_more in generic XDP and see what that does for XDP_TX > performance. I agree, we can do xmit_more for generic-XDP later, and it should not be that hard... basically replacing netdev_start_xmit() with dev_hard_start_xmit() in generic_xdp_tx(), and finding some place to store a XDP-skb-pointer (functioning as the skb-list) that will be "flushed" like __kfree_skb_flush(). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
From: Jesper Dangaard Brouer Date: Mon, 24 Apr 2017 16:24:05 +0200 > I've done a very detailed evaluation of this patch, and I've created a > blogpost like report here: > > > https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html Thanks for doing this Jesper. > I didn't evaluate the adjust_head part, so I hope Andy is still > planning to validate that part? I was hoping he would post some results today as well. Andy, how goes it? :) Once the basic patch is ready and integrated in we can try to do xmit_more in generic XDP and see what that does for XDP_TX performance.
Blogpost evaluation this [PATCH v4 net-next RFC] net: Generic XDP
I've done a very detailed evaluation of this patch, and I've created a blogpost like report here: https://prototype-kernel.readthedocs.io/en/latest/blogposts/xdp25_eval_generic_xdp_tx.html I didn't evaluate the adjust_head part, so I hope Andy is still planning to validate that part? --Jesper On Thu, 13 Apr 2017 12:09:25 -0400 (EDT) David Miller wrote: > This provides a generic SKB based non-optimized XDP path which is used > if either the driver lacks a specific XDP implementation, or the user > requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. > > It is arguable that perhaps I should have required something like > this as part of the initial XDP feature merge. > > I believe this is critical for two reasons: > > 1) Accessibility. More people can play with XDP with less >dependencies. Yes I know we have XDP support in virtio_net, but >that just creates another depedency for learning how to use this >facility. > >I wrote this to make life easier for the XDP newbies. > > 2) As a model for what the expected semantics are. If there is a pure >generic core implementation, it serves as a semantic example for >driver folks adding XDP support. > > This is just a rough draft and is untested. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, 20 Apr 2017 16:30:34 +0200 Jesper Dangaard Brouer wrote: > On Wed, 19 Apr 2017 10:29:03 -0400 > Andy Gospodarek wrote: > > > I ran this on top of a card that uses the bnxt_en driver on a desktop > > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of > > UDP traffic with flow control disabled and saw the following (all stats > > in Million PPS). > > > > xdp1xdp2xdp_tx_tunnel > > Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) > > Optimized XDP 11.7 9.7 4.6 > > > > One thing to note is that the Generic XDP case shows some different > > results for reported by the application vs actual (seen on the wire). I > > did not debug where the drops are happening and what counter needs to be > > incremented to note this -- I'll add that to my TODO list. The > > Optimized XDP case does not have a difference in reported vs actual > > frames on the wire. > > The reported application vs actual (seen on the wire) number sound scary. > How do you evaluate/measure "seen on the wire"? > > Perhaps you could use ethtool -S stats to see if anything is fishy? > I recommend using my tool[1] like: > > ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2 > > [1] > https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl > > I'm evaluating this patch on a mlx5 NIC, and something is not right... > I'm seeing: > > Ethtool(mlx5p2) stat: 349599 (349,599) <= tx_multicast_phy /sec > Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= tx_packets /sec > Ethtool(mlx5p2) stat: 349596 (349,596) <= tx_packets_phy /sec > [...] > Ethtool(mlx5p2) stat: 36898 ( 36,898) <= rx_cache_busy /sec > Ethtool(mlx5p2) stat: 36898 ( 36,898) <= rx_cache_full /sec > Ethtool(mlx5p2) stat:4903287 ( 4,903,287) <= rx_cache_reuse /sec > Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= rx_csum_complete /sec > Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= rx_packets /sec > > Something is wrong... when I tcpdump on the generator machine, I see > garbled packets with IPv6 multicast addresses. > > And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire". > Not seeing packets on the TX wire was caused by the NIC HW dropping the packets, because the ethernet MAC-addr were not changed/swapped. Fixed this XDP_TX bug in my test program xdp_bench01_mem_access_cost. https://github.com/netoptimizer/prototype-kernel/commit/85f7ba2f0ea2 Even added a new option --swapmac for creating another test option for modifying the packet. https://github.com/netoptimizer/prototype-kernel/commit/fe080e6f3ccf I will shortly publish a full report of testing this patch. -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Wed, Apr 19, 2017 at 09:40:49PM -0400, David Miller wrote: > From: Andy Gospodarek > Date: Wed, 19 Apr 2017 10:29:03 -0400 > > > So I tried a variety of things and the simplest change on top of yours that > > works well for xdp1, xdp2, and xdp_tx_iptunnel. > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index b3d3a6e..1bab3dc 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff > > *skb, > > > > off = xdp.data - orig_data; > > if (off) > > - __skb_push(skb, off); > > + __skb_push(skb, -off); > > We have to handle both pushing and popping headers, so could you > please test the snippet I asked you to try? > I will tomorrow or by Monday of next week. I'm also going to hack^W write a quick test app to exercise it as well. > > if (off > 0) > > __skb_pull(skb, off); > > else if (off < 0) > > __skb_push(skb, -off); > > Thanks.
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Wed, 19 Apr 2017 10:29:03 -0400 Andy Gospodarek wrote: > I ran this on top of a card that uses the bnxt_en driver on a desktop > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of > UDP traffic with flow control disabled and saw the following (all stats > in Million PPS). > > xdp1xdp2xdp_tx_tunnel > Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) > Optimized XDP 11.7 9.7 4.6 > > One thing to note is that the Generic XDP case shows some different > results for reported by the application vs actual (seen on the wire). I > did not debug where the drops are happening and what counter needs to be > incremented to note this -- I'll add that to my TODO list. The > Optimized XDP case does not have a difference in reported vs actual > frames on the wire. The reported application vs actual (seen on the wire) number sound scary. How do you evaluate/measure "seen on the wire"? Perhaps you could use ethtool -S stats to see if anything is fishy? I recommend using my tool[1] like: ~/git/network-testing/bin/ethtool_stats.pl --dev mlx5p2 --sec 2 [1] https://github.com/netoptimizer/network-testing/blob/master/bin/ethtool_stats.pl I'm evaluating this patch on a mlx5 NIC, and something is not right... I'm seeing: Ethtool(mlx5p2) stat: 349599 (349,599) <= tx_multicast_phy /sec Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= tx_packets /sec Ethtool(mlx5p2) stat: 349596 (349,596) <= tx_packets_phy /sec [...] Ethtool(mlx5p2) stat: 36898 ( 36,898) <= rx_cache_busy /sec Ethtool(mlx5p2) stat: 36898 ( 36,898) <= rx_cache_full /sec Ethtool(mlx5p2) stat:4903287 ( 4,903,287) <= rx_cache_reuse /sec Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= rx_csum_complete /sec Ethtool(mlx5p2) stat:4940185 ( 4,940,185) <= rx_packets /sec Something is wrong... when I tcpdump on the generator machine, I see garbled packets with IPv6 multicast addresses. And it looks like I'm only sending 349,596 tx_packets_phy/sec on the "wire". -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Andy Gospodarek Date: Wed, 19 Apr 2017 10:29:03 -0400 > So I tried a variety of things and the simplest change on top of yours that > works well for xdp1, xdp2, and xdp_tx_iptunnel. > > diff --git a/net/core/dev.c b/net/core/dev.c > index b3d3a6e..1bab3dc 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff > *skb, > > off = xdp.data - orig_data; > if (off) > - __skb_push(skb, off); > + __skb_push(skb, -off); We have to handle both pushing and popping headers, so could you please test the snippet I asked you to try? > if (off > 0) > __skb_pull(skb, off); > else if (off < 0) > __skb_push(skb, -off); Thanks.
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Wed, Apr 19, 2017 at 04:25:43PM -0400, Andy Gospodarek wrote: > On Wed, Apr 19, 2017 at 10:44:59AM -0700, John Fastabend wrote: > > On 17-04-19 10:17 AM, Alexei Starovoitov wrote: > > > On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote: > > >> > > >> I ran this on top of a card that uses the bnxt_en driver on a desktop > > >> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of > > >> UDP traffic with flow control disabled and saw the following (all stats > > >> in Million PPS). > > >> > > >> xdp1xdp2xdp_tx_tunnel > > >> Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) > > >> Optimized XDP 11.7 9.7 4.6 > > > > > > Nice! Thanks for testing. > > > > > >> One thing to note is that the Generic XDP case shows some different > > >> results for reported by the application vs actual (seen on the wire). I > > >> did not debug where the drops are happening and what counter needs to be > > >> incremented to note this -- I'll add that to my TODO list. The > > >> Optimized XDP case does not have a difference in reported vs actual > > >> frames on the wire. > > > > > > The missed packets are probably due to xmit queue being full. > > > We need 'xdp_tx_full' counter in: > > > + if (free_skb) { > > > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > > > + kfree_skb(skb); > > > + } > > > like in-driver xdp does. > > > It's surprising that tx becomes full so often. May be bnxt specific > > > behavior? > > > > hmm as a data point I get better numbers than 1.3Mpps running through the > > qdisc > > layer with pktgen so seems like something is wrong with the driver perhaps? > > If > > I get ~6.5Mpps on a single core with pktgen, so inconclusive for now may be your tx queue is simply smaller than rx queue?
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Wed, Apr 19, 2017 at 10:44:59AM -0700, John Fastabend wrote: > On 17-04-19 10:17 AM, Alexei Starovoitov wrote: > > On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote: > >> > >> I ran this on top of a card that uses the bnxt_en driver on a desktop > >> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of > >> UDP traffic with flow control disabled and saw the following (all stats > >> in Million PPS). > >> > >> xdp1xdp2xdp_tx_tunnel > >> Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) > >> Optimized XDP 11.79.7 4.6 > > > > Nice! Thanks for testing. > > > >> One thing to note is that the Generic XDP case shows some different > >> results for reported by the application vs actual (seen on the wire). I > >> did not debug where the drops are happening and what counter needs to be > >> incremented to note this -- I'll add that to my TODO list. The > >> Optimized XDP case does not have a difference in reported vs actual > >> frames on the wire. > > > > The missed packets are probably due to xmit queue being full. > > We need 'xdp_tx_full' counter in: > > + if (free_skb) { > > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > > + kfree_skb(skb); > > + } > > like in-driver xdp does. > > It's surprising that tx becomes full so often. May be bnxt specific > > behavior? > > hmm as a data point I get better numbers than 1.3Mpps running through the > qdisc > layer with pktgen so seems like something is wrong with the driver perhaps? If I get ~6.5Mpps on a single core with pktgen, so inconclusive for now > I get a chance I'll take a look with my setup here, although it likely wont be > until the weekend. I don't think it needs to slow down dropping the RFC tag > and getting the patch applied though. > > > > >> I agree with all those who have asserted that this is great tool for > >> those that want to get started with XDP but do not have hardware, so I'd > >> say it's ready to have the 'RFC' tag dropped. Thanks for pushing this > >> forward, Dave! :-) > > > > +1 > > > > >
Re: [PATCH v4 net-next RFC] net: Generic XDP
On 17-04-19 10:17 AM, Alexei Starovoitov wrote: > On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote: >> >> I ran this on top of a card that uses the bnxt_en driver on a desktop >> class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of >> UDP traffic with flow control disabled and saw the following (all stats >> in Million PPS). >> >> xdp1xdp2xdp_tx_tunnel >> Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) >> Optimized XDP 11.7 9.7 4.6 > > Nice! Thanks for testing. > >> One thing to note is that the Generic XDP case shows some different >> results for reported by the application vs actual (seen on the wire). I >> did not debug where the drops are happening and what counter needs to be >> incremented to note this -- I'll add that to my TODO list. The >> Optimized XDP case does not have a difference in reported vs actual >> frames on the wire. > > The missed packets are probably due to xmit queue being full. > We need 'xdp_tx_full' counter in: > + if (free_skb) { > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > + kfree_skb(skb); > + } > like in-driver xdp does. > It's surprising that tx becomes full so often. May be bnxt specific behavior? hmm as a data point I get better numbers than 1.3Mpps running through the qdisc layer with pktgen so seems like something is wrong with the driver perhaps? If I get a chance I'll take a look with my setup here, although it likely wont be until the weekend. I don't think it needs to slow down dropping the RFC tag and getting the patch applied though. > >> I agree with all those who have asserted that this is great tool for >> those that want to get started with XDP but do not have hardware, so I'd >> say it's ready to have the 'RFC' tag dropped. Thanks for pushing this >> forward, Dave! :-) > > +1 > >
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Wed, Apr 19, 2017 at 10:29:03AM -0400, Andy Gospodarek wrote: > > I ran this on top of a card that uses the bnxt_en driver on a desktop > class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of > UDP traffic with flow control disabled and saw the following (all stats > in Million PPS). > > xdp1xdp2xdp_tx_tunnel > Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) > Optimized XDP 11.7 9.7 4.6 Nice! Thanks for testing. > One thing to note is that the Generic XDP case shows some different > results for reported by the application vs actual (seen on the wire). I > did not debug where the drops are happening and what counter needs to be > incremented to note this -- I'll add that to my TODO list. The > Optimized XDP case does not have a difference in reported vs actual > frames on the wire. The missed packets are probably due to xmit queue being full. We need 'xdp_tx_full' counter in: + if (free_skb) { + trace_xdp_exception(dev, xdp_prog, XDP_TX); + kfree_skb(skb); + } like in-driver xdp does. It's surprising that tx becomes full so often. May be bnxt specific behavior? > I agree with all those who have asserted that this is great tool for > those that want to get started with XDP but do not have hardware, so I'd > say it's ready to have the 'RFC' tag dropped. Thanks for pushing this > forward, Dave! :-) +1
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Tue, Apr 18, 2017 at 03:29:16PM -0400, David Miller wrote: > From: David Miller > Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT) > > > From: Andy Gospodarek > > Date: Tue, 18 Apr 2017 15:05:35 -0400 > > > >> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: > >>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > >>> > + > >>> > + switch (act) { > >>> > + case XDP_TX: > >>> > + __skb_push(skb, skb->mac_len); > >>> > >>> s/skb->mac_len/mac_len/ > >>> > >> > >> I was away from my keyboard for a few days, but was able to get some > >> time to test this today. > >> > >> When using this change above suggested by Alexei, XDP_DROP and XDP_TX > >> actions appear to work well with xdp1 and xdp2. > >> > >> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be > >> good to hold off on committing this just yet. > >> > >> At first glance, it looks like there is enough headroom for the new > >> frame, but the resulting packet data do not look right and I'm actually > >> seeing some data that may be left on the stack from a previous caller. > > > > Thanks for testing Andy, I'll take a look and see if I can figure it out. > > Andy, I think we might be getting burnt by signedness issues in the > offset handling when the XDP program adjusts the packet data pointer. > > In netif_receive_generic_xdp(), try changing the offset handling code to > read something like: > > off = xdp.data - orig_data; > if (off > 0) > __skb_pull(skb, off); > else if (off < 0) > __skb_push(skb, -off); > > If that doesn't work try adding: > > __skb_cow(skb, XDP_PACKET_HEADROOM, 0); > > right after the skb_linearize() call in that same function. So I tried a variety of things and the simplest change on top of yours that works well for xdp1, xdp2, and xdp_tx_iptunnel. diff --git a/net/core/dev.c b/net/core/dev.c index b3d3a6e..1bab3dc 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4316,11 +4316,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb, off = xdp.data - orig_data; if (off) - __skb_push(skb, off); + __skb_push(skb, -off); switch (act) { case XDP_TX: - __skb_push(skb, skb->mac_len); + __skb_push(skb, mac_len); /* fall through */ case XDP_PASS: break; I ran this on top of a card that uses the bnxt_en driver on a desktop class system with an i7-6700 CPU @ 3.40GHz, sending a single stream of UDP traffic with flow control disabled and saw the following (all stats in Million PPS). xdp1xdp2xdp_tx_tunnel Generic XDP 7.85.5 (1.3 actual) 4.6 (1.1 actual) Optimized XDP 11.7 9.7 4.6 One thing to note is that the Generic XDP case shows some different results for reported by the application vs actual (seen on the wire). I did not debug where the drops are happening and what counter needs to be incremented to note this -- I'll add that to my TODO list. The Optimized XDP case does not have a difference in reported vs actual frames on the wire. I agree with all those who have asserted that this is great tool for those that want to get started with XDP but do not have hardware, so I'd say it's ready to have the 'RFC' tag dropped. Thanks for pushing this forward, Dave! :-)
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Tue, 18 Apr 2017 15:05:35 -0400 Andy Gospodarek wrote: > On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: > > On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > > > + > > > + switch (act) { > > > + case XDP_TX: > > > + __skb_push(skb, skb->mac_len); > > > > s/skb->mac_len/mac_len/ > > > [...] > > When using this change above suggested by Alexei, XDP_DROP and XDP_TX > actions appear to work well with xdp1 and xdp2. Also adjusted patch accordingly. Ran a few quick tests today, but just against an really old e1000 NIC attached to a PCI-bus (32bit). There were not difference between DROP and PASS, but this is likely due to a NIC HW limit. Sender were sending 951,146 pps (<= tx_queue_0_packets /sec) $ sudo ./xdp_bench01_mem_access_cost --readmem --action XDP_DROP --dev e1000 XDP_action ppspps-human-readable mem XDP_DROP 671975 671,975read XDP_DROP 671997 671,997read XDP_DROP 672061 672,061read XDP_DROP 671861 671,861read ^CInterrupted: Removing XDP program on ifindex:2 device:e1000 $ sudo ./xdp_bench01_mem_access_cost --readmem --action XDP_PASS --dev e1000 XDP_action ppspps-human-readable mem XDP_PASS 672032 672,032read XDP_PASS 671910 671,910read XDP_PASS 671926 671,926read XDP_PASS 671947 671,947read ^CInterrupted: Removing XDP program on ifindex:2 device:e1000 Program xdp_bench01_mem_access_cost avail here: https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/samples/bpf -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Tue, Apr 18, 2017 at 03:29:16PM -0400, David Miller wrote: > From: David Miller > Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT) > > > From: Andy Gospodarek > > Date: Tue, 18 Apr 2017 15:05:35 -0400 > > > >> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: > >>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > >>> > + > >>> > + switch (act) { > >>> > + case XDP_TX: > >>> > + __skb_push(skb, skb->mac_len); > >>> > >>> s/skb->mac_len/mac_len/ > >>> > >> > >> I was away from my keyboard for a few days, but was able to get some > >> time to test this today. > >> > >> When using this change above suggested by Alexei, XDP_DROP and XDP_TX > >> actions appear to work well with xdp1 and xdp2. > >> > >> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be > >> good to hold off on committing this just yet. > >> > >> At first glance, it looks like there is enough headroom for the new > >> frame, but the resulting packet data do not look right and I'm actually > >> seeing some data that may be left on the stack from a previous caller. > > > > Thanks for testing Andy, I'll take a look and see if I can figure it out. > > Andy, I think we might be getting burnt by signedness issues in the > offset handling when the XDP program adjusts the packet data pointer. I completely agree -- I just noted that the offset would be -20 in the tx_tunnel case and it's easy to get confused since a positive int for the second arg in skb_pull() does go 'back' with a positive value and was just rebuilding and testing just this case with: - off = xdp.data - orig_data; + /* note that offset is negative */ + off = orig_data - xdp.data; > In netif_receive_generic_xdp(), try changing the offset handling code to > read something like: > > off = xdp.data - orig_data; > if (off > 0) > __skb_pull(skb, off); > else if (off < 0) > __skb_push(skb, -off); This should do it. I'll give that run, too. > If that doesn't work try adding: > > __skb_cow(skb, XDP_PACKET_HEADROOM, 0); > > right after the skb_linearize() call in that same function.
Re: [PATCH v4 net-next RFC] net: Generic XDP
From: David Miller Date: Tue, 18 Apr 2017 15:07:08 -0400 (EDT) > From: Andy Gospodarek > Date: Tue, 18 Apr 2017 15:05:35 -0400 > >> On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: >>> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: >>> > + >>> > + switch (act) { >>> > + case XDP_TX: >>> > + __skb_push(skb, skb->mac_len); >>> >>> s/skb->mac_len/mac_len/ >>> >> >> I was away from my keyboard for a few days, but was able to get some >> time to test this today. >> >> When using this change above suggested by Alexei, XDP_DROP and XDP_TX >> actions appear to work well with xdp1 and xdp2. >> >> I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be >> good to hold off on committing this just yet. >> >> At first glance, it looks like there is enough headroom for the new >> frame, but the resulting packet data do not look right and I'm actually >> seeing some data that may be left on the stack from a previous caller. > > Thanks for testing Andy, I'll take a look and see if I can figure it out. Andy, I think we might be getting burnt by signedness issues in the offset handling when the XDP program adjusts the packet data pointer. In netif_receive_generic_xdp(), try changing the offset handling code to read something like: off = xdp.data - orig_data; if (off > 0) __skb_pull(skb, off); else if (off < 0) __skb_push(skb, -off); If that doesn't work try adding: __skb_cow(skb, XDP_PACKET_HEADROOM, 0); right after the skb_linearize() call in that same function.
Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Andy Gospodarek Date: Tue, 18 Apr 2017 15:05:35 -0400 > On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: >> On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: >> > + >> > + switch (act) { >> > + case XDP_TX: >> > + __skb_push(skb, skb->mac_len); >> >> s/skb->mac_len/mac_len/ >> > > I was away from my keyboard for a few days, but was able to get some > time to test this today. > > When using this change above suggested by Alexei, XDP_DROP and XDP_TX > actions appear to work well with xdp1 and xdp2. > > I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be > good to hold off on committing this just yet. > > At first glance, it looks like there is enough headroom for the new > frame, but the resulting packet data do not look right and I'm actually > seeing some data that may be left on the stack from a previous caller. Thanks for testing Andy, I'll take a look and see if I can figure it out.
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Fri, Apr 14, 2017 at 05:59:51PM -0700, Alexei Starovoitov wrote: > On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > > + > > + switch (act) { > > + case XDP_TX: > > + __skb_push(skb, skb->mac_len); > > s/skb->mac_len/mac_len/ > I was away from my keyboard for a few days, but was able to get some time to test this today. When using this change above suggested by Alexei, XDP_DROP and XDP_TX actions appear to work well with xdp1 and xdp2. I'm seeing some rather odd behavior with xdp_tx_tunnel so it might be good to hold off on committing this just yet. At first glance, it looks like there is enough headroom for the new frame, but the resulting packet data do not look right and I'm actually seeing some data that may be left on the stack from a previous caller. > > + HARD_TX_UNLOCK(dev, txq); > > + if (free_skb) { > > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > > + kfree_skb(skb); > > nice that you didn't forget to add trace_xdp_exception in this path :) > > Overall looks good to me and other than the minor nit in tx, i think it > should work for programs already used with in-driver xdp. > I'll test it next week unless people beat me to it. >
Re: [PATCH v4 net-next RFC] net: Generic XDP
On 4/14/17 6:59 PM, Alexei Starovoitov wrote: > I'll test it next week unless people beat me to it. > I have run the xdp1 example from samples/bpf. I have a patch that modifies set_link_xdp_fd to take a flags argument which it adds to the request as the IFLA_XDP_FLAGS attribute. That's used by callers to set XDP_FLAGS_SKB_MODE.
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 04:23:15PM -0400, David Miller wrote: > + > + switch (act) { > + case XDP_TX: > + __skb_push(skb, skb->mac_len); s/skb->mac_len/mac_len/ > + HARD_TX_UNLOCK(dev, txq); > + if (free_skb) { > + trace_xdp_exception(dev, xdp_prog, XDP_TX); > + kfree_skb(skb); nice that you didn't forget to add trace_xdp_exception in this path :) Overall looks good to me and other than the minor nit in tx, i think it should work for programs already used with in-driver xdp. I'll test it next week unless people beat me to it.
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: > [snip] > +static u32 netif_receive_generic_xdp(struct sk_buff *skb, > +struct bpf_prog *xdp_prog) > +{ > + struct xdp_buff xdp; > + u32 act = XDP_DROP; > + void *orig_data; > + int hlen, off; > + > + if (skb_linearize(skb)) > + goto do_drop; > + > + /* The XDP program wants to see the packet starting at the MAC > +* header. > +*/ > + hlen = skb_headlen(skb) + skb->mac_len; > + xdp.data = skb->data - skb->mac_len; > + xdp.data_end = xdp.data + hlen; > + xdp.data_hard_start = xdp.data - skb_headroom(skb); > + orig_data = xdp.data; > + > + act = bpf_prog_run_xdp(xdp_prog, &xdp); > + > + off = xdp.data - orig_data; should we do "orig_data - xdp.data" instead? The 'off' might be < 0, when pushing new header. > + if (off) > + __skb_push(skb, off); When doing encapsulation by calling xdp_adjust_head(skb, offset), the offset is < 0 in order to make extra room in the front ; so xdp.data < orig_data, off < 0. But if we are decapsulating protocol, then off > 0, and maybe we should call __skb_pull()? Thanks, William
Re: [PATCH v4 net-next RFC] net: Generic XDP
From: Michael Chan Date: Thu, 13 Apr 2017 13:16:43 -0700 > On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: >> >> --- >> >> v4: >> - Fix MAC header adjustmnet before calling prog (David Ahern) >> - Disable LRO when generic XDP is installed (Michael Chan) > > I don't see where you are disabling LRO in the patch. Ugh, I posted the wrong patch, here is the correct one. Sorry about that: Subject: [PATCH] net: Generic XDP This provides a generic SKB based non-optimized XDP path which is used if either the driver lacks a specific XDP implementation, or the user requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. It is arguable that perhaps I should have required something like this as part of the initial XDP feature merge. I believe this is critical for two reasons: 1) Accessibility. More people can play with XDP with less dependencies. Yes I know we have XDP support in virtio_net, but that just creates another depedency for learning how to use this facility. I wrote this to make life easier for the XDP newbies. 2) As a model for what the expected semantics are. If there is a pure generic core implementation, it serves as a semantic example for driver folks adding XDP support. This is just a rough draft and is untested. One thing I have not tried to address here is the issue of XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or whatever even if the XDP program doesn't try to push headers at all. I think we really need the verifier to somehow propagate whether certain XDP helpers are used or not. Signed-off-by: David S. Miller --- v4: - Fix MAC header adjustmnet before calling prog (David Ahern) - Disable LRO when generic XDP is installed (Michael Chan) - Bypass qdisc et al. on XDP_TX and record the event (Alexei) - Do not perform generic XDP on reinjected packets (DaveM) v3: - Make sure XDP program sees packet at MAC header, push back MAC header if we do XDP_TX. (Alexei) - Elide GRO when generic XDP is in use. (Alexei) - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic XDP even if the driver has an XDP implementation. (Alexei) - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS attribute. (Daniel) v2: - Add some "fall through" comments in switch statements based upon feedback from Andrew Lunn - Use RCU for generic xdp_prog, thanks to Johannes Berg. --- include/linux/netdevice.h| 8 +++ include/uapi/linux/if_link.h | 4 +- net/core/dev.c | 153 +-- net/core/gro_cells.c | 2 +- net/core/rtnetlink.c | 40 ++- 5 files changed, 185 insertions(+), 22 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b0aa089..071a58b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1891,9 +1891,17 @@ struct net_device { struct lock_class_key *qdisc_tx_busylock; struct lock_class_key *qdisc_running_key; boolproto_down; + struct bpf_prog __rcu *xdp_prog; }; #define to_net_dev(d) container_of(d, struct net_device, dev) +static inline bool netif_elide_gro(const struct net_device *dev) +{ + if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) + return true; + return false; +} + #defineNETDEV_ALIGN32 static inline diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8b405af..633aa02 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -887,7 +887,9 @@ enum { /* XDP section */ #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0) -#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST) +#define XDP_FLAGS_SKB_MODE (2U << 0) +#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \ +XDP_FLAGS_SKB_MODE) enum { IFLA_XDP_UNSPEC, diff --git a/net/core/dev.c b/net/core/dev.c index ef9fe60e..b3d3a6e 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -95,6 +95,7 @@ #include #include #include +#include #include #include #include @@ -4247,6 +4248,123 @@ static int __netif_receive_skb(struct sk_buff *skb) return ret; } +static struct static_key generic_xdp_needed __read_mostly; + +static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp) +{ + struct bpf_prog *new = xdp->prog; + int ret = 0; + + switch (xdp->command) { + case XDP_SETUP_PROG: { + struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); + + rcu_assign_pointer(dev->xdp_prog, new); + if (old) + bpf_prog_put(old); + + if (old && !new) { + static_key_slow_dec(&generic_xdp_needed); + } else i
Re: [PATCH v4 net-next RFC] net: Generic XDP
On Thu, Apr 13, 2017 at 9:09 AM, David Miller wrote: > > --- > > v4: > - Fix MAC header adjustmnet before calling prog (David Ahern) > - Disable LRO when generic XDP is installed (Michael Chan) I don't see where you are disabling LRO in the patch. > - Bypass qdisc et al. on XDP_TX and record the event (Alexei) > - Do not perform generic XDP on reinjected packets (DaveM) >
[PATCH v4 net-next RFC] net: Generic XDP
This provides a generic SKB based non-optimized XDP path which is used if either the driver lacks a specific XDP implementation, or the user requests it via a new IFLA_XDP_FLAGS value named XDP_FLAGS_SKB_MODE. It is arguable that perhaps I should have required something like this as part of the initial XDP feature merge. I believe this is critical for two reasons: 1) Accessibility. More people can play with XDP with less dependencies. Yes I know we have XDP support in virtio_net, but that just creates another depedency for learning how to use this facility. I wrote this to make life easier for the XDP newbies. 2) As a model for what the expected semantics are. If there is a pure generic core implementation, it serves as a semantic example for driver folks adding XDP support. This is just a rough draft and is untested. One thing I have not tried to address here is the issue of XDP_PACKET_HEADROOM, thanks to Daniel for spotting that. It seems incredibly expensive to do a skb_cow(skb, XDP_PACKET_HEADROOM) or whatever even if the XDP program doesn't try to push headers at all. I think we really need the verifier to somehow propagate whether certain XDP helpers are used or not. Signed-off-by: David S. Miller --- v4: - Fix MAC header adjustmnet before calling prog (David Ahern) - Disable LRO when generic XDP is installed (Michael Chan) - Bypass qdisc et al. on XDP_TX and record the event (Alexei) - Do not perform generic XDP on reinjected packets (DaveM) v3: - Make sure XDP program sees packet at MAC header, push back MAC header if we do XDP_TX. (Alexei) - Elide GRO when generic XDP is in use. (Alexei) - Add XDP_FLAG_SKB_MODE flag which the user can use to request generic XDP even if the driver has an XDP implementation. (Alexei) - Report whether SKB mode is in use in rtnl_xdp_fill() via XDP_FLAGS attribute. (Daniel) v2: - Add some "fall through" comments in switch statements based upon feedback from Andrew Lunn - Use RCU for generic xdp_prog, thanks to Johannes Berg. diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b0aa089..071a58b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1891,9 +1891,17 @@ struct net_device { struct lock_class_key *qdisc_tx_busylock; struct lock_class_key *qdisc_running_key; boolproto_down; + struct bpf_prog __rcu *xdp_prog; }; #define to_net_dev(d) container_of(d, struct net_device, dev) +static inline bool netif_elide_gro(const struct net_device *dev) +{ + if (!(dev->features & NETIF_F_GRO) || dev->xdp_prog) + return true; + return false; +} + #defineNETDEV_ALIGN32 static inline diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 8b405af..633aa02 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -887,7 +887,9 @@ enum { /* XDP section */ #define XDP_FLAGS_UPDATE_IF_NOEXIST(1U << 0) -#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST) +#define XDP_FLAGS_SKB_MODE (2U << 0) +#define XDP_FLAGS_MASK (XDP_FLAGS_UPDATE_IF_NOEXIST | \ +XDP_FLAGS_SKB_MODE) enum { IFLA_XDP_UNSPEC, diff --git a/net/core/dev.c b/net/core/dev.c index ef9fe60e..9ed4569 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -95,6 +95,7 @@ #include #include #include +#include #include #include #include @@ -4247,6 +4248,88 @@ static int __netif_receive_skb(struct sk_buff *skb) return ret; } +static struct static_key generic_xdp_needed __read_mostly; + +static int generic_xdp_install(struct net_device *dev, struct netdev_xdp *xdp) +{ + struct bpf_prog *new = xdp->prog; + int ret = 0; + + switch (xdp->command) { + case XDP_SETUP_PROG: { + struct bpf_prog *old = rtnl_dereference(dev->xdp_prog); + + rcu_assign_pointer(dev->xdp_prog, new); + if (old) + bpf_prog_put(old); + + if (old && !new) + static_key_slow_dec(&generic_xdp_needed); + else if (new && !old) + static_key_slow_inc(&generic_xdp_needed); + break; + } + + case XDP_QUERY_PROG: + xdp->prog_attached = !!rcu_access_pointer(dev->xdp_prog); + break; + + default: + ret = -EINVAL; + break; + } + + return ret; +} + +static u32 netif_receive_generic_xdp(struct sk_buff *skb, +struct bpf_prog *xdp_prog) +{ + struct xdp_buff xdp; + u32 act = XDP_DROP; + void *orig_data; + int hlen, off; + + if (skb_linearize(skb)) + goto do_drop; + + /* The XDP program wants to see the packet starting at the MAC +* header. +*/ + hle