Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-17 Thread Jesse Brandeburg
On Sun, 16 Jul 2017 10:23:08 +0200
Jesper Dangaard Brouer  wrote:

> On Tue, 11 Jul 2017 12:37:10 -0700 John Fastabend  
> wrote:
> 
>  [...]  
> > 
> > hmm maybe Jesse or Alex have some clues. Adding them to the CC list.  
> 
> This seems related to Hyper-Threading.  I tried to disable
> hyperthreading in the BIOS, and the problem goes away.  That is, the
benchmarks are no longer affected by the CPU tuned-adm profile.

Wow, nice job figuring that out.  I went and took a look at tuned-adm
documentation but didn't see anything that stood out that would cause
the behavior you were seeing.

I suspect your toplev results, which show this is a frontend problem, mesh
nicely with the hyper-threading configuration inducing the bad behavior:
the two sibling threads still share the same frontend and execution
resources, so the fetchers would have to stall.  I think you've
rediscovered why the forwarding / routing crowd generally turns off
hyperthreading.


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-16 Thread Jesper Dangaard Brouer

On Tue, 11 Jul 2017 12:37:10 -0700 John Fastabend  
wrote:

> > I have a really strange observation... if I change the CPU powersave
> > settings, then the xdp_redirect_map performance drops in half!  Above
> > was with "tuned-adm profile powersave" (because this is a really noisy
> > server, and I'm sitting next to it).  I can see that the CPU under load
> > goes into "turbomode", with the rest going into low-power states,
> > including the Hyper-Thread siblings.
> > 
> > If I change the profile to: # tuned-adm profile network-latency
> > 
> > ifindex 6:   12964879 pkt/s
> > ifindex 6:   12964683 pkt/s
> > ifindex 6:   12961497 pkt/s
> > ifindex 6:   11779966 pkt/s <-- change to tuned-adm profile network-latency
> > ifindex 6:    6853959 pkt/s
> > ifindex 6:    6851120 pkt/s
> > ifindex 6:    6856934 pkt/s
> > ifindex 6:    6857344 pkt/s
> > ifindex 6:    6857161 pkt/s
> > 
> > The CPU efficiency goes from 2.35 to 1.24 insn per cycle.
> > 
> > John, do you know some Intel people who could help me understand what
> > is going on?!? This is very strange...
> > 
> > I tried Andi's toplev tool, which AFAIK indicates that this is a
> > frontend problem, e.g. in decoding the instructions?!?
> >   
> 
> hmm maybe Jesse or Alex have some clues. Adding them to the CC list.

This seems related to Hyper-Threading.  I tried to disable
hyperthreading in the BIOS, and the problem goes away.  That is, the
benchmarks are no longer affected by the CPU tuned-adm profile.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread David Miller
From: John Fastabend 
Date: Thu, 13 Jul 2017 10:00:15 -0700

> On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote:
>> On Thu, 13 Jul 2017 13:14:30 +0200
>> Jesper Dangaard Brouer  wrote:
>> 
>>> I'm still getting crashes (but much harder to provoke), but I figured
>>> out why.  We sort of missed one case where map_to_flush gets set: when
>>> the ndo_xdp_xmit() call starts to fail, the ixgbe driver forgets to
>>> call xdp_do_flush_map if all packets in that NAPI cycle failed.  We
>>> could blame the driver, but the clean solution is making sure that we
>>> don't set map_to_flush when the __bpf_tx_xdp() call fails.  It should
>>> also handle the other case I fixed.  I'll clean up my PoC-fix patch,
>>> test it and provide it here.
>> 
>> I changed flow in the function to be:
> 
> 
> Great, I'll merge this, the other couple fixes, and the bitops optimization 
> and
> hopefully then we are set. I'll post a v2 and we can do some final checks.

I am so looking forward to merging this, great work everyone.


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread John Fastabend
On 07/13/2017 09:16 AM, Jesper Dangaard Brouer wrote:
> On Thu, 13 Jul 2017 13:14:30 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> I'm still getting crashes (but much harder to provoke), but I figured
>> out why.  We sort of missed one case where map_to_flush gets set: when
>> the ndo_xdp_xmit() call starts to fail, the ixgbe driver forgets to
>> call xdp_do_flush_map if all packets in that NAPI cycle failed.  We
>> could blame the driver, but the clean solution is making sure that we
>> don't set map_to_flush when the __bpf_tx_xdp() call fails.  It should
>> also handle the other case I fixed.  I'll clean up my PoC-fix patch,
>> test it and provide it here.
> 
> I changed flow in the function to be:


Great, I'll merge this, the other couple fixes, and the bitops optimization and
hopefully then we are set. I'll post a v2 and we can do some final checks.

Thanks!
John

> 
> int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
>   struct bpf_prog *xdp_prog)
> {
>   struct redirect_info *ri = this_cpu_ptr(&redirect_info);
>   struct bpf_map *map = ri->map;
>   u32 index = ri->ifindex;
>   struct net_device *fwd;
>   int err = -EINVAL;
> 
>   ri->ifindex = 0;
>   ri->map = NULL;
> 
>   fwd = __dev_map_lookup_elem(map, index);
>   if (!fwd)
>   goto out;
> 
>   if (ri->map_to_flush && (ri->map_to_flush != map))
>   xdp_do_flush_map();
> 
>   err = __bpf_tx_xdp(fwd, map, xdp, index);
>   if (likely(!err))
>   ri->map_to_flush = map;
> 
> out:
>   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
>   return err;
> }
> 
> 
> The diff is:
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 4ca895d6ed51..c50a7ec2cdab 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, 
> struct xdp_buff *xdp,
> struct bpf_map *map = ri->map;
> u32 index = ri->ifindex;
> struct net_device *fwd;
> +   int err = -EINVAL;
> +
> +   ri->ifindex = 0;
> +   ri->map = NULL;
>  
> fwd = __dev_map_lookup_elem(map, index);
> if (!fwd)
> goto out;
>  
> -   ri->ifindex = 0;
> -
> if (ri->map_to_flush && (ri->map_to_flush != map))
> xdp_do_flush_map();
>  
> -   ri->map_to_flush = map;
> -   ri->map = NULL;
> +   err = __bpf_tx_xdp(fwd, map, xdp, index);
> +   if (likely(!err))
> +   ri->map_to_flush = map;
>  
> -   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> -
> -   return __bpf_tx_xdp(fwd, map, xdp, index);
>  out:
> -   ri->ifindex = 0;
> -   ri->map = NULL;
> -   return -EINVAL;
> +   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> +   return err;
>  }
>  
>  int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
> 



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread Jesper Dangaard Brouer
On Thu, 13 Jul 2017 13:14:30 +0200
Jesper Dangaard Brouer  wrote:

> I'm still getting crashes (but much harder to provoke), but I figured
> out why.  We sort of missed one case where map_to_flush gets set: when
> the ndo_xdp_xmit() call starts to fail, the ixgbe driver forgets to
> call xdp_do_flush_map if all packets in that NAPI cycle failed.  We
> could blame the driver, but the clean solution is making sure that we
> don't set map_to_flush when the __bpf_tx_xdp() call fails.  It should
> also handle the other case I fixed.  I'll clean up my PoC-fix patch,
> test it and provide it here.

I changed flow in the function to be:

int xdp_do_redirect_map(struct net_device *dev, struct xdp_buff *xdp,
                        struct bpf_prog *xdp_prog)
{
        struct redirect_info *ri = this_cpu_ptr(&redirect_info);
        struct bpf_map *map = ri->map;
        u32 index = ri->ifindex;
        struct net_device *fwd;
        int err = -EINVAL;

        ri->ifindex = 0;
        ri->map = NULL;

        fwd = __dev_map_lookup_elem(map, index);
        if (!fwd)
                goto out;

        if (ri->map_to_flush && (ri->map_to_flush != map))
                xdp_do_flush_map();

        err = __bpf_tx_xdp(fwd, map, xdp, index);
        if (likely(!err))
                ri->map_to_flush = map;

out:
        trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
        return err;
}


The diff is:

diff --git a/net/core/filter.c b/net/core/filter.c
index 4ca895d6ed51..c50a7ec2cdab 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2483,26 +2483,25 @@ int xdp_do_redirect_map(struct net_device *dev, struct 
xdp_buff *xdp,
struct bpf_map *map = ri->map;
u32 index = ri->ifindex;
struct net_device *fwd;
+   int err = -EINVAL;
+
+   ri->ifindex = 0;
+   ri->map = NULL;
 
fwd = __dev_map_lookup_elem(map, index);
if (!fwd)
goto out;
 
-   ri->ifindex = 0;
-
if (ri->map_to_flush && (ri->map_to_flush != map))
xdp_do_flush_map();
 
-   ri->map_to_flush = map;
-   ri->map = NULL;
+   err = __bpf_tx_xdp(fwd, map, xdp, index);
+   if (likely(!err))
+   ri->map_to_flush = map;
 
-   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
-
-   return __bpf_tx_xdp(fwd, map, xdp, index);
 out:
-   ri->ifindex = 0;
-   ri->map = NULL;
-   return -EINVAL;
+   trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
+   return err;
 }
 
 int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-13 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 11:26:54 -0700
John Fastabend  wrote:

> On 07/11/2017 07:23 AM, Jesper Dangaard Brouer wrote:
> > On Mon, 10 Jul 2017 17:59:17 -0700
> > John Fastabend  wrote:
> >   
> >> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:  
> >>> On Sat, 8 Jul 2017 21:06:17 +0200
> >>> Jesper Dangaard Brouer  wrote:
> >>> 
>  On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
>  David Miller  wrote:
> 
> > From: John Fastabend 
> > Date: Fri, 07 Jul 2017 10:48:36 -0700
> >   
> >> On 07/07/2017 10:34 AM, John Fastabend wrote:
> >>> This series adds two new XDP helper routines bpf_redirect() and
> >>> bpf_redirect_map(). The first variant bpf_redirect() is meant
> >>> to be used the same way it is currently being used by the cls_bpf
> >>> classifier. An xdp packet will be redirected immediately when this
> >>> is called.
> >>
> >> Also other than the typo in the title there ;) I'm going to CC
> >> the driver maintainers working on XDP (makes for a long CC list but)
> >> because we would want to try and get support in as many as possible in
> >> the next merge window.
> >>
> >> For this rev I just implemented on ixgbe because I wrote the
> >> original XDP support there. I'll volunteer to do virtio as well.   
> >>  
> >
> > I went over this series a few times and it looks great to me.
> > You didn't even give me some coding style issues to pick on :-)  
> 
>  We (Daniel, Andy and I) have been reviewing and improving on this
>  patchset the last couple of weeks ;-).  We had some stability issues,
>  which is why it wasn't published earlier. My plan is to test this
>  latest patchset again, Monday and Tuesday. I'll try to assess stability
>  and provide some performance numbers.
> >>>
> >>>
> >>> Damn, I thought it was stable; I have been running a lot of performance
> >>> tests, and then this just happened :-(
> >>
> >> Thanks, I'll take a look through the code and see if I can come up with
> >> why this might happen. I haven't hit it on my tests yet though.  
> > 
> > I've figured out why this happens, and I have a fix, see patch below
> > with some comments and questions.
> >   
> 
> Awesome!
> 
> > The problem is that we can leak map_to_flush in an error path; the fix:
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 2ccd6ff09493..7f1f48668dcf 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, 
> > struct xdp_buff *xdp,
> > ri->map = NULL;
> >  
> > trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> > -
> > +   // Q: Should we also trace "goto out" (failed lookup)?
> > +   //like bpf_warn_invalid_xdp_redirect();  
> 
> Maybe another trace event? trace_xdp_redirect_failed()
> 
> > return __bpf_tx_xdp(fwd, map, xdp, index);
> >  out:
> > ri->ifindex = 0;
> > -   ri->map = NULL;
> > +   // XXX: here we could leak ri->map_to_flush, which could be
> > +   //  picked up later by xdp_do_flush_map()
> > +   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */  
> 
> +1 
> 
> Ah, the map lookup failed and we need to do the flush. Nice catch.

I'm still getting crashes (but much harder to provoke), but I figured
out why.  We sort of missed one case where map_to_flush gets set: when
the ndo_xdp_xmit() call starts to fail, the ixgbe driver forgets to
call xdp_do_flush_map if all packets in that NAPI cycle failed.  We
could blame the driver, but the clean solution is making sure that we
don't set map_to_flush when the __bpf_tx_xdp() call fails.  It should
also handle the other case I fixed.  I'll clean up my PoC-fix patch,
test it and provide it here.
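
For reference, a minimal sketch of the driver-side pattern this protects
(hypothetical drv_*() helpers and struct drv_rx_ring, not the actual ixgbe
code; only xdp_do_flush_map() and the XDP_REDIRECT verdict are real):

/* Hypothetical NAPI poll skeleton, illustration only. */
static int drv_napi_poll(struct napi_struct *napi, int budget)
{
        struct drv_rx_ring *ring = container_of(napi, struct drv_rx_ring, napi);
        bool do_flush = false;
        struct xdp_buff xdp;
        int done = 0;

        while (done < budget && drv_fetch_rx_frame(ring, &xdp)) {
                switch (drv_run_xdp_prog(ring, &xdp)) {
                case XDP_REDIRECT:
                        /* xdp_do_redirect_map() ran in here; with the fix it
                         * only records ri->map_to_flush on success.
                         */
                        do_flush = true;
                        break;
                default:
                        drv_handle_verdict(ring, &xdp);
                        break;
                }
                done++;
        }

        /* One flush per NAPI cycle.  Skipping this when every
         * ndo_xdp_xmit() failed is what left a stale map_to_flush
         * behind before the fix.
         */
        if (do_flush)
                xdp_do_flush_map();

        return done;
}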

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 12:19 PM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 11:56:21 -0700
> John Fastabend  wrote:
> 
>> On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
>>> On Tue, 11 Jul 2017 20:01:36 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 On Tue, 11 Jul 2017 10:48:29 -0700
 John Fastabend  wrote:
  
> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>> On Sat, 8 Jul 2017 21:06:17 +0200
>> Jesper Dangaard Brouer  wrote:
>>   
>>> My plan is to test this latest patchset again, Monday and Tuesday.
>>> I'll try to assess stability and provide some performance numbers.  
>>
>> Performance numbers:
>>
>>  14378479 pkt/s = XDP_DROP without touching memory
>>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>> XDP_TX)
>>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
>>
>> The performance drop between xdp2 and xdp_redirect was expected due
>> to the HW-tailptr flush per packet, which is costly.
>>
>>  (1/6344472-1/4595574)*10^9 = -59.98 ns
>>
>> The performance drop between xdp2 and xdp_redirect_map is higher than
>> I expected, which is not good!  The avoidance of the tailptr flush per
>> packet was expected to give a higher boost.  The cost increased by
>> 40 ns, which is too high compared to the code added (on a 4GHz machine
>> approx 160 cycles).
>>
>>  (1/6344472-1/5066243)*10^9 = -39.77 ns
>>
>> This system doesn't have DDIO, thus we are stalling on cache-misses,
>> but I was actually expecting that the added code could "hide" behind
>> these cache-misses.
>>
>> I'm somewhat surprised to see this large a performance drop.
>>   
>
> Yep, although there is room for optimizations in the code path for sure.
> And 5Mpps is not horrible; my preference is to get this series in, plus any
> small optimizations we come up with while the merge window is closed. Then
> follow-up patches can do optimizations.

 IMHO 5Mpps is a very bad number for XDP.
  
> One easy optimization is to get rid of the atomic bitops. They are not
> needed here; we have a per-CPU unsigned long. Another easy one would be
> to move some of the checks out of the hotpath. For example, checking for
> ndo_xdp_xmit and flush ops on the net device in the hotpath really should
> be done in the slow path.

 I'm already running with a similar patch as below, but it
 (surprisingly) only gave me a 3 ns improvement.  I also tried a
 prefetchw() on xdp.data that gave me 10 ns (which is quite good).

 I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
 has DDIO ... I have high hopes for this, as the major bottleneck on
 this i7-4790K CPU @ 4.00GHz is clearly cache-misses.

 Something is definitely wrong on this CPU, as perf stats show a very
 bad utilization of the CPU pipeline with 0.89 insn per cycle.
>>>
>>> Wow, getting DDIO working and avoiding the cache-misses was really
>>> _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
>>> really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
>>> optimization)
>>>   
>>
>> Very nice :) this was with the prefetchw() removed, right?
> 
> Yes, prefetchw removed.

Great, I wanted to avoid playing prefetch games for as long as possible.
At least in this initial submission.

> 
>>> 13,939,674 pkt/s = XDP_DROP without touching memory
>>> 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
>>> 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>>  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
>>> 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
>>>
>>> Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
>>> read packet memory.
>>>
>>> The large performance gap to xdp_redirect is due to the tailptr flush,
>>> which really shows up on this system.  The CPU efficiency is 1.36 insn
>>> per cycle, while for the map variant it is 2.15 insn per cycle.
>>>
>>>  Gap (1/13221812-1/7596576)*10^9 = -56 ns
>>>
>>> The xdp_redirect_map performance is really really good, almost 10G
>>> wirespeed on a single CPU!!!  This is amazing, and we know that this
>>> code is not even optimal yet.  The performance difference to xdp2 is
>>> only around 1 ns.
>>>   
>>
>> Great, yeah there are some more likely()/unlikely() hints we could add and
>> also remove some of the checks in the hotpath, etc.
> 
> Yes, plus inlining some function calls.
>  

Yep.

>> Thanks for doing this!
> 
> I have a really strange observation... if I change the 

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 11:56:21 -0700
John Fastabend  wrote:

> On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
> > On Tue, 11 Jul 2017 20:01:36 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> On Tue, 11 Jul 2017 10:48:29 -0700
> >> John Fastabend  wrote:
> >>  
> >>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>  On Sat, 8 Jul 2017 21:06:17 +0200
>  Jesper Dangaard Brouer  wrote:
>    
> > My plan is to test this latest patchset again, Monday and Tuesday.
> > I'll try to assess stability and provide some performance numbers.  
> 
>  Performance numbers:
> 
>   14378479 pkt/s = XDP_DROP without touching memory
>    9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>    6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>    4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>  XDP_TX)
>    5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> 
>  The performance drop between xdp2 and xdp_redirect was expected due
>  to the HW-tailptr flush per packet, which is costly.
> 
>   (1/6344472-1/4595574)*10^9 = -59.98 ns
> 
>  The performance drop between xdp2 and xdp_redirect_map is higher than
>  I expected, which is not good!  The avoidance of the tailptr flush per
>  packet was expected to give a higher boost.  The cost increased by
>  40 ns, which is too high compared to the code added (on a 4GHz machine
>  approx 160 cycles).
> 
>   (1/6344472-1/5066243)*10^9 = -39.77 ns
> 
>  This system doesn't have DDIO, thus we are stalling on cache-misses,
>  but I was actually expecting that the added code could "hide" behind
>  these cache-misses.
> 
>  I'm somewhat surprised to see this large a performance drop.
>    
> >>>
> >>> Yep, although there is room for optimizations in the code path for sure.
> >>> And 5Mpps is not horrible; my preference is to get this series in, plus any
> >>> small optimizations we come up with while the merge window is closed. Then
> >>> follow-up patches can do optimizations.
> >>
> >> IMHO 5Mpps is a very bad number for XDP.
> >>  
> >>> One easy optimization is to get rid of the atomic bitops. They are not
> >>> needed here; we have a per-CPU unsigned long. Another easy one would be
> >>> to move some of the checks out of the hotpath. For example, checking for
> >>> ndo_xdp_xmit and flush ops on the net device in the hotpath really should
> >>> be done in the slow path.
> >>
> >> I'm already running with a similar patch as below, but it
> >> (surprisingly) only gave me a 3 ns improvement.  I also tried a
> >> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> >>
> >> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> >> has DDIO ... I have high hopes for this, as the major bottleneck on
> >> this i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> >>
> >> Something is definitely wrong on this CPU, as perf stats show a very
> >> bad utilization of the CPU pipeline with 0.89 insn per cycle.
> > 
> > Wow, getting DDIO working and avoiding the cache-misses was really
> > _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
> > really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
> > optimization)
> >   
> 
> Very nice :) this was with the prefetchw() removed, right?

Yes, prefetchw removed.

> > 13,939,674 pkt/s = XDP_DROP without touching memory
> > 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
> > 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> >  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
> > 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
> > 
> > Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
> > read packet memory.
> > 
> > The large performance gap to xdp_redirect is due to the tailptr flush,
> > which really shows up on this system.  The CPU efficiency is 1.36 insn
> > per cycle, while for the map variant it is 2.15 insn per cycle.
> > 
> >  Gap (1/13221812-1/7596576)*10^9 = -56 ns
> > 
> > The xdp_redirect_map performance is really really good, almost 10G
> > wirespeed on a single CPU!!!  This is amazing, and we know that this
> > code is not even optimal yet.  The performance difference to xdp2 is
> > only around 1 ns.
> >   
> 
> Great, yeah there are some more likely()/unlikely() hints we could add and
> also remove some of the checks in the hotpath, etc.

Yes, plus inlining some function calls.
 
> Thanks for doing this!

I have a really strange observation... if I change the CPU powersave
settings, then the xdp_redirect_map performance drops in half!  Above
was with "tuned-adm profile powersave" (because this is a really noisy
server, and I'm sitting next to it).  I 

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 11:44 AM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 20:01:36 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> On Tue, 11 Jul 2017 10:48:29 -0700
>> John Fastabend  wrote:
>>
>>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:  
 On Sat, 8 Jul 2017 21:06:17 +0200
 Jesper Dangaard Brouer  wrote:
 
> My plan is to test this latest patchset again, Monday and Tuesday.
> I'll try to assess stability and provide some performance numbers.

 Performance numbers:

  14378479 pkt/s = XDP_DROP without touching memory
   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
 XDP_TX)
   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap

 The performance drop between xdp2 and xdp_redirect was expected due
 to the HW-tailptr flush per packet, which is costly.

  (1/6344472-1/4595574)*10^9 = -59.98 ns

 The performance drop between xdp2 and xdp_redirect_map is higher than
 I expected, which is not good!  The avoidance of the tailptr flush per
 packet was expected to give a higher boost.  The cost increased by
 40 ns, which is too high compared to the code added (on a 4GHz machine
 approx 160 cycles).

  (1/6344472-1/5066243)*10^9 = -39.77 ns

 This system doesn't have DDIO, thus we are stalling on cache-misses,
 but I was actually expecting that the added code could "hide" behind
 these cache-misses.

 I'm somewhat surprised to see this large a performance drop.
 
>>>
>>> Yep, although there is room for optimizations in the code path for sure. And
>>> 5Mpps is not horrible; my preference is to get this series in, plus any
>>> small optimizations we come up with while the merge window is closed. Then
>>> follow-up patches can do optimizations.
>>
>> IMHO 5Mpps is a very bad number for XDP.
>>
>>> One easy optimization is to get rid of the atomic bitops. They are not
>>> needed here; we have a per-CPU unsigned long. Another easy one would be
>>> to move some of the checks out of the hotpath. For example, checking for
>>> ndo_xdp_xmit and flush ops on the net device in the hotpath really should
>>> be done in the slow path.
>>
>> I'm already running with a similar patch as below, but it
>> (surprisingly) only gave me a 3 ns improvement.  I also tried a
>> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
>>
>> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
>> has DDIO ... I have high hopes for this, as the major bottleneck on
>> this i7-4790K CPU @ 4.00GHz is clearly cache-misses.
>>
>> Something is definitely wrong on this CPU, as perf stats show a very
>> bad utilization of the CPU pipeline with 0.89 insn per cycle.
> 
> Wow, getting DDIO working and avoiding the cache-misses was really
> _the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
> really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
> optimization)
> 

Very nice :) this was with the prefetchw() removed, right?

> 13,939,674 pkt/s = XDP_DROP without touching memory
> 14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
> 13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>  7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
> 13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap
> 
> Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
> read packet memory.
> 
> The large performance gap to xdp_redirect is due to the tailptr flush,
> which really shows up on this system.  The CPU efficiency is 1.36 insn
> per cycle, while for the map variant it is 2.15 insn per cycle.
> 
>  Gap (1/13221812-1/7596576)*10^9 = -56 ns
> 
> The xdp_redirect_map performance is really really good, almost 10G
> wirespeed on a single CPU!!!  This is amazing, and we know that this
> code is not even optimal yet.  The performance difference to xdp2 is
> only around 1 ns.
> 

Great, yeah there are some more likely()/unlikely() hints we could add and
also remove some of the checks in the hotpath, etc.

Thanks for doing this!



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 20:01:36 +0200
Jesper Dangaard Brouer  wrote:

> On Tue, 11 Jul 2017 10:48:29 -0700
> John Fastabend  wrote:
> 
> > On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:  
> > > On Sat, 8 Jul 2017 21:06:17 +0200
> > > Jesper Dangaard Brouer  wrote:
> > > 
> > >> My plan is to test this latest patchset again, Monday and Tuesday.
> > >> I'll try to assess stability and provide some performance numbers.
> > > 
> > > Performance numbers:
> > > 
> > >  14378479 pkt/s = XDP_DROP without touching memory
> > >   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
> > >   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> > >   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> > > XDP_TX)
> > >   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> > > 
> > > The performance drop between xdp2 and xdp_redirect was expected due
> > > to the HW-tailptr flush per packet, which is costly.
> > >
> > >  (1/6344472-1/4595574)*10^9 = -59.98 ns
> > >
> > > The performance drop between xdp2 and xdp_redirect_map is higher than
> > > I expected, which is not good!  The avoidance of the tailptr flush per
> > > packet was expected to give a higher boost.  The cost increased by
> > > 40 ns, which is too high compared to the code added (on a 4GHz machine
> > > approx 160 cycles).
> > > 
> > >  (1/6344472-1/5066243)*10^9 = -39.77 ns
> > > 
> > > This system doesn't have DDIO, thus we are stalling on cache-misses,
> > > but I was actually expecting that the added code could "hide" behind
> > > these cache-misses.
> > > 
> > > I'm somewhat surprised to see this large a performance drop.
> > > 
> > 
> > Yep, although there is room for optimizations in the code path for sure. And
> > 5Mpps is not horrible; my preference is to get this series in, plus any
> > small optimizations we come up with while the merge window is closed. Then
> > follow-up patches can do optimizations.
> 
> IMHO 5Mpps is a very bad number for XDP.
> 
> > One easy optimization is to get rid of the atomic bitops. They are not
> > needed here; we have a per-CPU unsigned long. Another easy one would be
> > to move some of the checks out of the hotpath. For example, checking for
> > ndo_xdp_xmit and flush ops on the net device in the hotpath really should
> > be done in the slow path.
> 
> I'm already running with a similar patch as below, but it
> (surprisingly) only gave me a 3 ns improvement.  I also tried a
> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> 
> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> has DDIO ... I have high hopes for this, as the major bottleneck on
> this i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> 
> Something is definitely wrong on this CPU, as perf stats show a very
> bad utilization of the CPU pipeline with 0.89 insn per cycle.

Wow, getting DDIO working and avoiding the cache-misses was really
_the_ issue.  On this CPU E5-1650 v4 @ 3.60GHz things look really
really good for XDP_REDIRECT with maps. (p.s. with __set_bit()
optimization)

13,939,674 pkt/s = XDP_DROP without touching memory
14,290,650 pkt/s = xdp1: XDP_DROP with reading packet data
13,221,812 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
 7,596,576 pkt/s = xdp_redirect:XDP_REDIRECT with swap mac (like XDP_TX)
13,058,435 pkt/s = xdp_redirect_map:XDP_REDIRECT with swap mac + devmap

Surprisingly, on this DDIO capable CPU it is slightly slower NOT to
read packet memory.

The large performance gap to xdp_redirect is due to the tailptr flush,
which really shows up on this system.  The CPU efficiency is 1.36 insn
per cycle, while for the map variant it is 2.15 insn per cycle.

 Gap (1/13221812-1/7596576)*10^9 = -56 ns

The xdp_redirect_map performance is really really good, almost 10G
wirespeed on a single CPU!!!  This is amazing, and we know that this
code is not even optimal yet.  The performance difference to xdp2 is
only around 1 ns.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp1 6
 proto 17:   11919302 pkt/s
 proto 17:   14281659 pkt/s
 proto 17:   14290650 pkt/s
 proto 17:   14283120 pkt/s
 proto 17:   14303023 pkt/s
 proto 17:   14299496 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp2 6
 proto 0:  1 pkt/s
 proto 17:    3225455 pkt/s
 proto 17:   13266772 pkt/s
 proto 17:   13221812 pkt/s
 proto 17:   1300 pkt/s
 proto 17:   13225508 pkt/s
 proto 17:   13226274 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp_redirect 6 6
 ifindex 6:  66040 pkt/s
 ifindex 6:    7029143 pkt/s
 ifindex 6:    7596576 pkt/s
 ifindex 6:    7598499 pkt/s
 ifindex 6:    7597025 pkt/s
 ifindex 6:    7598462 pkt/s

 [jbrouer@E5-1650v4 bpf]$ sudo ./xdp_redirect_map 6 6
 map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
 ifindex 6:  

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 11:01 AM, Jesper Dangaard Brouer wrote:
> On Tue, 11 Jul 2017 10:48:29 -0700
> John Fastabend  wrote:
> 
>> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
>>> On Sat, 8 Jul 2017 21:06:17 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 My plan is to test this latest patchset again, Monday and Tuesday.
 I'll try to assess stability and provide some performance numbers.  
>>>
>>> Performance numbers:
>>>
>>>  14378479 pkt/s = XDP_DROP without touching memory
>>>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>>>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>>>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
>>> XDP_TX)
>>>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
>>>
>>> The performance drop between xdp2 and xdp_redirect was expected due
>>> to the HW-tailptr flush per packet, which is costly.
>>>
>>>  (1/6344472-1/4595574)*10^9 = -59.98 ns
>>>
>>> The performance drop between xdp2 and xdp_redirect_map is higher than
>>> I expected, which is not good!  The avoidance of the tailptr flush per
>>> packet was expected to give a higher boost.  The cost increased by
>>> 40 ns, which is too high compared to the code added (on a 4GHz machine
>>> approx 160 cycles).
>>>
>>>  (1/6344472-1/5066243)*10^9 = -39.77 ns
>>>
>>> This system doesn't have DDIO, thus we are stalling on cache-misses,
>>> but I was actually expecting that the added code could "hide" behind
>>> these cache-misses.
>>>
>>> I'm somewhat surprised to see this large a performance drop.
>>>   
>>
>> Yep, although there is room for optimizations in the code path for sure. And
>> 5Mpps is not horrible; my preference is to get this series in, plus any
>> small optimizations we come up with while the merge window is closed. Then
>> follow-up patches can do optimizations.
> 
> IMHO 5Mpps is a very bad number for XDP.
> 
>> One easy optimization is to get rid of the atomic bitops. They are not
>> needed here; we have a per-CPU unsigned long. Another easy one would be
>> to move some of the checks out of the hotpath. For example, checking for
>> ndo_xdp_xmit and flush ops on the net device in the hotpath really should
>> be done in the slow path.
> 
> I'm already running with a similar patch as below, but it
> (surprisingly) only gave me a 3 ns improvement.  I also tried a
> prefetchw() on xdp.data that gave me 10 ns (which is quite good).
> 

Ah OK good, do the above numbers use both the bitops changes and the
prefetchw?

> I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
> has DDIO ... I have high hopes for this, as the major bottleneck on
> this i7-4790K CPU @ 4.00GHz is clearly cache-misses.
> 
> Something is definitely wrong on this CPU, as perf stats show a very
> bad utilization of the CPU pipeline with 0.89 insn per cycle.
> 

Interesting, the E5-1650 numbers will be good to know. If you have the
perf trace too, posting it might help track down some hot spots.

.John


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 07:23 AM, Jesper Dangaard Brouer wrote:
> On Mon, 10 Jul 2017 17:59:17 -0700
> John Fastabend  wrote:
> 
>> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:
>>> On Sat, 8 Jul 2017 21:06:17 +0200
>>> Jesper Dangaard Brouer  wrote:
>>>   
 On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
 David Miller  wrote:
  
> From: John Fastabend 
> Date: Fri, 07 Jul 2017 10:48:36 -0700
> 
>> On 07/07/2017 10:34 AM, John Fastabend wrote:  
>>> This series adds two new XDP helper routines bpf_redirect() and
>>> bpf_redirect_map(). The first variant bpf_redirect() is meant
>>> to be used the same way it is currently being used by the cls_bpf
>>> classifier. An xdp packet will be redirected immediately when this
>>> is called.  
>>
>> Also other than the typo in the title there ;) I'm going to CC
>> the driver maintainers working on XDP (makes for a long CC list but)
>> because we would want to try and get support in as many as possible in
>> the next merge window.
>>
>> For this rev I just implemented on ixgbe because I wrote the
>> original XDP support there. I'll volunteer to do virtio as well.  
>
> I went over this series a few times and it looks great to me.
> You didn't even give me some coding style issues to pick on :-)

 We (Daniel, Andy and I) have been reviewing and improving on this
 patchset the last couple of weeks ;-).  We had some stability issues,
 which is why it wasn't published earlier. My plan is to test this
 latest patchset again, Monday and Tuesday. I'll try to assess stability
 and provide some performance numbers.  
>>>
>>>
>>> Damn, I thought it was stable; I have been running a lot of performance
>>> tests, and then this just happened :-(
>>
>> Thanks, I'll take a look through the code and see if I can come up with
>> why this might happen. I haven't hit it on my tests yet though.
> 
> I've figured out why this happens, and I have a fix, see patch below
> with some comments and questions.
> 

Awesome!

> The problem is that we can leak map_to_flush in an error path; the fix:
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 2ccd6ff09493..7f1f48668dcf 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, 
> struct xdp_buff *xdp,
> ri->map = NULL;
>  
> trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
> -
> +   // Q: Should we also trace "goto out" (failed lookup)?
> +   //like bpf_warn_invalid_xdp_redirect();

Maybe another trace event? trace_xdp_redirect_failed()

> return __bpf_tx_xdp(fwd, map, xdp, index);
>  out:
> ri->ifindex = 0;
> -   ri->map = NULL;
> +   // XXX: here we could leak ri->map_to_flush, which could be
> +   //  picked up later by xdp_do_flush_map()
> +   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */

+1 

Ah, the map lookup failed and we need to do the flush. Nice catch.

> return -EINVAL;
> 
> 
> While debugging this, I noticed that we can have packets in flight
> while the XDP RX rings are being reconfigured.  I wonder if this is an
> ixgbe driver XDP bug?  I think it would be best to add some
> RCU barrier after ixgbe_setup_tc().
> 

Actually I think a synchronize_sched() is needed after the IXGBE_DOWN bit
is set but before the xdp_tx queues are cleaned up. In practice the
ixgbe_down/up sequence has so many msleep() operations for napi cleanup
and hardware sync that I would be really surprised if we ever hit this.
But for correctness we should likely add it.
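
As a rough sketch of that ordering (illustration only, not the actual
ixgbe_down() code; __IXGBE_DOWN, synchronize_sched() and
ixgbe_clean_all_tx_rings() are real symbols, the wrapper function is made up):

static void drv_quiesce_before_xdp_ring_teardown(struct ixgbe_adapter *adapter)
{
        /* Stop scheduling new NAPI/XDP work. */
        set_bit(__IXGBE_DOWN, &adapter->state);

        /* Wait for any softirq context still running the old XDP program
         * or still transmitting on the XDP TX rings.
         */
        synchronize_sched();

        /* Only now is it safe to reclaim the rings. */
        ixgbe_clean_all_tx_rings(adapter);
}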

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
> b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> index ed97aa81a850..4872fbb54ecd 100644
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> @@ -9801,7 +9804,18 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
> struct bpf_prog *prog)
>  
> /* If transitioning XDP modes reconfigure rings */
> if (!!prog != !!old_prog) {
> -   int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
> +   // XXX: Warn pkts can be in-flight in old_prog
> +   //  while ixgbe_setup_tc() calls ixgbe_close(dev).
> +   //
> +   // Should we avoid these in-flight packets?
> +   // Would it be enough to add an synchronize_rcu()
> +   // or rcu_barrier()?
> +   // or do we need an napi_synchronize() call here?
> +   //
> +   int err;
> +   netdev_info(dev,
> +   "Calling ixgbe_setup_tc() to reconfig XDP 
> rings\n");
> +   err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
>  
> if (err) {
>   

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Tue, 11 Jul 2017 10:48:29 -0700
John Fastabend  wrote:

> On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
> > On Sat, 8 Jul 2017 21:06:17 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> My plan is to test this latest patchset again, Monday and Tuesday.
> >> I'll try to assess stability and provide some performance numbers.  
> > 
> > Performance numbers:
> > 
> >  14378479 pkt/s = XDP_DROP without touching memory
> >   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
> >   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
> >   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> > XDP_TX)
> >   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> > 
> > The performance drop between xdp2 and xdp_redirect was expected due
> > to the HW-tailptr flush per packet, which is costly.
> > 
> >  (1/6344472-1/4595574)*10^9 = -59.98 ns
> > 
> > The performance drop between xdp2 and xdp_redirect_map is higher than
> > I expected, which is not good!  The avoidance of the tailptr flush per
> > packet was expected to give a higher boost.  The cost increased by
> > 40 ns, which is too high compared to the code added (on a 4GHz machine
> > approx 160 cycles).
> > 
> >  (1/6344472-1/5066243)*10^9 = -39.77 ns
> > 
> > This system doesn't have DDIO, thus we are stalling on cache-misses,
> > but I was actually expecting that the added code could "hide" behind
> > these cache-misses.
> > 
> > I'm somewhat surprised to see this large a performance drop.
> >   
> 
> Yep, although there is room for optimizations in the code path for sure. And
> 5Mpps is not horrible; my preference is to get this series in, plus any
> small optimizations we come up with while the merge window is closed. Then
> follow-up patches can do optimizations.

IMHO 5Mpps is a very bad number for XDP.

> One easy optimization is to get rid of the atomic bitops. They are not
> needed here; we have a per-CPU unsigned long. Another easy one would be
> to move some of the checks out of the hotpath. For example, checking for
> ndo_xdp_xmit and flush ops on the net device in the hotpath really should
> be done in the slow path.

I'm already running with a similar patch as below, but it
(surprisingly) only gave me a 3 ns improvement.  I also tried a
prefetchw() on xdp.data that gave me 10 ns (which is quite good).

I'm booting up another system with a CPU E5-1650 v4 @ 3.60GHz, which
has DDIO ... I have high hopes for this, as the major bottleneck on
this i7-4790K CPU @ 4.00GHz is clearly cache-misses.

Something is definitely wrong on this CPU, as perf stats show a very
bad utilization of the CPU pipeline with 0.89 insn per cycle.


> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 1191060..899364d 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -213,7 +213,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 key)
> struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
> unsigned long *bitmap = this_cpu_ptr(dtab->flush_needed);
>  
> -   set_bit(key, bitmap);
> +   __set_bit(key, bitmap);
>  }
>  
>  struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
> @@ -253,7 +253,7 @@ void __dev_map_flush(struct bpf_map *map)
>  
> netdev = dev->dev;
>  
> -   clear_bit(bit, bitmap);
> +   __clear_bit(bit, bitmap);
> if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
> continue;
>  
> @@ -287,7 +287,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev 
> *old_dev)
>  
> for_each_online_cpu(cpu) {
> bitmap = per_cpu_ptr(old_dev->dtab->flush_needed, 
> cpu);
> -   clear_bit(old_dev->key, bitmap);
> +   __clear_bit(old_dev->key, bitmap);
>  
> fl->netdev_ops->ndo_xdp_flush(old_dev->dev);
> }
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread John Fastabend
On 07/11/2017 08:36 AM, Jesper Dangaard Brouer wrote:
> On Sat, 8 Jul 2017 21:06:17 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> My plan is to test this latest patchset again, Monday and Tuesday.
>> I'll try to assess stability and provide some performance numbers.
> 
> Performance numbers:
> 
>  14378479 pkt/s = XDP_DROP without touching memory
>   9222401 pkt/s = xdp1: XDP_DROP with reading packet data
>   6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
>   4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate 
> XDP_TX)
>   5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap
> 
> The performance drop between xdp2 and xdp_redirect was expected due
> to the HW-tailptr flush per packet, which is costly.
> 
>  (1/6344472-1/4595574)*10^9 = -59.98 ns
> 
> The performance drop between xdp2 and xdp_redirect_map is higher than
> I expected, which is not good!  The avoidance of the tailptr flush per
> packet was expected to give a higher boost.  The cost increased by
> 40 ns, which is too high compared to the code added (on a 4GHz machine
> approx 160 cycles).
> 
>  (1/6344472-1/5066243)*10^9 = -39.77 ns
> 
> This system doesn't have DDIO, thus we are stalling on cache-misses,
> but I was actually expecting that the added code could "hide" behind
> these cache-misses.
> 
> I'm somewhat surprised to see this large a performance drop.
> 

Yep, although there is room for optimizations in the code path for sure. And
5Mpps is not horrible; my preference is to get this series in, plus any
small optimizations we come up with while the merge window is closed. Then
follow-up patches can do optimizations.

One easy optimization is to get rid of the atomic bitops. They are not
needed here; we have a per-CPU unsigned long. Another easy one would be
to move some of the checks out of the hotpath. For example, checking for
ndo_xdp_xmit and flush ops on the net device in the hotpath really should
be done in the slow path.

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 1191060..899364d 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -213,7 +213,7 @@ void __dev_map_insert_ctx(struct bpf_map *map, u32 key)
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
unsigned long *bitmap = this_cpu_ptr(dtab->flush_needed);
 
-   set_bit(key, bitmap);
+   __set_bit(key, bitmap);
 }
 
 struct net_device  *__dev_map_lookup_elem(struct bpf_map *map, u32 key)
@@ -253,7 +253,7 @@ void __dev_map_flush(struct bpf_map *map)
 
netdev = dev->dev;
 
-   clear_bit(bit, bitmap);
+   __clear_bit(bit, bitmap);
if (unlikely(!netdev || !netdev->netdev_ops->ndo_xdp_flush))
continue;
 
@@ -287,7 +287,7 @@ static void dev_map_flush_old(struct bpf_dtab_netdev 
*old_dev)
 
for_each_online_cpu(cpu) {
bitmap = per_cpu_ptr(old_dev->dtab->flush_needed, cpu);
-   clear_bit(old_dev->key, bitmap);
+   __clear_bit(old_dev->key, bitmap);
 
fl->netdev_ops->ndo_xdp_flush(old_dev->dev);
}
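
A sketch of the second idea, moving the ndo_xdp_xmit/ndo_xdp_flush checks
out of the hotpath and into map-update time (illustration only, not actual
devmap code; the helper name is made up):

/* Reject devices without XDP transmit support when they are inserted
 * into the devmap, so the per-packet path no longer has to test
 * netdev_ops for every redirected frame.
 */
static int devmap_check_xdp_tx_support(const struct net_device *netdev)
{
        const struct net_device_ops *ops = netdev->netdev_ops;

        if (!ops->ndo_xdp_xmit || !ops->ndo_xdp_flush)
                return -EOPNOTSUPP;

        return 0;
}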



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Sat, 8 Jul 2017 21:06:17 +0200
Jesper Dangaard Brouer  wrote:

> My plan is to test this latest patchset again, Monday and Tuesday.
> I'll try to assess stability and provide some performance numbers.

Performance numbers:

 14378479 pkt/s = XDP_DROP without touching memory
  9222401 pkt/s = xdp1: XDP_DROP with reading packet data
  6344472 pkt/s = xdp2: XDP_TX   with swap mac (writes into pkt)
  4595574 pkt/s = xdp_redirect: XDP_REDIRECT with swap mac (simulate XDP_TX)
  5066243 pkt/s = xdp_redirect_map: XDP_REDIRECT with swap mac + devmap

The performance drop between xdp2 and xdp_redirect was expected due
to the HW-tailptr flush per packet, which is costly.

 (1/6344472-1/4595574)*10^9 = -59.98 ns

The performance drop between xdp2 and xdp_redirect_map is higher than
I expected, which is not good!  The avoidance of the tailptr flush per
packet was expected to give a higher boost.  The cost increased by
40 ns, which is too high compared to the code added (on a 4GHz machine
approx 160 cycles).

 (1/6344472-1/5066243)*10^9 = -39.77 ns

This system doesn't have DDIO, thus we are stalling on cache-misses,
but I was actually expecting that the added code could "hide" behind
these cache-misses.

I'm somewhat surprised to see this large a performance drop.
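
For reference, the ns deltas above are just differences in per-packet
budget (10^9/pps); a tiny stand-alone C helper reproducing them from the
numbers in this mail:

#include <stdio.h>

/* Convert packets-per-second into nanoseconds per packet. */
static double ns_per_pkt(double pps)
{
        return 1e9 / pps;
}

int main(void)
{
        double xdp2         = 6344472;  /* XDP_TX with mac swap      */
        double redirect     = 4595574;  /* xdp_redirect              */
        double redirect_map = 5066243;  /* xdp_redirect_map (devmap) */

        printf("xdp2 -> xdp_redirect:     %.2f ns/pkt\n",
               ns_per_pkt(xdp2) - ns_per_pkt(redirect));      /* ~ -59.98 */
        printf("xdp2 -> xdp_redirect_map: %.2f ns/pkt\n",
               ns_per_pkt(xdp2) - ns_per_pkt(redirect_map));  /* ~ -39.77 */
        return 0;
}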

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Results::

 # XDP_DROP with reading packet data
 [jbrouer@canyon bpf]$ sudo ./xdp1 3
 proto 17:    6449727 pkt/s
 proto 17:    9222639 pkt/s
 proto 17:    9222401 pkt/s
 proto 17:    9223083 pkt/s
 proto 17:    9223515 pkt/s
 proto 17:    9222477 pkt/s
 ^C

 # XDP_TX with swap mac
 [jbrouer@canyon bpf]$ sudo ./xdp2 3
 proto 17: 934682 pkt/s
 proto 17:    6344845 pkt/s
 proto 17:    6344472 pkt/s
 proto 17:    6345265 pkt/s
 proto 17:    6345238 pkt/s
 proto 17:    6345338 pkt/s
 ^C

 # XDP_REDIRECT with swap mac (simulate XDP_TX via same ifindex)
 [jbrouer@canyon bpf]$ sudo ./xdp_redirect 3 3
 ifindex 3: 749567 pkt/s
 ifindex 3:    4595025 pkt/s
 ifindex 3:    4595574 pkt/s
 ifindex 3:    4595429 pkt/s
 ifindex 3:    4595340 pkt/s
 ifindex 3:    4595352 pkt/s
 ifindex 3:    4595364 pkt/s
 ^C

  # XDP_REDIRECT with swap mac + devmap (still simulate XDP_TX)
 [jbrouer@canyon bpf]$ sudo ./xdp_redirect_map 3 3
 map[0] (vports) = 4, map[1] (map) = 5, map[2] (count) = 0
 ifindex 3:    3076506 pkt/s
 ifindex 3:    5066282 pkt/s
 ifindex 3:    5066243 pkt/s
 ifindex 3:    5067376 pkt/s
 ifindex 3:    5067226 pkt/s
 ifindex 3:    5067622 pkt/s

My own tools::

 [jbrouer@canyon prototype-kernel]$
   sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
--action XDP_DROP
 XDP_action   pps        pps-human-readable mem
 XDP_DROP 0          0          no_touch
 XDP_DROP 9894401    9,894,401  no_touch
 XDP_DROP 14377459   14,377,459 no_touch 
 XDP_DROP 14378228   14,378,228 no_touch 
 XDP_DROP 14378400   14,378,400 no_touch 
 XDP_DROP 14378319   14,378,319 no_touch 
 XDP_DROP 14378479   14,378,479 no_touch 
 XDP_DROP 14377332   14,377,332 no_touch 
 XDP_DROP 14378411   14,378,411 no_touch 
 XDP_DROP 14378095   14,378,095 no_touch 
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1

 [jbrouer@canyon prototype-kernel]$
  sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
   --action XDP_DROP --read
 XDP_action   pps        pps-human-readable mem
 XDP_DROP 0          0          read
 XDP_DROP 6994114    6,994,114  read
 XDP_DROP 8979414    8,979,414  read
 XDP_DROP 8979636    8,979,636  read
 XDP_DROP 8980087    8,980,087  read
 XDP_DROP 8979097    8,979,097  read
 XDP_DROP 8978970    8,978,970  read
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1

 [jbrouer@canyon prototype-kernel]$
  sudo ./xdp_bench01_mem_access_cost --dev ixgbe1 --sec 2 \
  --action XDP_TX --swap --read
 XDP_action   pps        pps-human-readable mem
 XDP_TX   0          0          swap_mac
 XDP_TX   2141556    2,141,556  swap_mac
 XDP_TX   6171984    6,171,984  swap_mac
 XDP_TX   6171955    6,171,955  swap_mac
 XDP_TX   6171767    6,171,767  swap_mac
 XDP_TX   6171680    6,171,680  swap_mac
 XDP_TX   6172201    6,172,201  swap_mac
 ^CInterrupted: Removing XDP program on ifindex:3 device:ixgbe1


Setting tuned-adm network-latency ::

 $ sudo tuned-adm list
 [...]
 Current active profile: network-latency



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-11 Thread Jesper Dangaard Brouer
On Mon, 10 Jul 2017 17:59:17 -0700
John Fastabend  wrote:

> On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:
> > On Sat, 8 Jul 2017 21:06:17 +0200
> > Jesper Dangaard Brouer  wrote:
> >   
> >> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
> >> David Miller  wrote:
> >>  
> >>> From: John Fastabend 
> >>> Date: Fri, 07 Jul 2017 10:48:36 -0700
> >>> 
>  On 07/07/2017 10:34 AM, John Fastabend wrote:  
> > This series adds two new XDP helper routines bpf_redirect() and
> > bpf_redirect_map(). The first variant bpf_redirect() is meant
> > to be used the same way it is currently being used by the cls_bpf
> > classifier. An xdp packet will be redirected immediately when this
> > is called.  
> 
>  Also other than the typo in the title there ;) I'm going to CC
>  the driver maintainers working on XDP (makes for a long CC list but)
>  because we would want to try and get support in as many as possible in
>  the next merge window.
> 
>  For this rev I just implemented on ixgbe because I wrote the
>  original XDP support there. I'll volunteer to do virtio as well.  
> >>>
> >>> I went over this series a few times and it looks great to me.
> >>> You didn't even give me some coding style issues to pick on :-)
> >>
> >> We (Daniel, Andy and I) have been reviewing and improving on this
> >> patchset the last couple of weeks ;-).  We had some stability issues,
> >> which is why it wasn't published earlier. My plan is to test this
> >> latest patchset again, Monday and Tuesday. I'll try to assess stability
> >> and provide some performance numbers.  
> > 
> > 
> > Damn, I thought it was stable; I have been running a lot of performance
> > tests, and then this just happened :-(
> 
> Thanks, I'll take a look through the code and see if I can come up with
> why this might happen. I haven't hit it on my tests yet though.

I've figured out why this happens, and I have a fix, see patch below
with some comments and questions.

The problem is that we can leak map_to_flush in an error path; the fix:

diff --git a/net/core/filter.c b/net/core/filter.c
index 2ccd6ff09493..7f1f48668dcf 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2497,11 +2497,14 @@ int xdp_do_redirect_map(struct net_device *dev, struct 
xdp_buff *xdp,
ri->map = NULL;
 
trace_xdp_redirect(dev, fwd, xdp_prog, XDP_REDIRECT);
-
+   // Q: Should we also trace "goto out" (failed lookup)?
+   //like bpf_warn_invalid_xdp_redirect();
return __bpf_tx_xdp(fwd, map, xdp, index);
 out:
ri->ifindex = 0;
-   ri->map = NULL;
+   // XXX: here we could leak ri->map_to_flush, which could be
+   //  picked up later by xdp_do_flush_map()
+   xdp_do_flush_map(); /* Clears ri->map_to_flush + ri->map */
return -EINVAL;


While debugging this, I noticed that we can have packets in flight
while the XDP RX rings are being reconfigured.  I wonder if this is an
ixgbe driver XDP bug?  I think it would be best to add some
RCU barrier after ixgbe_setup_tc().

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c 
b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ed97aa81a850..4872fbb54ecd 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9801,7 +9804,18 @@ static int ixgbe_xdp_setup(struct net_device *dev, 
struct bpf_prog *prog)
 
/* If transitioning XDP modes reconfigure rings */
if (!!prog != !!old_prog) {
-   int err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
+   // XXX: Warn pkts can be in-flight in old_prog
+   //  while ixgbe_setup_tc() calls ixgbe_close(dev).
+   //
+   // Should we avoid these in-flight packets?
+   // Would it be enough to add an synchronize_rcu()
+   // or rcu_barrier()?
+   // or do we need an napi_synchronize() call here?
+   //
+   int err;
+   netdev_info(dev,
+   "Calling ixgbe_setup_tc() to reconfig XDP rings\n");
+   err = ixgbe_setup_tc(dev, netdev_get_num_tc(dev));
 
if (err) {
rcu_assign_pointer(adapter->xdp_prog, old_prog);

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-10 Thread John Fastabend
On 07/10/2017 11:30 AM, Jesper Dangaard Brouer wrote:
> On Sat, 8 Jul 2017 21:06:17 +0200
> Jesper Dangaard Brouer  wrote:
> 
>> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
>> David Miller  wrote:
>>
>>> From: John Fastabend 
>>> Date: Fri, 07 Jul 2017 10:48:36 -0700
>>>   
 On 07/07/2017 10:34 AM, John Fastabend wrote:
> This series adds two new XDP helper routines bpf_redirect() and
> bpf_redirect_map(). The first variant bpf_redirect() is meant
> to be used the same way it is currently being used by the cls_bpf
> classifier. An xdp packet will be redirected immediately when this
> is called.

 Also other than the typo in the title there ;) I'm going to CC
 the driver maintainers working on XDP (makes for a long CC list but)
 because we would want to try and get support in as many as possible in
 the next merge window.

 For this rev I just implemented on ixgbe because I wrote the
 original XDP support there. I'll volunteer to do virtio as well.
>>>
>>> I went over this series a few times and it looks great to me.
>>> You didn't even give me some coding style issues to pick on :-)  
>>
>> We (Daniel, Andy and I) have been reviewing and improving on this
>> patchset the last couple of weeks ;-).  We had some stability issues,
>> which is why it wasn't published earlier. My plan is to test this
>> latest patchset again, Monday and Tuesday. I'll try to assess stability
>> and provide some performance numbers.
> 
> 
> Damn, I thought it was stable; I have been running a lot of performance
> tests, and then this just happened :-(

Thanks, I'll take a look through the code and see if I can come up with
why this might happen. I haven't hit it on my tests yet though.

.John


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-10 Thread Jesper Dangaard Brouer
On Sat, 8 Jul 2017 21:06:17 +0200
Jesper Dangaard Brouer  wrote:

> On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
> David Miller  wrote:
> 
> > From: John Fastabend 
> > Date: Fri, 07 Jul 2017 10:48:36 -0700
> >   
> > > On 07/07/2017 10:34 AM, John Fastabend wrote:
> > >> This series adds two new XDP helper routines bpf_redirect() and
> > >> bpf_redirect_map(). The first variant bpf_redirect() is meant
> > >> to be used the same way it is currently being used by the cls_bpf
> > >> classifier. An xdp packet will be redirected immediately when this
> > >> is called.
> > > 
> > > Also other than the typo in the title there ;) I'm going to CC
> > > the driver maintainers working on XDP (makes for a long CC list but)
> > > because we would want to try and get support in as many as possible in
> > > the next merge window.
> > > 
> > > For this rev I just implemented on ixgbe because I wrote the
> > > original XDP support there. I'll volunteer to do virtio as well.
> > 
> > I went over this series a few times and it looks great to me.
> > You didn't even give me some coding style issues to pick on :-)  
> 
> We (Daniel, Andy and I) have been reviewing and improving on this
> patchset the last couple of weeks ;-).  We had some stability issues,
> which is why it wasn't published earlier. My plan is to test this
> latest patchset again, Monday and Tuesday. I'll try to assess stability
> and provide some performance numbers.


Damn, I thought it was stable; I have been running a lot of performance
tests, and then this just happened :-(

[11357.149486] BUG: unable to handle kernel NULL pointer dereference at 0210
[11357.157393] IP: __dev_map_flush+0x58/0x90
[11357.161446] PGD 3ff685067 
[11357.161446] P4D 3ff685067 
[11357.164199] PUD 3ff684067 
[11357.166947] PMD 0 
[11357.170396] 
[11357.173981] Oops:  [#1] PREEMPT SMP
[11357.177859] Modules linked in: coretemp kvm_intel kvm irqbypass intel_cstate intel_uncore intel_rapl_perf mxm_wmi i2c_i801 pcspkr sg i2c_co]
[11357.203021] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.12.0-net-next-xdp_redirect02-rfc+ #135
[11357.211706] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[11357.221606] task: 81c0e480 task.stack: 81c0
[11357.227568] RIP: 0010:__dev_map_flush+0x58/0x90
[11357.232138] RSP: 0018:88041fa03de0 EFLAGS: 00010286
[11357.237409] RAX:  RBX: 8803fc996600 RCX: 0003
[11357.244589] RDX: 88040d0bf480 RSI:  RDI: 81d901d8
[11357.251767] RBP: 88041fa03df8 R08: fffc R09: 00070008
[11357.258940] R10: 813f11d0 R11: 0040 R12: e8c014a0
[11357.266119] R13: 0003 R14: 003c R15: 8803fc9c26c0
[11357.273294] FS:  () GS:88041fa0() knlGS:
[11357.281454] CS:  0010 DS:  ES:  CR0: 80050033
[11357.287244] CR2: 0210 CR3: 0003fc41e000 CR4: 001406f0
[11357.294423] Call Trace:
[11357.296912]  
[11357.298967]  xdp_do_flush_map+0x36/0x40
[11357.302847]  ixgbe_poll+0x7ea/0x1370 [ixgbe]
[11357.307160]  net_rx_action+0x247/0x3e0
[11357.310957]  __do_softirq+0x106/0x2cb
[11357.314664]  irq_exit+0xbe/0xd0
[11357.317851]  do_IRQ+0x4f/0xd0
[11357.320858]  common_interrupt+0x86/0x86
[11357.324733] RIP: 0010:poll_idle+0x2f/0x5a
[11357.328781] RSP: 0018:81c03dd0 EFLAGS: 0246 ORIG_RAX: ff8e
[11357.336426] RAX:  RBX: 81d689e0 RCX: 0020
[11357.343605] RDX:  RSI: 81c0e480 RDI: 88041fa22800
[11357.350783] RBP: 81c03dd0 R08: 03c5 R09: 0018
[11357.357958] R10: 0327 R11: 0390 R12: 88041fa22800
[11357.365135] R13: 81d689f8 R14:  R15: 81d689e0
[11357.372311]  
[11357.374453]  cpuidle_enter_state+0xf2/0x2e0
[11357.378678]  cpuidle_enter+0x17/0x20
[11357.382299]  call_cpuidle+0x23/0x40
[11357.385834]  do_idle+0xe8/0x190
[11357.389024]  cpu_startup_entry+0x1d/0x20
[11357.392993]  rest_init+0xd5/0xe0
[11357.396268]  start_kernel+0x3d7/0x3e4
[11357.399979]  x86_64_start_reservations+0x2a/0x2c
[11357.404641]  x86_64_start_kernel+0x178/0x18b
[11357.408959]  secondary_startup_64+0x9f/0x9f
[11357.413186]  ? secondary_startup_64+0x9f/0x9f
[11357.417589] Code: 41 89 c5 48 8b 53 60 44 89 e8 48 8d 14 c2 48 8b 12 48 85 d2 74 23 48 8b 3a f0 49 0f b3 04 24 48 85 ff 74 15 48 8b 87 e0 0 
[11357.436565] RIP: __dev_map_flush+0x58/0x90 RSP: 88041fa03de0
[11357.442613] CR2: 0210
[11357.445972] ---[ end trace f7ed232095169a98 ]---
[11357.450637] Kernel panic - not syncing: Fatal exception in interrupt
[11357.457038] Kernel Offset: disabled
[11357.460566] ---[ end Kernel panic - not syncing: Fatal exception in interrupt

(gdb) list 

Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-08 Thread Jesper Dangaard Brouer
On Sat, 08 Jul 2017 10:46:18 +0100 (WEST)
David Miller  wrote:

> From: John Fastabend 
> Date: Fri, 07 Jul 2017 10:48:36 -0700
> 
> > On 07/07/2017 10:34 AM, John Fastabend wrote:  
> >> This series adds two new XDP helper routines bpf_redirect() and
> >> bpf_redirect_map(). The first variant bpf_redirect() is meant
> >> to be used the same way it is currently being used by the cls_bpf
> >> classifier. An xdp packet will be redirected immediately when this
> >> is called.  
> > 
> > Also other than the typo in the title there ;) I'm going to CC
> > the driver maintainers working on XDP (makes for a long CC list but)
> > because we would want to try and get support in as many as possible in
> > the next merge window.
> > 
> > For this rev I just implemented on ixgbe because I wrote the
> > original XDP support there. I'll volunteer to do virtio as well.  
> 
> I went over this series a few times and it looks great to me.
> You didn't even give me some coding style issues to pick on :-)

We (Daniel, Andy and I) have been reviewing and improving on this
patchset the last couple of weeks ;-).  We had some stability issues,
which is why it wasn't published earlier. My plan is to test this
latest patchset again, Monday and Tuesday. I'll try to assess stability
and provide some performance numbers.

I've complained/warned about the danger of redirecting with XDP
without providing (1) a way to debug/see XDP redirects, and (2) a way
for interfaces to opt in to being redirect targets. (1) is solved by
patch 07/12 via a tracepoint. (2) is currently handled by only
forwarding to interfaces that have an XDP program loaded themselves;
this also comes from a practical need for NIC drivers to allocate
XDP-TX HW queues.

I'm not satisfied with the (UAPI) name for the new map,
"BPF_MAP_TYPE_DEVMAP", or the filename it is placed in,
"kernel/bpf/devmap.c", as we want to take advantage of compiler
inlining for the next redirect map types.  (1) The name doesn't
tell the user that this map is connected to the redirect_map call.
(2) We want to introduce other kinds of redirect maps (like redirect to
CPUs and sockets), and it would be good if they shared a common "text"
string.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-08 Thread David Miller
From: John Fastabend 
Date: Fri, 07 Jul 2017 10:48:36 -0700

> On 07/07/2017 10:34 AM, John Fastabend wrote:
>> This series adds two new XDP helper routines bpf_redirect() and
>> bpf_redirect_map(). The first variant bpf_redirect() is meant
>> to be used the same way it is currently being used by the cls_bpf
>> classifier. An xdp packet will be redirected immediately when this
>> is called.
> 
> Also other than the typo in the title there ;) I'm going to CC
> the driver maintainers working on XDP (makes for a long CC list but)
> because we would want to try and get support in as many as possible in
> the next merge window.
> 
> For this rev I just implemented on ixgbe because I wrote the
> original XDP support there. I'll volunteer to do virtio as well.

I went over this series a few times and it looks great to me.
You didn't even give me some coding style issues to pick on :-)



Re: [RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-07 Thread John Fastabend
On 07/07/2017 10:34 AM, John Fastabend wrote:
> This series adds two new XDP helper routines bpf_redirect() and
> bpf_redirect_map(). The first variant bpf_redirect() is meant
> to be used the same way it is currently being used by the cls_bpf
> classifier. An xdp packet will be redirected immediately when this
> is called.

Also other than the typo in the title there ;) I'm going to CC
the driver maintainers working on XDP (makes for a long CC list but)
because we would want to try and get support in as many as possible in
the next merge window.

For this rev I just implemented on ixgbe because I wrote the
original XDP support there. I'll volunteer to do virtio as well.

Thanks,
John


[RFC PATCH 00/12] Implement XDP bpf_redirect vairants

2017-07-07 Thread John Fastabend
This series adds two new XDP helper routines bpf_redirect() and
bpf_redirect_map(). The first variant bpf_redirect() is meant
to be used the same way it is currently being used by the cls_bpf
classifier. An xdp packet will be redirected immediately when this
is called.
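
As a rough illustration (not the actual sample program from this
series), the BPF side of this immediate variant could look something
like the sketch below; the section name, program name, and hard-coded
ifindex are made up:

/* Illustrative sketch only; see the xdp_redirect sample patch for the
 * real program.  The egress ifindex (3 here) is an arbitrary example. */
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

SEC("xdp_redirect")
int xdp_redirect_prog(struct xdp_md *ctx)
{
	/* Send every packet out of the device with ifindex 3,
	 * immediately, without any map lookup or batching. */
	return bpf_redirect(3, 0);
}

char _license[] SEC("license") = "GPL";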

The other variant bpf_redirect_map(map, key, flags) uses a new
map type called devmap. A devmap uses integers as keys and
net_devices as values. The user provides key/ifindex pairs to
update the map with new net_devices. This provides two benefits
over the normal variant 'bpf_redirect()'. First, the datapath
bpf program is abstracted away from using hard-coded ifindex
values, allowing a single bpf program to run in many different
environments. Second, and perhaps more important, the map enables
batching packet transmits. The map plus small driver changes
allows for batching all send requests across a NAPI poll loop.
This allows driver writers to optimize the driver xmit path
and only call expensive operations once for a batch of xdp_buffs.
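
A condensed sketch of the BPF side of the map variant (map name, size,
and key below are illustrative; the real program is the
xdp_redirect_map sample added later in the series):

/* Illustrative sketch only; see the xdp_redirect_map sample patch for
 * the real program. */
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

struct bpf_map_def SEC("maps") tx_port = {
	.type = BPF_MAP_TYPE_DEVMAP,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = 64,
};

SEC("xdp_redirect_map")
int xdp_redirect_map_prog(struct xdp_md *ctx)
{
	int key = 0;	/* slot populated from userspace with an ifindex */

	/* The driver batches the resulting transmits and flushes them
	 * once per NAPI poll. */
	return bpf_redirect_map(&tx_port, key, 0);
}

char _license[] SEC("license") = "GPL";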

The devmap was designed so that possible future work to support
multicast and broadcast can follow as later patches.

To see in more detail how to leverage the new helpers and
map from the userspace side, please review these two patches:

  xdp: sample program for new bpf_redirect helper
  xdp: bpf redirect with map sample program
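
For the userspace side, the essential step beyond loading and attaching
is a plain map update that binds a key to an egress ifindex. A minimal,
hypothetical fragment (loader and attach details omitted; the helper
comes from tools/lib/bpf, and the header path may differ):

/* Hypothetical fragment, not the actual sample program. */
#include <net/if.h>		/* if_nametoindex() */
#include "bpf/bpf.h"		/* bpf_map_update_elem() */

static int populate_devmap(int map_fd, const char *egress_dev)
{
	int key = 0;
	int out_ifindex = if_nametoindex(egress_dev);

	if (!out_ifindex)
		return -1;

	/* key -> ifindex; the kernel resolves the ifindex to a
	 * net_device when the devmap entry is updated. */
	return bpf_map_update_elem(map_fd, &key, &out_ifindex, 0);
}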

I'm sending this as an RFC now because (a) the merge window
is closed so it seems like a good time to get feedback and
(b) I'm currently doing a last round of testing to ensure all
the features are still working after the latest revisions as well
as doing a final review.

Any feedback would be welcome. Thanks to Jesper, Andy, and
Daniel for all their input, patches, fixes, testing, review, etc.
so far; it is very much appreciated!

Thanks,
John

---

John Fastabend (12):
  ixgbe: NULL xdp_tx rings on resource cleanup
  net: xdp: support xdp generic on virtual devices
  xdp: add bpf_redirect helper function
  xdp: sample program for new bpf_redirect helper
  net: implement XDP_REDIRECT for xdp generic
  ixgbe: add initial support for xdp redirect
  xdp: add trace event for xdp redirect
  bpf: add devmap, a map for storing net device references
  bpf: add bpf_redirect_map helper routine
  xdp: Add batching support to redirect map
  net: add notifier hooks for devmap bpf map
  xdp: bpf redirect with map sample program


 drivers/net/ethernet/intel/ixgbe/ixgbe_lib.c  |8 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   60 +++
 include/linux/bpf.h   |5 
 include/linux/bpf_types.h |3 
 include/linux/filter.h|   14 +
 include/linux/netdevice.h |   11 +
 include/trace/events/xdp.h|   31 ++
 include/uapi/linux/bpf.h  |   10 +
 kernel/bpf/Makefile   |3 
 kernel/bpf/devmap.c   |  431 +
 kernel/bpf/verifier.c |   14 +
 net/core/dev.c|  226 -
 net/core/filter.c |  172 ++
 samples/bpf/Makefile  |8 
 samples/bpf/bpf_helpers.h |2 
 samples/bpf/xdp_redirect_kern.c   |   81 +
 samples/bpf/xdp_redirect_map_kern.c   |   83 +
 samples/bpf/xdp_redirect_map_user.c   |  105 ++
 samples/bpf/xdp_redirect_user.c   |  102 ++
 tools/testing/selftests/bpf/test_maps.c   |   15 +
 20 files changed, 1283 insertions(+), 101 deletions(-)
 create mode 100644 kernel/bpf/devmap.c
 create mode 100644 samples/bpf/xdp_redirect_kern.c
 create mode 100644 samples/bpf/xdp_redirect_map_kern.c
 create mode 100644 samples/bpf/xdp_redirect_map_user.c
 create mode 100644 samples/bpf/xdp_redirect_user.c

--
Signature