On Tue, 28 Aug 2018 14:44:35 +0200
Björn Töpel <bjorn.to...@gmail.com> wrote:

> From: Björn Töpel <bjorn.to...@intel.com>
> 
> The -c/--copy and -z/--zero-copy flags enforce either copy or
> zero-copy mode.

Nice, thanks for adding this.  It allows me to quickly test the
difference between normal-copy vs zero-copy modes.
(Kernel bpf-next without RETPOLINE).

AF_XDP RX-drop:
 Normal-copy mode: rx 13,070,318 pps - 76.5 ns
 Zero-copy   mode: rx 26,132,328 pps - 38.3 ns

Compare to XDP_DROP:  34,251,464 pps - 29.2 ns
   XDP_DROP + read :  30,756,664 pps - 32.5 ns

The normal-copy mode is surprisingly fast (and it works for every
driver implementing the regular XDP_REDIRECT action).  It is still
faster to do in-kernel XDP_DROP than AF_XDP zero-copy mode dropping,
which was expected given that frames travel to a remote CPU before
being returned (I don't think the remote CPU reads the payload?).  The
gap in nanoseconds is actually quite small, so I'm impressed by the
SPSC-queue implementation working across these CPUs.
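
The per-packet times quoted for RX-drop follow from the packet rate as
1e9 / pps. A minimal Python sketch of that arithmetic, using only the
rates reported above (the helper name ns_per_packet is purely
illustrative and not part of any tool):

```python
def ns_per_packet(pps):
    """Convert a packet rate (packets per second) to nanoseconds per packet."""
    return 1e9 / pps

# RX-drop rates as reported above, in packets per second
rx_drop = {
    "AF_XDP normal-copy": 13_070_318,
    "AF_XDP zero-copy": 26_132_328,
    "XDP_DROP": 34_251_464,
    "XDP_DROP + read": 30_756_664,
}

for name, pps in rx_drop.items():
    print(f"{name:20s} {ns_per_packet(pps):6.1f} ns")

# Per-packet gap between AF_XDP zero-copy dropping and in-kernel XDP_DROP
gap = ns_per_packet(rx_drop["AF_XDP zero-copy"]) - ns_per_packet(rx_drop["XDP_DROP"])
print(f"zero-copy vs XDP_DROP gap: {gap:.1f} ns")
```

With these inputs the gap between zero-copy dropping and XDP_DROP comes
out at roughly 9 ns per packet.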


AF_XDP layer2-fwd:
 Normal-copy mode: rx  3,200,885   tx  3,200,892
 Zero-copy   mode: rx 17,026,300   tx 17,026,269

Compare to XDP_TX: rx 14,529,079   tx 14,529,850  - 68.82 ns
     XDP_REDIRECT: rx 13,235,785   tx 13,235,784  - 75.55 ns

The copy-mode is slow because it allocates SKBs internally (I do
wonder if we could speed it up by using ndo_xdp_xmit + disable-BH).
More interesting is that zero-copy is faster than XDP_TX and
XDP_REDIRECT. I think the speedup comes from avoiding some DMA mapping
calls with ZC.

Side-note: XDP_TX vs. REDIRECT: 75.55 - 68.82 = 6.73 ns.  The cost of
going through the xdp_do_redirect_map core is actually quite small :-)
(I have some micro optimizations that should help ~2ns).
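
The layer2-fwd numbers can be put on the same per-packet scale. The
sketch below assumes the quoted ns figures were derived from the tx
counters (68.82 ns matches the XDP_TX tx rate; the rx rate would round
to 68.83 ns). The helper name is illustrative only:

```python
def ns_per_packet(pps):
    """Convert packets per second to nanoseconds per packet."""
    return 1e9 / pps

# tx counters from the layer2-fwd results above (assumed basis of the ns figures)
l2fwd_tx = {
    "AF_XDP normal-copy": 3_200_892,
    "AF_XDP zero-copy": 17_026_269,
    "XDP_TX": 14_529_850,
    "XDP_REDIRECT": 13_235_784,
}

l2fwd_ns = {name: ns_per_packet(pps) for name, pps in l2fwd_tx.items()}
for name, ns in l2fwd_ns.items():
    print(f"{name:20s} {ns:7.2f} ns")

# Extra per-packet cost of XDP_REDIRECT compared to XDP_TX
redirect_overhead = l2fwd_ns["XDP_REDIRECT"] - l2fwd_ns["XDP_TX"]
print(f"XDP_REDIRECT - XDP_TX: {redirect_overhead:.2f} ns")
```

Under that assumption zero-copy forwarding works out to roughly 58.7 ns
per packet and normal-copy to roughly 312 ns per packet.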


AF_XDP TX-only:
 Normal-copy mode: tx  2,853,461 pps
 Zero-copy   mode: tx 22,255,311 pps

(There is no XDP mode that does TX-only to compare against)
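
To summarize the copy vs zero-copy difference across the three
benchmarks, the ratio of the reported rates can be computed directly.
This is a small sketch over the numbers above (rx rates are used for
RX-drop and layer2-fwd; the dictionary layout is just for illustration):

```python
# (normal-copy pps, zero-copy pps) as reported above
results = {
    "RX-drop": (13_070_318, 26_132_328),
    "layer2-fwd (rx)": (3_200_885, 17_026_300),
    "TX-only": (2_853_461, 22_255_311),
}

# Zero-copy speedup factor relative to normal-copy for each benchmark
speedup = {name: zc / copy for name, (copy, zc) in results.items()}

for name, factor in speedup.items():
    print(f"{name:16s} zero-copy is {factor:.2f}x normal-copy")
```

The ratios are about 2.0x for RX-drop, 5.3x for layer2-fwd and 7.8x for
TX-only, so the benefit of zero-copy is largest on the TX path.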

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
