On Tue, 28 Aug 2018 14:44:35 +0200 Björn Töpel <bjorn.to...@gmail.com> wrote:
> From: Björn Töpel <bjorn.to...@intel.com>
>
> The -c/--copy -z/--zero-copy flags enforces either copy or zero-copy
> mode.

Nice, thanks for adding this.  It allows me to quickly test the
difference between normal-copy and zero-copy modes.
(Kernel bpf-next without RETPOLINE.)

AF_XDP RX-drop:
 Normal-copy mode: rx 13,070,318 pps - 76.5 ns
 Zero-copy   mode: rx 26,132,328 pps - 38.3 ns

Compare to XDP_DROP: 34,251,464 pps - 29.2 ns
 XDP_DROP + read   : 30,756,664 pps - 32.5 ns

The normal-copy mode is surprisingly fast (and it works for every
driver implementing the regular XDP_REDIRECT action).  It is still
faster to do in-kernel XDP_DROP than AF_XDP zero-copy mode dropping,
which was expected given that frames travel to a remote CPU before
being returned (don't think remote CPU reads payload?).  The gap in
nanosec is actually quite small, thus I'm impressed by the SPSC-queue
implementation working across these CPUs.

AF_XDP layer2-fwd:
 Normal-copy mode: rx  3,200,885  tx  3,200,892
 Zero-copy   mode: rx 17,026,300  tx 17,026,269

Compare to XDP_TX: rx 14,529,079  tx 14,529,850 - 68.82 ns
     XDP_REDIRECT: rx 13,235,785  tx 13,235,784 - 75.55 ns

The copy-mode is slow because it allocates SKBs internally (I do
wonder if we could speed it up by using ndo_xdp_xmit + disable-BH).
More interesting is that the zero-copy mode is faster than XDP_TX and
XDP_REDIRECT.  I think the speedup comes from avoiding some DMA
mapping calls with ZC.

Side-note: XDP_TX vs. REDIRECT: 75.55 - 68.82 = 6.73 ns.  The cost of
going through the xdp_do_redirect_map core is actually quite small :-)
(I have some micro optimizations that should help ~2ns).

AF_XDP TX-only:
 Normal-copy mode: tx  2,853,461 pps
 Zero-copy   mode: tx 22,255,311 pps

(There is no XDP mode that does TX to compare against)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
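
For readers checking the numbers: the ns figures quoted next to the pps rates in this mail appear to be the average per-packet time, i.e. 1e9 / pps. The following is only a minimal sketch of that arithmetic in Python; the helper name `ns_per_packet` and the dictionary layout are made up for illustration and come from neither the mail nor any kernel tool.

```python
def ns_per_packet(pps: int) -> float:
    """Average time spent per packet, in nanoseconds, at a given packet rate."""
    return 1e9 / pps


# Packet rates (packets per second) copied from the RX-drop figures in the mail.
rx_drop_pps = {
    "AF_XDP normal-copy": 13_070_318,
    "AF_XDP zero-copy": 26_132_328,
    "XDP_DROP": 34_251_464,
    "XDP_DROP + read": 30_756_664,
}

for name, pps in rx_drop_pps.items():
    print(f"{name:20s} {pps:>12,d} pps  {ns_per_packet(pps):6.1f} ns")

# Per-packet gap between AF_XDP zero-copy dropping and in-kernel XDP_DROP.
gap_ns = ns_per_packet(rx_drop_pps["AF_XDP zero-copy"]) - ns_per_packet(
    rx_drop_pps["XDP_DROP"]
)
print(f"zero-copy vs XDP_DROP gap: {gap_ns:.1f} ns")
```

Comparing nanoseconds per packet rather than pps makes the differences additive, which is why the XDP_TX vs. XDP_REDIRECT cost can be stated as a simple subtraction.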