On Sat, 21 Jan 2017 11:26:49 -0800 Eric Dumazet <eric.duma...@gmail.com> wrote:
> > My previous measurements show approx 20% speedup on a UDP test with
> > delivery to remote CPU.
>
> I find this a bit strange. When you have time (ie not while driving your
> car or during week end) please give more details, for example on message
> size.

I tested this with both 64 bytes and 1500 bytes. Since I moved to 50G
and 100G testing, I no longer need to use 64-byte packets to provoke the
bottlenecks in the stack ;-)

> Was it before skb_condense() was added ?

I tested this just before skb_condense() was added. BUT skb_condense()
does not get activated when using mlx5, because mlx5 uses build_skb(),
i.e. it is not using frags.

For people that don't realize this: Eric's optimization in
skb_condense() is about trading a remote CPU atomic refcnt (put_page)
for a copy + local CPU refcnt dec. My measurements show a cycle cost of
local=31 vs. remote=208, thus an estimated saving of around 177 cycles.
That saving is then spent on calling a fairly complex function,
__pskb_pull_tail(), and it only works for more complex SKBs with frags.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer