On Mon, Jan 23, 2017 at 11:14 AM, Jesper Dangaard Brouer <bro...@redhat.com> wrote:
> On Sat, 21 Jan 2017 11:26:49 -0800
> Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>> > My previous measurements show approx 20% speedup on a UDP test with
>> > delivery to remote CPU.
>> >
>> I find this a bit strange. When you have time (ie not while driving your
>> car or during week end) please give more details, for example on message
>> size.
>
> I tested this with both 64 bytes and 1500 bytes.  After I moved to 50G
> and 100G testing then I don't need to use 64 bytes packets to provoke
> the bottlenecks in the stack ;-)
>

Exactly! For XDP-like use cases, the page cache may be an unnecessary
optimization. But when you start testing typical TCP use cases over a
50/100G link, you will need more buffers (pages) to hold the traffic for
longer periods, and you will hit that bottleneck.

>> Was it before skb_condense() was added ?
>
> I tested this just before skb_condense() was added.  BUT
> skb_condense() does not get activated when using mlx5, because it uses
> build_skb(), i.e. not using frags.
>

Well, we can always replace build_skb with alloc_skb +
memcpy(skb->data, headlen) + add_skb_frag(payload). Is it worth it?
And is it healthy that both skb->data and skb_shinfo(skb)->frags[i]
point to the same page?

> For people that don't realize this:
> Eric's optimization in skb_condense() is about trading remote CPU
> atomic refcnt (put_page) for copy + local CPU refcnt dec.
>
> My measurements show cycles cost local=31 vs. remote=208, thus an
> estimated saving of around 177 cycles.  Which is spent on calling a fairly
> complex function __pskb_pull_tail(), and only works for more complex
> SKBs with frags.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
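
For reference, the cycle arithmetic quoted above can be written out as a
small userspace sketch. This is only a back-of-the-envelope model, not
kernel code: the 31 and 208 cycle figures are Jesper's measurements from
this thread, while the function names and the copy_cycles parameter are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Cycle costs quoted in the thread (Jesper's measurements). */
enum {
    LOCAL_REFCNT_CYCLES  = 31,   /* refcnt dec on the local CPU */
    REMOTE_REFCNT_CYCLES = 208,  /* atomic put_page() hitting a remote CPU */
};

/*
 * Cycles that can be spent on the copy before trading the remote
 * put_page() for "copy + local refcnt dec" stops being a win.
 */
static int condense_budget_cycles(int local_cycles, int remote_cycles)
{
    assert(remote_cycles >= local_cycles);
    return remote_cycles - local_cycles;
}

/*
 * copy_cycles is a hypothetical input (e.g. the cost of the
 * __pskb_pull_tail() call); it was not measured in this thread.
 */
static bool condense_pays_off(int copy_cycles)
{
    return copy_cycles < condense_budget_cycles(LOCAL_REFCNT_CYCLES,
                                                REMOTE_REFCNT_CYCLES);
}
```

With the quoted numbers the budget is 208 - 31 = 177 cycles, so under
this model the copy only pays off if it costs less than that.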