Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-16 Thread Markus Stockhausen



I only wonder what the effect on performance would be with an IB MTU
of 4K active; then full-sized packets would be pretty much exactly
split between the linear part and the fragment page.  How does GRO
cope with that?  I guess in the 2K IB MTU case there's no cost in
having all the data in the linear part of the skb.

 - R.


Hm,

thinking about the current situation, we can only make it better.
When receiving a packet on a 4K HCA we currently have to pull the
IP header into the linear part of the SKB during GRO handling. That
costs extra CPU cycles regardless of the packet size.

We can avoid this by splitting the packet at a well-defined position.
Your patch makes the cut at x+128 bytes. From my understanding the
exact position of the cut should have no performance impact by itself.
The three cases we have analyzed so far are:

- 2K fragment + header pull = fast
- header and some data in linear part + 1.9K fragment = faster
- only linear part + no fragment = fastest

Maybe I'm getting ahead of myself (I have no 4K MTU test environment),
but from the above I would conclude that larger packets will still
benefit from the adapted handling.

Markus




Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-16 Thread Roland Dreier
On Tue, Apr 16, 2013 at 9:22 AM, Markus Stockhausen
markus.stockhau...@gmx.de wrote:
 thinking about the current situation, we can only make it better.
 When receiving a packet on a 4K HCA we currently have to pull the
 IP header into the linear part of the SKB during GRO handling. That
 costs extra CPU cycles regardless of the packet size.

 We can avoid this by splitting the packet at a well-defined position.
 Your patch makes the cut at x+128 bytes. From my understanding the
 exact position of the cut should have no performance impact by itself.
 The three cases we have analyzed so far are:

 - 2K fragment + header pull = fast
 - header and some data in linear part + 1.9K fragment = faster
 - only linear part + no fragment = fastest

 Maybe I'm getting ahead of myself (I have no 4K MTU test environment),
 but from the above I would conclude that larger packets will still
 benefit from the adapted handling.

Yes, makes sense to me as well.  I sent a v4 patch that I'm pretty
sure should work well for you.

 - R.


Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-15 Thread Roland Dreier
On Tue, Apr 9, 2013 at 12:52 PM, Markus Stockhausen
markus.stockhau...@gmx.de wrote:
 IPOIB_UD_HEAD_SIZE  = IB_GRH_BYTES + IPOIB_ENCAP_LEN + 3072

 In my 2044 MTU case this brings the netperf and NFS throughput to
 the same levels as the dirty hack. Of course the constant then no
 longer describes just a head; it is effectively a new constant like
 IPOIB_UD_FIXED_SKB_SIZE.

After thinking about this, I'm pretty convinced that this is probably
a good approach.  However it seems that making IPOIB_UD_HEAD_SIZE be
IB_GRH_BYTES + 2048 should be enough, since that's the size of the
largest receive buffer needed with a 2K IB MTU.
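
For reference, a sketch of that sizing (assuming the usual constant
values from the upstream headers, IB_GRH_BYTES = 40 and
IPOIB_ENCAP_LEN = 4):

#include <rdma/ib_verbs.h>	/* IB_GRH_BYTES */

/* With a 2K IB MTU the HCA writes at most IB_GRH_BYTES + 2048 bytes
 * into the receive buffer: 40-byte GRH + 4-byte IPoIB encap + up to
 * 2044 bytes of IP packet.  A linear buffer of that size therefore
 * always holds the whole datagram. */
enum {
	IPOIB_UD_HEAD_SIZE = IB_GRH_BYTES + 2048,
};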

I only wonder what the effect on performance would be with an IB MTU
of 4K active; then full-sized packets would be pretty much exactly
split between the linear part and the fragment page.  How does GRO
cope with that?  I guess in the 2K IB MTU case there's no cost in
having all the data in the linear part of the skb.
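
To put numbers on "pretty much exactly split" (assuming the usual
4092-byte IPoIB MTU at a 4K IB MTU and the sizing above):

  total receive size = 40 (GRH) + 4 (encap) + 4092 (IP packet) = 4136 bytes
  linear part        = IB_GRH_BYTES + 2048                     = 2088 bytes
  fragment part      = 4136 - 2088                             = 2048 bytes

So once the GRH is stripped, a full-sized packet ends up with 2048
bytes in the linear buffer and 2048 bytes in the fragment page.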

 - R.


RE: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-09 Thread Luick, Dean
 From: Roland Dreier rol...@purestorage.com
+	if (wc->byte_len < IPOIB_UD_HEAD_SIZE) {
+		page = priv->rx_ring[wr_id].page;
+		priv->rx_ring[wr_id].page = NULL;
+	} else {
+		page = NULL;
+	}
+
 	/*
 	 * If we can't allocate a new RX buffer, dump
 	 * this packet and reuse the old buffer.
 	 */
 	if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id))) {
 		++dev->stats.rx_dropped;
+		priv->rx_ring[wr_id].page = page;
 		goto repost;
 	}


Can you go through the else branch of the first if (so that page is NULL)
and then enter the second if? If so, isn't the page still stored in the
rx_ring entry lost?
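
For illustration only (not necessarily what the follow-up patch does),
one way to avoid clobbering a still-valid page pointer in that error
path would be to restore it only when it was actually taken out of the
ring entry:

	if (unlikely(!ipoib_alloc_rx_skb(dev, wr_id))) {
		++dev->stats.rx_dropped;
		/* only put the page back if we took it above; otherwise
		 * 'page' is NULL and the pointer still stored in the
		 * ring entry must not be overwritten */
		if (page)
			priv->rx_ring[wr_id].page = page;
		goto repost;
	}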


Dean


Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-09 Thread Roland Dreier
On Tue, Apr 9, 2013 at 6:13 AM, Luick, Dean dean.lu...@intel.com wrote:
 Can you go through the else branch of the first if (so that page is NULL)
 and then enter the second if? If so, isn't the page still stored in the
 rx_ring entry lost?

Thanks, good catch.  I'll fix that up.


Re: [RFC/PATCH v3] IPoIB: Leave space in skb linear buffer for IP headers

2013-04-09 Thread Markus Stockhausen

 
-	IPOIB_UD_HEAD_SIZE	= IB_GRH_BYTES + IPOIB_ENCAP_LEN,
+	/* add 128 bytes of tailroom for IP/TCP headers */
+	IPOIB_UD_HEAD_SIZE	= IB_GRH_BYTES + IPOIB_ENCAP_LEN + 128,

Hello,

version 3 of the patch finally works. I can see the performance
gains in the benchmarks, but I cannot feel them in real life. Here
are the results from my testbed:

Test 1:
netperf/netserver message size 16K

kernel 3.5 default     :  5.1 GBit/s
kernel 3.5 + patch v3  :  7.7 GBit/s
kernel 3.5 + max MTU 3K: 10.8 GBit/s

Test 2:
Disk write performance
VM with disk mounted on IB async NFS server

block size  | default  | patch v3 | max MTU 3K
------------+----------+----------+-----------
      1 KB  |  10 MB/s |  10 MB/s |  10 MB/s
      2 KB  |  20 MB/s |  21 MB/s |  20 MB/s
      4 KB  |  40 MB/s |  40 MB/s |  43 MB/s
      8 KB  |  68 MB/s |  70 MB/s |  78 MB/s
     16 KB  | 105 MB/s | 105 MB/s | 120 MB/s
     32 KB  | 150 MB/s | 150 MB/s | 170 MB/s
     64 KB  | 200 MB/s | 210 MB/s | 260 MB/s
    128 KB  | 270 MB/s | 290 MB/s | 400 MB/s
    256 KB  | 300 MB/s | 310 MB/s | 430 MB/s
    512 KB  | 305 MB/s | 320 MB/s | 470 MB/s
   1024 KB  | 310 MB/s | 325 MB/s | 500 MB/s
   2048 KB  | 310 MB/s | 325 MB/s | 510 MB/s
   4096 KB  | 370 MB/s | 325 MB/s | 510 MB/s
   8192 KB  | 400 MB/s | 325 MB/s | 520 MB/s


As you can see, netperf throughput increases while NFS barely notices
the optimization. Maybe NFS does not work well with fragmented SKBs.
The max MTU 3K values are once again forced through a hack inside
ipoib_main.c.

Out of curiosity I changed the split in your v3 patch from a small
head with a large fragment to a large head with a small fragment by
changing this line:

IPOIB_UD_HEAD_SIZE  = IB_GRH_BYTES + IPOIB_ENCAP_LEN + 3072

In my 2044 MTU case this brings the netperf and NFS throughput to
the same levels as the dirty hack. Of course the constant then no
longer describes just a head; it is effectively a new constant like
IPOIB_UD_FIXED_SKB_SIZE.
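
Spelled out, that renaming would look roughly like this (just a sketch
of the naming idea, not a posted patch):

/* Sketch only: with the head grown to 3072 bytes the constant no
 * longer describes a head in front of a fragment; a 2044-byte IPoIB
 * MTU packet always fits completely into the linear part. */
enum {
	IPOIB_UD_FIXED_SKB_SIZE = IB_GRH_BYTES + IPOIB_ENCAP_LEN + 3072,
};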

I guess a 4K MTU will not see any further gains from this, but
avoiding the skb_pull calls should still improve speed there as well.
Maybe a final adaptation could put the cherry on the cake.

Markus

