On Sep 5, 2012, at 11:51 AM, Christoph Lameter wrote:

> On Wed, 29 Aug 2012, Atchley, Scott wrote:
>
>> I am benchmarking a sockets-based application and I want a sanity check
>> on IPoIB performance expectations when using connected mode (65520 MTU).
>> I am using the tuning tips in Documentation/infiniband/ipoib.txt. The
>> machines have Mellanox QDR cards (see below for the verbose ibv_devinfo
>> output). I am using a 2.6.36 kernel. The hosts have single-socket Intel
>> E5520 processors (4 cores with hyper-threading on) at 2.27 GHz.
>>
>> I am using netperf's TCP_STREAM and binding cores. The best I have seen
>> is ~13 Gbps. Is this the best I can expect from these cards?
>
> Sounds about right. This is not a hardware limitation but a limitation
> of the socket I/O layer / PCI-E bus. The cards generally can process
> more data than the PCI bus and the OS can handle.
>
> PCI-E on PCI 2.0 should give you up to about 2.3 Gbytes/sec with these
> NICs. So there is likely something that the network layer does to you
> that limits the bandwidth.
First, thanks for the reply.

I am not sure where you are getting the 2.3 GB/s value. When using verbs
natively, I can get ~3.4 GB/s. I am assuming that these HCAs lack certain
TCP offloads that might allow higher socket performance. Ethtool reports:

# ethtool -k ib0
Offload parameters for ib0:
rx-checksumming: off
tx-checksumming: off
scatter-gather: off
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
generic-receive-offload: off

There is no checksum support, which I would expect to lower performance.
Since checksums need to be calculated on the host, I would expect faster
processors to help performance some.

So basically, am I in the ballpark given this hardware?

>> What should I expect as a max for IPoIB with FDR cards?
>
> More of the same. You may want to
>
> A) increase the block size handled by the socket layer

Do you mean altering sysctl with something like:

# increase TCP max buffer size settable using setsockopt()
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# increase the length of the processor input queue
net.core.netdev_max_backlog = 30000

or increasing the SO_SNDBUF and SO_RCVBUF sizes, or something else?

> B) Increase the bandwidth by using PCI-E 3 or more PCI-E lanes.
>
> C) Bypass the socket layer. Look at Sean's rsockets layer f.e.

We actually want to test the socket stack and not bypass it.

Thanks again!

Scott

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html