Re: [ewg] IPoIB to Ethernet routing performance
On Sat, 25 Dec 2010, Ali Ayoub wrote:

On Thu, Dec 9, 2010 at 3:46 PM, Christoph Lameter c...@linux.com wrote: On Mon, 6 Dec 2010, sebastien dugue wrote: The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Right, we did some evaluation on it and this was really a show stopper. Did the same thing here, came to the same conclusions.

May I ask why you need IPoIB when you have EoIB (the vNic driver)? Why is it a show stopper?

EoIB is immature for some use cases such as financial. No multicast support, for example: all multicast becomes broadcast. There is extensive support for multicast on IPoIB, and the various gotchas and hiccups that were there initially have mostly been worked out.
Re: [ewg] IPoIB to Ethernet routing performance
IPoIB is far easier to use and does not carry the additional management burden of vNICs. With vNICs you have to manage the MAC address mapping to the Ethernet gateway port. In some situations, such as when multiple gateways are used for resiliency, this can amount to a lot of separate vNICs to manage on each server. In a small configuration I had, we ended up with 6 vNICs per server to manage. On a large configuration this additional management would be a big burden.

My experience with IPoIB has always been very positive. All my existing socket programs have worked, even some esoteric ioctls I use for multicast and buffer management. Performance could always be better, but in my experience it's not great for the vNICs either. Latency in particular was very disappointing when I tested. If you want high performance you have to avoid TCP/IP.

-----Original Message-----
From: Jabe [mailto:jabe.chap...@shiftmail.org]
Sent: 27 December 2010 11:51
To: richard.crouc...@informatix-sol.com
Cc: Richard Croucher; 'Ali Ayoub'; 'Christoph Lameter'; 'linux-rdma'; 'sebastien dugue'; 'OF EWG'
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On 12/26/2010 11:57 AM, Richard Croucher wrote: The vNIC driver only works when you have Ethernet/InfiniBand hardware gateways in your environment. It is useful when you have external hosts to communicate with which do not have direct InfiniBand connectivity. IPoIB is still heavily used in these environments to provide TCP/IP connectivity within the InfiniBand fabric. The primary use case for vNICs is probably for virtualization servers, so that individual guests can be presented with a virtual Ethernet NIC and do not need to load any InfiniBand drivers. Only the hypervisor needs to have the InfiniBand software stack loaded. I've also applied vNICs in the financial services arena, for connectivity to external TCP/IP services, but there the IPoIB gateway function is arguably more useful. The whole vNIC arena is complicated by different, incompatible implementations from each of Qlogic and Mellanox. Richard

Richard, with your explanation I understand why vNIC / EoIB is used in the case you cite, but I don't understand why it is NOT used in the other cases (like Ali says). I can *guess* it's probably because with a virtual Ethernet fabric you have to do the whole IP stack in software, probably without even having the stateless offloads (so it would be a performance reason). Is that the reason? Thank you
Re: [ewg] IPoIB to Ethernet routing performance
On 12/28/2010 5:30 PM, Reeted wrote: You and Richard seem to have good experience of InfiniBand in virtualized environments. May I ask one thing? We were thinking about buying some Mellanox ConnectX-2 for use with SR-IOV (hardware virtualization for PCI bypass, supposedly supported by ConnectX-2) in KVM (which also supposedly supports SR-IOV and PCI bypass). Do you have info on whether this will work, in KVM or other hypervisors? I asked on the KVM mailing list but they have not tried this card (which is the only SR-IOV card among InfiniBand ones, so they have not tried InfiniBand).

We are working on enabling SR-IOV on ConnectX-2 cards. Once we have it working with KVM we will submit the patches to the linux-rdma list. Should be in a few months - but don't ask how many is a few :-) Tziporet
Re: [ewg] IPoIB to Ethernet routing performance
On 12/28/2010 01:06 AM, Ali Ayoub wrote: EoIB's primary use is not virtualization, although it can support it in better ways than other ULPs. FYI, today running a full/para-virtualized driver in the guest OS is also needed for IPoIB. Only when a platform-virtualization solution is available will the guest OS run the IB stack (for any ULP).

You and Richard seem to have good experience of InfiniBand in virtualized environments. May I ask one thing? We were thinking about buying some Mellanox ConnectX-2 for use with SR-IOV (hardware virtualization for PCI bypass, supposedly supported by ConnectX-2) in KVM (which also supposedly supports SR-IOV and PCI bypass). Do you have info on whether this will work, in KVM or other hypervisors? I asked on the KVM mailing list but they have not tried this card (which is the only SR-IOV card among InfiniBand ones, so they have not tried InfiniBand). We would be interested in both native InfiniBand and IPoIB support. Thank you.
Re: [ewg] IPoIB to Ethernet routing performance
On 12/26/2010 11:57 AM, Richard Croucher wrote: The vNIC driver only works when you have Ethernet/InfiniBand hardware gateways in your environment. It is useful when you have external hosts to communicate with which do not have direct InfiniBand connectivity. IPoIB is still heavily used in these environments to provide TCP/IP connectivity within the InfiniBand fabric. The primary use case for vNICs is probably for virtualization servers, so that individual guests can be presented with a virtual Ethernet NIC and do not need to load any InfiniBand drivers. Only the hypervisor needs to have the InfiniBand software stack loaded. I've also applied vNICs in the financial services arena, for connectivity to external TCP/IP services, but there the IPoIB gateway function is arguably more useful. The whole vNIC arena is complicated by different, incompatible implementations from each of Qlogic and Mellanox. Richard

Richard, with your explanation I understand why vNIC / EoIB is used in the case you cite, but I don't understand why it is NOT used in the other cases (like Ali says). I can *guess* it's probably because with a virtual Ethernet fabric you have to do the whole IP stack in software, probably without even having the stateless offloads (so it would be a performance reason). Is that the reason? Thank you
Re: [ewg] IPoIB to Ethernet routing performance
On Sun, Dec 26, 2010 at 2:57 AM, Richard Croucher rich...@informatix-sol.com wrote: The vNIC driver only works when you have Ethernet/InfiniBand hardware gateways in your environment. It is useful when you have external hosts to communicate with which do not have direct InfiniBand connectivity. IPoIB is still heavily used in these environments to provide TCP/IP connectivity within the InfiniBand fabric.

Once you have the BridgeX HW, the Mellanox vNic (EoIB) driver provides IB to EN connectivity, as well as IB to IB connectivity. Note that IB to IB connectivity doesn't involve the bridge HW (peer-to-peer communication), so any packet sent internally within the IB fabric doesn't reach the bridge HW. Today, EoIB requires the BridgeX HW; in the future, Mellanox may support a bridge-less mode where it can work without the bridge HW.

The primary use case for vNICs is probably for virtualization servers, so that individual guests can be presented with a virtual Ethernet NIC and do not need to load any InfiniBand drivers. Only the hypervisor needs to have the InfiniBand software stack loaded.

EoIB's primary use is not virtualization, although it can support it in better ways than other ULPs. FYI, today running a full/para-virtualized driver in the guest OS is also needed for IPoIB. Only when a platform-virtualization solution is available will the guest OS run the IB stack (for any ULP).

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Ali Ayoub
Sent: 26 December 2010 07:43
To: Christoph Lameter
Cc: linux-rdma; sebastien dugue; Richard Croucher; OF EWG
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On Thu, Dec 9, 2010 at 3:46 PM, Christoph Lameter c...@linux.com wrote: On Mon, 6 Dec 2010, sebastien dugue wrote: The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Right, we did some evaluation on it and this was really a show stopper. Did the same thing here, came to the same conclusions.

May I ask why you need IPoIB when you have EoIB (the vNic driver)? Why is it a show stopper?
Re: [ewg] IPoIB to Ethernet routing performance
The vNIC driver only works when you have Ethernet/InfiniBand hardware gateways in your environment. It is useful when you have external hosts to communicate with which do not have direct InfiniBand connectivity. IPoIB is still heavily used in these environments to provide TCP/IP connectivity within the InfiniBand fabric. The primary use case for vNICs is probably for virtualization servers, so that individual guests can be presented with a virtual Ethernet NIC and do not need to load any InfiniBand drivers. Only the hypervisor needs to have the InfiniBand software stack loaded. I've also applied vNICs in the financial services arena, for connectivity to external TCP/IP services, but there the IPoIB gateway function is arguably more useful. The whole vNIC arena is complicated by different, incompatible implementations from each of Qlogic and Mellanox. Richard

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Ali Ayoub
Sent: 26 December 2010 07:43
To: Christoph Lameter
Cc: linux-rdma; sebastien dugue; Richard Croucher; OF EWG
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On Thu, Dec 9, 2010 at 3:46 PM, Christoph Lameter c...@linux.com wrote: On Mon, 6 Dec 2010, sebastien dugue wrote: The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Right, we did some evaluation on it and this was really a show stopper. Did the same thing here, came to the same conclusions.

May I ask why you need IPoIB when you have EoIB (the vNic driver)? Why is it a show stopper?
Re: [ewg] IPoIB to Ethernet routing performance
On Thu, Dec 9, 2010 at 3:46 PM, Christoph Lameter c...@linux.com wrote: On Mon, 6 Dec 2010, sebastien dugue wrote: The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Right, we did some evaluation on it and this was really a show stopper. Did the same thing here, came to the same conclusions.

May I ask why you need IPoIB when you have EoIB (the vNic driver)? Why is it a show stopper?
Re: [ewg] IPoIB to Ethernet routing performance
Hi Matthieu,

On Thu, 16 Dec 2010 23:20:35 +0100 matthieu hautreux matthieu.hautr...@gmail.com wrote:

The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. ... Here are some numbers: - 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK. - 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.

Actually I am amazed you can get such a speed with IPoIB. Trying with NPtcp on my DDR InfiniBand I can only obtain about 4.6 Gbit/sec at the best packet size (that is 1/4 of the InfiniBand bandwidth) with this chip embedded in the mainboard: InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 Xeon (not Nehalem). That's with a 2.6.37 kernel and the vanilla ib_ipoib module. What's wrong with my setup? I always assumed that such a slow speed was due to the lack of offloading capabilities you get with Ethernet cards, but maybe I was wrong...?

Hi, I did the same kind of experiments as Sebastien and got results similar to yours, Jabe, with about ~4.6 Gbit/s. I am using a QDR HCA and ipoib in connected mode on the InfiniBand part of the testbed and 2x 10GbE Ethernet cards in bonding on the Ethernet side of the router. To get better results, I had to increase the MTU on the Ethernet side from 1500 to 9000. Indeed, due to TCP path MTU discovery, during routed exchanges the MTU used on the ipoib link for TCP messages was automatically set to the minimum MTU of 1500. This small yet very standard MTU value does not seem to be handled well by the ipoib_cm layer.

This may be due to the fact that the IB MTU is 2048. Every 1500-byte packet is padded to 2048 bytes before being sent over the wire, so you're losing roughly 25% bandwidth compared to an IPoIB MTU which is a multiple of 2048.

Is this issue already known and/or reported? It would be really interesting to understand why a small MTU value is such a problem for ipoib_cm. After a quick look at the code, it seems that ipoib packet processing is single threaded and that each IP packet is transmitted/received and processed as a single unit. If that appears to be the bottleneck, do you think that packet aggregation and/or processing parallelization could be feasible in a future ipoib module? A large proportion of Ethernet networks are configured with an MTU of 1500, and 10GbE cards currently employ parallelization strategies in their kernel modules to cope with this problem. It is clear that a bigger MTU is better, but it is not always achievable due to existing equipment and machines. IMHO, that is a real problem for InfiniBand/Ethernet interoperability.

Sebastien, concerning your poor performance of 9.3 Gbit/s when routing 2 streams from your InfiniBand client to your Ethernet server, what is the bonding mode on the Ethernet side during the test? Are you using balance-rr or LACP?

I did not use any Ethernet teaming, I only declared 2 aliases on the clients' ib0 and set the routing tables accordingly. Sébastien.

I got this kind of result with LACP, as only one link is really used during the transmissions, and this link depends on the layer 2 information of the peers involved in the communication (as long as you use the default xmit_hash_policy). HTH Regards, Matthieu

Also what application did you use for the benchmark? Thank you
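As a rough illustration of the xmit_hash_policy point above: the default "layer2" policy hashes only the source and destination MAC addresses, so the single router-to-server MAC pair always selects the same slave link of the bond. The sketch below is an approximation based on the Linux bonding documentation (the MAC values are made up, and the exact hash the kernel uses is an assumption, not verified kernel code):

```python
# Approximate model of the bonding "layer2" transmit hash (assumption based on
# the Linux bonding documentation): XOR the last byte of the source and
# destination MACs and reduce modulo the number of slave links.
def layer2_slave(src_mac: bytes, dst_mac: bytes, n_slaves: int) -> int:
    return (src_mac[5] ^ dst_mac[5]) % n_slaves

# Hypothetical MACs for the IB/Ethernet router and the Ethernet server: every
# packet of the routed streams carries the same MAC pair, so every packet is
# hashed onto the same one of the two bonded 10GbE links.
router_mac = bytes.fromhex("001122334455")
server_mac = bytes.fromhex("66778899aabb")
print(layer2_slave(router_mac, server_mac, 2))  # same slave index for all traffic
```

This is why balance-rr (round-robin) can spread a single flow across both links while LACP with the default policy cannot.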
Re: [ewg] IPoIB to Ethernet routing performance
This may be due to the fact that the IB MTU is 2048. Every 1500-byte packet is padded to 2048 bytes before being sent over the wire, so you're losing roughly 25% bandwidth compared to an IPoIB MTU which is a multiple of 2048.

This isn't true. IB packets are only padded to a multiple of 4 bytes. However, there's no point in using IPoIB connected mode to pass packets smaller than the IB MTU -- you might as well use datagram mode. - R.
Re: [ewg] IPoIB to Ethernet routing performance
2010/12/17 Roland Dreier rdre...@cisco.com

This may be due to the fact that the IB MTU is 2048. Every 1500-byte packet is padded to 2048 bytes before being sent over the wire, so you're losing roughly 25% bandwidth compared to an IPoIB MTU which is a multiple of 2048.

This isn't true. IB packets are only padded to a multiple of 4 bytes. However, there's no point in using IPoIB connected mode to pass packets smaller than the IB MTU -- you might as well use datagram mode.

We are using InfiniBand as an HPC cluster interconnect network and our compute nodes use this technology to exchange data over IPoIB with an MTU of 65520, do RDMA MPI communications and access Lustre filesystems. On top of that, some nodes are connected to both the IB interconnect and an external Ethernet network. These nodes act as IP routers and enable compute nodes to access site-centric resources (home directories using NFS, LDAP, ...). Compute nodes use IPoIB with a large MTU to contact the router nodes, so we get really good performance when we only communicate with the routers. However, as soon as the compute nodes communicate with the external Ethernet world, TCP path MTU discovery automatically reduces the IPoIB MTU to 1500, the Ethernet MTU, and we hit this 4.6 Gbit/s wall. Using datagram mode in our scenario is not possible as it would reduce the cluster's internal IPoIB performance. What we'd rather have is an ipoib_cm that handles small packets better.

Do you think that this limitation is an HCA hardware limitation (number of packets per second) or a software limitation (number of packets processed per second)? I would think that it is a software limitation, as better results are achieved in datagram mode with the same 1500-byte MTU. IPoIB in connected mode seems to use a single completion queue with a single MSI vector for all the queue pairs it creates to communicate. Perhaps multiplying the number of completion queues and MSI vectors could help to spread/parallelize the load and get better results. What is your feeling about that?

Regards, Matthieu
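To put the packets-per-second question in perspective, some back-of-the-envelope arithmetic (illustrative only, using the throughput figures mentioned in this thread) shows how quickly the per-packet rate grows as the MTU shrinks, which is why a single-threaded per-packet path suffers most at an MTU of 1500:

```python
# Packets per second needed to sustain a given throughput at a given MTU.
# Pure arithmetic for illustration; it ignores protocol header overhead.
def pps(gbit_per_s: float, mtu_bytes: int) -> float:
    return gbit_per_s * 1e9 / 8 / mtu_bytes

for mtu in (1500, 9000, 65520):
    print(f"MTU {mtu:>5}: {pps(20, mtu) / 1e6:.2f} Mpps to sustain 20 Gbit/s")

# MTU  1500: 1.67 Mpps
# MTU  9000: 0.28 Mpps
# MTU 65520: 0.04 Mpps
```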
Re: [ewg] IPoIB to Ethernet routing performance
Qlogic claim their QLE7340 has the lowest latency for MPI applications, but it is restricted to a single port. I've not carried out IPoIB testing of this card. There are plenty of published results for the Mellanox ConnectX cards, which tend to account for the majority of HCAs deployed. My opinion is that the Mellanox ConnectX has more capabilities on board and is probably the best all-rounder, but the Qlogic TrueScale lets certain apps get closer to the metal and therefore achieve lower latency. You really have to carry out your own testing, since it depends on what you consider important. Richard

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of Jabe
Sent: 13 December 2010 16:02
To: Jason Gunthorpe
Cc: linux-rdma; OF EWG
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On 12/06/2010 10:27 PM, Jason Gunthorpe wrote: On Mon, Dec 06, 2010 at 09:47:42PM +0100, Jabe wrote: Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 xeon (not nehalem). Newer Mellanox cards have most of the offloads you see for Ethernet so they get better performance. Plus Nehalem is just better at TCP in the first place..

Very interesting. Do you know if new Qlogic IB cards like the QLE7340 also have such offloads? In general, which brand would you recommend for IB and for IPoIB? Thank you
Re: [ewg] IPoIB to Ethernet routing performance
On Mon, 6 Dec 2010, sebastien dugue wrote: The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Right, we did some evaluation on it and this was really a show stopper. Did the same thing here, came to the same conclusions.

Qlogic also offer the 12400 Gateway. This has 6x 10GbE ports. However, like the Mellanox, I understand they only provide host vNIC support.

Really? I was hoping that they would have something worth looking at.
[ewg] IPoIB to Ethernet routing performance
Hi,

I know this might be off topic, but somebody may have already run into the same problem before. I'm trying to use a server as a router between an IB fabric and an Ethernet network. The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. I did some bandwidth measurements using iperf with the following setup:

+--------+          +--------+           +--------+
|        |          |        |  10G Eth  |        |
|        |  QDR IB  |        +-----------+        |
| client +----------+ Router |           | Server |
|        |          |        |  10G Eth  |        |
|        |          |        +-----------+        |
+--------+          +--------+           +--------+

However, the routing performance is far from what I would have expected. Here are some numbers:

- 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK.
- 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.
- routing 1 IPoIB stream to 1 Ethernet stream from client to server: 9.8 Gbits/sec. We manage to saturate the Ethernet link, looks good so far.
- routing 2 IPoIB streams to 2 Ethernet streams from client to server: 9.3 Gbits/sec. Argh, even less than when routing a single stream. I would have expected a bit more than this.

Has anybody ever tried to do some routing between an IB fabric and an Ethernet network and achieved some sensible bandwidth figures? Are there some known limitations in what I try to achieve?

Thanks,

Sébastien.
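For readers wanting to reproduce this kind of measurement, a minimal sketch of driving the two parallel client streams from a script might look like the following. The target addresses and the 60-second duration are made-up placeholders, not values taken from the tests above, and it assumes an `iperf -s` server is already listening behind the router:

```python
# Hypothetical driver for a two-stream test: one iperf TCP client per target
# address routed via the IB/Ethernet gateway.
import subprocess

targets = ["192.168.10.1", "192.168.11.1"]  # placeholder server addresses

procs = [
    subprocess.Popen(["iperf", "-c", host, "-t", "60", "-f", "g"])  # report in Gbit/s
    for host in targets
]
for p in procs:
    p.wait()  # each client prints its own bandwidth summary
```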
Re: [ewg] IPoIB to Ethernet routing performance
You may be able to improve this by doing some OS tuning. All this data should stay in kernel mode, but there are lots of bottlenecks in the TCP/IP stack that limit scalability. The IPoIB code has not been optimized for this use case. You don't mention what server, kernel and OFED distro you are running.

The best performance is achieved using InfiniBand/Ethernet hardware gateways. Most of these provide virtual Ethernet NICs to InfiniBand hosts, but the Voltaire 4036E does provide an IPoIB to Ethernet gateway capability. This is FPGA-based so provides much higher performance than you will achieve using a standard server solution.

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of sebastien dugue
Sent: 06 December 2010 10:25
To: OF EWG
Cc: linux-rdma
Subject: [ewg] IPoIB to Ethernet routing performance

Hi,

I know this might be off topic, but somebody may have already run into the same problem before. I'm trying to use a server as a router between an IB fabric and an Ethernet network. The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. I did some bandwidth measurements using iperf with the following setup:

+--------+          +--------+           +--------+
|        |          |        |  10G Eth  |        |
|        |  QDR IB  |        +-----------+        |
| client +----------+ Router |           | Server |
|        |          |        |  10G Eth  |        |
|        |          |        +-----------+        |
+--------+          +--------+           +--------+

However, the routing performance is far from what I would have expected. Here are some numbers:

- 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK.
- 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.
- routing 1 IPoIB stream to 1 Ethernet stream from client to server: 9.8 Gbits/sec. We manage to saturate the Ethernet link, looks good so far.
- routing 2 IPoIB streams to 2 Ethernet streams from client to server: 9.3 Gbits/sec. Argh, even less than when routing a single stream. I would have expected a bit more than this.

Has anybody ever tried to do some routing between an IB fabric and an Ethernet network and achieved some sensible bandwidth figures? Are there some known limitations in what I try to achieve?

Thanks,

Sébastien.
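On the OS tuning point, one common first step is to raise the TCP socket buffer limits so that individual streams can keep a fat pipe full. The snippet below is only an illustrative sketch of that idea; the values are arbitrary examples rather than recommendations from this thread, and it has to run as root:

```python
# Illustrative sketch: bump TCP buffer limits via /proc/sys (equivalent to
# using sysctl). The sysctl names are standard Linux knobs; the values are
# example figures only.
tunables = {
    "/proc/sys/net/core/rmem_max": "16777216",
    "/proc/sys/net/core/wmem_max": "16777216",
    "/proc/sys/net/ipv4/tcp_rmem": "4096 87380 16777216",
    "/proc/sys/net/ipv4/tcp_wmem": "4096 65536 16777216",
}

for path, value in tunables.items():
    with open(path, "w") as f:  # requires root privileges
        f.write(value)
```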
Re: [ewg] IPoIB to Ethernet routing performance
On Mon, 6 Dec 2010 10:49:58 - Richard Croucher rich...@informatix-sol.com wrote:

You may be able to improve this by doing some OS tuning.

Right, I tried a few things concerning the TCP/IP stack tuning but nothing really came out of it.

All this data should stay in kernel mode, but there are lots of bottlenecks in the TCP/IP stack that limit scalability.

That may be my problem in fact.

The IPoIB code has not been optimized for this use case.

I don't think IPoIB is the bottleneck in this case, as I managed to feed 2 IPoIB streams between the client and the router yielding about 40 Gbits/s bandwidth.

You don't mention what server, kernel and OFED distro you are running.

Right, sorry. The router is one of our 4-socket Nehalem-EX boxes with 2 IOHs, running OFED 1.5.2.

The best performance is achieved using InfiniBand/Ethernet hardware gateways. Most of these provide virtual Ethernet NICs to InfiniBand hosts, but the Voltaire 4036E does provide an IPoIB to Ethernet gateway capability. This is FPGA-based so provides much higher performance than you will achieve using a standard server solution.

That may be a solution indeed. Are there any real world figures out there concerning the 4036E performance?

Thanks Richard,

Sébastien.

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of sebastien dugue
Sent: 06 December 2010 10:25
To: OF EWG
Cc: linux-rdma
Subject: [ewg] IPoIB to Ethernet routing performance

Hi,

I know this might be off topic, but somebody may have already run into the same problem before. I'm trying to use a server as a router between an IB fabric and an Ethernet network. The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. I did some bandwidth measurements using iperf with the following setup:

+--------+          +--------+           +--------+
|        |          |        |  10G Eth  |        |
|        |  QDR IB  |        +-----------+        |
| client +----------+ Router |           | Server |
|        |          |        |  10G Eth  |        |
|        |          |        +-----------+        |
+--------+          +--------+           +--------+

However, the routing performance is far from what I would have expected. Here are some numbers:

- 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK.
- 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.
- routing 1 IPoIB stream to 1 Ethernet stream from client to server: 9.8 Gbits/sec. We manage to saturate the Ethernet link, looks good so far.
- routing 2 IPoIB streams to 2 Ethernet streams from client to server: 9.3 Gbits/sec. Argh, even less than when routing a single stream. I would have expected a bit more than this.

Has anybody ever tried to do some routing between an IB fabric and an Ethernet network and achieved some sensible bandwidth figures? Are there some known limitations in what I try to achieve?

Thanks,

Sébastien.
Re: [ewg] IPoIB to Ethernet routing performance
Unfortunately, the 4036E only has two 10G Ethernet ports, which will ultimately limit the throughput. The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap. Qlogic also offer the 12400 Gateway. This has 6x 10GbE ports. However, like the Mellanox, I understand they only provide host vNIC support.

I'll leave it to representatives from Voltaire, Mellanox and Qlogic to update us, particularly on support for an InfiniBand to Ethernet gateway for RoCEE. This is needed so that RDMA sessions can be run between InfiniBand and RoCEE connected hosts. I don't believe this will work over any of today's available products.

Richard

-----Original Message-----
From: sebastien dugue [mailto:sebastien.du...@bull.net]
Sent: 06 December 2010 11:40
To: Richard Croucher
Cc: 'OF EWG'; 'linux-rdma'
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On Mon, 6 Dec 2010 10:49:58 - Richard Croucher rich...@informatix-sol.com wrote:

You may be able to improve this by doing some OS tuning.

Right, I tried a few things concerning the TCP/IP stack tuning but nothing really came out of it.

All this data should stay in kernel mode, but there are lots of bottlenecks in the TCP/IP stack that limit scalability.

That may be my problem in fact.

The IPoIB code has not been optimized for this use case.

I don't think IPoIB is the bottleneck in this case, as I managed to feed 2 IPoIB streams between the client and the router yielding about 40 Gbits/s bandwidth.

You don't mention what server, kernel and OFED distro you are running.

Right, sorry. The router is one of our 4-socket Nehalem-EX boxes with 2 IOHs, running OFED 1.5.2.

The best performance is achieved using InfiniBand/Ethernet hardware gateways. Most of these provide virtual Ethernet NICs to InfiniBand hosts, but the Voltaire 4036E does provide an IPoIB to Ethernet gateway capability. This is FPGA-based so provides much higher performance than you will achieve using a standard server solution.

That may be a solution indeed. Are there any real world figures out there concerning the 4036E performance?

Thanks Richard,

Sébastien.

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of sebastien dugue
Sent: 06 December 2010 10:25
To: OF EWG
Cc: linux-rdma
Subject: [ewg] IPoIB to Ethernet routing performance

Hi,

I know this might be off topic, but somebody may have already run into the same problem before. I'm trying to use a server as a router between an IB fabric and an Ethernet network. The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. I did some bandwidth measurements using iperf with the following setup:

+--------+          +--------+           +--------+
|        |          |        |  10G Eth  |        |
|        |  QDR IB  |        +-----------+        |
| client +----------+ Router |           | Server |
|        |          |        |  10G Eth  |        |
|        |          |        +-----------+        |
+--------+          +--------+           +--------+

However, the routing performance is far from what I would have expected. Here are some numbers:

- 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK.
- 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.
- routing 1 IPoIB stream to 1 Ethernet stream from client to server: 9.8 Gbits/sec. We manage to saturate the Ethernet link, looks good so far.
- routing 2 IPoIB streams to 2 Ethernet streams from client to server: 9.3 Gbits/sec. Argh, even less than when routing a single stream. I would have expected a bit more than this.

Has anybody ever tried to do some routing between an IB fabric and an Ethernet network and achieved some sensible bandwidth figures? Are there some known limitations in what I try to achieve?

Thanks,

Sébastien.
Re: [ewg] IPoIB to Ethernet routing performance
On Mon, 6 Dec 2010 12:08:43 - Richard Croucher rich...@informatix-sol.com wrote:

Unfortunately, the 4036E only has two 10G Ethernet ports, which will ultimately limit the throughput.

I'll need to look into this option.

The Mellanox BridgeX looks a better hardware solution with 12x 10GbE ports, but when I tested this they could only provide vNIC functionality and would not commit to adding an IPoIB gateway to their roadmap.

Right, we did some evaluation on it and this was really a show stopper.

Thanks,

Sébastien.

Qlogic also offer the 12400 Gateway. This has 6x 10GbE ports. However, like the Mellanox, I understand they only provide host vNIC support. I'll leave it to representatives from Voltaire, Mellanox and Qlogic to update us, particularly on support for an InfiniBand to Ethernet gateway for RoCEE. This is needed so that RDMA sessions can be run between InfiniBand and RoCEE connected hosts. I don't believe this will work over any of today's available products. Richard

-----Original Message-----
From: sebastien dugue [mailto:sebastien.du...@bull.net]
Sent: 06 December 2010 11:40
To: Richard Croucher
Cc: 'OF EWG'; 'linux-rdma'
Subject: Re: [ewg] IPoIB to Ethernet routing performance

On Mon, 6 Dec 2010 10:49:58 - Richard Croucher rich...@informatix-sol.com wrote:

You may be able to improve this by doing some OS tuning.

Right, I tried a few things concerning the TCP/IP stack tuning but nothing really came out of it.

All this data should stay in kernel mode, but there are lots of bottlenecks in the TCP/IP stack that limit scalability.

That may be my problem in fact.

The IPoIB code has not been optimized for this use case.

I don't think IPoIB is the bottleneck in this case, as I managed to feed 2 IPoIB streams between the client and the router yielding about 40 Gbits/s bandwidth.

You don't mention what server, kernel and OFED distro you are running.

Right, sorry. The router is one of our 4-socket Nehalem-EX boxes with 2 IOHs, running OFED 1.5.2.

The best performance is achieved using InfiniBand/Ethernet hardware gateways. Most of these provide virtual Ethernet NICs to InfiniBand hosts, but the Voltaire 4036E does provide an IPoIB to Ethernet gateway capability. This is FPGA-based so provides much higher performance than you will achieve using a standard server solution.

That may be a solution indeed. Are there any real world figures out there concerning the 4036E performance?

Thanks Richard,

Sébastien.

-----Original Message-----
From: ewg-boun...@lists.openfabrics.org [mailto:ewg-boun...@lists.openfabrics.org] On Behalf Of sebastien dugue
Sent: 06 December 2010 10:25
To: OF EWG
Cc: linux-rdma
Subject: [ewg] IPoIB to Ethernet routing performance

Hi,

I know this might be off topic, but somebody may have already run into the same problem before. I'm trying to use a server as a router between an IB fabric and an Ethernet network. The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. I did some bandwidth measurements using iperf with the following setup:

+--------+          +--------+           +--------+
|        |          |        |  10G Eth  |        |
|        |  QDR IB  |        +-----------+        |
| client +----------+ Router |           | Server |
|        |          |        |  10G Eth  |        |
|        |          |        +-----------+        |
+--------+          +--------+           +--------+

However, the routing performance is far from what I would have expected. Here are some numbers:

- 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK.
- 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.
- routing 1 IPoIB stream to 1 Ethernet stream from client to server: 9.8 Gbits/sec. We manage to saturate the Ethernet link, looks good so far.
- routing 2 IPoIB streams to 2 Ethernet streams from client to server: 9.3 Gbits/sec. Argh, even less than when routing a single stream. I would have expected a bit more than this.

Has anybody ever tried to do some routing between an IB fabric and an Ethernet network and achieved some sensible bandwidth figures? Are there some known limitations in what I try to achieve?

Thanks,

Sébastien.
Re: [ewg] IPoIB to Ethernet routing performance
On Mon, Dec 06, 2010 at 09:47:42PM +0100, Jabe wrote: Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 xeon (not nehalem). Newer Mellanox cards have most of the offloads you see for ethernet so they get better performance. Plus Nehalem is just better at TCP in the first place.. Jason
Re: [ewg] IPoIB to Ethernet routing performance
The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. ... Here are some numbers: - 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK. - 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.

Actually I am amazed you can get such a speed with IPoIB. Trying with NPtcp on my DDR InfiniBand I can only obtain about 4.6 Gbit/sec at the best packet size (that is 1/4 of the InfiniBand bandwidth) with this chip embedded in the mainboard: InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 Xeon (not Nehalem). That's with a 2.6.37 kernel and the vanilla ib_ipoib module. What's wrong with my setup? I always assumed that such a slow speed was due to the lack of offloading capabilities you get with Ethernet cards, but maybe I was wrong...?

Also what application did you use for the benchmark? Thank you
Re: [ewg] IPoIB to Ethernet routing performance
Hi Jabe,

On Mon, 06 Dec 2010 21:47:42 +0100 Jabe jabe.chap...@shiftmail.org wrote:

The router is fitted with one ConnectX2 QDR HCA and one dual port Myricom 10G Ethernet adapter. ... Here are some numbers: - 1 IPoIB stream between client and router: 20 Gbits/sec. Looks OK. - 2 Ethernet streams between router and server: 19.5 Gbits/sec. Looks OK.

Actually I am amazed you can get such a speed with IPoIB. Trying with NPtcp on my DDR InfiniBand I can only obtain about 4.6 Gbit/sec at the best packet size (that is 1/4 of the InfiniBand bandwidth) with this chip embedded in the mainboard: InfiniBand: Mellanox Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 Xeon (not Nehalem). That's with a 2.6.37 kernel and the vanilla ib_ipoib module. What's wrong with my setup? I always assumed that such a slow speed was due to the lack of offloading capabilities you get with Ethernet cards, but maybe I was wrong...?

Also what application did you use for the benchmark?

I'm using iperf.

Sébastien.

Thank you
Re: [ewg] IPoIB to Ethernet routing performance
Hi Jason,

On Mon, 6 Dec 2010 14:27:59 -0700 Jason Gunthorpe jguntho...@obsidianresearch.com wrote: On Mon, Dec 06, 2010 at 09:47:42PM +0100, Jabe wrote: Technologies MT25204 [InfiniHost III Lx HCA]; and dual E5430 xeon (not nehalem). Newer Mellanox cards have most of the offloads you see for ethernet so they get better performance.

What kind of offload capabilities are you referring to for IPoIB?

Plus Nehalem is just better at TCP in the first place..

Well that depends on which Nehalem we're talking about. I've found that the EX performs more poorly than the EP, though I didn't dig enough to find out why.

Sébastien.