We have nodes in our HPC system with two NICs: one is QDR InfiniBand and the second is a slower 10 Gbps card configured for both RoCE and TCP. Aggregate bandwidth tests with 20 cores on one node yelling at 20 cores on a second node (attached roce.ib.aggregate.pdf) show that, without tuning, the slower RoCE interface is used for small messages and QDR IB takes over for larger messages (red line). Setting btl_tcp_exclusivity to 1024 to match btl_openib_exclusivity adds another 20 Gbps of bidirectional bandwidth at the high end (green line), and I'm guessing that extra traffic is TCP rather than RoCE.
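For anyone who wants to try the same thing, the whole change is one runtime MCA parameter; something along these lines should reproduce it (the process counts, hostfile, and benchmark name below are just placeholders, not our actual setup):

    # Check the exclusivity values: btl_openib_exclusivity defaults to 1024
    # while btl_tcp_exclusivity defaults to a much lower value, so the TCP
    # BTL is normally dropped whenever openib can reach the peer.
    ompi_info --all | grep exclusivity

    # Raise the TCP BTL's exclusivity to match openib so the 10 Gbps NIC
    # carries TCP traffic alongside IB at the large-message end.
    mpirun -np 40 -npernode 20 -hostfile ./hosts \
           --mca btl self,sm,openib,tcp \
           --mca btl_tcp_exclusivity 1024 \
           ./aggregate_bw_test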
So by default the slower interface is being chosen at the low end, and I don't think there are tunable parameters that would let me make the QDR interface the default instead. Going forward we'll probably just disable RoCE on these nodes and go with QDR IB plus 10 Gbps TCP for large messages. I do think these issues will come up more often in the future, though: with RoCE latency now matching IB, there are more opportunities for channel bonding, or for letting multiple interfaces carry aggregate traffic at even smaller message sizes.
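For the record, the "disable RoCE" part doesn't have to happen in firmware; restricting each BTL to specific devices at run time should amount to the same thing. A minimal sketch, where the device and interface names are just examples rather than our actual ones:

    # Limit the openib BTL to the QDR HCA so the RoCE port is never selected,
    # and keep the TCP BTL on the 10 Gbps NIC for the large-message range.
    mpirun --mca btl self,sm,openib,tcp \
           --mca btl_openib_if_include mlx4_0:1 \
           --mca btl_tcp_if_include eth2 \
           --mca btl_tcp_exclusivity 1024 \
           ./aggregate_bw_test

That still doesn't give a way to prefer the QDR port for small messages while leaving RoCE available, which is the tunable I'd really like to have.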
Dave Turner
Work:  davetur...@ksu.edu   (785) 532-7791
       118 Nichols Hall, Manhattan KS 66502
Home:  drdavetur...@gmail.com   cell: (785) 770-5929

Attachments: roce.ib.aggregate.pdf, roce.ib.pdf, ompi_info.all