We have nodes in our HPC system that have 2 NICs,
one being QDR IB and the second being a slower 10 Gbps card
configured for both RoCE and TCP.  Aggregate bandwidth
tests with 20 cores on one node yelling at 20 cores on a second
node (attached roce.ib.aggregate.pdf) show that without tuning,
the slower RoCE interface is used for small messages
and QDR IB is used for larger messages (red line).  Setting
btl_tcp_exclusivity to 1024 to match btl_openib_exclusivity
adds another 20 Gbps of bidirectional bandwidth at the high end (green
line),
and I'm guessing this is TCP traffic rather than RoCE.
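
For reference, that exclusivity change is just an MCA parameter on the
mpirun command line, something like the following (the process count,
hostfile, and benchmark name are only illustrative for our runs):

    mpirun --mca btl_tcp_exclusivity 1024 \
           -np 40 -hostfile hosts ./aggregate_bw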

     So by default the slower interface is being chosen on the low end, and
I don't think there are tunable parameters that would let me make the
QDR interface the default.  Going forward we'll probably just disable
RoCE on these nodes and go with QDR IB plus 10 Gbps TCP for large messages.
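
One way to get that effect without touching the hardware config would be
to restrict each BTL to a specific device, roughly along these lines (the
device name mlx4_0 and interface name eth2 are only placeholders, not
necessarily what these nodes use):

    mpirun --mca btl openib,tcp,sm,self \
           --mca btl_openib_if_include mlx4_0 \
           --mca btl_tcp_if_include eth2 \
           -np 40 -hostfile hosts ./aggregate_bw

That should keep the openib BTL on the QDR HCA and the tcp BTL on the
10 Gbps NIC, which amounts to disabling RoCE for MPI traffic while still
leaving the 10 Gbps card available for large-message TCP.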


      However, I do think these issues will come up more often in the future.
With the low latency of RoCE matching IB, there are more opportunities
to do channel bonding or to let multiple interfaces carry aggregate traffic
even at smaller message sizes.

                Dave Turner

-- 
Work:     davetur...@ksu.edu     (785) 532-7791
             118 Nichols Hall, Manhattan KS  66502
Home:    drdavetur...@gmail.com
              cell: (785) 770-5929

Attachment: roce.ib.aggregate.pdf
Description: Adobe PDF document

Attachment: roce.ib.pdf
Description: Adobe PDF document

Attachment: ompi_info.all
Description: Binary data
