Dave,
Based on your ompi_info.all, the following bandwidth values are reported on
your system:
MCA btl: parameter "btl_openib_bandwidth" (current value: "4",
         data source: default, level: 5 tuner/detail, type: unsigned)
         Approximate maximum bandwidth of interconnect (0 = auto-detect
         value at run-time [not supported in all BTL modules],
         >= 1 = bandwidth in Mbps)
MCA btl: parameter "btl_tcp_bandwidth" (current value: "100",
         data source: default, level: 5 tuner/detail, type: unsigned)
         Approximate maximum bandwidth of interconnect (0 = auto-detect
         value at run-time [not supported in all BTL modules],
         >= 1 = bandwidth in Mbps)
This basically means that on your system the default values for these
parameters are wrong: according to them, your TCP network is much faster
than the IB network, which explains the somewhat unexpected decision OMPI
makes.
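If you want to double-check what your build currently reports, you can
query just these two parameters with ompi_info (the --level option below
assumes Open MPI 1.7 or newer, where MCA parameter levels are supported):
  ompi_info --param btl openib --level 5 | grep bandwidth
  ompi_info --param btl tcp --level 5 | grep bandwidth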
As a possible solution, I suggest you set these bandwidth values to
something more meaningful (directly in your MCA parameters file). For
example,
btl_openib_bandwidth = 40000
btl_tcp_bandwidth = 10000
would make more sense based on your HPC system description.
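To sketch where these settings could live (exact paths depend on your
installation, and ./your_app below is just a placeholder): put the two
lines above in the MCA parameters file, e.g. $HOME/.openmpi/mca-params.conf
or <prefix>/etc/openmpi-mca-params.conf, or pass them on the mpirun command
line for a quick test:
  mpirun --mca btl_openib_bandwidth 40000 --mca btl_tcp_bandwidth 10000 ./your_app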
George.
On Fri, Feb 6, 2015 at 5:37 PM, Dave Turner <[email protected]> wrote:
>
> We have nodes in our HPC system that have two NICs,
> one being QDR IB and the second being a slower 10 Gbps card
> configured for both RoCE and TCP. Aggregate bandwidth
> tests with 20 cores on one node yelling at 20 cores on a second
> node (attached roce.ib.aggregate.pdf) show that without tuning
> the slower RoCE interface is being used for small messages
> then QDR IB is used for larger messages (red line). Tuning
> the tcp_exclusivity to 1024 to match the openib_exclusivity
> adds another 20 Gbps of bidirectional bandwidth to the high end (green
> line),
> and I'm guessing this is TCP traffic and not RoCE.
>
> So by default the slower interface is being chosen on the low end, and
> I don't think there are tunable parameters to allow me to choose the
> QDR interface as the default. Going forward we'll probably just disable
> RoCE on these nodes and go with QDR IB plus 10 Gbps TCP for large
> messages.
>
> However, I do think these issues will come up more in the future.
> With the low latency of RoCE matching IB, there are more opportunities
> to do channel bonding or to allow multiple interfaces to carry aggregate
> traffic for even smaller message sizes.
>
> Dave Turner
>
> --
> Work: [email protected] (785) 532-7791
> 118 Nichols Hall, Manhattan KS 66502
> Home: [email protected]
> cell: (785) 770-5929
>
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
> http://www.open-mpi.org/community/lists/devel/2015/02/16951.php
>