On 9/8/2010 10:41 AM, Jeff Squyres wrote:
On Sep 8, 2010, at 2:09 PM, Brice Goglin wrote:
Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
incorrectly somehow?
The first problem came from IB not autodetecting at all by default and
using 800Mbit/s instead.
That shouldn't be the case. Was it autodetecting incorrectly, or just not
autodetecting at all and using 800Mbit/s?
The way the code is currently written, it does not run the autodetection
by default. It looks at the bandwidth value: if the value is 0, it runs
the autodetection code; if the value is non-zero, it does not. The
bandwidth value is initially set to 800, so the autodetection never runs.
If you want the autodetection to run, you have to set an MCA parameter.
There are actually several you can choose from. Here is an example on my
machines:
--mca btl_openib_bandwidth_mlx4_0 0
This will then trigger the autodetection to run. Presumably, we need to
decide what behavior we want and adjust the code accordingly.
Rolf