On Sep 3, 2010, at 09:50, Brice Goglin wrote:

> On 03/09/2010 15:38, George Bosilca wrote:
>> Jeff,
>> 
>> I think you will have to revert this patch as the btl_bandwidth __IS__ 
>> supposed to be in Mbs and not MBs. We usually talk about networks in Mbs 
>> (there is a pattern in Ethernet 1G/10G, Myricom 10G). In addition the 
>> original design of the multi-rail was based on this assumption, and the 
>> multi-rail handling code deals with these values (at that level I don't think 
>> it really matters, but at least it needs consistent values from all BTLs).
>> 
>> However, going over the existing BTLs I can see that some BTLs do not 
>> correctly set this value:
>> 
>> BTL     Bandwidth        Auto-detect     Status
>> Elan    2000                NO           Correct
>> 
> 
> 2000 looks strange to me. Last time I played with Elan4, bandwidth was
> 900MB/s or so.

Lucky you ;) The 2000 was the bandwidth of the last Elan device we had.

> 
>> GM      250                 NO           Doubtful
>> MX      2000/10000          YES (Mbs)    Correct (before the patch)
>> OFUD    800                 NO           Doubtful
>> OpenIB  2000/4000/8000      YES (Mbs)    Correct (multiplied by the active_width)
>> 
> 
> I found the problem when using both MX and OpenIB at the same time, so
> they can't be both wrong or both correct. IB was reporting 800, not
> 2000/4000/8000. Maybe because auto-detect didn't work and the default is
> wrong:
> btl_openib_mca.c:527:    mca_btl_openib_module.super.btl_bandwidth = 800;

It appears that OpenIB only auto-detects the bandwidth if the value is 
explicitly set to zero via the MCA parameters. As a last resort, as with the 
other devices, you can set it manually: use something like 
btl_openib_bandwidth_%dev_name% to set the bandwidth per device.

  george.


> 
> Brice
> 
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

