On 9/8/2010 8:09 AM, Brice Goglin wrote:
On 08/09/2010 14:02, Jeff Squyres wrote:
On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:


However, going over the existing BTLs I can see that some BTLs do not correctly 
set this value:

BTL      Bandwidth        Auto-detect   Status
Elan     2000             NO            Correct
GM       250              NO            Doubtful
MX       2000/10000       YES (Mb/s)    Correct (before the patch)
OFUD     800              NO            Doubtful
OpenIB   2000/4000/8000   YES (Mb/s)    Correct (multiplied by the active_width)
Portals  1000             NO            Doubtful
SCTP     100              NO            Conservative value (correct)
Self     100              XXX           Correct (doesn't matter anyway)
SM       9000             NO            Correct
TCP      100              NO            Conservative value (correct)
UDAPL    225              NO            Incorrect

Now that that patch has been rolled back out, did we come to a conclusion here?

- OFUD: why do we still even have this?
- Portals: does it matter if it gets it wrong?  No one will ever multi-rail 
with it.
- TCP: we can add auto-detect code for this (But doesn't have to be right away 
-- i.e., don't make 1.5.0 wait for it).
- UDAPL: I don't think anyone will multi-rail udapl with anything.

Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected 
incorrectly somehow?

The first problem came from IB not autodetecting at all by default and
using 800 Mbit/s instead. When forcing autodetect with MCA parameters,
the bandwidths are not perfect but not too bad. When forcing IB manually
to the right bandwidth value, I can tweak things as needed.
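
To illustrate why the 800 Mbit/s default matters, here is a minimal sketch, assuming a multi-rail scheduler that splits each large message across rails in proportion to their advertised bandwidth. The helper name `split_by_bandwidth`, the second rail at 10000 Mbit/s, and the byte counts are hypothetical; this is not Open MPI's actual scheduling code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Open MPI function: split `total` bytes across
 * `n` rails in proportion to each rail's advertised bandwidth (Mbit/s). */
static void split_by_bandwidth(const unsigned *bw, size_t n,
                               size_t total, size_t *out)
{
    unsigned long long sum = 0;
    size_t assigned = 0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += bw[i];
    assert(sum > 0);

    for (size_t i = 0; i < n; i++) {
        out[i] = (size_t)((unsigned long long)total * bw[i] / sum);
        assigned += out[i];
    }
    /* Integer division rounds down; give the leftover bytes to the last rail. */
    out[n - 1] += total - assigned;
}
```

Under these assumptions, a rail advertised at 800 next to one at 10000 carries 80000 of 1080000 bytes. Advertising 8000 instead raises its share to 480000 bytes, so an underestimated default leaves the faster rail mostly idle.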

Brice
Just to provide some closure on the uDAPL side: we agree with Jeff's comment, as we do not see any demand for multi-rail uDAPL with anything. However, we will change the uDAPL number to something more reasonable. We are still trying to select an appropriate value.
Rolf
