Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-08 Thread Jeff Squyres
On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:

> However, going over the existing BTLs I can see that some BTLs do not 
> correctly set this value:
> 
> BTL      Bandwidth       Auto-detect  Status
> Elan     2000            NO           Correct
> GM       250             NO           Doubtful
> MX       2000/1          YES (Mbs)    Correct (before the patch)
> OFUD     800             NO           Doubtful
> OpenIB   2000/4000/8000  YES (Mbs)    Correct (multiplied by the active_width)
> Portals  1000            NO           Doubtful
> SCTP     100             NO           Conservative value (correct)
> Self     100             XXX          Correct (doesn't matter anyway)
> SM       9000            NO           Correct
> TCP      100             NO           Conservative value (correct)
> UDAPL    225             NO           Incorrect
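
For readers unfamiliar with the OpenIB row, here is a minimal sketch of what "multiplied by the active_width" could look like. It is not taken from the Open MPI source: the function name is hypothetical, and the numeric codes assume the usual verbs encoding of port attributes (active_speed 1/2/4 for SDR/DDR/QDR, active_width 1/2/4/8 for 1x/4x/8x/12x).

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.
 * Maps verbs-style port attribute codes to a bandwidth in Mbit/s:
 * the per-lane rate (2000/4000/8000, as in the table) times the
 * number of lanes implied by active_width.
 * Returns 0 when either code is not recognized. */
uint32_t sketch_ib_bandwidth_mbps(uint8_t active_speed, uint8_t active_width)
{
    uint32_t per_lane;
    uint32_t lanes;

    switch (active_speed) {
    case 1: per_lane = 2000; break;  /* assumed: SDR */
    case 2: per_lane = 4000; break;  /* assumed: DDR */
    case 4: per_lane = 8000; break;  /* assumed: QDR */
    default: return 0;
    }

    switch (active_width) {
    case 1: lanes = 1;  break;       /* assumed: 1x  */
    case 2: lanes = 4;  break;       /* assumed: 4x  */
    case 4: lanes = 8;  break;       /* assumed: 8x  */
    case 8: lanes = 12; break;       /* assumed: 12x */
    default: return 0;
    }

    return per_lane * lanes;
}
```

Under these assumptions a 4x DDR port would report 4000 * 4 = 16000 Mbit/s, far from a fixed 800 default.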

Now that that patch has been rolled back out, did we come to a conclusion here?

- OFUD: why do we still even have this?
- Portals: does it matter if it gets it wrong?  No one will ever multi-rail 
with it.
- TCP: we can add auto-detect code for this (But doesn't have to be right away 
-- i.e., don't make 1.5.0 wait for it).
- UDAPL: I don't think anyone will multi-rail udapl with anything.

Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected 
incorrectly somehow?

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-08 Thread Brice Goglin
On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>   
>> However, going over the existing BTLs I can see that some BTLs do not 
>> correctly set this value:
>>
>> BTL      Bandwidth       Auto-detect  Status
>> Elan     2000            NO           Correct
>> GM       250             NO           Doubtful
>> MX       2000/1          YES (Mbs)    Correct (before the patch)
>> OFUD     800             NO           Doubtful
>> OpenIB   2000/4000/8000  YES (Mbs)    Correct (multiplied by the active_width)
>> Portals  1000            NO           Doubtful
>> SCTP     100             NO           Conservative value (correct)
>> Self     100             XXX          Correct (doesn't matter anyway)
>> SM       9000            NO           Correct
>> TCP      100             NO           Conservative value (correct)
>> UDAPL    225             NO           Incorrect
>> 
> Now that that patch has been rolled back out, did we come to a conclusion here?
>
> - OFUD: why do we still even have this?
> - Portals: does it matter if it gets it wrong?  No one will ever multi-rail 
> with it.
> - TCP: we can add auto-detect code for this (But doesn't have to be right 
> away -- i.e., don't make 1.5.0 wait for it).
> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>
> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected 
> incorrectly somehow?
>   

The first problem came from IB not autodetecting at all by default and
using 800Mbit/s instead. When forcing autodetect with mca parameters,
the bandwidth values are not perfect but not too bad. When forcing IB
manually to the right bandwidth value, I can tweak things as needed.

Brice



Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-08 Thread Rolf vandeVaart

 On 9/8/2010 8:09 AM, Brice Goglin wrote:

> On 08/09/2010 14:02, Jeff Squyres wrote:
>> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>>
>>> However, going over the existing BTLs I can see that some BTLs do not
>>> correctly set this value:
>>>
>>> BTL      Bandwidth       Auto-detect  Status
>>> Elan     2000            NO           Correct
>>> GM       250             NO           Doubtful
>>> MX       2000/1          YES (Mbs)    Correct (before the patch)
>>> OFUD     800             NO           Doubtful
>>> OpenIB   2000/4000/8000  YES (Mbs)    Correct (multiplied by the active_width)
>>> Portals  1000            NO           Doubtful
>>> SCTP     100             NO           Conservative value (correct)
>>> Self     100             XXX          Correct (doesn't matter anyway)
>>> SM       9000            NO           Correct
>>> TCP      100             NO           Conservative value (correct)
>>> UDAPL    225             NO           Incorrect
>>
>> Now that that patch has been rolled back out, did we come to a conclusion here?
>>
>> - OFUD: why do we still even have this?
>> - Portals: does it matter if it gets it wrong?  No one will ever multi-rail
>> with it.
>> - TCP: we can add auto-detect code for this (But doesn't have to be right away
>> -- i.e., don't make 1.5.0 wait for it).
>> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>>
>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
>> incorrectly somehow?
>
> The first problem came from IB not autodetecting at all by default and
> using 800Mbit/s instead. When forcing autodetect with mca parameters,
> the bandwidth values are not perfect but not too bad. When forcing IB
> manually to the right bandwidth value, I can tweak things as needed.
>
> Brice

Just to provide some closure on the uDAPL side, we agree with Jeff's 
comment that we do not see any demand for multi-rail uDAPL with anything.
But we will change the uDAPL number to something more reasonable.  
We are still trying to select an appropriate value.

Rolf


Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-08 Thread Jeff Squyres
On Sep 8, 2010, at 2:09 PM, Brice Goglin wrote:

>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected 
>> incorrectly somehow?
> 
> The first problem came from IB not autodetecting at all by default and
> using 800Mbit/s instead.

That shouldn't be the case.  Was it autodetecting incorrectly, or just not 
autodetecting at all and using 800Mbit/s?

-- 
Jeff Squyres
jsquy...@cisco.com




Re: [OMPI devel] [PATCH] fix mx btl_bandwidth

2010-09-08 Thread Rolf vandeVaart

 On 9/8/2010 10:41 AM, Jeff Squyres wrote:

> On Sep 8, 2010, at 2:09 PM, Brice Goglin wrote:
>
>>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
>>> incorrectly somehow?
>>
>> The first problem came from IB not autodetecting at all by default and
>> using 800Mbit/s instead.
>
> That shouldn't be the case.  Was it autodetecting incorrectly, or just not
> autodetecting at all and using 800Mbit/s?

The way the code is currently written, it does not run the autodetect by
default.  What happens is it takes a look at the bandwidth value.  If the
bandwidth value is 0, it will run the autodetect code.  If the bandwidth
is non-zero, it does not.  The bandwidth value is initially set to 800, so
the autodetect is never run.  If you want the autodetect to run, then you
have to give it an mca parameter.  There are actually several you can
choose.  Here is an example on my machines.

--mca btl_openib_bandwidth_mlx4_0 0

This will then trigger the autodetect to run.  Presumably, we need to
figure out what we want to happen and adjust the code accordingly.
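
The decision described above can be sketched in a few lines of C. This is only an illustration of the logic as stated in this message, not the actual openib BTL code: the function names and the macro are hypothetical, and the detected value is passed in rather than queried from a device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Initial bandwidth value mentioned in this thread (Mbit/s). */
#define SKETCH_DEFAULT_BANDWIDTH 800u

/* Hypothetical: autodetect runs only when the configured value is 0. */
bool sketch_should_autodetect(uint32_t configured_bandwidth)
{
    return configured_bandwidth == 0;
}

/* Hypothetical: the value the BTL would end up advertising.
 * 'detected' stands in for whatever the autodetect code would report. */
uint32_t sketch_effective_bandwidth(uint32_t configured_bandwidth,
                                    uint32_t detected)
{
    if (sketch_should_autodetect(configured_bandwidth)) {
        return detected;
    }
    /* A non-zero value, including the built-in default, is used as-is. */
    return configured_bandwidth;
}
```

With the default of 800 the detected value is ignored, which matches what Brice observed. Passing 0 through an mca parameter is what lets the detected value through.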

Rolf


Re: [OMPI devel] OMPI 1.5 twitter notification plugin probably broken by switch to OAUTH

2010-09-08 Thread Christopher Samuel

On 03/09/10 00:44, Ralph Castain wrote:

> Okay, I fixed this up for you on the devel trunk:

Thanks for that, I've not had a chance to look at this
any further recently as we're flat out bringing up our
new gear!  Hopefully soon..

cheers,
Chris
-- 
 Christopher Samuel - Senior Systems Administrator
 VLSCI - Victorian Life Sciences Computational Initiative
 Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
 http://www.vlsci.unimelb.edu.au/
