Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:

> However, going over the existing BTLs I can see that some BTLs do not
> correctly set this value:
>
> BTL      Bandwidth       Auto-detect  Status
> Elan     2000            NO           Correct
> GM       250             NO           Doubtful
> MX       2000/1          YES (Mb/s)   Correct (before the patch)
> OFUD     800             NO           Doubtful
> OpenIB   2000/4000/8000  YES (Mb/s)   Correct (multiplied by the active_width)
> Portals  1000            NO           Doubtful
> SCTP     100             NO           Conservative value (correct)
> Self     100             XXX          Correct (doesn't matter anyway)
> SM       9000            NO           Correct
> TCP      100             NO           Conservative value (correct)
> UDAPL    225             NO           Incorrect

Now that that patch has been rolled back out, did we come to a conclusion here?

- OFUD: why do we still even have this?
- Portals: does it matter if it gets it wrong? No one will ever multi-rail with it.
- TCP: we can add auto-detect code for this (But doesn't have to be right away -- i.e., don't make 1.5.0 wait for it).
- UDAPL: I don't think anyone will multi-rail udapl with anything.

Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected incorrectly somehow?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
On 08/09/2010 14:02, Jeff Squyres wrote:
> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>
>> However, going over the existing BTLs I can see that some BTLs do not
>> correctly set this value:
>>
>> BTL      Bandwidth       Auto-detect  Status
>> Elan     2000            NO           Correct
>> GM       250             NO           Doubtful
>> MX       2000/1          YES (Mb/s)   Correct (before the patch)
>> OFUD     800             NO           Doubtful
>> OpenIB   2000/4000/8000  YES (Mb/s)   Correct (multiplied by the active_width)
>> Portals  1000            NO           Doubtful
>> SCTP     100             NO           Conservative value (correct)
>> Self     100             XXX          Correct (doesn't matter anyway)
>> SM       9000            NO           Correct
>> TCP      100             NO           Conservative value (correct)
>> UDAPL    225             NO           Incorrect
>>
> Now that that patch has been rolled back out, did we come to a conclusion here?
>
> - OFUD: why do we still even have this?
> - Portals: does it matter if it gets it wrong? No one will ever multi-rail
>   with it.
> - TCP: we can add auto-detect code for this (But doesn't have to be right
>   away -- i.e., don't make 1.5.0 wait for it).
> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>
> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
> incorrectly somehow?
>

The first problem came from IB not autodetecting at all by default and using 800 Mbit/s instead. When forcing autodetect with mca parameters, the bandwidth values are not perfect but not too bad. When forcing IB manually to the right bandwidth value, I can tweak things as needed.

Brice
Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
On 9/8/2010 8:09 AM, Brice Goglin wrote:
> On 08/09/2010 14:02, Jeff Squyres wrote:
>> On Sep 3, 2010, at 3:38 PM, George Bosilca wrote:
>>
>>> However, going over the existing BTLs I can see that some BTLs do not
>>> correctly set this value:
>>>
>>> BTL      Bandwidth       Auto-detect  Status
>>> Elan     2000            NO           Correct
>>> GM       250             NO           Doubtful
>>> MX       2000/1          YES (Mb/s)   Correct (before the patch)
>>> OFUD     800             NO           Doubtful
>>> OpenIB   2000/4000/8000  YES (Mb/s)   Correct (multiplied by the active_width)
>>> Portals  1000            NO           Doubtful
>>> SCTP     100             NO           Conservative value (correct)
>>> Self     100             XXX          Correct (doesn't matter anyway)
>>> SM       9000            NO           Correct
>>> TCP      100             NO           Conservative value (correct)
>>> UDAPL    225             NO           Incorrect
>>
>> Now that that patch has been rolled back out, did we come to a conclusion here?
>>
>> - OFUD: why do we still even have this?
>> - Portals: does it matter if it gets it wrong? No one will ever multi-rail
>>   with it.
>> - TCP: we can add auto-detect code for this (But doesn't have to be right
>>   away -- i.e., don't make 1.5.0 wait for it).
>> - UDAPL: I don't think anyone will multi-rail udapl with anything.
>>
>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
>> incorrectly somehow?
>
> The first problem came from IB not autodetecting at all by default and
> using 800 Mbit/s instead. When forcing autodetect with mca parameters, the
> bandwidth values are not perfect but not too bad. When forcing IB manually
> to the right bandwidth value, I can tweak things as needed.
>
> Brice

Just to provide some closure on the uDAPL side: we agree with Jeff's comment that we do not see any demand for multi-rail uDAPL with anything. However, we will change the uDAPL number to something more reasonable. We are still trying to select an appropriate value.

Rolf
Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
On Sep 8, 2010, at 2:09 PM, Brice Goglin wrote:

>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
>> incorrectly somehow?
>
> The first problem came from IB not autodetecting at all by default and
> using 800Mbit/s instead.

That shouldn't be the case. Was it autodetecting incorrectly, or just not autodetecting at all and using 800Mbit/s?

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI devel] [PATCH] fix mx btl_bandwidth
On 9/8/2010 10:41 AM, Jeff Squyres wrote:
> On Sep 8, 2010, at 2:09 PM, Brice Goglin wrote:
>
>>> Was the *real* problem that Brice's OpenFabrics bandwidth was auto-detected
>>> incorrectly somehow?
>>
>> The first problem came from IB not autodetecting at all by default and
>> using 800Mbit/s instead.
>
> That shouldn't be the case. Was it autodetecting incorrectly, or just not
> autodetecting at all and using 800Mbit/s?

The way the code is currently written, it does not run the autodetect by default. What happens is it takes a look at the bandwidth value. If the bandwidth value is 0, it will run the autodetect code. If the bandwidth is non-zero, it does not. The bandwidth value is initially set to 800, so the autodetect is never run.

If you want the autodetect to run, then you have to give it an mca parameter. There are actually several you can choose. Here is an example on my machines.

    --mca btl_openib_bandwidth_mlx4_0 0

This will then trigger the autodetect to run. Presumably, we need to figure out what we want to happen and adjust the code accordingly.

Rolf
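
The check Rolf describes can be sketched roughly as follows. This is only an illustration in Python, not the actual openib BTL code (which is written in C). The names `resolve_bandwidth`, `fake_probe` and `DEFAULT_BANDWIDTH` are hypothetical, and the probed figure is made up. The only details taken from the message above are the initial value of 800 and the rule that a value of 0 triggers autodetection.

```python
# Illustrative sketch only; names and the probed value are assumptions,
# not taken from the Open MPI source.

DEFAULT_BANDWIDTH = 800  # initial bandwidth value mentioned in the thread


def resolve_bandwidth(configured, autodetect):
    """Return the bandwidth value to use for one device.

    configured: the bandwidth parameter value; 0 means "please autodetect".
    autodetect: hypothetical callable that probes the link and returns a value.
    """
    if configured == 0:
        # Only a zero value causes the autodetect code to run.
        return autodetect()
    # Any non-zero value is used as-is and autodetect is skipped.
    return configured


def fake_probe():
    # Stand-in for real link detection; the number is invented.
    return 16000


# With the default of 800 the probe is never called.
print(resolve_bandwidth(DEFAULT_BANDWIDTH, fake_probe))
# Passing 0 (as the mca parameter in the message does) runs the probe.
print(resolve_bandwidth(0, fake_probe))
```

Under this reading, the non-zero default is what keeps autodetection from ever running unless the user overrides the parameter with 0.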
Re: [OMPI devel] OMPI 1.5 twitter notification plugin probably broken by switch to OAUTH
On 03/09/10 00:44, Ralph Castain wrote:

> Okay, I fixed this up for you on the devel trunk:

Thanks for that. I've not had a chance to look at this any further recently, as we're flat out bringing up our new gear! Hopefully soon.

cheers,
Chris

--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computational Initiative
Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.unimelb.edu.au/