Hi Bogdan,

Sorry for such a late reply to your e-mail. Glad to hear that the performance
anomaly you mentioned below is now gone with 1.3rc3.

However, I noticed that we either didn't explain something well enough, or not
at all: the cm PML does not use BTLs at all, only MTLs. So your suggested
command line of

  --mca pml cm --mca btl mx,sm,self

does not do what you think; the BTL selection is simply ignored. The above is
therefore equivalent to

  --mca pml cm

and, on a machine with MX as the high-speed interconnect, that in turn is
equivalent to

  --mca pml cm --mca mtl mx
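For illustration, a minimal sketch (the executable name and process count are
placeholders, and the exact ompi_info output depends on how your Open MPI was
built):

  # list the PML/MTL/BTL components compiled into this Open MPI installation
  ompi_info | grep -E "MCA (pml|mtl|btl):"

  # cm ignores any --mca btl setting, so on an MX machine these are all equivalent:
  mpirun -np 4 --mca pml cm --mca btl mx,sm,self ./your_app
  mpirun -np 4 --mca pml cm                      ./your_app
  mpirun -np 4 --mca pml cm --mca mtl mx         ./your_app

  # ob1 is the PML that actually honors the BTL list:
  mpirun -np 4 --mca pml ob1 --mca btl mx,sm,self ./your_app

The first command prints one line per component, so you can confirm whether
both the mx MTL and the mx BTL are present in your build.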
So, in short, the two PMLs (ob1 and cm) use distinct sets of lower-level
drivers: ob1 uses potentially multiple BTLs, while cm uses a single MTL
module. This is (somewhat) explained in this FAQ entry:
http://www.open-mpi.org/faq/?category=myrinet#myri-btl-mx
(A minimal recipe for re-running the IMB comparison with each stack is
sketched below, after the quoted message.)

On Mon, Nov 17, 2008 at 12:39 PM, Bogdan Costescu
<bogdan.coste...@iwr.uni-heidelberg.de> wrote:
>
> Hi!
>
> In testing the 1.3b2, I have encountered a rather strange behaviour.
> First the setup:
> dual-CPU dual-core x86_64 with Myrinet 10G card
> self-compiled Linux kernel 2.6.22.18, MX 1.2.7(*)
> GCC-4.1.2 (from Debian etch), Torque 2.1.10
> OpenMPI 1.3b2 (tar.gz from download page)
> IMB 3.1
>
> (*) I'm actually tracking a problem together with Myricom people, so it's
> not a vanilla 1.2.7, but 1.2.7 with a tiny patch; I believe that this has
> no influence.
>
> When starting an IMB run with the default settings, in all collective
> communication functions I see huge jumps around 32-1024 bytes and flat
> results around 1K-16K, like:
>
> #----------------------------------------------------------------
> # Benchmarking Allgatherv
> # #processes = 64
> #----------------------------------------------------------------
>    #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
>         0         1000         0.19         0.21         0.19
>         1         1000        35.29        35.30        35.29
>         2         1000        36.01        36.03        36.02
>         4         1000        38.97        38.98        38.98
>         8         1000        42.12        42.13        42.13
>        16         1000        45.76        45.77        45.77
>        32         1000     19991.83     20011.84     20005.29
>        64         1000     38561.52     38599.66     38587.74
>       128         1000     58263.81     58305.74     58293.48
>       256         1000     77382.83     77425.93     77412.49
>       512         1000     95981.97     96022.70     96010.73
>      1024         1000    480838.00    481214.78    481027.05
>      2048         1000    480522.97    480917.02    480727.98
>      4096         1000    480762.69    481134.49    480955.03
>      8192         1000    481136.70    481505.36    481334.86
>     16384         1000    483629.46    483889.28    483759.38
>     32768         1000     23809.47     23810.27     23809.62
>     65536          640      7085.58      7085.91      7085.69
>    131072          320     11928.29     11929.29     11928.72
>    262144          160     22174.66     22177.67     22175.94
>    524288           80     42270.91     42283.90     42277.55
>   1048576           40     82389.85     82461.10     82428.26
>   2097152           20    161347.04    161624.54    161485.84
>   4194304           10    321467.52    322562.79    322019.24
>
> This happens on various numbers of nodes and is reproducible - I have
> repeated the run 5 times on 8 nodes.
>
> I have not seen such results with the 1.2.x series with either OB1+BTL or
> CM+MTL; timing increases rather smoothly. Trying various options with
> 1.3b2:
>
> --mca pml cm --mca mtl mx          works well
> --mca pml cm --mca btl mx,sm,self  works well
> --mca pml ob1 --mca btl mx,sm,self jumps like above
>
> From what I know, the 1.2.x series defaulted to OB1+BTL; CM was only
> possible with an MTL which internally implemented sm and self, so the
> second test above would have failed (please correct me if I'm wrong).
>
> The README for 1.3b2 specifies that CM is now chosen if possible; in my
> trials, when I specify CM+BTL, it doesn't complain and works well.
> However, either the default (no options) or OB1+BTL leads to the jumps
> mentioned above, which makes me believe that OB1+BTL is still chosen by
> default, contrary to what the README specifies.
>
> So there are 2 issues:
> - which is right, the README or the runtime behaviour that I see?
> - is it normal for OB1+BTL to behave so poorly with MX?
>
> Thanks for any insight into these issues.
>
> --
> Bogdan Costescu
>
> IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
> Phone: +49 6221 54 8240, Fax: +49 6221 54 8850
> E-mail: bogdan.coste...@iwr.uni-heidelberg.de
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
 tmat...@gmail.com || timat...@open-mpi.org
 I'm a bright... http://www.the-brights.net/
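For reference, the IMB comparison sketched earlier: a minimal recipe, assuming
the IMB-MPI1 binary built from the IMB 3.1 sources (process count, paths, and
the benchmark chosen are placeholders):

  # ob1 PML with an explicit BTL list (the case that showed the jumps)
  mpirun -np 64 --mca pml ob1 --mca btl mx,sm,self ./IMB-MPI1 Allgatherv

  # cm PML with the MX MTL (the case that behaved well)
  mpirun -np 64 --mca pml cm --mca mtl mx ./IMB-MPI1 Allgatherv

  # no options, to see which stack this build actually selects by default
  mpirun -np 64 ./IMB-MPI1 Allgatherv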