Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

Ralph Castain Wed, 7 May 2014 04:01:12 -0400 (EDT)

Interesting - how many nodes were involved? As I said, the bad scaling becomes 
more evident at a fairly high node count.


On May 7, 2014, at 12:07 AM, Christopher Samuel <[email protected]> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Hiya Ralph,
> 
> On 07/05/14 14:49, Ralph Castain wrote:
> 
>> I should have looked closer to see the numbers you posted, Chris -
>> those include time for MPI wireup. So what you are seeing is that
>> mpirun is much more efficient at exchanging the MPI endpoint info
>> than PMI. I suspect that PMI2 is not much better as the primary
>> reason for the difference is that mpriun sends blobs, while PMI
>> requires that everything be encoded into strings and sent in little
>> pieces.
>> 
>> Hence, mpirun can exchange the endpoint info (the dreaded "modex"
>> operation) much faster, and MPI_Init completes faster. Rest of the
>> computation should be the same, so long compute apps will see the
>> difference narrow considerably.
> 
> Unfortunately it looks like I had an enthusiastic cleanup at some point
> and so I cannot find the out files from those runs at the moment, but
> I did find some comparisons from around that time.
> 
> This first pair are comparing running NAMD with OMPI 1.7.3a1r29103
> run with mpirun and srun successively from inside the same Slurm job.
> 
> mpirun namd2 macpf.conf 
> srun --mpi=pmi2 namd2 macpf.conf 
> 
> Firstly the mpirun output (grep'ing the interesting bits):
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 512 CPUs 0.0959179 s/step 0.555081 days/ns 1055.19 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0929002 s/step 0.537617 days/ns 1055.19 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0727373 s/step 0.420933 days/ns 1055.19 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0779532 s/step 0.451118 days/ns 1055.19 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0785246 s/step 0.454425 days/ns 1055.19 MB 
> memory
> WallClock: 1403.388550  CPUTime: 1403.388550  Memory: 1119.085938 MB
> 
> Now the srun output:
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 512 CPUs 0.0906865 s/step 0.524806 days/ns 1036.75 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0874809 s/step 0.506255 days/ns 1036.75 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0746328 s/step 0.431903 days/ns 1036.75 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0726161 s/step 0.420232 days/ns 1036.75 MB 
> memory
> Info: Benchmark time: 512 CPUs 0.0710574 s/step 0.411212 days/ns 1036.75 MB 
> memory
> WallClock: 1230.784424  CPUTime: 1230.784424  Memory: 1100.648438 MB
> 
> 
> The next two pairs are first launched using mpirun from 1.6.x and then with 
> srun
> from 1.7.3a1r29103.  Again each pair inside the same Slurm job with the same 
> inputs.
> 
> First pair mpirun:
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory
> WallClock: 8341.524414  CPUTime: 8341.524414  Memory: 975.015625 MB
> 
> First pair srun:
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB 
> memory
> WallClock: 7476.643555  CPUTime: 7476.643555  Memory: 968.867188 MB
> 
> 
> Second pair mpirun:
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB 
> memory
> Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB 
> memory
> WallClock: 7842.831543  CPUTime: 7842.831543  Memory: 1004.050781 MB
> 
> Second pair srun:
> 
> Charm++> Running on MPI version: 2.1
> Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory
> WallClock: 7522.677246  CPUTime: 7522.677246  Memory: 969.433594 MB
> 
> 
> So to me it looks like (for NAMD on our system at least) that
> PMI2 does seem to give better scalability.
> 
> All the best!
> Chris
> - -- 
> Christopher Samuel        Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: [email protected] Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.14 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> 
> iEYEARECAAYFAlNp28UACgkQO2KABBYQAh8hagCfewbbxUR6grg5R40GrwjtIZV0
> 1KYAn2uX0yKLdOEbkHARKouzwFilaTTD
> =A/Yw
> -----END PGP SIGNATURE-----
> _______________________________________________
> devel mailing list
> [email protected]
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post: 
> http://www.open-mpi.org/community/lists/devel/2014/05/14697.php

Re: [OMPI devel] RFC: Force Slurm to use PMI-1 unless PMI-2 is specifically requested

Reply via email to