Jeff and I were looking at a similar issue today and suddenly realized that the mappings were different: which ranks land on which nodes differs depending on how you launch. You might want to check whether that's the issue here as well. Just launch the attached program with mpirun and then with srun, and check whether the two maps are the same.
Ralph
hello_nodename.c
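The comparison suggested above can be sketched in a few lines of Python. This is only an illustration: the attachment's real output format is not shown in this message, so the `rank N on HOST` line format, the regular expression, and the function names `parse_map` and `diff_maps` are all assumptions to adapt to whatever the program actually prints.

```python
import re

# ASSUMPTION: each rank prints a line such as "Hello from rank 3 on node042".
# The real attachment may print something different; adjust the pattern.
LINE_RE = re.compile(r"rank\s+(\d+)\s+on\s+(\S+)")


def parse_map(output):
    """Build a {rank: node} dict from the captured stdout of one launch."""
    mapping = {}
    for line in output.splitlines():
        match = LINE_RE.search(line)
        if match:
            mapping[int(match.group(1))] = match.group(2)
    return mapping


def diff_maps(map_a, map_b):
    """Return {rank: (node_a, node_b)} for every rank placed differently."""
    ranks = sorted(set(map_a) | set(map_b))
    return {
        rank: (map_a.get(rank), map_b.get(rank))
        for rank in ranks
        if map_a.get(rank) != map_b.get(rank)
    }


if __name__ == "__main__":
    # Made-up sample output, standing in for captured mpirun and srun runs.
    mpirun_out = "Hello from rank 0 on n1\nHello from rank 1 on n1\n"
    srun_out = "Hello from rank 0 on n1\nHello from rank 1 on n2\n"
    for rank, (a, b) in diff_maps(parse_map(mpirun_out), parse_map(srun_out)).items():
        print(f"rank {rank}: mpirun -> {a}, srun -> {b}")
```

Comparing node names directly only makes sense when both launches run inside the same allocation; across different allocations it would be more meaningful to compare how ranks are grouped per node.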
On Sep 4, 2013, at 7:15 PM, Christopher Samuel <sam...@unimelb.edu.au> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 04/09/13 18:33, George Bosilca wrote:
>
>> You can confirm that the slowdown happen during the MPI
>> initialization stages by profiling the application (especially the
>> MPI_Init call).
>
> NAMD helpfully prints benchmark and timing numbers during the initial
> part of the simulation, so here's what they say. For both seconds
> per step and days per nanosecond of simulation less is better.
>
> I've included the benchmark numbers (every 100 steps or so from the
> start) and the final timing number after 25000 steps. It looks like
> to me (as a sysadmin and not an MD person) that the final timing
> number includes CPU time in seconds per step and wallclock time in
> seconds per step.
>
> 64 cores over 10 nodes:
>
> OMPI 1.7.3a1r29103 mpirun
>
> Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory
>
> TIMING: 25000 CPU: 8247.2, 0.330157/step Wall: 8247.2, 0.330157/step, 0.0229276 hours remaining, 921.894531 MB of memory in use.
>
> OMPI 1.7.3a1r29103 srun
>
> Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory
>
> TIMING: 25000 CPU: 7390.15, 0.296/step Wall: 7390.15, 0.296/step, 0.0205555 hours remaining, 915.746094 MB of memory in use.
>
>
> 64 cores over 18 nodes:
>
> OMPI 1.6.5 mpirun
>
> Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory
>
> TIMING: 25000 CPU: 7754.17, 0.312071/step Wall: 7754.17, 0.312071/step, 0.0216716 hours remaining, 950.929688 MB of memory in use.
>
> OMPI 1.7.3a1r29103 srun
>
> Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory
>
> TIMING: 25000 CPU: 7420.91, 0.296029/step Wall: 7420.91, 0.296029/step, 0.0205575 hours remaining, 916.312500 MB of memory in use.
>
>
> Hope this is useful!
>
> All the best,
> Chris
> - --
> Christopher Samuel Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/ http://twitter.com/vlsci
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlIn6UoACgkQO2KABBYQAh9GWgCghcYKSj1i9rDDQospURAeusD5
> E+EAn2beqUlYZWHxi1Dgj8ZEpiai4zH1
> =k5Uz
> -----END PGP SIGNATURE-----
> _______________________________________________
> devel mailing list
> de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
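To put the quoted TIMING lines side by side, the relative difference in wall-clock seconds per step can be worked out as in the sketch below. The four values are copied from the quoted message; the helper name `relative_saving` and the dictionary layout are illustrative assumptions. Note that the 18-node pair compares OMPI 1.6.5 under mpirun against 1.7.3a1r29103 under srun, so it mixes a version change with a launcher change.

```python
def relative_saving(baseline, other):
    """Fraction of the baseline per-step time saved by the other run."""
    return (baseline - other) / baseline


# Final wall-clock s/step values copied from the TIMING lines quoted above,
# as (mpirun run, srun run) pairs.
FINAL_S_PER_STEP = {
    "10 nodes (1.7.3a1r29103 mpirun vs 1.7.3a1r29103 srun)": (0.330157, 0.296),
    "18 nodes (1.6.5 mpirun vs 1.7.3a1r29103 srun)": (0.312071, 0.296029),
}

if __name__ == "__main__":
    for label, (mpirun_time, srun_time) in FINAL_S_PER_STEP.items():
        saving = relative_saving(mpirun_time, srun_time)
        print(f"{label}: srun run is {saving:.1%} lower in s/step")
```

These are single runs, so the percentages show only the size of the gap, not its statistical significance.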