Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
You can confirm that the slowdown happens during the MPI initialization stages by profiling the application (especially the MPI_Init call).

Another possible cause of the slowdown might be the communication thread in ORTE. If it remains active beyond initialization it will definitely disturb the application by taking away critical resources.

George.

On Sep 4, 2013, at 05:59, Christopher Samuel wrote:

> On 04/09/13 11:29, Ralph Castain wrote:
>
>> Your code is obviously doing something much more than just
>> launching and wiring up, so it is difficult to assess the
>> difference in speed between 1.6.5 and 1.7.3 - my guess is that it
>> has to do with changes in the MPI transport layer and nothing to do
>> with PMI or not.
>
> I'm testing with what would be our most used application in aggregate
> across our systems, the NAMD molecular dynamics code from here:
>
> http://www.ks.uiuc.edu/Research/namd/
>
> so yes, you're quite right, it's doing a lot more than that and has a
> reputation for being a *very* chatty MPI code.
>
> For comparison, whilst users see GROMACS also suffer with srun under
> 1.6.5, they don't see anything like the slowdown that NAMD gets.
>
> All the best,
> Chris
> --
> Christopher Samuel    Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au   Phone: +61 (0)3 903 55545
> http://www.vlsci.org.au/      http://twitter.com/vlsci
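To act on George's suggestion without a full profiler, a minimal harness that only times MPI_Init can be launched once with mpirun and once with srun. This is an illustrative sketch added here for reference, not part of the original thread; the file name init_timer.c is made up.

    /* init_timer.c - illustrative sketch: measure how long MPI_Init takes */
    #include <mpi.h>
    #include <stdio.h>
    #include <time.h>

    int main(int argc, char *argv[])
    {
        struct timespec t0, t1;
        int rank;

        clock_gettime(CLOCK_MONOTONIC, &t0);   /* wall clock before init */
        MPI_Init(&argc, &argv);
        clock_gettime(CLOCK_MONOTONIC, &t1);   /* wall clock after init */

        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("rank %d: MPI_Init took %.3f s\n", rank,
               (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);

        MPI_Finalize();
        return 0;
    }

If the per-rank MPI_Init times are comparable under both launchers, the 20% gap is more likely in the steady-state communication path than in startup.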
[OMPI devel] MPI_Is_thread_main() with provided=MPI_THREAD_SERIALIZED
I'm using Open MPI 1.6.5 as packaged in Fedora 19. This build does not
enable THREAD_MULTIPLE support:

$ ompi_info | grep Thread
          Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)

In my code I call MPI_Init_thread(required=MPI_THREAD_MULTIPLE). After
that, MPI_Query_thread() returns MPI_THREAD_SERIALIZED. But calling
MPI_Is_thread_main() always returns TRUE, both in the main thread and
in newly spawned threads.

I think this code is wrong for the case provided==MPI_THREAD_SERIALIZED:
https://bitbucket.org/ompiteam/ompi-svn-mirror/src/0a159982d7204d4b4b9fa61771d0fc7e9dc16771/ompi/mpi/c/is_thread_main.c?at=default#cl-50

--
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169
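A minimal reproducer for the reported behaviour might look like the sketch below. This is an illustration added for reference, not Lisandro's original code; the expected values in the comments follow the MPI standard's definition of the main thread.

    /* is_thread_main_test.c - illustrative reproducer sketch */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        int flag;
        MPI_Is_thread_main(&flag);     /* standard expects 0 in a spawned thread */
        printf("spawned thread: MPI_Is_thread_main = %d\n", flag);
        return NULL;
    }

    int main(int argc, char *argv[])
    {
        int provided, flag;
        pthread_t t;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        printf("provided = %d\n", provided);

        MPI_Is_thread_main(&flag);     /* expected 1 in the initializing thread */
        printf("main thread: MPI_Is_thread_main = %d\n", flag);

        /* MPI calls stay serialized: the worker runs while main only waits */
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(&t, NULL);

        MPI_Finalize();
        return 0;
    }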
Re: [OMPI devel] MPI_Is_thread_main() with provided=MPI_THREAD_SERIALIZED
You're in the SERIALIZED mode, so any thread can make MPI calls. As there is
no notion of a main thread in such a mode, consistently returning true from
MPI_Is_thread_main seems like a reasonable approach.

This function will have a different behavior in the FUNNELED mode.

George.

On Sep 4, 2013, at 12:06, Lisandro Dalcin wrote:

> I'm using Open MPI 1.6.5 as packaged in Fedora 19. This build does not
> enable THREAD_MULTIPLE support:
>
> $ ompi_info | grep Thread
>           Thread support: posix (MPI_THREAD_MULTIPLE: no, progress: no)
>
> In my code I call MPI_Init_thread(required=MPI_THREAD_MULTIPLE). After
> that, MPI_Query_thread() returns MPI_THREAD_SERIALIZED. But calling
> MPI_Is_thread_main() always returns TRUE, both in the main thread and
> in newly spawned threads.
>
> I think this code is wrong for the case provided==MPI_THREAD_SERIALIZED:
> https://bitbucket.org/ompiteam/ompi-svn-mirror/src/0a159982d7204d4b4b9fa61771d0fc7e9dc16771/ompi/mpi/c/is_thread_main.c?at=default#cl-50
Re: [OMPI devel] MPI_Is_thread_main() with provided=MPI_THREAD_SERIALIZED
OK, I take that back. Based on the MPI standard (page 488), only the thread
that called MPI_Init or MPI_Init_thread must return true in this case. The
logic I described in my previous email is left to the user.

George.

On Sep 4, 2013, at 12:11, George Bosilca wrote:

> You're in the SERIALIZED mode, so any thread can make MPI calls. As there
> is no notion of a main thread in such a mode, consistently returning true
> from MPI_Is_thread_main seems like a reasonable approach.
>
> This function will have a different behavior in the FUNNELED mode.
>
> George.
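For illustration only (this is not Open MPI's actual code), one way an implementation can satisfy the page-488 requirement is to record the identity of the thread that performed initialization and compare against it later:

    /* Illustrative sketch, not Open MPI's implementation. */
    #include <pthread.h>
    #include <stdbool.h>

    static pthread_t init_thread;     /* thread that called MPI_Init[_thread] */

    void record_init_thread(void)     /* would run inside MPI_Init[_thread] */
    {
        init_thread = pthread_self();
    }

    bool thread_is_main(void)         /* would back MPI_Is_thread_main */
    {
        return pthread_equal(pthread_self(), init_thread) != 0;
    }

With that state in place, MPI_Is_thread_main can return false in spawned threads regardless of the provided thread level.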
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
This is 1.7.3 - there is no comm thread in ORTE in that version.

On Sep 4, 2013, at 1:33 AM, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI initialization
> stages by profiling the application (especially the MPI_Init call).
>
> Another possible cause of the slowdown might be the communication thread
> in ORTE. If it remains active beyond initialization it will definitely
> disturb the application by taking away critical resources.
>
> George.
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
On Sep 4, 2013, at 4:33 AM, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI initialization
> stages by profiling the application (especially the MPI_Init call).

You can also try just launching "MPI hello world" (i.e., examples/hello_c.c).
It just calls MPI_INIT / MPI_FINALIZE.

Additionally, you might want to try launching the ring program, too
(examples/ring_c.c). That program sends a small message around in a ring,
which forces some MPI communication to occur, and therefore does at least
some level of setup in the BTLs, etc. (Remember: most BTLs are lazy-connect,
so they don't actually do anything until the first send. So a simple "ring"
program sets up *some* BTL connections, but not nearly all of them.)

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
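For reference, a ring test in the spirit of examples/ring_c.c looks roughly like the sketch below (a simplified illustration added here, not the file shipped with Open MPI). Each rank only ever talks to its two neighbours, which is why such a test exercises some, but far from all, of the lazy BTL connections.

    /* ring_sketch.c - pass one token around the ring; run with >= 2 ranks */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, token;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            token = 42;                       /* start the token */
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("token made it around a ring of %d processes\n", size);
        } else {
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Running this under both mpirun and srun is a cheap way to see whether the launch method alone changes the cost of the first real sends.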
[OMPI devel] [bugs] OSC-related datatype bugs
Hi,

My colleague and I found 3 OSC-related bugs in the OMPI datatype code.
One affects trunk and the v1.6/v1.7 branches, and two affect only the
v1.6 branch.

(1) OMPI_DATATYPE_ALIGN_PTR should be placed after memcpy

Last year I reported a bug in the OMPI datatype code and it was fixed in
r25721. But the fix was not correct and the problem still exists.

My reported bug and the patch:
  http://www.open-mpi.org/community/lists/devel/2012/01/10207.php
r25721:
  https://svn.open-mpi.org/trac/ompi/changeset/25721

OMPI_DATATYPE_ALIGN_PTR should be placed after the memcpy in the
__ompi_datatype_pack_description function, as in the patch attached to my
previous mail. I didn't confirm r25721 well when it was committed, sorry.
The attached file datatype-align.patch is the correct patch for the latest
trunk. This fix should be applied to trunk and the v1.7/v1.6 branches.

(2) r28790 should be merged into v1.6

The trunk changeset r28790 had been merged into v1.7 (ticket #3673), but it
is not yet merged into v1.6. I confirmed that the problem reported last
month also occurs in v1.6 and can be fixed by merging r28790 into v1.6.

The originally reported problem:
  http://www.open-mpi.org/community/lists/devel/2013/07/12595.php

(3) OMPI_DATATYPE_MAX_PREDEFINED should be 46 for v1.6

In the v1.6 branch, ompi/datatype/ompi_datatype.h defines
OMPI_DATATYPE_MAX_PREDEFINED as 45, but the number of predefined datatypes
is 46 and the last predefined datatype ID (OMPI_DATATYPE_MPI_UB) is 45.
OMPI_DATATYPE_MAX_PREDEFINED is used as the number of predefined datatypes,
or as the maximum predefined datatype ID + 1, not as the maximum predefined
datatype ID itself, as below:

ompi/op/op.c:79:
  // the number of predefined datatypes
  int ompi_op_ddt_map[OMPI_DATATYPE_MAX_PREDEFINED];

ompi/datatype/ompi_datatype_args.c:573:
  // maximum predefined datatype ID + 1
  assert( data_id < OMPI_DATATYPE_MAX_PREDEFINED );

ompi/datatype/ompi_datatype_args.c:492:
  // first unused datatype ID (= maximum predefined datatype ID + 1)
  int next_index = OMPI_DATATYPE_MAX_PREDEFINED;

So its value should be 46 for v1.6. At r28932 in trunk, one datatype
(MPI_Count) was added but OMPI_DATATYPE_MAX_PREDEFINED was increased from
45 to 47, so current trunk is correct.

This bug causes random errors, such as SEGV, "Error recreating datatype",
or "received packet for Window with unknown type", if you use MPI_UB in
OSC, as in the attached program osc_ub.c.
Regards,
Takahiro Kawashima,
MPI development team,
Fujitsu

Index: ompi/datatype/ompi_datatype_args.c
===================================================================
--- ompi/datatype/ompi_datatype_args.c  (revision 29064)
+++ ompi/datatype/ompi_datatype_args.c  (working copy)
@@ -467,12 +467,13 @@
     position = (int*)next_packed;
     next_packed += sizeof(int) * args->cd;
 
-    /* description of next datatype should be 64 bits aligned */
-    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
     /* copy the aray of counts (32 bits aligned) */
     memcpy( next_packed, args->i, sizeof(int) * args->ci );
     next_packed += args->ci * sizeof(int);
 
+    /* description of next datatype should be 64 bits aligned */
+    OMPI_DATATYPE_ALIGN_PTR(next_packed, char*);
+
     /* copy the rest of the data */
     for( i = 0; i < args->cd; i++ ) {
         ompi_datatype_t* temp_data = args->d[i];

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int size, rank;
    MPI_Win win;
    MPI_Datatype datatype;
    MPI_Datatype datatypes[] = {MPI_INT, MPI_UB};
    int blengths[] = {1, 1};
    MPI_Aint displs[] = {0, sizeof(int)};
    int buf[] = {0};

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (size < 2) {
        fprintf(stderr, "Needs at least 2 processes\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Type_create_struct(2, blengths, displs, datatypes, &datatype);
    MPI_Type_commit(&datatype);

    MPI_Win_create(buf, sizeof(int), 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);
    MPI_Win_fence(0, win);
    if (rank == 0) {
        MPI_Put(buf, 1, datatype, 1, 0, 1, datatype, win);
    }
    MPI_Win_fence(0, win);
    MPI_Win_free(&win);

    MPI_Type_free(&datatype);
    MPI_Finalize();
    return 0;
}
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
On 04/09/13 18:33, George Bosilca wrote:

> You can confirm that the slowdown happens during the MPI
> initialization stages by profiling the application (especially the
> MPI_Init call).

NAMD helpfully prints benchmark and timing numbers during the initial
part of the simulation, so here's what they say. For both seconds per
step and days per nanosecond of simulation, less is better.

I've included the benchmark numbers (every 100 steps or so from the
start) and the final timing number after 25000 steps. It looks to me
(as a sysadmin and not an MD person) like the final timing number
reports both CPU time in seconds per step and wallclock time in
seconds per step.

64 cores over 10 nodes:

OMPI 1.7.3a1r29103 mpirun

Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory

TIMING: 25000 CPU: 8247.2, 0.330157/step Wall: 8247.2, 0.330157/step, 0.0229276 hours remaining, 921.894531 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory

TIMING: 25000 CPU: 7390.15, 0.296/step Wall: 7390.15, 0.296/step, 0.020 hours remaining, 915.746094 MB of memory in use.

64 cores over 18 nodes:

OMPI 1.6.5 mpirun

Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory

TIMING: 25000 CPU: 7754.17, 0.312071/step Wall: 7754.17, 0.312071/step, 0.0216716 hours remaining, 950.929688 MB of memory in use.

OMPI 1.7.3a1r29103 srun

Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory

TIMING: 25000 CPU: 7420.91, 0.296029/step Wall: 7420.91, 0.296029/step, 0.0205575 hours remaining, 916.312500 MB of memory in use.

Hope this is useful!

All the best,
Chris
--
Christopher Samuel    Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: sam...@unimelb.edu.au   Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/      http://twitter.com/vlsci
Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slower than with mpirun
Jeff and I were looking at a similar issue today and suddenly realized that
the mappings were different - i.e., which ranks land on which nodes differs
depending on how you launch. You might want to check whether that's the
issue here as well. Just launch the attached program using mpirun vs srun
and check to see if the maps are the same or not.

Ralph

hello_nodename.c
Description: Binary data

On Sep 4, 2013, at 7:15 PM, Christopher Samuel wrote:

> On 04/09/13 18:33, George Bosilca wrote:
>
>> You can confirm that the slowdown happens during the MPI
>> initialization stages by profiling the application (especially the
>> MPI_Init call).
>
> NAMD helpfully prints benchmark and timing numbers during the initial
> part of the simulation, so here's what they say. For both seconds per
> step and days per nanosecond of simulation, less is better.
>
> I've included the benchmark numbers (every 100 steps or so from the
> start) and the final timing number after 25000 steps. It looks to me
> (as a sysadmin and not an MD person) like the final timing number
> reports both CPU time in seconds per step and wallclock time in
> seconds per step.
>
> 64 cores over 10 nodes:
>
> OMPI 1.7.3a1r29103 mpirun
>
> Info: Benchmark time: 64 CPUs 0.410424 s/step 2.37514 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.392106 s/step 2.26913 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313136 s/step 1.81213 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.316792 s/step 1.83329 days/ns 909.57 MB memory
> Info: Benchmark time: 64 CPUs 0.313867 s/step 1.81636 days/ns 909.57 MB memory
>
> TIMING: 25000 CPU: 8247.2, 0.330157/step Wall: 8247.2, 0.330157/step, 0.0229276 hours remaining, 921.894531 MB of memory in use.
>
> OMPI 1.7.3a1r29103 srun
>
> Info: Benchmark time: 64 CPUs 0.341967 s/step 1.97897 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.339644 s/step 1.96553 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.284424 s/step 1.64597 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.28115 s/step 1.62702 days/ns 903.883 MB memory
> Info: Benchmark time: 64 CPUs 0.279536 s/step 1.61769 days/ns 903.883 MB memory
>
> TIMING: 25000 CPU: 7390.15, 0.296/step Wall: 7390.15, 0.296/step, 0.020 hours remaining, 915.746094 MB of memory in use.
>
> 64 cores over 18 nodes:
>
> OMPI 1.6.5 mpirun
>
> Info: Benchmark time: 64 CPUs 0.366327 s/step 2.11995 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.359805 s/step 2.0822 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.292342 s/step 1.69179 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.293499 s/step 1.69849 days/ns 939.527 MB memory
> Info: Benchmark time: 64 CPUs 0.292355 s/step 1.69187 days/ns 939.527 MB memory
>
> TIMING: 25000 CPU: 7754.17, 0.312071/step Wall: 7754.17, 0.312071/step, 0.0216716 hours remaining, 950.929688 MB of memory in use.
>
> OMPI 1.7.3a1r29103 srun
>
> Info: Benchmark time: 64 CPUs 0.347864 s/step 2.0131 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.346367 s/step 2.00444 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.29007 s/step 1.67865 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.279447 s/step 1.61717 days/ns 904.91 MB memory
> Info: Benchmark time: 64 CPUs 0.280824 s/step 1.62514 days/ns 904.91 MB memory
>
> TIMING: 25000 CPU: 7420.91, 0.296029/step Wall: 7420.91, 0.296029/step, 0.0205575 hours remaining, 916.312500 MB of memory in use.
>
> Hope this is useful!
>
> All the best,
> Chris
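The attached hello_nodename.c is not reproduced in the archive. A minimal stand-in for the kind of mapping check Ralph describes could look like this (an illustrative sketch, not the actual attachment):

    /* print which node each rank landed on; compare mpirun vs srun output */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);

        printf("rank %d of %d on node %s\n", rank, size, name);

        MPI_Finalize();
        return 0;
    }

Running it under mpirun and then under srun and comparing the sorted output shows immediately whether the rank-to-node placement differs between the two launchers.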