Re: [gmx-users] Scaling/performance on Gromacs 4
Apologies for reviving such an old thread. For clarification: Interlagos and Bulldozer both have a modular architecture, as mentioned earlier. Each Bulldozer module has two integer cores and one floating-point unit shared between them. So although the OS reports 64 cores (counting integer cores), the number of floating-point units is still 32. Moreover, each FP unit can process two threads when possible, but since GROMACS is so compute-intensive, I am guessing it is saturated by just one. Hence you are not observing a speed-up when moving from 32 to 64 threads.

Regards,
Manu Vajpai
IIT Kanpur

On Fri, Mar 16, 2012 at 4:24 PM, Szilárd Páll szilard.p...@cbr.su.se wrote:
> [...]
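For anyone who wants to verify this on their own machine, a quick sketch (output formats vary by kernel and util-linux version, so treat the expected values as illustrative):

    # Count the logical (integer) cores the OS reports -- 64 on this machine
    grep -c '^processor' /proc/cpuinfo

    # Show sockets / cores-per-socket / threads-per-core as lscpu sees them
    lscpu | grep -E '^(Socket|Core|Thread)'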
Re: [gmx-users] Scaling/performance on Gromacs 4
Hi Sara,

The bad performance you are seeing is most probably caused by the combination of the new AMD Interlagos CPUs, the compiler, and the operating system, and it is very likely that the old GROMACS version also contributes. In practice these new CPUs don't perform as well as expected, but that is partly because compilers and operating systems do not yet fully support the new architecture. However, based on the quite extensive benchmarking I've done, the scaling with such a large system should be considerably better than what your numbers show.

This is what you should try (a build sketch follows after this message):
- compile GROMACS with gcc 4.6 using the -march=bdver1 optimization flag;
- use a Linux kernel of at least 3.0, preferably newer;
- if you're not required to use 4.0.x, use 4.5.

Note that you have to be careful when drawing conclusions from benchmarking large systems on a small number of cores; you will get artifacts from caching effects.

And now a bit of fairly technical explanation; for more details, ask Google ;) The machine you are using has AMD Interlagos CPUs based on the Bulldozer micro-architecture. This is a new architecture, a departure from previous AMD processors and in fact quite different from most current CPUs. Bulldozer cores are not traditional physical cores: the hardware unit is the module, which consists of two "half cores" (at least when it comes to the floating-point units) and enables a special type of multithreading called clustered multithreading. This is slightly similar to Intel cores with Hyper-Threading.

Cheers,
--
Szilárd

On Mon, Feb 20, 2012 at 5:12 PM, Sara Campos srrcam...@gmail.com wrote:
> [...]
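A minimal sketch of the suggested build, assuming the autotools build of GROMACS 4.5 and that gcc 4.6 is installed as gcc-4.6 (compiler name, prefix, and job count are illustrative):

    # Point the build at gcc 4.6 and tune for the Bulldozer architecture
    export CC=gcc-4.6
    export CFLAGS="-O3 -march=bdver1"

    # Configure, build, and install GROMACS 4.5.x
    ./configure --prefix=$HOME/gromacs-4.5
    make -j 8 && make install

    # Check that the running kernel is recent enough (>= 3.0)
    uname -r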
[gmx-users] Scaling/performance on Gromacs 4
Dear GROMACS users,

My group has had access to a quad-processor, 64-core machine (4 x Opteron 6274 @ 2.2 GHz, 16 cores each) and I ran some performance tests with the following specifications:

System size: 299787 atoms
Number of MD steps: 1500
Electrostatics treatment: PME
GROMACS version: 4.0.4
MPI: LAM
Command run: mpirun -ssi rpi tcp C mdrun_mpi ...

#CPUs   Time (s)   Steps/s
64      195.000    7.69
32      192.000    7.81
16      275.000    5.45
8       381.000    3.94
4       751.000    2.00
2       1001.000   1.50
1       2352.000   0.64

The scaling is not good, but the weirdest part is that 64 processors perform the same as 32. I have seen the plots from Dr. Hess in the GROMACS 4 paper in JCTC and I do not understand why this is happening. Can anyone help?

Thanks in advance,
Sara
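For reference, the single-core run gives everything needed to quantify the scaling: at 32 cores the speedup is 2352/192 ≈ 12.3, i.e. roughly 38% parallel efficiency, and at 64 cores it drops to about 19%. A sketch that tabulates this, assuming the numbers above are saved in a two-column file bench.txt (cores, time in seconds):

    # Two passes over the file: the first finds the 1-core time t1,
    # the second prints speedup t1/t and efficiency t1/(t*cores)
    awk 'NR==FNR { if ($1==1) t1=$2; next }
         { printf "%2d cores: speedup %5.2f, efficiency %3.0f%%\n",
                  $1, t1/$2, 100*t1/($2*$1) }' bench.txt bench.txt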
Re: [gmx-users] Scaling/performance on Gromacs 4
Hi Sara,

my guess is that 1500 steps are not at all sufficient for a benchmark on 64 cores. The dynamic load balancing will need more time to adapt the domain sizes for optimal balance. It is also important that you reset the timers once the load is balanced (to get clean performance numbers); you might want to use the -resethway switch for that. g_tune_pme will help you find the performance optimum on any number of nodes; from version 4.5 on it is included in GROMACS.

Carsten

On Feb 20, 2012, at 5:12 PM, Sara Campos wrote:
> [...]
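Something along these lines, as a sketch (the file name and core count are illustrative; check g_tune_pme -h for the exact options in your version):

    # Discard the first half of the run from the timings, so the
    # load-balancing phase does not skew the performance numbers
    mpirun -np 64 mdrun_mpi -s topol.tpr -resethway

    # From 4.5 on: let g_tune_pme search for the best PME/PP node split
    g_tune_pme -np 64 -s topol.tpr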
Re: [gmx-users] Scaling/performance on Gromacs 4
Poor scaling with MPI on many-core machines can also be due to uneven job distribution across cores, or to jobs being wastefully swapped between cores. You might be able to fix this with some esoteric configuration options of mpirun (--bind-to-core worked for me with Open MPI), but the surest option is to switch to GROMACS 4.5 and run using thread-level parallelisation, bypassing MPI entirely.

From: Sara Campos srrcam...@gmail.com
To: gmx-users@gromacs.org
Sent: Monday, 20 February 2012, 17:12
Subject: [gmx-users] Scaling/performance on Gromacs 4
> [...]
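Both variants concretely, as a sketch (the binding flag name differs between Open MPI versions, and the file name is illustrative):

    # MPI run with each rank pinned to a core (Open MPI 1.4/1.5 syntax)
    mpirun -np 64 --bind-to-core mdrun_mpi -s topol.tpr

    # GROMACS 4.5 with built-in thread parallelisation, no MPI involved
    mdrun -nt 64 -s topol.tpr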
Re: [gmx-users] Scaling/performance on Gromacs 4
On 21/02/2012 8:11 AM, Floris Buelens wrote:
> Poor scaling with MPI on many-core machines can also be due to uneven job
> distribution across cores, or to jobs being wastefully swapped between
> cores. You might be able to fix this with some esoteric configuration
> options of mpirun (--bind-to-core worked for me with openMPI), but the
> surest option is to switch to gromacs 4.5 and run using thread-level
> parallelisation, bypassing MPI entirely.

That can avoid problems arising from MPI performance, but not those arising from PP-vs-PME load balance, or intra-PP load balance. The end of the .log files will suggest whether these latter effects are strong contributors. Carsten's suggested solution is a good one.

Mark
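For example, assuming the run wrote md.log, a quick way to pull out the relevant lines (the exact wording of these statistics varies between GROMACS versions, so the search strings here are assumptions to adapt as needed):

    # Domain-decomposition and PP/PME balance statistics are printed
    # near the end of the log; search case-insensitively to be safe
    grep -i -A 2 'load imbalance' md.log
    grep -i 'pme mesh' md.log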