[gmx-users] parallelization
Dear GROMACS users,

I would like to run my simulations on all 8 nodes with full utilisation of all cores (2 each). I have compiled GROMACS version 4.6.3 using both thread-MPI and Open MPI. I am using the following command:

    mpirun -np 8 mdrun_mpi -v -nt 2 -s *.tpr -c *.gro

But I am getting the following error:

    Setting the total number of threads is only supported with thread-MPI
    and Gromacs was compiled without thread-MPI

Although during compilation I used:

    cmake .. -DGMX_MPI=ON -DGMX_THREAD_MPI=ON

If I don't use the -nt option, I can see that all 8 processors are utilised, but I am not sure whether all cores are being utilised. For version 4.6.3 without MPI, I know that by default GROMACS uses all the threads, but I am not sure whether the MPI version uses all threads or not. Any help is appreciated.
Re: [gmx-users] parallelization
Hi,

On Oct 17, 2013, at 2:25 PM, pratibha kapoor kapoorpratib...@gmail.com wrote:

> Dear GROMACS users,
> I would like to run my simulations on all 8 nodes with full utilisation
> of all cores (2 each). I have compiled GROMACS version 4.6.3 using both
> thread-MPI and Open MPI. I am using the following command:
>
>     mpirun -np 8 mdrun_mpi -v -nt 2 -s *.tpr -c *.gro
>
> But I am getting the following error:
>
>     Setting the total number of threads is only supported with thread-MPI
>     and Gromacs was compiled without thread-MPI
>
> Although during compilation I have used:
>
>     cmake .. -DGMX_MPI=ON -DGMX_THREAD_MPI=ON

You can use either real MPI or thread-MPI, not both. But you can combine MPI with OpenMP by configuring with -DGMX_MPI=ON -DGMX_OPENMP=ON.

> If I don't use the -nt option, I can see that all 8 processors are
> utilised, but I am not sure whether all cores are being utilised.

You can run with

    mpirun -np 16 mdrun_mpi -v -s *.tpr -c *.gro

to use all 16 available cores (with an MPI build, -nt is not available, which is exactly the error you saw).

> For version 4.6.3 without MPI, I know that by default GROMACS uses all
> the threads, but I am not sure whether the MPI version uses all threads
> or not.

Take a look at the md.log output file; there it should be written what GROMACS did use!

Best,
Carsten

--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa
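For reference, a minimal sketch of the MPI + OpenMP combination Carsten describes, assuming a GROMACS 4.6.x source tree; the file names (topol.tpr, confout.gro) are illustrative, and -ntomp is the 4.6-era mdrun option for OpenMP threads per MPI rank:

    # Build with real MPI plus OpenMP (thread-MPI is then disabled automatically)
    cmake .. -DGMX_MPI=ON -DGMX_OPENMP=ON
    make -j 4 && make install

    # 8 MPI ranks (one per node) x 2 OpenMP threads each = 16 cores
    mpirun -np 8 mdrun_mpi -ntomp 2 -v -s topol.tpr -c confout.gro

With one rank per node, the OpenMP threads stay inside a node's shared memory, which is usually what you want on a cluster of small nodes.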
Re: [gmx-users] Parallelization performance
On Sat, Mar 16, 2013 at 1:50 AM, Sonia Aguilera sm.aguiler...@uniandes.edu.co wrote:

> Hi!
>
> I have been running MD simulations on a 6-processor machine. I just got
> an account on a cluster. An NVT stabilization takes about 8 hours on my
> 6-processor machine, but it takes about 12 hours on the cluster using 16
> processors. It is my understanding that the idea of running in parallel
> is to be more efficient, right?

Yes, but your performance depends on the hardware and the setup. 16 abacuses are not faster than 6 computers :-)

Secondly, even if the hardware is comparable, if your 6-processor machine has 4 cores per processor, then OpenMP might be delivering more performance. Or your MPI environment might be configured wrongly and you're running 16 copies of the same simulation on superior hardware. You should inspect the top of the log files to see what GROMACS thinks your hardware is providing, and the bottom of the log files to see in which aspects of the simulation the two systems are delivering different performance.

> This is the command for the run on the 6-processor machine:
>     mdrun -v -s nvtOmpA.tpr -deffnm nvtOmpA
>
> This is the command for the run on 16 processors on the cluster:
>     mpirun -np 16 mdrun_mpi -v -s nvtOmpA.tpr -deffnm nvtOmpA
>
> With the last command I am imagining that my process is divided over 16
> processors that work in parallel, so that the wall time should be less
> than on the 6-processor machine. My system is a protein in oil and water,
> and the simulations are for FE calculations. I think it is expected that
> the run on the 16 processors of the cluster should be faster, but I'm
> getting the opposite. Am I doing something wrong?

Not as far as we know. But you need to inspect your .log files for all the clues GROMACS provides.

> This is my mdp. I have used the same mdp for simulations on 4-, 6- and
> 8-processor machines and every time it is faster and runs quite well.
> Any help will be appreciated!!
>
> title        = NVT equilibration
> ; Run control
> integrator   = sd    ; Langevin dynamics

There have been fixes for correctness and performance of the SD integrator - you should certainly not be using anything older than GROMACS 4.6.
Mark
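To make the log-file inspection Mark suggests concrete, a small sketch; the log name nvtOmpA.log follows from the -deffnm option above, and the search strings are assumptions based on the usual layout of a 4.6-era md.log:

    # Top of the log: the hardware GROMACS detected and how ranks/threads were laid out
    head -n 80 nvtOmpA.log

    # Bottom of the log: the cycle accounting table and the ns/day summary
    grep -B 2 -A 25 "C Y C L E" nvtOmpA.log
    grep "Performance:" nvtOmpA.log

Comparing the "Performance:" lines and the per-task cycle breakdown between the 6-processor and 16-processor runs shows where the cluster run is losing time (e.g. communication or PME load imbalance).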
[gmx-users] Parallelization performance
Hi!

I have been running MD simulations on a 6-processor machine. I just got an account on a cluster. An NVT stabilization takes about 8 hours on my 6-processor machine, but it takes about 12 hours on the cluster using 16 processors. It is my understanding that the idea of running in parallel is to be more efficient, right?

This is the command for the run on the 6-processor machine:

    mdrun -v -s nvtOmpA.tpr -deffnm nvtOmpA

This is the command for the run on 16 processors on the cluster:

    mpirun -np 16 mdrun_mpi -v -s nvtOmpA.tpr -deffnm nvtOmpA

With the last command I am imagining that my process is divided over 16 processors that work in parallel, so that the wall time should be less than on the 6-processor machine. My system is a protein in oil and water, and the simulations are for FE calculations. I think it is expected that the run on the 16 processors of the cluster should be faster, but I'm getting the opposite. Am I doing something wrong?

This is my mdp. I have used the same mdp for simulations on 4-, 6- and 8-processor machines and every time it is faster and runs quite well. Any help will be appreciated!!

    title                    = NVT equilibration
    ; Run control
    integrator               = sd        ; Langevin dynamics
    tinit                    = 0
    dt                       = 0.002
    nsteps                   = 150000    ; 300 ps
    nstcomm                  = 100
    ; Output control
    nstxout                  = 500
    nstvout                  = 500
    nstfout                  = 0
    nstlog                   = 500
    nstenergy                = 500
    nstxtcout                = 0
    xtc-precision            = 1000
    ; Neighborsearching and short-range nonbonded interactions
    nstlist                  = 10
    ns_type                  = grid
    pbc                      = xyz
    rlist                    = 1.5
    ; Electrostatics
    coulombtype              = PME
    rcoulomb                 = 1.5
    ; van der Waals
    vdw-type                 = switch
    rvdw-switch              = 0.8
    rvdw                     = 0.9
    ; Apply long range dispersion corrections for Energy and Pressure
    DispCorr                 = EnerPres
    ; Spacing for the PME/PPPM FFT grid
    fourierspacing           = 0.12
    ; EWALD/PME/PPPM parameters
    pme_order                = 6
    ewald_rtol               = 1e-06
    epsilon_surface          = 0
    optimize_fft             = no
    ; Temperature coupling
    ; tcoupl is implicitly handled by the sd integrator
    tc_grps                  = system
    tau_t                    = 1.0
    ref_t                    = 300
    ; Pressure coupling is off for NVT
    Pcoupl                   = No
    tau_p                    = 0.5
    compressibility          = 4.5e-05
    ref_p                    = 1.0
    ; Free energy control stuff
    free_energy              = yes
    init_lambda              = 0.1
    delta_lambda             = 0
    foreign_lambda           = 0.05 0.2
    sc-alpha                 = 0
    sc-power                 = 0
    sc-sigma                 = 0
    couple-moltype           = Protein_chain_A   ; name of moleculetype to decouple
    couple-lambda0           = vdw
    couple-lambda1           = vdw-q
    couple-intramol          = yes
    nstdhdl                  = 10
    ; Generate velocities to start
    gen_vel                  = yes
    gen_temp                 = 300
    gen_seed                 = -1
    ; Options for bonds
    constraints              = h-bonds   ; we only have C-H bonds here
    ; Type of constraint algorithm
    constraint-algorithm     = lincs
    ; Do not constrain the starting configuration
    continuation             = no
    ; Highest order in the expansion of the constraint coupling matrix
    lincs-order              = 12

Thanks in advance!

Sonia Aguilera
Graduate assistant
[gmx-users] Parallelization scheme and terminology help
I have been lost in the sea of terminology for installing GROMACS with multi-processors. The plan is to upgrade from 4.5.5 to 4.6, and I want the optimal install for my system. There is a nice explanation at http://www.gromacs.org/Documentation/Acceleration_and_parallelization but the number of different options and the terminology has confused me.

I currently have one computer with 2 processor sockets, each with 4 cores, each with 2 threads. A mouthful which in the end allows for 16 processes at once (2*4*2).

The way I read the documentation is that MPI is needed for the talk between the 2 physical processors, OpenMP does the talk between the 4 cores in each processor, and thread-MPI does the threading? Or does thread-MPI do everything? What parallelization scheme is required?
Re: [gmx-users] Parallelization scheme and terminology help
On Mon, Jan 21, 2013 at 11:50 PM, Brad Van Oosten bv0...@brocku.ca wrote:

> I have been lost in the sea of terminology for installing GROMACS with
> multi-processors. The plan is to upgrade from 4.5.5 to 4.6, and I want
> the optimal install for my system. There is a nice explanation at
> http://www.gromacs.org/Documentation/Acceleration_and_parallelization
> but the number of different options and terminology has confused me.

That's life, unfortunately. Nomenclature is poorly standardized and gets re-used by different vendors to mean different things, or in different contexts.

> I currently have one computer with 2 processor sockets, each with 4
> cores, each with 2 threads. A mouthful which in the end allows for 16
> processes at once (2*4*2).

Your sockets don't require a network to talk to each other, so thread-MPI suffices. Probably your threads are hyper-threads, which may or may not be useful for GROMACS. But you will need to read the actual documentation and look up chipset descriptions to really know what you have.

> The way I read the documentation is that MPI is needed for the talk
> between the 2 physical processors, OpenMP does the talk between the 4
> cores in each processor, and thread-MPI does the threading? Or does
> thread-MPI do everything? What parallelization scheme is required?

Probably, use thread-MPI and forget about everything else :-) This question cannot be answered in the abstract (you'd need to know full hardware characteristics and simulation system characteristics). It is best assessed by trying a few options and comparing the throughput you observe on the systems you care about.

Mark
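To make the options concrete, here is a sketch of the three launch styles for a single 2-socket, 8-core, 16-hyper-thread machine like the one described above, using 4.6-era mdrun option names; the thread counts and topol.tpr are illustrative, and as Mark says, only benchmarking tells you which combination wins:

    # Thread-MPI build (the default, no external MPI library needed): 8 thread-MPI ranks
    mdrun -ntmpi 8 -v -s topol.tpr

    # Thread-MPI ranks combined with OpenMP threads: 4 ranks x 2 threads = 8 cores
    mdrun -ntmpi 4 -ntomp 2 -v -s topol.tpr

    # Real-MPI build (only required once you run across a network): 8 ranks via mpirun
    mpirun -np 8 mdrun_mpi -v -s topol.tpr

Left to its own devices, a 4.6 thread-MPI mdrun with no -nt/-ntmpi/-ntomp options will try to use all detected hardware threads automatically.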
[gmx-users] parallelization error? gromacs-4.0.2
Hello,

I tried from the beginning to test gromacs-4.0.2 with a monoclinic system on 8 processors (one machine with two quad-core processors). The skew errors seem to be gone, yet other errors appeared. Now, after a successful md run, taking the output and trying to do annealing, I get the following error:

    Fatal error:
    Domain decomposition has not been implemented for box vectors that have
    non-zero components in directions that do not use domain decomposition:
    ncells = 2 1 4, box vector[3] = -1.070924 -0.07 6.415503

In my input files I do not have a vector like the one above. I tried to run the same input on my server, which has two processors, and up to now, as I am writing this email, the run is going without any errors at all. I have compiled gromacs-4.0.2 with fftw-3.2. Any files needed or any other information will be at your disposal.

Thank you,
Nikos
RE: [gmx-users] parallelization error? gromacs-4.0.2
Hi,

Do you have anisotropic pressure coupling turned on? Could you send me the tpr file?

Berk
RE: [gmx-users] parallelization error? gromacs-4.0.2
Hi,

Ah, so you have anisotropic pressure coupling on. I forgot to put in a check for that when choosing the domain decomposition grid. Changing line 200 of src/mdlib/domdec_setup.c from:

    if (box[j][i] != 0)

to:

    if (box[j][i] != 0 || ir->deform[j][i] != 0 ||
        (ir->epc != epcNO && ir->compress[j][i] != 0))

should fix the problem. I will commit this fix for 4.0.3.

(PS: you can also set -dd nx ny nz by hand, where for anisotropic pressure coupling you should not have nx=1 or ny=1 when ny!=1 or nz!=1.)

Berk
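As an interim workaround before 4.0.3, Berk's PS suggests picking the decomposition grid by hand with -dd. A hedged sketch for the 8-core machine from the original report; the grid 2 2 2 is just one choice consistent with his rule (the product must equal -np, and nx, ny must not be 1 while later dimensions are split), and the file names are hypothetical:

    # Choose a 2x2x2 domain decomposition grid explicitly instead of letting
    # mdrun pick one (such as 2 1 4) that trips the missing check
    mpirun -np 8 mdrun -dd 2 2 2 -v -s annealing.tpr -c annealing.gro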