Re: [gmx-users] gpu cluster explanation
Hi Richard,

Thank you for the help, and sorry for the delay in my reply. I tried some test runs changing some parameters (e.g. removing PME) and was able to reach 20 ns/day, so I think that 9-11 ns/day is about the maximum I can obtain with my settings. Thank you again for your help.

Cheers,
Fra

On Fri, 12 Jul 2013, at 03:41 PM, Richard Broadbent wrote:
> That slowdown is in line with what I got when I tried a similar CPU-GPU
> setup. That said, others might have some advice that will improve your
> performance.
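The exact parameter changes behind the 20 ns/day figure are not given above, so the following is only a sketch of one possible "no PME" setup under the Verlet scheme, using reaction-field electrostatics (the epsilon_rf value is an assumption, not taken from the thread):

  ; Electrostatics -- hypothetical alternative to the PME block in the mdp file quoted further down the thread
  coulombtype = reaction-field   ; cut-off based electrostatics instead of PME
  rcoulomb    = 1.0              ; [nm] electrostatic cut-off, kept as in the original file
  epsilon_rf  = 0                ; 0 means an infinite reaction-field dielectric

Dropping PME removes the CPU-side mesh/FFT work in a 4.6 GPU run, which is consistent with the jump from 9-11 to 20 ns/day, but for a charged, solvated protein PME is normally the safer choice.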
Re: [gmx-users] gpu cluster explanation
On 12/07/13 13:26, Francesco wrote:
> I have 2 types of nodes:
> - 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53 GHz)
> - 8 GPUs and 2 CPUs (6 cores each)
>
> 1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can have at most 3 MPI ranks.
> 2) Because I have 12 cores, I can open 4 OpenMP threads per MPI rank, since 4 x 3 = 12.
>
> Now, if I have a node with 8 GPUs, I can use 4 GPUs: 4 MPI ranks with 3 OpenMP threads each. Is that right?
> Is it possible to use 8 GPUs and only 8 cores?

You could set -ntomp 0 and set up MPI / thread-MPI to use 8 cores. However, a system that unbalanced (a huge amount of GPU power against comparatively little CPU power) is unlikely to get great performance.

> Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3 GPUs and 12 cores I get 9-11 ns/day.

That slowdown is in line with what I got when I tried a similar CPU-GPU setup. That said, others might have some advice that will improve your performance.

> constraints = all-bonds ; bond types to replace by constraints
> constraint_algorithm = lincs ; holonomic constraints
> lincs_iter = 1 ; accuracy of LINCS
> lincs_order = 4 ; also related to accuracy
> lincs_warnangle = 30 ; [degrees] maximum angle that a bond can rotate before LINCS will complain

That seems a little loose for the constraints, but setting that up and checking that it conserves energy and preserves bond lengths is something you'll have to do yourself.

Richard
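One way to run that check (a sketch only; it assumes the run keeps -deffnm 306s_50 as in the command quoted below, so the energy file is 306s_50.edr) is with g_energy from GROMACS 4.6:

  g_energy -f 306s_50.edr -o energy.xvg
  # at the interactive prompt, select the conserved-energy and total-energy terms,
  # then inspect energy.xvg for a systematic drift over the run

A clear drift in the conserved quantity over a few nanoseconds would suggest the constraint settings (or the time step) are too loose.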
[gmx-users] gpu cluster explanation
Hi all,

I'm working with a 200K atom system (protein + explicit water), and after a while using a CPU cluster I had to switch to a GPU cluster. I read both the "Acceleration and parallelization" and the GROMACS-GPU documentation pages (http://www.gromacs.org/Documentation/Acceleration_and_parallelization and http://www.gromacs.org/Documentation/Installation_Instructions_4.5/GROMACS-OpenMM), but it is a bit confusing and I need help to check whether I have understood it correctly. :)

I have 2 types of nodes:
- 3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @ 2.53 GHz)
- 8 GPUs and 2 CPUs (6 cores each)

1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can have at most 3 MPI ranks.
2) Because I have 12 cores, I can open 4 OpenMP threads per MPI rank, since 4 x 3 = 12.

Now, if I have a node with 8 GPUs, I can use 4 GPUs: 4 MPI ranks with 3 OpenMP threads each. Is that right?
Is it possible to use 8 GPUs and only 8 cores?

Using GROMACS 4.6.2 and 144 CPU cores I reach 35 ns/day, while with 3 GPUs and 12 cores I get 9-11 ns/day.

The command that I use is:
mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
with the number of GPUs set via the submission script:
#BSUB -n 3

I also tried to set -npme / -nt / -ntmpi / -ntomp, but nothing changes.

The mdp file and some statistics follow:

START MDP

title = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD

; Run parameters
integrator = md ; algorithm options
nsteps = 2500 ; maximum number of steps to perform [50 ns]
dt = 0.002 ; 2 fs = 0.002 ps

; Output control
nstxout = 1 ; [steps] freq to write coordinates to trajectory; the last coordinates are always written
nstvout = 1 ; [steps] freq to write velocities to trajectory; the last velocities are always written
nstlog = 1 ; [steps] freq to write energies to log file; the last energies are always written
nstenergy = 1 ; [steps] write energies to disk every nstenergy steps
nstxtcout = 1 ; [steps] freq to write coordinates to xtc trajectory
xtc_precision = 1000 ; precision to write to xtc trajectory (1000 = default)
xtc_grps = system ; which coordinate group(s) to write to disk
energygrps = system ; which energy group(s) to write

; Bond parameters
continuation = yes ; restarting from npt
constraints = all-bonds ; bond types to replace by constraints
constraint_algorithm = lincs ; holonomic constraints
lincs_iter = 1 ; accuracy of LINCS
lincs_order = 4 ; also related to accuracy
lincs_warnangle = 30 ; [degrees] maximum angle that a bond can rotate before LINCS will complain

; Neighborsearching
ns_type = grid ; method of updating neighbor list
cutoff-scheme = Verlet
nstlist = 10 ; [steps] frequency to update neighbor list (10)
rlist = 1.0 ; [nm] cut-off distance for the short-range neighbor list (1 default)
rcoulomb = 1.0 ; [nm] long range electrostatic cut-off
rvdw = 1.0 ; [nm] long range Van der Waals cut-off

; Electrostatics
coulombtype = PME ; treatment of long range electrostatic interactions
vdwtype = cut-off ; treatment of Van der Waals interactions

; Periodic boundary conditions
pbc = xyz

; Dispersion correction
DispCorr = EnerPres ; applying long range dispersion corrections

; Ewald
fourierspacing = 0.12 ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
pme_order = 4 ; interpolation order for PME, 4 = cubic
ewald_rtol = 1e-5 ; relative strength of Ewald-shifted potential at rcoulomb

; Temperature coupling
tcoupl = nose-hoover ; temperature coupling with Nose-Hoover ensemble
tc_grps = Protein Non-Protein
tau_t = 0.4 0.4 ; [ps] time constant
ref_t = 310 310 ; [K] reference temperature for coupling [310 K ~ 37 C]

; Pressure coupling
pcoupl = parrinello-rahman
pcoupltype = isotropic ; uniform scaling of box vectors
tau_p = 2.0 ; [ps] time constant
ref_p = 1.0 ; [bar] reference pressure for coupling
compressibility = 4.5e-5 ; [bar^-1] isothermal compressibility of water
refcoord_scaling = com ; have a look at the GROMACS documentation, section 7

; Velocity generation
gen_vel = no
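For the 3-GPU / 12-core nodes described above, a minimal launch sketch that makes the 3-rank x 4-thread split explicit (the -ntmpi/-ntomp/-gpu_id values are spelled out here for illustration; they are not taken from the thread):

  # assumes a single node and GROMACS 4.6 built with thread-MPI; request all 12 cores from LSF
  mdrun -ntmpi 3 -ntomp 4 -gpu_id 012 -dlb yes -s input_50.tpr -deffnm 306s_50 -v

-ntmpi 3 starts three thread-MPI ranks, -ntomp 4 gives each rank four OpenMP threads, and -gpu_id 012 assigns GPUs 0, 1 and 2 to those ranks; with -ntomp 0 (as Richard suggests above) mdrun chooses the thread count itself.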