Re: [gmx-users] GPU / CPU load imblance

2013-06-25 Thread Justin Lemkul



On 6/25/13 6:33 PM, Dwey wrote:

Hi gmx-users,

 I used  8-cores AMD CPU  with a GTX680 GPU [ with 1536 CUDA Cores]  to
run an example of Umbrella Sampling provided by Justin.
I am happy that GPU acceleration indeed helps me reduce significant time (
from 34 hours to 7 hours)  of computation in this example.
However, I found there was a NOTE on the screen like

++
  The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid
  ++

Given a 20% load imbalance, I wonder if someone can give suggestions as to
how to avoid performance loss in terms of hardware (GPU/CPU)
improvement  or  the modification of  mdp file (see below).



I would avoid tweaking the .mdp settings.  There have been several reports where 
people hacked at nonbonded cutoffs to get better performance, and it resulted in 
totally useless output.  These settings are part of the force field.  Avoid 
changing them.



In terms of hardware,  dose this NOTE suggest that I should use a
higher-capacity GPU like GTX 780 [ with 2304 CUDA Cores] to balance load or
catch up speed  ?
If so,   can it help by adding  another card with  GTX 680 GPU in the same
box ?  but will it cause GPU/CPU imbalance load  again, which two GPU keep
waiting for 8-cores CPU  ?


There has been a lot of discussion on hardware, GPU/CPU balancing, etc. in 
recent days.  Please check the archive.  Some of the threads are quite detailed.


-Justin



Second,

++
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1
++

I have no idea how this is evaluated by 4.006 ms and 2.578 ms for GPU and
CPU time, respectively.

It will be very helpful to modify  the attached mdp for a better
load balance between GPU and CPU.

I appreciate kind advice and hints to improve this mdp file.

Thanks,

Dwey

### courtesy  to  Justin #

title   = Umbrella pulling simulation
define  = -DPOSRES_B
; Run parameters
integrator  = md
dt  = 0.002
tinit   = 0
nsteps  = 500   ; 10 ns
nstcomm = 10
; Output parameters
nstxout = 5 ; every 100 ps
nstvout = 5
nstfout = 5000
nstxtcout   = 5000  ; every 10 ps
nstenergy   = 5000
; Bond parameters
constraint_algorithm= lincs
constraints = all-bonds
continuation= yes
; Single-range cutoff scheme
nstlist = 5
ns_type = grid
rlist   = 1.4
rcoulomb= 1.4
rvdw= 1.4
; PME electrostatics parameters
coulombtype = PME
fourierspacing  = 0.12
fourier_nx  = 0
fourier_ny  = 0
fourier_nz  = 0
pme_order   = 4
ewald_rtol  = 1e-5
optimize_fft= yes
; Berendsen temperature coupling is on in two groups
Tcoupl  = Nose-Hoover
tc_grps = Protein   Non-Protein
tau_t   = 0.5   0.5
ref_t   = 310   310
; Pressure coupling is on
Pcoupl  = Parrinello-Rahman
pcoupltype  = isotropic
tau_p   = 1.0
compressibility = 4.5e-5
ref_p   = 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel = no
; Periodic boundary conditions are on in all directions
pbc = xyz
; Long-range dispersion correction
DispCorr= EnerPres
cutoff-scheme   = Verlet
; Pull code
pull= umbrella
pull_geometry   = distance
pull_dim= N N Y
pull_start  = yes
pull_ngroups= 1
pull_group0 = Chain_B
pull_group1 = Chain_A
pull_init1  = 0
pull_rate1  = 0.0
pull_k1 = 1000  ; kJ mol^-1 nm^-2
pull_nstxout= 1000  ; every 2 ps
pull_nstfout= 1000  ; every 2 ps



--


Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at]vt.edu | (540) 231-9080
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin


--
gmx-users mailing listgmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the 
www interface or send it to gmx-users-requ...@gromacs.org.

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists


[gmx-users] GPU / CPU load imblance

2013-06-25 Thread Dwey
Hi gmx-users,

I used  8-cores AMD CPU  with a GTX680 GPU [ with 1536 CUDA Cores]  to
run an example of Umbrella Sampling provided by Justin.
I am happy that GPU acceleration indeed helps me reduce significant time (
from 34 hours to 7 hours)  of computation in this example.
However, I found there was a NOTE on the screen like

++
 The GPU has >20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid
 ++

Given a 20% load imbalance, I wonder if someone can give suggestions as to
how to avoid performance loss in terms of hardware (GPU/CPU)
improvement  or  the modification of  mdp file (see below).

In terms of hardware,  dose this NOTE suggest that I should use a
higher-capacity GPU like GTX 780 [ with 2304 CUDA Cores] to balance load or
catch up speed  ?
If so,   can it help by adding  another card with  GTX 680 GPU in the same
box ?  but will it cause GPU/CPU imbalance load  again, which two GPU keep
waiting for 8-cores CPU  ?

Second,

++
Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1
++

I have no idea how this is evaluated by 4.006 ms and 2.578 ms for GPU and
CPU time, respectively.

It will be very helpful to modify  the attached mdp for a better
load balance between GPU and CPU.

I appreciate kind advice and hints to improve this mdp file.

Thanks,

Dwey

### courtesy  to  Justin #

title   = Umbrella pulling simulation
define  = -DPOSRES_B
; Run parameters
integrator  = md
dt  = 0.002
tinit   = 0
nsteps  = 500   ; 10 ns
nstcomm = 10
; Output parameters
nstxout = 5 ; every 100 ps
nstvout = 5
nstfout = 5000
nstxtcout   = 5000  ; every 10 ps
nstenergy   = 5000
; Bond parameters
constraint_algorithm= lincs
constraints = all-bonds
continuation= yes
; Single-range cutoff scheme
nstlist = 5
ns_type = grid
rlist   = 1.4
rcoulomb= 1.4
rvdw= 1.4
; PME electrostatics parameters
coulombtype = PME
fourierspacing  = 0.12
fourier_nx  = 0
fourier_ny  = 0
fourier_nz  = 0
pme_order   = 4
ewald_rtol  = 1e-5
optimize_fft= yes
; Berendsen temperature coupling is on in two groups
Tcoupl  = Nose-Hoover
tc_grps = Protein   Non-Protein
tau_t   = 0.5   0.5
ref_t   = 310   310
; Pressure coupling is on
Pcoupl  = Parrinello-Rahman
pcoupltype  = isotropic
tau_p   = 1.0
compressibility = 4.5e-5
ref_p   = 1.0
refcoord_scaling = com
; Generate velocities is off
gen_vel = no
; Periodic boundary conditions are on in all directions
pbc = xyz
; Long-range dispersion correction
DispCorr= EnerPres
cutoff-scheme   = Verlet
; Pull code
pull= umbrella
pull_geometry   = distance
pull_dim= N N Y
pull_start  = yes
pull_ngroups= 1
pull_group0 = Chain_B
pull_group1 = Chain_A
pull_init1  = 0
pull_rate1  = 0.0
pull_k1 = 1000  ; kJ mol^-1 nm^-2
pull_nstxout= 1000  ; every 2 ps
pull_nstfout= 1000  ; every 2 ps
-- 
gmx-users mailing listgmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the 
www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists