Re: [gmx-users] GPU-gromacs

2013-10-25 Thread Carsten Kutzner
On Oct 25, 2013, at 4:07 PM, aixintiankong wrote:

 Dear Prof.,
 I want to install GROMACS on a multi-core workstation with a GPU (Tesla C2075).
 Should I install OpenMPI or MPICH2?
If you want to run Gromacs on just one workstation with a single GPU, you do
not need to install an MPI library at all!


 gmx-users mailing
 * Please search the archive at before posting!
 * Please don't post (un)subscribe requests to the list. Use the 
 www interface or send it to
 * Can't post? Read

Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302


Re: [gmx-users] GPU version of Gromacs

2013-08-19 Thread Justin Lemkul

On 8/19/13 5:38 AM, grita wrote:

Hey guys,

Is it possible to run an SD simulation using the pull code in the GPU
version of Gromacs?

Have you tried it?



Justin A. Lemkul, Ph.D.
Postdoctoral Fellow

Department of Pharmaceutical Sciences
School of Pharmacy
Health Sciences Facility II, Room 601
University of Maryland, Baltimore
20 Penn St.
Baltimore, MD 21201 | (410) 706-7441


Re: [gmx-users] GPU metadynamics

2013-08-15 Thread Albert

On 08/15/2013 11:21 AM, Jacopo Sgrignani wrote:

Dear Albert
to run parallel jobs on multiple GPUs you should use something like this:

mpirun -np (number of parallel sessions on CPU) mdrun_mpi .. -gpu_id 

so you will have 4 calculations for GPU.


Thanks a lot for the reply, but there is a problem with the following command:

mpirun -np 4 mdrun_mpi -s md.tpr -v -g md.log -o md.trr -x md.xtc \
    -plumed plumed2.dat -e md.edr -gpu_id 0123


4 GPUs detected on host node3:
  #0: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
  #1: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
  #2: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
  #3: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible

Program mdrun_mpi, VERSION 4.6.3
Source code file: 
line: 349

Fatal error:
Incorrect launch configuration: mismatching number of PP MPI processes 
and GPUs per node.
mdrun_mpi was started with 1 PP MPI process per node, but you provided 4 GPUs.

For more information and tips for troubleshooting, please check the GROMACS
website at
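
The fatal error above says that the number of particle-particle (PP) MPI processes on a node must match the number of GPU ids passed with -gpu_id; here a single process met the four ids in "0123". A minimal Python sketch of that consistency check follows. The function name is hypothetical, and the logic only restates the error message under the assumption that each character of the -gpu_id string names one GPU; it is not taken from the GROMACS source.

```python
def check_launch_config(pp_ranks_per_node: int, gpu_id: str) -> None:
    """Raise ValueError when PP ranks per node and listed GPU ids disagree.

    Hypothetical helper: it restates the rule in the error message above,
    assuming each character of the -gpu_id string names one GPU.
    """
    n_gpus = len(gpu_id)
    if pp_ranks_per_node != n_gpus:
        raise ValueError(
            f"started with {pp_ranks_per_node} PP MPI process(es) per node, "
            f"but {n_gpus} GPU(s) were provided"
        )


# Four ranks on the node match the four ids in "0123", so this passes silently.
check_launch_config(4, "0123")
```

Calling check_launch_config(1, "0123") raises, which corresponds to the situation reported in the log.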


Re: [gmx-users] GPU + surface

2013-08-09 Thread Lucio Montero
Hello. Have you removed periodicity? You may only be seeing
water molecules crossing between copies of the periodic system.

Lucio Montero
Ph. D. student
Instituto de Biotecnologia, UNAM

On 08/08/13 07:39, Ondrej Kroutil wrote:

Dear GMX users,
   I have simulated ions and water near a quartz surface
(ClayFF) using a GPU (GTX580) and Gromacs (4.6.1, single precision, 64
bit, SSE4.1, fftw-3.3.3) and have observed strange behavior of water
and ions. It's an NVT simulation with frozen surface atoms (see .mdp
below) and negative charge on the surface (deprotonated silanols); the system
is overall neutral. I used the same mdp for the normal CPU simulation and the GPU
simulation, and just added the -testverlet option for the GPU simulation.
   In the CPU simulation ions and water behaved as expected (see
, but in the GPU simulation there was a visible flow of ions toward the image
of the lower surface and all water molecules were oriented with hydrogens
facing downward and oxygens oriented upwards (see
It looks as if an electric field were applied, but none is.
   Do you think there is a problem in the initial setup of parameters in
the mdp file? Or maybe a problem with the freeze groups? Without freezing the
situation is better, but there is still visible flow and pairing of
like ions (see
   It looks like an electrostatics problem. Do you have any hints, please?
And sorry if I missed a similar topic in the mailing list, but I couldn't
find anything similar.

   Ondrej Kroutil

integrator           = md
dt                   = 0.001
nsteps               = 10
comm_mode            = linear
nstcomm              = 1000
nstxout              = 0
nstxtcout            = 1000
nstvout              = 0
nstfout              = 0
nstlog               = 1000
xtc_precision        = 1
nstlist              = 10
ns_type              = grid
rlist                = 1.2
coulombtype          = PME
rcoulomb             = 1.2
rvdw                 = 1.2
constraints          = hbonds
constraint_algorithm = lincs
lincs_iter           = 1
fourierspacing       = 0.1
pme_order            = 4
ewald_rtol           = 1e-5
ewald_geometry       = 3dc
optimize_fft         = yes
; Nose-Hoover temperature coupling
Tcoupl               = nose-hoover
tau_t                = 1
tc_grps              = system
ref_t                = 298.15
; No Pressure
; Pcoupl             = Parrinello-Rahman
pcoupltype           = semiisotropic
tau_p                = 1.0
compressibility      = 0 4.6e-5
ref_p                = 0 1.0
periodic_molecules   = no
pbc                  = xyz
;energygrps          = SOL SOH
freezegrps           = BULK
freezedim            = Y Y Y
gen_vel              = yes
gen_temp             = 298.15
gen_seed             = -1


RE: [gmx-users] GPU + surface

2013-08-08 Thread Berk Hess

The -testverlet option is only for testing (as the name implies).
Please set the mdp option cutoff-scheme = Verlet instead.
Also, please update to 4.6.3, as this potential issue might already have been fixed.
With the Verlet scheme the CPU and GPU should give the same result, whether correct or
incorrect.

Could it be that your system is located partially above and partially below z=0?
This will cause problems with ewald-geometry = 3dc. To use this option you need 
to ensure your whole system is in the same periodic image.
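
One way to test for the situation described above is to look at which periodic image each atom's z coordinate falls into. The following is a minimal Python sketch, assuming a rectangular box of height box_z and coordinates already read into a plain list; the function name and the example coordinates are hypothetical and no GROMACS tool is involved.

```python
import math


def spans_multiple_images(z_coords, box_z):
    """Return True if the z coordinates fall into more than one periodic image.

    Assumes a rectangular box of height box_z (same length unit as z_coords);
    image 0 covers 0 <= z < box_z, image -1 covers -box_z <= z < 0, and so on.
    """
    images = {math.floor(z / box_z) for z in z_coords}
    return len(images) > 1


# Atoms at z = -0.5 and z = 0.3 sit on opposite sides of z = 0.
print(spans_multiple_images([-0.5, 0.3, 1.2], box_z=5.0))  # True
print(spans_multiple_images([0.1, 2.5, 4.9], box_z=5.0))   # False
```

If the check returns True, the coordinates would need to be shifted or wrapped into a single image before ewald-geometry = 3dc is used.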



 Date: Thu, 8 Aug 2013 14:39:59 +0200
 Subject: [gmx-users] GPU + surface


Re: Re: Re: [gmx-users] GPU-based workstation

2013-08-01 Thread Szilárd Páll
I may be late with the reply, but here are my 2 cents.

If you need a single very fast machine (i.e. maximum single simulation
performance), you should get
- either a very fast desktop CPU: i7 3930 or for 2x more the 3970 -
which, BTW, I think is not worth it ($600-1000)
- or 1-2 fast Xeon E5-s - depending on how many and which these will
be $1k-2k each.

For a single-CPU setup two Titans may be overkill and (at least
with the current code) you may get very little extra performance from
using two instead of one GPU. With a dual-socket machine (and decently fast
CPUs), if you have a large enough input system, two GPUs will work
However, if you care about total simulation throughput and you have
multiple simulations to run, I'd suggest that you buy 2-3 machines
with the components that give the best ns/day/$: something like
i7-4670 or 4770 with GTX 680/770 (or 780).



Re: [gmx-users] gpu cluster explanation

2013-07-23 Thread Francesco
Hi Richard,
Thank you for the help and sorry for the delay in my reply.
I tried some test runs changing some parameters (e.g. removing PME) and I
was able to reach 20 ns/day, so I think that 9-11 ns/day is the maximum
that I can obtain for my setting.

Thank you again for your help.




Re: [gmx-users] gpu cluster explanation

2013-07-12 Thread Richard Broadbent

On 12/07/13 13:26, Francesco wrote:

Hi all,
I'm working with a 200K-atom system (protein + explicit water) and,
after a while using a CPU cluster, I had to switch to a GPU cluster.
I read both the Acceleration and parallelization and the Gromacs-gpu
documentation pages,
but they are a bit confusing and I need help to check whether I have
understood them correctly. :)
I have 2 types of nodes:
3 GPUs (NVIDIA Tesla M2090) and 2 CPUs with 6 cores each (Intel Xeon E5649 @
8 GPUs and 2 CPUs (6 cores each)

1) I can only have 1 MPI rank per GPU, meaning that with 3 GPUs I can have 3
MPI ranks at most.
2) Because I have 12 cores I can open 4 OpenMP threads per MPI rank, because
4x3 = 12.

Now if I have a node with 8 GPUs, I can use 4 GPUs:
4 MPI ranks and 3 OpenMP threads each.
Is that right?
Is it possible to use 8 GPUs and 8 cores only?

You could set -ntomp 0, however, and set up MPI/thread-MPI to use 8 cores.
However, a system that unbalanced (a huge amount of GPU power relative to
comparatively little CPU power) is unlikely to get great performance.
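
The rank and thread arithmetic in the question can be written down as a small Python sketch. It assumes, as the question does, one MPI rank per GPU and an even split of the node's cores over the ranks; the function name is hypothetical.

```python
def openmp_threads_per_rank(cores: int, ranks: int) -> int:
    """Split the cores of a node evenly over MPI ranks (one rank per GPU).

    Raises ValueError when the cores cannot be divided evenly, since an
    uneven split would leave some cores idle or oversubscribed.
    """
    if ranks <= 0 or cores % ranks != 0:
        raise ValueError(f"{cores} cores cannot be split evenly over {ranks} ranks")
    return cores // ranks


print(openmp_threads_per_rank(12, 3))  # 4 threads per rank on the 3-GPU node
print(openmp_threads_per_rank(12, 4))  # 3 threads per rank when using 4 GPUs
```

With 12 cores and 8 ranks the function raises, which is why using all 8 GPUs means either leaving 4 cores unused (8 ranks with 1 thread each) or accepting an uneven layout.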

Using gromacs 4.6.2 and 144 cpu cores I reach 35 ns/day, while with 3
gpu  and 12 cores I get 9-11 ns/day.

That slowdown is in line with what I got when I tried a similar CPU-GPU 
setup. That said, others might have some advice that will improve your 

The command that I use is:
mdrun -dlb yes -s input_50.tpr -deffnm 306s_50 -v
with the number of GPUs set via the script:
#BSUB -n 3

I also tried to set -npme / -nt / -ntmpi / -ntomp, but nothing changes.

The mdp file and some statistics follow:


title = G6PD wt molecular dynamics (2bhl.pdb) - NPT MD

; Run parameters
integrator           = md        ; Algorithm options
nsteps               = 25000000  ; maximum number of steps to perform [50 ns]
dt                   = 0.002     ; 2 fs = 0.002 ps

; Output control
nstxout              = 1         ; [steps] freq to write coordinates to trajectory, the last coordinates are always written
nstvout              = 1         ; [steps] freq to write velocities to trajectory, the last velocities are always written
nstlog               = 1         ; [steps] freq to write energies to log file, the last energies are always written
nstenergy            = 1         ; [steps] write energies to disk every nstenergy steps
nstxtcout            = 1         ; [steps] freq to write coordinates to xtc trajectory
xtc_precision        = 1000      ; precision to write to xtc trajectory (1000 = default)
xtc_grps             = system    ; which coordinate group(s) to write to disk
energygrps           = system    ; or System / which energy group(s) to write

; Bond parameters
continuation         = yes       ; restarting from npt
constraints          = all-bonds ; Bond types to replace by constraints
constraint_algorithm = lincs     ; holonomic constraints
lincs_iter           = 1         ; accuracy of LINCS
lincs_order          = 4         ; also related to
lincs_warnangle      = 30        ; [degrees] maximum angle that a bond can rotate before LINCS will complain

That seems a little loose for constraints but setting that up and 
checking it's conserving energy and preserving bond lengths is something 
you'll have to do yourself


; Neighborsearching
ns_type              = grid      ; method of updating neighbor list
cutoff-scheme        = Verlet
nstlist              = 10        ; [steps] frequency to update neighbor list (10)
rlist                = 1.0       ; [nm] cut-off distance for the short-range neighbor list (1 default)
rcoulomb             = 1.0       ; [nm] long range electrostatic cut-off
rvdw                 = 1.0       ; [nm] long range Van der Waals cut-off

; Electrostatics
coulombtype          = PME       ; treatment of long range electrostatic
vdwtype              = cut-off   ; treatment of Van der Waals

; Periodic boundary conditions
pbc                  = xyz

; Dispersion correction
DispCorr             = EnerPres  ; applying long range dispersion corrections

; Ewald
fourierspacing       = 0.12      ; grid spacing for FFT - controls the highest magnitude of wave vectors (0.12)
pme_order            = 4         ; interpolation order for PME, 4 = cubic
ewald_rtol           = 1e-5      ; relative strength of Ewald-shifted potential at rcoulomb

; Temperature coupling
tcoupl               = nose-hoover ; temperature coupling with Nose-Hoover ensemble
tc_grps              = Protein Non-Protein
tau_t                = 0.4 0.4   ; [ps] time constant
ref_t                = 310 310   ; [K] reference temperature for coupling [310 K = 37°C]

; Pressure coupling
pcoupl               = parrinello-rahman

Re: Re: Re: [gmx-users] GPU-based workstation

2013-06-27 Thread James Starlight
Back to my question.
I want to build a GPU-based workstation based on 2 GeForce Titans.

My current budget allows me only a high-end 6-core Core i7-3930 and an MB
with 5 PCI-E slots (like the Asus Rampage IV series). Would this system be balanced
with two GPUs? Should I use two 6-8 core Xeons instead of the i7?


2013/5/29 James Starlight

 Dear Dr. Pall!

 Thank you for your suggestions!

 Assuming that I have a budget of 5000 $ and I want to build a GPU-based
 desktop with this money.

 Previously I've used a single 4-core i5 with a GTX 670 and obtained on average 10
 ns/day performance for 70k-atom systems (1.0 cutoffs, no virtual sites,
 sd integrator).

 Now I'd like to build a system based on 2 high-end GeForces (e.g like
 Should that system include 2 CPUs for good balancing? (E.g. two 6-core
 Xeons with faster clocks could be better for simulations than an
 i7, couldn't they?)

 What additional properties of the MB should I consider for such a system?


 2013/5/28 lloyd riggs

 Dear Dr. Pali,

 Thank you,

 Stephan Watkins

 *Sent:* Tuesday, 28 May 2013 at 19:50
 *From:* Szilárd Páll

 *To:* Discussion list for GROMACS users
 *Subject:* Re: Re: [gmx-users] GPU-based workstation
 Dear all,

 As far as I understand, the OP is interested in hardware for *running*
 GROMACS 4.6 rather than developing code or running LINPACK.

 To get the best performance it is important to use a machine with hardware
 balanced for GROMACS' workloads. Too few GPU resources will result
 in CPU idling; too many GPU resources will lead to the runs being CPU
 or multi-GPU scaling bound, and above a certain level GROMACS won't be
 able to make use of additional GPUs.

 Of course, the balance will depend both on hardware and simulation
 settings (mostly the LJ cut-off used).

 An additional factor to consider is typical system size. To reach near
 peak pair-force throughput on GPUs you typically need 20k-40k
 particles/GPU (depends on the architecture) and throughput drops below
 these values. Hence, in most cases it is preferred to use fewer and
 faster GPUs rather than more.
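
 The particles-per-GPU rule of thumb above can be illustrated with a small Python sketch. It uses the 70k-atom system mentioned earlier in this thread and takes 20,000 particles per GPU, the lower end of the quoted range, as an assumed threshold; the function names are hypothetical and the real threshold depends on the GPU architecture.

```python
def particles_per_gpu(n_particles: int, n_gpus: int) -> float:
    """Average number of particles each GPU handles."""
    return n_particles / n_gpus


def below_peak_range(n_particles: int, n_gpus: int, lower_bound: int = 20_000) -> bool:
    """True when each GPU gets fewer particles than the assumed lower bound.

    lower_bound is an illustrative value taken from the low end of the
    20k-40k range quoted above, not a measured figure.
    """
    return particles_per_gpu(n_particles, n_gpus) < lower_bound


for gpus in (1, 2, 4):
    load = particles_per_gpu(70_000, gpus)
    print(gpus, load, below_peak_range(70_000, gpus))
```

 Under these assumptions the 70k-atom system still gives each of two GPUs 35,000 particles, while four GPUs would drop to 17,500 each, below the assumed bound.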

 Without knowing the budget and intended use of the machine it is hard
 to make suggestions, but I would say for a budget desktop box a
 quad-core Intel Ivy Bridge or the top-end AMD Piledriver CPU with a
 fast Kepler GTX card (e.g. GTX 680 or GTX 770/780) should work well.
 If you're considering dual-socket workstations, I suggest you go with
 the higher core-count and higher frequency Intel CPUs (6+ cores 2.2
 GHz), otherwise you may not see as much benefit as you would expect
 based on the insane price tag (especially if you compare to an i7
 3930K or its IVB successor).


 On Sat, May 25, 2013 at 1:02 PM, lloyd riggs wrote:
  The more RAM the better, and the best I have seen is a 4-GPU workstation. I
  use/have used 4. Each GPU takes 2 slots though, so a 7-8 slot PCIe board
  really holds 3-4 GPUs, except the Tyan mentioned (they're designed as blades, so
  an 8 or 10 slot board really holds 8 or 10 GPUs). There are cooling problems
  with GPUs though, as on a board they're packed, so extra cooling
  helps not to blow a GPU, but I would look for good ones (ask around), as
  it's a video game market and they go for looks even though it's in a casing. The
  external RAM (not onboard GPU RAM) helps if you do a larger sim, but I
  know performance wise, the onboard GPU, the more RAM the
  normal work stations you can get 4 GPU's for a 300 US$ board, but then
  the price goes way up (3-4000 US$ for an 8-10 GPU board). RAM ordered
  abroad is also cheap, 8 or 16 MB vs. shop... I have used 4 GPUs but only on test
  software, not Gromacs, so it would be nice to see performance... For a
  small 100-atom molecule and 500 solvent, using just the CPU I get it to run in 5-10
  minutes real time for a 1 ns sim, but tried simple large 800 amino, 25,000
  eq (NVT or NPT) runs and they clock at around 1 hour real for say 50 ps
  Sent: Saturday, 25 May 2013 at 07:54
  From: James Starlight
  To: Discussion list for GROMACS users
  Subject: Re: [gmx-users] GPU-based workstation
  Dear Dr. Watkins!
  Thank you for the suggestions!
  In the local shops I've found only Core i7 CPUs with 6 cores (like the Core
  i7-39xx) and 4 cores. Should I get much better performance with 6
  than with 4 cores in the case of an i7 CPU (assuming that I run the simulation in
  CPU+GPU mode)?
  Also, you've mentioned a 4 PCIe MB. Does it mean that a modern
  workstation could have 4 GPUs in one home-like desktop? For my
  current task I suppose that 2 GPUs would be suitable for my simulations
  (assuming that I use a typical ASUS MB and a 650 Watt power unit). Has
  anyone tried to use several GPUs in one workstation? What attributes

Re: [gmx-users] GPU / CPU load imblance

2013-06-25 Thread Justin Lemkul

On 6/25/13 6:33 PM, Dwey wrote:

Hi gmx-users,

 I used an 8-core AMD CPU with a GTX680 GPU [with 1536 CUDA cores] to
run an example of umbrella sampling provided by Justin.
I am happy that GPU acceleration indeed reduces the computation time significantly
(from 34 hours to 7 hours) in this example.
However, I found there was a NOTE on the screen like

  The GPU has 20% more load than the CPU. This imbalance causes
performance loss, consider using a shorter cut-off and a finer PME grid

Given a 20% load imbalance, I wonder if someone can give suggestions on
how to avoid the performance loss, either through a hardware (GPU/CPU)
improvement or by modifying the mdp file (see below).

I would avoid tweaking the .mdp settings.  There have been several reports where 
people hacked at nonbonded cutoffs to get better performance, and it resulted in 
totally useless output.  These settings are part of the force field.  Avoid 
changing them.

In terms of hardware, does this NOTE suggest that I should use a
higher-capacity GPU like the GTX 780 [with 2304 CUDA cores] to balance the load or
catch up on speed?
If so, would it help to add another GTX 680 card to the same
box? Or would that cause a GPU/CPU load imbalance again, with the two GPUs
waiting for the 8-core CPU?

There has been a lot of discussion on hardware, GPU/CPU balancing, etc. in 
recent days.  Please check the archive.  Some of the threads are quite detailed.



Force evaluation time GPU/CPU: 4.006 ms/2.578 ms = 1.554
For optimal performance this ratio should be close to 1

I have no idea how this is evaluated by 4.006 ms and 2.578 ms for GPU and
CPU time, respectively.
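
The ratio in the log line quoted above is simply the GPU force evaluation time divided by the CPU force evaluation time. A minimal Python sketch of that arithmetic follows; the function name is hypothetical, and the two inputs are the values printed in the log, not numbers measured here.

```python
def force_time_ratio(gpu_ms: float, cpu_ms: float) -> float:
    """GPU force evaluation time divided by CPU force evaluation time.

    Values above 1 mean the GPU takes longer per step than the CPU;
    values below 1 mean the CPU is the slower side.
    """
    return gpu_ms / cpu_ms


ratio = force_time_ratio(4.006, 2.578)
print(f"{ratio:.3f}")  # 1.554
```

A ratio of 1.554 therefore means the GPU needs roughly one and a half times as long as the CPU for its share of the force work in each step.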

It will be very helpful to modify  the attached mdp for a better
load balance between GPU and CPU.

I appreciate kind advice and hints to improve this mdp file.



### courtesy  to  Justin #

title                = Umbrella pulling simulation
define               = -DPOSRES_B
; Run parameters
integrator           = md
dt                   = 0.002
tinit                = 0
nsteps               = 5000000   ; 10 ns
nstcomm              = 10
; Output parameters
nstxout              = 50000     ; every 100 ps
nstvout              = 5
nstfout              = 5000
nstxtcout            = 5000      ; every 10 ps
nstenergy            = 5000
; Bond parameters
constraint_algorithm = lincs
constraints          = all-bonds
continuation         = yes
; Single-range cutoff scheme
nstlist              = 5
ns_type              = grid
rlist                = 1.4
rcoulomb             = 1.4
rvdw                 = 1.4
; PME electrostatics parameters
coulombtype          = PME
fourierspacing       = 0.12
fourier_nx           = 0
fourier_ny           = 0
fourier_nz           = 0
pme_order            = 4
ewald_rtol           = 1e-5
optimize_fft         = yes
; Nose-Hoover temperature coupling is on in two groups
Tcoupl               = Nose-Hoover
tc_grps              = Protein   Non-Protein
tau_t                = 0.5       0.5
ref_t                = 310       310
; Pressure coupling is on
Pcoupl               = Parrinello-Rahman
pcoupltype           = isotropic
tau_p                = 1.0
compressibility      = 4.5e-5
ref_p                = 1.0
refcoord_scaling     = com
; Generate velocities is off
gen_vel              = no
; Periodic boundary conditions are on in all directions
pbc                  = xyz
; Long-range dispersion correction
DispCorr             = EnerPres
cutoff-scheme        = Verlet
; Pull code
pull                 = umbrella
pull_geometry        = distance
pull_dim             = N N Y
pull_start           = yes
pull_ngroups         = 1
pull_group0          = Chain_B
pull_group1          = Chain_A
pull_init1           = 0
pull_rate1           = 0.0
pull_k1              = 1000      ; kJ mol^-1 nm^-2
pull_nstxout         = 1000      ; every 2 ps
pull_nstfout         = 1000      ; every 2 ps


Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Re: [gmx-users] GPU ECC question

2013-06-09 Thread Szilárd Páll
On Sat, Jun 8, 2013 at 9:21 PM, Albert wrote:

  Recently I ran into a strange issue with Gromacs-4.6.2 on a GPU workstation.
 In my GTX690 machine, when I run MD production I found that ECC is on.
 However, in my other GTX590 machine, I found that ECC was off:

 4 GPUs detected:
   #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
   #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible

 Moreover, there are only two GTX590 cards in the machine, so I don't know why Gromacs
 reported 4 GPUs detected. However, in another Linux machine of mine, which also has
 two GTX590 cards, Gromacs-4.6.2 finds only 2 GPUs, and ECC is still off.

 I am just wondering:

 (1) Why can ECC be on for the GTX690 while it is off for my GTX590? I compiled
 Gromacs with the same options and the same version of the Intel compiler.

Unless your 690 is in fact a Tesla K10, it surely does not support ECC!
Note that ECC is not something I personally think you really need.

 (2) Why, of two machines that each have two GTX590 cards physically installed, was one
 detected with 4 GPUs while the other was reported to contain two GPUs?

Both the GTX 590 and the 690 are dual-chip boards, which means two independent
processing units with their own memory mounted on the same card and
connected by a PCI switch (NVIDIA NF200). Hence, the two GPUs on these
dual-chip boards will be enumerated as separate devices. You can
double-check this in nvidia-smi, which should list the same devices as
mdrun reports. I suspect that the machine which is shown to
have only two GPUs suffers from some hardware or software issue.
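
The device count that follows from this explanation is plain multiplication, sketched here in Python. The function name is hypothetical, and the default of two chips per card applies only to dual-chip boards such as the GTX 590 and 690 discussed above.

```python
def enumerated_devices(cards: int, chips_per_card: int = 2) -> int:
    """Number of GPU devices the driver should list for identical cards.

    chips_per_card defaults to 2, matching the dual-chip boards in this
    thread; single-chip cards would use chips_per_card=1.
    """
    return cards * chips_per_card


print(enumerated_devices(2))  # 4
```

Two dual-chip cards should therefore show up as four devices, which matches the machine that reports 4 GPUs and makes the machine reporting only 2 the one to investigate.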


 thank you very much


Re: [gmx-users] GPU problem

2013-06-04 Thread Chandan Choudhury
Hi Albert,

I think using the -nt flag (-nt 16) with mdrun would solve your problem.


Chandan kumar Choudhury
NCL, Pune

On Tue, Jun 4, 2013 at 12:56 PM, Albert wrote:


  I've got four GPUs in one workstation. I am trying to run two GPU jobs with:

 mdrun -s md.tpr -gpu_id 01
 mdrun -s md.tpr -gpu_id 23

 There are 32 CPUs in this workstation. I found that each job tries to use
 all of the CPUs, so there are 64 sub-jobs once these two GPU mdrun jobs are
 submitted. Moreover, one of the jobs stopped shortly after starting, probably
 because of the CPU issue.

 I am just wondering: how can we distribute the CPUs when we run two GPU jobs
 on a single workstation?

 thank you very much


Re: [gmx-users] GPU problem

2013-06-04 Thread Albert

On 06/04/2013 11:22 AM, Chandan Choudhury wrote:

Hi Albert,

I think using the -nt flag (-nt 16) with mdrun would solve your problem.


thank you so much.

it works well now.


Re: [gmx-users] GPU problem

2013-06-04 Thread Szilárd Páll
-nt is mostly a backward-compatibility option and sets the total
number of threads (per rank). Instead, you should set both -ntmpi
(or -np with MPI) and -ntomp. However, note that unless a single
mdrun uses *all* cores/hardware threads on a node, it won't pin the
threads to cores. Failing to pin threads leads to considerable
performance degradation; I just tried it and, depending on how (un)lucky the
thread placement and migration are, I get a 1.5-2x slowdown
when running two mdrun instances on a single dual-socket node
without pinning threads.
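
As a rough sketch of the advice above, the helper below prints (it does not run) one mdrun command line per job, splitting the hardware threads evenly and giving each job its own pinning offset. The function name print_mdrun_jobs and the -deffnm names job0/job1 are invented for the example. It also assumes that the 32 "CPUs" are what mdrun counts as logical cores and that your 4.6 build offers -pin and -pinoffset; please check mdrun -h before relying on it.

```shell
#!/bin/sh
# Print one mdrun command per job so that the jobs do not compete for cores.
# Arguments: total hardware threads, number of jobs, GPUs (= ranks) per job.
print_mdrun_jobs() {
    total=$1; njobs=$2; gpus_per_job=$3
    per_job=$((total / njobs))             # hardware threads per mdrun
    ntomp=$((per_job / gpus_per_job))      # OpenMP threads per thread-MPI rank
    j=0
    while [ "$j" -lt "$njobs" ]; do
        # Build the -gpu_id string, e.g. "01" for job 0 and "23" for job 1
        # (this simple concatenation only works for GPU ids 0-9).
        ids=""; g=0
        while [ "$g" -lt "$gpus_per_job" ]; do
            ids="$ids$((j * gpus_per_job + g))"
            g=$((g + 1))
        done
        echo "mdrun -deffnm job$j -ntmpi $gpus_per_job -ntomp $ntomp" \
             "-gpu_id $ids -pin on -pinoffset $((j * per_job))"
        j=$((j + 1))
    done
}

# The case from this thread: 32 hardware threads, two jobs, two GPUs each.
print_mdrun_jobs 32 2 2
```

For the workstation discussed here this prints two command lines with -ntmpi 2 -ntomp 8, one pinned from offset 0 and one from offset 16. On a machine with hyper-threading the right offset and stride may differ, so treat the numbers as a starting point.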

My advice is (yet again) that you should check the
wiki page, in particular the section on how to run simulations. If
things are not clear, please ask for clarification - input and
constructive criticism should help us improve the wiki.

We have been patiently pointing everyone to the wiki, so asking
without reading up first is neither productive nor really fair.




RE:[gmx-users] GPU problem

2013-06-04 Thread lloyd riggs

Dear All or anyone,

A stupid question. Is there a script anyone knows of to convert a 53a6 force-field .top, which redirects to the gromacs/top directory, into something like a ligand .itp? This would be useful at the moment. Example:


 6 7 2 gb_5



; ai aj fu c0, c1, ...

 6 7  2 0.139 1080.0 0.139 1080.0 ; C CH

for everything (a protein/DNA complex), inclusive of angles and dihedrals?

I've been playing with some of the gromacs user-supplied files, but nothing yet.

Stephan Watkins

Re: [gmx-users] GPU problem

2013-06-04 Thread Justin Lemkul

On 6/4/13 3:52 PM, lloyd riggs wrote:

Dear All or anyone,
A stupid question.  Is there a script anyone knows of to convert a 53a6
force-field .top, which redirects to the gromacs/top directory, into something
like a ligand .itp? This would be useful at the moment.  Example:
 6 7 2 gb_5
; ai aj fu c0, c1, ...
 6 7 2 0.139 1080.0 0.139 1080.0 ; C CH
for everything (a protein/DNA complex), inclusive of angles and dihedrals?
I've been playing with some of the gromacs user-supplied files, but nothing yet.

Sounds like something grompp -pp should take care of.
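
A minimal sketch of that suggestion, with placeholder file names (md.mdp, conf.gro, topol.top, processed.top and tmp.tpr are assumptions, as is the helper name grompp_pp_cmd). The -pp option asks grompp to write the preprocessed topology, in which the #include lines and force-field macros such as gb_5 should already be expanded.

```shell
#!/bin/sh
# Build the grompp command line that writes a fully expanded topology.
# Arguments: input topology, name for the processed topology.
grompp_pp_cmd() {
    top=$1; out=$2
    echo "grompp -f md.mdp -c conf.gro -p $top -pp $out -o tmp.tpr"
}

cmd=$(grompp_pp_cmd topol.top processed.top)
echo "$cmd"
# Run it only where GROMACS is installed; otherwise this stays a dry run.
if command -v grompp >/dev/null 2>&1; then
    $cmd
fi
```

The resulting processed.top can then be cut down by hand to the molecule of interest and saved as an .itp.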



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Aw: Re: [gmx-users] GPU problem

2013-06-04 Thread lloyd riggs

Thanks, that's exactly what I was looking for.



Re: Re: [gmx-users] GPU-based workstation

2013-05-28 Thread Szilárd Páll
Dear all,

As far as I understand, the OP is interested in hardware for *running*
GROMACS 4.6 rather than developing code or running LINPACK.

To get the best performance it is important to use a machine with hardware
balanced for GROMACS' workloads. Too few GPU resources will result
in the CPU idling; too many GPU resources will lead to the runs being
bound by the CPU or by multi-GPU scaling, and above a certain level
GROMACS won't be able to make use of additional GPUs.

Of course, the balance will depend both on hardware and simulation
settings (mostly the LJ cut-off used).

An additional factor to consider is typical system size. To reach near
peak pair-force throughput on GPUs you typically need 20k-40k
particles/GPU (depends on the architecture) and throughput drops below
these values. Hence, in most cases it is preferred to use fewer and
faster GPUs rather than more.
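
To make the rule of thumb above concrete, here is a small back-of-the-envelope helper. It is not a GROMACS formula; the function name max_useful_gpus is made up, and the 20k-40k particles per GPU range is simply taken from the paragraph above.

```shell
#!/bin/sh
# Upper bound on the number of GPUs that still get at least
# min_per_gpu particles each (never fewer than one GPU).
# Arguments: number of particles, minimum particles per GPU.
max_useful_gpus() {
    atoms=$1; min_per_gpu=$2
    n=$((atoms / min_per_gpu))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# A 60,000-atom system, as in the original question of this thread:
max_useful_gpus 60000 20000   # optimistic end of the range
max_useful_gpus 60000 40000   # conservative end of the range
```

For a 60,000-atom system this gives between one and three GPUs, which is in line with the advice to prefer fewer, faster cards.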

Without knowing the budget and intended use of the machine it is hard
to make suggestions, but I would say that for a budget desktop box a
quad-core Intel Ivy Bridge or the top-end AMD Piledriver CPU with a
fast Kepler GTX card (e.g. GTX 680 or GTX 770/780) should work well.
If you're considering dual-socket workstations, I suggest you go with
the higher core-count and higher frequency Intel CPUs (6+ cores 2.2
GHz), otherwise you may not see as much benefit as you would expect
based on the insane price tag (especially if you compare to an i7
3930K or its IVB successor).



Re: Aw: Re: [gmx-users] GPU-based workstation

2013-05-28 Thread Szilárd Páll
On Sat, May 25, 2013 at 2:16 PM, Broadbent, Richard wrote:
 I've been running on my university's GPU nodes; these have one E5 Xeon (6 cores,
 12 threads) and 4 Nvidia GTX 690s. My system is 93 000 atoms of DMF
 under NVE.  The performance has been a little disappointing

That sounds like a very imbalanced system for GROMACS: you have
essentially 8 GPUs with rather poor PCI-E performance (each board shares a
single PCI-E bus) and only 6 CPU cores (12 threads) to drive the simulation.

 ~10ns/day. On my home system using a Core i5-2500 and an Nvidia 560 Ti I
 get 5.4ns/day for the same system. On our HPC system using 32 nodes,
 each with 2 quad-core Xeon processors, I get 30-40ns/day.

That sounds somewhat low if these are all moderately fast CPUs and GPUs.

 I think that to achieve reasonable performance the system has to be balanced
 between CPUs and GPUs; probably getting 2 high-end GPUs and a top-end Xeon
 E5 or Core i7 would be a good choice.

Indeed. Even two GPUs may be too much - unless the CPU in question is
a very high end i7 or E5.




Aw: Re: Re: [gmx-users] GPU-based workstation

2013-05-28 Thread lloyd riggs

Dear Dr. Pali,

Thank you,

Stephan Watkins


Re: Re: Re: [gmx-users] GPU-based workstation

2013-05-28 Thread James Starlight
Dear Dr. Pall!

Thank you for your suggestions!

Let's assume that I have a budget of $5000 and want to build a GPU-based desktop
with this money.

Previously I used a single 4-core i5 with a GTX 670 and obtained on average 10
ns/day for 70k-atom systems (1.0 nm cutoffs, no virtual sites,
sd integrator).

Now I'd like to build a system based on 2 high-end GeForce cards (e.g. TITAN).
Should that system include 2 CPUs for good balance? (E.g. two 6-core
Xeons with faster clocks could be better for simulations than an
i7, couldn't they?)

What additional properties of the motherboard should I consider for such a system?



Re: Re: Re: [gmx-users] GPU-based workstation

2013-05-27 Thread James Starlight
In Nvidia benchmarks I've found suggestions to use two 6-core CPUs
for systems with 2 GPUs.

Assuming that I'll be using two GTX 680 cards with a 256-bit bus and 4 GB of RAM
(not professional Nvidia cards like Tesla),
which CPUs could give me the best performance: one 8-core i7
or two 6-core Xeon E5s? Is it meaningful to use 2 separate CPUs
with several cores each for the 2 GPUs?


2013/5/26 lloyd riggs

 You can also look at profiling results on various web sites. From what I've
 seen while browsing, the high-end Nvidia cards run only slightly better than the
 two-year-old ones, so from an individual's point of view they are not worth the
 money yet, unless you have the money.

 Also, the sim I did on the cluster was 180-190,000 atoms, so the exact same
 performance the other person had.

  *Sent:* Saturday, 25 May 2013 at 15:19
 *From:* James Starlight

 *To:* Discussion list for GROMACS users
 *Subject:* Re: Aw: Re: [gmx-users] GPU-based workstation

 Thanks for the suggestion!

 Assuming that I'm using 2 high-end GeForce cards, which would perform better:

 1) one i7 (4 or 6 cores)?

 2) an 8-core Xeon such as the Intel Xeon E5-2650 (2.0 GHz, 8 cores)?

 Which properties of the motherboard should primarily be taken into account for
 such a Xeon-based system? Do such motherboards support multi-GPU (I noticed that
 many of them lack PCI slots)?



Re: [gmx-users] GPU-based workstation

2013-05-25 Thread James Starlight
Dear Dr. Watkins!

Thank you for the suggestions!

In the local shops I've found only Core i7 CPUs with 6 cores (like the Core
i7-39xx) and with 4 cores. Would I obtain much better performance with 6 cores
than with 4 cores in the case of an i7 CPU (assuming that I run the simulation in
CPU+GPU mode)?

Also, you've mentioned 4 PCIe MD. Does that mean that a modern
workstation could have 4 GPUs in one home-like desktop? For my
current task I suppose that 2 GPUs would be suitable for my simulations
(assuming that I use a typical ASUS motherboard and a 650 W power supply). Has
anyone tried to use several GPUs in one workstation? Which attributes of the
motherboard should be taken into account for best performance on such a
multi-GPU workstation?


2013/5/25 lloyd riggs

 There's also these, but 1 chip runs 6K US$; they can get performance up to
 2.3 teraflops per chip though double precision...but I have no clue about
 integration with GPUs...Intel also sells their chips on PCIe cards...but they
 get only about 350 Gflops, and run 1K US$. and vendor

 They can design them though to fit a PCIe slot and run about the same, but
 they still need the board, RAM etc...

 Mostly just to dream about; they say you can order them with radiation
 shielding as

 Stephan Watkins

 *Sent:* Friday, 24 May 2013 at 13:17
 *From:* James Starlight
 *To:* Discussion list for GROMACS users
 *Subject:* [gmx-users] GPU-based workstation
 Dear Gromacs Users!

 I'd like to build a new workstation for performing simulations on GPUs with
 Gromacs 4.6 native CUDA support.
 Recently I've used such a setup with a Core i5 CPU and an Nvidia GTX 670 video
 card and obtained good performance (~20 ns/day for a typical 60,000-atom system
 with the SD integrator).

 Now I'd like to build a multi-GPU workstation.

 My question: how many GPUs would give me the best performance in a typical
 home-like workstation? What method of Nvidia GPU integration should I
 use (e.g. SLI etc.)?

 Thanks for help,

 gmx-users mailing list
 * Please search the archive at before posting!
 * Please don't post (un)subscribe requests to the list. Use the
 www interface or send it to
 * Can't post? Read


Aw: Re: [gmx-users] GPU-based workstation

2013-05-25 Thread lloyd riggs

The more RAM the better, and the best I have seen is a 4-GPU workstation. I can use/have used 4. Each GPU takes 2 slots though, so a 7-8 slot PCIe board really holds 3-4 GPUs, except the Tyan mentioned (they're designed as blades, so an 8 or 10 slot board really holds 8 or 10 GPUs).

There are cooling problems with GPUs though, as they're packed together on a board, so extra cooling may help not blow a GPU, but I would look for good ones (ask around), as it's a video game market and they go for looks even though it's in a casing?

The external RAM (not onboard GPU RAM) helps if you do a larger sim, but I don't know performance-wise, the onboard GPU, the more RAM the yes, normal workstations you can get 4 GPUs for a 300 US$ board, but then the price goes way up (3-4000 US$ for an 8-10 GPU board). RAM ordered abroad is also cheap, 8 or 16 MB vs. shop...

I have used 4 GPUs but only on test software, not Gromacs, so it would be nice to see performance...for a small 100-atom molecule and 500 solvent, using just the CPU I get it to run in 5-10 minutes real time for a 1 ns sim, but I tried simple large 800 amino acid, 25,000 solvent eq (NVT or NPT) runs and they clock at around 1 hour real time for, say, 50 ps eqs.



Re: Aw: Re: [gmx-users] GPU-based workstation

2013-05-25 Thread Broadbent, Richard
I've been running on my university's GPU nodes; each has one E5 Xeon (6 cores,
12 threads) and 4 Nvidia GTX 690s. My system is 93,000 atoms of DMF under
NVE. The performance has been a little disappointing: ~10 ns/day. On my home
system, using a Core i5-2500 and an Nvidia GTX 560 Ti, I get 5.4 ns/day for the
same system. On our HPC system, using 32 nodes each with 2 quad-core Xeon
processors, I get 30-40 ns/day.

I think that to achieve reasonable performance the system has to be balanced
between CPUs and GPUs; getting 2 high-end GPUs and a top-end Xeon E5 or
Core i7 would probably be a good choice.
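
To put these throughput figures in perspective, here is a rough sketch that converts ns/day into the wall-clock days needed for a run. The 100 ns target length and the `days_needed` helper are illustrative assumptions, not something from this thread; the rates are the ones quoted above (30 is the lower end of the HPC range).

```bash
#!/usr/bin/env bash
# Hypothetical helper: wall-clock days needed to simulate a given length
# (in ns) at a given throughput (in ns/day).
days_needed() {
    LC_ALL=C awk -v len="$1" -v rate="$2" 'BEGIN { printf "%.1f", len / rate }'
}

target_ns=100   # assumed production length, chosen only for illustration

for rate in 5.4 10 30; do
    echo "${rate} ns/day -> $(days_needed "$target_ns" "$rate") days for ${target_ns} ns"
done
```

The same arithmetic can be rerun with any other target length or measured rate when comparing hardware options.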



Re: Aw: Re: [gmx-users] GPU-based workstation

2013-05-25 Thread James Starlight

Thanks for the suggestion!

Assuming that I'm using 2 high-end GeForce cards, which would perform better:

1) one Core i7 (4 or 6 cores), or

2) an 8-core Xeon, like the Intel Xeon E5-2650 (2.0 GHz, 8 cores)?

Which motherboard properties should I primarily take into account for such a
Xeon-based system? Do such motherboards support multi-GPU setups? (I noticed
that many of them lack PCI slots.)



Aw: Re: Re: [gmx-users] GPU-based workstation

2013-05-25 Thread lloyd riggs

I'd go for the 6-core i7.

To the other message: funny. I bought ATIs as they clock faster and cost 1/3 the price of Nvidias, but then the software all went to Nvidia. The new ATI with twice the shaders runs at the same speed (around 1-1.3 teraflops) due to the same problems the Nvidias ran into with IO (or maybe onboard RAM does solve the problem if they went up to 16 or 32 MB). Gromacs, etc. doesn't run on ATIs, and I've been hoping they, AMD, catch up, but all I ever see is the constant "in 6 months", then nothing.

I ran around 40 4 ns simulations on university blades with 8 AMD quad cores; using 3 blades I was only able to get 1 ns/day, but never pressed it as far as why so slow, as I needed to finish. With the Nvidia at even 5 ns/day I or a lot of people could do some really nice work as far as publishing, with raw data in 2 weeks' time, so now I feel a bit saddened...

I also just found OpenCL profiling with CUDA 5 that will take any C or C++ software and mark all sections you need to convert to OpenCL, but the trial software is 30 days, then 250 US...



Aw: Re: Re: [gmx-users] GPU-based workstation

2013-05-25 Thread lloyd riggs

You can also look at profiling results on various web sites; the high-end Nvidia cards run only slightly better than the 2-year-old ones. From an individual's point of view they're not worth the money yet, but if you have the money? I've been browsing.

Also, the sim I did on the cluster was 180-190,000 atoms, so the exact same performance the other person had.



Re: [gmx-users] GPU job often stopped

2013-05-02 Thread Albert

the problem is still there...


On 04/29/2013 06:06 PM, Szilárd Páll wrote:

On Mon, Apr 29, 2013 at 3:51 PM,  wrote:

On 04/29/2013 03:47 PM, Szilárd Páll wrote:

In that case, while it isn't very likely, the issue could be caused by
some implementation detail which aims to avoid performance loss caused
by an issue in the NVIDIA drivers.

Try running with the GMX_CUDA_STREAMSYNC environment variable set.

Btw, were there any other processes using the GPU while mdrun was running?


Thanks for the kind reply.
There is no other process running when I run Gromacs.

do you mean I should set GMX_CUDA_STREAMSYNC in the job script like:

export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0

Sort of, but the value does not matter. So if your shell is bash, the
above as well as simply export GMX_CUDA_STREAMSYNC= will work fine.

Let us know if this avoided the crash - when you have simulated long
enough to be able to judge.



Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Szilárd Páll
Have you tried running on CPUs only just to see if the issue persists?
Unless the issue does not occur with the same binary on the same
hardware running on CPUs only, I doubt it's a problem in the code.

Do you have ECC on?

On Sun, Apr 28, 2013 at 5:27 PM, Albert wrote:

 I am running MD jobs on a workstation with 4 K20 GPUs and I found that the
 job fails from time to time with the following messages:

 [tesla:03432] *** Process received signal ***
 [tesla:03432] Signal: Segmentation fault (11)
 [tesla:03432] Signal code: Address not mapped (1)
 [tesla:03432] Failing at address: 0xfffe02de67e0
 [tesla:03432] [ 0] /lib/x86_64-linux-gnu/
 [tesla:03432] [ 1] mdrun_mpi() [0x47dd61]
 [tesla:03432] [ 2] mdrun_mpi() [0x47d8ae]
 [tesla:03432] [ 3]
 [tesla:03432] *** End of error message ***
 mpirun noticed that process rank 0 with PID 3432 on node tesla exited on
 signal 11 (Segmentation fault).

 I can continue the jobs with the mdrun options -append -cpi, but they still
 stop from time to time. I am just wondering what the problem is?

 thank you very much

Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Albert


Yes, I tried the CPU-only version; it runs well and didn't stop. I am
not sure whether I have ECC on or not. There are 4 Tesla K20s and one
GTX 650 in the workstation. After compilation, I simply submit the jobs
with the command:

mdrun -s md.tpr -gpu_id 0234

I ran the same system on another machine with a GTX 690, and it also runs
well. I compiled Gromacs with the same options on that machine.

thank you very much

On 04/29/2013 01:19 PM, Szilárd Páll wrote:

Have you tried running on CPUs only just to see if the issue persists?
Unless the issue does not occur with the same binary on the same
hardware running on CPUs only, I doubt it's a problem in the code.

Do you have ECC on?


Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Albert

On 04/28/2013 05:45 PM, Justin Lemkul wrote:

Frequent failures suggest instability in the simulated system. Check 
your .log file or stderr for informative Gromacs diagnostic information.


My log file didn't contain any errors; the end of the stopped log file looks
something like this:

DD  step 2259  vol min/aver 0.967  load imb.: force  0.8%

   Step   Time Lambda

   Energies (kJ/mol)
          Angle            U-B    Proper Dih.  Improper Dih.          LJ-14
    9.86437e+03    4.02406e+04    3.52809e+04    6.13542e+02    8.61815e+03
     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
    1.25055e+04    3.05477e+04   -9.05956e+03   -6.02400e+05    1.58357e+03
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    1.39149e+02   -4.72066e+05    1.37165e+05   -3.34901e+05    3.11958e+02
 Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -2.94092e+02   -7.91535e+01    1.79812e-05

also in the information file I only obtained information:

step 13300, will finish Tue Apr 30 14:41
NOTE: Turning on dynamic load balancing

Probably the machine was restarted from time to time?



Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Szilárd Páll
On Mon, Apr 29, 2013 at 2:41 PM, Albert wrote:
 On 04/28/2013 05:45 PM, Justin Lemkul wrote:

 Frequent failures suggest instability in the simulated system. Check your
 .log file or stderr for informative Gromacs diagnostic information.


 My log file didn't contain any errors; the end of the stopped log file looks
 something like this:

 DD  step 2259  vol min/aver 0.967  load imb.: force  0.8%

 Step   Time Lambda

 Energies (kJ/mol)
          Angle            U-B    Proper Dih.  Improper Dih.          LJ-14
    9.86437e+03    4.02406e+04    3.52809e+04    6.13542e+02    8.61815e+03
     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
    1.25055e+04    3.05477e+04   -9.05956e+03   -6.02400e+05    1.58357e+03
 Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
    1.39149e+02   -4.72066e+05    1.37165e+05   -3.34901e+05    3.11958e+02
 Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -2.94092e+02   -7.91535e+01    1.79812e-05

 also in the information file I only obtained information:

 step 13300, will finish Tue Apr 30 14:41
 NOTE: Turning on dynamic load balancing

 Probably the machine was restarted from time to time?

The segv indicates that mdrun crashed and not that the machine was
restarted. The GPU detection output (both on stderr and log) should
show whether ECC is on (and so does the nvidia-smi tool).




Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Albert

On 04/29/2013 03:31 PM, Szilárd Páll wrote:

The segv indicates that mdrun crashed and not that the machine was
restarted. The GPU detection output (both on stderr and log) should
show whether ECC is on (and so does the nvidia-smi tool).


yes it was on:

Reading file heavy.tpr, VERSION 4.6.1 (single precision)
Using 4 MPI threads
Using 8 OpenMP threads per tMPI thread

5 GPUs detected:
  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC:  no, stat: compatible
  #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

4 GPUs user-selected for this run: #0, #2, #3, #4
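
The detection output above lists the GTX 650 as device #1, which is why the run uses `-gpu_id 0234`. As a rough sketch, the ID string could be derived from the detection lines like this. The `detected` variable simply holds the lines copied from the output above, and filtering on the word "Tesla" is an assumption that fits this particular machine; concatenating the digits only works while every device ID is a single digit.

```bash
#!/usr/bin/env bash
# Detection lines copied from the mdrun output above.
detected='  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC:  no, stat: compatible
  #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
  #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible'

# Keep only the Tesla cards and strip "#" and ":" from the first field,
# concatenating the remaining digits into one ID string.
gpu_ids=$(printf '%s\n' "$detected" |
    awk '/Tesla/ { id = $1; gsub(/[#:]/, "", id); printf "%s", id }')

# Print the command instead of running it, so the sketch works without GROMACS.
echo "mdrun -s md.tpr -gpu_id ${gpu_ids}"
```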


Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Szilárd Páll
In that case, while it isn't very likely, the issue could be caused by
some implementation detail which aims to avoid performance loss caused
by an issue in the NVIDIA drivers.

Try running with the GMX_CUDA_STREAMSYNC environment variable set.

Btw, were there any other processes using the GPU while mdrun was running?


On Mon, Apr 29, 2013 at 3:32 PM, Albert wrote:
 On 04/29/2013 03:31 PM, Szilárd Páll wrote:

 The segv indicates that mdrun crashed and not that the machine was
 restarted. The GPU detection output (both on stderr and log) should
 show whether ECC is on (and so does the nvidia-smi tool).


 yes it was on:

 Reading file heavy.tpr, VERSION 4.6.1 (single precision)
 Using 4 MPI threads
 Using 8 OpenMP threads per tMPI thread

 5 GPUs detected:
   #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
   #1: NVIDIA GeForce GTX 650, compute cap.: 3.0, ECC:  no, stat: compatible
   #2: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
   #3: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible
   #4: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

 4 GPUs user-selected for this run: #0, #2, #3, #4


Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Albert

On 04/29/2013 03:47 PM, Szilárd Páll wrote:

In that case, while it isn't very likely, the issue could be caused by
some implementation detail which aims to avoid performance loss caused
by an issue in the NVIDIA drivers.

Try running with the GMX_CUDA_STREAMSYNC environment variable set.

Btw, were there any other processes using the GPU while mdrun was running?


Thanks for the kind reply.
There were no other processes running while I was running Gromacs.

do you mean I should set GMX_CUDA_STREAMSYNC in the job script like:

export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0




Re: [gmx-users] GPU job often stopped

2013-04-29 Thread Szilárd Páll
On Mon, Apr 29, 2013 at 3:51 PM, Albert wrote:
 On 04/29/2013 03:47 PM, Szilárd Páll wrote:

 In that case, while it isn't very likely, the issue could be caused by
 some implementation detail which aims to avoid performance loss caused
 by an issue in the NVIDIA drivers.

 Try running with the GMX_CUDA_STREAMSYNC environment variable set.

 Btw, were there any other processes using the GPU while mdrun was running?


 Thanks for the kind reply.
 There were no other processes running while I was running Gromacs.

 do you mean I should set GMX_CUDA_STREAMSYNC in the job script like:

 export GMX_CUDA_STREAMSYNC=/opt/cuda-5.0

Sort of, but the value does not matter. So if your shell is bash, the
above as well as simply export GMX_CUDA_STREAMSYNC= will work fine.

Let us know if this avoided the crash - when you have simulated long
enough to be able to judge.
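
A minimal shell sketch of the point above, assuming a POSIX-like shell such as bash: what matters is that the variable exists in the environment mdrun is started from, not what it contains. The is_set helper is a hypothetical name used only for illustration; it is not part of GROMACS.

```shell
# is_set NAME: succeeds if the environment variable NAME exists,
# even when its value is the empty string.
# (Illustrative helper, not part of GROMACS.)
is_set() {
    eval "[ -n \"\${$1+x}\" ]"
}

unset GMX_CUDA_STREAMSYNC
is_set GMX_CUDA_STREAMSYNC && before=set || before=unset

# An empty value is enough; the value itself is not looked at.
export GMX_CUDA_STREAMSYNC=
is_set GMX_CUDA_STREAMSYNC && after=set || after=unset

echo "before export: $before, after export: $after"
```

Putting the export line in the job script before the mdrun call should be sufficient, since child processes inherit exported variables.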





Re: [gmx-users] GPU job often stopped

2013-04-28 Thread Justin Lemkul

On 4/28/13 11:27 AM, Albert wrote:


   I am running MD jobs on a workstation with 4 K20 GPUs, and I found that the jobs
fail from time to time with the following messages:

[tesla:03432] *** Process received signal ***
[tesla:03432] Signal: Segmentation fault (11)
[tesla:03432] Signal code: Address not mapped (1)
[tesla:03432] Failing at address: 0xfffe02de67e0
[tesla:03432] [ 0] /lib/x86_64-linux-gnu/ 
[tesla:03432] [ 1] mdrun_mpi() [0x47dd61]
[tesla:03432] [ 2] mdrun_mpi() [0x47d8ae]
[tesla:03432] [ 3]
/opt/intel/lib/intel64/ [0x7f46667904f3]
[tesla:03432] *** End of error message ***
mpirun noticed that process rank 0 with PID 3432 on node tesla exited on signal
11 (Segmentation fault).

I can continue the jobs with the mdrun options -append -cpi, but they still stop
from time to time. I am just wondering what the problem is?

Frequent failures suggest instability in the simulated system.  Check your .log 
file or stderr for informative Gromacs diagnostic information.



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Re: [gmx-users] GPU efficiency question

2013-04-27 Thread Mark Abraham
Probably the part of the calculation done on the GPU is not rate limiting.
There's no point having four chefs to make one dish...

Look at the beginning and end of your .log files for diagnostic
information. If this is a single node, you should be using threadMPI, not
real MPI. Generally four CPU cores vs four GPU cores will require an
extremely large PP load for the GPUs to all be effective.


On Fri, Apr 26, 2013 at 8:35 PM, Albert wrote:


  I've got two GTX690 cards in a workstation, and I found that when I run the MD
 production with the following two commands:

 mpirun -np 4 md_run_mpi


 mpirun -np 2 md_run_mpi

 the efficiency is the same. I notice that gromacs detects 4 GPUs
 (probably because each GTX690 has two cores):

 4 GPUs detected on host node4:
   #0: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
   #1: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
   #2: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible
   #3: NVIDIA GeForce GTX 690, compute cap.: 3.0, ECC:  no, stat: compatible

 Why do -np 2 and -np 4 give the same efficiency? Shouldn't -np 4 be
 faster?

 thank you very much



Re: [gmx-users] GPU performance

2013-04-10 Thread Szilárd Páll
On Wed, Apr 10, 2013 at 3:34 AM, Benjamin Bobay wrote:

 Szilárd -

 First, many thanks for the reply.

 Second, I am glad that I am not crazy.

 Ok so based on your suggestions, I think I know what the problem is/was.
 There was a sander process running on 1 of the CPUs.  Clearly GROMACS was
 trying to use 4 with "Using 4 OpenMP threads". I just did not catch that.
 Sorry! Rookie mistake.

 Which I guess leads me to my next question (sorry if it's too naive):

 (1) When running GROMACS (or I guess any other CUDA-based program), it's
 best to have all the CPUs free, right? I guess based on my results I have
 pretty much answered that question.  Although I thought that as long as I
 have one CPU available to run the GPU it would be good: would setting
 -ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well?

Such behavior is not specific to GROMACS or CUDA-accelerated codes; it applies
to all compute-intensive codes that expect to be running alone on the set of
CPU cores they are started on. As you could see in the output, mdrun
automatically detected that you have 4 CPU cores and, as Mark said, it
tries to use all of them alongside the GPU. As one of the cores was busy, you
ended up in a situation in which four threads of mdrun plus the
(presumably) one thread of sander are competing for four cores. This is
made even worse by the fact that when using a full machine, mdrun locks its
threads to physical cores to prevent the OS from moving them around (which
can cause performance loss).

Secondly, using a single core with a GPU will not result in very good
performance in GROMACS. The current GROMACS acceleration expects to run on
a couple of CPU cores together with a GPU - which is the typical balance of
CPU-GPU hardware that most clusters (1 GPU/socket) as well as many home users
would have (1-2 GPUs for 4-8 CPU cores).

 If I try the benchmarks again just to see (for fun) with "Using 4 OpenMP
 threads", under top I have - so I think the CPU is fine:
 24791 bobayb20   0 48.3g  51m 7576 R 299.1  0.2  11:32.90

Nope, that just means, roughly speaking, that sander is probably fully
using one core and the four threads of mdrun are crammed onto the remaining
three cores - which is bad.

However, you can simply run mdrun using three threads, which will run fine
alongside sander. Whether this will be efficient or not, you'll have to see.
Note that if some other program is using the GPU as well, don't expect full
performance - but the difference will be much less than in the case
of oversubscribed CPU cores.
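
As a rough sketch of the three-thread run suggested above: the binary name mdrun (a thread-MPI build) and the input name topol.tpr are assumptions, while -ntmpi and -ntomp are the options already discussed in this thread. The script only prints the command so it can be checked before running it.

```shell
total_cores=4      # CPU cores on the machine, as reported by mdrun
busy_cores=1       # cores taken by other jobs (here: one sander process)

# Leave the busy core alone and give mdrun the remaining ones.
ntomp=$(( total_cores - busy_cores ))

# Assumed binary and input names; adjust to the local installation.
cmd="mdrun -ntmpi 1 -ntomp ${ntomp} -s topol.tpr"
echo "$cmd"
```

Whether three threads plus the GPU give acceptable performance still has to be measured on the machine in question.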


 When I have a chance (after this sander run is done - hopefully soon) I can
 try the benchmarks again.

 Thanks again for the help!


Re: [gmx-users] GPU performance

2013-04-09 Thread Szilárd Páll
Hi Ben,

That performance is not reasonable at all - neither for CPU only run on
your quad-core Sandy Bridge, nor for the CPU+GPU run. For the latter you
should be getting more like 50 ns/day or so.

What's strange about your run is that the CPU-GPU load balancing is picking
a *very* long cut-off, which means that your CPU is for some reason
performing very badly. Check how mdrun is behaving in top/htop while it is
running, and if you are not seeing ~400% CPU utilization, there is
something wrong - perhaps threads getting locked to the same core (to check
that try -pin off).

Secondly, note that you are using OpenMM-specific settings from the old
GROMACS-OpenMM comparison benchmarks in which the grid spacing is overly
coarse (you could use something like a fourier-spacing=0.125 or even larger
with rc=1.0).



On Tue, Apr 9, 2013 at 10:27 PM, Benjamin Bobay wrote:

 Good afternoon -

 I recently installed gromacs-4.6 on CentOS6.3 and the installation went
 just fine.

 I have a Tesla C2075 GPU.

 I then downloaded the benchmark directories and ran a bench mark on the
 GPU/ dhfr-solv-PME.bench

 This is what I got:

 Using 1 MPI thread
 Using 4 OpenMP threads

 1 GPU detected:
   #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible

 1 GPU user-selected for this run: #0

 Back Off! I just backed up ener.edr to ./#ener.edr.1#
 starting mdrun 'Protein in water'
 -1 steps, infinite ps.
 step   40: timed with pme grid 64 64 64, coulomb cutoff 1.000: 4122.9
 step   80: timed with pme grid 56 56 56, coulomb cutoff 1.143: 3685.9
 step  120: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3110.8
 step  160: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3365.1
 step  200: timed with pme grid 40 40 40, coulomb cutoff 1.600: 3499.0
 step  240: timed with pme grid 52 52 52, coulomb cutoff 1.231: 3982.2
 step  280: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3129.2
 step  320: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3425.4
 step  360: timed with pme grid 42 42 42, coulomb cutoff 1.524: 2979.1
   optimal pme grid 42 42 42, coulomb cutoff 1.524
 step 4300 performance: 1.8 ns/day

 and from the nvidia-smi output:
 Tue Apr  9 10:13:46 2013

 | NVIDIA-SMI 4.304.37   Driver Version: 304.37

 | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
 |   0  Tesla C2075              | :03:00.0         On  |                    0 |
 | 30%   67C    P0    80W / 225W |   4%  200MB / 5375MB |      4%      Default |

 | Compute processes:                                               GPU Memory |
 |  GPU       PID  Process name                                     Usage      |
 |    0     22568  mdrun                                                  59MB |

 So I am only getting 1.8 ns/day! Is that right? It seems very, very
 small, and in the CPU test I am getting the same:

 step 200 performance: 1.8 ns/day    vol 0.79  imb F 14%

 From the md.log of the GPU test:
 Detecting CPU-specific acceleration.
 Present hardware specification:
 Vendor: GenuineIntel
 Brand:  Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
 Family:  6  Model: 45  Stepping:  7
 Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
 pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
 tdt x2a
 Acceleration most likely to fit this hardware: AVX_256
 Acceleration selected at GROMACS compile time: AVX_256

 1 GPU detected:
   #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible

 1 GPU user-selected for this run: #0

 Will do PME sum in reciprocal space.

 Any thoughts as to why it is so slow?

 many thanks!

 Research Assistant Professor
 North Carolina State University
 Department of Molecular and Structural Biochemistry
 128 Polk Hall
 Raleigh, NC 27695
 Phone: (919)-513-0698
 Fax: (919)-515-2047

Re: [gmx-users] GPU performance

2013-04-09 Thread Benjamin Bobay
Szilárd -

First, many thanks for the reply.

Second, I am glad that I am not crazy.

Ok so based on your suggestions, I think I know what the problem is/was.
There was a sander process running on 1 of the CPUs.  Clearly GROMACS was
trying to use 4 with "Using 4 OpenMP threads". I just did not catch that.
Sorry! Rookie mistake.

Which I guess leads me to my next question (sorry if it's too naive):

(1) When running GROMACS (or I guess any other CUDA-based program), it's
best to have all the CPUs free, right? I guess based on my results I have
pretty much answered that question.  Although I thought that as long as I
have one CPU available to run the GPU it would be good: would setting
-ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well?

If I try the benchmarks again just to see (for fun) with "Using 4 OpenMP
threads", under top I have - so I think the CPU is fine:
24791 bobayb20   0 48.3g  51m 7576 R 299.1  0.2  11:32.90

When I have a chance (after this sander run is done - hopefully soon) I can
try the benchmarks again.

Thanks again for the help!


Re: [gmx-users] GPU performance

2013-04-09 Thread Mark Abraham
On Apr 10, 2013 3:34 AM, Benjamin Bobay wrote:

 Szilárd -

 First, many thanks for the reply.

 Second, I am glad that I am not crazy.

 Ok so based on your suggestions, I think I know what the problem is/was.
 There was a sander process running on 1 of the CPUs.  Clearly GROMACS was
 trying to use 4 with "Using 4 OpenMP threads". I just did not catch that.
 Sorry! Rookie mistake.

 Which I guess leads me to my next question (sorry if it's too naive):

 (1) When running GROMACS (or I guess any other CUDA-based program), it's
 best to have all the CPUs free, right? I guess based on my results I have
 pretty much answered that question.  Although I thought that as long as I
 have one CPU available to run the GPU it would be good: would setting
 -ntmpi 1 -ntomp 1 help or would I take a major hit in ns/day as well?

Some codes might treat the CPU as a I/O, MPI and memory-serving
co-processor of the GPU; those codes will tend to be insensitive to the
CPU config. GROMACS goes to great lengths to use all the hardware in a
dynamically load-balanced way, so CPU load and config tend to affect the
bottom line immediately.


 If I try the benchmarks again just to see (for fun) with "Using 4 OpenMP
 threads", under top I have - so I think the CPU is fine:
 24791 bobayb20   0 48.3g  51m 7576 R 299.1  0.2  11:32.90

 When I have a chance (after this sander run is done - hopefully soon) I can
 try the benchmarks again.

 Thanks again for the help!


Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster

2013-03-08 Thread George Patargias
Hi Szilárd

Thanks for this tip; it was extremely useful. The problem was indeed the
incompatibility between the installed NVIDIA driver and the CUDA 5.0
runtime library. Installation of an older driver solved the problem. The
programs deviceQuery etc. can now detect the GPU.

GROMACS can now also detect the card but unfortunately aborts with the
following error

Fatal error: Incorrect launch configuration: mismatching number of PP MPI
processes and GPUs per node.
mdrun_mpi was started with 12 PP MPI processes per node, but only 1 GPU
were detected.

Here is my command line
mpirun -np 12 mdrun_mpi -s test.tpr -deffnm test_out -nb gpu

What can be the problem?

Thanks again


Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster

2013-03-01 Thread George Patargias
Hi Szilárd

Thanks for your reply. I have run the deviceQuery utility and what I got
back is

/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 38
- no CUDA-capable device is detected

Should I understand from this that the CUDA driver was not installed from
the MAC OS  X CUDA 5.0 Production Release?



 That looks like the driver does not work or is incompatible with the
 runtime. Please get the SDK, compile a simple program, e.g. deviceQuery, and
 see if that works (I suspect that it won't).

 Regarding your machines, just FYI, the Quadro 4000 is a pretty slow card
 (somewhat slower than a GTX 460) so you'll have quite a strong resource
 imbalance: a lot of CPU compute power (2x Xeon 5xxx, right?) and little GPU
 compute power, which will lead to the CPU idling while waiting for the GPU.



 On Thu, Feb 28, 2013 at 4:52 PM, George Patargias


 We are trying to install the GPU version of GROMACS 4.6 on our own
 MacOS cluster. So for the cluster nodes that have the NVIDIA Quadro 4000

 - We have downloaded and install the MAC OS  X CUDA 5.0 Production
 from here:

 placing the libraries contained in this download in /usr/local/cuda/lib

 - We have managed to compile GROMACS 4.6 linking it statically with
 CUDA libraries and the MPI libraries (with BUILD_SHARED_LIBS=OFF and

 Unfortunately, when we tried to run a test job with the generated
 mdrun_mpi, GROMACS reported that it cannot detect any CUDA-enabled
 devices. It also reports 0.0 version for CUDA driver and runtime.

 Is the actual CUDA driver missing from the MAC OS  X CUDA 5.0 Production
 Release that we installed? Do we need to install it from here:

 Or is something else that we need to do?

 Many thanks in advance.

 Dr. George Patargias
 Postdoctoral Researcher
 Biomedical Research Foundation
 Academy of Athens
 4, Soranou Ephessiou
 115 27

 Office: +302106597568


Dr. George Patargias
Postdoctoral Researcher
Biomedical Research Foundation
Academy of Athens
4, Soranou Ephessiou
115 27

Office: +302106597568


Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster

2013-03-01 Thread Szilárd Páll
Hi George,

As I said before, that just means that most probably the GPU driver is not
compatible with the CUDA runtime (libcudart) that you installed with the
CUDA toolkit. I've no clue about the Mac OS installers and releases, you'll
have to do the research on that. Let us know if you have further
(GROMACS-related) issues.



On Fri, Mar 1, 2013 at 2:48 PM, George Patargias wrote:

 Hi Szilárd

 Thanks for your reply. I have run the deviceQuery utility and what I got
 back is

 /deviceQuery Starting...

  CUDA Device Query (Runtime API) version (CUDART static linking)

 cudaGetDeviceCount returned 38
 - no CUDA-capable device is detected

 Should I understand from this that the CUDA driver was not installed from
 the MAC OS  X CUDA 5.0 Production Release?



 Dr. George Patargias
 Postdoctoral Researcher
 Biomedical Research Foundation
 Academy of Athens
 4, Soranou Ephessiou
 115 27

 Office: +302106597568


Re: [gmx-users] GPU version of GROMACS 4.6 in MacOS cluster

2013-03-01 Thread Albert

The easiest solution is to kill MacOS and switch to Linux.



On 03/01/2013 06:03 PM, Szilárd Páll wrote:

Hi George,

As I said before, that just means that most probably the GPU driver is not
compatible with the CUDA runtime (libcudart) that you installed with the
CUDA toolkit. I've no clue about the Mac OS installers and releases, you'll
have to do the research on that. Let us know if you have further
(GROMACS-related) issues.




Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-18 Thread Albert

On 12/17/2012 08:06 PM, Justin Lemkul wrote:
It seems to me that the system is simply crashing like any other that 
becomes unstable.  Does the simulation run at all on plain CPU?


Thank you very much Justin, it's really helpful. I checked the
structure after minimization and found that there was a problem with my
ligand. I regenerated the ligand topology with acpype and resubmitted the
minimization and NVT runs. Now it goes well. So the problem probably came
from the incorrect ligand topology, which made the system very unstable.


Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Szilárd Páll

That unfortunately doesn't tell us exactly why mdrun is stuck. Can
you reproduce the issue on other machines or with different launch
configurations? At which step does it get stuck (-stepout 1 can help)?

Please try the following:
- try running on a single GPU;
- try running on CPUs only (-nb cpu and, to match the GPU setup more closely,
-ntomp 12);
- try running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var
set (and, to match the GPU setup more closely, -ntomp 12);
- provide a backtrace (using gdb).
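
The checks listed above could be scripted roughly as follows. This is only a dry-run sketch under stated assumptions: the binary name mdrun_mpi and the input name nvt.tpr are taken as placeholders, and the run helper is a hypothetical wrapper that prints each command instead of executing it.

```shell
# Dry-run helper: print each command instead of executing it.
# Replace the body with "$@" to actually run the commands.
run() { echo "$@"; }

tpr=nvt.tpr   # assumed run input file name

# 1. Single GPU, reporting every step so the last completed step is visible.
run mdrun_mpi -s "$tpr" -gpu_id 0 -stepout 1

# 2. CPU only, with 12 OpenMP threads to resemble the GPU setup.
run mdrun_mpi -s "$tpr" -nb cpu -ntomp 12

# 3. GPU emulation mode (the non-bonded GPU work is done on the CPU).
run env GMX_EMULATE_GPU=1 mdrun_mpi -s "$tpr" -ntomp 12

# 4. Backtrace: start under gdb, type "run", interrupt once stuck, then "bt".
run gdb --args mdrun_mpi -s "$tpr"
```

Comparing where each variant stops should help narrow down whether the hang is tied to the GPU path or to the simulated system itself.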



On Mon, Dec 17, 2012 at 5:37 PM, Albert wrote:


  I am running a GMX-4.6 beta2 GPU job on a 24 CPU core workstation with two
 GTX590 cards; it got stuck there without any output, i.e. the .xtc file size is
 always 0 after hours of running. Here is the md.log file I found:

 Using CUDA 8x8x8 non-bonded kernels

 Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05
 Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size:

 Removing pbc first time
 Pinning to Hyper-Threading cores with 12 physical cores in a compute node
 There are 1 flexible constraints

 WARNING: step size for flexible constraining = 0
  All flexible constraints will be rigid.
  Will try to keep all flexible constraints at their original
  but the lengths may exhibit some drift.

 Initializing Parallel LINear Constraint Solver
 Linking all bonded interactions to atoms
 There are 161872 inter charge-group exclusions,
 will use an extra communication step for exclusion forces for PME

 The initial number of communication pulses is: X 1
 The initial domain decomposition cell size is: X 1.83 nm

 The maximum allowed distance for charge groups involved in interactions is:
  non-bonded interactions   1.200 nm
 (the following are initial values, they could change due to box
 two-body bonded interactions  (-rdd)   1.200 nm
   multi-body bonded interactions  (-rdd)   1.200 nm
   atoms separated by up to 5 constraints  (-rcon)  1.826 nm

 When dynamic load balancing gets turned on, these settings will change to:
 The maximum number of communication pulses is: X 1
 The minimum size for domain decomposition cells is 1.200 nm
 The requested allowed shrink of DD cells (option -dds) is: 0.80
 The allowed shrink of domain decomposition cells is: X 0.66
 The maximum allowed distance for charge groups involved in interactions is:
  non-bonded interactions   1.200 nm
 two-body bonded interactions  (-rdd)   1.200 nm
   multi-body bonded interactions  (-rdd)   1.200 nm
   atoms separated by up to 5 constraints  (-rcon)  1.200 nm

 Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0

 Center of mass motion removal mode is Linear
 We have the following groups for center of mass motion removal:
   0:  Protein_LIG_POPC
   1:  Water_and_ions

 G. Bussi, D. Donadio and M. Parrinello
 Canonical sampling through velocity rescaling
 J. Chem. Phys. 126 (2007) pp. 014101
   --- Thank You ---  


Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Albert


 I reduced the number of GPUs to two, and it said:

Back Off! I just backed up nvt.log to ./#nvt.log.1#
Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

NOTE: GPU(s) found, but the current simulation can not use GPUs
  To use a GPU, set the mdp option: cutoff-scheme = Verlet
  (for quick performance testing you can use the -testverlet option)

Using 2 MPI processes

4 GPUs detected on host CUDANodeA:
  #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible

Making 1D domain decomposition 2 x 1 x 1

We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.

When I run it with a single GPU, it produces lots of pdb files with the prefix
step, and then it crashes with the messages:

Wrote pdb files with previous and current coordinates
Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which 
is larger than the 1-4 table size 2.200 nm

These are ignored for the rest of the simulation
This usually means your system is exploding,
if not, you should increase table-extension in your mdp file
or with user tables increase the table size
[CUDANodeA:20659] *** Process received signal ***
[CUDANodeA:20659] Signal: Segmentation fault (11)
[CUDANodeA:20659] Signal code: Address not mapped (1)
[CUDANodeA:20659] Failing at address: 0xc7aa00dc
[CUDANodeA:20659] [ 0] /lib64/ [0x2ab25c76d2d0]
[CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/ 
[CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/ 
[CUDANodeA:20659] [ 3] 
/opt/gromacs-4.6/lib/ [0x2ab259e0cbae]
[CUDANodeA:20659] [ 4] 
[CUDANodeA:20659] [ 5] 

[CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3]
[CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639]
[CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db]
[CUDANodeA:20659] [ 9] /lib64/ 

[CUDANodeA:20659] [10] mdrun_mpi() [0x407f09]
[CUDANodeA:20659] *** End of error message ***

[1]    Segmentation fault    mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc

Here is the .mdp file I used:

title       = NVT equilibration for OR-POPC system
define      = -DPOSRES -DPOSRES_LIG ; Protein is position restrained (uses the posres.itp file information)

; Parameters describing the details of the NVT simulation protocol
integrator  = md    ; Algorithm (md = molecular dynamics [leap-frog integrator]; md-vv = md using velocity verlet; sd = stochastic dynamics)
dt          = 0.002 ; Time-step (ps)
nsteps      = 25    ; Number of steps to run (0.002 * 25 = 500 ps)

; Parameters controlling output writing
nstxout     = 0     ; Write coordinates to output .trr file every 2 ps
nstvout     = 0     ; Write velocities to output .trr file every 2 ps
nstfout     = 0
nstxtcout   = 1000
nstenergy   = 1000  ; Write energies to output .edr file every 2 ps
nstlog      = 1000  ; Write output to .log file every 2 ps

; Parameters describing neighbors searching and details about interaction calculations
ns_type     = grid  ; Neighbor list search method (simple, grid)
nstlist     = 50    ; Neighbor list update frequency (after every given number of steps)
rlist       = 1.2   ; Neighbor list search cut-off distance (nm)
rlistlong   = 1.4
rcoulomb    = 1.2   ; Short-range Coulombic interactions cut-off distance (nm)
rvdw        = 1.2   ; Short-range van der Waals cutoff distance (nm)
pbc         = xyz   ; Direction in which to use Periodic Boundary Conditions (xyz, xy, no)
cutoff-scheme = Verlet ; GPU running

; Parameters for treating bonded interactions
continuation = no   ; Whether a fresh start or a continuation from a previous run (yes/no)
constraint_algorithm = LINCS ; Constraint algorithm (LINCS / SHAKE)
constraints = all-bonds ; Which bonds/angles to constrain (all-bonds / hbonds / none / all-angles / h-angles)
lincs_iter  = 1     ; Number of iterations to correct for rotational lengthening in LINCS (related to accuracy)
lincs_order = 4     ; Highest order in the expansion of the
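
The run length stated in the nsteps comment can be sanity-checked, since the simulated time is simply dt times nsteps. The following is a minimal sketch using the values as they appear in the archived file; the helper names sim_time_ps and steps_for_ps are made up for illustration and are not part of GROMACS.

```shell
#!/bin/sh
# Simulated time in ps for a given time step (ps) and number of steps.
sim_time_ps() {
    awk -v dt="$1" -v n="$2" 'BEGIN { printf "%g\n", dt * n }'
}

# Number of steps needed to cover a target time (ps) at a given time step (ps),
# rounded to the nearest integer.
steps_for_ps() {
    awk -v dt="$1" -v t="$2" 'BEGIN { printf "%d\n", t / dt + 0.5 }'
}

sim_time_ps 0.002 25      # time covered by nsteps = 25 as archived
steps_for_ps 0.002 500    # steps needed for the 500 ps named in the comment
```

With dt = 0.002 ps, 25 steps cover 0.05 ps, while 500 ps needs 250000 steps, so the nsteps value in the archived file does not match its own comment; possibly digits were lost when the message was archived.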

Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Szilárd Páll

How about GPU emulation or CPU-only runs? Also, please try setting the
number of threads to 1 (-ntomp 1).
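
The two suggested checks might be run roughly as below. This is only a sketch that reuses the file names from the thread: GMX_EMULATE_GPU is the environment variable named later in this thread, while -nb cpu as the switch for a CPU-only non-bonded run is an assumption about the 4.6 mdrun options and should be confirmed with mdrun -h. The MDRUN, ARGS and status variables are illustrative only.

```shell
#!/bin/sh
MDRUN=${MDRUN:-mdrun_mpi}      # binary name used in the thread
ARGS="-v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 1"

if command -v "$MDRUN" >/dev/null 2>&1 && [ -f nvt.tpr ]; then
    # 1) GPU emulation mode, one thread.
    GMX_EMULATE_GPU=1 "$MDRUN" $ARGS
    # 2) CPU-only non-bonded calculation (assumed option, see above).
    "$MDRUN" $ARGS -nb cpu
    status="ran"
else
    # Nothing is executed when mdrun or the run input file is missing.
    status="skipped"
fi
echo "mdrun checks: $status"
```

If both variants crash in the same way as the GPU run, the problem is more likely in the system or the input than in the GPU code path.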


On Mon, Dec 17, 2012 at 6:01 PM, Albert wrote:

 I reduced the GPU to two, and it said:
 [...]

Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Albert

On 12/17/2012 06:08 PM, Szilárd Páll wrote:


How about GPU emulation or CPU-only runs? Also, please try setting the
number of threads to 1 (-ntomp 1).



I ran it in GPU emulation mode with the GMX_EMULATE_GPU=1 environment
variable set (and, to match the GPU setup more closely, with -ntomp 12).
It failed with this log:

Back Off! I just backed up step33b.pdb to ./#step33b.pdb.2#

Back Off! I just backed up step33c.pdb to ./#step33c.pdb.2#
Wrote pdb files with previous and current coordinates
[CUDANodeA:20753] *** Process received signal ***
[CUDANodeA:20753] Signal: Segmentation fault (11)
[CUDANodeA:20753] Signal code: Address not mapped (1)
[CUDANodeA:20753] Failing at address: 0x106ae6a00

[1]    Segmentation fault    mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 12

I also tried setting the number of threads to 1 (-ntomp 1); it failed with the following:

Back Off! I just backed up step33c.pdb to ./#step33c.pdb.1#
Wrote pdb files with previous and current coordinates
[CUDANodeA:20740] *** Process received signal ***
[CUDANodeA:20740] Signal: Segmentation fault (11)
[CUDANodeA:20740] Signal code: Address not mapped (1)
[CUDANodeA:20740] Failing at address: 0x1f74a96ec
[CUDANodeA:20740] [ 0] /lib64/ [0x2b351d3022d0]
[CUDANodeA:20740] [ 1] /opt/gromacs-4.6/lib/ 
[CUDANodeA:20740] [ 2] /opt/gromacs-4.6/lib/ 
[CUDANodeA:20740] [ 3] 
/opt/gromacs-4.6/lib/ [0x2b351a9a1bae]
[CUDANodeA:20740] [ 4] 
[CUDANodeA:20740] [ 5] 
[CUDANodeA:20740] [ 6] 
/opt/gromacs-4.6/lib/ [0x2b351aa0a0df]

[CUDANodeA:20740] [ 7] mdrun_mpi(do_md+0x8133) [0x4334c3]
[CUDANodeA:20740] [ 8] mdrun_mpi(mdrunner+0x19e9) [0x411639]
[CUDANodeA:20740] [ 9] mdrun_mpi(main+0x17db) [0x4373db]
[CUDANodeA:20740] [10] /lib64/ 

[CUDANodeA:20740] [11] mdrun_mpi() [0x407f09]
[CUDANodeA:20740] *** End of error message ***

[1]    Segmentation fault    mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc -ntomp 1


Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Szilárd Páll
Hi Albert,

Thanks for the testing.

Last questions:
- What version are you using? Is it the beta2 release or the latest git? If
it's the former, getting the latest git might help if...
- ...you happen to be using GMX_GPU_ACCELERATION=None (you shouldn't!).
A bug triggered only with this setting has been fixed recently.

If the above doesn't help, please file a bug report and attach a tpr so we
can reproduce.



On Mon, Dec 17, 2012 at 6:21 PM, Albert wrote:

 [...]

Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Mark Abraham
On Mon, Dec 17, 2012 at 6:01 PM, Albert wrote:


  I reduced the GPU to two, and it said:

 Back Off! I just backed up nvt.log to ./#nvt.log.1#
 Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

This is a development version from October 1. Please use the mdrun version
you think you're using :-)


Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Szilárd Páll
On Mon, Dec 17, 2012 at 7:56 PM, Mark Abraham wrote:

 On Mon, Dec 17, 2012 at 6:01 PM, Albert wrote:

   I reduced the GPU to two, and it said:
  Back Off! I just backed up nvt.log to ./#nvt.log.1#
  Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

 This is a development version from October 1. Please use the mdrun version
 you think you're using :-)

Thanks Mark, good catch!



Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Albert

Well, that's one of the log files.
I've tried all of:

VERSION 4.6-dev-20121004-5d6c49d
VERSION 4.6-beta1
VERSION 4.6-beta2
and the latest 5.0 from git.

The problems are the same. :-(

On 12/17/2012 07:56 PM, Mark Abraham wrote:

On Mon, Dec 17, 2012 at 6:01 PM, Albert wrote:


  I reduced the GPU to two, and it said:

Back Off! I just backed up nvt.log to ./#nvt.log.1#
Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

This is a development version from October 1. Please use the mdrun version
you think you're using:-)



Re: [gmx-users] GPU running problem with GMX-4.6 beta2

2012-12-17 Thread Justin Lemkul

On 12/17/12 2:03 PM, Albert wrote:

well, that's one of the log files.
I've tried both

VERSION 4.6-dev-20121004-5d6c49d
VERSION 4.6-beta1
VERSION 4.6-beta2
and the latest 5.0 by git.

the problems are the same.:-(

It seems to me that the system is simply crashing like any other that becomes 
unstable.  Does the simulation run at all on plain CPU?





Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Re: [gmx-users] GPU warnings

2012-12-11 Thread Szilárd Páll
Hi Thomas,

It looks like some gcc 4.7 builds don't work with CUDA, although I've been
using various Ubuntu/Linaro versions, most recently 4.7.2, and had no
issues whatsoever. Some people seem to have bumped into the same problem,
and the suggested fix is to put
#undef _GLIBCXX_USE_INT128
in a header and pre-include it for nvcc by calling it like this:
nvcc --pre-include undef_atomics_int128.h
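
As a sketch of that workaround: the header name comes from the message above, while kernel.cu is a placeholder source file invented for this example, so the nvcc call is skipped when either nvcc or that file is missing.

```shell
#!/bin/sh
# Write the one-line workaround header.
cat > undef_atomics_int128.h <<'EOF'
#undef _GLIBCXX_USE_INT128
EOF

# Pre-include the header when compiling; kernel.cu is a placeholder name.
if command -v nvcc >/dev/null 2>&1 && [ -f kernel.cu ]; then
    nvcc --pre-include undef_atomics_int128.h -c kernel.cu -o kernel.o
fi
```

For a CMake-driven GROMACS build the flag would have to reach nvcc through the build configuration; how to do that is not stated in the thread.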



On Sun, Dec 9, 2012 at 12:18 PM, Thomas Evangelidis wrote:

 [...]

Re: [gmx-users] GPU warnings

2012-12-11 Thread Mirco Wahab

Am 11.12.2012 16:04, schrieb Szilárd Páll:

It looks like some gcc 4.7 builds don't work with CUDA, although I've been
using various Ubuntu/Linaro versions, most recently 4.7.2, and had no
issues whatsoever. Some people seem to have bumped into the same problem,
and the suggested fix is to put
#undef _GLIBCXX_USE_INT128
in a header and pre-include it for nvcc by calling it like this:
nvcc --pre-include undef_atomics_int128.h

The same problem occurs on SuSE 12.2/x64 with its default gcc 4.7.2.

Another possible fix on SuSE 12.2: install the (older) gcc repository
from 12.1/x64 (with lower priority), install the gcc/g++ 4.6 from there
as an alternative compiler and select the active gcc through the
update-alternatives --config gcc mechanism. This works very well.
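
A minimal sketch of checking the compiler before and after such a switch. It assumes a gcc alternatives group has already been registered, which the message implies but does not show; the helper name active_gcc_version is made up for illustration.

```shell
#!/bin/sh
# Print the version of the gcc that is first in PATH, or "none".
active_gcc_version() {
    if command -v gcc >/dev/null 2>&1; then
        gcc -dumpversion
    else
        echo "none"
    fi
}

echo "active gcc: $(active_gcc_version)"

# Show the registered gcc alternatives, if the tool and the group exist.
if command -v update-alternatives >/dev/null 2>&1; then
    update-alternatives --display gcc 2>/dev/null || echo "no gcc alternatives registered"
fi

# Switching is interactive and needs root, so it is left commented out:
# sudo update-alternatives --config gcc
```

After switching, rerunning the script should report the 4.6 series as the active gcc before the CUDA build is configured again.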




Re: [gmx-users] GPU warnings

2012-12-11 Thread Szilárd Páll
On Tue, Dec 11, 2012 at 6:49 PM, Mirco Wahab wrote:

 The same problem occurs on SuSE 12.2/x64 with its default gcc 4.7.2.

 Another possible fix on SuSE 12.2: install the (older) gcc repository
 from 12.1/x64 (with lower priority), install the gcc/g++ 4.6 from there
 as an alternative compiler and select the active gcc through the
 update-alternatives --config gcc mechanism. This works very well.

Thanks for the info. The Ubuntu/Linaro version must have a fix for
this. Unfortunately, we can't do much about it, and gcc 4.7 is blocked
by the CUDA 5.0 headers anyway.

FYI: Verlet scheme nonbonded kernels (and probably the group scheme as
well), especially with AVX, can be quite a bit slower with older gcc
versions.

I find it really annoying (and stupid) that NVIDIA did not fix their
compiler to work with gcc 4.7, which had already been out for almost half
a year at the time of the CUDA 5.0 release.





Re: [gmx-users] GPU compatibility

2012-12-10 Thread Mark Abraham
Correct, the C1060 does not have the CUDA compute capability 2.0 required
for GROMACS 4.6. We will not have the ability to support GPU cards of lower
capability in the future. Unfortunately, your only GROMACS option is
probably to use the OpenMM functionality in 4.5.x (which is still present
in 4.6 and works as far as we know, but is not in our regular test suite,
and the feature is probably headed for deprecation). This will not perform
as well as the new native GPU acceleration, and supports a smaller range of
features, but might be better than wasting the GPUs.
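
A rough way to check cards against that threshold is to read the "compute cap." field from GPU detection lines in the format printed elsewhere in this thread. This is a sketch only: the helper name check_compute_cap is invented, and the Tesla C1060 line is a made-up sample (the card's compute capability is 1.3, but the exact text mdrun prints for it is an assumption).

```shell
#!/bin/sh
# Read GPU detection lines on stdin and report, per card, whether the
# compute capability reaches the 2.0 minimum named above.
check_compute_cap() {
    awk -v min=2.0 '
        /compute cap\.:/ {
            cap = $0
            sub(/.*compute cap\.: */, "", cap)   # drop everything up to the value
            sub(/,.*/, "", cap)                  # drop everything after it
            verdict = (cap + 0 >= min + 0) ? "ok" : "too old"
            print cap, verdict
        }'
}

# First line is taken from the thread; the second is a hypothetical sample.
printf '%s\n' \
  '  #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible' \
  '  #0: NVIDIA Tesla C1060, compute cap.: 1.3, ECC:  no, stat: incompatible' |
  check_compute_cap
```

For the two sample lines this prints "2.0 ok" and "1.3 too old".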



On Mon, Dec 10, 2012 at 7:50 AM, Cara Kreck wrote:


 We've got a GPU cluster in our group and have really been looking forward
 to running gromacs on it with full functionality. Unfortunately, it looks
 like our NVIDIA Tesla C1060 cards aren't supported by the 4.6 beta. I was
 just wondering if there was any chance that they would be supported in the
 full version? These cards are only a couple of years old now and were
 bought specifically for running MD.




Re: [gmx-users] GPU warnings

2012-12-09 Thread Thomas Evangelidis
  gcc 4.7.2 is not supported by any CUDA version.

 I suggest that you just fix it by editing the include/host_config.h and
 changing the version check macro (line 82 AFAIK). I've never had real
 problems with using new and officially not supported gcc-s, the version
 check is more of a promise from NVIDIA that we've tested thoroughly
 internally and we more or less vouch for this combination.


 Disclaimer: I don't take responsibility if your machine goes up in flames!

Hi Szilárd,

I tried to compile gromacs-4.6beta1; is this the version you suggested? If
not, please indicate how to download the source, because I am confused by
all these development versions.

Anyway, this is the error I get with 4.6beta1, gcc 4.7.2 and cuda 5:

[  0%] Building NVCC (Device) object
error: identifier __atomic_fetch_add is undefined

error: identifier __atomic_fetch_add is undefined

2 errors detected in the compilation of
CMake Error at (message):
  Error generating file


gmake[3]: ***
Error 1
gmake[2]: *** [src/gromacs/gmxlib/cuda_tools/CMakeFiles/cuda_tools.dir/all]
Error 2
gmake[1]: *** [src/programs/mdrun/CMakeFiles/mdrun.dir/rule] Error 2
gmake: *** [mdrun] Error 2

Unless I am missing something, cuda 5 does not support gcc 4.7.2.


Re: [gmx-users] GPU warnings

2012-11-26 Thread Szilárd Páll
On Sun, Nov 25, 2012 at 8:47 PM, Thomas Evangelidis wrote:

 Hi Szilárd,

  I was able to run code compiled with icc 13 on Fedora 17, but as I don't
  have Intel Compiler v13 on this machine I can't check it now.
  Please check if it works for you with gcc 4.7.2 (which is the default) and
  let me know if you succeed. The performance difference between icc and gcc
  on your processor should be negligible with GPU runs and at most 5-10% with
  CPU-only runs.
  As the issue is quite annoying, I'll try to have a look later, probably
  after the beta is out.

 gcc 4.7.2 is not supported by any CUDA version.

I suggest that you just fix it by editing the include/host_config.h and
changing the version check macro (line 82 AFAIK). I've never had real
problems with using new and officially not supported gcc-s, the version
check is more of a promise from NVIDIA that we've tested thoroughly
internally and we more or less vouch for this combination.


Disclaimer: I don't take responsibility if your machine goes up in flames! ;)


Re: [gmx-users] GPU warnings

2012-11-25 Thread Thomas Evangelidis
Hi Szilárd,

I was able to run code compiled with icc 13 on Fedora 17, but as I don't
 have Intel Compiler v13 on this machine I can't check it now.

 Please check if it works for you with gcc 4.7.2 (which is the default) and
 let me know if you succeed. The performance difference between icc and gcc
 on your processor should be negligible with GPU runs and at most 5-10% with
 CPU-only runs.

 As the issue is quite annoying, I'll try to have a look later, probably
 after the beta is out.

gcc 4.7.2 is not supported by any CUDA version.


Re: [gmx-users] GPU warnings

2012-11-21 Thread Szilárd Páll
On Mon, Nov 19, 2012 at 6:25 PM, Szilárd Páll wrote:

 On Mon, Nov 19, 2012 at 4:09 PM, Thomas Evangelidis wrote:

 Hi Szilárd,

 I compiled with the Intel compilers, not gcc. In case I am missing
 something, these are the versions I have:

 Indeed, I see it now in the log file. Let me try with icc 13 and will get
 back to you.

I was able to run code compiled with icc 13 on Fedora 17, but as I don't
have Intel Compiler v13 on this machine I can't check it now.

Please check if it works for you with gcc 4.7.2 (which is the default) and
let me know if you succeed. The performance difference between icc and gcc
on your processor should be negligible with GPU runs and at most 5-10% with
CPU-only runs.

As the issue is quite annoying, I'll try to have a look later, probably
after the beta is out.



Re: [gmx-users] GPU warnings

2012-11-19 Thread Szilárd Páll
Thomas & Albert,

We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc

Please try to update your packages (you should have updates available for
glibc), try recompiling with the latest 4.6 code and report back whether
you succeed.
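
Since the warning quoted below reports "0 logical CPU cores", it can help to compare against what the operating system itself reports and to note the installed glibc before and after updating. This is a sketch; whether mdrun's hardware detection relies on the same query is an assumption, not something stated in the thread.

```shell
#!/bin/sh
# Logical CPU cores currently online, as reported by the OS.
cores=$(getconf _NPROCESSORS_ONLN)
echo "logical cores online: $cores"

# glibc version string; GNU_LIBC_VERSION is only known to glibc's getconf.
getconf GNU_LIBC_VERSION 2>/dev/null || echo "glibc version not reported"

# On an RPM-based system such as Fedora, list the package releases too.
if command -v rpm >/dev/null 2>&1; then
    rpm -q glibc gcc || true
fi
```

If the OS reports a sensible core count while mdrun still prints 0, that points at the detection code or the library it was built against rather than at the machine.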



On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll wrote:

 Hi Albert,

 Apologies for hijacking your thread. Do you happen to have Fedora 17 as


 On Sun, Nov 4, 2012 at 10:55 AM, Albert wrote:


  I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x
 1344 CUDA cores), and I got the following warnings:

 thank you very much.


 WARNING: On node 0: oversubscribing the available 0 logical CPU cores per
 node with 2 MPI processes.
  This will cause considerable performance loss!

 2 GPUs detected on host boreas:
   #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
   #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:

 2 GPUs auto-selected to be used for this run: #0, #1

 Using CUDA 8x8x8 non-bonded kernels
 Making 1D domain decomposition 1 x 2 x 1

 We have just committed the new CPU detection code in this branch,
 and will commit new SSE/AVX kernels in a few days. However, this
 means that currently only the NxN kernels are accelerated!
 In the mean time, you might want to avoid production runs in 4.6.


Re: [gmx-users] GPU warnings

2012-11-19 Thread Thomas Evangelidis
Hi Szilárd,

I compiled with the Intel compilers, not gcc. In case I am missing
something, these are the versions I have:

glibc.x86_64  2.15-57.fc17
glibc-common.x86_64   2.15-57.fc17
glibc-devel.i686  2.15-57.fc17
glibc-headers.x86_64  2.15-57.fc17   @updates

gcc-gfortran.x86_64   4.7.2-2.fc17
libgcc.i686   4.7.2-2.fc17
libgcc.x86_64 4.7.2-2.fc17   @updates


On 19 November 2012 16:57, Szilárd Páll wrote:

 Thomas  Albert,

 We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc

 Please try to update your packages (you should have updates available for
 glibc), try recompiling with the latest 4.6 code and report back whether
 you succeed.



 On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll

  Hi Albert,
  Apologies for hijacking your thread. Do you happen to have Fedora 17 as
  On Sun, Nov 4, 2012 at 10:55 AM, Albert wrote:
   I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x
  1344 CUDA cores), and I got the following warnings:
  thank you very much.
  WARNING: On node 0: oversubscribing the available 0 logical CPU cores
  node with 2 MPI processes.
   This will cause considerable performance loss!
  2 GPUs detected on host boreas:
#0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
#1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
  2 GPUs auto-selected to be used for this run: #0, #1
  Using CUDA 8x8x8 non-bonded kernels
  Making 1D domain decomposition 1 x 2 x 1
  We have just committed the new CPU detection code in this branch,
  and will commit new SSE/AVX kernels in a few days. However, this
  means that currently only the NxN kernels are accelerated!
  In the mean time, you might want to avoid production runs in 4.6.



Thomas Evangelidis

PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-19 Thread Szilárd Páll
On Mon, Nov 19, 2012 at 4:09 PM, Thomas Evangelidis teva...@gmail.com wrote:

 Hi Szilárd,

 I compiled with the Intel compilers, not gcc. In case I am missing
 something, these are the versions I have:

Indeed, I see it now in the log file. Let me try with icc 13 and I will get
back to you.

 glibc.x86_64  2.15-57.fc17
 glibc-common.x86_64   2.15-57.fc17
 glibc-devel.i686  2.15-57.fc17
 glibc-headers.x86_64  2.15-57.fc17   @updates

 gcc-gfortran.x86_64   4.7.2-2.fc17
 libgcc.i686   4.7.2-2.fc17
 libgcc.x86_64 4.7.2-2.fc17   @updates


 On 19 November 2012 16:57, Szilárd Páll wrote:

  Thomas & Albert,
  We are unable to reproduce the issue on FC 17 with glibc 2.15-58 and gcc
  Please try to update your packages (you should have updates available for
  glibc), try recompiling with the latest 4.6 code and report back whether
  you succeed.
  On Fri, Nov 16, 2012 at 4:31 PM, Szilárd Páll
   Hi Albert,
   Apologies for hijacking your thread. Do you happen to have Fedora 17 as
   On Sun, Nov 4, 2012 at 10:55 AM, Albert wrote:
I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2
   1344 CUDA cores), and I got the following warnings:
   thank you very much.
   WARNING: On node 0: oversubscribing the available 0 logical CPU cores
   node with 2 MPI processes.
This will cause considerable performance loss!
   2 GPUs detected on host boreas:
 #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
 #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
   2 GPUs auto-selected to be used for this run: #0, #1
   Using CUDA 8x8x8 non-bonded kernels
   Making 1D domain decomposition 1 x 2 x 1
   We have just committed the new CPU detection code in this branch,
   and will commit new SSE/AVX kernels in a few days. However, this
   means that currently only the NxN kernels are accelerated!
   In the mean time, you might want to avoid production runs in 4.6.



 Thomas Evangelidis

 PhD student
 University of Athens
 Faculty of Pharmacy
 Department of Pharmaceutical Chemistry
 157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-16 Thread Szilárd Páll
Hi Thomas,

The output you get means that you don't have any of the macros we try to
use, although your man pages seem to refer to them. Hence, I'm really
clueless as to why this is happening. Could you please file a bug report on
and add both the initial output as well as my patch and the resulting
output? Don't forget to specify the version of software you were


On Thu, Nov 15, 2012 at 3:53 PM, Thomas Evangelidis teva...@gmail.com wrote:

 Hi Szilárd,

 This is the warning message I get this time:

 WARNING: Oversubscribing the available -66 logical CPU cores with 1
 thread-MPI threads.

  This will cause considerable performance loss!

 I have also attached the md.log file.


 On 14 November 2012 19:48, Szilárd Páll wrote:

 Hi Thomas,

 Could you please try applying the attached patch (git apply
 hardware_detect.patch in the 4.6 source root) and let me know what the
 output is?

 This should show which sysconf macro is used and what its return value is
 as well as indicate if none of the macros are in fact defined by your



 On Sat, Nov 10, 2012 at 5:24 PM, Thomas Evangelidis teva...@gmail.com wrote:

 On 10 November 2012 03:21, Szilárd Páll wrote:


 You must have an odd sysconf version! Could you please check what is
 the sysconf system variable's name in the sysconf man page (man sysconf)
 where it says something like:

  The number of processors currently online.

 The first line should be one of the
 _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something

 The following text is taken from man sysconf:

These values also exist, but may not be standard.

   The number of pages of physical memory.  Note that it is
 possible for the product of this value and the value of _SC_PAGE_SIZE to

   The number of currently available pages of physical memory.

   The number of processors configured.

   The number of processors currently online (available).

 Can you also check what your glibc version is?

 $ yum list installed | grep glibc
 glibc.x86_64  2.15-57.fc17
 glibc-common.x86_64   2.15-57.fc17
 glibc-devel.i686  2.15-57.fc17
 glibc-headers.x86_64  2.15-57.fc17

 On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis 

  I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
  benchmark with the following command line:
  mdrun_intel_cuda5 -v -s topol.tpr -testverlet
  WARNING: Oversubscribing the available 0 logical CPU cores with 1
  thread-MPI threads.
  0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core

 That is bizarre. Could you run with -debug 1 and have a look at the
 mdrun.debug output which should contain a message like:
 Detected N processors, will use this as the number of supported

 I'm wondering, is N=0 in your case!?

 It says Detected 0 processors, will use this as the number of
 supported hardware threads.

  (2.3 GHz). Unlike Albert, I don't see any performance loss; I get 13.4
  ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5
  on 4 cores (8 threads) without the GPU. Yet, I don't see any performance
  gain with more than 4 -nt threads.
  mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
  mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
  mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
  mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
  mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day

 I guess there is not much point in not using all cores, is there? Note that
 the performance drops after 4 threads because Hyper-Threading with OpenMP
 doesn't always help.

  I have also attached my log file (from mdrun_intel_cuda5 -v -s
  -testverlet) in case you find it helpful.

 I don't see it attached.

 I have attached both mdrun_intel_cuda5.debug and md.log files.  They
 will possibly be filtered by the mailing list but will be delivered to 




 Thomas Evangelidis

 PhD student
 University of Athens
 Faculty of Pharmacy
 Department of Pharmaceutical Chemistry
 157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-16 Thread Szilárd Páll
Hi Albert,

Apologies for hijacking your thread. Do you happen to have Fedora 17 as


On Sun, Nov 4, 2012 at 10:55 AM, Albert wrote:


  I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x
 1344 CUDA cores), and I got the following warnings:

 thank you very much.


 WARNING: On node 0: oversubscribing the available 0 logical CPU cores per
 node with 2 MPI processes.
  This will cause considerable performance loss!

 2 GPUs detected on host boreas:
   #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
   #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:

 2 GPUs auto-selected to be used for this run: #0, #1

 Using CUDA 8x8x8 non-bonded kernels
 Making 1D domain decomposition 1 x 2 x 1

 We have just committed the new CPU detection code in this branch,
 and will commit new SSE/AVX kernels in a few days. However, this
 means that currently only the NxN kernels are accelerated!
 In the mean time, you might want to avoid production runs in 4.6.


Re: [gmx-users] GPU warnings

2012-11-15 Thread Justin Lemkul

On 11/15/12 9:53 AM, Thomas Evangelidis wrote:

Hi Szilárd,

This is the warning message I get this time:

WARNING: Oversubscribing the available -66 logical CPU cores with 1
thread-MPI threads.
  This will cause considerable performance loss!

I have also attached the md.log file.

Attachments are rejected by the mailing list.  They either have to be copied and 
pasted, linked, or sent to an individual specifically off-list.



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Re: [gmx-users] GPU warnings

2012-11-14 Thread Szilárd Páll
Hi Thomas,

Could you please try applying the attached patch (git apply
hardware_detect.patch in the 4.6 source root) and let me know what the
output is?

This should show which sysconf macro is used and what its return value is
as well as indicate if none of the macros are in fact defined by your



On Sat, Nov 10, 2012 at 5:24 PM, Thomas Evangelidis teva...@gmail.com wrote:

 On 10 November 2012 03:21, Szilárd Páll wrote:


 You must have an odd sysconf version! Could you please check what is the
 sysconf system variable's name in the sysconf man page (man sysconf) where
 it says something like:

  The number of processors currently online.

 The first line should be one of the
 _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something

 The following text is taken from man sysconf:

These values also exist, but may not be standard.

   The number of pages of physical memory.  Note that it is
 possible for the product of this value and the value of _SC_PAGE_SIZE to

   The number of currently available pages of physical memory.

   The number of processors configured.

   The number of processors currently online (available).

 Can you also check what your glibc version is?

 $ yum list installed | grep glibc
 glibc.x86_64  2.15-57.fc17
 glibc-common.x86_64   2.15-57.fc17
 glibc-devel.i686  2.15-57.fc17
 glibc-headers.x86_64  2.15-57.fc17   @updates

 On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.com wrote:

  I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
  benchmark with the following command line:
  mdrun_intel_cuda5 -v -s topol.tpr -testverlet
  WARNING: Oversubscribing the available 0 logical CPU cores with 1
  thread-MPI threads.
  0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core

 That is bizarre. Could you run with -debug 1 and have a look at the
 mdrun.debug output which should contain a message like:
 Detected N processors, will use this as the number of supported

 I'm wondering, is N=0 in your case!?

 It says Detected 0 processors, will use this as the number of
 supported hardware threads.

  (2.3 GHz). Unlike Albert, I don't see any performance loss; I get 13.4
  ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5
  on 4 cores (8 threads) without the GPU. Yet, I don't see any performance
  gain with more than 4 -nt threads.
  mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
  mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
  mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
  mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
  mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day

 I guess there is not much point in not using all cores, is there? Note that
 the performance drops after 4 threads because Hyper-Threading with OpenMP
 doesn't always help.

  I have also attached my log file (from mdrun_intel_cuda5 -v -s
  -testverlet) in case you find it helpful.

 I don't see it attached.

 I have attached both mdrun_intel_cuda5.debug and md.log files.  They
 will possibly be filtered by the mailing list but will be delivered to your




 Thomas Evangelidis

 PhD student
 University of Athens
 Faculty of Pharmacy
 Department of Pharmaceutical Chemistry
 157 71 Athens




Re: [gmx-users] GPU warnings

2012-11-10 Thread Thomas Evangelidis
On 10 November 2012 03:21, Szilárd Páll wrote:


 You must have an odd sysconf version! Could you please check what is the
 sysconf system variable's name in the sysconf man page (man sysconf) where
 it says something like:

  The number of processors currently online.

 The first line should be one of the
 _SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something

The following text is taken from man sysconf:

   These values also exist, but may not be standard.

   - _SC_PHYS_PAGES
  The number of pages of physical memory.  Note that it is
possible for the product of this value and the value of _SC_PAGE_SIZE to
overflow.

   - _SC_AVPHYS_PAGES
  The number of currently available pages of physical memory.

   - _SC_NPROCESSORS_CONF
  The number of processors configured.

   - _SC_NPROCESSORS_ONLN
  The number of processors currently online (available).

 Can you also check what your glibc version is?

$ yum list installed | grep glibc
glibc.x86_64  2.15-57.fc17
glibc-common.x86_64   2.15-57.fc17
glibc-devel.i686  2.15-57.fc17
glibc-headers.x86_64  2.15-57.fc17   @updates

 On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.com wrote:

  I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
  benchmark with the following command line:
  mdrun_intel_cuda5 -v -s topol.tpr -testverlet
  WARNING: Oversubscribing the available 0 logical CPU cores with 1
  thread-MPI threads.
  0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM

 That is bizarre. Could you run with -debug 1 and have a look at the
 mdrun.debug output which should contain a message like:
 Detected N processors, will use this as the number of supported hardware

 I'm wondering, is N=0 in your case!?

 It says Detected 0 processors, will use this as the number of supported
 hardware threads.

  (2.3 GHz). Unlike Albert, I don't see any performance loss; I get 13.4
  ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5
  on 4 cores (8 threads) without the GPU. Yet, I don't see any performance
  gain with more than 4 -nt threads.
  mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
  mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
  mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
  mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
  mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day

 I guess there is not much point in not using all cores, is it? Note that
 the performance drops after 4 threads because Hyper-Threading with OpenMP
 doesn't always help.

  I have also attached my log file (from mdrun_intel_cuda5 -v -s
  -testverlet) in case you find it helpful.

 I don't see it attached.

 I have attached both mdrun_intel_cuda5.debug and md.log files.  They will
 possibly be filtered by the mailing list but will be delivered to your




Thomas Evangelidis

PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-09 Thread Szilárd Páll

On Tue, Nov 6, 2012 at 12:03 AM, Thomas Evangelidis teva...@gmail.com wrote:


 I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
 benchmark with the following command line:

 mdrun_intel_cuda5 -v -s topol.tpr -testverlet

 WARNING: Oversubscribing the available 0 logical CPU cores with 1
 thread-MPI threads.

 0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM

That is bizarre. Could you run with -debug 1 and have a look at the
mdrun.debug output, which should contain a message like:
Detected N processors, will use this as the number of supported hardware
threads.

I'm wondering, is N=0 in your case!?

 (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4
 ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4
 cores (8 threads) without the GPU. Yet, I don't see any performance gain
 with more than 4 -nt threads.

 mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
 mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
 mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
 mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
 mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day

I guess there is not much point in not using all cores, is there? Note that
the performance drops after 4 threads because Hyper-Threading with OpenMP
doesn't always help.

 I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr
 -testverlet) in case you find it helpful.

I don't see it attached.



 On 5 November 2012 18:54, Szilárd Páll wrote:

  The first warning indicates that you are starting more threads than the
  hardware supports which would explain the poor performance.
  Could you share a log file of the suspiciously slow run as well as the
  command line you used to start mdrun?
  On Sun, Nov 4, 2012 at 5:32 PM, Albert wrote:
   well, IC.
   the performance is rather poor than GTX590. 32ns/day vs 4 ns/day
   probably that's also something related to the warnings?
   On 11/04/2012 01:59 PM, Justin Lemkul wrote:
   On 11/4/12 4:55 AM, Albert wrote:
 I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti
   CUDA cores), and I got the following warnings:
   thank you very much.
   WARNING: On node 0: oversubscribing the available 0 logical CPU cores
   per node
   with 2 MPI processes.
 This will cause considerable performance loss!
   2 GPUs detected on host boreas:
  #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
  #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
   2 GPUs auto-selected to be used for this run: #0, #1
   Using CUDA 8x8x8 non-bonded kernels
   Making 1D domain decomposition 1 x 2 x 1
   We have just committed the new CPU detection code in this branch,
   and will commit new SSE/AVX kernels in a few days. However, this
   means that currently only the NxN kernels are accelerated!
   In the mean time, you might want to avoid production runs in 4.6.
   I can't address the first warning, but the second is fairly obvious.
You're not using an official release, you're using the development
   - let the user beware.  The code is not yet production-ready.



 Thomas Evangelidis

 PhD student
 University of Athens
 Faculty of Pharmacy
 Department of Pharmaceutical Chemistry
 157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-09 Thread Szilárd Páll

You must have an odd sysconf version! Could you please check what is the
sysconf system variable's name in the sysconf man page (man sysconf) where
it says something like:

 The number of processors currently online.

The first line should be one of the
_SC_NPROCESSORS_CONF, _SC_NPROC_CONF, but I guess yours is something

Can you also check what your glibc version is?
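
To see what the sysconf variables discussed above report on a given machine, a small probe can help. This is only a sketch under assumptions: it uses Python's os.sysconf rather than the C call that mdrun makes, the function names and the probing order are hypothetical, and the SC_NPROC_* spellings are not defined on typical Linux/glibc systems, so they are simply skipped there.

```python
import os

# Candidate names as Python spells them (no leading underscore).
# The order is an assumption, not taken from the GROMACS source.
CANDIDATES = (
    "SC_NPROCESSORS_ONLN",
    "SC_NPROC_ONLN",
    "SC_NPROCESSORS_CONF",
    "SC_NPROC_CONF",
)


def query(name):
    """Return the sysconf value for name, or None if it is unavailable here."""
    names = getattr(os, "sysconf_names", {})
    if name not in names:
        return None
    try:
        return os.sysconf(name)
    except (ValueError, OSError):
        return None


def first_positive_count(candidates=CANDIDATES, lookup=query):
    """Return (name, value) for the first candidate that reports > 0 CPUs."""
    for name in candidates:
        value = lookup(name)
        if value is not None and value > 0:
            return name, value
    # Nothing usable: the situation behind a '0 logical CPU cores' report.
    return None, 0
```

Calling query(name) for each entry in CANDIDATES shows which variables exist and what they return; first_positive_count() then picks the first sensible value. The lookup parameter is there only so the selection logic can be exercised with made-up values.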



On Fri, Nov 9, 2012 at 5:51 PM, Thomas Evangelidis teva...@gmail.com wrote:

  I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
  benchmark with the following command line:
  mdrun_intel_cuda5 -v -s topol.tpr -testverlet
  WARNING: Oversubscribing the available 0 logical CPU cores with 1
  thread-MPI threads.
  0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM

 That is bizarre. Could you run with -debug 1 and have a look at the
 mdrun.debug output which should contain a message like:
 Detected N processors, will use this as the number of supported hardware

 I'm wondering, is N=0 in your case!?

 It says Detected 0 processors, will use this as the number of supported
 hardware threads.

  (2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4
  ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5
 on 4
  cores (8 threads) without the GPU. Yet, I don't see any performance gain
  with more than 4 -nt threads.
  mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
  mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
  mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
  mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
  mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day

 I guess there is not much point in not using all cores, is it? Note that
 the performance drops after 4 threads because Hyper-Threading with OpenMP
 doesn't always help.

  I have also attached my log file (from mdrun_intel_cuda5 -v -s
  -testverlet) in case you find it helpful.

 I don't see it attached.

 I have attached both mdrun_intel_cuda5.debug and md.log files.  They will
 possibly be filtered by the mailing list but will be delivered to your



Re: [gmx-users] GPU warnings

2012-11-05 Thread Szilárd Páll
The first warning indicates that you are starting more threads than the
hardware supports which would explain the poor performance.

Could you share a log file of the suspiciously slow run as well as the command
line you used to start mdrun?
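
A minimal sketch of the kind of comparison that produces such a warning, assuming the check is simply "threads requested > logical cores detected". This is not taken from the GROMACS source, and both function names are hypothetical. It illustrates why a detected count of 0, as in the warning quoted below, makes every run look oversubscribed no matter how many cores the machine really has.

```python
def oversubscription_warning(n_threads, n_logical_cores):
    """Return a warning string if more threads are requested than cores
    were detected, otherwise None."""
    if n_threads > n_logical_cores:
        return (
            f"Oversubscribing the available {n_logical_cores} logical CPU "
            f"cores with {n_threads} threads."
        )
    return None


def detection_looks_valid(n_logical_cores):
    """A machine that is running code has at least one logical core, so a
    count below 1 points at a detection problem, not at real load."""
    return n_logical_cores >= 1
```

With a detected count of 0, oversubscription_warning(2, 0) returns a message even though two ranks on a multi-core workstation are harmless, so the warning in that case says more about hardware detection than about the launch command.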



On Sun, Nov 4, 2012 at 5:32 PM, Albert wrote:

 well, IC.
 the performance is rather poor than GTX590. 32ns/day vs 4 ns/day
 probably that's also something related to the warnings?


 On 11/04/2012 01:59 PM, Justin Lemkul wrote:

 On 11/4/12 4:55 AM, Albert wrote:


   I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x
 CUDA cores), and I got the following warnings:

 thank you very much.


 WARNING: On node 0: oversubscribing the available 0 logical CPU cores
 per node
 with 2 MPI processes.
   This will cause considerable performance loss!

 2 GPUs detected on host boreas:
#0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
#1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:

 2 GPUs auto-selected to be used for this run: #0, #1

 Using CUDA 8x8x8 non-bonded kernels
 Making 1D domain decomposition 1 x 2 x 1

 We have just committed the new CPU detection code in this branch,
 and will commit new SSE/AVX kernels in a few days. However, this
 means that currently only the NxN kernels are accelerated!
 In the mean time, you might want to avoid production runs in 4.6.

 I can't address the first warning, but the second is fairly obvious.
  You're not using an official release, you're using the development version
 - let the user beware.  The code is not yet production-ready.



Re: [gmx-users] GPU warnings

2012-11-05 Thread Thomas Evangelidis

I get these two warnings when I run the dhfr/GPU/dhfr-solv-PME.bench
benchmark with the following command line:

mdrun_intel_cuda5 -v -s topol.tpr -testverlet

WARNING: Oversubscribing the available 0 logical CPU cores with 1
thread-MPI threads.

0 logical CPU cores? Isn't this bizarre? My CPU is Intel Core i7-3610QM
(2.3 GHz). Unlike Albert, I don't see any performance loss, I get 13.4
ns/day on a single core with 1 GPU and 13.2 ns/day with GROMACS v4.5.5 on 4
cores (8 threads) without the GPU. Yet, I don't see any performance gain
with more than 4 -nt threads.

mdrun_intel_cuda5 -v -nt 2 -s topol.tpr -testverlet : 15.4 ns/day
mdrun_intel_cuda5 -v -nt 3 -s topol.tpr -testverlet : 16.0 ns/day
mdrun_intel_cuda5 -v -nt 4 -s topol.tpr -testverlet : 16.3 ns/day
mdrun_intel_cuda5 -v -nt 6 -s topol.tpr -testverlet : 16.2 ns/day
mdrun_intel_cuda5 -v -nt 8 -s topol.tpr -testverlet : 15.4 ns/day
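
The five timings above can be summarised with a few lines of Python. This is just a sketch for reading the numbers; the variable and function names are hypothetical, and the values are copied from the list above.

```python
# ns/day from the runs above, keyed by the value passed to -nt.
NS_PER_DAY = {2: 15.4, 3: 16.0, 4: 16.3, 6: 16.2, 8: 15.4}


def best_thread_count(results):
    """Return (threads, ns_per_day) for the fastest run."""
    return max(results.items(), key=lambda item: item[1])


def relative_to_best(results):
    """Return each run's throughput as a fraction of the fastest run."""
    _, best = best_thread_count(results)
    return {threads: perf / best for threads, perf in results.items()}
```

best_thread_count(NS_PER_DAY) picks -nt 4 at 16.3 ns/day, and relative_to_best(NS_PER_DAY) shows that the other settings are only a few percent slower, i.e. the curve is nearly flat beyond 2 threads.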

I have also attached my log file (from mdrun_intel_cuda5 -v -s topol.tpr
-testverlet) in case you find it helpful.


On 5 November 2012 18:54, Szilárd Páll wrote:

 The first warning indicates that you are starting more threads than the
 hardware supports which would explain the poor performance.

 Could you share a log file of the suspiciously slow run as well as the command
 line you used to start mdrun?



 On Sun, Nov 4, 2012 at 5:32 PM, Albert wrote:

  well, IC.
  the performance is rather poor than GTX590. 32ns/day vs 4 ns/day
  probably that's also something related to the warnings?
  On 11/04/2012 01:59 PM, Justin Lemkul wrote:
  On 11/4/12 4:55 AM, Albert wrote:
I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2
  CUDA cores), and I got the following warnings:
  thank you very much.
  WARNING: On node 0: oversubscribing the available 0 logical CPU cores
  per node
  with 2 MPI processes.
This will cause considerable performance loss!
  2 GPUs detected on host boreas:
 #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
 #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
  2 GPUs auto-selected to be used for this run: #0, #1
  Using CUDA 8x8x8 non-bonded kernels
  Making 1D domain decomposition 1 x 2 x 1
  We have just committed the new CPU detection code in this branch,
  and will commit new SSE/AVX kernels in a few days. However, this
  means that currently only the NxN kernels are accelerated!
  In the mean time, you might want to avoid production runs in 4.6.
  I can't address the first warning, but the second is fairly obvious.
   You're not using an official release, you're using the development
  - let the user beware.  The code is not yet production-ready.



Thomas Evangelidis

PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
157 71 Athens



Re: [gmx-users] GPU warnings

2012-11-04 Thread Justin Lemkul

On 11/4/12 4:55 AM, Albert wrote:


  I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x 1344
CUDA cores), and I got the following warnings:

thank you very much.


WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node
with 2 MPI processes.
  This will cause considerable performance loss!

2 GPUs detected on host boreas:
   #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat: compatible
   #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat: compatible

2 GPUs auto-selected to be used for this run: #0, #1

Using CUDA 8x8x8 non-bonded kernels
Making 1D domain decomposition 1 x 2 x 1

We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.

I can't address the first warning, but the second is fairly obvious.  You're not 
using an official release, you're using the development version - let the user 
beware.  The code is not yet production-ready.



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080


Re: [gmx-users] GPU warnings

2012-11-04 Thread Thomas Evangelidis
I also get the first warning (oversubscribing the available...) and
see no obvious performance gain. Do you know how to avoid that?


On 4 November 2012 14:59, Justin Lemkul wrote:

 On 11/4/12 4:55 AM, Albert wrote:


   I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti (2 x
 CUDA cores), and I got the following warnings:

 thank you very much.


 WARNING: On node 0: oversubscribing the available 0 logical CPU cores per
 with 2 MPI processes.
   This will cause considerable performance loss!

 2 GPUs detected on host boreas:
#0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:
#1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat:

 2 GPUs auto-selected to be used for this run: #0, #1

 Using CUDA 8x8x8 non-bonded kernels
 Making 1D domain decomposition 1 x 2 x 1

 We have just committed the new CPU detection code in this branch,
 and will commit new SSE/AVX kernels in a few days. However, this
 means that currently only the NxN kernels are accelerated!
 In the mean time, you might want to avoid production runs in 4.6.

 I can't address the first warning, but the second is fairly obvious.
  You're not using an official release, you're using the development version
 - let the user beware.  The code is not yet production-ready.



 Justin A. Lemkul, Ph.D.
 Research Scientist
 Department of Biochemistry
 Virginia Tech
 Blacksburg, VA
 jalemkul[at] | (540) 231-9080





Thomas Evangelidis

PhD student
University of Athens
Faculty of Pharmacy
Department of Pharmaceutical Chemistry
157 71 Athens


Re: [gmx-users] GPU warnings

2012-11-04 Thread Albert

Well, I see.
The performance is much poorer than with the GTX 590: 32 ns/day vs 4 ns/day.
Is that probably also related to the warnings?


On 11/04/2012 01:59 PM, Justin Lemkul wrote:

On 11/4/12 4:55 AM, Albert wrote:


  I am running Gromacs 4.6 GPU on a workstation with two GTX 660 Ti
(2 x 1344 CUDA cores), and I got the following warnings:

thank you very much.


WARNING: On node 0: oversubscribing the available 0 logical CPU cores per node
with 2 MPI processes.
  This will cause considerable performance loss!

2 GPUs detected on host boreas:
   #0: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat: 
   #1: NVIDIA GeForce GTX 660 Ti, compute cap.: 3.0, ECC:  no, stat: 

2 GPUs auto-selected to be used for this run: #0, #1

Using CUDA 8x8x8 non-bonded kernels
Making 1D domain decomposition 1 x 2 x 1

We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.

I can't address the first warning, but the second is fairly obvious.  
You're not using an official release, you're using the development 
version - let the user beware.  The code is not yet production-ready.


Re: [gmx-users] GPU-C2075-simulation-solw or GPU only running -reg

2012-10-21 Thread Justin Lemkul

On 10/21/12 3:38 PM, venkatesh s wrote:

Respected Gromacs people,
 My query: is my system very slow? How can I improve the speed? The run
takes about the same time (25 minutes) as on the Intel Core i7 processors
alone.
Here I give my entire system information. I also found that the 8 CPU cores
of my system are not taking the job (only the GPU is running).

mdrun-gpu -device
OpenMM:platform=Cuda,memtest=15,deviceid=0,force-device=yes -v -deffnm nvt

Non-supported GPU selected (#0, Tesla C2075), forced continuing.Note, that
the simulation can be slow or it migth even crash.
Pre-simulation ~15s memtest in progress...
Memory test completed without errors.

Back Off! I just backed up nvt.log to ./#nvt.log.1#
Getting Loaded...
Reading file nvt.tpr, VERSION 4.5.5 (single precision)
Loaded with Money

Back Off! I just backed up nvt.trr to ./#nvt.trr.1#

Back Off! I just backed up nvt.edr to ./#nvt.edr.1#

WARNING: OpenMM supports only Andersen thermostat with the
md/md-vv/md-vv-avek integrators.

WARNING: OpenMM provides contraints as a combination of SHAKE, SETTLE and
CCMA. Accuracy is based on the SHAKE tolerance set by the shake_tol

WARNING: Non-supported GPU selected (#0, Tesla C2075), forced
continuing.Note, that the simulation can be slow or it migth even crash.

Pre-simulation ~15s memtest in progress...done, no errors detected
starting mdrun 'Protein in water'
50000 steps,    100.0 ps.

OpenMM run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:   1319.043   1319.043    100.0
               (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:      0.000      0.006      6.550      3.664


| NVIDIA-SMI 3.295.59   Driver Version: 295.59
| Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
| Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
| 0.  Tesla C2075               | :01:00.0      On     | 0          0         |
|  30%   75 C  P0   150W / 225W |   8%  435MB / 5375MB |   95%
| Compute processes:                                               GPU Memory |
|  GPU  PID     Process name                                       Usage      |
|  0.  5889     mdrun-gpu                                          372MB      |



top - 22:48:22 up 13 min,  4 users,  load average: 0.19, 0.18, 0.09
Tasks: 308 total,   2 running, 304 sleeping,   2 stopped,   0 zombie
Cpu0  : 16.4%us,  1.7%sy,  0.0%ni, 81.9%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu1  :  5.4%us,  0.7%sy,  0.0%ni, 94.0%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu2  :  9.3%us,  0.7%sy,  0.0%ni, 90.0%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu3  :  0.0%us,  0.7%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu4  : 13.0%us,  0.7%sy,  0.0%ni, 86.4%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu5  :  1.0%us,  0.0%sy,  0.0%ni, 99.0%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu6  :  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,
Cpu7  :  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,
Mem:  12188656k total,  1191628k used, 10997028k free,    34804k buffers
Swap:        0k total,        0k used,        0k free,   418428k cached

protein        + SOL    + NA    total atoms (nvt.gro)
158 residues   10742    2       34646

npt.mdp file

; Run parameters
integrator           = md-vv
nsteps               = 50000      ; 2 * 50000 = 100 ps
dt                   = 0.002      ; 2 fs
; Output control
nstxout              = 100        ; save coordinates every 0.2 ps
nstvout              = 100        ; save velocities every 0.2 ps
nstenergy            = 100        ; save energies every 0.2 ps
nstlog               = 100        ; update log file every 0.2 ps
; Bond parameters
continuation         = yes        ; Restarting after NVT
constraint_algorithm = lincs      ; holonomic constraints
constraints          = all-bonds  ; all bonds (even heavy atom-H bonds)
lincs_iter           = 1          ; accuracy of LINCS
lincs_order          = 4          ; also related to accuracy
; Neighborsearching
ns_type              = grid       ; search neighboring grid cells
nstlist              = 5          ; 10 fs
rlist                = 1.0        ; short-range neighborlist cutoff (in nm)
rcoulomb             = 1.0        ; short-range electrostatic cutoff (in nm)
rvdw                 = 1.0        ; short-range van der Waals cutoff (in nm)
; Electrostatics
coulombtype          = PME        ; Particle Mesh Ewald for long-range
pme_order            = 4          ; cubic interpolation
fourierspacing       = 0.16       ; grid spacing for FFT
; Temperature coupling is on

Re: [gmx-users] GPU-C2075-simulation-solw -reg

2012-10-20 Thread Justin Lemkul

On 10/20/12 1:34 PM, venkatesh s wrote:

Respected Gromacs Users,
  I started the energy simulation, but it is slow (it shows the
following):

Getting Loaded...
Reading file em.tpr, VERSION 4.5.5 (single precision)
Loaded with Money

WARNING: Non-supported GPU selected (#0, Tesla C2075), forced
continuing.Note, that the simulation can be slow or it might even crash.

Pre-simulation ~15s memtest in progress...done, no errors detected
starting mdrun 'Protein in water'
5 steps, 50.0 ps.

What should I do to increase the speed of the GPU run?
Kindly provide a prompt solution.

No one can suggest a solution without a better statement of the problem.  What 
is your system?  How many atoms does it have?  How fast is it running? What is 
your .mdp file?  How do the benchmark systems perform?



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080

Re: [gmx-users] GPU -simulation error -reg

2012-10-14 Thread Justin Lemkul

On 10/14/12 8:01 AM, venkatesh s wrote:

Respected Gromacs people,
  My system contains protein + peptide. (Normally I use the md.mdp from the
lysozyme tutorial; I only changed the simulation length.)

  mdrun-gpu -v -deffnm  md_0_1

While running this I got the following fatal error:

Getting Loaded...
Reading file md_0_1.tpr, VERSION 4.5.5 (single precision)
Loaded with Money

WARNING: OpenMM does not support leap-frog, will use velocity-verlet

WARNING: OpenMM supports only Andersen thermostat with the
md/md-vv/md-vv-avek integrators.

Program mdrun-gpu, VERSION 4.5.5
Source code file:
/opt/softwares/compile/gromacs-4.5.5/src/kernel/openmm_wrapper.cpp, line:

Fatal error:
OpenMM does not support multiple temperature coupling groups.
For more information and tips for troubleshooting, please check the GROMACS
website at

Kindly provide a prompt answer.

The error message is fairly self-explanatory.  You are using multiple 
temperature coupling groups (tc-grps in the .mdp file).  You can't do that when 
running on GPU.  Set tc-grps = System.
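
A quick pre-flight check along these lines can catch the problem before
mdrun-gpu aborts. This is only a minimal sketch in Python and not part of
GROMACS: the helper name count_tc_groups is hypothetical, and it assumes the
usual "key = value ; comment" layout of .mdp files and treats tc-grps and
tc_grps as the same key.

```python
def count_tc_groups(mdp_text):
    """Return the number of groups listed under tc-grps (0 if absent)."""
    for raw in mdp_text.splitlines():
        line = raw.split(";", 1)[0].strip()      # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: dashes and underscores are interchangeable in keys.
        if key.strip().lower().replace("_", "-") == "tc-grps":
            return len(value.split())
    return 0


# Hypothetical .mdp excerpt with two coupling groups.
example = """
tcoupl  = V-rescale             ; thermostat
tc-grps = Protein Non-Protein   ; two coupling groups
"""

n_groups = count_tc_groups(example)
if n_groups > 1:
    print(f"{n_groups} temperature coupling groups found; "
          "the OpenMM-based mdrun-gpu needs 'tc-grps = System'")
```

With a single group (tc-grps = System) the check stays silent. Note that
tau_t and ref_t then also need exactly one value each.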



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080

Re: [gmx-users] GPU

2012-06-13 Thread Szilárd Páll
On Wed, Jun 13, 2012 at 3:59 AM, Mark Abraham wrote:

 On 12/06/2012 10:49 PM, Ehud Schreiber wrote:

 Message: 4
 Date: Mon, 11 Jun 2012 15:54:39 +1000
 From: Mark Abraham
 Subject: Re: [gmx-users] GPU
 To: Discussion list for GROMACS
 Content-Type: text/plain; charset=ISO-8859-1; format=flowed

 On 11/06/2012 2:32 AM, ifat shub wrote:


 If I understand correctly, currently the Gromacs GPU acceleration does
 not support energy minimization. Is this so? Are there any plans to
 include it in the 4.6 version or in a later one (i.e. to allow, say,
 integrator = steep or cg in mdrun-gpu)? I would find such options
 extremely useful.

 EM is normally so quick that it's not worth putting much effort into
 accelerating it, compared to the CPU-months that are spent doing
 subsequent MD.


 Currently, my main use of Gromacs entails running multiple minimizations
 on an ensemble of states.
 Moreover, these states are not obtained using molecular dynamics but
 rather using the Concoord algorithm.
 Therefore, for me the bottleneck is not md but rather minimizations
 (specifically, cg) and so their acceleration on GPUs would be very
 advantageous.
 If such usage is not totally idiosyncratic, I hope the development team
 would reconsider GPU accelerating also minimizations.
 I suspect this would not be technically too complex given the work
 already done on dynamics.

 I suspect the upcoming 4.6 release will have GPU-accelerated EM available
 as a side effect of the new Verlet pair-list scheme for computing
 non-bonded interactions. This development is unrelated to previous GPU
 efforts, I understand. See**Documentation/Acceleration_** for
 some advance details. When you hear a call for alpha testers in the
 next few months, you might want to spend some time on that so that you're
 sure GROMACS will best meet your future needs. :-)

It does work and has been tested extensively. We are working on the final
details, but you can get the code from the nbnxn_hybrid_acc branch -- it's
pretty safe to use it for non-production purposes!

The pages Mark linked are the resources you want to start with before you
start using the NxN kernels.


Re: [gmx-users] GPU

2012-06-12 Thread Ehud Schreiber
Message: 4
Date: Mon, 11 Jun 2012 15:54:39 +1000
From: Mark Abraham
Subject: Re: [gmx-users] GPU
To: Discussion list for GROMACS users
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 11/06/2012 2:32 AM, ifat shub wrote:

 If I understand correctly, currently the Gromacs GPU acceleration does
 not support energy minimization. Is this so? Are there any plans to
 include it in the 4.6 version or in a later one (i.e. to allow, say,
 integrator = steep or cg in mdrun-gpu)? I would find such options
 extremely useful.

EM is normally so quick that it's not worth putting much effort into 
accelerating it, compared to the CPU-months that are spent doing 
subsequent MD.


Currently, my main use of Gromacs entails running multiple minimizations on an 
ensemble of states.
Moreover, these states are not obtained using molecular dynamics but rather 
using the Concoord algorithm.
Therefore, for me the bottleneck is not md but rather minimizations 
(specifically, cg) and so their acceleration on GPUs would be very advantageous.
If such usage is not totally idiosyncratic, I hope the development team would 
reconsider GPU accelerating also minimizations.
I suspect this would not be technically too complex given the work already done 
on dynamics.

Ehud Schreiber.

Re: [gmx-users] GPU

2012-06-12 Thread Mark Abraham

On 12/06/2012 10:49 PM, Ehud Schreiber wrote:

Message: 4
Date: Mon, 11 Jun 2012 15:54:39 +1000
From: Mark
Subject: Re: [gmx-users] GPU
To: Discussion list for GROMACS
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 11/06/2012 2:32 AM, ifat shub wrote:


If I understand correctly, currently the Gromacs GPU acceleration does
not support energy minimization. Is this so? Are there any plans to
include it in the 4.6 version or in a later one (i.e. to allow, say,
integrator = steep or cg in mdrun-gpu)? I would find such options
extremely useful.

EM is normally so quick that it's not worth putting much effort into
accelerating it, compared to the CPU-months that are spent doing
subsequent MD.


Currently, my main use of Gromacs entails running multiple minimizations on an 
ensemble of states.
Moreover, these states are not obtained using molecular dynamics but rather 
using the Concoord algorithm.
Therefore, for me the bottleneck is not md but rather minimizations 
(specifically, cg) and so their acceleration on GPUs would be very advantageous.
If such usage is not totally idiosyncratic, I hope the development team would 
reconsider GPU accelerating also minimizations.
I suspect this would not be technically too complex given the work already done 
on dynamics.

I suspect the upcoming 4.6 release will have GPU-accelerated EM 
available as a side effect of the new Verlet pair-list scheme for 
computing non-bonded interactions. This development is unrelated to 
previous GPU efforts, I understand. See 
and for some 
advance details. When you hear a call for alpha testers in the next few 
months, you might want to spend some time on that so that you're sure 
GROMACS will best meet your future needs. :-)

Re: [gmx-users] GPU

2012-06-10 Thread Mark Abraham

On 11/06/2012 2:32 AM, ifat shub wrote:


If I understand correctly, currently the Gromacs GPU acceleration does
not support energy minimization. Is this so? Are there any plans to
include it in the 4.6 version or in a later one (i.e. to allow, say,
integrator = steep or cg in mdrun-gpu)? I would find such options
extremely useful.

EM is normally so quick that it's not worth putting much effort into 
accelerating it, compared to the CPU-months that are spent doing 
subsequent MD.

Re: [gmx-users] GPU crashes

2012-06-07 Thread lloyd riggs
Did you play with the time step?  Just curious, but I wondered what happened
with 0.0008, 0.0005, 0.0002.  I found that if I had a well-behaved protein, as
soon as I added a small (non-protein) molecule which rotated wildly while
attached to the protein, it would crash unless I reduced the time step to the
above values when constraints were removed after EQ.  It always seemed to me
that it didn't like the rotation or bond angles, seeing them as a violation,
but acted as if it were an amino acid (the same bond type but with wider
rotation, as one end wasn't fixed to a chain).  If your loop moves via the
backbone, the calculated angles, bonds or whatever might appear to the computer
to be violating the parameter settings for problems, errors, etc., as it can't
track them fast enough over the time step.  I.e. atom 1-2-3 and then delta
1-2-3 with xyz parameters, but then the particular set has additional rotation,
etc. and may include the chain atoms which bend wildly (N-Ca-Cb-Cg, maybe a
dihedral), but probably not this.

Just a thought, but probably not the right answer either: it might be the way
it is broken down (above) over GPUs, which convert everything to matrices
(non-standard, just for basic math operations, not real matrices per se) for
execution, and then some library problem which would not account for long-range
rapid (0.0005) movements at the chain (Ca, N, O to something else) and then
tries to apply these to Cb-Cg-O-H, etc. using the initial points while looking
at the parameters for, say, a single amino acid... Maybe the constraints would
cause this, which would make it a pain to EQ, but this allowed me to increase
the time step, but would ruin the experiment I had worked on, as I needed it
unconstrained to show it didn't float away when proteins were pulled, etc... I
was using a different integrator though... just normal MD.

And your cutoffs for vdw, etc... Why are they 0?  I don't know if this means a
default set is then used... but if not?  Wouldn't they try integrating using
both types of formula, or would it be just using Coulomb or vice versa?  (I
don't know what that would do to the code, but I assume it means no vdw and all
Coulomb, but then zeros are always a problem for computers.)

Those are my thoughts on that.  Probably something else though.

Good luck,


 Date: Wed, 06 Jun 2012 18:42:45 -0400
 From: Justin A. Lemkul
 To: Discussion list for GROMACS users
 Subject: [gmx-users] GPU crashes

 Hi All,
 I'm wondering if anyone has experienced what I'm seeing with Gromacs 4.5.5
 GPU.  It seems that certain systems fail inexplicably.  The system I am
 with is a heterodimeric protein complex bound to DNA.  After about 1 ns of
 simulation time using mdrun-gpu, all the energies become NaN.  The
 don't stop, they just carry on merrily producing nonsense.  I would love
 to see 
 some action regarding for this
 reason ;)
 I ran simulations of each of the components of the system individually -
 protein alone, and DNA - to try to track down what might be causing this 
 problem.  The DNA simulation is perfectly stable out to 10 ns, but each
 fails within 2 ns.  Each protein has two domains with a flexible linker,
 and it 
 seems that as soon as the linker flexes a bit, the simulations go poof. 
 Well-behaved proteins like lysozyme and DHFR (from the benchmark set) seem
 but anything that twitches even a small amount fails.  This is very
 for us, as we are hoping to see domain motions on a feasible time scale
 implicit solvent on GPU hardware.
 Has anyone seen anything like this?  Our Gromacs implementation is being
 run on 
 an x86_64 Linux system with Tesla S2050 GPU cards.  The CUDA version is
 3.1 and 
 Gromacs is linked against OpenMM-2.0.  An .mdp file is appended below.  I
 also tested finite values for cutoffs, but the results were worse
 occurred more quickly).
 I have not been able to use the latest git version of Gromacs to test
 anything has been fixed, but will post separately to gmx-developers
 the reasons for that soon.
 === md.mdp ===
 title   = Implicit solvent test
 ; Run parameters
 integrator  = sd
 dt  = 0.002
 nsteps  = 5000000   ; 10000 ps (10 ns)
 nstcomm = 1
 comm_mode   = angular   ; non-periodic system
 ; Output parameters
 nstxout = 0
 nstvout = 0
 nstfout = 0
 nstxtcout   = 1000  ; every 2 ps
 nstlog  = 5000  ; every 10 ps
 nstenergy   = 1000  ; every 2 ps
 ; Bond parameters
 constraint_algorithm= lincs
 constraints = all-bonds
 continuation= no; starting up
 ; required cutoffs for implicit
 nstlist = 0
 ns_type = grid
 rlist   = 0
 rcoulomb= 0
 rvdw= 0

Re: [gmx-users] GPU crashes

2012-06-07 Thread Justin A. Lemkul

On 6/7/12 3:57 AM, lloyd riggs wrote:

Did you play with the time step?  Just curious, but I wondered what
happened with 0.0008, 0.0005, 0.0002.  I found that if I had a well-behaved
protein, as soon as I added a small (non-protein) molecule which rotated
wildly while attached to the protein, it would crash unless I reduced the
time step to the above values when constraints were removed after EQ.  It
always seemed to me that it didn't like the rotation or bond angles, seeing
them as a violation, but acted as if it were an amino acid (the same bond type
but with wider rotation, as one end wasn't fixed to a chain).  If your loop
moves via the backbone, the calculated angles, bonds or whatever might appear
to the computer to be violating the parameter settings for problems, errors,
etc., as it can't track them fast enough over the time step.  I.e. atom 1-2-3
and then delta 1-2-3 with xyz parameters, but then the particular set has
additional rotation, etc. and may include the chain atoms which bend wildly
(N-Ca-Cb-Cg, maybe a dihedral), but probably not this.

Just a thought, but probably not the right answer either: it might be the
way it is broken down (above) over GPUs, which convert everything to
matrices (non-standard, just for basic math operations, not real matrices per
se) for execution, and then some library problem which would not account for
long-range rapid (0.0005) movements at the chain (Ca, N, O to something else)
and then tries to apply these to Cb-Cg-O-H, etc. using the initial points
while looking at the parameters for, say, a single amino acid... Maybe the
constraints would cause this, which would make it a pain to EQ, but this
allowed me to increase the time step, but would ruin the experiment I had
worked on, as I needed it unconstrained to show it didn't float away when
proteins were pulled, etc... I was using a different integrator though... just
normal MD.

I have long wondered if constraints were properly handled by the OpenMM library. 
 I am constraining all bonds, so in principle, dt of 0.002 should not be a 
problem.  The note printed indicates that the constraint algorithm is changed 
from the one selected (LINCS) to whatever OpenMM uses (SHAKE and a few others in 
combination).  Perhaps I can try running without constraints and a reduced dt, 
but I'd like to avoid it.

I wish I could efficiently test to see if this behavior was GPU-specific, but 
unfortunately the non-GPU implementation of the implicit code can currently only 
be run in serial or on 2 CPU due to an existing bug.  I can certainly test it, 
but due to the large number of atoms, it will take several days to even approach 
1 ns.

And your cutoffs for vdw, etc... Why are they 0?  I don't know if this means a
default set is then used... but if not?  Wouldn't they try integrating using
both types of formula, or would it be just using Coulomb or vice versa?  (I
don't know what that would do to the code, but I assume it means no vdw and all
Coulomb, but then zeros are always a problem for computers.)

The setup is for the all-vs-all kernels.  Setting cutoffs equal to zero and 
using a fixed neighbor list triggers these special optimized kernels.  I have 
also noticed that long, finite cutoffs (on the order of 4.0 nm) lead to 
unacceptable energy drift and structural instability in well-behaved systems 
(even the benchmarks).  For instance, the backbone RMSD of lysozyme is twice as 
large in the case of a 4.0-nm cutoff relative to the all-vs-all setup, and the 
energy drift is quite substantial.



Justin A. Lemkul, Ph.D.
Research Scientist
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080

Re: [gmx-users] GPU gets faster with more molecules in system

2011-01-24 Thread Mark Abraham

On 25/01/2011 8:25 AM, Christian Mötzing wrote:


I compiled mdrun-gpu and tried some waterbox systems with different
atom counts.

atoms  | GPU| CPU
2.400  | 1.015s | 774s
4.800  | 1.225s | 1.202s
9.600  | 1.142s | 1.353s
19.200 | 2.984s | 2.812s

Why does the system with 9.600 atoms finish faster than the one with
4.800? I triple-checked the simulations, and even GROMACS tells me that
the atom counts in the systems are as above. So I think there is no mistake
there. A diff of md.log only shows differences in output values for each
Is there any explanation for this behaviour?

As a guess, the cost of overheads for molecular simulations tends to have 
a weaker dependence on system size than the cost of computation (or none 
at all). Only once the latter dominates the cost do you see scaling with 
system size.

I expect you'd see similar behaviour running systems with 64, 128, 256, 
512 atoms on 64 processors.
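
To make the argument concrete, here is a toy cost model in Python. It is only
an illustrative sketch: the function name and both constants are hypothetical
and are not fitted to the timings quoted above.

```python
def run_time(n_atoms, overhead_s, cost_per_atom_s):
    """Toy model: fixed overhead plus work proportional to system size."""
    return overhead_s + cost_per_atom_s * n_atoms


OVERHEAD_S = 1000.0       # hypothetical fixed cost (launches, transfers, I/O)
COST_PER_ATOM_S = 0.01    # hypothetical per-atom cost of the whole run

for n in (2400, 4800, 9600, 19200):
    t = run_time(n, OVERHEAD_S, COST_PER_ATOM_S)
    share = OVERHEAD_S / t
    print(f"{n:6d} atoms: {t:7.1f} s, overhead share {share:.0%}")
```

While the overhead share stays close to 100%, an eightfold larger system costs
only slightly more time, and small run-to-run variations can be of the same
size as the difference between neighbouring system sizes.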

Re: [gmx-users] gpu

2010-11-07 Thread Rossen Apostolov


Did you read this?


On 11/7/10 1:23 PM, Erik Wensink wrote:

Dear gmx-users,
How to invoke the gpu for simulations, e.g. is there (compiler) flag?

Re: [gmx-users] gpu

2010-11-07 Thread Erik Wensink

--- On Sun, 11/7/10, Rossen Apostolov wrote:

From: Rossen Apostolov
Subject: Re: [gmx-users] gpu
Date: Sunday, November 7, 2010, 4:27 PM



Did you read this?


On 11/7/10 1:23 PM, Erik Wensink wrote:


Dear gmx-users,

  How to invoke the gpu for simulations, e.g. is there
  (compiler) flag?






Re: [gmx-users] GPU slower than I7

2010-10-25 Thread Renato Freitas

My OS is Fedora 13 (64 bits) and I used gcc 4.4.4. I ran the program
you sent me. Below are the results of 5 runs. As you can see, the
results are roughly the same.

[ren...@scrat ~]$ ./time
2.09 2.102991
[ren...@scrat ~]$ ./time
2.09 2.102808
[ren...@scrat ~]$ ./time
2.09 2.104577
[ren...@scrat ~]$ ./time
2.09 2.103943
[ren...@scrat ~]$ ./time
2.09 2.104471

Below is part of the /src/configure.h:

/* Define to 1 if you have the MSVC _aligned_malloc() function. */

/* Define to 1 if you have the gettimeofday() function. */

/* Define to 1 if you have the cbrt() function. */
#define HAVE_CBRT

 Is this OK?


2010/10/22 Roland Schulz

 On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas wrote:

 Do you think that the NODE and Real time difference could be
 attributed to some compilation problem in mdrun-gpu? I'm asking
 even though I didn't get any error in the compilation.

 It is very odd that these are different for your system. What operating
 system and compiler do you use?
 Is HAVE_GETTIMEOFDAY set in src/config.h?
 I attached a small test program which uses the two different timers used for
 NODE and Real time. You can compile it with cc time.c -o time and run it
 with ./time. Do you get roughly the same time twice with the test program or
 do you see the same discrepancy as with GROMACS?
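
 The attached time.c is not reproduced in this archive. As a rough stand-in,
 the same kind of comparison can be sketched in Python: measure one busy
 interval with a CPU-time clock and with a wall-clock timer and print both.
 The choice of clocks here is an assumption; it is not necessarily what
 time.c or GROMACS use.

```python
import time


def busy_interval(duration_s=0.2):
    """Spin for duration_s and return (cpu_seconds, wall_seconds)."""
    cpu_start = time.process_time()   # CPU time consumed by this process
    wall_start = time.time()          # wall-clock time
    while time.time() - wall_start < duration_s:
        pass                          # burn CPU so that both clocks advance
    return time.process_time() - cpu_start, time.time() - wall_start


cpu_s, wall_s = busy_interval()
print(f"{cpu_s:.2f} {wall_s:.6f}")
```

 On an otherwise idle machine the two numbers should be roughly equal; a large
 gap between them would point at the timers rather than at the simulation.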



 2010/10/22 Szilárd Páll
  Hi Renato,
  First of all, what you're seeing is pretty normal, especially given that you
  have a CPU that is crossing the border of insane :) Why is it normal?
  The PME algorithms are simply not very well suited for
  current GPU architectures. With an ill-suited algorithm you won't be
  able to see the speedups you can often see in other application areas,
  even more so when you're comparing to Gromacs on an i7 980X. For
  more info + benchmarks see the Gromacs-GPU page:
  However, there is one strange thing you also pointed out. The fact
  that the NODE and Real time in your mdrun-gpu timing summary are
  not the same, but deviate by 3x, is _very_ unusual. I've run
  mdrun-gpu on quite a wide variety of hardware but I've never seen
  those two counters deviate. It might be an artifact from the cycle
  counters used internally that behave in an unusual way on your CPU.
  One other thing I should point out is that you would be better off
  using the standard mdrun, which in 4.5 has thread support by default
  and therefore will run on a single CPU/node without MPI!
  On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas
  Hi gromacs users,
  I have installed the latest version of gromacs (4.5.1) on an i7 980X
  (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
  mpi version. I also compiled the GPU-accelerated
  version of gromacs. Then I did a 2 ns simulation using a small system
  (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
  The results that I got are below:
  My *.mdp is:
  constraints         =  all-bonds
  integrator          =  md
  dt                  =  0.002    ; ps !
  nsteps              =  1000000  ; total 2000 ps.
  nstlist             =  10
  ns_type             =  grid
  coulombtype    = PME
  rvdw                = 0.9
  rlist               = 0.9
  rcoulomb            = 0.9
  fourierspacing      = 0.10
  pme_order           = 4
  ewald_rtol          = 1e-5
  vdwtype             =  cut-off
  pbc                 =  xyz
  epsilon_rf    =  0
  comm_mode           =  linear
  nstxout             =  1000
  nstvout             =  0
  nstfout             =  0
  nstxtcout           =  1000
  nstlog              =  1000
  nstenergy           =  1000
  ; Berendsen temperature coupling is on in four groups
  tcoupl              = berendsen
  tc-grps             = system
  tau-t               = 0.1
  ref-t               = 298
  ; Pressure coupling is on
  Pcoupl = berendsen
  pcoupltype = isotropic
  tau_p = 0.5
  compressibility = 4.5e-5
  ref_p = 1.0
  ; Generate velocites is on at 298 K.
  gen_vel = no
  mdrun-gpu -s topol.tpr -v   out 
  Here is a part of the md.log:
  Started mdrun on node 0 Wed Oct 20 09:52:09 2010
      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
   Computing:     Nodes   Number          G-Cycles        Seconds     %
   Write traj.    1               1021                    106.075 31.7
   Rest                   1               64125.577               19178.6

Re: [gmx-users] GPU slower than I7

2010-10-22 Thread Szilárd Páll
Hi Renato,

First of all, what you're seeing is pretty normal, especially given that you
have a CPU that is crossing the border of insane :) Why is it normal?
The PME algorithms are simply not very well suited for
current GPU architectures. With an ill-suited algorithm you won't be
able to see the speedups you can often see in other application areas,
even more so when you're comparing to Gromacs on an i7 980X. For
more info + benchmarks see the Gromacs-GPU page:

However, there is one strange thing you also pointed out. The fact
that the NODE and Real time in your mdrun-gpu timing summary are
not the same, but deviate by 3x, is _very_ unusual. I've run
mdrun-gpu on quite a wide variety of hardware but I've never seen
those two counters deviate. It might be an artifact from the cycle
counters used internally that behave in an unusual way on your CPU.
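
The numbers in the quoted md.log are consistent with the performance line
being derived from the NODE time rather than from the Real time. The
following Python sketch only checks that arithmetic; the function names are
made up for illustration, and this is not a claim about how mdrun computes
the figures internally.

```python
SECONDS_PER_DAY = 86400.0


def ns_per_day(simulated_ns, elapsed_s):
    """Simulated nanoseconds per day for a given elapsed time in seconds."""
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


def hours_per_ns(simulated_ns, elapsed_s):
    """Hours needed per simulated nanosecond."""
    return elapsed_s / 3600.0 / simulated_ns


simulated_ns = 2.0      # run length stated in the original post
node_s = 6381.840       # "NODE (s)" from the quoted md.log
real_s = 19210.349      # "Real (s)" from the quoted md.log

print(f"from NODE time: {ns_per_day(simulated_ns, node_s):.3f} ns/day, "
      f"{hours_per_ns(simulated_ns, node_s):.3f} hour/ns")
print(f"from Real time: {ns_per_day(simulated_ns, real_s):.3f} ns/day")
print(f"NODE / Real:    {100.0 * node_s / real_s:.1f} %")
```

If the Real time is the one to trust, the effective throughput is about a
third of the reported 27.077 ns/day.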

One other thing I should point out is that you would be better off
using the standard mdrun which in 4.5 by default has thread-support
and therefore will run on a single cpu/node without MPI!


On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas wrote:
 Hi gromacs users,

 I have installed the latest version of gromacs (4.5.1) on an i7 980X
 (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
 mpi version. I also compiled the GPU-accelerated
 version of gromacs. Then I did a 2 ns simulation using a small system
 (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
 The results that I got are below:

 My *.mdp is:

 constraints         =  all-bonds
 integrator          =  md
 dt                  =  0.002    ; ps !
 nsteps              =  1000000  ; total 2000 ps.
 nstlist             =  10
 ns_type             =  grid
 coulombtype    = PME
 rvdw                = 0.9
 rlist               = 0.9
 rcoulomb            = 0.9
 fourierspacing      = 0.10
 pme_order           = 4
 ewald_rtol          = 1e-5
 vdwtype             =  cut-off
 pbc                 =  xyz
 epsilon_rf    =  0
 comm_mode           =  linear
 nstxout             =  1000
 nstvout             =  0
 nstfout             =  0
 nstxtcout           =  1000
 nstlog              =  1000
 nstenergy           =  1000
 ; Berendsen temperature coupling is on in four groups
 tcoupl              = berendsen
 tc-grps             = system
 tau-t               = 0.1
 ref-t               = 298
 ; Pressure coupling is on
 Pcoupl = berendsen
 pcoupltype = isotropic
 tau_p = 0.5
 compressibility = 4.5e-5
 ref_p = 1.0
 ; Generate velocities is on at 298 K.
 gen_vel = no


 mdrun-gpu -s topol.tpr -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 09:52:09 2010
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:     Nodes   Number     G-Cycles     Seconds       %
  Write traj.        1     1021      106.075        31.7
  Rest               1             64125.577     19178.6    99.8
  Total              1             64231.652     19210.3   100.0

                  NODE (s)     Real (s)        (%)
        Time:     6381.840    19210.349       33.2
                  (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
 Performance:        0.000        0.001     27.077       0.886

 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010


 mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 18:30:52 2010

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:          Nodes   Number     G-Cycles     Seconds       %
  Domain decomp.          3       11     1452.166       434.7     0.6
  DD comm. load           3    10001        0.745         0.2
  Send X to PME           3      101      249.003        74.5
  Comm. coord.            3      101      637.329       190.8
  Neighbor search         3       11     8738.669      2616.0
  Force                   3      101    99210.202     29699.2    39.2
  Wait + Comm. F          3      101     3361.591      1006.3
  PME mesh                3      101    66189.554     19814.2
  Wait + 

Re: [gmx-users] GPU slower than I7

2010-10-22 Thread Renato Freitas
Hi Roland,

In fact I get better performance values using different rcoulomb,
fourierspacing and the values of -npme suggested by g_tune_pme using

The simulation using the GPU was carried out on a dedicated machine;
no other programs were running, even the graphical interface was

About the CPU vs GPU simulation time, Szilárd explained that the PME
algorithms are still not very well suited for current GPU
architectures. I just don't know why the NODE and REAL times are not
the same.



2010/10/21 Roland Schulz

 On Thu, Oct 21, 2010 at 5:53 PM, Renato Freitas wrote:

 Thanks Roland. I will do a newer test using the fourier spacing equal
 to 0.11.

 I'd also suggest looking at g_tune_pme and running with different rcoulomb
 and fourier_spacing. As long as the ratio is the same you get the same accuracy.
 And you should get better performance (especially on the GPU) for a longer
 cut-off and larger grid spacing.

 However, about the performance of GPU versus CPU (mpi) let me
 try to explain it better:


             NODE (s)                Real (s)                (%)
 Time:    6381.840                19210.349            33.2
                         (Mnbf/s)   (MFlops)     (ns/day)        (hour/ns)
 Performance:    0.000       0.001          27.077          0.886


             NODE (s)         Real (s)                    (%)
 Time:    12621.257       12621.257               100.0
                      (Mnbf/s)      (GFlops)     (ns/day)        (hour/ns)
 Performance: 388.633      28.773        13.691         1.753

 Yes. Sorry, I didn't realize that NODE time and Real time are different. Did
 you run the GPU calculation on a desktop machine which was also doing other
 things at the time? This might explain it. As far as I know, for a dedicated
 machine not running any other programs NODE and Real time should be the
 same.

 Looking above we can see that gromacs prints in the output that
 the simulation is faster when the GPU is used. But this is not the
 reality. The truth is that the simulation with MPI was about 110 min faster
 than that with the GPU. Does this seem correct to you? As I said before, I was
 expecting that the GPU would take less time than the 6 core MPI.

  Well, the exact time depends on a lot of factors, and you can probably speed
 up both. But I would expect both to be about equally fast.


Re: [gmx-users] GPU slower than I7

2010-10-22 Thread Renato Freitas
Hi Szilárd,

Thanks for your explanation. Do you know if there will be a new
improvement of the PME algorithms to take full advantage of GPU video

Do you think that the NODE and Real time difference could be
attributed to some compilation problem in mdrun-gpu? I ask even though
I didn't get any errors during compilation.



2010/10/22 Szilárd Páll
 Hi Renato,

 First of all, what you're seeing is pretty normal, especially given that
 you have a CPU that is crossing the border of insane :) Why is it normal?
 The PME algorithms are simply not very well suited for
 current GPU architectures. With an ill-suited algorithm you won't be
 able to see the speedups you can often see in other application areas,
 even more so when you're comparing to Gromacs on an i7 980X. For
 more info + benchmarks see the Gromacs-GPU page:

 However, there is one strange thing you also pointed out. The fact
 that the NODE and Real times in your mdrun-gpu timing summary are
 not the same, but differ by a factor of 3, is _very_ unusual. I've run
 mdrun-gpu on quite a wide variety of hardware but I've never seen
 those two counters deviate. It might be an artifact of the cycle
 counters used internally behaving in an unusual way on your CPU.

 One other thing I should point out is that you would be better off
 using the standard mdrun which in 4.5 by default has thread-support
 and therefore will run on a single cpu/node without MPI!


 On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas wrote:
 Hi gromacs users,

 I have installed the latest version of gromacs (4.5.1) on an i7 980X
 (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
 mpi version. I also compiled the GPU-accelerated
 version of gromacs. Then I ran a 2 ns simulation of a small system
 (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
 The results that I got are below:

 My *.mdp is:

 constraints         =  all-bonds
 integrator          =  md
 dt                  =  0.002    ; ps !
 nsteps              =  1000000  ; total 2000 ps.
 nstlist             =  10
 ns_type             =  grid
 coulombtype    = PME
 rvdw                = 0.9
 rlist               = 0.9
 rcoulomb            = 0.9
 fourierspacing      = 0.10
 pme_order           = 4
 ewald_rtol          = 1e-5
 vdwtype             =  cut-off
 pbc                 =  xyz
 epsilon_rf    =  0
 comm_mode           =  linear
 nstxout             =  1000
 nstvout             =  0
 nstfout             =  0
 nstxtcout           =  1000
 nstlog              =  1000
 nstenergy           =  1000
 ; Berendsen temperature coupling is on in four groups
 tcoupl              = berendsen
 tc-grps             = system
 tau-t               = 0.1
 ref-t               = 298
 ; Pressure coupling is on
 Pcoupl = berendsen
 pcoupltype = isotropic
 tau_p = 0.5
 compressibility = 4.5e-5
 ref_p = 1.0
 ; Generate velocities is on at 298 K.
 gen_vel = no


 mdrun-gpu -s topol.tpr -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 09:52:09 2010
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:     Nodes   Number     G-Cycles     Seconds       %
  Write traj.        1     1021      106.075        31.7
  Rest               1             64125.577     19178.6    99.8
  Total              1             64231.652     19210.3   100.0

                  NODE (s)     Real (s)        (%)
        Time:     6381.840    19210.349       33.2
                  (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
 Performance:        0.000        0.001     27.077       0.886

 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010


 mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 18:30:52 2010

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:             Nodes   Number  G-Cycles    Seconds             %
  Domain decomp. 3              11     1452.166      434.7             0.6
  DD comm. load          3              10001        0.745          0.2
  Send X to PME         3              101    249.003       74.5

Re: [gmx-users] GPU slower than I7

2010-10-22 Thread Roland Schulz

On Fri, Oct 22, 2010 at 3:20 PM, Renato Freitas wrote:

 Do you think that the NODE and Real time difference could be
 attributed to some compilation problem in mdrun-gpu? I ask even though
 I didn't get any errors during compilation.

It is very odd that these are different for your system. What operating
system and compiler do you use?

Is HAVE_GETTIMEOFDAY set in src/config.h?

I attached a small test program which uses the two different timers used for
NODE and Real time. You can compile it with cc time.c -o time and run it
with ./time. Do you get roughly the same time twice with the test program or
do you see the same discrepancy as with GROMACS?
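
The attached time.c is not preserved in this archive. As a rough stand-in, here is a minimal Python sketch of the same kind of check. It assumes, purely for illustration, that one timer counts CPU time consumed by the process and the other counts wall-clock time; which timers GROMACS actually uses for NODE and Real time is not stated in this thread. The function name and the durations are hypothetical.

```python
import time


def compare_timers(busy_s: float = 0.2, idle_s: float = 0.5):
    """Return (cpu_seconds, wall_seconds) for a busy loop followed by a sleep."""
    wall_start = time.perf_counter()   # wall-clock style timer
    cpu_start = time.process_time()    # CPU time consumed by this process

    # Busy phase: both timers advance at roughly the same rate.
    deadline = time.perf_counter() + busy_s
    while time.perf_counter() < deadline:
        pass

    # Idle phase: only the wall clock advances while the process sleeps.
    time.sleep(idle_s)

    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return cpu_s, wall_s


cpu_s, wall_s = compare_timers()
print(f"CPU time : {cpu_s:.3f} s")
print(f"Wall time: {wall_s:.3f} s")
print(f"ratio    : {100.0 * cpu_s / wall_s:.1f} %")
```

If the two numbers agree for a pure busy loop (idle_s = 0) but diverge once the process spends time waiting, a gap between two such timers would not by itself indicate a broken build.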




 2010/10/22 Szilárd Páll
  Hi Renato,
  First of all, what you're seeing is pretty normal, especially given that
  you have a CPU that is crossing the border of insane :) Why is it normal?
  The PME algorithms are simply not very well suited for
  current GPU architectures. With an ill-suited algorithm you won't be
  able to see the speedups you can often see in other application areas,
  even more so when you're comparing to Gromacs on an i7 980X. For
  more info + benchmarks see the Gromacs-GPU page:
  However, there is one strange thing you also pointed out. The fact
  that the NODE and Real times in your mdrun-gpu timing summary are
  not the same, but differ by a factor of 3, is _very_ unusual. I've run
  mdrun-gpu on quite a wide variety of hardware but I've never seen
  those two counters deviate. It might be an artifact of the cycle
  counters used internally behaving in an unusual way on your CPU.
  One other thing I should point out is that you would be better off
  using the standard mdrun which in 4.5 by default has thread-support
  and therefore will run on a single cpu/node without MPI!
  On Thu, Oct 21, 2010 at 9:18 PM, Renato Freitas
  Hi gromacs users,
  I have installed the latest version of gromacs (4.5.1) on an i7 980X
  (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
  mpi version. I also compiled the GPU-accelerated
  version of gromacs. Then I ran a 2 ns simulation of a small system
  (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
  The results that I got are below:
  My *.mdp is:
  constraints =  all-bonds
  integrator  =  md
  dt  =  0.002; ps !
  nsteps  =  1000000  ; total 2000 ps.
  nstlist =  10
  ns_type =  grid
  coulombtype= PME
  rvdw= 0.9
  rlist   = 0.9
  rcoulomb= 0.9
  fourierspacing  = 0.10
  pme_order   = 4
  ewald_rtol  = 1e-5
  vdwtype =  cut-off
  pbc =  xyz
  epsilon_rf=  0
  comm_mode   =  linear
  nstxout =  1000
  nstvout =  0
  nstfout =  0
  nstxtcout   =  1000
  nstlog  =  1000
  nstenergy   =  1000
  ; Berendsen temperature coupling is on in four groups
  tcoupl  = berendsen
  tc-grps = system
  tau-t   = 0.1
  ref-t   = 298
  ; Pressure coupling is on
  Pcoupl = berendsen
  pcoupltype = isotropic
  tau_p = 0.5
  compressibility = 4.5e-5
  ref_p = 1.0
  ; Generate velocities is on at 298 K.
  gen_vel = no
  mdrun-gpu -s topol.tpr -v   out 
  Here is a part of the md.log:
  Started mdrun on node 0 Wed Oct 20 09:52:09 2010
  R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
   Computing:     Nodes   Number     G-Cycles     Seconds       %
   Write traj.        1     1021      106.075        31.7
   Rest               1             64125.577     19178.6
   Total              1             64231.652     19210.3   100.0
                   NODE (s)     Real (s)        (%)
         Time:     6381.840    19210.349       33.2
                   (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
  Performance:        0.000        0.001     27.077       0.886
  Finished mdrun on node 0 Wed Oct 20 15:12:19 2010
  mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v   out 
  Here is a part of the md.log:
  Started mdrun on node 0 Wed Oct 20 18:30:52 2010
  R E A L   C Y C L E   A N D   T I 

Re: [gmx-users] GPU slower than I7

2010-10-21 Thread Roland Schulz
On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas wrote:

 Hi gromacs users,

 I have installed the latest version of gromacs (4.5.1) on an i7 980X
 (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
 mpi version. I also compiled the GPU-accelerated
 version of gromacs. Then I ran a 2 ns simulation of a small system
 (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
 The results that I got are below:

 My *.mdp is:

 constraints =  all-bonds
 integrator  =  md
 dt  =  0.002; ps !
 nsteps  =  1000000  ; total 2000 ps.
 nstlist =  10
 ns_type =  grid
 coulombtype= PME
 rvdw= 0.9
 rlist   = 0.9
 rcoulomb= 0.9
 fourierspacing  = 0.10
 pme_order   = 4
 ewald_rtol  = 1e-5
 vdwtype =  cut-off
 pbc =  xyz
 epsilon_rf=  0
 comm_mode   =  linear
 nstxout =  1000
 nstvout =  0
 nstfout =  0
 nstxtcout   =  1000
 nstlog  =  1000
 nstenergy   =  1000
 ; Berendsen temperature coupling is on in four groups
 tcoupl  = berendsen
 tc-grps = system
 tau-t   = 0.1
 ref-t   = 298
 ; Pressure coupling is on
 Pcoupl = berendsen
 pcoupltype = isotropic
 tau_p = 0.5
 compressibility = 4.5e-5
 ref_p = 1.0
 ; Generate velocities is on at 298 K.
 gen_vel = no


 mdrun-gpu -s topol.tpr -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 09:52:09 2010
 R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:     Nodes   Number     G-Cycles     Seconds       %

  Write traj.        1     1021      106.075        31.7
  Rest               1             64125.577     19178.6

  Total              1             64231.652     19210.3   100.0


                  NODE (s)     Real (s)        (%)
        Time:     6381.840    19210.349       33.2
                  (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
 Performance:        0.000        0.001     27.077       0.886

 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010


 mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 18:30:52 2010

 R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:            Nodes   Number     G-Cycles     Seconds       %

  Domain decomp.            3       11     1452.166       434.7
  DD comm. load             3    10001        0.745         0.2
  Send X to PME             3      101      249.003        74.5
  Comm. coord.              3      101      637.329       190.8
  Neighbor search           3       11     8738.669      2616.0
  Force                     3      101    99210.202
  Wait + Comm. F            3      101     3361.591      1006.3
  PME mesh                  3      101    66189.554     19814.2
  Wait + Comm. X/F          3             60294.513      8049.5    23.8
  Wait + Recv. PME F        3      101      801.897       240.1
  Write traj.               3     1015       33.464        10.0     0.0
  Update                    3      101     3295.820       986.6     1.3
  Constraints               3      101     6317.568      1891.2     2.5
  Comm. energies            3       12       70.784        21.2
  Rest                      3              2314.844       693.0     0.9

  Total                     6            252968.148     75727.5


  PME redist. X/F           3      202     1945.551       582.4
  PME spread/gather         3      202

Re: [gmx-users] GPU slower than I7

2010-10-21 Thread Renato Freitas
Thanks Roland. I will do a newer test using the fourier spacing equal
to 0.11. However, about the performance of GPU versus CPU (mpi) let me
try to explain it better:

The simulation using gromacs with GPU started and finished:

Started mdrun on node 0 Wed Oct 20 09:52:09 2010
Finished mdrun on node 0 Wed Oct 20 15:12:19 2010

Total time = 320 min

The simulation using gromacs with mpi started and finished:

Started mdrun on node 0 Wed Oct 20 18:30:52 2010
Finished mdrun on node 0 Wed Oct 20 22:01:14 2010

Total time = 210 min
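
The two totals can be checked directly from the Started/Finished lines. This is a small sketch, assuming the md.log timestamps are in the format shown above and an English locale for the day and month names; the helper name is hypothetical.

```python
from datetime import datetime

# Timestamp layout as printed in the md.log lines above (assumes English locale).
LOG_TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(started: str, finished: str) -> float:
    """Wall-clock seconds between two md.log timestamps."""
    t_start = datetime.strptime(started, LOG_TIME_FORMAT)
    t_end = datetime.strptime(finished, LOG_TIME_FORMAT)
    return (t_end - t_start).total_seconds()


gpu_s = elapsed_seconds("Wed Oct 20 09:52:09 2010", "Wed Oct 20 15:12:19 2010")
mpi_s = elapsed_seconds("Wed Oct 20 18:30:52 2010", "Wed Oct 20 22:01:14 2010")

print(f"GPU run: {gpu_s:.0f} s = {gpu_s / 60:.1f} min")
print(f"MPI run: {mpi_s:.0f} s = {mpi_s / 60:.1f} min")
print(f"difference: {(gpu_s - mpi_s) / 60:.1f} min")
```

The GPU figure comes out at 19210 s, which matches the Real time of 19210.349 s in the mdrun-gpu log rather than the NODE time.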

Based on these numbers, it was the CPU with mpi that was faster than
the GPU, by approximately 110 min. But looking at the end of each
output I have:


                NODE (s)     Real (s)        (%)
       Time:    6381.840    19210.349       33.2
                (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
Performance:       0.000        0.001     27.077       0.886


                NODE (s)     Real (s)        (%)
       Time:   12621.257    12621.257      100.0
                (Mnbf/s)     (GFlops)   (ns/day)   (hour/ns)
Performance:     388.633       28.773     13.691       1.753

Looking above we can see that gromacs prints in the output that
the simulation is faster when the GPU is used. But this is not the
reality. The truth is that the simulation with MPI was about 110 min faster
than that with the GPU. Does this seem correct to you? As I said before, I was
expecting that the GPU would take less time than the 6 core MPI.
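
To make the discrepancy concrete, here is a short worked calculation. It assumes the run covers 2 ns of simulated time (the "total 2000 ps" from the mdp file) and that ns/day is simply simulated time divided by elapsed time; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def ns_per_day(simulated_ns: float, elapsed_s: float) -> float:
    """Throughput in ns/day for a given elapsed time in seconds."""
    return simulated_ns * SECONDS_PER_DAY / elapsed_s


simulated_ns = 2.0          # 2000 ps, from the mdp comment
gpu_node_s = 6381.840       # NODE (s) reported by mdrun-gpu
gpu_real_s = 19210.349      # Real (s) reported by mdrun-gpu
mpi_real_s = 12621.257      # NODE (s) == Real (s) for mdrun_mpi

print(f"GPU, from NODE time: {ns_per_day(simulated_ns, gpu_node_s):.3f} ns/day")
print(f"GPU, from Real time: {ns_per_day(simulated_ns, gpu_real_s):.3f} ns/day")
print(f"MPI, from Real time: {ns_per_day(simulated_ns, mpi_real_s):.3f} ns/day")
```

Under these assumptions the reported 27.077 ns/day for the GPU run is reproduced only from the NODE time. Using the Real time instead gives roughly 9 ns/day, which is below the 13.691 ns/day of the MPI run and agrees with the wall-clock comparison.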



2010/10/21 Roland Schulz

 On Thu, Oct 21, 2010 at 3:18 PM, Renato Freitas wrote:

 Hi gromacs users,

 I have installed the latest version of gromacs (4.5.1) on an i7 980X
 (6 cores or 12 with HT on; 3.3 GHz) with 12GB of RAM and compiled its
 mpi version. I also compiled the GPU-accelerated
 version of gromacs. Then I ran a 2 ns simulation of a small system
 (11042 atoms) to compare the performance of mdrun-gpu vs mdrun_mpi.
 The results that I got are below:

 My *.mdp is:

 constraints         =  all-bonds
 integrator          =  md
 dt                  =  0.002    ; ps !
 nsteps              =  1000000  ; total 2000 ps.
 nstlist             =  10
 ns_type             =  grid
 coulombtype    = PME
 rvdw                = 0.9
 rlist               = 0.9
 rcoulomb            = 0.9
 fourierspacing      = 0.10
 pme_order           = 4
 ewald_rtol          = 1e-5
 vdwtype             =  cut-off
 pbc                 =  xyz
 epsilon_rf    =  0
 comm_mode           =  linear
 nstxout             =  1000
 nstvout             =  0
 nstfout             =  0
 nstxtcout           =  1000
 nstlog              =  1000
 nstenergy           =  1000
 ; Berendsen temperature coupling is on in four groups
 tcoupl              = berendsen
 tc-grps             = system
 tau-t               = 0.1
 ref-t               = 298
 ; Pressure coupling is on
 Pcoupl = berendsen
 pcoupltype = isotropic
 tau_p = 0.5
 compressibility = 4.5e-5
 ref_p = 1.0
 ; Generate velocities is on at 298 K.
 gen_vel = no


 mdrun-gpu -s topol.tpr -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 09:52:09 2010
     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:     Nodes   Number     G-Cycles     Seconds       %

  Write traj.        1     1021      106.075        31.7
  Rest               1             64125.577     19178.6

  Total              1             64231.652     19210.3   100.0


                  NODE (s)     Real (s)        (%)
        Time:     6381.840    19210.349       33.2
                  (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
 Performance:        0.000        0.001     27.077       0.886

 Finished mdrun on node 0 Wed Oct 20 15:12:19 2010


 mpirun -np 6 mdrun_mpi -s topol.tpr -npme 3 -v   out 

 Here is a part of the md.log:

 Started mdrun on node 0 Wed Oct 20 18:30:52 2010

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:             Nodes   Number  G-Cycles    Seconds             %

  Domain decomp. 3              11     1452.166      434.7
  DD comm. load          3             

Re: [gmx-users] GPU slower than I7

2010-10-21 Thread Roland Schulz
On Thu, Oct 21, 2010 at 5:53 PM, Renato Freitas wrote:

 Thanks Roland. I will do a newer test using the fourier spacing equal
 to 0.11.

I'd also suggest looking at g_tune_pme and running with different rcoulomb
and fourier_spacing. As long as the ratio is the same you get the same accuracy.
And you should get better performance (especially on the GPU) for a longer
cut-off and larger grid spacing.
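
As an illustration of keeping the ratio fixed, the sketch below scales both values from the posted mdp file by the same factor. The helper name is hypothetical, and the factor 1.1 is chosen only because it yields the fourier spacing of 0.11 mentioned above; other cut-offs such as rlist may need to follow rcoulomb, which is not handled here.

```python
def scale_pme_settings(rcoulomb: float, fourierspacing: float, factor: float):
    """Scale the Coulomb cut-off and the PME grid spacing by the same factor,
    so that the ratio rcoulomb / fourierspacing stays unchanged."""
    return round(rcoulomb * factor, 4), round(fourierspacing * factor, 4)


# Starting point: rcoulomb = 0.9 and fourierspacing = 0.10 from the mdp file.
new_rcoulomb, new_spacing = scale_pme_settings(0.9, 0.10, 1.1)

print(f"rcoulomb       = {new_rcoulomb}")
print(f"fourierspacing = {new_spacing}")
print(f"ratio          = {new_rcoulomb / new_spacing:.2f}")
```

With these inputs the pair becomes rcoulomb = 0.99 and fourierspacing = 0.11, and the ratio stays at 9. Which factor performs best is something to measure, for example with g_tune_pme.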

 However, about the performance of GPU versus CPU (mpi) let me
 try to explain it better:


                 NODE (s)     Real (s)        (%)
        Time:    6381.840    19210.349       33.2
                 (Mnbf/s)     (MFlops)   (ns/day)   (hour/ns)
 Performance:       0.000        0.001     27.077       0.886


                 NODE (s)     Real (s)        (%)
        Time:   12621.257    12621.257      100.0
                 (Mnbf/s)     (GFlops)   (ns/day)   (hour/ns)
 Performance:     388.633       28.773     13.691       1.753

Yes. Sorry, I didn't realize that NODE time and Real time are different. Did
you run the GPU calculation on a desktop machine which was also doing other
things at the time? This might explain it. As far as I know, for a dedicated
machine not running any other programs NODE and Real time should be the
same.

 Looking above we can see that gromacs prints in the output that
 the simulation is faster when the GPU is used. But this is not the
 reality. The truth is that the simulation with MPI was about 110 min faster
 than that with the GPU. Does this seem correct to you? As I said before, I was
 expecting that the GPU would take less time than the 6 core MPI.

Well, the exact time depends on a lot of factors, and you can probably speed
up both. But I would expect both to be about equally fast.

