
I reduced the number of GPUs to two, and mdrun reported:

Back Off! I just backed up nvt.log to ./#nvt.log.1#
Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)

NOTE: GPU(s) found, but the current simulation can not use GPUs
      To use a GPU, set the mdp option: cutoff-scheme = Verlet
      (for quick performance testing you can use the -testverlet option)

Using 2 MPI processes

4 GPUs detected on host CUDANodeA:
  #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible
  #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC:  no, stat: compatible

Making 1D domain decomposition 2 x 1 x 1

We have just committed the new CPU detection code in this branch,
and will commit new SSE/AVX kernels in a few days. However, this
means that currently only the NxN kernels are accelerated!
In the mean time, you might want to avoid production runs in 4.6.

When I ran it with a single GPU, it produced lots of pdb files with the prefix "step", and then it crashed with these messages:

Wrote pdb files with previous and current coordinates
Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which is larger than the 1-4 table size 2.200 nm
These are ignored for the rest of the simulation
This usually means your system is exploding,
if not, you should increase table-extension in your mdp file
or with user tables increase the table size
[CUDANodeA:20659] *** Process received signal ***
[CUDANodeA:20659] Signal: Segmentation fault (11)
[CUDANodeA:20659] Signal code: Address not mapped (1)
[CUDANodeA:20659] Failing at address: 0xc7aa00dc
[CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2ab25c76d2d0]
[CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) [0x2ab259e0720f]
[CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) [0x2ab259e08c94]
[CUDANodeA:20659] [ 3] /opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae]
[CUDANodeA:20659] [ 4] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) [0x2ab259ddd62f]
[CUDANodeA:20659] [ 5] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsGROUP+0x1495) [0x2ab259e72a45]
[CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3]
[CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639]
[CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db]
[CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ab25c999bfd]
[CUDANodeA:20659] [10] mdrun_mpi() [0x407f09]
[CUDANodeA:20659] *** End of error message ***

[1] Segmentation fault mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc
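
To put the 1-4 warning above in perspective, here is a minimal Python sketch that pulls the atom indices, the distance, and the table size out of that line and prints their ratio. The function name `parse_14_warning` and the regular expression are illustrative assumptions, not part of GROMACS; the pattern is only known to match the exact wording quoted in this message.

```python
import re

# Warning line copied from the mdrun output in this message.
WARNING = ("Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 "
           "which is larger than the 1-4 table size 2.200 nm")

# Hypothetical pattern written for this exact wording of the warning.
PATTERN = re.compile(
    r"1-4 interaction between (\d+) and (\d+) at distance ([\d.]+) "
    r"which is larger than the 1-4 table size ([\d.]+) nm")


def parse_14_warning(line):
    """Return (atom_i, atom_j, distance_nm, table_nm), or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    atom_i, atom_j = int(match.group(1)), int(match.group(2))
    distance, table = float(match.group(3)), float(match.group(4))
    return atom_i, atom_j, distance, table


atom_i, atom_j, distance, table = parse_14_warning(WARNING)
print(f"atoms {atom_i}-{atom_j}: {distance} nm, "
      f"{distance / table:.0f}x the 1-4 table size")
```

A pair distance that is orders of magnitude larger than the table size fits the "system is exploding" reading in the warning rather than a table that is merely too short.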

Here is the .mdp file I used:

title           = NVT equilibration for OR-POPC system
define = -DPOSRES -DPOSRES_LIG ; Protein is position restrained (uses the posres.itp file information)
; Parameters describing the details of the NVT simulation protocol
integrator = md ; Algorithm ("md" = molecular dynamics [leap-frog integrator]; "md-vv" = md using velocity verlet; sd = stochastic dynamics)
dt              = 0.002         ; Time-step (ps)
nsteps = 250000 ; Number of steps to run (0.002 * 250000 = 500 ps)

; Parameters controlling output writing
nstxout = 0 ; Write coordinates to output .trr file every 2 ps
nstvout = 0 ; Write velocities to output .trr file every 2 ps
nstfout         = 0

nstxtcout       = 1000
nstenergy = 1000 ; Write energies to output .edr file every 2 ps
nstlog          = 1000          ; Write output to .log file every 2 ps

; Parameters describing neighbors searching and details about interaction calculations
ns_type         = grid          ; Neighbor list search method (simple, grid)
nstlist = 50 ; Neighbor list update frequency (after every given number of steps)
rlist           = 1.2           ; Neighbor list search cut-off distance (nm)
rlistlong       = 1.4
rcoulomb = 1.2 ; Short-range Coulombic interactions cut-off distance (nm)
rvdw = 1.2 ; Short-range van der Waals cutoff distance (nm)
pbc = xyz ; Direction in which to use Periodic Boundary Conditions (xyz, xy, no)
cutoff-scheme   =Verlet  ; GPU running

; Parameters for treating bonded interactions
continuation = no ; Whether a fresh start or a continuation from a previous run (yes/no)
constraint_algorithm = LINCS    ; Constraint algorithm (LINCS / SHAKE)
constraints = all-bonds ; Which bonds/angles to constrain (all-bonds / hbonds / none / all-angles / h-angles)
lincs_iter = 1 ; Number of iterations to correct for rotational lengthening in LINCS (related to accuracy)
lincs_order = 4 ; Highest order in the expansion of the constraint coupling matrix (related to accuracy)

; Parameters for treating electrostatic interactions
coulombtype = PME ; Long range electrostatic interactions treatment (cut-off, Ewald, PME)
pme_order = 4 ; Interpolation order for PME (cubic interpolation is represented by 4)
fourierspacing = 0.12 ; Maximum grid spacing for FFT grid using PME (nm)

; Temperature coupling parameters
tcoupl = V-rescale ; Modified Berendsen thermostat using velocity rescaling
tc-grps = Protein_LIG POPC Water_and_ions ; Define groups to be coupled separately to temperature bath
tau_t = 0.1 0.1 0.1 ; Group-wise coupling time constant (ps)
ref_t = 303 303 303 ; Group-wise reference temperature (K)

; Pressure coupling parameters
pcoupl = no ; Under NVT conditions pressure coupling is not done

; Miscellaneous control parameters
; Dispersion correction
DispCorr = EnerPres ; Dispersion corrections for Energy and Pressure for vdW cut-off
; Initial Velocity Generation
gen_vel = yes ; Generate velocities from Maxwell distribution at given temperature
gen_temp = 303 ; Specific temperature for Maxwell distribution (K)
gen_seed = -1 ; Use random seed for velocity generation (integer; -1 means seed is calculated from the process ID number)
; Centre of mass (COM) motion removal relative to the specified groups
nstcomm         = 1                     ; COM removal frequency (steps)
comm_mode = Linear ; Remove COM translation (linear / angular / no)
comm_grps = Protein_LIG_POPC Water_and_ions ; COM removal relative to the specified groups
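
As a quick sanity check on the timing options in the file above, here is a minimal Python sketch that reads a few of them and converts step counts into picoseconds. The helper `parse_mdp` is a hypothetical illustration, not a GROMACS tool; it assumes only that `;` starts a comment and that options are written as `key = value`, and it folds case and `-`/`_` in option names purely so the lookups below are uniform.

```python
def parse_mdp(text):
    """Parse 'key = value' lines from .mdp text, ignoring ';' comments."""
    options = {}
    for line in text.splitlines():
        line = line.split(";", 1)[0].strip()  # drop everything after ';'
        if "=" not in line:
            continue  # blank line or comment-only line
        key, value = line.split("=", 1)
        # Normalise the name for lookup only (assumption of this sketch).
        options[key.strip().lower().replace("-", "_")] = value.strip()
    return options


# Excerpt of the options posted above.
mdp_text = """
dt              = 0.002         ; Time-step (ps)
nsteps = 250000 ; Number of steps to run
nstxtcout       = 1000
nstenergy = 1000
nstlog          = 1000
nstlist = 50
cutoff-scheme   =Verlet  ; GPU running
"""

opts = parse_mdp(mdp_text)
dt = float(opts["dt"])
total_ps = dt * int(opts["nsteps"])
xtc_interval_ps = dt * int(opts["nstxtcout"])

print(f"total simulated time: {total_ps:g} ps")
print(f"xtc frame written every {xtc_interval_ps:g} ps")
print(f"cutoff scheme: {opts.get('cutoff_scheme', 'not set')}")
```

With these settings the first .xtc frame after step 0 is only due at step 1000, so running with a small `-stepout` value is a more direct way to see whether the run advances at all.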


On 12/17/2012 05:45 PM, Szilárd Páll wrote:

Unfortunately, that doesn't tell us exactly why mdrun is stuck. Can
you reproduce the issue on other machines or with different launch
configurations? At which step does it get stuck (-stepout 1 can help)?

Please try the following:
- try running on a single GPU;
- try running on CPUs only (-nb cpu, and with -ntomp 12 to match the GPU
setup more closely);
- try running in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var
set (again with -ntomp 12 to match the GPU setup more closely);
- provide a backtrace (using gdb).
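
The test runs listed above can be written out as concrete command lines. The following Python sketch only builds and prints them; it does not launch anything. The base command is the one from the crash report in this thread, the helper name `diagnostic_runs` is hypothetical, and `gdb --args` is one common way to start a program under the debugger rather than something prescribed here. The single-GPU run is left out because how ranks are mapped to GPUs depends on the launch setup, which the thread does not show.

```python
import shlex

# Base command copied from the crash report in this thread.
BASE = ["mdrun_mpi", "-v", "-s", "nvt.tpr", "-c", "nvt.gro",
        "-g", "nvt.log", "-x", "nvt.xtc"]


def diagnostic_runs(base=BASE, ntomp=12):
    """Return (label, extra_env, argv) tuples, one per suggested test run."""
    step = base + ["-stepout", "1"]  # report progress at every step
    return [
        ("report every step", {}, step),
        ("CPU only", {}, step + ["-nb", "cpu", "-ntomp", str(ntomp)]),
        ("GPU emulation", {"GMX_EMULATE_GPU": "1"},
         step + ["-ntomp", str(ntomp)]),
        ("backtrace under gdb", {}, ["gdb", "--args"] + step),
    ]


for label, env, argv in diagnostic_runs():
    prefix = " ".join(f"{key}={value}" for key, value in env.items())
    print(f"# {label}")
    print((prefix + " " if prefix else "") + shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch safe to try on a machine without GROMACS; each printed line can be pasted into a shell on the workstation.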



On Mon, Dec 17, 2012 at 5:37 PM, Albert <mailmd2...@gmail.com> wrote:


I am running a GMX-4.6 beta2 GPU job on a 24-CPU-core workstation with two
GTX590 cards. It got stuck without producing any output, i.e. the .xtc file
size is still 0 after hours of running. Here is the md.log file I found:

Using CUDA 8x8x8 non-bonded kernels

Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05
Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size:

Removing pbc first time
Pinning to Hyper-Threading cores with 12 physical cores in a compute node
There are 1 flexible constraints

WARNING: step size for flexible constraining = 0
          All flexible constraints will be rigid.
          Will try to keep all flexible constraints at their original
          but the lengths may exhibit some drift.

Initializing Parallel LINear Constraint Solver
Linking all bonded interactions to atoms
There are 161872 inter charge-group exclusions,
will use an extra communication step for exclusion forces for PME

The initial number of communication pulses is: X 1
The initial domain decomposition cell size is: X 1.83 nm

The maximum allowed distance for charge groups involved in interactions is:
                  non-bonded interactions           1.200 nm
(the following are initial values, they could change due to box
             two-body bonded interactions  (-rdd)   1.200 nm
           multi-body bonded interactions  (-rdd)   1.200 nm
   atoms separated by up to 5 constraints  (-rcon)  1.826 nm

When dynamic load balancing gets turned on, these settings will change to:
The maximum number of communication pulses is: X 1
The minimum size for domain decomposition cells is 1.200 nm
The requested allowed shrink of DD cells (option -dds) is: 0.80
The allowed shrink of domain decomposition cells is: X 0.66
The maximum allowed distance for charge groups involved in interactions is:
                  non-bonded interactions           1.200 nm
             two-body bonded interactions  (-rdd)   1.200 nm
           multi-body bonded interactions  (-rdd)   1.200 nm
   atoms separated by up to 5 constraints  (-rcon)  1.200 nm

Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
   0:  Protein_LIG_POPC
   1:  Water_and_ions

G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------
