Hi,

How about GPU emulation or CPU-only runs? Also, please try setting the number of threads to 1 (-ntomp 1).
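For example, reusing the file names from your command line below (just placeholders, adjust to your setup), the tests could look something like:

  # CPU-only run with a single OpenMP thread
  mdrun_mpi -nb cpu -ntomp 1 -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc

  # GPU emulation mode (non-bonded kernels emulated on the CPU)
  GMX_EMULATE_GPU=1 mdrun_mpi -ntomp 1 -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc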
--
Szilárd

On Mon, Dec 17, 2012 at 6:01 PM, Albert <mailmd2...@gmail.com> wrote:

> hello:
>
> I reduced the number of GPUs to two, and it said:
>
> Back Off! I just backed up nvt.log to ./#nvt.log.1#
> Reading file nvt.tpr, VERSION 4.6-dev-20121004-5d6c49d (single precision)
>
> NOTE: GPU(s) found, but the current simulation can not use GPUs
>       To use a GPU, set the mdp option: cutoff-scheme = Verlet
>       (for quick performance testing you can use the -testverlet option)
>
> Using 2 MPI processes
>
> 4 GPUs detected on host CUDANodeA:
>   #0: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #1: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #2: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>   #3: NVIDIA GeForce GTX 590, compute cap.: 2.0, ECC: no, stat: compatible
>
> Making 1D domain decomposition 2 x 1 x 1
>
> * WARNING * WARNING * WARNING * WARNING * WARNING * WARNING *
> We have just committed the new CPU detection code in this branch,
> and will commit new SSE/AVX kernels in a few days. However, this
> means that currently only the NxN kernels are accelerated!
> In the mean time, you might want to avoid production runs in 4.6.
>
> When I run it with a single GPU, it produces lots of pdb files with the
> prefix "step", and then it crashes with these messages:
>
> Wrote pdb files with previous and current coordinates
> Warning: 1-4 interaction between 4674 and 4706 at distance 434.986 which
> is larger than the 1-4 table size 2.200 nm
> These are ignored for the rest of the simulation
> This usually means your system is exploding,
> if not, you should increase table-extension in your mdp file
> or with user tables increase the table size
> [CUDANodeA:20659] *** Process received signal ***
> [CUDANodeA:20659] Signal: Segmentation fault (11)
> [CUDANodeA:20659] Signal code: Address not mapped (1)
> [CUDANodeA:20659] Failing at address: 0xc7aa00dc
> [CUDANodeA:20659] [ 0] /lib64/libpthread.so.0(+0xf2d0) [0x2ab25c76d2d0]
> [CUDANodeA:20659] [ 1] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x11020f) [0x2ab259e0720f]
> [CUDANodeA:20659] [ 2] /opt/gromacs-4.6/lib/libmd_mpi.so.6(+0x111c94) [0x2ab259e08c94]
> [CUDANodeA:20659] [ 3] /opt/gromacs-4.6/lib/libmd_mpi.so.6(gmx_pme_do+0x1d2e) [0x2ab259e0cbae]
> [CUDANodeA:20659] [ 4] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_lowlevel+0x1eef) [0x2ab259ddd62f]
> [CUDANodeA:20659] [ 5] /opt/gromacs-4.6/lib/libmd_mpi.so.6(do_force_cutsGROUP+0x1495) [0x2ab259e72a45]
> [CUDANodeA:20659] [ 6] mdrun_mpi(do_md+0x8133) [0x4334c3]
> [CUDANodeA:20659] [ 7] mdrun_mpi(mdrunner+0x19e9) [0x411639]
> [CUDANodeA:20659] [ 8] mdrun_mpi(main+0x17db) [0x4373db]
> [CUDANodeA:20659] [ 9] /lib64/libc.so.6(__libc_start_main+0xfd) [0x2ab25c999bfd]
> [CUDANodeA:20659] [10] mdrun_mpi() [0x407f09]
> [CUDANodeA:20659] *** End of error message ***
>
> [1] Segmentation fault   mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc
>
> Here is the .mdp file I used:
>
> title       = NVT equilibration for OR-POPC system
> define      = -DPOSRES -DPOSRES_LIG ; Protein and ligand are position restrained (uses the posres.itp file information)
>
> ; Parameters describing the details of the NVT simulation protocol
> integrator  = md     ; Algorithm ("md" = molecular dynamics [leap-frog integrator]; "md-vv" = md using velocity Verlet; "sd" = stochastic dynamics)
> dt          = 0.002  ; Time step (ps)
> nsteps      = 250000 ; Number of steps to run (0.002 * 250000 = 500 ps)
>
> ; Parameters controlling output writing
> nstxout     = 0      ; Write coordinates to the output .trr file (0 = do not write)
> nstvout     = 0      ; Write velocities to the output .trr file (0 = do not write)
> nstfout     = 0      ; Write forces to the output .trr file (0 = do not write)
> nstxtcout   = 1000   ; Write coordinates to the compressed .xtc file every 2 ps
> nstenergy   = 1000   ; Write energies to the output .edr file every 2 ps
> nstlog      = 1000   ; Write output to the .log file every 2 ps
>
> ; Parameters describing neighbor searching and details of interaction calculations
> ns_type     = grid   ; Neighbor list search method (simple, grid)
> nstlist     = 50     ; Neighbor list update frequency (steps)
> rlist       = 1.2    ; Neighbor list search cut-off distance (nm)
> rlistlong   = 1.4    ; Long-range neighbor list cut-off distance (nm)
> rcoulomb    = 1.2    ; Short-range Coulomb interaction cut-off distance (nm)
> rvdw        = 1.2    ; Short-range van der Waals cut-off distance (nm)
> pbc         = xyz    ; Directions in which to use periodic boundary conditions (xyz, xy, no)
> cutoff-scheme = Verlet ; Required for running on GPUs
>
> ; Parameters for treating bonded interactions
> continuation = no    ; Whether this is a fresh start or a continuation of a previous run (yes/no)
> constraint_algorithm = LINCS ; Constraint algorithm (LINCS / SHAKE)
> constraints = all-bonds ; Which bonds/angles to constrain (all-bonds / h-bonds / none / all-angles / h-angles)
> lincs_iter  = 1      ; Number of iterations to correct for rotational lengthening in LINCS (related to accuracy)
> lincs_order = 4      ; Highest order in the expansion of the constraint coupling matrix (related to accuracy)
>
> ; Parameters for treating electrostatic interactions
> coulombtype = PME    ; Treatment of long-range electrostatics (cut-off, Ewald, PME)
> pme_order   = 4      ; Interpolation order for PME (4 = cubic interpolation)
> fourierspacing = 0.12 ; Maximum grid spacing for the PME FFT grid (nm)
>
> ; Temperature coupling parameters
> tcoupl      = V-rescale ; Modified Berendsen thermostat using velocity rescaling
> tc-grps     = Protein_LIG POPC Water_and_ions ; Groups coupled separately to the temperature bath
> tau_t       = 0.1 0.1 0.1 ; Group-wise coupling time constants (ps)
> ref_t       = 303 303 303 ; Group-wise reference temperatures (K)
>
> ; Pressure coupling parameters
> pcoupl      = no     ; No pressure coupling under NVT conditions
>
> ; Miscellaneous control parameters
> ; Dispersion correction
> DispCorr    = EnerPres ; Dispersion corrections to energy and pressure for the vdW cut-off
> ; Initial velocity generation
> gen_vel     = yes    ; Generate velocities from a Maxwell distribution at the given temperature
> gen_temp    = 303    ; Temperature for the Maxwell distribution (K)
> gen_seed    = -1     ; Random seed for velocity generation (-1 means the seed is calculated from the process ID)
> ; Centre-of-mass (COM) motion removal relative to the specified groups
> nstcomm     = 1      ; COM removal frequency (steps)
> comm_mode   = Linear ; Remove COM translation (linear / angular / no)
> comm_grps   = Protein_LIG_POPC Water_and_ions ; Groups for COM removal
>
> THX
>
> On 12/17/2012 05:45 PM, Szilárd Páll wrote:
>
>> Hi,
>>
>> That unfortunately doesn't tell us exactly why mdrun is stuck. Can you
>> reproduce the issue on other machines or with different launch
>> configurations? At which step does it get stuck (-stepout 1 can help)?
>>
>> Please try the following:
>> - run on a single GPU;
>> - run on CPUs only (-nb cpu and, to match the GPU setup more closely, -ntomp 12);
>> - run in GPU emulation mode with the GMX_EMULATE_GPU=1 env. var set
>>   (and, to match the GPU setup more closely, -ntomp 12);
>> - provide a backtrace (using gdb).
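>>
>> For the backtrace, something along these lines should work for a
>> single-process run (file names as in your mdrun command, adjust as needed):
>>
>>   gdb --args mdrun_mpi -v -s nvt.tpr -c nvt.gro -g nvt.log -x nvt.xtc
>>   (gdb) run
>>   (interrupt with Ctrl-C once it hangs)
>>   (gdb) bt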
>>
>> Cheers,
>>
>> --
>> Szilárd
>>
>> On Mon, Dec 17, 2012 at 5:37 PM, Albert <mailmd2...@gmail.com> wrote:
>>
>>> hello:
>>>
>>> I am running a GMX-4.6 beta2 GPU job on a 24-CPU-core workstation with
>>> two GTX 590s. It got stuck there without any output, i.e. the .xtc file
>>> size is always 0 after hours of running. Here is what I found in the
>>> md.log file:
>>>
>>> Using CUDA 8x8x8 non-bonded kernels
>>>
>>> Potential shift: LJ r^-12: 0.112 r^-6 0.335, Ewald 1.000e-05
>>> Initialized non-bonded Ewald correction tables, spacing: 7.82e-04 size: 1536
>>>
>>> Removing pbc first time
>>> Pinning to Hyper-Threading cores with 12 physical cores in a compute node
>>> There are 1 flexible constraints
>>>
>>> WARNING: step size for flexible constraining = 0
>>>          All flexible constraints will be rigid.
>>>          Will try to keep all flexible constraints at their original
>>>          length, but the lengths may exhibit some drift.
>>>
>>> Initializing Parallel LINear Constraint Solver
>>> Linking all bonded interactions to atoms
>>> There are 161872 inter charge-group exclusions,
>>> will use an extra communication step for exclusion forces for PME
>>>
>>> The initial number of communication pulses is: X 1
>>> The initial domain decomposition cell size is: X 1.83 nm
>>>
>>> The maximum allowed distance for charge groups involved in interactions is:
>>>                  non-bonded interactions           1.200 nm
>>> (the following are initial values, they could change due to box deformation)
>>>             two-body bonded interactions  (-rdd)   1.200 nm
>>>           multi-body bonded interactions  (-rdd)   1.200 nm
>>>   atoms separated by up to 5 constraints  (-rcon)  1.826 nm
>>>
>>> When dynamic load balancing gets turned on, these settings will change to:
>>> The maximum number of communication pulses is: X 1
>>> The minimum size for domain decomposition cells is 1.200 nm
>>> The requested allowed shrink of DD cells (option -dds) is: 0.80
>>> The allowed shrink of domain decomposition cells is: X 0.66
>>> The maximum allowed distance for charge groups involved in interactions is:
>>>                  non-bonded interactions           1.200 nm
>>>             two-body bonded interactions  (-rdd)   1.200 nm
>>>           multi-body bonded interactions  (-rdd)   1.200 nm
>>>   atoms separated by up to 5 constraints  (-rcon)  1.200 nm
>>>
>>> Making 1D domain decomposition grid 4 x 1 x 1, home cell index 0 0 0
>>>
>>> Center of mass motion removal mode is Linear
>>> We have the following groups for center of mass motion removal:
>>>   0: Protein_LIG_POPC
>>>   1: Water_and_ions
>>>
>>> ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
>>> G. Bussi, D. Donadio and M. Parrinello
>>> Canonical sampling through velocity rescaling
>>> J. Chem. Phys. 126 (2007) pp. 014101
>>> -------- -------- --- Thank You --- -------- --------
>>>
>>> THX
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists