Re: [gmx-users] MPICH or LAM/MPI

Arneh Babakhani Tue, 27 Jun 2006 09:08:03 -0700

Hi Carsten, thanks for the reply. Good question.

I can run it fine on as many as 4 processors, but nothing beyond that. Any idea why?


Arneh

Carsten Kutzner wrote:

Hi Arneh,

Do you have the same problem on fewer processors? Can you run on 1, 2 and 4 procs?
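Carsten's suggestion can be scripted as a processor-count scan. The following is a minimal sketch, not a tested GROMACS recipe: for each count it rebuilds the run input with grompp (GROMACS 3.3 expects the -np given to mdrun to match the one used for grompp) and launches a short job with the same mpirun and machine file as the original script. The input names short.mdp, conf.gro and topol.top are hypothetical placeholders (short.mdp would be a copy of the production .mdp with a small nsteps), the /tmp fallback for TMPDIR is an assumption, and by default the script only prints the commands; set RUN=1 to execute them.

```shell
#!/bin/bash
# Processor-count scan (sketch). Hypothetical inputs: short.mdp, conf.gro,
# topol.top. Prints the commands by default; set RUN=1 to execute them.

MPIRUN=/opt/mpich/intel/bin/mpirun
MDRUN=$HOME/gromacs-mpi/bin/mdrun
MACHINES=${TMPDIR:-/tmp}/machines   # /tmp fallback is an assumption

# Print one command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Prepare and run a short job on $1 processors.
# grompp -np and mdrun -np must agree in GROMACS 3.3.
scale_step() {
    local n=$1
    run grompp -np "$n" -f short.mdp -c conf.gro -p topol.top -o "scale$n.tpr"
    run "$MPIRUN" -np "$n" -machinefile "$MACHINES" \
        "$MDRUN" -np "$n" -s "scale$n.tpr" -deffnm "scale$n"
}

# Override the list with e.g. NPROCS="4 8".
for n in ${NPROCS:-1 2 4 8 16 32}; do
    scale_step "$n"
done
```

Running it once without RUN=1 lets you check the generated command lines before spending queue time; the first count at which mdrun hangs narrows down where to look.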

Carsten


Arneh Babakhani wrote:

Hi All,

OK, I've successfully built the MPI version of mdrun and am now trying to
run my simulation on 32 processors. After processing with grompp using the
option -np 32, I run mdrun with the following script (where CONF is the
input file name and NPROC is the number of processors):


/opt/mpich/intel/bin/mpirun -v -np $NPROC -machinefile $TMPDIR/machines \
    ~/gromacs-mpi/bin/mdrun -np $NPROC -s $CONF -o $CONF -c After$CONF \
    -e $CONF -g $CONF >& $CONF.job
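Since the job later dies with a p4 "Timeout in establishing connection to remote process" error, one thing worth ruling out before calling mpirun is whether every node in the machine file accepts a non-interactive remote shell, which MPICH-1's ch_p4 device uses (rsh or ssh, depending on how MPICH was configured) to start remote processes. This is a sketch under stated assumptions: ssh is the remote shell, the machine file has one host per line (optionally in host:N form), and REMOTE_SH is a hypothetical override variable for this sketch, not an MPICH setting. Because p4 processes also connect node to node, running it from inside the job (on the first compute node) is closer to what mpirun does than running it from the front end.

```shell
#!/bin/bash
# Sketch: check that each host in an MPICH machine file accepts a
# non-interactive remote shell. Assumes ssh; REMOTE_SH is a hypothetical
# override for this sketch only.

REMOTE_SH=${REMOTE_SH:-ssh}

# Print each host once, dropping blank lines, comments and ":N" suffixes.
machine_hosts() {
    awk '!/^[ \t]*(#|$)/ { sub(/:.*/, "", $1); if (!seen[$1]++) print $1 }' "$1"
}

# Try to run "true" on every host; return non-zero if any host fails.
check_hosts() {
    local host failed=0
    for host in $(machine_hosts "$1"); do
        if "$REMOTE_SH" -o BatchMode=yes -o ConnectTimeout=10 "$host" true </dev/null; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return "$failed"
}
```

In the job script this would go just before mpirun, e.g. `check_hosts "$TMPDIR/machines" || echo "some nodes refused a remote shell"`.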


Everything seems to start up OK, but then GMX stalls: it never actually
starts the simulation. It stalls for about 7 minutes and then aborts
completely. I've pasted the log file below, which shows that the
simulation stalls at step 0, but there's no discernible error (the only
message is that AMD 3DNow support is not present, which makes sense
because I'm not running on AMD).

If you scroll further down, I've also pasted the job file, FullMD7.job,
which is normally empty if everything is running smoothly. There seem
to be some errors at the end, but they're rather cryptic to me, and I'm
not sure whether they are a cause or an effect. If anyone has any
suggestions, I'd love to hear them.
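One way to make the errors at the end of the job file less cryptic is to tie each p4 error line to the host its rank ran on, using the "NNODES=.., MYRANK=.., HOSTNAME=.." lines near the top of the same file. A minimal sketch, assuming the number in a "pN_PID:" prefix is the MPI rank N (this matches the pattern in the output below but is an assumption about MPICH-1's p4 messages, not something the file states):

```shell
#!/bin/bash
# Sketch: print rank and host for each p4 error line in an mdrun job file.
# Assumption: "pN_PID:" at the start of a line means MPI rank N.

p4_errors_by_host() {
    awk '
        /MYRANK=/ {
            match($0, /MYRANK=[0-9]+/);   rank = substr($0, RSTART + 7, RLENGTH - 7)
            match($0, /HOSTNAME=[^ ,]+/); host[rank] = substr($0, RSTART + 9, RLENGTH - 9)
        }
        /^p[0-9]+_[0-9]+:/ && /p4_error|net_send/ {
            split($1, f, "_"); rank = substr(f[1], 2)
            print "rank " rank " on " (rank in host ? host[rank] : "?") ": " $0
        }' "$1"
}
```

Run as `p4_errors_by_host FullMD7.job`. On the output pasted below it would attribute the timeout and the net_send failures to ranks 30 and 31, which both ran on compute-0-29.local, and the final EOF to rank 0 on compute-0-1.local. That may point at that node or its network link, although the later errors could equally be a consequence of the first one.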

Thanks,

Arneh


*****FullMD70.log******

Log file opened on Mon Jun 26 21:51:55 2006
Host: compute-0-1.local  pid: 13353  nodeid: 0  nnodes:  32
The Gromacs distribution was built Wed Jun 21 16:01:01 PDT 2006 by
[EMAIL PROTECTED] (Linux 2.6.9-22.ELsmp i686)


                        :-)  G  R  O  M  A  C  S  (-:

                  Groningen Machine for Chemical Simulation

                           :-)  VERSION 3.3.1  (-:

Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.

      Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2006, The GROMACS development team,
           check out http://www.gromacs.org for more information.

        This program is free software; you can redistribute it and/or
         modify it under the terms of the GNU General Public License
        as published by the Free Software Foundation; either version 2
            of the License, or (at your option) any later version.

                :-)  /home/ababakha/gromacs-mpi/bin/mdrun  (-:


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

CPU=  0, lastcg=  515, targetcg= 5799, myshift=   14
CPU=  1, lastcg= 1055, targetcg= 6339, myshift=   15
CPU=  2, lastcg= 1595, targetcg= 6879, myshift=   16
CPU=  3, lastcg= 2135, targetcg= 7419, myshift=   17
CPU=  4, lastcg= 2675, targetcg= 7959, myshift=   18
CPU=  5, lastcg= 3215, targetcg= 8499, myshift=   19
CPU=  6, lastcg= 3755, targetcg= 9039, myshift=   20
CPU=  7, lastcg= 4112, targetcg= 9396, myshift=   20
CPU=  8, lastcg= 4381, targetcg= 9665, myshift=   20
CPU=  9, lastcg= 4650, targetcg= 9934, myshift=   20
CPU= 10, lastcg= 4919, targetcg=10203, myshift=   20
CPU= 11, lastcg= 5188, targetcg=10472, myshift=   20
CPU= 12, lastcg= 5457, targetcg=  174, myshift=   20
CPU= 13, lastcg= 5726, targetcg=  443, myshift=   19
CPU= 14, lastcg= 5995, targetcg=  712, myshift=   19
CPU= 15, lastcg= 6264, targetcg=  981, myshift=   18
CPU= 16, lastcg= 6533, targetcg= 1250, myshift=   18
CPU= 17, lastcg= 6802, targetcg= 1519, myshift=   17
CPU= 18, lastcg= 7071, targetcg= 1788, myshift=   17
CPU= 19, lastcg= 7340, targetcg= 2057, myshift=   16
CPU= 20, lastcg= 7609, targetcg= 2326, myshift=   16
CPU= 21, lastcg= 7878, targetcg= 2595, myshift=   15
CPU= 22, lastcg= 8147, targetcg= 2864, myshift=   15
CPU= 23, lastcg= 8416, targetcg= 3133, myshift=   14
CPU= 24, lastcg= 8685, targetcg= 3402, myshift=   14
CPU= 25, lastcg= 8954, targetcg= 3671, myshift=   13
CPU= 26, lastcg= 9223, targetcg= 3940, myshift=   13
CPU= 27, lastcg= 9492, targetcg= 4209, myshift=   13
CPU= 28, lastcg= 9761, targetcg= 4478, myshift=   13
CPU= 29, lastcg=10029, targetcg= 4746, myshift=   13
CPU= 30, lastcg=10298, targetcg= 5015, myshift=   13
CPU= 31, lastcg=10566, targetcg= 5283, myshift=   13
nsb->shift =  20, nsb->bshift=  0
Listing Scalars
nsb->nodeid:         0
nsb->nnodes:     32
nsb->cgtotal: 10567
nsb->natoms:  25925
nsb->shift:      20
nsb->bshift:      0
Nodeid   index  homenr  cgload  workload
    0       0     788     516       516
    1     788     828    1056      1056
    2    1616     828    1596      1596
    3    2444     828    2136      2136
    4    3272     828    2676      2676
    5    4100     828    3216      3216
    6    4928     828    3756      3756
    7    5756     807    4113      4113
    8    6563     807    4382      4382
    9    7370     807    4651      4651
   10    8177     807    4920      4920
   11    8984     807    5189      5189
   12    9791     807    5458      5458
   13   10598     807    5727      5727
   14   11405     807    5996      5996
   15   12212     807    6265      6265
   16   13019     807    6534      6534
   17   13826     807    6803      6803
   18   14633     807    7072      7072
   19   15440     807    7341      7341
   20   16247     807    7610      7610
   21   17054     807    7879      7879
   22   17861     807    8148      8148
   23   18668     807    8417      8417
   24   19475     807    8686      8686
   25   20282     807    8955      8955
   26   21089     807    9224      9224
   27   21896     807    9493      9493
   28   22703     807    9762      9762
   29   23510     804   10030     10030
   30   24314     807   10299     10299
   31   25121     804   10567     10567

parameters of the run:
  integrator           = md
  nsteps               = 1500000
  init_step            = 0
  ns_type              = Grid
  nstlist              = 10
  ndelta               = 2
  bDomDecomp           = FALSE
  decomp_dir           = 0
  nstcomm              = 1
  comm_mode            = Linear
  nstcheckpoint        = 1000
  nstlog               = 10
  nstxout              = 500
  nstvout              = 1000
  nstfout              = 0
  nstenergy            = 10
  nstxtcout            = 0
  init_t               = 0
  delta_t              = 0.002
  xtcprec              = 1000
  nkx                  = 64
  nky                  = 64
  nkz                  = 80
  pme_order            = 6
  ewald_rtol           = 1e-05
  ewald_geometry       = 0
  epsilon_surface      = 0
  optimize_fft         = TRUE
  ePBC                 = xyz
  bUncStart            = FALSE
  bShakeSOR            = FALSE
  etc                  = Berendsen
  epc                  = Berendsen
  epctype              = Semiisotropic
  tau_p                = 1
  ref_p (3x3):
     ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
     ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
     ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
  compress (3x3):
     compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
     compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
     compress[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e-30}
  andersen_seed        = 815131
  rlist                = 0.9
  coulombtype          = PME
  rcoulomb_switch      = 0
  rcoulomb             = 0.9
  vdwtype              = Cut-off
  rvdw_switch          = 0
  rvdw                 = 1.4
  epsilon_r            = 1
  epsilon_rf           = 1
  tabext               = 1
  gb_algorithm         = Still
  nstgbradii           = 1
  rgbradii             = 2
  gb_saltconc          = 0
  implicit_solvent     = No
  DispCorr             = No
  fudgeQQ              = 1
  free_energy          = no
  init_lambda          = 0
  sc_alpha             = 0
  sc_power             = 0
  sc_sigma             = 0.3
  delta_lambda         = 0
  disre_weighting      = Conservative
  disre_mixed          = FALSE
  dr_fc                = 1000
  dr_tau               = 0
  nstdisreout          = 100
  orires_fc            = 0
  orires_tau           = 0
  nstorireout          = 100
  dihre-fc             = 1000
  dihre-tau            = 0
  nstdihreout          = 100
  em_stepsize          = 0.01
  em_tol               = 10
  niter                = 20
  fc_stepsize          = 0
  nstcgsteep           = 1000
  nbfgscorr            = 10
  ConstAlg             = Lincs
  shake_tol            = 1e-04
  lincs_order          = 4
  lincs_warnangle      = 30
  lincs_iter           = 1
  bd_fric              = 0
  ld_seed              = 1993
  cos_accel            = 0
  deform (3x3):
     deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
     deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
     deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
  userint1             = 0
  userint2             = 0
  userint3             = 0
  userint4             = 0
  userreal1            = 0
  userreal2            = 0
  userreal3            = 0
  userreal4            = 0
grpopts:
  nrdf:         11903.3     39783.7     285.983
  ref_t:             310         310         310
  tau_t:             0.1         0.1         0.1
anneal:                  No          No          No
ann_npoints:               0           0           0
  acc:               0           0           0
  nfreeze:           N           N           N
  energygrp_flags[  0]: 0
  efield-x:
     n = 0
  efield-xt:
     n = 0
  efield-y:
     n = 0
  efield-yt:
     n = 0
  efield-z:
     n = 0
  efield-zt:
     n = 0
  bQMMM                = FALSE
  QMconstraints        = 0
  QMMMscheme           = 0
  scalefactor          = 1
qm_opts:
  ngQM                 = 0
Max number of graph edges per atom is 4
Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Using a Gaussian width (1/beta) of 0.288146 nm for Ewald
Cut-off's:   NS: 0.9   Coulomb: 0.9   LJ: 1.4
System total charge: 0.000
Generated table with 1200 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1200 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling SPC water optimization for 6631 molecules.

Will do PME sum in reciprocal space.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essman, L. Perela, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Parallelized PME sum used.
PARALLEL FFT DATA:
  local_nx:                   2  local_x_start:                   0
  local_ny_after_transpose:   2  local_y_start_after_transpose    0
Removing pbc first time
Done rmpbc
Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
 0:  rest, initial mass: 207860
There are: 788 Atoms

Constraining the starting coordinates (step -2)

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, J. P. M. Postma, A. DiNola and J. R. Haak
Molecular dynamics with coupling to an external bath
J. Chem. Phys. 81 (1984) pp. 3684-3690
-------- -------- --- Thank You --- -------- --------


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------


Initializing LINear Constraint Solver
 number of constraints is 776
 average number of constraints coupled to one constraint is 2.5

  Rel. Constraint Deviation:  Max    between atoms     RMS
      Before LINCS         0.008664     87     88   0.003001
       After LINCS         0.000036     95     96   0.000005


Constraining the coordinates at t0-dt (step -1)
  Rel. Constraint Deviation:  Max    between atoms     RMS
      Before LINCS         0.093829     12     13   0.009919
       After LINCS         0.000131     11     14   0.000021

Started mdrun on node 0 Mon Jun 26 21:52:34 2006
Initial temperature: 310.388 K
          Step           Time         Lambda
             0        0.00000        0.00000

Grid: 8 x 8 x 13 cells
Configuring nonbonded kernels...
Testing AMD 3DNow support... not present.
Testing ia32 SSE support... present.






********FullMD7.job***************

*running /home/ababakha/gromacs-mpi/bin/mdrun on 32 LINUX ch_p4 processors

Created /home/ababakha/SMDPeptideSimulation/CapParSMD/FullMD/PI12637
NNODES=32, MYRANK=0, HOSTNAME=compute-0-1.local
NNODES=32, MYRANK=1, HOSTNAME=compute-0-1.local
NNODES=32, MYRANK=30, HOSTNAME=compute-0-29.local
NNODES=32, MYRANK=24, HOSTNAME=compute-0-12.local
NNODES=32, MYRANK=28, HOSTNAME=compute-0-30.local
NNODES=32, MYRANK=3, HOSTNAME=compute-0-26.local
NNODES=32, MYRANK=14, HOSTNAME=compute-0-22.local
NNODES=32, MYRANK=6, HOSTNAME=compute-0-31.local
NNODES=32, MYRANK=8, HOSTNAME=compute-0-20.local
NNODES=32, MYRANK=7, HOSTNAME=compute-0-31.local
NNODES=32, MYRANK=18, HOSTNAME=compute-0-27.local
NNODES=32, MYRANK=2, HOSTNAME=compute-0-26.local
NNODES=32, MYRANK=23, HOSTNAME=compute-0-4.local
NNODES=32, MYRANK=31, HOSTNAME=compute-0-29.local
NNODES=32, MYRANK=5, HOSTNAME=compute-0-21.local
NNODES=32, MYRANK=27, HOSTNAME=compute-0-3.local
NNODES=32, MYRANK=4, HOSTNAME=compute-0-21.local
NNODES=32, MYRANK=20, HOSTNAME=compute-0-8.local
NNODES=32, MYRANK=11, HOSTNAME=compute-0-7.local
NNODES=32, MYRANK=9, HOSTNAME=compute-0-20.local
NNODES=32, MYRANK=12, HOSTNAME=compute-0-19.local
NNODES=32, MYRANK=13, HOSTNAME=compute-0-19.local
NNODES=32, MYRANK=21, HOSTNAME=compute-0-8.local
NNODES=32, MYRANK=22, HOSTNAME=compute-0-4.local
NNODES=32, MYRANK=10, HOSTNAME=compute-0-7.local
NNODES=32, MYRANK=17, HOSTNAME=compute-0-25.local
NNODES=32, MYRANK=25, HOSTNAME=compute-0-12.local
NNODES=32, MYRANK=15, HOSTNAME=compute-0-22.local
NNODES=32, MYRANK=29, HOSTNAME=compute-0-30.local
NNODES=32, MYRANK=19, HOSTNAME=compute-0-27.local
NNODES=32, MYRANK=26, HOSTNAME=compute-0-3.local
NNODES=32, MYRANK=16, HOSTNAME=compute-0-25.local
NODEID=26 argc=13
NODEID=25 argc=13
NODEID=24 argc=13
NODEID=23 argc=13
NODEID=22 argc=13
NODEID=21 argc=13
NODEID=20 argc=13
NODEID=19 argc=13
NODEID=18 argc=13
NODEID=13 argc=13
NODEID=17 argc=13
NODEID=15 argc=13
NODEID=14 argc=13
NODEID=16 argc=13
NODEID=0 argc=13
NODEID=12 argc=13
NODEID=6 argc=13
NODEID=11 argc=13
NODEID=1 argc=13
NODEID=10 argc=13
NODEID=5 argc=13
NODEID=30 argc=13
NODEID=7 argc=13
NODEID=27 argc=13
NODEID=31 argc=13
NODEID=2 argc=13
NODEID=9 argc=13
NODEID=28 argc=13
NODEID=4 argc=13
NODEID=29 argc=13
NODEID=8 argc=13
NODEID=3 argc=13
                        :-)  G  R  O  M  A  C  S  (-:

                  Groningen Machine for Chemical Simulation

                           :-)  VERSION 3.3.1  (-:

Written by David van der Spoel, Erik Lindahl, Berk Hess, and others.

      Copyright (c) 1991-2000, University of Groningen, The Netherlands.
            Copyright (c) 2001-2006, The GROMACS development team,
           check out http://www.gromacs.org for more information.

        This program is free software; you can redistribute it and/or
         modify it under the terms of the GNU General Public License
        as published by the Free Software Foundation; either version 2
            of the License, or (at your option) any later version.

                :-)  /home/ababakha/gromacs-mpi/bin/mdrun  (-:

Option     Filename  Type         Description
------------------------------------------------------------
 -s    FullMD7.tpr  Input        Generic run input: tpr tpb tpa xml
 -o    FullMD7.trr  Output       Full precision trajectory: trr trj
 -x       traj.xtc  Output, Opt. Compressed trajectory (portable xdr format)
 -c AfterFullMD7.gro  Output       Generic structure: gro g96 pdb xml
 -e    FullMD7.edr  Output       Generic energy: edr ene
 -g    FullMD7.log  Output       Log file
-dgdl      dgdl.xvg  Output, Opt. xvgr/xmgr file
-field    field.xvg  Output, Opt. xvgr/xmgr file
-table    table.xvg  Input, Opt.  xvgr/xmgr file
-tablep  tablep.xvg  Input, Opt.  xvgr/xmgr file
-rerun    rerun.xtc  Input, Opt.  Generic trajectory: xtc trr trj gro g96 pdb
-tpi        tpi.xvg  Output, Opt. xvgr/xmgr file
-ei        sam.edi  Input, Opt.  ED sampling input
-eo        sam.edo  Output, Opt. ED sampling output
 -j       wham.gct  Input, Opt.  General coupling stuff
-jo        bam.gct  Output, Opt. General coupling stuff
-ffout      gct.xvg  Output, Opt. xvgr/xmgr file
-devout   deviatie.xvg  Output, Opt. xvgr/xmgr file
-runav  runaver.xvg  Output, Opt. xvgr/xmgr file
-pi       pull.ppa  Input, Opt.  Pull parameters
-po    pullout.ppa  Output, Opt. Pull parameters
-pd       pull.pdo  Output, Opt. Pull data output
-pn       pull.ndx  Input, Opt.  Index file
-mtx         nm.mtx  Output, Opt. Hessian matrix
-dn     dipole.ndx  Output, Opt. Index file

      Option   Type  Value  Description
------------------------------------------------------
      -[no]h   bool     no  Print help info and quit
      -[no]X   bool     no  Use dialog box GUI to edit command line options
       -nice    int     19  Set the nicelevel
     -deffnm string         Set the default filename for all file options
   -[no]xvgr   bool    yes  Add specific codes (legends etc.) in the output
                            xvg files for the xmgrace program
         -np    int     32  Number of nodes, must be the same as used for
                            grompp
         -nt    int      1  Number of threads to start on each node
      -[no]v   bool     no  Be loud and noisy
-[no]compact   bool    yes  Write a compact log file
-[no]sepdvdl   bool     no  Write separate V and dVdl terms for each
                            interaction type and node to the log file(s)
  -[no]multi   bool     no  Do multiple simulations in parallel (only with
                            -np > 1)
     -replex    int      0  Attempt replica exchange every # steps
     -reseed    int     -1  Seed for replica exchange, -1 is generate a seed
   -[no]glas   bool     no  Do glass simulation with special long range
                            corrections
 -[no]ionize   bool     no  Do a simulation including the effect of an X-Ray
                            bombardment on your system

Reading file FullMD7.tpr, VERSION 3.3.1 (single precision)
starting mdrun 'My membrane with peptides in water'
1500000 steps,   3000.0 ps.

p30_10831:  p4_error: Timeout in establishing connection to remote process: 0

rm_l_30_10832: (341.608281) net_send: could not write to fd=5, errno= 32
rm_l_31_10896: (341.269706) net_send: could not write to fd=5, errno= 32

p30_10831: (343.634411) net_send: could not write to fd=5, errno = 32
p31_10895: (343.296105) net_send: could not write to fd=5, errno = 32
p0_13353:  p4_error: net_recv read:  probable EOF on socket: 1
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
Killed by signal 2.
p0_13353: (389.926083) net_send: could not write to fd=4, errno = 32


_______________________________________________
gmx-users mailing list    gmx-users@gromacs.org
http://www.gromacs.org/mailman/listinfo/gmx-users

Please don't post (un)subscribe requests to the list. Use the www interface or send it to [EMAIL PROTECTED]

Can't post? Read http://www.gromacs.org/mailing_lists/users.php
