From: David van der Spoel <[EMAIL PROTECTED]>
Reply-To: Discussion list for GROMACS users <gmx-users@gromacs.org>
To: Discussion list for GROMACS users <gmx-users@gromacs.org>
Subject: Re: [gmx-users] GROMACS Parallel Runs
Date: Sun, 01 Oct 2006 19:58:48 +0200

Sunny wrote:
Hi,

I am using GROMACS 3.3.1 for parallel runs on an AIX supercomputing system. My simulation runs successfully on 16 and 32 CPUs (as well as on fewer than 16 CPUs). When running on 64 CPUs, however, segmentation faults occur in multiple tasks from the very beginning of the simulation. I'd like to know what causes the failure and whether there is any way to fix it.


Please supply more details, such as system size, PME details, etc.


Thanks,

Sunny
David.


Hi all,

Thanks for your replies. The following is the full configuration info for my simulation, as found in md0.log, and the error message given in the .err file. I'm sorry for the long listing.

Many thanks,

Sunny

CONFIGURATION INFO:

CPU=  0, lastcg=  298, targetcg= 7732, myshift=   23
CPU=  1, lastcg=  633, targetcg= 8066, myshift=   23
CPU=  2, lastcg=  970, targetcg= 8404, myshift=   23
CPU=  3, lastcg= 1298, targetcg= 8732, myshift=   23
CPU=  4, lastcg= 1629, targetcg= 9062, myshift=   24
CPU=  5, lastcg= 1959, targetcg= 9392, myshift=   25
CPU=  6, lastcg= 2296, targetcg= 9730, myshift=   26
CPU=  7, lastcg= 2624, targetcg=10058, myshift=   27
CPU=  8, lastcg= 2955, targetcg=10388, myshift=   28
CPU=  9, lastcg= 3285, targetcg=10718, myshift=   29
CPU= 10, lastcg= 3622, targetcg=11056, myshift=   30
CPU= 11, lastcg= 3950, targetcg=11384, myshift=   31
CPU= 12, lastcg= 4281, targetcg=11714, myshift=   32
CPU= 13, lastcg= 4611, targetcg=12044, myshift=   33
CPU= 14, lastcg= 4948, targetcg=12382, myshift=   34
CPU= 15, lastcg= 5276, targetcg=12710, myshift=   35
CPU= 16, lastcg= 5607, targetcg=13040, myshift=   36
CPU= 17, lastcg= 5937, targetcg=13370, myshift=   37
CPU= 18, lastcg= 6274, targetcg=13708, myshift=   38
CPU= 19, lastcg= 6602, targetcg=14036, myshift=   39
CPU= 20, lastcg= 6933, targetcg=14366, myshift=   40
CPU= 21, lastcg= 7263, targetcg=14696, myshift=   41
CPU= 22, lastcg= 7600, targetcg=  168, myshift=   42
CPU= 23, lastcg= 7928, targetcg=  496, myshift=   42
CPU= 24, lastcg= 8259, targetcg=  826, myshift=   42
CPU= 25, lastcg= 8589, targetcg= 1156, myshift=   42
CPU= 26, lastcg= 8840, targetcg= 1408, myshift=   42
CPU= 27, lastcg= 9003, targetcg= 1570, myshift=   41
CPU= 28, lastcg= 9166, targetcg= 1734, myshift=   41
CPU= 29, lastcg= 9329, targetcg= 1896, myshift=   40
CPU= 30, lastcg= 9492, targetcg= 2060, myshift=   40
CPU= 31, lastcg= 9655, targetcg= 2222, myshift=   39
CPU= 32, lastcg= 9818, targetcg= 2386, myshift=   39
CPU= 33, lastcg= 9981, targetcg= 2548, myshift=   38
CPU= 34, lastcg=10144, targetcg= 2712, myshift=   38
CPU= 35, lastcg=10307, targetcg= 2874, myshift=   37
CPU= 36, lastcg=10470, targetcg= 3038, myshift=   37
CPU= 37, lastcg=10633, targetcg= 3200, myshift=   36
CPU= 38, lastcg=10796, targetcg= 3364, myshift=   36
CPU= 39, lastcg=10959, targetcg= 3526, myshift=   35
CPU= 40, lastcg=11122, targetcg= 3690, myshift=   35
CPU= 41, lastcg=11285, targetcg= 3852, myshift=   34
CPU= 42, lastcg=11448, targetcg= 4016, myshift=   34
CPU= 43, lastcg=11611, targetcg= 4178, myshift=   33
CPU= 44, lastcg=11774, targetcg= 4342, myshift=   33
CPU= 45, lastcg=11937, targetcg= 4504, myshift=   32
CPU= 46, lastcg=12100, targetcg= 4668, myshift=   32
CPU= 47, lastcg=12263, targetcg= 4830, myshift=   31
CPU= 48, lastcg=12426, targetcg= 4994, myshift=   31
CPU= 49, lastcg=12589, targetcg= 5156, myshift=   30
CPU= 50, lastcg=12752, targetcg= 5320, myshift=   30
CPU= 51, lastcg=12915, targetcg= 5482, myshift=   29
CPU= 52, lastcg=13078, targetcg= 5646, myshift=   29
CPU= 53, lastcg=13240, targetcg= 5808, myshift=   28
CPU= 54, lastcg=13403, targetcg= 5970, myshift=   28
CPU= 55, lastcg=13565, targetcg= 6132, myshift=   27
CPU= 56, lastcg=13728, targetcg= 6296, myshift=   27
CPU= 57, lastcg=13890, targetcg= 6458, myshift=   26
CPU= 58, lastcg=14053, targetcg= 6620, myshift=   26
CPU= 59, lastcg=14215, targetcg= 6782, myshift=   25
CPU= 60, lastcg=14378, targetcg= 6946, myshift=   25
CPU= 61, lastcg=14540, targetcg= 7108, myshift=   24
CPU= 62, lastcg=14703, targetcg= 7270, myshift=   24
CPU= 63, lastcg=14865, targetcg= 7432, myshift=   23
nsb->shift =  42, nsb->bshift=  0
Listing Scalars
nsb->nodeid:         0
nsb->nnodes:     64
nsb->cgtotal: 14866
nsb->natoms:  31242
nsb->shift:      42
nsb->bshift:      0
Nodeid   index  homenr  cgload  workload
    0       0     488     299       299
    1     488     491     634       634
    2     979     488     971       971
    3    1467     488    1299      1299
    4    1955     488    1630      1630
    5    2443     486    1960      1960
    6    2929     488    2297      2297
    7    3417     488    2625      2625
    8    3905     488    2956      2956
    9    4393     486    3286      3286
   10    4879     488    3623      3623
   11    5367     488    3951      3951
   12    5855     488    4282      4282
   13    6343     486    4612      4612
   14    6829     488    4949      4949
   15    7317     488    5277      5277
   16    7805     488    5608      5608
   17    8293     486    5938      5938
   18    8779     488    6275      6275
   19    9267     488    6603      6603
   20    9755     488    6934      6934
   21   10243     486    7264      7264
   22   10729     488    7601      7601
   23   11217     488    7929      7929
   24   11705     488    8260      8260
   25   12193     486    8590      8590
   26   12679     488    8841      8841
   27   13167     489    9004      9004
   28   13656     489    9167      9167
   29   14145     489    9330      9330
   30   14634     489    9493      9493
   31   15123     489    9656      9656
   32   15612     489    9819      9819
   33   16101     489    9982      9982
   34   16590     489   10145     10145
   35   17079     489   10308     10308
   36   17568     489   10471     10471
   37   18057     489   10634     10634
   38   18546     489   10797     10797
   39   19035     489   10960     10960
   40   19524     489   11123     11123
   41   20013     489   11286     11286
   42   20502     489   11449     11449
   43   20991     489   11612     11612
   44   21480     489   11775     11775
   45   21969     489   11938     11938
   46   22458     489   12101     12101
   47   22947     489   12264     12264
   48   23436     489   12427     12427
   49   23925     489   12590     12590
   50   24414     489   12753     12753
   51   24903     489   12916     12916
   52   25392     489   13079     13079
   53   25881     486   13241     13241
   54   26367     489   13404     13404
   55   26856     486   13566     13566
   56   27342     489   13729     13729
   57   27831     486   13891     13891
   58   28317     489   14054     14054
   59   28806     486   14216     14216
   60   29292     489   14379     14379
   61   29781     486   14541     14541
   62   30267     489   14704     14704
   63   30756     486   14866     14866
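
For anyone who wants to read the table above more easily: a minimal Python sketch that turns the cgload column into per-node charge-group counts. It assumes cgload is a running total (it ends at 14866, which matches nsb->cgtotal, and each value is lastcg + 1). The function name and the five sample rows are only for illustration; the rows are copied from nodes 24 to 28 above.

```python
# Rows copied from the "Nodeid index homenr cgload workload" table (nodes 24-28).
SAMPLE_ROWS = """\
   24   11705     488    8260      8260
   25   12193     486    8590      8590
   26   12679     488    8841      8841
   27   13167     489    9004      9004
   28   13656     489    9167      9167
"""


def per_node_charge_groups(table_text):
    """Return {nodeid: (homenr, charge groups on that node)}.

    Assumes cgload is cumulative, so the count for a node is the
    difference from the previous row. The first row is used only as
    the baseline and gets no entry of its own.
    """
    rows = [
        tuple(int(field) for field in line.split())
        for line in table_text.splitlines()
        if line.strip()
    ]
    counts = {}
    for prev, cur in zip(rows, rows[1:]):
        nodeid, _index, homenr, cgload, _workload = cur
        counts[nodeid] = (homenr, cgload - prev[3])
    return counts


counts = per_node_charge_groups(SAMPLE_ROWS)
for nodeid, (homenr, ncg) in sorted(counts.items()):
    print(f"node {nodeid}: {homenr} atoms, {ncg} charge groups")
```

On these rows the atom counts stay near 488 per node while the charge-group count drops from 330 on node 25 to 163 on nodes 27 and 28, so the split is balanced by atoms rather than by charge groups.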

parameters of the run:
  integrator           = md
  nsteps               = 100000
  init_step            = 0
  ns_type              = Grid
  nstlist              = 10
  ndelta               = 2
  bDomDecomp           = FALSE
  decomp_dir           = 0
  nstcomm              = 1
  comm_mode            = Linear
  nstcheckpoint        = 1000
  nstlog               = 100
  nstxout              = 1000
  nstvout              = 25000
  nstfout              = 0
  nstenergy            = 100
  nstxtcout            = 500
  init_t               = 0
  delta_t              = 0.002
  xtcprec              = 1000
  nkx                  = 64
  nky                  = 128
  nkz                  = 64
  pme_order            = 4
  ewald_rtol           = 1e-05
  ewald_geometry       = 0
  epsilon_surface      = 0
  optimize_fft         = TRUE
  ePBC                 = xyz
  bUncStart            = FALSE
  bShakeSOR            = FALSE
  etc                  = Nose-Hoover
  epc                  = Parrinello-Rahman
  epctype              = Isotropic
  tau_p                = 5
  ref_p (3x3):
     ref_p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}
     ref_p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}
     ref_p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}
  compress (3x3):
     compress[    0]={ 4.50000e-05,  0.00000e+00,  0.00000e+00}
     compress[    1]={ 0.00000e+00,  4.50000e-05,  0.00000e+00}
     compress[    2]={ 0.00000e+00,  0.00000e+00,  4.50000e-05}
  andersen_seed        = 815131
  rlist                = 1
  coulombtype          = PME
  rcoulomb_switch      = 0
  rcoulomb             = 1
  vdwtype              = Cut-off
  rvdw_switch          = 0
  rvdw                 = 1
  epsilon_r            = 1
  epsilon_rf           = 1
  tabext               = 1
  gb_algorithm         = Still
  nstgbradii           = 1
  rgbradii             = 2
  gb_saltconc          = 0
  implicit_solvent     = No
  DispCorr             = No
  fudgeQQ              = 1
  free_energy          = no
  init_lambda          = 0
  sc_alpha             = 0
  sc_power             = 0
  sc_sigma             = 0.3
  delta_lambda         = 0
  disre_weighting      = Conservative
  disre_mixed          = FALSE
  dr_fc                = 1000
  dr_tau               = 0
  nstdisreout          = 100
  orires_fc            = 0
  orires_tau           = 0
  nstorireout          = 100
  dihre-fc             = 1000
  dihre-tau            = 0
  nstdihreout          = 100
  em_stepsize          = 0.001
  em_tol               = 1e-06
  niter                = 1000
  fc_stepsize          = 0
  nstcgsteep           = 10000
  nbfgscorr            = 10
  ConstAlg             = Lincs
  shake_tol            = 0.0001
  lincs_order          = 4
  lincs_warnangle      = 30
  lincs_iter           = 1
  bd_fric              = 0
  ld_seed              = 1993
  cos_accel            = 0
  deform (3x3):
     deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
     deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
     deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
  userint1             = 0
  userint2             = 0
  userint3             = 0
  userint4             = 0
  userreal1            = 0
  userreal2            = 0
  userreal3            = 0
  userreal4            = 0
grpopts:
  nrdf:        75399
  ref_t:                 300
  tau_t:                 0.5
anneal:                   No
ann_npoints:               0
  acc:             0           0           0
  nfreeze:           N           N           N
  energygrp_flags[  0]: 0 0 0
  energygrp_flags[  1]: 0 0 0
  energygrp_flags[  2]: 0 0 0
  efield-x:
     n = 0
  efield-xt:
     n = 0
  efield-y:
     n = 0
  efield-yt:
     n = 0
  efield-z:
     n = 0
  efield-zt:
     n = 0
  bQMMM                = FALSE
  QMconstraints        = 0
  QMMMscheme           = 0
  scalefactor          = 1
qm_opts:
  ngQM                 = 0
Max number of graph edges per atom is 4
Table routines are used for coulomb: TRUE
Table routines are used for vdw:     FALSE
Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Cut-off's:   NS: 1   Coulomb: 1   LJ: 1
System total charge: 0.000
Generated table with 1000 data points for Ewald.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ6.
Tabscale = 500 points/nm
Generated table with 1000 data points for LJ12.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 500 data points for 1-4 LJ12.
Tabscale = 500 points/nm

Enabling SPC water optimization for 6108 molecules.

Will do PME sum in reciprocal space.
[End]
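
As a sanity check on the PME settings above, here is a small Python sketch that reproduces the "Gaussian width (1/beta)" line from ewald_rtol and rcoulomb. It assumes the width is obtained by solving erfc(beta * rcoulomb) = ewald_rtol for beta, which is my understanding of how the value is derived; the function name and the bisection details are illustrative, not taken from the GROMACS source.

```python
import math


def ewald_beta(rcoulomb, ewald_rtol):
    """Solve erfc(beta * rcoulomb) = ewald_rtol for beta by bisection.

    Assumption: this is the relation behind the width printed in the log.
    """
    lo, hi = 0.0, 1.0
    # erfc decreases with beta, so grow the upper bound until it brackets the root.
    while math.erfc(hi * rcoulomb) > ewald_rtol:
        hi *= 2.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rcoulomb) > ewald_rtol:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta = ewald_beta(rcoulomb=1.0, ewald_rtol=1e-5)  # values from the parameter list
width = 1.0 / beta
print(f"beta = {beta:.4f} nm^-1, 1/beta = {width:.4f} nm")
```

With rcoulomb = 1 and ewald_rtol = 1e-05 this should land close to the 0.320163 nm reported in the log, which suggests the real-space Ewald settings were read as intended.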
--------------------------------------------------------------------------
ERROR MESSAGE:

Reading file topol.tpr, VERSION 3.3.1 (single precision)

Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'sivdppc'
100000 steps,    200.0 ps.


Back Off! I just backed up traj.trr to ./#traj.trr.1#

Back Off! I just backed up traj.xtc to ./#traj.xtc.1#

Back Off! I just backed up step-1.pdb to ./#step-1.pdb.1#
ERROR: 0031-250  task 62: Segmentation fault
ERROR: 0031-250  task 54: Segmentation fault
ERROR: 0031-250  task 58: Segmentation fault
ERROR: 0031-250  task 50: Segmentation fault
ERROR: 0031-250  task 51: Segmentation fault

Back Off! I just backed up step0.pdb to ./#step0.pdb.1#
ERROR: 0031-250  task 19: Segmentation fault
ERROR: 0031-250  task 28: Segmentation fault
ERROR: 0031-250  task 49: Segmentation fault
ERROR: 0031-250  task 17: Segmentation fault
ERROR: 0031-250  task 20: Segmentation fault
ERROR: 0031-250  task 23: Segmentation fault
ERROR: 0031-250  task 26: Segmentation fault
ERROR: 0031-250  task 27: Segmentation fault
ERROR: 0031-250  task 31: Segmentation fault
Wrote pdb files with previous and current coordinates
ERROR: 0031-250  task 52: Segmentation fault
ERROR: 0031-250  task 18: Segmentation fault
ERROR: 0031-250  task 60: Segmentation fault
ERROR: 0031-250  task 24: Segmentation fault
ERROR: 0031-250  task 16: Segmentation fault
ERROR: 0031-250  task 30: Segmentation fault
ERROR: 0031-250  task 21: Segmentation fault
ERROR: 0031-250  task 14: Segmentation fault
ERROR: 0031-250  task 48: Segmentation fault
ERROR: 0031-250  task 38: Segmentation fault
ERROR: 0031-250  task 22: Segmentation fault
ERROR: 0031-250  task 46: Segmentation fault
ERROR: 0031-250  task 3: Segmentation fault
ERROR: 0031-250  task 45: Segmentation fault
ERROR: 0031-250  task 37: Segmentation fault
ERROR: 0031-250  task 40: Segmentation fault
ERROR: 0031-250  task 8: Segmentation fault
ERROR: 0031-250  task 15: Segmentation fault
ERROR: 0031-250  task 33: Segmentation fault
ERROR: 0031-250  task 39: Segmentation fault
ERROR: 0031-250  task 44: Segmentation fault
ERROR: 0031-250  task 56: Segmentation fault
ERROR: 0031-250  task 43: Segmentation fault
ERROR: 0031-250  task 4: Segmentation fault
ERROR: 0031-250  task 12: Segmentation fault
ERROR: 0031-250  task 29: Segmentation fault
ERROR: 0031-250  task 35: Segmentation fault
ERROR: 0031-250  task 25: Segmentation fault
ERROR: 0031-250  task 6: Segmentation fault
ERROR: 0031-250  task 42: Segmentation fault
ERROR: 0031-250  task 13: Segmentation fault
ERROR: 0031-250  task 1: Segmentation fault
ERROR: 0031-250  task 9: Segmentation fault
ERROR: 0031-250  task 10: Segmentation fault
ERROR: 0031-250  task 2: Segmentation fault
ERROR: 0031-250  task 47: Segmentation fault
ERROR: 0031-250  task 5: Segmentation fault
ERROR: 0031-250  task 7: Segmentation fault
ERROR: 0031-250  task 11: Segmentation fault
ERROR: 0031-250  task 32: Segmentation fault
ERROR: 0031-250  task 34: Segmentation fault
ERROR: 0031-250  task 41: Segmentation fault
ERROR: 0031-250  task 36: Segmentation fault
ERROR: 0031-250  task 55: Terminated
ERROR: 0031-250  task 59: Terminated
ERROR: 0031-250  task 53: Terminated
ERROR: 0031-250  task 57: Terminated
ERROR: 0031-250  task 61: Terminated
ERROR: 0031-250  task 63: Terminated
ERROR: 0031-250  task 0: Terminated
[End]
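
Since the task list above is hard to scan by eye, here is a minimal Python sketch that groups the task numbers by reported status. The regular expression is written against the line format shown in this .err output only, and the sample lines are a short excerpt, not the whole file; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Short excerpt of the .err output above, including one unrelated line.
SAMPLE_ERR = """\
ERROR: 0031-250  task 62: Segmentation fault
ERROR: 0031-250  task 54: Segmentation fault
Back Off! I just backed up step0.pdb to ./#step0.pdb.1#
ERROR: 0031-250  task 19: Segmentation fault
ERROR: 0031-250  task 55: Terminated
ERROR: 0031-250  task 0: Terminated
"""

# Matches lines such as "ERROR: 0031-250  task 62: Segmentation fault".
TASK_LINE = re.compile(r"^ERROR: 0031-250\s+task (\d+): (.+?)\s*$")


def tally_tasks(err_text):
    """Return {status: sorted list of task numbers} for matching lines."""
    by_status = defaultdict(list)
    for line in err_text.splitlines():
        match = TASK_LINE.match(line)
        if match:
            by_status[match.group(2)].append(int(match.group(1)))
    return {status: sorted(tasks) for status, tasks in by_status.items()}


tally = tally_tasks(SAMPLE_ERR)
for status, tasks in tally.items():
    print(f"{status}: {len(tasks)} task(s) {tasks}")
```

In the full listing above, 57 lines report a segmentation fault and 7 report Terminated (tasks 0, 53, 55, 57, 59, 61 and 63), which accounts for all 64 tasks.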


