Could someone tell me what the error below means?

Getting Loaded...
Reading file MD_100.tpr, VERSION 4.5.4 (single precision)
Loaded with Money

Will use 30 particle-particle and 18 PME only nodes
This is a guess, check the performance at the end of the log file
[ib02:22825] *** Process received signal ***
[ib02:22825] Signal: Segmentation fault (11)
[ib02:22825] Signal code: Address not mapped (1)
[ib02:22825] Failing at address: 0x10
[ib02:22825] [ 0] /lib/x86_64-linux-gnu/
[ib02:22825] [ 1] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 2] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 3] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 4] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 5] /usr/lib/
[ib02:22825] [ 6] /usr/lib/ [0x7f5359282755]
[ib02:22825] [ 7] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 8] /usr/lib/openmpi/lib/openmpi/
[ib02:22825] [ 9] /usr/lib/
[ib02:22825] [10] /usr/lib/ [0x7f535929dc2b]
[ib02:22825] [11]
/usr/lib/ $
[ib02:22825] [12] mdrun_mpi_d.openmpi(mdrunner+0x46a) [0x40be7a]
[ib02:22825] [13] mdrun_mpi_d.openmpi(main+0x1256) [0x407206]
[ib02:22825] [14] /lib/x86_64-linux-gnu/
[ib02:22825] [15] mdrun_mpi_d.openmpi() [0x407479]
[ib02:22825] *** End of error message ***
mpiexec noticed that process rank 36 with PID 22825 on node ib02 exited on

I got this when I tried to run my system on a multi-node station (there is
no problem on a single node). Is this a problem with the cluster itself, or
is something wrong with the parameters of my simulation?
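
A minimal sketch of how the mdrun-only stage might be launched so that the rank count always matches what the scheduler actually allocated. It assumes Torque/PBS exports PBS_NODEFILE with one line per allocated core and that the .tpr was already produced by a separate grompp step. build_mdrun_cmd is a hypothetical helper name, not part of GROMACS or Open MPI, and it only prints the command line.

```shell
#!/bin/sh
# Sketch only: derive the mpiexec line from the scheduler's node file.
# Assumption: PBS_NODEFILE lists one hostname per allocated core.
# build_mdrun_cmd is a hypothetical helper, not a GROMACS or Open MPI tool.

build_mdrun_cmd() {
    nodefile=$1   # path to the node file (normally "$PBS_NODEFILE")
    deffnm=$2     # base name of the .tpr generated beforehand by grompp

    if [ ! -r "$nodefile" ]; then
        echo "node file '$nodefile' is not readable" >&2
        return 1
    fi

    # One MPI rank per non-empty line in the node file.
    nranks=$(grep -c . "$nodefile")
    if [ "$nranks" -lt 1 ]; then
        echo "node file '$nodefile' lists no slots" >&2
        return 1
    fi

    # Print the command instead of running it, so it can be inspected first.
    echo "mpiexec -np $nranks -machinefile $nodefile mdrun_mpi_d.openmpi -v -deffnm $deffnm"
}

# Inside a job script the printed line would be executed, for example:
#   cmd=$(build_mdrun_cmd "$PBS_NODEFILE" MD_100) && $cmd
```

Printing the line first makes it easy to compare the requested rank count with the "Will use N particle-particle and M PME only nodes" message that mdrun reports at startup.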


On 15 March 2012 at 15:25, James Starlight wrote:

> Mark, Peter,
> I've tried generating the .tpr file on my local machine and launching only
> mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm MD_100
> on the cluster with 2 nodes.
> I see my job as running, but when I check the MD_100.log file (attached),
> there is no information about simulation steps in it. When I use just one
> node, the log shows the step-by-step progress of my simulation, as in the
> excerpt below, taken from the log of a ONE-NODE simulation:
> Started mdrun on node 0 Thu Mar 15 11:22:35 2012
>            Step           Time         Lambda
>               0        0.00000        0.00000
> Grid: 12 x 9 x 12 cells
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     1.32179e+04    3.27485e+03    2.53267e+03    4.06443e+02    6.15315e+04
>         LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>     4.12152e+04   -5.51788e+03   -1.70930e+03   -4.54886e+05   -1.46292e+05
>      Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
>     2.14240e-02    3.46794e+00    1.33793e+03   -4.84889e+05    9.88771e+04
>    Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
>    -3.86012e+05   -3.86012e+05    3.11520e+02   -1.14114e+02    3.67861e+02
>    Constr. rmsd
>     3.75854e-05
>            Step           Time         Lambda
>            2000        4.00000        0.00000
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     1.31741e+04    3.25280e+03    2.58442e+03    3.51371e+02    6.15913e+04
>         LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>     4.16349e+04   -5.53474e+03   -1.70930e+03   -4.56561e+05   -1.46485e+05
>      Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
>     4.78276e+01    3.38844e+00    9.82735e+00   -4.87644e+05    9.83280e+04
>    Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
>    -3.89316e+05   -3.87063e+05    3.09790e+02   -1.14114e+02    7.25905e+02
>    Constr. rmsd
>     1.88008e-05
> and so on.
> What could be wrong with the multi-node runs?
> James
> On 15 March 2012 at 11:25, Mark Abraham wrote:
> On 15/03/2012 6:13 PM, Peter C. Lai wrote:
>>> Try separating your grompp run from your mpirun:
>>> You should not really be having the scheduler execute the grompp. Run
>>> your grompp step to generate a .tpr either on the head node or on your
>>> local
>>> machine (then copy it over to the cluster).
>> Good advice.
>>> (The -p that the scheduler is complaining about only appears in the
>>> grompp
>>> step, so don't have the scheduler run it).
>> grompp is running successfully, as you can see from the output
>> I think "mpiexec -np 12" is being interpreted as "mpiexec -n 12 -p", and
>> the process of separating the grompp stage from the mdrun stage would help
>> make that clear - read documentation first, however.
>> Mark
>>> On 2012-03-15 10:04:49AM +0300, James Starlight wrote:
>>>> Dear Gromacs Users!
>>>> I have some problems running my simulation on a multi-node station
>>>> which uses Open MPI.
>>>> I launch my jobs with the script below. This example runs the job on
>>>> 1 node (12 CPUs).
>>>> #!/bin/sh
>>>> #PBS -N gromacs
>>>> #PBS -l nodes=1:red:ppn=12
>>>> #PBS -V
>>>> #PBS -o gromacs.out
>>>> #PBS -e gromacs.err
>>>> cd /globaltmp/xz/job_name
>>>> grompp -f md.mdp -c nvtWprotonated.gro -p -n index.ndx -o job.tpr
>>>> mpiexec -np 12 mdrun_mpi_d.openmpi -v -deffnm job
>>>> Every node of my cluster has 12 CPUs. When I use just 1 node of the
>>>> cluster my jobs run without problems, but when I try to use more than
>>>> one node I get an error (an example is attached in the gromacs.err
>>>> file, along with the mmd.mdp of that system). Another outcome of such
>>>> multi-node runs is that my job starts but no calculation is done (the
>>>> name_of_my_job.log file is empty and the .trr file is never updated).
>>>> This usually happens when I use many nodes (8-10). Finally, I sometimes
>>>> get errors about the PME order (that time I used 3 nodes). The exact
>>>> error differs as I vary the number of nodes.
>>>> Could you tell me what could be wrong with my cluster?
>>>> Thanks for your help
>>>> James
>>>  --
>>>> gmx-users mailing list
>>>> Please search the archive at**
>>>> Support/Mailing_Lists/Search<>before
>>>>  posting!
>>>> Please don't post (un)subscribe requests to the list. Use the
>>>> www interface or send it to
>>>> Can't post? Read 
