Could someone tell me what the error below means?

Getting Loaded...
Reading file MD_100.tpr, VERSION 4.5.4 (single precision)
Loaded with Money

Will use 30 particle-particle and 18 PME only nodes
This is a guess, check the performance at the end of the log file
[ib02:22825] *** Process received signal ***
[ib02:22825] Signal: Segmentation fault (11)
[ib02:22825] Signal code: Address not mapped (1)
[ib02:22825] Failing at address: 0x10
[ib02:22825] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0xf030) [0x7f535903e03$
[ib02:22825] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x7e23) [0x7f535$
[ib02:22825] [ 2] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8601) [0x7f535$
[ib02:22825] [ 3] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8bab) [0x7f535$
[ib02:22825] [ 4] /usr/lib/openmpi/lib/openmpi/mca_btl_sm.so(+0x42af) [0x7f5353$
[ib02:22825] [ 5] /usr/lib/libopen-pal.so.0(opal_progress+0x5b) [0x7f535790506b]
[ib02:22825] [ 6] /usr/lib/libmpi.so.0(+0x37755) [0x7f5359282755]
[ib02:22825] [ 7] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x1c3a) [0x7f$
[ib02:22825] [ 8] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x7fae) [0x7f$
[ib02:22825] [ 9] /usr/lib/libmpi.so.0(ompi_comm_split+0xbf) [0x7f535926de8f]
[ib02:22825] [10] /usr/lib/libmpi.so.0(MPI_Comm_split+0xdb) [0x7f535929dc2b]
[ib02:22825] [11] /usr/lib/libgmx_mpi_d.openmpi.so.6(gmx_setup_nodecomm+0x19b) $
[ib02:22825] [12] mdrun_mpi_d.openmpi(mdrunner+0x46a) [0x40be7a]
[ib02:22825] [13] mdrun_mpi_d.openmpi(main+0x1256) [0x407206]
[ib02:22825] [14] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xfd) [0x7f$
[ib02:22825] [15] mdrun_mpi_d.openmpi() [0x407479]
[ib02:22825] *** End of error message ***
--------------------------------------------------------------------------
mpiexec noticed that process rank 36 with PID 22825 on node ib02 exited on sign$
--------------------------------------------------------------------------

I obtained it when I tried to run my system on a multi-node station (there is no problem on a single node). Is this a problem with the cluster itself, or is something wrong with the parameters of my simulation?
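If I read the backtrace right, the crash happens inside Open MPI itself (mca_pml_ob1 / mca_btl_sm) while mdrun is splitting a communicator during setup (gmx_setup_nodecomm -> MPI_Comm_split). One test I thought of, assuming the usual Open MPI MCA option syntax applies on this cluster, is to exclude the shared-memory transport that shows up in the trace and see whether the failure changes (48 ranks just matches the 30 + 18 split reported above):

mpiexec --mca btl ^sm -np 48 mdrun_mpi_d.openmpi -v -deffnm MD_100

If that behaves differently, it would point more at the cluster's MPI setup than at my .mdp parameters.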
James

On 15 March 2012 15:25, James Starlight <jmsstarli...@gmail.com> wrote:

> Mark, Peter,
>
> I've built the .tpr file on my local CPU and launched only
>
> mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm MD_100
>
> on the cluster with 2 nodes.
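> For completeness, what I did was roughly the following (the local
> coordinate file name and the scp host are just placeholders here, the
> rest comes from my usual setup):
>
> # on my local machine
> grompp -f md.mdp -c confout.gro -p topol.top -n index.ndx -o MD_100.tpr
> scp MD_100.tpr cluster:/globaltmp/xz/job_name/
>
> # job script submitted on the cluster: mdrun only, no grompp step
> #!/bin/sh
> #PBS -N gromacs
> #PBS -l nodes=2:red:ppn=12
> #PBS -V
> #PBS -o gromacs.out
> #PBS -e gromacs.err
>
> cd /globaltmp/xz/job_name
> mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm MD_100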
> I see my job as running, but when I check the MD_100.log file (attached)
> there is no information about the simulation steps in it. When I use just
> one node, I see in that file the step-by-step progression of my
> simulation, like the excerpt below, which was taken from the same log
> file for a ONE-NODE simulation:
>
> Started mdrun on node 0 Thu Mar 15 11:22:35 2012
>
>            Step           Time         Lambda
>               0        0.00000        0.00000
>
> Grid: 12 x 9 x 12 cells
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     1.32179e+04    3.27485e+03    2.53267e+03    4.06443e+02    6.15315e+04
>         LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>     4.12152e+04   -5.51788e+03   -1.70930e+03   -4.54886e+05   -1.46292e+05
>      Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
>     2.14240e-02    3.46794e+00    1.33793e+03   -4.84889e+05    9.88771e+04
>    Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
>    -3.86012e+05   -3.86012e+05    3.11520e+02   -1.14114e+02    3.67861e+02
>    Constr. rmsd
>     3.75854e-05
>
>            Step           Time         Lambda
>            2000        4.00000        0.00000
>
>    Energies (kJ/mol)
>        G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
>     1.31741e+04    3.25280e+03    2.58442e+03    3.51371e+02    6.15913e+04
>         LJ (SR)        LJ (LR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>     4.16349e+04   -5.53474e+03   -1.70930e+03   -4.56561e+05   -1.46485e+05
>      Dis. Rest. D.R.Viol. (nm)     Dih. Rest.      Potential    Kinetic En.
>     4.78276e+01    3.38844e+00    9.82735e+00   -4.87644e+05    9.83280e+04
>    Total Energy  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)
>    -3.89316e+05   -3.87063e+05    3.09790e+02   -1.14114e+02    7.25905e+02
>    Constr. rmsd
>     1.88008e-05
>
> and so on...
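> One thing I thought of checking first, assuming mpiexec picks up the PBS
> node list the same way it does for the mdrun job, is whether MPI alone
> actually starts ranks on both nodes, with a trivial job like this:
>
> #!/bin/sh
> #PBS -N mpi_check
> #PBS -l nodes=2:red:ppn=12
> #PBS -V
> #PBS -o mpi_check.out
> #PBS -e mpi_check.err
>
> cd /globaltmp/xz/job_name
> # each rank prints the node it runs on; 24 lines covering both
> # nodes would confirm that the multi-node launch itself works
> mpiexec -np 24 hostname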
> What could be wrong with the multi-node computations?
>
> James
>
> On 15 March 2012 11:25, Mark Abraham <mark.abra...@anu.edu.au> wrote:
>
>> On 15/03/2012 6:13 PM, Peter C. Lai wrote:
>>
>>> Try separating your grompp run from your mpirun:
>>> You should not really be having the scheduler execute the grompp. Run
>>> your grompp step to generate a .tpr either on the head node or on your
>>> local machine (then copy it over to the cluster).
>>
>> Good advice.
>>
>>> (The -p that the scheduler is complaining about only appears in the
>>> grompp step, so don't have the scheduler run it.)
>>
>> grompp is running successfully, as you can see from the output.
>>
>> I think "mpiexec -np 12" is being interpreted as "mpiexec -n 12 -p", and
>> separating the grompp stage from the mdrun stage would help make that
>> clear - read the documentation first, however.
>>
>> Mark
>>
>>> On 2012-03-15 10:04:49AM +0300, James Starlight wrote:
>>>
>>>> Dear Gromacs Users!
>>>>
>>>> I have some problems with running my simulation on a multi-node
>>>> station which uses Open MPI.
>>>>
>>>> I launch my jobs by means of the script below. This example is for a
>>>> run on 1 node (12 CPUs).
>>>>
>>>> #!/bin/sh
>>>> #PBS -N gromacs
>>>> #PBS -l nodes=1:red:ppn=12
>>>> #PBS -V
>>>> #PBS -o gromacs.out
>>>> #PBS -e gromacs.err
>>>>
>>>> cd /globaltmp/xz/job_name
>>>> grompp -f md.mdp -c nvtWprotonated.gro -p topol.top -n index.ndx -o job.tpr
>>>> mpiexec -np 12 mdrun_mpi_d.openmpi -v -deffnm job
>>>>
>>>> All nodes of my cluster have 12 CPUs. When I use just 1 node on that
>>>> cluster I have no problems running my jobs, but when I try to use more
>>>> than one node I get an error (an example is attached in the gromacs.err
>>>> file, as well as the md.mdp of that system). Another outcome of such a
>>>> multi-node run is that the job starts but no calculation is done (the
>>>> name_of_my_job.log file stays empty and the .trr file is never updated).
>>>> Commonly this happens when I use many nodes (8-10). Finally, I sometimes
>>>> get errors about the PME order (that time I used 3 nodes). The exact
>>>> error differs as I vary the number of nodes.
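>>>> For example, a two-node attempt looks essentially like the same script,
>>>> with only the resource request and the rank count changed accordingly:
>>>>
>>>> #PBS -l nodes=2:red:ppn=12
>>>> ...
>>>> mpiexec -np 24 mdrun_mpi_d.openmpi -v -deffnm job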
>>>> Could you tell me what could be wrong with my cluster?
>>>>
>>>> Thanks for help
>>>>
>>>> James

--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/Support/Mailing_Lists