bharat v. adkar wrote:
On Mon, 28 Dec 2009, Mark Abraham wrote:

bharat v. adkar wrote:
 On Sun, 27 Dec 2009, Mark Abraham wrote:

bharat v. adkar wrote:
On Sun, 27 Dec 2009, Mark Abraham wrote:
bharat v. adkar wrote:

Dear all,
I am trying to perform replica exchange MD (REMD) on a 'protein in water' system. I am following the instructions given on the wiki (How-Tos -> REMD). I have to perform the REMD simulation with 35 different temperatures. As per the advice on the wiki, I equilibrated the system at the respective temperatures (a total of 35 equilibration simulations). After this I generated chk_0.tpr, chk_1.tpr, ..., chk_34.tpr files from the equilibrated structures.
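For reference, a sketch of that preparation step, with illustrative file names (one .mdp per target temperature, the equilibrated coordinates of each replica, and a common topology):

# sketch only: remd_${i}.mdp, eq_${i}.gro and topol.top are illustrative names
for i in $(seq 0 34); do
    grompp -f remd_${i}.mdp -c eq_${i}.gro -p topol.top -o chk_${i}.tpr
done

With -multi, mdrun appends the replica index to the -s file name before the extension, so the chk_.tpr in the command below expands to chk_0.tpr ... chk_34.tpr, and -np 70 gives two MPI processes per replica.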
Now when I submit the final job for REMD with the following command line, it gives an error:

command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v

error msg:
-------------------------------------------------------
Program mdrun_mpi, VERSION 4.0.7
Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179

Fatal error:
Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
nlist->jjnr=0x9a400030
(called from file ../../../SRC/src/mdlib/ns.c, line 503)
-------------------------------------------------------

Thanx for Using GROMACS - Have a Nice Day

Cannot allocate memory
Error on node 19, will try to stop all the nodes
Halting parallel program mdrun_mpi on CPU 19 out of 70
***********************************************************************

The individual nodes on the cluster have 8 GB of physical memory and 16 GB of swap. Moreover, when logged onto the individual nodes, they show more than 1 GB of free memory, so there should be no problem with cluster memory. Also, the equilibration jobs for the same system ran on the same cluster without any problem.

What I have observed by submitting different test jobs with varying numbers of processors (and numbers of replicas, where necessary) is that any job with a total of 64 or fewer processors runs without any problem. As soon as the total number of processors exceeds 64, it gives the above error. I have tested this with 65 processors/65 replicas as well.
Mark Abraham wrote:
This sounds like you might be running on fewer physical CPUs than you have available. If so, running multiple MPI processes per physical CPU can lead to memory shortage conditions.

bharat v. adkar wrote:
I don't understand what you mean. Do you mean there might be more than 8 processes running per node (each node has 8 processors)? But that does not seem to be the case either, as the SGE (Sun Grid Engine) output shows only eight processes per node.

Mark Abraham wrote:
65 processes can't have 8 processes per node.
Why can't they? As I said, there are 8 processors per node; what I had not mentioned is how many nodes the job uses. It got distributed over 9 nodes: 8 of them account for 64 processors, plus 1 processor from the 9th node.

OK, that's a full description. Your symptoms are indicative of someone making an error somewhere. Since GROMACS works over more than 64 processors elsewhere, the presumption is that you are doing something wrong or the machine is not set up in the way you think it is or should be. To get the most effective help, you need to be sure you're providing full information - else we can't tell which error you're making or (potentially) eliminate you as a source of error.

Sorry for not being clear in my statements.

As far as I can tell, the job distribution looks okay to me: it is one process per processor.

Does non-REMD GROMACS run on more than 64 processors? Does your cluster support using more than 8 nodes in a run? Can you run an MPI "Hello world" application that prints the processor and node ID across more than 64 processors?
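Something like this minimal sketch would serve (assuming mpicc and a working MPI launcher are available; the host name reported by MPI_Get_processor_name stands in for the node ID):

# write, build and run a trivial MPI test program (names are illustrative)
cat > hello_mpi.c << 'EOF'
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* processor (rank) ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of ranks */
    MPI_Get_processor_name(node, &len);     /* host name of this rank */
    printf("rank %d of %d on %s\n", rank, size, node);
    MPI_Finalize();
    return 0;
}
EOF
mpicc hello_mpi.c -o hello_mpi
mpiexec -np 70 ./hello_mpi

If 70 ranks report and more than eight distinct host names appear, the allocation spans more than 8 nodes as expected.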

Yes, the cluster supports runs with more than 8 nodes. I generated a system with a 10 nm water box and submitted it on 80 processors. It ran fine: it printed all 80 NODEIDs and also showed me when the job would finish.

bharat



Mark


 bharat

Mark Abraham wrote:
I don't know what you mean by "swap memory".

bharat v. adkar wrote:
Sorry, I meant cache memory.

System: Protein + water + Na ions (total 46878 atoms)
Gromacs version: tested with both v4.0.5 and v4.0.7
compiled with: --enable-float --with-fft=fftw3 --enable-mpi
compiler: gcc_3.4.6 -O3
machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux

I tried searching the mailing list without any luck. I am not sure if I am doing anything wrong in the commands; please correct me if so.

Kindly let me know the solution.

bharat



Your system is running out of memory: probably the system is too big, or all the replicas are running on the same node.
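A quick way to check the placement under SGE (a sketch, assuming the job runs in a parallel environment; hello_mpi is the test program mentioned earlier in the thread):

# hosts and slot counts SGE allocated to this job
cat $PE_HOSTFILE
# or count MPI ranks per host from the test program's output
mpiexec -np 70 ./hello_mpi | awk '{print $NF}' | sort | uniq -c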

--
David van der Spoel, Ph.D., Professor of Biology
Molec. Biophys. group, Dept. of Cell & Molec. Biol., Uppsala University.
Box 596, 75124 Uppsala, Sweden. Phone:  +46184714205. Fax: +4618511755.
sp...@xray.bmc.uu.se    sp...@gromacs.org   http://folding.bmc.uu.se
