bharat v. adkar wrote:
On Sun, 27 Dec 2009, Mark Abraham wrote:

bharat v. adkar wrote:

 Dear all,
   I am trying to perform replica exchange MD (REMD) on a 'protein in
 water' system, following the instructions given on the wiki (How-Tos ->
 REMD). I have to perform the REMD simulation with 35 different
 temperatures. As per the advice on the wiki, I equilibrated the system
 at the respective temperatures (a total of 35 equilibration
 simulations). After this I generated chk_0.tpr, chk_1.tpr, ...,
 chk_34.tpr files from the equilibrated structures.
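
 (For concreteness, a sketch of how the per-replica .tpr files might be
 generated; the equil_${i}.mdp / equil_${i}.gro names are placeholders,
 not my actual file names:)

     # One grompp per replica; each .mdp sets ref_t to the i-th
     # temperature (file names are placeholders).
     for i in $(seq 0 34); do
         grompp -f equil_${i}.mdp -c equil_${i}.gro -p topol.top -o chk_${i}.tpr
     done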

Now when I submit the final REMD job with the following command line, it
 gives an error:

command line: mpiexec -np 70 mdrun -multi 35 -replex 1000 -s chk_.tpr -v
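
(With -multi 35, mdrun inserts the replica index before the .tpr
 extension, so -s chk_.tpr resolves to chk_0.tpr ... chk_34.tpr, and the
 70 ranks give 70/35 = 2 MPI processes per replica. For reference, a
 sketch of the SGE wrapper around this command; the parallel environment
 name "mpi" is an assumption:)

     #$ -pe mpi 70
     #$ -cwd
     # $NSLOTS is set by SGE to the number of granted slots (70 here)
     mpiexec -np $NSLOTS mdrun_mpi -multi 35 -replex 1000 -s chk_.tpr -v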

 error msg:
 -------------------------------------------------------
 Program mdrun_mpi, VERSION 4.0.7
 Source code file: ../../../SRC/src/gmxlib/smalloc.c, line: 179

 Fatal error:
 Not enough memory. Failed to realloc 790760 bytes for nlist->jjnr,
 nlist->jjnr=0x9a400030
 (called from file ../../../SRC/src/mdlib/ns.c, line 503)
 -------------------------------------------------------

 Thanx for Using GROMACS - Have a Nice Day
:  Cannot allocate memory
 Error on node 19, will try to stop all the nodes
 Halting parallel program mdrun_mpi on CPU 19 out of 70
 ***********************************************************************


Each node of the cluster has 8 GB of physical memory and 16 GB of swap.
 Moreover, when logged onto the individual nodes, they show more than
 1 GB of free memory, so there should be no problem with cluster memory.
 Also, the equilibration jobs for the same system ran on the same
 cluster without any problem.

By submitting different test jobs with varying numbers of processors
 (and numbers of replicas, where necessary), I have observed that any
 job with a total of 64 processors or fewer runs without any problem. As
 soon as the total number of processors exceeds 64, it gives the above
 error. I have also tested this with 65 processors / 65 replicas.
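
(The test jobs were along these lines - a sketch, with the processor
 counts illustrative and one replica per process where necessary:)

     # Same system, varying total process count; jobs with np <= 64
     # finish, jobs with np > 64 hit the allocation error.
     for np in 32 64 65; do
         mpiexec -np $np mdrun_mpi -multi $np -replex 1000 -s chk_.tpr -v
     done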

This sounds like you might be running on fewer physical CPUs than you
have MPI processes. If so, running multiple MPI processes per physical
CPU can lead to memory shortage conditions.

I don't understand what you mean. Do you mean there might be more than 8
processes running per node (each node has 8 processors)? That does not
seem to be the case, as the SGE (Sun Grid Engine) output shows only
eight processes per node.

65 processes can't have 8 processes per node: 65 is not divisible by 8,
so at least one node must be running a different number (e.g. eight
nodes with 8 processes each and one node with 1).

Mark

I don't know what you mean by "swap memory".

Sorry, I meant cache memory.

bharat


Mark

 System: Protein + water + Na ions (total 46878 atoms)
 Gromacs version: tested with both v4.0.5 and v4.0.7
 compiled with: --enable-float --with-fft=fftw3 --enable-mpi
 compiler: gcc_3.4.6 -O3
 machine details: uname -mpio: x86_64 x86_64 x86_64 GNU/Linux


I tried searching the mailing list without any luck. I am not sure if I
 am doing anything wrong with the commands. Please correct me if I am.

 Kindly let me know the solution.


 bharat



