Hi!

On 2013-07-12 11:15, Mark Abraham wrote:
What does --loadbalance do?

It balances the total number of processes across all allocated nodes. The thing is that mpiexec does not know that I want each replica to fork into 4 OpenMP threads. Thus, without this option and without affinities (more on that in a moment), mpiexec starts too many replicas on some nodes - GROMACS then complains about the oversubscription - while some cores on other nodes are left unused. It is possible to run my simulation like this:

mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without --loadbalance for mpiexec and without -ntomp for mdrun)

Then each replica runs on 4 MPI processes (I allocate 4 times more cores than replicas, and mdrun sees that). The problem is that this is much slower than using OpenMP within each replica. I did not find any other way than --loadbalance for mpiexec combined with -multi 144 -ntomp 4 for mdrun to use MPI and OpenMP at the same time on the Torque-controlled cluster.
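For what it is worth, an untested alternative I have been considering (only a sketch - I am assuming that OpenMPI 1.6's mpirun accepts --npernode, --cpus-per-proc and --bind-to-core together on this cluster) would be to describe the layout to mpiexec explicitly instead of relying on --loadbalance:

mpiexec -np 144 --npernode 3 --cpus-per-proc 4 --bind-to-core \
    mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000

On 12-core nodes this should place 3 replicas per node with 4 cores each, but I have not verified that it behaves as intended here.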

What do the .log files say about
OMP_NUM_THREADS, thread affinities, pinning, etc?

Each replica logs:
"Using 1 MPI process
Using 4 OpenMP threads"
That is correct. As I said, the threads are forked, but 3 out of 4 do not do anything, and the simulation does not progress at all.

About affinities Gromacs says:
"Can not set thread affinities on the current platform. On NUMA systems this can cause performance degradation. If you think your platform should support
setting affinities, contact the GROMACS developers."

Well, the "current platform" is a normal x86_64 cluster, but all the information about resources is passed by Torque to the OpenMPI-linked GROMACS. Could it be that mdrun sees the resources allocated by Torque as one big pool of CPUs and misses the information about the node topology?
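One workaround I could still test (again only a sketch: -pin on is an mdrun 4.6 option, but I do not know whether forcing it helps when mdrun already claims it cannot set affinities) is to export the thread count explicitly and ask mdrun to pin the threads itself:

setenv OMP_NUM_THREADS 4
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 \
    -replex 2000 -pin on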

If you have any suggestions on how to debug or trace this issue, I would be glad to participate.
Best,
G

Mark

On Fri, Jul 12, 2013 at 3:46 AM, gigo <g...@poczta.ibb.waw.pl> wrote:
Dear GMXers,
With Gromacs 4.6.2 I was running REMD with 144 replicas. The replicas were separate MPI processes, of course (OpenMPI 1.6.4), and I ran each replica on 4 cores with OpenMP. The cluster, which is built of 12-core nodes, is controlled by Torque, so I used the following script:

#!/bin/tcsh -f
#PBS -S /bin/tcsh
#PBS -N test
#PBS -l nodes=48:ppn=12
#PBS -l walltime=300:00:00
#PBS -l mem=288Gb
#PBS -r n
cd $PBS_O_WORKDIR
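# 48 nodes x 12 cores = 576 cores: 144 replicas x 4 OpenMP threads each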
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 \
    -replex 2000

It was working just fine with 4.6.2, but it does not work with 4.6.3. The new version was compiled with the same options in the same environment. Mpiexec spreads the replicas evenly over the cluster and each replica forks 4 threads, but only one of them uses any CPU (see the check sketched below). The log files end right after the literature citations. Empty energy and trajectory files are created, but nothing is ever written to them.
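For reference, this is roughly how the idle threads can be checked (a sketch only; <node> is a placeholder for a hostname from $PBS_NODEFILE, and --report-bindings is OpenMPI's option for logging where each rank gets bound):

# per-thread CPU usage of mdrun on one of the allocated nodes
ssh <node> 'ps -eLo pid,tid,psr,pcpu,comm | grep mdrun'
# log the binding of every rank by adding --report-bindings to the launch
mpiexec -np 144 --loadbalance --report-bindings mdrun_mpi -v -cpt 20 \
    -multi 144 -ntomp 4 -replex 2000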
Please let me know if you have any immediate suggestion on how to make it work (perhaps based on differences between the versions), or whether I should file a bug report with all the technical details.
Best Regards,

Grzegorz Wieczorek
