Hi!

On 2013-07-12 11:15, Mark Abraham wrote:
What does --loadbalance do?

It balances the total number of processes across all allocated nodes. The thing is that mpiexec does not know that I want each replica to fork into 4 OpenMP threads. Thus, without this option and without affinities (more on that in a moment), mpiexec starts too many replicas on some nodes - GROMACS then complains about the oversubscription - while some cores on other nodes are left unused. It is possible to run my simulation like this:

mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without --loadbalance for mpiexec and without -ntomp for mdrun)

Then each replica runs on 4 MPI processes (I allocate 4 times more cores than replicas, and mdrun sees that). The problem is that this is much slower than using OpenMP within each replica. I did not find any other way than --loadbalance for mpiexec combined with -multi 144 -ntomp 4 for mdrun to use MPI and OpenMP at the same time on the Torque-controlled cluster.
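For what it is worth, an untested alternative I have been considering (only a sketch - I am assuming that OpenMPI 1.6's mpirun accepts --npernode, --cpus-per-proc and --bind-to-core together on this cluster) would be to describe the layout to mpiexec explicitly instead of relying on --loadbalance:

mpiexec -np 144 --npernode 3 --cpus-per-proc 4 --bind-to-core \
    mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000

On 12-core nodes this should place 3 replicas per node with 4 cores each, but I have not verified that it behaves as intended here.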

What do the .log files say about
OMP_NUM_THREADS, thread affinities, pinning, etc?

Each replica logs:
"Using 1 MPI process
Using 4 OpenMP threads"
That is correct. As I said, the threads are forked, but 3 out of 4 do not do anything, and the simulation does not progress at all.

About affinities Gromacs says:
"Can not set thread affinities on the current platform. On NUMA systems this can cause performance degradation. If you think your platform should support
setting affinities, contact the GROMACS developers."

Well, the "current platform" is a normal x86_64 cluster, but all the information about resources is passed by Torque to the OpenMPI-linked GROMACS. Could it be that mdrun sees the resources allocated by Torque as one big pool of CPUs and misses the information about the node topology?
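One workaround I could still test (again only a sketch: -pin on is an mdrun 4.6 option, but I do not know whether forcing it helps when mdrun already claims it cannot set affinities) is to export the thread count explicitly and ask mdrun to pin the threads itself:

setenv OMP_NUM_THREADS 4
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 \
    -replex 2000 -pin on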

If you have any suggestions on how to debug or trace this issue, I would be glad to participate.
Best,
G

Mark

On Fri, Jul 12, 2013 at 3:46 AM, gigo <g...@poczta.ibb.waw.pl> wrote:
Dear GMXers,
With Gromacs 4.6.2 I was running REMD with 144 replicas. The replicas were separate MPI processes, of course (OpenMPI 1.6.4), and I ran each replica on 4 cores with OpenMP. The cluster, which is built of 12-core nodes, is controlled by Torque, so I used the following script:

#!/bin/tcsh -f
#PBS -S /bin/tcsh
#PBS -N test
#PBS -l nodes=48:ppn=12
#PBS -l walltime=300:00:00
#PBS -l mem=288Gb
#PBS -r n
cd $PBS_O_WORKDIR
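# 48 nodes x 12 cores = 576 cores: 144 replicas x 4 OpenMP threads each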
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 \
    -replex 2000

It was working just fine with 4.6.2, but it does not work with 4.6.3. The new version was compiled with the same options in the same environment. Mpiexec spreads the replicas evenly over the cluster and each replica forks 4 threads, but only one of them uses any CPU (see the check sketched below). The log files end right after the literature citations. Empty energy and trajectory files are created, but nothing is ever written to them.
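For reference, this is roughly how the idle threads can be checked (a sketch only; <node> is a placeholder for a hostname from $PBS_NODEFILE, and --report-bindings is OpenMPI's option for logging where each rank gets bound):

# per-thread CPU usage of mdrun on one of the allocated nodes
ssh <node> 'ps -eLo pid,tid,psr,pcpu,comm | grep mdrun'
# log the binding of every rank by adding --report-bindings to the launch
mpiexec -np 144 --loadbalance --report-bindings mdrun_mpi -v -cpt 20 \
    -multi 144 -ntomp 4 -replex 2000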
Please let me know if you have any immediate suggestion on how to make it work (perhaps based on differences between the versions), or whether I should file a bug report with all the technical details.
Best Regards,

Grzegorz Wieczorek
