Hi!
On 2013-07-12 11:15, Mark Abraham wrote:
What does --loadbalance do?
It balances the total number of processes across all allocated nodes.
The thing is that mpiexec does not know that I want each replica to fork
into 4 OpenMP threads. Thus, without this option and without affinities
(more on that in a second) mpiexec starts too many replicas on some nodes -
Gromacs then complains about the overload - while some cores on other
nodes are left unused. It is possible to run my simulation like this:
mpiexec mdrun_mpi -v -cpt 20 -multi 144 -replex 2000 -cpi (without
--loadbalance for mpiexec and without -ntomp for mdrun)
Then each replica runs on 4 MPI processes (I allocate 4 times more
cores than replicas and mdrun sees it). The problem is that it is much
slower than using OpenMP for each replica. I did not find any other way
than --loadbalance in mpiexec and then -multi 144 -ntomp 4 in mdrun to
use MPI and OpenMP at the same time on the torque-controlled cluster.
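Just to spell out the arithmetic I am relying on (the numbers come from my
job below: 48 nodes of 12 cores, 144 replicas of 4 threads; nothing here
queries the cluster itself):

```shell
# Sanity check of the intended process layout; plain shell arithmetic only.
nodes=48; ppn=12; replicas=144; ntomp=4
echo "total cores:   $(( nodes * ppn ))"       # 48 x 12  = 576
echo "total threads: $(( replicas * ntomp ))"  # 144 x 4  = 576, fits exactly
echo "ranks/node:    $(( replicas / nodes ))"  # 3 ranks x 4 threads = 12 cores
```

So with --loadbalance each node should get exactly 3 MPI ranks, and each
rank's 4 OpenMP threads should fill the 12 cores.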
What do the .log files say about
OMP_NUM_THREADS, thread affinities, pinning, etc?
Each replica logs:
"Using 1 MPI process
Using 4 OpenMP threads",
That is correct. As I said, the threads are forked, but 3 out of 4 do
nothing, and the simulation does not progress at all.
About affinities Gromacs says:
"Can not set thread affinities on the current platform. On NUMA systems this
can cause performance degradation. If you think your platform should support
setting affinities, contact the GROMACS developers."
Well, the "current platform" is a normal x86_64 cluster, but all the
information about resources is passed by Torque to the OpenMPI-linked
Gromacs. Can it be that mdrun sees the resources allocated by Torque as
one big pool of CPUs and misses the information about node topology?
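One thing I might still try (a sketch only, untested on this cluster: mdrun
4.6 does have a -pin option, but I have not verified that it helps here) is
forcing mdrun's internal pinning instead of relying on platform detection:

```shell
# Assumed workaround, untested: ask mdrun 4.6 to pin its threads
# explicitly rather than auto-detecting affinity support.
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 \
    -replex 2000 -cpi -pin on
```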
If you have any suggestions on how to debug or trace this issue, I would
be glad to participate.
Best,
G
Mark
On Fri, Jul 12, 2013 at 3:46 AM, gigo <g...@poczta.ibb.waw.pl> wrote:
Dear GMXers,
With Gromacs 4.6.2 I was running REMD with 144 replicas. The replicas
were separate MPI jobs, of course (OpenMPI 1.6.4). I ran each replica on
4 cores with OpenMP. The cluster, built of 12-core nodes, is controlled
by Torque, so I used the following script:
#!/bin/tcsh -f
#PBS -S /bin/tcsh
#PBS -N test
#PBS -l nodes=48:ppn=12
#PBS -l walltime=300:00:00
#PBS -l mem=288Gb
#PBS -r n
cd $PBS_O_WORKDIR
mpiexec -np 144 --loadbalance mdrun_mpi -v -cpt 20 -multi 144 -ntomp 4 -replex 2000
It was working just great with 4.6.2. It does not work with 4.6.3. The
new version was compiled with the same options in the same environment.
Mpiexec spreads the replicas evenly over the cluster. Each replica forks
4 threads, but only one of them uses any CPU. The logs end at the
citations. Some empty energy and trajectory files are created, but
nothing is written to them.
Please let me know if you have any immediate suggestion on how to make
it work (maybe based on some difference between the versions), or if I
should file a bug report with all the technical details.
Best Regards,
Grzegorz Wieczorek
--
gmx-users mailing list gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the www
interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists