You seem to be using a relatively large number of GPUs. You may want to check your input data (many systems will not scale well to that many GPUs, though ensemble runs across many GPUs are quite common). Perhaps check the speedup in going from 1 to 2 to 4 GPUs on one node first.
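A single-node scaling check can be scripted along these lines. This is only a sketch: it assumes the thread-MPI build (plain gmx) at /shared/gromacs/5.1.5/bin/gmx that you used for the single-node runs, and that one of the equilibration .tpr files from your script has already been generated with grompp; adjust names and thread counts to your instance.

for ngpu in 1 2 4; do
    # use the first $ngpu GPUs, one thread-MPI rank per GPU
    ids=`seq 0 $((ngpu - 1)) | tr -d '\n'`
    /shared/gromacs/5.1.5/bin/gmx mdrun -deffnm step6.1_equilibration \
        -ntmpi $ngpu -ntomp 8 -gpu_id $ids \
        -nsteps 10000 -resethway -noconfout
done

Then compare the ns/day reported on the "Performance:" line at the end of each run's .log file.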

On 3/9/19 12:11 AM, Carlos Rivas wrote:
Hey guys,
Anybody running GROMACS on AWS ?

I have a strong IT background, but zero understanding of GROMACS or OpenMPI (and even less of SGE on AWS).
I'm just trying to help some PhD folks with their work.

When I run GROMACS using thread-MPI on a single, very large node on AWS, things work fairly fast.
However, when I switch from thread-MPI to OpenMPI, even though everything is detected properly, the performance is horrible.

This is what I am submitting to SGE:

ubuntu@ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat sge.sh
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -e out.err
#$ -o out.out
#$ -pe mpi 256

cd /shared/charmm-gui/gromacs
touch start.txt
/bin/bash /shared/charmm-gui/gromacs/run_eq.bash
touch end.txt
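A quick sanity check that could be added to a job script like this one, to confirm what SGE actually allocated before mpirun starts (a sketch only; $NSLOTS and $PE_HOSTFILE are standard SGE variables, the echo lines are a hypothetical addition):

echo "SGE allocated $NSLOTS slots on these hosts:"
cat "$PE_HOSTFILE"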

And this is my test script, provided by one of the doctors:

ubuntu@ip-10-10-5-81:/shared/charmm-gui/gromacs$ cat run_eq.bash
#!/bin/bash
export GMXMPI="/usr/bin/mpirun --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"

export MDRUN="mdrun -ntomp 2 -npme 32"

export GMX="/shared/gromacs/5.1.5/bin/gmx_mpi"

for comm in min eq; do
  if [ $comm == min ]; then
    echo ${comm}
    $GMX grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr \
         -c step5_charmm2gmx.pdb -p topol.top
    $GMXMPI $MDRUN -deffnm step6.0_minimization
  fi

  if [ $comm == eq ]; then
    for step in `seq 1 6`; do
      echo $step
      if [ $step -eq 1 ]; then
        echo ${step}
        $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr \
             -c step6.0_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
        $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
      fi
      if [ $step -gt 1 ]; then
        old=`expr $step - 1`
        echo $old
        $GMX grompp -f step6.${step}_equilibration.mdp -o step6.${step}_equilibration.tpr \
             -c step6.${old}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
        $GMXMPI $MDRUN -deffnm step6.${step}_equilibration
      fi
    done
  fi
done
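For comparison, one layout sometimes tried on multi-GPU nodes is a single PP rank per GPU with more OpenMP threads per rank. A rough sketch only, not a recommendation from this thread; the mpirun options are standard Open MPI ones, and the thread and PME counts are guesses to be tuned:

# 32 ranks total = 8 per node = one rank per GPU
export GMXMPI="/usr/bin/mpirun -np 32 --map-by ppr:8:node --bind-to none --mca btl ^openib /shared/gromacs/5.1.5/bin/gmx_mpi"
# 8 OpenMP threads per rank fills the 64 logical cores on each node;
# -pin on lets mdrun handle thread pinning, -npme 0 disables separate PME ranks
export MDRUN="mdrun -ntomp 8 -npme 0 -pin on"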




During the run I see this in the output, and I get really excited, expecting blazing speeds, and yet it's much worse than a single node:

Command line:
   gmx_mpi mdrun -ntomp 2 -npme 32 -deffnm step6.0_minimization


Back Off! I just backed up step6.0_minimization.log to ./#step6.0_minimization.log.6#

Running on 4 nodes with total 128 cores, 256 logical cores, 32 compatible GPUs
   Cores per node:           32
   Logical cores per node:   64
   Compatible GPUs per node:  8
   All nodes have identical type(s) of GPUs
Hardware detected on host ip-10-10-5-89 (the node of MPI rank 0):
   CPU info:
     Vendor: GenuineIntel
     Brand:  Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
     SIMD instructions most likely to fit this hardware: AVX2_256
     SIMD instructions selected at GROMACS compile time: AVX2_256
   GPU info:
     Number of GPUs detected: 8
     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #1: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #2: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #3: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #4: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #5: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #6: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
     #7: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible

Reading file step6.0_minimization.tpr, VERSION 5.1.5 (single precision)
Using 256 MPI processes
Using 2 OpenMP threads per MPI process

On host ip-10-10-5-89 8 compatible GPUs are present, with IDs 0,1,2,3,4,5,6,7
On host ip-10-10-5-89 8 GPUs auto-selected for this run.
Mapping of GPU IDs to the 56 PP ranks in this node: 
0,0,0,0,0,0,0,1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7
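Worth noting what the log itself reports: 256 MPI ranks with 2 OpenMP threads each is 512 threads on 128 physical / 256 logical cores, and the 56 PP ranks on each node share 8 GPUs (7 ranks per GPU). Whatever layout is tried, the runs can be compared from the "Performance:" lines mdrun writes at the end of each log, for example:

grep -H "Performance:" step6.0_minimization.log step6.*_equilibration.log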



Any suggestions? I'd greatly appreciate the help.


Carlos J. Rivas
Senior AWS Solutions Architect - Migration Specialist
