Re: [gmx-users] Multi-level parallelization: MPI + OpenMP

2013-07-22 Thread Éric Germaneau

Dear Szilárd,

I ran some tests using 2 ranks/node, which is what I was trying to do.
It seems to be working now, thank you.

 Éric.



--
"Be the change you wish to see in the world" --- Mahatma Gandhi

Éric Germaneau

Shanghai Jiao Tong University
Network & Information Center
Room 205, Minhang Campus
800 Dongchuan Road
Shanghai 200240
China

Please, if possible, don't send me MS Word or PowerPoint attachments.
Why? See: http://www.gnu.org/philosophy/no-word-attachments.html



Re: [gmx-users] Multi-level parallelization: MPI + OpenMP

2013-07-19 Thread Szilárd Páll
Depending on the level of parallelization (number of nodes and number
of particles/core) you may want to try:

- 2 ranks/node: 8 cores + 1 GPU, no separate PME (default):
  mpirun -np 2*Nnodes mdrun_mpi [-gpu_id 01 -npme 0]

- 4 ranks per node: 4 cores + 1 GPU (shared between two ranks), no separate PME
  mpirun -np 4*Nnodes mdrun_mpi -gpu_id 0011 [-npme 0]

- 4 ranks per node, 2 PP/2PME: 4 cores + 1 GPU (not shared), separate PME
  mpirun -np 4*Nnodes mdrun_mpi [-gpu_id 01] -npme 2*Nnodes

- at high parallelization (especially with homogeneous systems) you may
want to try 8 ranks per node

Cheers,
--
Szilárd
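
For the 16-core, 2-GPU nodes described in this thread, a minimal LSF script
for the first option above (2 ranks per node, 8 cores + 1 GPU each, no
separate PME) could look like the sketch below. The job name, queue name,
output file, and topol.tpr input are placeholders, the mdrun flags follow
GROMACS 4.6 usage, and the ranks-per-node flag is Intel MPI syntax (other
MPI libraries use a machinefile or a different option):

   #!/bin/bash
   #BSUB -J gmx_gpu_test
   #BSUB -q gpu
   #BSUB -n 32
   #BSUB -R "span[ptile=16]"
   #BSUB -o mdrun.%J.out

   # 2 PP ranks on each of 2 nodes, 8 OpenMP threads per rank,
   # GPUs 0 and 1 used on every node, no separate PME ranks.
   export OMP_NUM_THREADS=8
   mpirun -np 4 -ppn 2 mdrun_mpi -ntomp 8 -gpu_id 01 -s topol.tpr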




Re: [gmx-users] Multi-level parallelization: MPI + OpenMP

2013-07-19 Thread Mark Abraham
What's the simplest case you can make work?

Mark


Re: [gmx-users] Multi-level parallelization: MPI + OpenMP

2013-07-18 Thread Éric Germaneau
I actually submitted a job using two MPI processes per node, but the log files
do not get updated; it looks as if the calculation is stuck.


Here is how I proceed:

   mpirun -np $NM -machinefile nodegpu mdrun_mpi  -nb gpu -v -deffnm
   test184000atoms_verlet.tpr >& mdrun_mpi.log

with the content of nodegpu:

   gpu04
   gpu04
   gpu11
   gpu11

and with

   NM=`cat nodegpu | wc -l`

bjobs gives

   3983   hpceric  RUN   gpu   mu05   16*gpu11   gromacs   Jul 19 12:12
                                      16*gpu04

mdrun_mpi.log contains the description of the options, and
test184000atoms_verlet.tpr.log stops after "PLEASE READ AND CITE THE
FOLLOWING REFERENCE".


The top of test184000atoms_verlet.tpr.log is:

   Log file opened on Fri Jul 19 13:47:36 2013
   Host: gpu11  pid: 124677  nodeid: 0  nnodes:  4
   Gromacs version:    VERSION 4.6.3
   Precision:          single
   Memory model:       64 bit
   MPI library:        MPI
   OpenMP support:     enabled
   GPU support:        enabled
   invsqrt routine:    gmx_software_invsqrt(x)
   CPU acceleration:   AVX_256
   FFT library:        fftw-3.3.3-sse2-avx
   Large file support: enabled
   RDTSCP usage:   enabled
   Built on:   Mon Jul 15 13:44:42 CST 2013
   Built by:   name@node [CMAKE]
   Build OS/arch:  Linux 2.6.32-279.el6.x86_64 x86_64
   Build CPU vendor:   GenuineIntel
   Build CPU brand:    Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
   Build CPU family:   6   Model: 45   Stepping: 7
   Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx
   msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2
   sse3 sse4.1 sse4.2 ssse3 tdt x2apic
   C compiler: /lustre/utility/intel/impi/4.1.1.036/intel64/bin/mpicc
   GNU gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
   C compiler flags:   -mavx   -Wextra -Wno-missing-field-initializers
   -Wno-sign-compare -Wall -Wno-unused -Wunused-value
   -fomit-frame-pointer -funroll-all-loops -O3 -DNDEBUG

   C++ compiler:
   /lustre/utility/intel/impi/4.1.1.036/intel64/bin/mpicxx GNU g++
   (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4)
   C++ compiler flags: -mavx   -Wextra -Wno-missing-field-initializers
   -Wno-sign-compare -Wall -Wno-unused -Wunused-value  
   -fomit-frame-pointer -funroll-all-loops  -O3 -DNDEBUG

   CUDA compiler:  /lustre/utility/cuda-5.0/bin/nvcc nvcc: NVIDIA
   (R) Cuda compiler driver;Copyright (c) 2005-2012 NVIDIA
   Corporation;Built on Fri_Sep_21_17:28:58_PDT_2012;Cuda compilation
   tools, release 5.0, V0.2.1221
   CUDA compiler
   
flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30;-use_fast_math;-Xcompiler;-fPIC
   ;
   
-mavx;-Wextra;-Wno-missing-field-initializers;-Wno-sign-compare;-Wall;-Wno-unused;-Wunused-value;-fomit-frame-pointer;-funroll-all-loops;-O3;-DNDEBUG
   CUDA driver:5.0
   CUDA runtime:   5.0

Does anyone have any idea what's going wrong here?
Thanks,

 Éric.
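
As a reference point, a more explicit variant of the command above, with the
OpenMP thread count and the per-node GPU mapping spelled out for the
two-entries-per-node machinefile shown, could look like this sketch (it is
not the command from this message: -s is used instead of -deffnm, and the
flags follow GROMACS 4.6 mdrun):

   # 2 PP ranks per node (from the nodegpu machinefile above),
   # 8 OpenMP threads per rank, GPUs 0 and 1 on each node.
   NM=`cat nodegpu | wc -l`
   export OMP_NUM_THREADS=8
   mpirun -np $NM -machinefile nodegpu \
       mdrun_mpi -ntomp 8 -gpu_id 01 -nb gpu -v \
       -s test184000atoms_verlet.tpr >& mdrun_mpi.log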



[gmx-users] Multi-level parallelization: MPI + OpenMP

2013-07-18 Thread Éric Germaneau

Dear all,

I'm not a GROMACS user; I've installed GROMACS 4.6.3 on our cluster
and am running some tests.

Each node of our machine has 16 cores and 2 GPUs.
I'm trying to figure out how to submit efficient multi-node LSF jobs
that use the maximum of the available resources.
After reading the documentation on "Acceleration and parallelization"
I got confused and would like some help.

I'm just wondering whether someone has experience with this matter.
I thank you in advance,

Éric.
