[gmx-users] parallelization

2013-10-17 Thread pratibha kapoor
Dear gromacs users

I would like to run my simulations on all nodes (8) with full utilisation of
all cores (2 each). I have compiled GROMACS version 4.6.3 with both
thread-MPI and Open MPI. I am using the following command:
mpirun -np 8 mdrun_mpi -v -nt 2 -s *.tpr -c *.gro
but I am getting the following error:
Setting the total number of threads is only supported with thread-MPI and
Gromacs was compiled without thread-MPI.
Although during compilation I used:
cmake .. -DGMX_MPI=ON -DGMX_THREAD_MPI=ON

If I don't use the -nt option, I can see that all the processors (8) are
utilised, but I am not sure whether all cores are being utilised. For
version 4.6.3 without MPI, I know that by default GROMACS uses all the
threads, but I am not sure whether the MPI version uses all the threads.
Any help is appreciated.


Re: [gmx-users] parallelization

2013-10-17 Thread Carsten Kutzner
Hi,

On Oct 17, 2013, at 2:25 PM, pratibha kapoor kapoorpratib...@gmail.com wrote:

 Dear gromacs users
 
 I would like to run my simulations on all nodes (8) with full utilisation of
 all cores (2 each). I have compiled GROMACS version 4.6.3 with both
 thread-MPI and Open MPI. I am using the following command:
 mpirun -np 8 mdrun_mpi -v -nt 2 -s *.tpr -c *.gro
 but I am getting the following error:
 Setting the total number of threads is only supported with thread-MPI and
 Gromacs was compiled without thread-MPI.
 Although during compilation I used:
 cmake .. -DGMX_MPI=ON -DGMX_THREAD_MPI=ON
You can use either real MPI or thread-MPI, not both at the same time. But you
can combine MPI with OpenMP by configuring with
cmake .. -DGMX_MPI=ON -DGMX_OPENMP=ON

 If I don't use the -nt option, I can see that all the processors (8) are
 utilised, but I am not sure whether all cores are being utilised. For
You can run with
mpirun -np 16 mdrun_mpi -v -s *.tpr -c *.gro

to use all 16 available cores.
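
If the build is configured with OpenMP support as well, a hybrid MPI/OpenMP run over the same 16 cores could look like the following sketch (here -ntomp sets the number of OpenMP threads per MPI rank; the file names are only placeholders):

cmake .. -DGMX_MPI=ON -DGMX_OPENMP=ON
make && make install
mpirun -np 8 mdrun_mpi -ntomp 2 -v -s topol.tpr -c confout.gro

Whether 16 MPI ranks or 8 ranks with 2 OpenMP threads each is faster depends on the hardware, so it is worth timing both.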

 version 4.6.3 without MPI, I know that by default GROMACS uses all the
 threads, but I am not sure whether the MPI version uses all the threads.
Take a look at the md.log output file; it reports what GROMACS actually used.
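
For example, near the top of md.log there are lines reporting how many MPI ranks and threads were actually started; a quick way to spot them is something like (the exact wording of these log lines differs between versions):

grep -i -E 'mpi process|openmp thread|threads' md.log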

Best,
  Carsten

 Any help is appreciated.


--
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics
Am Fassberg 11, 37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.mpibpc.mpg.de/grubmueller/kutzner
http://www.mpibpc.mpg.de/grubmueller/sppexa



Re: [gmx-users] Parallelization performance

2013-03-16 Thread Mark Abraham
On Sat, Mar 16, 2013 at 1:50 AM, Sonia Aguilera 
sm.aguiler...@uniandes.edu.co wrote:

 Hi!

 I have been running MD simulations on a 6-processor machine. I just got an
 account on a cluster. An NVT equilibration takes about 8 hours on my
 6-processor machine, but it takes about 12 hours on the cluster using 16
 processors. It is my understanding that the idea of running in parallel is
 to be more efficient, right?


Yes, but your performance depends on the hardware and the setup. 16
abacuses are not faster than 6 computers :-) Secondly, even if the hardware
is comparable, if your 6-processor machine has 4 cores per processor, then
OpenMP might be delivering more performance. Or your MPI environment might
be configured wrongly and you're running 16 copies of the same simulation
on superior hardware. You should inspect the top of the log files to see
what GROMACS thinks your hardware is providing, and the bottom of the log
file to see in which aspects of the simulation the two systems are
delivering different performance.
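
As a rough sketch (assuming the log file is named after your -deffnm, i.e. nvtOmpA.log on each machine), something like

head -n 60 nvtOmpA.log
tail -n 60 nvtOmpA.log

shows the detected hardware and the parallel setup that was chosen at the top, and the cycle/time accounting with the ns/day figure at the bottom.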


 This is the command for the run on the 6-processor machine:
 mdrun -v -s nvtOmpA.tpr -deffnm nvtOmpA

 This is the command for the run on 16 processors on the cluster:
 mpirun -np 16 mdrun_mpi -v -s nvtOmpA.tpr -deffnm nvtOmpA

 With the last command I am imagining that my process is divided over 16
 processors that work in parallel, so that the wall time should be less
 than on the 6-processor machine. My system is a protein in oil and water,
 and the simulations are for FE calculations. I think it is expected that
 the run on the 16 processors of the cluster should be faster, but I'm
 getting the opposite. Am I doing something wrong?


Not as far as we know. But you need to inspect your .log files for all the
clues GROMACS provides.

 This is my mdp. I have used the same mdp for simulations on 4-, 6-, and
 8-processor machines, and every time it is faster and runs quite well. Any
 help will be greatly appreciated!

 title= NVT equilibration
 ; Run control
 integrator   = sd   ; Langevin dynamics


There have been fixes for correctness and performance of the SD integrator
- you should certainly not be using GROMACS 4.6.

Mark


 tinit= 0
 dt   = 0.002
 nsteps   = 150000   ; 300 ps
 nstcomm  = 100
 ; Output control
 nstxout  = 500
 nstvout  = 500
 nstfout  = 0
 nstlog   = 500
 nstenergy= 500
 nstxtcout= 0
 xtc-precision= 1000
 ; Neighborsearching and short-range nonbonded interactions
 nstlist  = 10
 ns_type  = grid
 pbc  = xyz
 rlist= 1.5
 ; Electrostatics
 coulombtype  = PME
 rcoulomb = 1.5
 ; van der Waals
 vdw-type = switch
 rvdw-switch  = 0.8
 rvdw = 0.9
 ; Apply long range dispersion corrections for Energy and Pressure
 DispCorr  = EnerPres
 ; Spacing for the PME/PPPM FFT grid
 fourierspacing   = 0.12
 ; EWALD/PME/PPPM parameters
 pme_order= 6
 ewald_rtol   = 1e-06
 epsilon_surface  = 0
 optimize_fft = no
 ; Temperature coupling
 ; tcoupl is implicitly handled by the sd integrator
 tc_grps  = system
 tau_t= 1.0
 ref_t= 300
 ; Pressure coupling is off for NVT
 Pcoupl   = No
 tau_p= 0.5
 compressibility  = 4.5e-05
 ref_p= 1.0
 ; Free energy control stuff
 free_energy  = yes
 init_lambda  = 0.1
 delta_lambda = 0
 foreign_lambda   = 0.05 0.2
 sc-alpha = 0
 sc-power = 0
 sc-sigma = 0
 couple-moltype   = Protein_chain_A  ; name of moleculetype to decouple
 couple-lambda0   = vdw  ;
 couple-lambda1   = vdw-q   ;
 couple-intramol  = yes
 nstdhdl  = 10
 ; Generate velocities to start
 gen_vel  = yes
 gen_temp = 300
 gen_seed = -1
 ; options for bonds
 constraints  = h-bonds  ; we only have C-H bonds here
 ; Type of constraint algorithm
 constraint-algorithm = lincs
 ; Do not constrain the starting configuration
 continuation = no
 ; Highest order in the expansion of the constraint coupling matrix
 lincs-order  = 12



 Thanks in advance!

 Sonia Aguilera
 Graduate assistant






[gmx-users] Parallelization performance

2013-03-15 Thread Sonia Aguilera
Hi!

I have been running MD simulations on a 6-processor machine. I just got an
account on a cluster. An NVT equilibration takes about 8 hours on my
6-processor machine, but it takes about 12 hours on the cluster using 16
processors. It is my understanding that the idea of running in parallel is
to be more efficient, right?

This is the command for the run on the 6-processor machine:
mdrun -v -s nvtOmpA.tpr -deffnm nvtOmpA

This is the command for the run on 16 processors on the cluster:
mpirun -np 16 mdrun_mpi -v -s nvtOmpA.tpr -deffnm nvtOmpA

With the last command I am imagining that my process is divided over 16
processors that work in parallel, so that the wall time should be less
than on the 6-processor machine. My system is a protein in oil and water,
and the simulations are for FE calculations. I think it is expected that the
run on the 16 processors of the cluster should be faster, but I'm getting the
opposite. Am I doing something wrong?

This is my mdp. I have used the same mdp for simulations on 4-, 6-, and
8-processor machines, and every time it is faster and runs quite well. Any
help will be greatly appreciated!

title= NVT equilibration
; Run control
integrator   = sd   ; Langevin dynamics
tinit= 0
dt   = 0.002
nsteps   = 150000   ; 300 ps
nstcomm  = 100
; Output control
nstxout  = 500
nstvout  = 500
nstfout  = 0
nstlog   = 500
nstenergy= 500
nstxtcout= 0
xtc-precision= 1000
; Neighborsearching and short-range nonbonded interactions
nstlist  = 10
ns_type  = grid
pbc  = xyz
rlist= 1.5
; Electrostatics
coulombtype  = PME
rcoulomb = 1.5
; van der Waals
vdw-type = switch
rvdw-switch  = 0.8
rvdw = 0.9
; Apply long range dispersion corrections for Energy and Pressure
DispCorr  = EnerPres
; Spacing for the PME/PPPM FFT grid
fourierspacing   = 0.12
; EWALD/PME/PPPM parameters
pme_order= 6
ewald_rtol   = 1e-06
epsilon_surface  = 0
optimize_fft = no
; Temperature coupling
; tcoupl is implicitly handled by the sd integrator
tc_grps  = system
tau_t= 1.0
ref_t= 300
; Pressure coupling is off for NVT
Pcoupl   = No
tau_p= 0.5
compressibility  = 4.5e-05
ref_p= 1.0
; Free energy control stuff
free_energy  = yes
init_lambda  = 0.1
delta_lambda = 0
foreign_lambda   = 0.05 0.2
sc-alpha = 0
sc-power = 0
sc-sigma = 0
couple-moltype   = Protein_chain_A  ; name of moleculetype to decouple
couple-lambda0   = vdw  ;
couple-lambda1   = vdw-q   ;
couple-intramol  = yes
nstdhdl  = 10
; Generate velocities to start
gen_vel  = yes
gen_temp = 300
gen_seed = -1
; options for bonds
constraints  = h-bonds  ; we only have C-H bonds here
; Type of constraint algorithm
constraint-algorithm = lincs
; Do not constrain the starting configuration
continuation = no
; Highest order in the expansion of the constraint coupling matrix
lincs-order  = 12



Thanks in advance!

Sonia Aguilera
Graduate assistant







[gmx-users] Parallelization scheme and terminology help

2013-01-21 Thread Brad Van Oosten
I have been lost in the sea of terminology for installing GROMACS on
multi-processor hardware. The plan is to upgrade from 4.5.5 to 4.6 and I
want the optimal install for my system. There is a nice explanation at
http://www.gromacs.org/Documentation/Acceleration_and_parallelization
but the number of different options and terminology has confused me.


I currently have one computer with 2 processor sockets, each with 4 cores,
each with 2 threads. A mouthful, which in the end allows for 16 processes
at once (2*4*2).


The way I read the documentation, MPI is needed for the communication
between the 2 physical processors, OpenMP handles the communication between
the 4 cores in each processor, and thread-MPI does the threading? Or does
thread-MPI do everything?


What parallelization scheme would be required?


Re: [gmx-users] Parallelization scheme and terminology help

2013-01-21 Thread Mark Abraham
On Mon, Jan 21, 2013 at 11:50 PM, Brad Van Oosten bv0...@brocku.ca wrote:

 I have been lost in the sea of terminology for installing GROMACS on
 multi-processor hardware. The plan is to upgrade from 4.5.5 to 4.6 and I
 want the optimal install for my system. There is a nice explanation at
 http://www.gromacs.org/Documentation/Acceleration_and_parallelization
 but the number of different options and terminology has confused me.


That's life, unfortunately. Nomenclature is poorly standardized and gets
re-used by different vendors to mean different things, or in different
contexts.

 I currently have one computer with 2 processor sockets, each with 4 cores,
 each with 2 threads. A mouthful, which in the end allows for 16 processes
 at once (2*4*2).


Your sockets don't require a network to talk to each other, so thread-MPI
suffices. Probably your threads are hyper-threads, which may or may not
be useful for GROMACS. But you will need to read the actual documentation and
look up the chipset descriptions to really know what you have.
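
On Linux, for example, something like

lscpu | grep -E 'Socket|Core|Thread'

reports the number of sockets, cores per socket and threads per core, which is usually enough to tell whether the extra threads are hyper-threads.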


 The way I read the documentation, MPI is needed for the communication
 between the 2 physical processors, OpenMP handles the communication between
 the 4 cores in each processor, and thread-MPI does the threading? Or does
 thread-MPI do everything?

 What parallelization scheme would be required?


Probably, use thread-MPI and forget about everything else :-) This question
cannot be answered in the abstract (you'd need to know full hardware
characteristics and simulation system characteristics). It is best assessed
by trying a few options and comparing the throughput you observe on the
systems you care about.
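
For a 2-socket, 16-thread box like yours, a minimal comparison could look like the sketch below (bench.tpr is a placeholder; in 4.6, -ntmpi sets the number of thread-MPI ranks and -ntomp the OpenMP threads per rank):

mdrun -ntmpi 16 -s bench.tpr -deffnm tmpi16
mdrun -ntmpi 8 -ntomp 2 -s bench.tpr -deffnm tmpi8x2
mdrun -ntmpi 2 -ntomp 8 -s bench.tpr -deffnm tmpi2x8

Then compare the ns/day reported at the end of each .log file.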

Mark


[gmx-users] parallelization error? gromacs-4.0.2

2008-11-20 Thread Claus Valka
Hello,

From the beginning I tried to test gromacs-4.0.2 with a monoclinic system on 8
processors (one machine with two quad-core CPUs). The skew errors seem to be
gone, yet other errors have appeared.

Now, after a successful MD run, taking the output and trying to do annealing I
get the following error:

Fatal error:
Domain decomposition has not been implemented for box vectors that have
non-zero components in directions that do not use domain decomposition:
ncells = 2 1 4, box vector[3] = -1.070924 -0.07 6.415503

In my input files I do not have a box vector like the one shown above.

I tried to run the same input on my server, which has two processors, and up to
now, as I am writing this email, the run has been going without any errors at all.

I have compiled gromacs-4.0.2 with fftw-3.2. Any files or other information
you need will be at your disposal.

Thank you,
Nikos





RE: [gmx-users] parallelization error? gromacs-4.0.2

2008-11-20 Thread Berk Hess

Hi,

Do you have anisotropic pressure coupling turned on?

Could you send me the tpr file?

Berk

Date: Thu, 20 Nov 2008 14:47:53 +
From: [EMAIL PROTECTED]
To: gmx-users@gromacs.org
Subject: [gmx-users] parallelization error? gromacs-4.0.2

Hello,

From the beginning I tried to test gromacs-4.0.2 with a monoclinic system on 8
processors (one machine with two quad-core CPUs). The skew errors seem to be
gone, yet other errors have appeared.

Now, after a successful MD run, taking the output and trying to do annealing I
get the following error:

Fatal error:
Domain decomposition has not been implemented for box vectors that have
non-zero components in directions that do not use domain decomposition:
ncells = 2 1 4, box vector[3] = -1.070924 -0.07 6.415503

In my input files I do not have a box vector like the one shown above.

I tried to run the same input on my server, which has two processors, and up to
now, as I am writing this email, the run has been going without any errors at all.

I have compiled gromacs-4.0.2 with fftw-3.2. Any files or other information
you need will be at your disposal.

Thank you,
Nikos




RE: [gmx-users] parallelization error? gromacs-4.0.2

2008-11-20 Thread Berk Hess

Hi,

Ah, so you have anisotropic pressure coupling on.
I forgot to put in a check for that when choosing the domain
decomposition grid.

Changing line 200 of src/mdlib/domdec_setup.c from:
if (box[j][i] != 0)
to
if (box[j][i] != 0 || ir->deform[j][i] != 0 ||
    (ir->epc != epcNO && ir->compress[j][i] != 0))
should fix the problem.
I will commit this fix for 4.0.3.

(PS: you can also set -dd nx ny nz by hand; with anisotropic pressure
coupling you should not have nx=1 or ny=1 when ny!=1 or nz!=1.)
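
For example, on the 8 cores a decomposition that satisfies this could be requested by hand as (a sketch; mdrun may be called mdrun_mpi depending on the install, and topol.tpr is a placeholder):

mpirun -np 8 mdrun -dd 2 2 2 -s topol.tpr

which decomposes in all three dimensions instead of the automatic 2 1 4 grid.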

Berk

From: [EMAIL PROTECTED]
To: gmx-users@gromacs.org
Subject: RE: [gmx-users] parallelization error? gromacs-4.0.2
Date: Thu, 20 Nov 2008 22:05:52 +0100








Hi,

Do you have anisotropic pressure coupling turned on?

Could you send me the tpr file?

Berk

Date: Thu, 20 Nov 2008 14:47:53 +
From: [EMAIL PROTECTED]
To: gmx-users@gromacs.org
Subject: [gmx-users] parallelization error? gromacs-4.0.2

Hello,

From the beginning I tried to test gromacs-4.0.2 with a monoclinic system on 8
processors (one machine with two quad-core CPUs). The skew errors seem to be
gone, yet other errors have appeared.

Now, after a successful MD run, taking the output and trying to do annealing I
get the following error:

Fatal error:
Domain decomposition has not been implemented for box vectors that have
non-zero components in directions that do not use domain decomposition:
ncells = 2 1 4, box vector[3] = -1.070924 -0.07 6.415503

In my input files I do not have a box vector like the one shown above.

I tried to run the same input on my server, which has two processors, and up to
now, as I am writing this email, the run has been going without any errors at all.

I have compiled gromacs-4.0.2 with fftw-3.2. Any files or other information
you need will be at your disposal.

Thank you,
Nikos


