On 24/12/2010 3:28 AM, Wojtyczka, André wrote:
On 23/12/2010 10:01 PM, Wojtyczka, André wrote:
Dear Gromacs Enthusiasts.
I am experiencing problems with mdrun_mpi (4.5.3) on a Nehalem cluster.
Problem:
This runs fine:
mpiexec -np 72 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
This produces a segmentation fault:
mpiexec -np 128 /../mdrun_mpi -pd -s full031K_mdrun_ions.tpr
Unless you know you need it, don't use -pd. DD will be faster and is
probably better bug-tested too.
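That is, just drop -pd from your command line, e.g. something like (same run as above, only the flag removed):

mpiexec -np 128 /../mdrun_mpi -s full031K_mdrun_ions.tpr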
Mark
Hi Mark,
thanks for the push in that direction, but I am in the unfortunate situation where I really need -pd: I have long bonds, which is why my large system can only be decomposed into a small number of domains.
I'm not sure that PD has any advantage here. From memory it has to
create a 128x1x1 grid, and you can direct that with DD also.
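If you do want to force that layout with DD, something along these lines should work (an untested sketch; -dd sets the domain decomposition grid and -npme the number of separate PME nodes, and with all 128 ranks in the DD grid you would need -npme 0):

mpiexec -np 128 /../mdrun_mpi -dd 128 1 1 -npme 0 -s full031K_mdrun_ions.tpr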
The contents of your .log file will be far more helpful than stdout in
diagnosing what condition led to the problem.
Mark
So the only difference is the number of cores I am using.
mdrun_mpi was compiled with the Intel compiler 11.1.072 and my own FFTW3 installation.
No errors came up during configure, make mdrun, or make install-mdrun.
Is there some issue with threading or MPI?
If someone has a clue, please give me a hint. My .mdp parameters:
integrator = md
dt = 0.004
nsteps = 25000000
nstxout = 0
nstvout = 0
nstlog = 250000
nstenergy = 250000
nstxtcout = 12500
xtc_grps = protein
energygrps = protein non-protein
nstlist = 2
ns_type = grid
rlist = 0.9
coulombtype = PME
rcoulomb = 0.9
fourierspacing = 0.12
pme_order = 4
ewald_rtol = 1e-5
rvdw = 0.9
pbc = xyz
periodic_molecules = yes
tcoupl = nose-hoover
nsttcouple = 1
tc-grps = protein non-protein
tau_t = 0.1 0.1
ref_t = 310 310
Pcoupl = no
gen_vel = yes
gen_temp = 310
gen_seed = 173529
constraints = all-bonds
Error:
Getting Loaded...
Reading file full031K_mdrun_ions.tpr, VERSION 4.5.3 (single precision)
Loaded with Money
NOTE: The load imbalance in PME FFT and solve is 48%.
For optimal PME load balancing
PME grid_x (144) and grid_y (144) should be divisible by #PME_nodes_x (128)
and PME grid_y (144) and grid_z (144) should be divisible by #PME_nodes_y (1)
Step 0, time 0 (ps)
PSIlogger: Child with rank 82 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 79 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 2 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 1 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 100 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 97 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 98 exited on signal 11: Segmentation fault
PSIlogger: Child with rank 96 exited on signal 6: Aborted
...
PS: for now I don't care about the imbalanced PME load, unless it turns out to be related to my problem.
Cheers
André