----- Original Message -----
From: xho...@sohu.com
Date: Tuesday, June 1, 2010 21:59
Subject:  Re: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” 
occurs when using the ‘particle decomposition’ option.
To: Discussion list for GROMACS users <gmx-users@gromacs.org>

> 
> Hi, Mark,
> Thanks for the reply! 
> It seems that I got something mixed up. At the beginning, I used 
> ‘constraints = all-bonds’ and ‘domain decomposition’.
> When the simulation scales to more than 2 processes, an error like the one 
> below occurs:

The "domain_decomposition" .mdp flag is an artefact of pre-GROMACS-4 
development of DD. It does nothing. Forget about it. DD is enabled by default 
unless you use mdrun -pd.
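
For example (illustrative command lines only; substitute your own binary name 
and .tpr):

  mpiexec -np 6 mdrun_mpi -s topol.tpr        # domain decomposition (the default)
  mpiexec -np 6 mdrun_mpi -pd -s topol.tpr    # particle decomposition

There is no .mdp setting involved in making that choice in GROMACS 4.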

> ####################
> Fatal error: There is no domain decomposition for 6 nodes that is compatible 
> with the given box and a minimum cell size of 2.06375 nm
> Change the number of nodes or mdrun option -rcon or -dds or your LINCS 
> settings
> Look in the log file for details on the domain decomposition
> ####################
>  

With DD and all-bonds, the coupled constraints create a minimum cell diameter 
that must be satisfied on all processors. Your system is too small for this to 
be true. The manual sections on DD mention this, though perhaps you wouldn't 
pick that up on a first reading.
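
If you really want to force DD onto more processors, the -rcon and -dds options 
named in that error message let you relax mdrun's conservative estimate, e.g. 
something like this (the numbers are placeholders, not a recommendation, and 
pushing them too far can make P-LINCS miss constraints):

  mpiexec -np 6 mdrun_mpi -rcon 1.2 -dds 0.6 -s topol.tpr

Here -rcon sets the maximum P-LINCS communication distance explicitly (0, the 
default, means mdrun estimates it) and -dds is the minimum allowed scaling of 
the DD cell size for dynamic load balancing. For a small system the cleaner fix 
is usually just fewer processors.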

> I referred to the manual and found no answer. Then I turned to ‘particle 
> decomposition’ and tried all kinds of things, including changing MPICH to 
> LAM/MPI, changing GROMACS from V4.0.5 to V4.0.7, and adjusting the mdp file 
> (e.g. ‘constraints = hbonds’ or no PME), and none of these took effect! 
> I thought I had tried ‘constraints = hbonds’ with ‘domain decomposition’, 
> at least with LAM/MPI.

PD might fail for a similar reason, I suppose.

> However, when I tried ‘constraints = hbonds’ and ‘domain decomposition’ under 
> MPICH today, it scaled well to more than 2 processes! And now it also scales 
> well under LAM/MPI using ‘constraints = hbonds’ and ‘domain decomposition’!

Yep. Your constraints are not so tightly coupled now.

> So it seems the key for ‘domain decomposition’ is ‘constraints = hbonds’.

Knowing how your tools work is key :-) The problem with complex tools like 
GROMACS is knowing what's worth knowing :-)

>  
> Of course, the simulation still crashed when using ‘particle decomposition’ 
> with ‘constraints = hbonds or all-bonds’, and I don’t know why.

Again, your system is probably too small to be bothered with parallelising with 
constraints.

> I use the double-precision version and the NPT ensemble to perform a PCA!

I doubt that you need to collect data in double precision. Any supposed extra 
accuracy of integration is probably getting swamped by noise from the temperature 
coupling. I suppose you may wish to run the analysis tool in double, but it'll 
read a single-precision trajectory just fine. Using single precision will make 
things more than a factor of two faster.
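
If you want to try that, single precision is the default when building 4.0.x, so 
something like the usual autotools build (a sketch only; adjust the prefix and 
suffix to your installation) gives you a single-precision MPI mdrun:

  ./configure --enable-mpi --program-suffix=_mpi    # omit --enable-double
  make mdrun && make install-mdrun

Your existing double-precision analysis tools will read its .trr/.xtc output 
without any conversion.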

Mark

>  > 
> ----- Original Message -----
> From: xho...@sohu.com
> Date: Tuesday, June 1, 2010 11:53
> Subject: [gmx-users] “Fatal error in PMPI_Bcast: Other MPI error, …..” occurs 
> when using the ‘particle decomposition’ option.
> To: gmx-users <gmx-users@gromacs.org>


> 
> > Hi, everyone of gmx-users,
> > 
> > I met a problem when I use the ‘particle decomposition’ option 
> > in an NPT MD simulation of Engrailed Homeodomain (En) in a Cl- 
> > neutralized water box. It just crashed with the error “Fatal 
> > error in PMPI_Bcast: Other MPI error, error stack: …..”. 
> > However, I’ve tried ‘domain decomposition’ and everything is 
> > OK! I use GROMACS 4.0.5 and 4.0.7; the MPI library is mpich2-
> > 1.2.1p1. The system box size is (5.386 nm)^3. The MDP file is 
> > listed below:
> > ########################################################
> > title                    = En
> > ;cpp                      = /lib/cpp
> > ;include                  = -I../top
> > define                   = 
> > integrator               = md
> > dt                       = 0.002
> > nsteps                   = 3000000
> > nstxout                  = 500
> > nstvout                  = 500
> > nstlog                   = 250
> > nstenergy                = 250
> > nstxtcout                 = 500
> > comm-mode                = Linear
> > nstcomm                  = 1
> > 
> > ;xtc_grps                 = Protein
> > energygrps               = protein non-protein
> > 
> > nstlist                  = 10
> > ns_type                  = grid
> > pbc                      = xyz      ;default xyz
> > ;periodic_molecules      = yes      ;default no
> > rlist                    = 1.0
> > 
> > coulombtype              = PME
> > rcoulomb                 = 1.0
> > vdwtype                  = Cut-off
> > rvdw                     = 1.4
> > fourierspacing           = 0.12
> > fourier_nx               = 0
> > fourier_ny               = 0
> > fourier_nz               = 0
> > pme_order                = 4
> > ewald_rtol               = 1e-5
> > optimize_fft             = yes
> > 
> > tcoupl                   = v-rescale
> > tc_grps                  = protein non-protein
> > tau_t                    = 0.1  0.1
> > ref_t                    = 298  298
> > Pcoupl                   = Parrinello-Rahman
> > pcoupltype               = isotropic
> > tau_p                    = 0.5
> > compressibility          = 4.5e-5
> > ref_p                    = 1.0
> > 
> > gen_vel                  = yes
> > gen_temp                 = 298
> > gen_seed                 = 173529
> > 
> > constraints              = hbonds
> > lincs_order              = 10
> > ########################################################
> > 
> > When I conduct MD using “nohup mpiexec -np 2 mdrun_dmpi -s 
> > 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 12_NTPmd.pdb -e 
> > 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”, everything is OK.
> > 
> > Since the system doesn’t support more than 2 processes under the 
> > ‘domain decomposition’ option, it took me about 30 days to 
> > calculate a 6 ns trajectory. Then I decided to use the ‘particle 
> 
> Why no more than 2? What GROMACS version? Why are you using double precision 
> with temperature coupling?
> 
> MPICH has known issues. Use OpenMPI.
> 
> > decomposition’ option. The command line is “nohup mpiexec -np 6 
> > mdrun_dmpi -pd -s 11_Trun.tpr -g 12_NTPmd.log -o 12_NTPmd.trr -c 
> > 12_NTPmd.pdb -e 12_NTPmd_ener.edr -cpo 12_NTPstate.cpt &”. And I 
> > got the crash in the nohup file like below:
> > ####################
> > Fatal error in PMPI_Bcast: Other MPI error, error stack:
> > PMPI_Bcast(1302)......................: MPI_Bcast(buf=0x8fedeb0, 
> > count=60720, MPI_BYTE, root=0, MPI_COMM_WORLD) failed
> > MPIR_Bcast(998).......................: 
> > MPIR_Bcast_scatter_ring_allgather(842): 
> > MPIR_Bcast_binomial(187)..............: 
> > MPIC_Send(41).........................: 
> > MPIC_Wait(513)........................: 
> > MPIDI_CH3I_Progress(150)..............: 
> > MPID_nem_mpich2_blocking_recv(948)....: 
> > MPID_nem_tcp_connpoll(1720)...........: 
> > state_commrdy_handler(1561)...........: 
> > MPID_nem_tcp_send_queued(127).........: writev to socket failed -
> > Bad address
> > rank 0 in job 25  cluster.cn_52655   caused 
> > collective abort of all ranks
> > exit status of rank 0: killed by signal 9
> > ####################
> > 
> > And the end of the log file is listed below:
> > ####################
> > ……..
> > ……..
> > ……..
> > ……..
> >    bQMMM            = FALSE
> >    QMconstraints    = 0
> >    QMMMscheme       = 0
> >    scalefactor      = 1
> > qm_opts:
> >    ngQM             = 0
> > ####################
> > 
> > I’ve searched the gmx-users mailing list and tried to adjust the MD 
> > parameters, but no solution was found. The "mpiexec -np x" 
> > option doesn't work except when x=1. I did find that when the 
> > whole En protein is restrained using position restraints 
> > (define = -DPOSRES), the ‘particle decomposition’ option works. 
> > However, this is not the kind of MD I want to conduct.
> >  
> > Could anyone help me with this problem? And I also want to know 
> > how I can accelerate this kind of MD (a long simulation of a 
> > small system) using GROMACS. Thanks a lot!
> > 
> > (Further information about the simulated system: the system has 
> > one En protein (54 residues, 629 atoms), a total of 4848 SPC/E 
> > waters, and 7 Cl- to neutralize the system. The system was 
> > minimized first. A 20 ps MD was also performed for the waters and 
> > ions before EM.)
> 
> This should be bread-and-butter with either decomposition up to at least 16 
> processors, for a correctly compiled GROMACS with a useful MPI library.
> 
> Mark
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
Please search the archive at http://www.gromacs.org/search before posting!
Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
Can't post? Read http://www.gromacs.org/mailing_lists/users.php
