Re: [gmx-users] RE: Re: RE: About the binary identical results by restarting from the checkpoint file

2013-06-15 Thread Mark Abraham
On Sat, Jun 15, 2013 at 9:00 PM, Cuiying Jian wrote:

> Hi Mark,
>
> I tested the simulations again using the Berendsen thermostat. Still, I
> cannot get binary identical results.
> I did two sets of simulations:
> 1. Using Gromacs 4.5.2 installed on my personal computer:
>

4.6.2, I hope. Nobody is interested in reports about 4.5.2 :-)


> Run 2 simulations using the command:
> mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -reprod
> (-nt 1 ensures that only one thread is started). Terminate one simulation
> manually. Restart this simulation with:
> mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -cpi md.cpt -reprod -npme 0
> (-npme 0 ensures that the number of PME nodes for the restart is the same as
> in the checkpoint file). Compare the results with those from the continuous
> run.


What does gmxcheck say when comparing the resulting ostensibly equivalent
trajectory files? Please provide a snippet of output if it says things
differ. We want to see how big "different" is. Also the top 20 lines of a
.log file.
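Quantifying "how big" a difference is can be done with a small script once the values from both runs are available as plain numbers. The sketch below is a hypothetical helper, not a GROMACS tool; it assumes coordinates (or energies) have already been exported from each run into flat lists of floats, a step not shown here. It reports the first mismatching index and the largest absolute deviation, which helps separate last-bit rounding noise from a genuine divergence.

```python
def compare_runs(a, b, tol=0.0):
    """Compare two equal-length sequences of floats taken from two runs.

    Returns (index of the first pair differing by more than tol, or None,
             largest absolute difference seen).
    With tol=0.0 any difference in value is reported.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    first = None
    max_diff = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        d = abs(x - y)
        max_diff = max(max_diff, d)
        if first is None and d > tol:
            first = i
    return first, max_diff


# Hypothetical values: one coordinate differs in the 7th decimal place.
continuous = [1.0, 2.0, 3.0]
restarted = [1.0, 2.0 + 1e-7, 3.0]
print(compare_runs(continuous, restarted))
```

A differences of order 1e-7 in single precision points at rounding; a large deviation points at something structurally different between the two runs.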

Also, you can do the above procedure in a controlled manner in 4.6.2 by
using mdrun -nsteps on the run you wish to stop prematurely.
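The controlled stop-and-restart test can be mimicked in a toy model to show why a checkpoint must capture every piece of evolving state. This Python sketch is purely illustrative and is not GROMACS code: the random kick stands in for a stochastic coupling term (v-rescale has one, as noted earlier in this thread), and the names `step`, `run`, `continuous` and `restarted` are hypothetical. Resuming after a checkpoint reproduces the uninterrupted run exactly only if the generator state is saved and restored together with positions and velocities.

```python
import random


def step(x, v, rng, dt=0.002, noise=0.1):
    """One step of a toy 1-D system: harmonic force plus a random velocity kick.

    The kick only stands in for a stochastic thermostat; it is NOT v-rescale.
    """
    v = v + (-x) * dt + noise * rng.gauss(0.0, 1.0)
    x = x + v * dt
    return x, v


def run(x, v, rng, nsteps):
    for _ in range(nsteps):
        x, v = step(x, v, rng)
    return x, v


def continuous(nsteps, seed=42):
    rng = random.Random(seed)
    return run(0.1, 0.0, rng, nsteps)


def restarted(nsteps, stop_at, seed=42, save_rng_state=True):
    rng = random.Random(seed)
    x, v = run(0.1, 0.0, rng, stop_at)
    # "Checkpoint": positions and velocities, plus (optionally) the RNG state.
    checkpoint = {"x": x, "v": v, "rng": rng.getstate() if save_rng_state else None}
    # Resume as a fresh process; without a saved state the stream restarts from the seed.
    rng2 = random.Random(seed)
    if checkpoint["rng"] is not None:
        rng2.setstate(checkpoint["rng"])
    return run(checkpoint["x"], checkpoint["v"], rng2, nsteps - stop_at)


full = continuous(1000)
print(full == restarted(1000, 400, save_rng_state=True))   # True
print(full == restarted(1000, 400, save_rng_state=False))  # False: random stream replayed from the start
```

In this toy the generator is the only state outside x and v, so saving it is sufficient; a real MD engine carries more such state, and the order of floating-point reductions can also differ between runs.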

Might your FFT library be multi-threading behind your back?
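Hidden multithreading matters because threads typically split a sum into partial results and combine them in an order that may vary, and floating-point addition is not associative, so a different order can change the last bits. A minimal illustration with plain Python doubles (nothing GROMACS-specific; the random data is arbitrary):

```python
import random

# Floating-point addition is not associative: grouping changes the rounding.
print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
print(0.1 + (0.2 + 0.3))  # 0.6

# The same effect at scale: summing identical values in a different order,
# as a different thread decomposition might, can change the last bits.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) * 10 ** rng.randint(-8, 8) for _ in range(10000)]
forward = sum(values)
shuffled = values[:]
rng.shuffle(shuffled)
print(forward == sum(shuffled), forward - sum(shuffled))
```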

Mark

> 2. Using Gromacs 4.0.7 installed on a cluster (only one processor is used
> during the simulation):
> Run 2 simulations using the command:
> mdrun_s -v -cpt 0 -s md.tpr -deffnm md -reprod
> Terminate one simulation manually. Restart this simulation with:
> mdrun_s -v -cpt 0 -cpi md.cpt -s md.tpr -deffnm md -reprod
> Compare the results with those from the continuous run. Still, I cannot get
> binary identical results. As mentioned earlier, the only case where I can get
> binary identical results is SPC rigid water (using the velocity rescaling
> thermostat in Gromacs 4.0.7). I suspect the problem may be caused by the
> LINCS algorithm, which constrains all bonds in every case except the rigid
> water case. Thanks a lot.
> Cheers, Cuiying

[gmx-users] RE: Re: RE: About the binary identical results by restarting from the checkpoint file

2013-06-15 Thread Cuiying Jian

Hi Mark, 
 
I tested the simulations again using the Berendsen thermostat. Still, I cannot
get binary identical results.
I did two sets of simulations:
1. Using Gromacs 4.5.2 installed on my personal computer:
Run 2 simulations using the command:
mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -reprod
(-nt 1 ensures that only one thread is started). Terminate one simulation
manually. Restart this simulation with:
mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -cpi md.cpt -reprod -npme 0
(-npme 0 ensures that the number of PME nodes for the restart is the same as in
the checkpoint file). Compare the results with those from the continuous run.
2. Using Gromacs 4.0.7 installed on a cluster (only one processor is used during
the simulation):
Run 2 simulations using the command:
mdrun_s -v -cpt 0 -s md.tpr -deffnm md -reprod
Terminate one simulation manually. Restart this simulation with:
mdrun_s -v -cpt 0 -cpi md.cpt -s md.tpr -deffnm md -reprod
Compare the results with those from the continuous run. Still, I cannot get
binary identical results. As mentioned earlier, the only case where I can get
binary identical results is SPC rigid water (using the velocity rescaling
thermostat in Gromacs 4.0.7). I suspect the problem may be caused by the LINCS
algorithm, which constrains all bonds in every case except the rigid water
case. Thanks a lot.
Cheers, Cuiying
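Checking whether two outputs are "binary identical" can be done independently of any GROMACS tool with a byte-by-byte comparison; reporting the first differing byte offset also shows where divergence begins. A minimal Python sketch, where `first_difference` is a hypothetical helper and the file names and contents in the demo are placeholders:

```python
import filecmp
import os
import tempfile


def first_difference(path_a, path_b, chunk_size=1 << 16):
    """Return the byte offset of the first difference, or None if the files are identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Locate the mismatch inside this chunk, or the point where one file ended.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                return offset + min(len(a), len(b))
            if not a:  # both files ended at the same point with no difference
                return None
            offset += len(a)


with tempfile.TemporaryDirectory() as d:
    p1 = os.path.join(d, "continuous.bin")
    p2 = os.path.join(d, "restarted.bin")
    with open(p1, "wb") as f:
        f.write(b"\x00\x01\x02\x03")
    with open(p2, "wb") as f:
        f.write(b"\x00\x01\xff\x03")
    print(filecmp.cmp(p1, p2, shallow=False))  # False
    print(first_difference(p1, p2))            # 2
```

`filecmp.cmp(..., shallow=False)` gives only a yes/no answer; the helper adds the offset, which can then be related to a frame in the trajectory.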

> Date: Mon, 3 Jun 2013 19:15:12 +0200
> From: Mark Abraham 
> Subject: Re: [gmx-users] RE: About the binary identical results by
>   restarting  from the checkpoint file
> To: Discussion list for GROMACS users 
> Message-ID:
>   
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On Mon, Jun 3, 2013 at 6:59 PM, Cuiying Jian wrote:
> 
> > Hi Mark,
> >
> > Thanks for your reply. I tested restarting simulations from .cpt files with
> > GROMACS 4.6.1, and the problems are still there, i.e. I cannot get binary
> > identical results from restarted simulations compared with continuous
> > simulations. The command I used for restarting is the following (only
> > one processor is used during the simulations):
> > mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
> >
> 
> This is not generally enough to generate a serial run in 4.6, by the way.
> GROMACS tries very hard to automatically use all the resources available in
> the best way. See mdrun -h for various -nt* options, and consult the
> pre-step-0 part of the .log file for feedback.
> 
> > For further information, I attach my original .mdp file below:
> > constraints          = all-bonds  ; convert all bonds to constraints.
> > integrator           = md
> > dt                   = 0.002      ; ps !
> > nsteps               = 1          ; total 2 ns.
> > nstcomm              = 10         ; frequency for center of mass motion removal.
> > nstxout              = 5          ; collect data every 10.0 ps.
> > nstxtcout            = 5          ; frequency to write coordinates to xtc trajectory.
> > nstvout              = 5          ; frequency to write velocities to output trajectory.
> > nstfout              = 5          ; frequency to write forces to output trajectory.
> > nstlog               = 5          ; frequency to write energies to log file.
> > nstenergy            = 5          ; frequency to write energies to energy file.
> > nstlist              = 1          ; frequency to update the neighbor list.
> > ns_type              = grid
> > rlist                = 1.4
> > coulombtype          = PME
> > rcoulomb             = 1.4
> > vdwtype              = cut-off
> > rvdw                 = 1.4
> > pme_order            = 8          ; use 6, 8 or 10 when running in parallel
> > ewald_rtol           = 1e-5
> > optimize_fft         = yes
> > DispCorr             = no         ; don't apply any correction
> > ; open LINCS
> > constraint_algorithm = LINCS
> > lincs_order          = 4          ; highest order in the expansion of the constraint coupling matrix
> > lincs_warnangle      = 30         ; maximum angle that a bond can rotate before LINCS will complain
> > lincs_iter           = 1          ; number of iterations to correct for a rotational lengthening in LINCS
> > ; Temperature coupling is on
> > Tcoupl               = v-rescale
> >
> 
> This coupling algorithm has a stochastic component, and at least at some
> points in history the random number generator was either not checkpointed
> properly, or not propagated in parallel properly. I'm not sure offhand if
> any of that has been fixed yet (I doubt it), but you can test (parts of)
> this hypothesis by using Berendsen (in any GROMACS 4.x), or really being
> sure you've run a single thread.
> 
> If Be