[gmx-users] RE: Re: RE: Re: RE: About the binary identical results by restarting from the checkpoint file
is single threading? Thanks a lot.

Cheers,
Cuiying

> Message: 3
> Date: Sat, 15 Jun 2013 21:50:52 +0200
> From: Mark Abraham mark.j.abra...@gmail.com
> Subject: Re: [gmx-users] RE: Re: RE: About the binary identical results by restarting from the checkpoint file
> To: Discussion list for GROMACS users gmx-users@gromacs.org
> Message-ID: camnumaqbjd5lvcroybd4-qf04p4dn+wdpabnbdv4ge-ur-k...@mail.gmail.com
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Sat, Jun 15, 2013 at 9:00 PM, Cuiying Jian cuiying_j...@hotmail.com wrote:
>
> > Hi Mark,
> >
> > I tested the simulations again using the Berendsen thermostat. Still, I cannot get binary identical results. I did two sets of simulations:
> >
> > 1. Use Gromacs 4.5.2 installed on my personal computer:
>
> 4.6.2, I hope. Nobody is interested in reports about 4.5.2 :-)
>
> > Run 2 simulations using the command:
> >
> > mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -reprod
> >
> > (-nt 1 ensures that only 1 thread is started.) Terminate one simulation manually. Restart this simulation by:
> >
> > mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -cpi md.cpt -reprod -npme 0
> >
> > (-npme 0 ensures that the number of PME nodes for the restart is the same as that in the checkpoint file.) Compare the results with those from the continuous run.
>
> What does gmxcheck say when comparing the resulting ostensibly equivalent trajectory files? Please provide a snippet of output if it says things differ. We want to see how big the difference is. Also the top 20 lines of a .log file. Also, you can do the above procedure in a controlled manner in 4.6.2 by using mdrun -nsteps on the run you wish to stop prematurely. Might your FFT library be multi-threading behind your back?
>
> Mark
>
> > 2. Use Gromacs 4.0.7 installed on a cluster (only one processor is used during the simulation):
> >
> > Run 2 simulations using the command:
> >
> > mdrun_s -v -cpt 0 -s md.tpr -deffnm md -reprod
> >
> > Terminate one simulation manually. Restart this simulation by:
> >
> > mdrun_s -v -cpt 0 -cpi md.cpt -s md.tpr -deffnm md -reprod
> >
> > Compare the results with those from the continuous run.
> >
> > Still, I cannot get binary identical results. As mentioned earlier, the only case in which I can get binary identical results is SPC rigid water molecules (using the velocity rescaling thermostat in Gromacs 4.0.7). I guess that this problem may also be caused by the LINCS algorithm, which is used to constrain all bonds in every case except the rigid water case.
> >
> > Thanks a lot.
> >
> > Cheers,
> > Cuiying
> >
> > > Date: Mon, 3 Jun 2013 19:15:12 +0200
> > > From: Mark Abraham mark.j.abra...@gmail.com
> > > Subject: Re: [gmx-users] RE: About the binary identical results by restarting from the checkpoint file
> > > To: Discussion list for GROMACS users gmx-users@gromacs.org
> > > Message-ID: CAMNuMARBEZ=m=Y_M1= c5pzncgwv438mveydosf56r6ytc68...@mail.gmail.com
> > > Content-Type: text/plain; charset=ISO-8859-1
> > >
> > > On Mon, Jun 3, 2013 at 6:59 PM, Cuiying Jian cuiying_j...@hotmail.com wrote:
> > >
> > > > Hi Mark,
> > > >
> > > > Thanks for your reply. I tested restarting simulations with .cpt files in GROMACS 4.6.1, and the problems are still there, i.e. I cannot get results from restarted simulations that are binary identical to those from continuous simulations. The command I used for restarting is the following (only one processor is used during the simulations):
> > > >
> > > > mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
> > >
> > > This is not generally enough to generate a serial run in 4.6, by the way. GROMACS tries very hard to automatically use all the resources available in the best way. See mdrun -h for various -nt* options, and consult the pre-step-0 part of the .log file for feedback.
> > >
> > > > For further information, I attach my original .mdp file below:
> > > >
> > > > constraints = all-bonds ; convert all bonds to constraints.
> > > > integrator = md
> > > > dt = 0.002 ; ps !
> > > > nsteps = 1 ; total 2 ns.
> > > > nstcomm = 10 ; frequency for center of mass motion removal.
> > > > nstxout = 5 ; collect data every 10.0 ps.
> > > > nstxtcout = 5 ; frequency to write coordinates to xtc trajectory.
> > > > nstvout = 5 ; frequency to write velocities to output trajectory.
> > > > nstfout = 5 ; frequency to write forces to output trajectory.
> > > > nstlog = 5 ; frequency to write energies to log file.
> > > > nstenergy = 5 ; frequency to write energies to energy file.
> > > > nstlist = 1 ; frequency to update the neighbor list.
> > > > ns_type = grid
> > > > rlist = 1.4
> > > > coulombtype = PME
> > > > rcoulomb
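Mark's requests in this message (stop one run in a controlled way with mdrun -nsteps, compare the trajectories with gmxcheck, and show the top 20 lines of a .log file) could be scripted roughly as below. This is only a sketch under assumptions: the helpers `run` and `compare_restart`, the `DRY_RUN` variable, and the output names `cont` and `part` are made up for illustration; it also assumes that the shortened run leaves a checkpoint in part.cpt and that the restart appends to the part.* files. The mdrun flags are the ones quoted in the thread; `gmxcheck -f ... -f2 ...` compares two trajectory files.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN is set
# (useful for checking the sequence on a machine without GROMACS).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = number of steps after which the second run is stopped on purpose.
compare_restart() {
    stop_after=$1
    # Reference: one uninterrupted single-thread run.
    run mdrun -s md.tpr -deffnm cont -nt 1 -cpt 0 -reprod
    # Same input, stopped early in a controlled manner.
    run mdrun -s md.tpr -deffnm part -nt 1 -cpt 0 -reprod -nsteps "$stop_after"
    # Continue the stopped run from its checkpoint (assumed to be part.cpt).
    run mdrun -s md.tpr -deffnm part -nt 1 -cpt 0 -reprod -cpi part.cpt
    # Ask gmxcheck whether, and by how much, the two trajectories differ.
    run gmxcheck -f cont.trr -f2 part.trr
    # Top of the log file, where the thread and hardware setup is reported.
    run head -n 20 part.log
}
```

Calling `DRY_RUN=1; compare_restart 5000` only prints the five commands, so the sequence can be reviewed before anything is run; the step count 5000 is an arbitrary example value.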
[gmx-users] RE: Re: RE: About the binary identical results by restarting from the checkpoint file
Hi Mark,

I tested the simulations again using the Berendsen thermostat. Still, I cannot get binary identical results. I did two sets of simulations:

1. Use Gromacs 4.5.2 installed on my personal computer:

Run 2 simulations using the command:

mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -reprod

(-nt 1 ensures that only 1 thread is started.) Terminate one simulation manually. Restart this simulation by:

mdrun -s md.tpr -deffnm md -nt 1 -cpt 0 -cpi md.cpt -reprod -npme 0

(-npme 0 ensures that the number of PME nodes for the restart is the same as that in the checkpoint file.) Compare the results with those from the continuous run.

2. Use Gromacs 4.0.7 installed on a cluster (only one processor is used during the simulation):

Run 2 simulations using the command:

mdrun_s -v -cpt 0 -s md.tpr -deffnm md -reprod

Terminate one simulation manually. Restart this simulation by:

mdrun_s -v -cpt 0 -cpi md.cpt -s md.tpr -deffnm md -reprod

Compare the results with those from the continuous run.

Still, I cannot get binary identical results. As mentioned earlier, the only case in which I can get binary identical results is SPC rigid water molecules (using the velocity rescaling thermostat in Gromacs 4.0.7). I guess that this problem may also be caused by the LINCS algorithm, which is used to constrain all bonds in every case except the rigid water case.

Thanks a lot.

Cheers,
Cuiying

> Date: Mon, 3 Jun 2013 19:15:12 +0200
> From: Mark Abraham mark.j.abra...@gmail.com
> Subject: Re: [gmx-users] RE: About the binary identical results by restarting from the checkpoint file
> To: Discussion list for GROMACS users gmx-users@gromacs.org
> Message-ID: CAMNuMARBEZ=m=Y_M1=c5pzncgwv438mveydosf56r6ytc68...@mail.gmail.com
> Content-Type: text/plain; charset=ISO-8859-1
>
> On Mon, Jun 3, 2013 at 6:59 PM, Cuiying Jian cuiying_j...@hotmail.com wrote:
>
> > Hi Mark,
> >
> > Thanks for your reply. I tested restarting simulations with .cpt files in GROMACS 4.6.1, and the problems are still there, i.e. I cannot get results from restarted simulations that are binary identical to those from continuous simulations. The command I used for restarting is the following (only one processor is used during the simulations):
> >
> > mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
>
> This is not generally enough to generate a serial run in 4.6, by the way. GROMACS tries very hard to automatically use all the resources available in the best way. See mdrun -h for various -nt* options, and consult the pre-step-0 part of the .log file for feedback.
>
> > For further information, I attach my original .mdp file below:
> >
> > constraints = all-bonds ; convert all bonds to constraints.
> > integrator = md
> > dt = 0.002 ; ps !
> > nsteps = 1 ; total 2 ns.
> > nstcomm = 10 ; frequency for center of mass motion removal.
> > nstxout = 5 ; collect data every 10.0 ps.
> > nstxtcout = 5 ; frequency to write coordinates to xtc trajectory.
> > nstvout = 5 ; frequency to write velocities to output trajectory.
> > nstfout = 5 ; frequency to write forces to output trajectory.
> > nstlog = 5 ; frequency to write energies to log file.
> > nstenergy = 5 ; frequency to write energies to energy file.
> > nstlist = 1 ; frequency to update the neighbor list.
> > ns_type = grid
> > rlist = 1.4
> > coulombtype = PME
> > rcoulomb = 1.4
> > vdwtype = cut-off
> > rvdw = 1.4
> > pme_order = 8 ; use 6, 8 or 10 when running in parallel
> > ewald_rtol = 1e-5
> > optimize_fft = yes
> > DispCorr = no ; don't apply any correction
> > ;open LINCS
> > constraint_algorithm = LINCS
> > lincs_order = 4 ; highest order in the expansion of the constraint coupling matrix
> > lincs_warnangle = 30 ; maximum angle that a bond can rotate before LINCS will complain
> > lincs_iter = 1 ; number of iterations to correct for a rotational lengthening in LINCS
> > ; Temperature coupling is on
> > Tcoupl = v-rescale
>
> This coupling algorithm has a stochastic component, and at least at some points in history the random number generator was either not checkpointed properly, or not propagated in parallel properly. I'm not sure offhand if any of that has been fixed yet (I doubt it), but you can test (parts of) this hypothesis by using Berendsen (in any GROMACS 4.x), or really being sure you've run a single thread
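To try the Berendsen test suggested above without editing the original .mdp by hand, a copy with only the thermostat line changed can be generated. This is a minimal sketch: the function name `use_berendsen` and the file names in the usage line are assumptions, and the new .mdp still has to go through grompp to produce a fresh .tpr before any run is repeated.

```shell
#!/bin/sh
# Write a copy of an .mdp file in which the Tcoupl setting is replaced by
# "berendsen". Pcoupl and Pcoupltype are left alone because the pattern is
# anchored at the start of the line.
# $1 = input .mdp, $2 = output .mdp
use_berendsen() {
    sed 's/^\([[:space:]]*[Tt]coupl[[:space:]]*=\).*/\1 berendsen/' "$1" > "$2"
}
```

For example, `use_berendsen md.mdp md_berendsen.mdp` leaves every other line of the file untouched, so any remaining difference between the continuous and the restarted run can no longer be blamed on the stochastic term of v-rescale.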
[gmx-users] RE: About the binary identical results by restarting from the checkpoint file
Hi Mark,

Sorry for my carelessness. I checked my email again and saw your reply. I get your point that my problem may be caused, at least in part, by the RNG of the thermostat. Thanks again, and sorry for bothering you through my carelessness.

Cheers,
Cuiying

> On Fri, Jun 14, 2013 at 12:52 AM, Cuiying Jian cuiying_j...@hotmail.com wrote:
>
> > Hi GROMACS Users,
> >
> > I am sorry if you are bothered by my second post about this topic. But the fact is that I tested restarting simulations with .cpt files in GROMACS 4.6.1, and I still cannot get results from restarted simulations that are binary identical to those from continuous simulations. The command I used for restarting is the following (only one processor is used during the simulations):
> >
> > mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
>
> I answered you 11 days ago and you are re-posting with nothing new. Why?
>
> Mark
>
> > For further information, I attach my original .mdp file below:
> >
> > constraints = all-bonds ; convert all bonds to constraints.
> > integrator = md
> > dt = 0.002 ; ps !
> > nsteps = 1 ; total 2 ns.
> > nstcomm = 10 ; frequency for center of mass motion removal.
> > nstxout = 5 ; collect data every 10.0 ps.
> > nstxtcout = 5 ; frequency to write coordinates to xtc trajectory.
> > nstvout = 5 ; frequency to write velocities to output trajectory.
> > nstfout = 5 ; frequency to write forces to output trajectory.
> > nstlog = 5 ; frequency to write energies to log file.
> > nstenergy = 5 ; frequency to write energies to energy file.
> > nstlist = 1 ; frequency to update the neighbor list.
> > ns_type = grid
> > rlist = 1.4
> > coulombtype = PME
> > rcoulomb = 1.4
> > vdwtype = cut-off
> > rvdw = 1.4
> > pme_order = 8 ; use 6, 8 or 10 when running in parallel
> > ewald_rtol = 1e-5
> > optimize_fft = yes
> > DispCorr = no ; don't apply any correction
> > ;open LINCS
> > constraint_algorithm = LINCS
> > lincs_order = 4 ; highest order in the expansion of the constraint coupling matrix
> > lincs_warnangle = 30 ; maximum angle that a bond can rotate before LINCS will complain
> > lincs_iter = 1 ; number of iterations to correct for a rotational lengthening in LINCS
> > ; Temperature coupling is on
> > Tcoupl = v-rescale
> > tau_t = 0.1
> > tc-grps = HEP
> > ref_t = 300
> > ; Pressure coupling is on
> > Pcoupl = parrinello-rahman
> > Pcoupltype = isotropic
> > tau_p = 1.0
> > compressibility = 4.5e-5
> > ref_p = 1.0
> > ; generate velocity is on at 300 K.
> > gen_vel = yes
> > gen_temp = 300.0
> > gen_seed = -1
> >
> > Is there something wrong with my .mdp file or my command? Thanks a lot.
> >
> > Cheers,
> > Cuiying
> >
> > > From: cuiying_j...@hotmail.com
> > > To: gmx-users@gromacs.org
> > > Subject: RE: About the binary identical results by restarting from the checkpoint file
> > > Date: Mon, 3 Jun 2013 16:59:31 +
> > >
> > > Hi Mark,
> > >
> > > Thanks for your reply. I tested restarting simulations with .cpt files in GROMACS 4.6.1, and the problems are still there, i.e. I cannot get results from restarted simulations that are binary identical to those from continuous simulations. The command I used for restarting is the following (only one processor is used during the simulations):
> > >
> > > mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
> > >
> > > For further information, I attach my original .mdp file below:
> > >
> > > constraints = all-bonds ; convert all bonds to constraints.
> > > integrator = md
> > > dt = 0.002 ; ps !
> > > nsteps = 1 ; total 2 ns.
> > > nstcomm = 10 ; frequency for center of mass motion removal.
> > > nstxout = 5 ; collect data every 10.0 ps.
> > > nstxtcout = 5 ; frequency to write coordinates to xtc trajectory.
> > > nstvout = 5 ; frequency to write velocities to output trajectory.
> > > nstfout = 5 ; frequency to write forces to output trajectory.
> > > nstlog = 5 ; frequency to write
[gmx-users] RE: About the binary identical results by restarting from the checkpoint file
= no ; don't apply any correction
;open LINCS
constraint_algorithm = LINCS
lincs_order = 4 ; highest order in the expansion of the constraint coupling matrix
lincs_warnangle = 30 ; maximum angle that a bond can rotate before LINCS will complain
lincs_iter = 1 ; number of iterations to correct for a rotational lengthening in LINCS
; Temperature coupling is on
Tcoupl = v-rescale
tau_t = 0.1
tc-grps = HEP
ref_t = 300
; Pressure coupling is on
Pcoupl = parrinello-rahman
Pcoupltype = isotropic
tau_p = 1.0
compressibility = 4.5e-5
ref_p = 1.0
; generate velocity is on at 300 K.
gen_vel = yes
gen_temp = 300.0
gen_seed = -1

Is there something wrong with my .mdp file or my command? Thanks a lot.

Cheers,
Cuiying

> On Sun, Jun 2, 2013 at 10:37 PM, Cuiying Jian cuiying_j...@hotmail.com wrote:
>
> > Hi GROMACS Users,
> >
> > These days, I am testing restarting simulations with .cpt files. I already set nstlist = 1 in the .mdp file. I can restart my simulations (which are stopped manually) with the following command (version 4.0.7):
> >
> > mpiexec mdrun_s_mpi -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod
> >
> > -reprod is used to force binary identical simulations. The restarted simulations use the same number of processors as the interrupted simulation. The only case in which I get results that are binary identical to those from the continuous simulations (which are not stopped manually) is SPC water molecules. For any other molecules (like -heptane), I never get results that are binary identical to those from the continuous simulations.
> >
> > I also tried to generate new .tpr files by:
> >
> > tpbconv_s -s md.tpr -f md.trr -e md.edr -c md_c.tpr -cont
> >
> > and then:
> >
> > mpiexec mdrun_s_mpi -v -s md_c.tpr -cpt 0 -cpi md.cpt -deffnm md_c -reprod
> >
> > But I still cannot get binary identical results. I also tested the simulations with only one processor, and binary identical results are still not obtained. Using double precision does not solve the problem.
> >
> > I think that the above problems arise because some information may not be stored while the simulation is running.
>
> That seems likely. The leading candidate would be a random number generator you're using for a stochastic integrator. Your .mdp file would have been useful.
>
> > On the other hand, if I run two independent simulations using exactly the same number of processors, the same commands and the same input files, i.e.
> >
> > mpiexec mdrun_s_mpi -v -s md.tpr -deffnm md -reprod
> >
> > I always get binary identical results from these two independent simulations. I understand that MD is chaotic, and that if we run a simulation for a long enough time, the results should converge. Also, there are factors which may affect the reproducibility, as described on the GROMACS website. But for my purpose, I am curious whether there are methods through which I can get binary identical results from restarted and continuous simulations. Thanks a lot.
>
> There are ways to be fully reproducible, but probably not every combination of algorithms has that property. 4.0.7 is so old no problem will be fixed, unless it can also be shown in 4.6 ;-)
>
> Mark

--
gmx-users mailing list gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
[gmx-users] RE: About the binary identical results by restarting from the checkpoint file
Hi Mark,

Thanks for your reply. I tested restarting simulations with .cpt files in GROMACS 4.6.1, and the problems are still there, i.e. I cannot get results from restarted simulations that are binary identical to those from continuous simulations. The command I used for restarting is the following (only one processor is used during the simulations):

mdrun -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod

For further information, I attach my original .mdp file below:

constraints = all-bonds ; convert all bonds to constraints.
integrator = md
dt = 0.002 ; ps !
nsteps = 1 ; total 2 ns.
nstcomm = 10 ; frequency for center of mass motion removal.
nstxout = 5 ; collect data every 10.0 ps.
nstxtcout = 5 ; frequency to write coordinates to xtc trajectory.
nstvout = 5 ; frequency to write velocities to output trajectory.
nstfout = 5 ; frequency to write forces to output trajectory.
nstlog = 5 ; frequency to write energies to log file.
nstenergy = 5 ; frequency to write energies to energy file.
nstlist = 1 ; frequency to update the neighbor list.
ns_type = grid
rlist = 1.4
coulombtype = PME
rcoulomb = 1.4
vdwtype = cut-off
rvdw = 1.4
pme_order = 8 ; use 6, 8 or 10 when running in parallel
ewald_rtol = 1e-5
optimize_fft = yes
DispCorr = no ; don't apply any correction
;open LINCS
constraint_algorithm = LINCS
lincs_order = 4 ; highest order in the expansion of the constraint coupling matrix
lincs_warnangle = 30 ; maximum angle that a bond can rotate before LINCS will complain
lincs_iter = 1 ; number of iterations to correct for a rotational lengthening in LINCS
; Temperature coupling is on
Tcoupl = v-rescale
tau_t = 0.1
tc-grps = HEP
ref_t = 300
; Pressure coupling is on
Pcoupl = parrinello-rahman
Pcoupltype = isotropic
tau_p = 1.0
compressibility = 4.5e-5
ref_p = 1.0
; generate velocity is on at 300 K.
gen_vel = yes
gen_temp = 300.0
gen_seed = -1

Is there something wrong with my .mdp file or my command? Thanks a lot.
Cheers,
Cuiying
[gmx-users] About the binary identical results by restarting from the checkpoint file
Hi GROMACS Users,

These days, I am testing restarting simulations with .cpt files. I already set nstlist = 1 in the .mdp file. I can restart my simulations (which are stopped manually) with the following command (version 4.0.7):

mpiexec mdrun_s_mpi -v -s md.tpr -cpt 0 -cpi md.cpt -deffnm md -reprod

-reprod is used to force binary identical simulations. The restarted simulations use the same number of processors as the interrupted simulation. The only case in which I get results that are binary identical to those from the continuous simulations (which are not stopped manually) is SPC water molecules. For any other molecules (like -heptane), I never get results that are binary identical to those from the continuous simulations.

I also tried to generate new .tpr files by:

tpbconv_s -s md.tpr -f md.trr -e md.edr -c md_c.tpr -cont

and then:

mpiexec mdrun_s_mpi -v -s md_c.tpr -cpt 0 -cpi md.cpt -deffnm md_c -reprod

But I still cannot get binary identical results. I also tested the simulations with only one processor, and binary identical results are still not obtained. Using double precision does not solve the problem.

I think that the above problems arise because some information may not be stored while the simulation is running. On the other hand, if I run two independent simulations using exactly the same number of processors, the same commands and the same input files, i.e.

mpiexec mdrun_s_mpi -v -s md.tpr -deffnm md -reprod

I always get binary identical results from these two independent simulations. I understand that MD is chaotic, and that if we run a simulation for a long enough time, the results should converge. Also, there are factors which may affect the reproducibility, as described on the GROMACS website. But for my purpose, I am curious whether there are methods through which I can get binary identical results from restarted and continuous simulations. Thanks a lot.
Cheers,
Cuiying
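"Binary identical" in the question above can be checked literally by comparing the output files byte for byte. The helper below is a small sketch, not something from the thread: the function names `same_bytes` and `report_pair` and the directory names in the usage line are assumptions. It is meant for trajectory or energy files; .log files contain dates and timings and would differ even between two otherwise identical runs.

```shell
#!/bin/sh
# Succeed only if the two files have exactly the same bytes.
same_bytes() {
    cmp -s "$1" "$2"
}

# Print a one-line verdict for a pair of files.
report_pair() {
    if same_bytes "$1" "$2"; then
        echo "identical: $1 $2"
    else
        echo "differ: $1 $2"
    fi
}
```

A call such as `report_pair continuous/md.trr restarted/md.trr` then gives a yes/no answer; when the files differ, gmxcheck is the tool to ask how large the difference is.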