[gmx-users] Problem on continuing MD

2017-11-01 Thread YanhuaOuyang
Dear GROMACS users,

Today I continued the MD twice, in two separate directories, from the same
point of the trajectory (for example 100 ns), using the same CPU, the same
checkpoint file and the same server node. To my surprise, the energy
information differs between the two continued log output files, shown below.


continue_md_01.log:

Started mdrun on rank 0 Wed Nov  1 23:17:08 2017

           Step           Time         Lambda
           5000           10.0            0.0

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    2.82366e+02   -2.04258e+02   -7.74514e+02    2.31539e-06

DD  step 5019 load imb.: force 29.5%  pme mesh/force 0.964

At step 5020 the performance loss due to force load imbalance is 11.1 %

           Step           Time         Lambda
       50001000           12.0            0.0

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    4.65447e+02    1.50124e+03    1.50444e+03    7.92082e+01    1.64421e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    4.18616e+02    9.80230e+03    1.11198e+05   -8.68735e+03   -1.15433e+06
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    5.95908e+03   -1.03208e+06    1.64017e+05   -8.68059e+05   -1.90761e+05
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    2.82747e+02   -2.04258e+02   -9.31332e+02    3.06123e-06

DD  step 50001999  vol min/aver 0.880  load imb.: force 10.0%  pme mesh/force 1.059

...




continue_md_02.log:

Started mdrun on rank 0 Wed Nov  1 23:39:51 2017

           Step           Time         Lambda
           5000           10.0            0.0

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    4.67507e+02    1.47390e+03    1.44019e+03    6.93280e+01    8.29478e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    4.17297e+02    9.82076e+03    1.12930e+05   -8.68735e+03   -1.15522e+06
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    5.88613e+03   -1.03132e+06    1.63796e+05   -8.67527e+05   -1.90771e+05
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    2.82366e+02   -2.04258e+02   -7.74505e+02    2.31539e-06

DD  step 5019 load imb.: force 18.3%  pme mesh/force 0.950

At step 5020 the performance loss due to force load imbalance is 6.9 %

           Step           Time         Lambda
       50001000           12.0            0.0

   Energies (kJ/mol)
           Bond            U-B    Proper Dih.  Improper Dih.      CMAP Dih.
    4.51321e+02    1.43914e+03    1.56368e+03    9.15439e+01    1.64274e+01
          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)
    4.07426e+02    9.84156e+03    1.12168e+05   -8.68735e+03   -1.15440e+06
   Coul. recip.      Potential    Kinetic En.   Total Energy  Conserved En.
    5.88309e+03   -1.03123e+06    1.62769e+05   -8.68456e+05   -1.90745e+05
    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
    2.80597e+02   -2.04258e+02   -8.80279e+02    2.64599e-06

DD  step 50001999  vol min/aver 0.905  load imb.: force 32.7%  pme mesh/force 1.034

...
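To put numbers on the difference between the two logs, here is a small Python sketch that computes the relative difference between the runs for a few terms copied by hand from the excerpts above (it does not parse the .log files, and only a subset of terms is included). Read this way, the potential energy, kinetic energy, conserved energy and temperature agree to all printed digits at step 5000, while the pressure already differs in the fifth significant digit (-7.74514e+02 vs -7.74505e+02). By step 50001000 every listed term differs, from below 0.01 % for the conserved energy to about 5 % for the pressure.

```python
# Selected energy terms copied from the two log excerpts above.
# Keys are the step numbers as printed in those excerpts.
RUN_01 = {
    5000: {"Potential": -1.03132e+06, "Kinetic En.": 1.63796e+05,
           "Conserved En.": -1.90771e+05, "Temperature": 2.82366e+02,
           "Pressure (bar)": -7.74514e+02},
    50001000: {"Potential": -1.03208e+06, "Kinetic En.": 1.64017e+05,
               "Conserved En.": -1.90761e+05, "Temperature": 2.82747e+02,
               "Pressure (bar)": -9.31332e+02},
}
RUN_02 = {
    5000: {"Potential": -1.03132e+06, "Kinetic En.": 1.63796e+05,
           "Conserved En.": -1.90771e+05, "Temperature": 2.82366e+02,
           "Pressure (bar)": -7.74505e+02},
    50001000: {"Potential": -1.03123e+06, "Kinetic En.": 1.62769e+05,
               "Conserved En.": -1.90745e+05, "Temperature": 2.80597e+02,
               "Pressure (bar)": -8.80279e+02},
}


def relative_differences(a, b):
    """Return |a - b| / max(|a|, |b|) for every term present in both dicts."""
    diffs = {}
    for term in a.keys() & b.keys():
        scale = max(abs(a[term]), abs(b[term]))
        diffs[term] = abs(a[term] - b[term]) / scale if scale else 0.0
    return diffs


# Print one summary line per step, terms in alphabetical order.
for step in sorted(RUN_01):
    diffs = relative_differences(RUN_01[step], RUN_02[step])
    summary = ", ".join(f"{term}: {diffs[term]:.1e}" for term in sorted(diffs))
    print(f"step {step}: {summary}")
```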




As shown above, the energies differ from 12 ps onward (the MD was continued
from 100 ns). Since the conditions are identical, I would expect the two
continued runs to be the same. Why are they different? Does this mean an MD
run cannot be stopped and restarted, or moved from one server to another, when
we want to investigate dynamic properties, because the results change?
Does anyone know what the problem is?




Best regards,
Ouyang.






-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Problem on continuing MD

2017-11-01 Thread Mark Abraham
Hi,

See http://www.gromacs.org/Documentation/Terminology/Reproducibility. You
have non-reproducible (dynamic) load balancing. However, this is not a problem
unless your experimental design hinges upon being able to reproduce an exact
trajectory (in which case you will have a tough time getting performance).
You would also get a different trajectory if you rotated your initial system
by 90 degrees. Which one is "right"?

Mark
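The reasoning in the reply above can be illustrated in a few lines of Python. The sketch makes two assumptions explicit: that run-to-run differences start as last-bit rounding differences (floating-point addition is not associative, so summing the same force contributions in a different order, as happens when dynamic load balancing moves domain boundaries, can change the result), and that MD, like other chaotic systems, amplifies such differences exponentially. The logistic map x -> 4x(1 - x) is only a stand-in for an MD integrator; it is not what GROMACS computes.

```python
# 1) Floating-point addition is not associative: the same three numbers
#    summed in a different order give results that differ in the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # prints 0.6000000000000001 0.6


# 2) A chaotic map amplifies a tiny initial difference until the two
#    trajectories are unrelated. The logistic map stands in for MD here.
def first_divergence(x, y, threshold=0.1, max_steps=200):
    """Iterate x -> 4x(1 - x) on two starting points and return the first
    step at which they differ by more than `threshold`, or None."""
    for step in range(1, max_steps + 1):
        x = 4.0 * x * (1.0 - x)
        y = 4.0 * y * (1.0 - y)
        if abs(x - y) > threshold:
            return step
    return None


print("diverged after", first_divergence(0.3, 0.3 + 1e-12), "steps")
```

The exact step count depends on the starting point, but for this map the separation roughly doubles per iteration on average, so a 1e-12 offset reaches order one after a few dozen steps. Two continuations from the same checkpoint should therefore be expected to agree statistically, not bit for bit, unless reproducibility is explicitly enforced.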

On Wed, Nov 1, 2017 at 10:20 AM YanhuaOuyang <15901283...@163.com> wrote:


Re: [gmx-users] Problem on continuing MD

2017-11-01 Thread YanhuaOuyang
Dear Mark,


Thank you so much. I have read the page you linked and now understand why
this happens.


Best regards,
Ouyang 
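For completeness, a continuation from a checkpoint like the ones in this thread is started with `gmx mdrun -s <run input> -cpi <checkpoint>`. If bitwise-identical reruns really are required, mdrun has a `-reprod` option that tries to avoid optimizations affecting binary reproducibility, and `-dlb no` turns off dynamic load balancing; both can cost performance, and even then identical results can at best be expected with the same binary, hardware and number of ranks and threads. The helper below is a hypothetical sketch that only assembles the argument list (the file names `md.tpr` and `md.cpt` are placeholders); it does not run GROMACS.

```python
import shlex


def mdrun_continuation_cmd(tpr, checkpoint, reproducible=False):
    """Hypothetical helper: build (not run) a `gmx mdrun` checkpoint continuation."""
    cmd = ["gmx", "mdrun", "-s", tpr, "-cpi", checkpoint]
    if reproducible:
        # -reprod: avoid optimizations that affect binary reproducibility.
        # -dlb no: disable dynamic load balancing of the domain decomposition.
        cmd += ["-reprod", "-dlb", "no"]
    return cmd


print(shlex.join(mdrun_continuation_cmd("md.tpr", "md.cpt", reproducible=True)))
# prints: gmx mdrun -s md.tpr -cpi md.cpt -reprod -dlb no
```

To actually launch the run, the returned list could be passed to `subprocess.run(cmd, check=True)`.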








At 2017-11-01 17:39:17, "Mark Abraham"  wrote: