Hi Gromacs users/developers,

we have two Gromacs installations on two different clusters, with the following software versions:

1) Cluster OLD (Myrinet)
   Gromacs 3.3.1
   (CentOS 4 / Rocks 4.1) kernel 2.6.9-22.ELsmp
   gcc 3.4.4
   fftw 3.1.2
   mpich-gm 1.2.7p1..18

2) Cluster NEW (Infiniband)
   Gromacs 4.0.4 / 4.0.3
   (CentOS 5) kernel 2.6.18-53.el5
   gcc 4.1.2 20070626 (Red Hat 4.1.2-14) / icc 10.1 (Build 20070913 Pack.ID: l_cc_p_10.1.008)
   fftw 3.2.1
   ofed131 - openmpi 1.2.6

Serial and parallel builds of Gromacs 4.0.3 and 4.0.4, in both single and double precision, were compiled with gcc (the deprecated 4.1.2; the tests either passed or failed with only minor discrepancies) and with the Intel 10.1 compilers.

We tried to reproduce on cluster NEW simple MD equilibrations of two different systems (proteins solvated in SPC water + counterions) that had run successfully on cluster OLD. As starting tpr files we used either the same ones produced and used with 3.3.1, or new 4.0.4 files.

The starting energies for both systems were essentially equal on the two clusters, as shown here for system 2:

----------------------------------------------------------------------------
Cluster NEW system 2:

           Step           Time         Lambda
              0        0.00000        0.00000

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.90343e+02    7.80369e+02    2.11086e+02    4.58020e+02    1.92904e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15749e+04    2.74030e-01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -3.76224e+05    7.83268e+04   -2.97897e+05    3.12792e+02    9.77797e+03
  Cons. rmsd ()
    2.19464e-05

----------------------------------------------------------------------------
Cluster OLD system 2:

   Rel. Constraint Deviation:  Max    between atoms     RMS
       Before LINCS         0.098014   1670   1671   0.006831
        After LINCS         0.000104    509    511   0.000022

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.90348e+02    7.80369e+02    2.11085e+02    4.58017e+02    1.92904e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.09286e+05   -1.43221e+03   -4.54033e+05   -5.15750e+04    2.74027e-01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -3.76224e+05    7.83267e+04   -2.97897e+05    3.12791e+02    9.78296e+03

----------------------------------------------------------------------------

Nevertheless, in both cases the simulations with Gromacs 3.3.1 ran without any problem (and provided good starting points for very stable production runs), while those performed with Gromacs 4.0.3 or 4.0.4 systematically started, after 2 ps or less, to exhibit wide oscillations in total energy and temperature, with a net upward drift in energy on both systems. On system 1 the temperature variations grew very rapidly, and all runs terminated prematurely with errors from LINCS or from the routines that calculate 1-4 interactions.

System 2 exhibited a smaller energy drift and rather steady, but still significant, temperature oscillations, so the 100 ps run (8 cores, double-precision parallel version compiled with the Intel compiler, starting from the original 3.3.1 tpr file) ended (apparently) regularly. However, the average energy was higher than in the corresponding 3.3.1 simulation, and the average temperature failed to reach the targeted value of 300 K. In particular, the protein suffered from poor thermal relaxation under the same conditions that worked flawlessly in the 3.3.1 simulations.

The final, average and r.m.s. values from the log files of the two corresponding runs on system 2 with 4.0.4 and 3.3.1 are:

----------------------------------------------------------------------------
Cluster NEW system 2:

           Step           Time         Lambda
          50000      100.00000        0.00000

Writing checkpoint, step 50000 at Thu Mar 12 16:21:03 2009

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.20200e+03    1.40303e+03    1.56905e+03    5.80966e+02    1.89786e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.93006e+04   -1.43727e+03   -4.96019e+05   -6.18250e+04    2.12927e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.50118e+05    8.53539e+04   -3.64764e+05    3.40853e+02    7.37194e+03
  Cons. rmsd ()
    6.33529e-05

        <======  ###############  ==>
        <====  A V E R A G E S  ====>
        <==  ###############  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    5.22068e+03    1.38036e+03    1.53871e+03    1.11820e+03    1.90723e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.38412e+04   -1.40961e+03   -4.82837e+05   -6.16567e+04    4.79992e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.38932e+05    8.23804e+04   -3.56552e+05    3.28979e+02    2.00692e+02
  Cons. rmsd ()
    0.00000e+00

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    7.45269e+00    7.02652e+00    6.08533e+00    3.18779e+02    9.90264e+02
             pV
   -6.99836e+03

   Total Virial (kJ/mol)
    3.09153e+04    3.94735e+01   -3.38472e+00
    3.94745e+01    3.11413e+04    2.26993e+02
   -3.38879e+00    2.26991e+02    3.08214e+04

   Pressure (bar)
    1.19126e+02    3.73568e+00    8.31997e+00
    3.73558e+00    4.54768e+02   -1.26308e+01
    8.32041e+00   -1.26307e+01    2.81831e+01

   Total Dipole (Debye)
    1.44566e+02    4.10437e+02    1.05756e+02

   Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
   Protein-Protein           -6.23914e+03   -5.96547e+03   -1.93349e+02    1.90723e+04    1.11820e+03
   Protein-Non-Protein       -5.33140e+03   -1.38636e+03   -1.87522e+02    0.00000e+00    0.00000e+00
   Non-Protein-Non-Protein   -4.71266e+05    8.11930e+04   -1.02874e+03    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    5.93312e+02    3.12714e+02    3.24327e+02

        <======  ###############################  ==>
        <====  R M S - F L U C T U A T I O N S  ====>
        <==  ###############################  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    9.66238e+02    1.07074e+02    1.99039e+02    8.27177e+02    7.63636e+02
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    2.05222e+04    4.49488e+01    2.39437e+04    2.70027e+02    3.11921e+03
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
    7.21911e+03    4.02779e+03    6.92839e+03    1.60846e+01    1.78298e+04
  Cons. rmsd ()
    0.00000e+00

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    8.06489e-02    7.60371e-02    6.58520e-02    1.03515e+01    3.21181e+01
             pV
    3.41845e+05

   Total Virial (kJ/mol)
    1.45509e+05    1.37533e+03    1.65462e+03
    1.37534e+03    2.49318e+05    1.79716e+03
    1.65462e+03    1.79717e+03    1.14997e+05

   Pressure (bar)
    1.52957e+04    1.45547e+02    1.74844e+02
    1.45548e+02    2.61231e+04    1.90420e+02
    1.74843e+02    1.90421e+02    1.21129e+04

   Total Dipole (Debye)
    3.03524e+02    2.43084e+02    2.30363e+02

   Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
   Protein-Protein            5.44055e+02    1.91174e+02    4.60987e+00    7.63636e+02    8.27177e+02
   Protein-Non-Protein        2.99090e+02    2.19500e+02    7.77253e+00    0.00000e+00    0.00000e+00
   Non-Protein-Non-Protein    2.32076e+04    2.01585e+04    3.30115e+01    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    6.32406e+01    1.61761e+01    6.73077e+01

----------------------------------------------------------------------------
Cluster OLD system 2:

           Step           Time         Lambda
          50000      100.00001        0.00000

   Rel. Constraint Deviation:  Max    between atoms     RMS
       Before LINCS         0.062015    369    370   0.007971
        After LINCS         0.000087    231    233   0.000021

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.70775e+03    1.04157e+03    7.93145e+02    5.36208e+02    1.91215e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.67921e+04   -1.45280e+03   -4.98410e+05   -6.19374e+04    6.32109e+02
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.60176e+05    7.54371e+04   -3.84739e+05    3.01252e+02   -1.61958e+02

Total NODE time on node 0: 2449.05
Average NODE time: 306.131
Load imbalance reduced performance to 800% of max

        <======  ###############  ==>
        <====  A V E R A G E S  ====>
        <==  ###############  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    2.65581e+03    1.10522e+03    8.58218e+02    5.56582e+02    1.91397e+04
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    7.76235e+04   -1.45624e+03   -4.98446e+05   -6.19220e+04    6.48663e+02
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
   -4.59237e+05    7.52388e+04   -3.83998e+05    3.00460e+02    2.82522e-01

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    7.36717e+00    6.94587e+00    6.01552e+00    3.07828e+02    1.02446e+03
             pV
   -2.19380e+01

   Total Virial (kJ/mol)
    2.50625e+04    3.74364e+01   -1.09807e+02
   -1.21909e+02    2.52114e+04   -2.24475e+01
   -1.09718e+02    6.28763e+01    2.49978e+04

   Pressure (bar)
    5.85471e+00   -9.46889e-01    1.42992e+01
    1.55935e+01   -1.28647e+01    5.65603e+00
    1.42304e+01   -3.13115e+00    7.85754e+00

   Total Dipole (Debye)
   -4.27471e+02    1.56256e+03    1.39198e+02

   Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
   Protein-Protein           -6.43284e+03   -6.06852e+03   -1.93594e+02    1.91397e+04    5.56582e+02
   Protein-Non-Protein       -6.07176e+03   -1.49620e+03   -1.99600e+02    0.00000e+00    0.00000e+00
   Non-Protein-Non-Protein   -4.85942e+05    8.51882e+04   -1.06305e+03    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    2.99892e+02    3.00493e+02    3.02237e+02

        <======  ###############################  ==>
        <====  R M S - F L U C T U A T I O N S  ====>
        <==  ###############################  ======>

   Energies (kJ/mol)
       G96Angle    Proper Dih.  Improper Dih.          LJ-14     Coulomb-14
    7.87191e+01    3.92754e+01    3.97988e+01    4.07835e+01    6.59014e+01
        LJ (SR)        LJ (LR)   Coulomb (SR)   Coul. recip. Position Rest.
    1.42174e+03    9.85270e+00    3.66032e+03    1.38822e+02    5.13795e+01
      Potential    Kinetic En.   Total Energy    Temperature Pressure (bar)
    2.79433e+03    1.14191e+03    3.71906e+03    4.56013e+00    4.15482e+02

          Box-X          Box-Y          Box-Z         Volume   Density (SI)
    1.73750e-02    1.63839e-02    1.41834e-02    2.20002e+00    7.10821e+00
             pV
    7.74026e+03

   Total Virial (kJ/mol)
    4.40353e+03    3.95656e+03    3.53923e+03
    4.91163e+03    6.62942e+03    4.84215e+03
    3.02705e+03    3.46706e+03    3.99411e+03

   Pressure (bar)
    4.80956e+02    4.27103e+02    3.82114e+02
    5.31242e+02    7.15522e+02    5.21285e+02
    3.27257e+02    3.74634e+02    4.39706e+02

   Total Dipole (Debye)
    2.65079e+02    3.01101e+02    2.66824e+02

   Epot (kJ/mol)                  Coul-SR          LJ-SR          LJ-LR        Coul-14          LJ-14
   Protein-Protein            6.55075e+01    4.63548e+01    6.28258e-01    6.59014e+01    4.07835e+01
   Protein-Non-Protein        2.40165e+02    1.09021e+02    3.92781e+00    0.00000e+00    0.00000e+00
   Non-Protein-Non-Protein    3.50328e+03    1.43654e+03    6.95241e+00    0.00000e+00    0.00000e+00

      T-Protein          T-SOL          T-CL-
    5.26064e+00    4.78751e+00    5.93233e+01

----------------------------------------------------------------------------

What could be the origin of such discrepancies between 3.3.1 and 4.0.3/4? Is any change in the MD protocol strongly suggested when converting input/script files from 3.3 to 4.0?

I searched the Gromacs mailing lists and docs, but I could not identify any useful hint or other cases of the same problem, so I apologize in advance if I have missed this information.

Best regards,
Pietro

--
Dr. Pietro Amodeo, PhD.
Istituto di Chimica Biomolecolare del CNR
Comprensorio "A. Olivetti", Edificio 70
Via Campi Flegrei 34
I-80078 Pozzuoli (Napoli) - Italy
Phone +39-0818675072
Fax +39-0818041770
Email pamo...@icmib.na.cnr.it
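
P.S. As a rough check of the statement that the starting energies are essentially equal, the step-0 values of system 2 can be compared term by term. What follows is only a minimal Python sketch: the numbers are copied from the two step-0 logs in this message, while the names (TERMS, NEW, OLD, relative_differences) are illustrative and not part of Gromacs.

```python
# Step-0 values for system 2, copied from the logs in this message
# (energies in kJ/mol, temperature in K, pressure in bar).
TERMS = [
    "G96Angle", "Proper Dih.", "Improper Dih.", "LJ-14", "Coulomb-14",
    "LJ (SR)", "LJ (LR)", "Coulomb (SR)", "Coul. recip.", "Position Rest.",
    "Potential", "Kinetic En.", "Total Energy", "Temperature", "Pressure (bar)",
]
NEW = [  # cluster NEW, Gromacs 4.0.x
    7.90343e+02, 7.80369e+02, 2.11086e+02, 4.58020e+02, 1.92904e+04,
    1.09286e+05, -1.43221e+03, -4.54033e+05, -5.15749e+04, 2.74030e-01,
    -3.76224e+05, 7.83268e+04, -2.97897e+05, 3.12792e+02, 9.77797e+03,
]
OLD = [  # cluster OLD, Gromacs 3.3.1
    7.90348e+02, 7.80369e+02, 2.11085e+02, 4.58017e+02, 1.92904e+04,
    1.09286e+05, -1.43221e+03, -4.54033e+05, -5.15750e+04, 2.74027e-01,
    -3.76224e+05, 7.83267e+04, -2.97897e+05, 3.12791e+02, 9.78296e+03,
]


def relative_differences(new, old):
    """Return |new - old| / |old| for each pair of values."""
    return [abs(n - o) / abs(o) for n, o in zip(new, old)]


rel = relative_differences(NEW, OLD)
for term, r in zip(TERMS, rel):
    print(f"{term:16s} {r:.2e}")

# Term with the largest relative difference between the two clusters.
worst_term, worst_rel = max(zip(TERMS, rel), key=lambda pair: pair[1])
print(f"largest relative difference: {worst_term} ({worst_rel:.2e})")
```

The same comparison could be repeated on the averages sections of the two logs, where the differences are no longer small.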