Luca Bellucci wrote:
Hi Chris,
thanks for the suggestions.
In the previous mail there was a mistake: couple-moltype = SOL (for solvent), not "Protein_Chain_P".
With that correction the load imbalance seems reasonable, because
the water box is large (~9.0 nm).

Now your outcome makes a lot more sense. You're decoupling all of the solvent? I don't see how that is going to be physically stable or terribly meaningful, but it explains your performance loss. You're annihilating a significant number of interactions (probably the vast majority of all the nonbonded interactions in the system), which I would expect would cause continuous load balancing issues.


However, the problem remains and the performance loss is very high, so I have redone the calculations with these commands:

grompp -f md.mdp -c ../Run-02/confout.gro -t ../Run-02/state.cpt -p ../ -n ../index.ndx -o md.tpr -maxwarn 1

mdrun -s md.tpr -o md

this is part of the md.mdp file:
; Run parameters
; define          = -DPOSRES
integrator = md
nsteps = 1000
dt = 0.002
[..]
free_energy    = yes     ; /no
init_lambda    = 0.9
delta_lambda   = 0.0
couple-moltype = SOL    ; solvent water
couple-lambda0 = vdw-q
couple-lambda1 = none
couple-intramol= yes

Result for free energy calculation:
 Computing:         Nodes     Number     G-Cycles    Seconds     %
 Domain decomp.         8        126       22.050        8.3     0.1
 DD comm. load          8         15        0.009        0.0     0.0
 DD comm. bounds        8         12        0.031        0.0     0.0
 Comm. coord.           8       1001       17.319        6.5     0.0
 Neighbor search        8        127      436.569      163.7     1.1
 Force                  8       1001    34241.576    12840.9    87.8
 Wait + Comm. F         8       1001       19.486        7.3     0.0
 PME mesh               8       1001     4190.758     1571.6    10.7
 Write traj.            8          7        1.827        0.7     0.0
 Update                 8       1001       12.557        4.7     0.0
 Constraints            8       1001       26.496        9.9     0.1
 Comm. energies         8       1002       10.710        4.0     0.0
 Rest                   8                  25.142        9.4     0.1
 Total                  8               39004.531    14627.1   100.0
 PME redist. X/F        8       3003     3479.771     1304.9     8.9
 PME spread/gather      8       4004      277.574      104.1     0.7
 PME 3D-FFT             8       4004      378.090      141.8     1.0
 PME solve              8       2002       55.033       20.6     0.1
        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:   1828.385   1828.385    100.0
                             (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:      3.115      3.223      0.095    253.689

I switched off only the free_energy keyword and reran the calculation; this is what I get:
 Computing:         Nodes     Number     G-Cycles    Seconds     %
 Domain decomp.         8         77       10.975        4.1     0.6
 DD comm. load          8          1        0.001        0.0     0.0
 Comm. coord.           8       1001       14.480        5.4     0.8
 Neighbor search        8         78      136.479       51.2     7.3
 Force                  8       1001     1141.115      427.9    61.3
 Wait + Comm. F         8       1001       17.845        6.7     1.0
 PME mesh               8       1001      484.581      181.7    26.0
 Write traj.            8          5        1.221        0.5     0.1
 Update                 8       1001        9.976        3.7     0.5
 Constraints            8       1001       20.275        7.6     1.1
 Comm. energies         8        992        5.933        2.2     0.3
 Rest                   8                  19.670        7.4     1.1
 Total                  8                1862.552      698.5   100.0
 PME redist. X/F        8       2002       92.204       34.6     5.0
 PME spread/gather      8       2002      192.337       72.1    10.3
 PME 3D-FFT             8       2002      177.373       66.5     9.5
 PME solve              8       1001       22.512        8.4     1.2
        Parallel run - timing based on wallclock.

               NODE (s)   Real (s)      (%)
       Time:     87.309     87.309    100.0
                         (Mnbf/s)   (GFlops)   (ns/day)  (hour/ns)
Performance:    439.731     23.995      1.981     12.114
Finished mdrun on node 0 Mon Apr  4 16:52:04 2011
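
To see which part of the run accounts for the difference, the "Seconds" column of the two timing tables can be compared row by row. The following is only a small sketch: the numbers are copied by hand from the two tables, and the helper name slowdown_by_component is made up for this illustration, it is not part of GROMACS.

```python
# Seconds per component, copied by hand from the two mdrun timing tables
# (free_energy = yes versus free_energy switched off).
free_energy_on = {
    "Neighbor search": 163.7,
    "Force": 12840.9,
    "PME mesh": 1571.6,
    "Total": 14627.1,
}
free_energy_off = {
    "Neighbor search": 51.2,
    "Force": 427.9,
    "PME mesh": 181.7,
    "Total": 698.5,
}


def slowdown_by_component(on, off):
    """Return the ratio on/off for every component present in both tables."""
    return {name: on[name] / off[name] for name in on if name in off}


ratios = slowdown_by_component(free_energy_on, free_energy_off)

# Print the components from the largest slowdown to the smallest.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:16s} {ratio:6.1f}x")
```

With these numbers the Force row shows by far the largest ratio (roughly 30x), which is consistent with the slowdown coming from the nonbonded interactions of the decoupled solvent rather than from domain decomposition.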


If we accept your text at face value, then the simulation slowed down
roughly 15-fold (about 1500%), certainly not the 16% of the load imbalance.
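
As a quick check of that arithmetic, the slowdown can be computed directly from the wallclock times quoted in this thread. This is a minimal sketch; the function name slowdown and the run labels are chosen only for this example.

```python
def slowdown(slow_seconds, fast_seconds):
    """Return (factor, percent_increase) of the slow run relative to the fast run."""
    factor = slow_seconds / fast_seconds
    percent_increase = (factor - 1.0) * 100.0
    return factor, percent_increase


# Wallclock times in seconds: (free_energy = yes, free_energy = no).
runs = {
    "original message": (1852.712, 127.394),
    "rerun, couple-moltype = SOL": (1828.385, 87.309),
}

for label, (fe_seconds, md_seconds) in runs.items():
    factor, percent = slowdown(fe_seconds, md_seconds)
    print(f"{label}: {factor:.1f}x slower ({percent:.0f}% longer)")
```

Either way the factor is well above 10, so a load imbalance of 16% cannot account for it on its own.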

Please let us know what version of GROMACS you are using, and cut and
paste the commands that you used to run GROMACS (so we can verify that
you ran on the same number of processors) and a diff of the .mdp
files (so that we can verify that you ran for the same number of steps).
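
If a command-line diff is not at hand, a unified diff of two .mdp files can also be produced with Python's standard difflib module. The sketch below is only an illustration: the two parameter sets are inline strings assembled from the settings quoted in this thread, and the file names md_free_energy.mdp and md_plain.mdp are hypothetical labels, not the poster's actual files.

```python
import difflib

# Illustrative .mdp content based on the settings quoted in this thread.
mdp_free_energy = """\
integrator      = md
nsteps          = 1000
dt              = 0.002
free_energy     = yes
init_lambda     = 0.9
delta_lambda    = 0.0
couple-moltype  = SOL
couple-lambda0  = vdw-q
couple-lambda1  = none
couple-intramol = yes
"""

# Assumed second file: identical except that free_energy is switched off.
mdp_plain = mdp_free_energy.replace(
    "free_energy     = yes", "free_energy     = no"
)


def mdp_diff(a_text, b_text, a_name, b_name):
    """Return the unified diff between two .mdp texts as a list of lines."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,   # hypothetical file name, used only as a label
            tofile=b_name,     # hypothetical file name, used only as a label
            lineterm="",
        )
    )


diff_lines = mdp_diff(
    mdp_free_energy, mdp_plain, "md_free_energy.mdp", "md_plain.mdp"
)
print("\n".join(diff_lines))
```

For real files, the two strings would be replaced by the contents read from disk. A diff that shows only the free_energy line changing confirms that nsteps and the other run parameters are the same in both runs.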

You might be correct about the slowdown, but let's rule out some other
more obvious problems first.


-- original message --

Dear all,
when I run a single free energy simulation
I notice a loss of performance with respect to
normal MD:

free_energy    = yes
init_lambda    = 0.9
delta_lambda   = 0.0
couple-moltype = Protein_Chain_P
couple-lambda0 = vdw-q
couple-lambda1 = none
couple-intramol= yes

    Average load imbalance: 16.3 %
    Part of the total run time spent waiting due to load imbalance: 12.2 %
    Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 %
    Time:   1852.712   1852.712    100.0

free_energy    = no
    Average load imbalance: 2.7 %
    Part of the total run time spent waiting due to load imbalance: 1.7 %
    Time:    127.394    127.394    100.0

It seems that the loss of performance is due in part to the load
imbalance in the domain decomposition; however, I tried changing
these keywords without benefit.
Any comment is welcome.



Justin A. Lemkul
Ph.D. Candidate
ICTAS Doctoral Scholar
Department of Biochemistry
Virginia Tech
Blacksburg, VA
jalemkul[at] | (540) 231-9080
