Dear GROMACS users,

My free energy calculation works well; however, as stated in my log file, I am losing around 56.5 % of the available CPU time, which is really considerable. The problem is due to load imbalance in the domain decomposition, but I have no idea how to improve it. Below is the very end of my log file, along with a couple of ideas I was considering; I would really appreciate it if you could help me avoid this loss.
 D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 115357.4
av. #atoms communicated per step for LINCS: 2 x 2389.1
Average load imbalance: 285.9 %
Part of the total run time spent waiting due to load imbalance: 56.5 %
 Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 2 % Y 2 % Z 2 %
Average PME mesh/force load: 0.384
Part of the total run time spent waiting due to PP/PME imbalance: 14.5 %
NOTE: 56.5 % of the available CPU time was lost due to load imbalance
in the domain decomposition.
NOTE: 14.5 % performance was lost because the PME ranks
had less work to do than the PP ranks.
You might want to decrease the number of PME ranks
or decrease the cut-off and the grid spacing.
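Based on these two notes, would it make sense to rerun with fewer dedicated PME ranks and dynamic load balancing forced on? This is a rough sketch of what I have in mind (the -npme value is just my guess, 128 is my total rank count, and the binary/file names are from my setup):

    mpirun -np 128 gmx_mpi mdrun -deffnm md -npme 16 -dlb yes

Or should I rather relax -rdd/-dds first, given that about 2 % of the steps were limited by them?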
 R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
On 96 MPI ranks doing PP, and
on 32 MPI ranks doing PME
 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Domain decomp.        96    1     175000     242.339      53508.472   0.5
 DD comm. load         96    1     174903       9.076       2003.907   0.0
 DD comm. bounds       96    1     174901      27.054       5973.491   0.1
 Send X to PME         96    1        701      44.342       9790.652   0.1
 Neighbor search       96    1     175001     251.994      55640.264   0.6
 Comm. coord.          96    1    6825000    1521.009     335838.747   3.4
 Force                 96    1        701    7001.990    1546039.264  15.5
 Wait + Comm. F        96    1        701   10761.296    2376093.759  23.8
 PME mesh *            32    1        701   11796.344     868210.788   8.7
 PME wait for PP *                          22135.752    1629191.096  16.3
 Wait + Recv. PME F    96    1        701     393.117      86800.265   0.9
 NB X/F buffer ops.    96    1   20650001     132.713      29302.991   0.3
 COM pull force        96    1        701     165.613      36567.368   0.4
 Write traj.           96    1       7037      55.020      12148.457   0.1
 Update                96    1       1402     140.972      31126.607   0.3
 Constraints           96    1       1402   12871.236    2841968.551  28.4
 Comm. energies        96    1     350001     261.976      57844.219   0.6
 Rest                                          52.349      11558.715   0.1
-----------------------------------------------------------------------------
 Total                                      33932.096    9989607.639 100.0
-----------------------------------------------------------------------------
(*) Note that with separate PME ranks, the walltime column actually sums to
twice the total reported, but the cycle count total and % are correct.
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME redist. X/F       32    1       2103    2334.608     171827.143   1.7
 PME spread/gather     32    1       2804    3640.870     267967.972   2.7
 PME 3D-FFT            32    1       2804    1587.105     116810.882   1.2
 PME 3D-FFT Comm.      32    1       5608    4066.097     299264.666   3.0
 PME solve Elec        32    1       1402     148.284      10913.728   0.1
-----------------------------------------------------------------------------
               Core t (s)   Wall t (s)        (%)
       Time:  4341204.790    33932.096    12793.8
                         9h25:32
                 (ns/day)    (hour/ns)
Performance:       35.648        0.673
Finished mdrun on rank 0 Sat Aug 13 23:45:45 2016
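I was also wondering about the second suggestion in the PME note, i.e. decreasing the cut-off and the grid spacing. If I understand it correctly, that means shifting work from the overloaded PP ranks onto the underloaded PME ranks through the mdp file, roughly like this (the values are placeholders, not my actual settings):

    ; sketch: shift work from PP (direct space) to PME (mesh)
    rcoulomb        = 1.0    ; smaller real-space cut-off -> less PP work
    fourierspacing  = 0.10   ; finer PME grid -> more PME work

Or would it be simpler to just let gmx tune_pme search for the best number of PME ranks for this system, e.g.:

    gmx tune_pme -np 128 -s md.tpr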
Thanks,
Regards,
Alex