I have now tested with and without -nosum and it appears that the option
is working (compare 51 vs. 501 under "Number" for Comm. energies below),
but the total amount of time spent communicating energies barely went
down. That seems strange to me. Does anyone know whether this is normal?
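For what it's worth, here is a rough per-call comparison from the two
tables below (my own arithmetic on the node-aggregate "Seconds" column,
not anything mdrun reports directly):

    without -nosum: 2973.9 s / 501 calls ~  5.9 s per call
    with    -nosum: 2702.1 s /  51 calls ~ 53.0 s per call

So each collective became roughly 9x more expensive, which would make
sense if the less frequent synchronization points are simply absorbing
the load-imbalance wait time rather than eliminating it. Note also that
51 matches the neighbor-search count in the full accounting quoted at
the bottom of this message, so -nosum appears to reduce the energy
communication to neighbor-search steps only.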
At the very least, I suggest adding an if statement to mdrun so that it
does not print the -nosum usage note when the user did in fact run with
-nosum; something along the lines of the sketch after this paragraph.
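A minimal sketch of the guard I have in mind. I have not tracked down
where in the 4.0.4 source this note is printed, so the names and the
threshold below are placeholders, not the real identifiers:

    /* Placeholder names (pct_comm_energies, bSumEveryStep) and an
     * assumed 5 % threshold, not the actual gromacs-4.0.4 code:
     * only print the note when energies are still summed every
     * step, i.e. when the user did NOT pass -nosum. */
    if (pct_comm_energies >= 5 && bSumEveryStep)
    {
        fprintf(fplog,
                "\nNOTE: %d %% of the run time was spent communicating energies,\n"
                "you might want to use the -nosum option of mdrun\n\n",
                pct_comm_energies);
    }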
Without using -nosum:
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes   Number    G-Cycles    Seconds      %
-----------------------------------------------------------------------
 ...
 Write traj.          256        2     233.218       93.7    0.5
 Update               256      501     777.511      312.5    1.7
 Constraints          256     1002    1203.894      483.9    2.7
 Comm. energies       256      501    7397.995     2973.9   16.5
 Rest                 256              128.058       51.5    0.3
-----------------------------------------------------------------------
 Total                384            44897.468    18048.0  100.0
-----------------------------------------------------------------------
NOTE: 16 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)       (%)
       Time:     47.000     47.000     100.0
               (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
Performance:  13485.788    712.634      1.842      13.029
Finished mdrun on node 0 Mon Jul 20 12:53:41 2009
#########
And using -nosum:
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:         Nodes   Number    G-Cycles    Seconds      %
-----------------------------------------------------------------------
 ...
 Write traj.          256        2     213.521       83.3    0.5
 Update               256      501     776.606      303.0    1.8
 Constraints          256     1002    1200.285      468.2    2.7
 Comm. energies       256       51    6926.667     2702.1   15.6
 Rest                 256              127.503       49.7    0.3
-----------------------------------------------------------------------
 Total                384            44296.670    17280.0  100.0
-----------------------------------------------------------------------
NOTE: 16 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)       (%)
       Time:     45.000     45.000     100.0
               (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
Performance:  14084.547    744.277      1.924      12.475
#########
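Comparing the bottom lines of the two runs (again my own arithmetic):

    47.000 s -> 45.000 s wallclock, i.e. (47 - 45) / 47 ~ 4 % faster

so -nosum recovers only about 4 % here, nowhere near the ~16 % that the
note implies is being spent on energy communication.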
Thanks,
Chris.
Chris Neale wrote:
Hello,
I have been running simulations on larger numbers of processors
recently and am confused by the message regarding -nosum that appears
at the end of the .log file. In this case, I included the -nosum
option to mdrun and I still get this warning (GROMACS 4.0.4).
My command was:
mpirun -np $(wc -l $PBS_NODEFILE | gawk '{print $1}') \
    -machinefile $PBS_NODEFILE \
    /scratch/cneale/exe/intel/gromacs-4.0.4/exec/bin/mdrun \
    -deffnm test -nosum -npme 128
#########
To confirm that I am actually passing -nosum to mdrun, stderr shows:
...
Option       Type   Value       Description
------------------------------------------------------
-[no]h       bool   no          Print help info and quit
-nice        int    0           Set the nicelevel
-deffnm      string test        Set the default filename for all file options
-[no]xvgr    bool   yes         Add specific codes (legends etc.) in the
                                output xvg files for the xmgrace program
-[no]pd      bool   no          Use particle decomposition
-dd          vector 0 0 0       Domain decomposition grid, 0 is optimize
-npme        int    128         Number of separate nodes to be used for PME,
                                -1 is guess
-ddorder     enum   interleave  DD node order: interleave, pp_pme or
                                cartesian
-[no]ddcheck bool   yes         Check for all bonded interactions with DD
-rdd         real   0           The maximum distance for bonded interactions
                                with DD (nm), 0 is determine from initial
                                coordinates
-rcon        real   0           Maximum distance for P-LINCS (nm), 0 is
                                estimate
-dlb         enum   auto        Dynamic load balancing (with DD): auto, no
                                or yes
-dds         real   0.8         Minimum allowed dlb scaling of the DD cell
                                size
-[no]sum     bool   no          Sum the energies at every step
-[no]v       bool   no          Be loud and noisy
-[no]compact bool   yes         Write a compact log file
-[no]seppot  bool   no          Write separate V and dVdl terms for each
                                interaction type and node to the log file(s)
-pforce      real   -1          Print all forces larger than this (kJ/mol nm)
-[no]reprod  bool   no          Try to avoid optimizations that affect binary
                                reproducibility
-cpt         real   15          Checkpoint interval (minutes)
-[no]append  bool   no          Append to previous output files when
                                continuing from checkpoint
-[no]addpart bool   yes         Add the simulation part number to all output
                                files when continuing from checkpoint
-maxh        real   -1          Terminate after 0.99 times this time (hours)
-multi       int    0           Do multiple simulations in parallel
-replex      int    0           Attempt replica exchange every # steps
-reseed      int    -1          Seed for replica exchange, -1 is generate a
                                seed
-[no]glas    bool   no          Do glass simulation with special long range
                                corrections
-[no]ionize  bool   no          Do a simulation including the effect of an
                                X-Ray bombardment on your system
...
########
And the message at the end of the .log file is:
...
D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
av. #atoms communicated per step for force: 2 x 3376415.3
av. #atoms communicated per step for LINCS: 2 x 192096.6
Average load imbalance: 11.7 %
Part of the total run time spent waiting due to load imbalance: 7.9 %
Steps where the load balancing was limited by -rdd, -rcon and/or -dds: X 0 % Y 0 % Z 0 %
Average PME mesh/force load: 0.620
Part of the total run time spent waiting due to PP/PME imbalance: 10.0 %
NOTE: 7.9 % performance was lost due to load imbalance
in the domain decomposition.
NOTE: 10.0 % performance was lost because the PME nodes
had less work to do than the PP nodes.
You might want to decrease the number of PME nodes
or decrease the cut-off and the grid spacing.
R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:           Nodes   Number    G-Cycles    Seconds      %
-----------------------------------------------------------------------
 Domain decomp.         256       51     337.551      131.2    0.7
 Send X to PME          256      501      59.454       23.1    0.1
 Comm. coord.           256      501     289.936      112.7    0.6
 Neighbor search        256       51    1250.088      485.9    2.8
 Force                  256      501   16105.584     6259.9   35.4
 Wait + Comm. F         256      501    2441.390      948.9    5.4
 PME mesh               128      501    5552.336     2158.1   12.2
 Wait + Comm. X/F       128      501    9586.486     3726.1   21.1
 Wait + Recv. PME F     256      501     459.752      178.7    1.0
 Write traj.            256        2     223.993       87.1    0.5
 Update                 256      501     777.618      302.2    1.7
 Constraints            256     1002    1223.093      475.4    2.7
 Comm. energies         256       51    7011.309     2725.1   15.4
 Rest                   256              127.710       49.6    0.3
-----------------------------------------------------------------------
 Total                  384            45446.299    17664.0  100.0
-----------------------------------------------------------------------
NOTE: 15 % of the run time was spent communicating energies,
you might want to use the -nosum option of mdrun
Parallel run - timing based on wallclock.
               NODE (s)   Real (s)       (%)
       Time:     46.000     46.000     100.0
               (Mnbf/s)   (GFlops)   (ns/day)   (hour/ns)
Performance:  13778.036    728.080      1.882      12.752
########
Thanks,
Chris