On 17/07/2012 5:00 PM, DeChang Li wrote:
Dear all,

      I am running a 9000-atom system with GBSA (Gromacs 4.5.5) on a
Blue Gene/Q cluster. I get 1.002 ns/day with 8 cores. However, on my
own workstation with 8 cores, the same system reaches nearly
10 ns/day (Intel(R) Xeon(R) CPU E5620  @ 2.40GHz). Can anyone
tell me what's wrong with my simulation? Any suggestions will be
appreciated.

Your workstation is running highly optimized SSE loops. On BlueGene/Q, GROMACS is not using the multiple FPUs because that code hasn't been written (for either explicit or implicit solvation), and BlueGene's processors are probably slower too.

Mark

Following is my md.mdp file:

constraints            = hbonds
constraint_algorithm   = LINCS
lincs_order            = 4
comm_mode              = Angular
comm_grps              = system
integrator             = sd
;annealing           = single single
;annealing_npoints   = 2 2
;annealing_time      = 0 500 0 500
;annealing_temp      = 200 300 200 300
dt                     = 0.002 ; ps !
nsteps                 = 5000000 ; total 10000 ps (nsteps x dt).
nstcomm                = 10
nstcalcenergy           = 10
nstxout                = 10000 ; collect data every 20 ps
nstenergy              = 10000
nstvout                = 10000
nstlog                 = 1000
;nstxtcout              = 50000
;xtc_grps               = system
nstfout                = 0
nstlist                = 10
ns_type                = grid
pbc                    = no
rlist                  = 1.2
coulombtype            = cut-off
rcoulomb               = 1.2
rvdw                   = 1.2
fourierspacing         = 0.12
fourier_nx             = 0
fourier_ny             = 0
fourier_nz             = 0
pme_order              = 4
ewald_rtol             = 1e-5
optimize_fft           = yes
;energygrps             = alpha1 alpha2 alpha3 beta1 beta2 beta3 gamma
;DispCorr               = EnerPres
; Berendsen temperature coupling is on in two groups
Tcoupl                 =
tau_t                  = 0.5
tc-grps                = system
ref_t                  = 300
; Pressure coupling is on
Pcoupl                 = no ;berendsen
tau_p                  = 1.0
compressibility        = 4.5e-5
ref_p                  = 1.0
; Generate velocities is on at 300 K.
gen_vel                = yes
gen_temp               = 300
gen_seed               = -1

implicit_solvent       = GBSA
gb_algorithm           = OBC
rgbradii               = 1.2
sa_surface_tension     = 2.25936



Here is the performance info:

         M E G A - F L O P S   A C C O U N T I N G

    RF=Reaction-Field  FE=Free Energy  SCFE=Soft-Core/Free Energy
    T=Tabulated        W3=SPC/TIP3p    W4=TIP4p (single or pairs)
    NF=No Forces

  Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
  Generalized Born Coulomb                61.482892        2951.179     0.4
  GB Coulomb + LJ                       2565.481100      156494.347    19.4
  Outer nonbonded loop                   152.268546        1522.685     0.2
  1,4 nonbonded interactions             116.143224       10452.890     1.3
  Born radii (HCT/OBC)                  2868.222234      524884.669    64.9
  Born force chain rule                 2868.222234       43023.334     5.3
  NS-Pairs                               516.814696       10853.109     1.3
  Reset In Box                             4.464788          13.394     0.0
  CG-CoM                                   4.482576          13.448     0.0
  Bonds                                   22.174434        1308.292     0.2
  Angles                                  80.586114       13538.467     1.7
  Propers                                160.742142       36809.951     4.6
  Virial                                   4.636254          83.453     0.0
  Update                                  44.478894        1378.846     0.2
  Stop-CM                                  4.455894          44.559     0.0
  Calc-Ekin                               44.487788        1201.170     0.1
  Lincs                                   44.951630        2697.098     0.3
  Lincs-Mat                              261.822552        1047.290     0.1
  Constraint-V                            44.951630         359.613     0.0
  Constraint-Vir                           2.251163          54.028     0.0
-----------------------------------------------------------------------------
  Total                                                  808731.820   100.0
-----------------------------------------------------------------------------
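
As a rough cross-check of the accounting above, the following sketch sums the M-Flops of the four GB-specific rows and expresses them as a share of the total. The numbers are copied from the table; grouping these four rows as "GB-specific" and the names used in the code are assumptions made for illustration, not something mdrun reports.

```python
# M-Flops values copied from the flops accounting table above.
# Treating these four rows as the "GB-specific" work is an assumption.
gb_mflops = {
    "Generalized Born Coulomb": 2951.179,
    "GB Coulomb + LJ": 156494.347,
    "Born radii (HCT/OBC)": 524884.669,
    "Born force chain rule": 43023.334,
}
total_mflops = 808731.820  # "Total" row of the same table


def share(mflops, total=total_mflops):
    """Return mflops as a percentage of the total flop count."""
    return 100.0 * mflops / total


born_radii_pct = share(gb_mflops["Born radii (HCT/OBC)"])
gb_total_pct = share(sum(gb_mflops.values()))

print(f"Born radii alone: {born_radii_pct:.1f} % of flops")
print(f"All GB rows:      {gb_total_pct:.1f} % of flops")
```

If these rows are the ones that lack BlueGene-specific kernels, roughly nine tenths of the floating-point work in this run goes through unoptimized code, which fits the Force row dominating the cycle accounting.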


     D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S

  av. #atoms communicated per step for force:  2 x 660.5
  av. #atoms communicated per step for LINCS:  2 x 34.3

  Average load imbalance: 1.7 %
  Part of the total run time spent waiting due to load imbalance: 1.4 %


      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

  Computing:         Nodes     Number     G-Cycles    Seconds     %
-----------------------------------------------------------------------
  Domain decomp.         8        502       59.421       37.1     0.5
  DD comm. load          8          8        0.004        0.0     0.0
  Comm. coord.           8       5001       16.575       10.4     0.2
  Neighbor search        8        502      136.093       85.1     1.2
  Force                  8       5001     9744.582     6090.7    88.3
  Wait + Comm. F         8       5001       90.905       56.8     0.8
  Write traj.            8          2        0.954        0.6     0.0
  Update                 8       5001       72.936       45.6     0.7
  Constraints            8      10002      171.445      107.2     1.6
  Comm. energies         8        502       10.427        6.5     0.1
  Rest                   8                 732.742      458.0     6.6
-----------------------------------------------------------------------
  Total                  8               11036.086     6897.9   100.0
-----------------------------------------------------------------------

         Parallel run - timing based on wallclock.

                NODE (s)   Real (s)      (%)
        Time:    862.243    862.243    100.0
                        14:22
                (Mnbf/s)   (MFlops)   (ns/day)  (hour/ns)
Performance:      3.047    937.940      1.002     23.946
Finished mdrun on node 0 Tue Jul 17 16:06:48 2012
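
The reported ns/day and hour/ns figures can be reproduced from the log, which is a quick way to confirm that the 1.002 ns/day number reflects wall-clock time and not a reporting problem. This is a minimal sketch: the step count is taken from the "Force" row of the cycle table (5001, which assumes steps 0 through 5000 were run), dt comes from md.mdp, and the variable names are illustrative.

```python
steps = 5001        # "Number" column of the Force row (assumed steps 0..5000)
dt_ps = 0.002       # dt from md.mdp, in ps
wall_s = 862.243    # "Real (s)" from the timing block

simulated_ns = steps * dt_ps / 1000.0            # ps -> ns
ns_per_day = simulated_ns * 86400.0 / wall_s     # 86400 s in a day
hours_per_ns = (wall_s / 3600.0) / simulated_ns  # wall hours per simulated ns

print(f"simulated time: {simulated_ns * 1000.0:.3f} ps")
print(f"ns/day:  {ns_per_day:.3f}")
print(f"hour/ns: {hours_per_ns:.3f}")
```

Both values agree with the Performance line to the printed precision, so the roughly tenfold gap to the workstation is a real difference in throughput.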


--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
