Hi Paul,

> On 12. Dec 2018, at 15:36, pbusc...@q.com wrote:
>
> Dear users (one more try),
>
> I am trying to use 2 GPU cards to improve modeling speed. The computer
> described in the log files is used to iron out models, and I am using it to
> learn how to use two GPU cards before purchasing two new RTX 2080 Ti's. The
> CPU is an 8-core, 16-thread AMD and the GPUs are two GTX 1060; there are
> 50000 atoms in the model.
>
> Using ntmpi and ntomp settings of 1:16, auto (4:4), and 2:8 (and any other
> combination factoring to 16), the ns/day rating is approx. 12-16 for 1:16
> and ~6-8 for any other setting, i.e. adding a card cuts performance in half.
> The average load imbalance is less than 3.4% for the multi-card setup.
>
> I am not at this point trying to maximize efficiency, but only to show some
> improvement going from one to two cards. According to a 2015 paper from the
> GROMACS group, "Best bang for your buck: GPU nodes for GROMACS biomolecular
> simulations", I should expect maybe (at best) a 50% improvement for 90k
> atoms (with 2x GTX 970).

We did not benchmark GTX 970 in that publication. But from Table 6 you can
see that we also had quite a few cases with our 80k benchmark where, going
from 1 to 2 GPUs, simulation speed did not increase much: e.g. for the
E5-2670v2, going from one to two GTX 980 GPUs led to an increase of 10
percent.

Did you use counter resetting for the benchmarks?
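If not, it may be worth repeating these short runs with the cycle counters
reset partway through, so that initialization and the PP/PME load balancing
at the start do not skew the timings. A minimal sketch of how that could look
(the "..." stands for the rest of your usual command line, which I don't
know, and the GPU IDs 0 and 1 are an assumption about your machine):

    # one rank, 16 threads; counters reset halfway through the run
    gmx mdrun -ntmpi 1 -ntomp 16 -resethway ...
    # two ranks, 8 threads each; both GPUs listed explicitly (adjust IDs)
    gmx mdrun -ntmpi 2 -ntomp 8 -gpu_id 01 -resethway ...

-resethway resets the performance counters halfway through the run;
alternatively, -resetstep <step> resets them after a given number of steps.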
Carsten

> What bothers me in my initial attempts is that my simulations became slower
> by adding the second GPU - it is frustrating to say the least. It's like
> swimming backwards.
>
> I know I am missing - as a minimum - the correct setup for mdrun, and
> suggestions would be welcome.
>
> The output from the last section of the log files is included below.
>
> =========================== ntmpi 1, ntomp 16 ==============================
>
>         <======  ###############  ==>
>         <====  A V E R A G E S  ====>
>         <==  ###############  ======>
>
>    Statistics over 29301 steps using 294 frames
>
>    Energies (kJ/mol)
>           Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>     9.17533e+05    2.27874e+04    6.64128e+04    2.31214e+02    8.34971e+04
>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>    -2.84567e+07   -1.43385e+05   -2.04658e+03    1.33320e+07    1.59914e+05
>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>     7.79893e+01   -1.40196e+07    1.88467e+05   -1.38312e+07    3.00376e+02
>  Pres. DC (bar) Pressure (bar)   Constr. rmsd
>    -2.88685e+00    3.75436e+01    0.00000e+00
>
>    Total Virial (kJ/mol)
>     5.27555e+04   -4.87626e+02    1.86144e+02
>    -4.87648e+02    4.04479e+04   -1.91959e+02
>     1.86177e+02   -1.91957e+02    5.45671e+04
>
>    Pressure (bar)
>     2.22202e+01    1.27887e+00   -4.71738e-01
>     1.27893e+00    6.48135e+01    5.12638e-01
>    -4.71830e-01    5.12632e-01    2.55971e+01
>
>          T-PDMS         T-VMOS
>     2.99822e+02    3.32834e+02
>
>    M E G A - F L O P S   A C C O U N T I N G
>
>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>  V&F=Potential and force  V=Potential only  F=Force only
>
>  Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
>  Pair Search distance check            2349.753264       21147.779     0.0
>  NxN Ewald Elec. + LJ [F]           1771584.591744   116924583.055    96.6
>  NxN Ewald Elec. + LJ [V&F]           17953.091840     1920980.827     1.6
>  1,4 nonbonded interactions            5278.575150      475071.763     0.4
>  Shift-X                                 22.173480         133.041     0.0
>  Angles                                4178.908620      702056.648     0.6
>  Propers                                879.909030      201499.168     0.2
>  Impropers                                5.274180        1097.029     0.0
>  Pos. Restr.                             42.193440        2109.672     0.0
>  Virial                                  22.186710         399.361     0.0
>  Update                                2209.881420       68506.324     0.1
>  Stop-CM                                 22.248900         222.489     0.0
>  Calc-Ekin                               44.346960        1197.368     0.0
>  Lincs                                 4414.639320      264878.359     0.2
>  Lincs-Mat                           100297.229760      401188.919     0.3
>  Constraint-V                          8829.127980       70633.024     0.1
>  Constraint-Vir                          22.147020         531.528     0.0
> -----------------------------------------------------------------------------
>  Total                                              121056236.355   100.0
> -----------------------------------------------------------------------------
>
>  R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  On 1 MPI rank, each using 16 OpenMP threads
>
>  Computing:             Num   Num      Call    Wall time   Giga-Cycles
>                         Ranks Threads  Count      (s)       total sum    %
> -----------------------------------------------------------------------------
>  Neighbor search         1    16        294       2.191       129.485   1.0
>  Launch GPU ops.         1    16      58602       4.257       251.544   2.0
>  Force                   1    16      29301      23.769      1404.510  11.3
>  Wait PME GPU gather     1    16      29301      33.740      1993.695  16.0
>  Reduce GPU PME F        1    16      29301       7.244       428.079   3.4
>  Wait GPU NB local       1    16      29301      60.054      3548.612  28.5
>  NB X/F buffer ops.      1    16      58308       9.823       580.459   4.7
>  Write traj.             1    16          7       0.119         7.048   0.1
>  Update                  1    16      58602      11.089       655.275   5.3
>  Constraints             1    16      58602      40.378      2385.992  19.2
>  Rest                                            17.743      1048.462   8.4
> -----------------------------------------------------------------------------
>  Total                                          210.408     12433.160 100.0
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     3366.529      210.408     1600.0
>                  (ns/day)    (hour/ns)
> Performance:       12.032        1.995
> Finished mdrun on rank 0 Mon Dec 10 17:17:04 2018
>
>
> ====================== ntmpi and ntomp auto (4:4) ==========================
>
>         <======  ###############  ==>
>         <====  A V E R A G E S  ====>
>         <==  ###############  ======>
>
>    Statistics over 3301 steps using 34 frames
>
>    Energies (kJ/mol)
>           Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>     9.20586e+05    1.95534e+04    6.56058e+04    2.21093e+02    8.56673e+04
>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>    -2.84553e+07   -1.44595e+05   -2.04658e+03    1.34518e+07    4.26167e+04
>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>     3.83653e+01   -1.40159e+07    1.90353e+05   -1.38255e+07    3.03381e+02
>  Pres. DC (bar) Pressure (bar)   Constr. rmsd
>    -2.88685e+00    2.72913e+02    0.00000e+00
>
>    Total Virial (kJ/mol)
>    -5.05948e+04   -3.29107e+03    4.84786e+02
>    -3.29135e+03   -3.42006e+04   -3.32392e+03
>     4.84606e+02   -3.32403e+03   -2.06849e+04
>
>    Pressure (bar)
>     3.09713e+02    8.98192e+00   -1.19828e+00
>     8.98270e+00    2.73248e+02    8.99543e+00
>    -1.19778e+00    8.99573e+00    2.35776e+02
>
>          T-PDMS         T-VMOS
>     2.98623e+02    5.82467e+02
>
>    P P   -   P M E   L O A D   B A L A N C I N G
>
>  NOTE: The PP/PME load balancing was limited by the maximum allowed grid
>        scaling, you might not have reached a good load balance.
>
>  PP/PME load balancing changed the cut-off and PME settings:
>             particle-particle                    PME
>              rcoulomb  rlist            grid      spacing   1/beta
>    initial   1.000 nm  1.000 nm    160 160 128    0.156 nm  0.320 nm
>    final     1.628 nm  1.628 nm     96  96  80    0.260 nm  0.521 nm
>  cost-ratio     4.31                  0.23
>  (note that these numbers concern only part of the total PP and PME load)
>
>    M E G A - F L O P S   A C C O U N T I N G
>
>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>  V&F=Potential and force  V=Potential only  F=Force only
>
>  Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
>  Pair Search distance check             285.793872        2572.145     0.0
>  NxN Ewald Elec. + LJ [F]            367351.034688    24245168.289    92.1
>  NxN Ewald Elec. + LJ [V&F]            3841.181056      411006.373     1.6
>  1,4 nonbonded interactions             594.675150       53520.763     0.2
>  Calc Weights                           746.884260       26887.833     0.1
>  Spread Q Bspline                     15933.530880       31867.062     0.1
>  Gather F Bspline                     15933.530880       95601.185     0.4
>  3D-FFT                              154983.295306     1239866.362     4.7
>  Solve PME                               40.079616        2565.095     0.0
>  Reset In Box                             2.564280           7.693     0.0
>  CG-CoM                                   2.639700           7.919     0.0
>  Angles                                 470.788620       79092.488     0.3
>  Propers                                 99.129030       22700.548     0.1
>  Impropers                                0.594180         123.589     0.0
>  Pos. Restr.                              4.753440         237.672     0.0
>  Virial                                   2.570400          46.267     0.0
>  Update                                 248.961420        7717.804     0.0
>  Stop-CM                                  2.639700          26.397     0.0
>  Calc-Ekin                                5.128560         138.471     0.0
>  Lincs                                  557.713246       33462.795     0.1
>  Lincs-Mat                            12624.363456       50497.454     0.2
>  Constraint-V                          1115.257670        8922.061     0.0
>  Constraint-Vir                           2.871389          68.913     0.0
> -----------------------------------------------------------------------------
>  Total                                               26312105.181   100.0
> -----------------------------------------------------------------------------
>
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>
>  av. #atoms communicated per step for force:  2 x 16748.9
>  av. #atoms communicated per step for LINCS:  2 x 9361.6
>
>  Dynamic load balancing report:
>  DLB was off during the run due to low measured imbalance.
>  Average load imbalance: 3.4%.
>  The balanceable part of the MD step is 46%, load imbalance is computed from
>  this.
>  Part of the total run time spent waiting due to load imbalance: 1.6%.
>
>  R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  On 4 MPI ranks, each using 4 OpenMP threads
>
>  Computing:             Num   Num      Call    Wall time   Giga-Cycles
>                         Ranks Threads  Count      (s)       total sum    %
> -----------------------------------------------------------------------------
>  Domain decomp.          4     4         34       0.457        26.976   1.0
>  DD comm. load           4     4          2       0.000         0.008   0.0
>  Neighbor search         4     4         34       0.138         8.160   0.3
>  Launch GPU ops.         4     4       6602       0.441        26.070   0.9
>  Comm. coord.            4     4       3267       0.577        34.081   1.2
>  Force                   4     4       3301       2.298       135.761   4.9
>  Wait + Comm. F          4     4       3301       0.276        16.330   0.6
>  PME mesh                4     4       3301      25.822      1525.817  54.8
>  Wait GPU NB nonloc.     4     4       3301       0.132         7.819   0.3
>  Wait GPU NB local       4     4       3301       0.012         0.724   0.0
>  NB X/F buffer ops.      4     4      13136       0.471        27.822   1.0
>  Write traj.             4     4          2       0.014         0.839   0.0
>  Update                  4     4       6602       1.006        59.442   2.1
>  Constraints             4     4       6602       6.926       409.290  14.7
>  Comm. energies          4     4         34       0.009         0.524   0.0
>  Rest                                             8.548       505.108  18.1
> -----------------------------------------------------------------------------
>  Total                                           47.127      2784.772 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME redist. X/F         4     4       6602       2.538       149.998   5.4
>  PME spread              4     4       3301       6.055       357.770  12.8
>  PME gather              4     4       3301       3.432       202.814   7.3
>  PME 3D-FFT              4     4       6602      10.559       623.925  22.4
>  PME 3D-FFT Comm.        4     4       6602       2.691       158.993   5.7
>  PME solve Elec          4     4       3301       0.521        30.805   1.1
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:      754.033       47.127     1600.0
>                  (ns/day)    (hour/ns)
> Performance:        6.052        3.966
> Finished mdrun on rank 0 Mon Dec 10 17:10:34 2018
>
>
> ============================ ntmpi 2, ntomp 8 ==============================
>
>         <======  ###############  ==>
>         <====  A V E R A G E S  ====>
>         <==  ###############  ======>
>
>    Statistics over 11201 steps using 113 frames
>
>    Energies (kJ/mol)
>           Angle       G96Angle    Proper Dih.  Improper Dih.          LJ-14
>     9.16403e+05    2.12953e+04    6.61725e+04    2.26296e+02    8.35215e+04
>      Coulomb-14        LJ (SR)  Disper. corr.   Coulomb (SR)   Coul. recip.
>    -2.84508e+07   -1.43740e+05   -2.04658e+03    1.34647e+07    2.76232e+04
>  Position Rest.      Potential    Kinetic En.   Total Energy    Temperature
>     5.93627e+01   -1.40166e+07    1.88847e+05   -1.38277e+07    3.00981e+02
>  Pres. DC (bar) Pressure (bar)   Constr. rmsd
>    -2.88685e+00    8.53077e+01    0.00000e+00
>
>    Total Virial (kJ/mol)
>     3.15233e+04   -6.80636e+02    9.80007e+01
>    -6.81075e+02    2.45640e+04   -1.40642e+03
>     9.81033e+01   -1.40643e+03    4.02877e+04
>
>    Pressure (bar)
>     8.11163e+01    1.87348e+00   -2.03329e-01
>     1.87469e+00    1.09211e+02    3.83468e+00
>    -2.03613e-01    3.83470e+00    6.55961e+01
>
>          T-PDMS         T-VMOS
>     2.99551e+02    3.84895e+02
>
>    P P   -   P M E   L O A D   B A L A N C I N G
>
>  NOTE: The PP/PME load balancing was limited by the maximum allowed grid
>        scaling, you might not have reached a good load balance.
>
>  PP/PME load balancing changed the cut-off and PME settings:
>             particle-particle                    PME
>              rcoulomb  rlist            grid      spacing   1/beta
>    initial   1.000 nm  1.000 nm    160 160 128    0.156 nm  0.320 nm
>    final     1.628 nm  1.628 nm     96  96  80    0.260 nm  0.521 nm
>  cost-ratio     4.31                  0.23
>  (note that these numbers concern only part of the total PP and PME load)
>
>    M E G A - F L O P S   A C C O U N T I N G
>
>  NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
>  RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
>  W3=SPC/TIP3p  W4=TIP4p (single or pairs)
>  V&F=Potential and force  V=Potential only  F=Force only
>
>  Computing:                               M-Number         M-Flops  % Flops
> -----------------------------------------------------------------------------
>  Pair Search distance check            1057.319360        9515.874     0.0
>  NxN Ewald Elec. + LJ [F]           1410325.411968    93081477.190    93.9
>  NxN Ewald Elec. + LJ [V&F]           14378.367616     1538485.335     1.6
>  1,4 nonbonded interactions            2017.860150      181607.413     0.2
>  Calc Weights                          2534.338260       91236.177     0.1
>  Spread Q Bspline                     54065.882880      108131.766     0.1
>  Gather F Bspline                     54065.882880      324395.297     0.3
>  3D-FFT                              383450.341906     3067602.735     3.1
>  Solve PME                              113.199616        7244.775     0.0
>  Reset In Box                             8.522460          25.567     0.0
>  CG-CoM                                   8.597880          25.794     0.0
>  Angles                                1597.486620      268377.752     0.3
>  Propers                                336.366030       77027.821     0.1
>  Impropers                                2.016180         419.365     0.0
>  Pos. Restr.                             16.129440         806.472     0.0
>  Virial                                   8.532630         153.587     0.0
>  Update                                 844.779420       26188.162     0.0
>  Stop-CM                                  8.597880          85.979     0.0
>  Calc-Ekin                               17.044920         460.213     0.0
>  Lincs                                 1753.732822      105223.969     0.1
>  Lincs-Mat                            39788.083512      159152.334     0.2
>  Constraint-V                          3507.309174       28058.473     0.0
>  Constraint-Vir                           8.845375         212.289     0.0
> -----------------------------------------------------------------------------
>  Total                                               99075914.342   100.0
> -----------------------------------------------------------------------------
>
>    D O M A I N   D E C O M P O S I T I O N   S T A T I S T I C S
>
>  av. #atoms communicated per step for force:  2 x 6810.8
>  av. #atoms communicated per step for LINCS:  2 x 3029.3
>
>  Dynamic load balancing report:
>  DLB was off during the run due to low measured imbalance.
>  Average load imbalance: 0.8%.
>  The balanceable part of the MD step is 46%, load imbalance is computed from
>  this.
>  Part of the total run time spent waiting due to load imbalance: 0.4%.
>
>  R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
>
>  On 2 MPI ranks, each using 8 OpenMP threads
>
>  Computing:             Num   Num      Call    Wall time   Giga-Cycles
>                         Ranks Threads  Count      (s)       total sum    %
> -----------------------------------------------------------------------------
>  Domain decomp.          2     8        113       1.532        90.505   1.4
>  DD comm. load           2     8          4       0.000         0.027   0.0
>  Neighbor search         2     8        113       0.442        26.107   0.4
>  Launch GPU ops.         2     8      22402       1.230        72.668   1.1
>  Comm. coord.            2     8      11088       0.894        52.844   0.8
>  Force                   2     8      11201       8.166       482.534   7.5
>  Wait + Comm. F          2     8      11201       0.672        39.720   0.6
>  PME mesh                2     8      11201      61.637      3642.183  56.6
>  Wait GPU NB nonloc.     2     8      11201       0.342        20.205   0.3
>  Wait GPU NB local       2     8      11201       0.031         1.847   0.0
>  NB X/F buffer ops.      2     8      44578       1.793       105.947   1.6
>  Write traj.             2     8          4       0.040         2.386   0.0
>  Update                  2     8      22402       4.148       245.121   3.8
>  Constraints             2     8      22402      19.207      1134.940  17.6
>  Comm. energies          2     8        113       0.006         0.354   0.0
>  Rest                                             8.801       520.065   8.1
> -----------------------------------------------------------------------------
>  Total                                          108.942      6437.452 100.0
> -----------------------------------------------------------------------------
>  Breakdown of PME mesh computation
> -----------------------------------------------------------------------------
>  PME redist. X/F         2     8      22402       4.992       294.991   4.6
>  PME spread              2     8      11201      16.979      1003.299  15.6
>  PME gather              2     8      11201      11.687       690.563  10.7
>  PME 3D-FFT              2     8      22402      21.648      1279.195  19.9
>  PME 3D-FFT Comm.        2     8      22402       4.985       294.567   4.6
>  PME solve Elec          2     8      11201       1.241        73.332   1.1
> -----------------------------------------------------------------------------
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     1743.073      108.942     1600.0
>                  (ns/day)    (hour/ns)
> Performance:        8.883        2.702
> Finished mdrun on rank 0 Mon Dec 10 17:01:45 2018

--
Gromacs Users mailing list

* Please search the archive at
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
mail to gmx-users-requ...@gromacs.org.