Dear Users,

I'm simulating a membrane protein system of approximately 185,000 atoms on an Intel Core i7 CPU. I have two questions:

1. The performance of my simulations is about 1.8 ns/day. Is this normal for a system of this size, or are my simulations running unusually slowly?

2. When I run mdrun with -nb gpu, the performance drops to 1.3 ns/day! How can I resolve this problem?
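To be concrete, the command I run is roughly the following sketch (the -deffnm prefix "md" and the exact gmx wrapper form are placeholders, not my actual file names; -nb gpu and the 1 MPI rank x 8 OpenMP threads layout are the ones shown in the log below):

# "md" is a placeholder tpr/output prefix; -nb gpu as in question 2,
# 1 thread-MPI rank with 8 OpenMP threads as reported in the log below
gmx mdrun -deffnm md -nb gpu -ntmpi 1 -ntomp 8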
My mdp file parameters are:

integrator              = md
dt                      = 0.002
nsteps                  = 15000000
nstlog                  = 1000
nstxout                 = 5000
nstvout                 = 5000
nstfout                 = 5000
nstcalcenergy           = 100
nstenergy               = 1000
nstxtcout               = 2000      ; xtc compressed trajectory output every 2 ps
;
cutoff-scheme           = Verlet
nstlist                 = 20
rlist                   = 1.0
coulombtype             = pme
rcoulomb                = 1.0
vdwtype                 = Cut-off
vdw-modifier            = Force-switch
rvdw_switch             = 0.9
rvdw                    = 1.0
;
tcoupl                  = berendsen
tc_grps                 = PROT NPROT SOL_ION
tau_t                   = 1.0 1.0 1.0
ref_t                   = 303.15 303.15 303.15
;
pcoupl                  = berendsen
pcoupltype              = semiisotropic
tau_p                   = 5.0 5.0
compressibility         = 4.5e-5 4.5e-5
ref_p                   = 1.0 1.0
;
constraints             = h-bonds
constraint_algorithm    = LINCS
continuation            = yes
;
nstcomm                 = 100
comm_mode               = linear
comm_grps               = PROT NPROT SOL_ION
;
refcoord_scaling        = com

And at the end of the log file, when I use the GPU, I have:

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 NB VdW [V&F]                            65.721780          65.722     0.0
 Pair Search distance check             354.095696        3186.861     0.1
 NxN QSTab Elec. + LJ [F]             78361.108992     4153138.777    92.2
 NxN QSTab Elec. + LJ [V&F]            1094.086656       88621.019     2.0
 1,4 nonbonded interactions              92.366244        8312.962     0.2
 Calc Weights                           273.463938        9844.702     0.2
 Spread Q Bspline                      5833.897344       11667.795     0.3
 Gather F Bspline                      5833.897344       35003.384     0.8
 3D-FFT                               19866.277292      158930.218     3.5
 Solve PME                                5.271904         337.402     0.0
 Shift-X                                  2.625854          15.755     0.0
 Bonds                                   14.647068         864.177     0.0
 Propers                                106.938468       24488.909     0.5
 Impropers                                1.961496         407.991     0.0
 Virial                                   4.877756          87.800     0.0
 Stop-CM                                  1.125366          11.254     0.0
 Calc-Ekin                                9.753172         263.336     0.0
 Lincs                                   20.162196        1209.732     0.0
 Lincs-Mat                              129.913632         519.655     0.0
 Constraint-V                            96.517170         772.137     0.0
 Constraint-Vir                           4.084834          98.036     0.0
 Settle                                  18.730926        6050.089     0.1
 (null)                                   0.653184           0.000     0.0
-----------------------------------------------------------------------------
 Total                                                  4503897.712   100.0
-----------------------------------------------------------------------------

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 8 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1    8         14       0.301          8.175   0.4
 Launch GPU ops.        1    8        486       0.063          1.719   0.1
 Force                  1    8        486       4.351        118.334   6.3
 PME mesh               1    8        486       8.685        236.229  12.5
 Wait GPU local         1    8        486      52.321       1423.144  75.5
 NB X/F buffer ops.     1    8        958       0.389         10.571   0.6
 Write traj.            1    8          1       0.265          7.221   0.4
 Update                 1    8        486       0.989         26.887   1.4
 Constraints            1    8        486       1.041         28.308   1.5
 Rest                                            0.915         24.895   1.3
-----------------------------------------------------------------------------
 Total                                          69.319       1885.482 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread/gather      1    8        972       5.574        151.608   8.0
 PME 3D-FFT             1    8        972       2.862         77.836   4.1
 PME solve Elec         1    8        486       0.216          5.880   0.3
-----------------------------------------------------------------------------

 GPU timings
-----------------------------------------------------------------------------
 Computing:                         Count  Wall t (s)      ms/step       %
-----------------------------------------------------------------------------
 Pair list H2D                         14       0.027        1.919     0.0
 X / q H2D                            486       0.262        0.539     0.4
 Nonbonded F kernel                   460      59.334      128.988    90.8
 Nonbonded F+ene k.                    12       2.819      234.875     4.3
 Nonbonded F+ene+prune k.              14       2.761      197.239     4.2
 F D2H                                486       0.174        0.359     0.3
-----------------------------------------------------------------------------
 Total                                          65.378      134.522   100.0
-----------------------------------------------------------------------------

Force evaluation time GPU/CPU: 134.522 ms/26.822 ms = 5.015
For optimal performance this ratio should be close to 1!

NOTE: The GPU has >20% more load than the CPU. This imbalance causes
      performance loss, consider using a shorter cut-off and a finer PME grid.

               Core t (s)   Wall t (s)        (%)
       Time:      550.116       69.319      793.6
                 (ns/day)    (hour/ns)
Performance:        1.212       19.810

Best,
Hadi
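P.S. In case it helps with interpreting the numbers: the GPU log above covers only a short benchmark segment of 486 steps, i.e. 486 x 0.002 ps = 0.972 ps of simulation in 69.3 s of wall time, which works out to about 0.972 ps / 69.3 s x 86400 s/day ≈ 1.21 ns/day, consistent with the 1.212 ns/day reported at the end.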