Dear Szilard,

yes, it seems I should just have done a bit more research regarding the optimal CPU/GPU combination ... and as you point out, the bonded interactions are the culprits ... most people probably simulate aqueous systems, in which LINCS does most of this work; here I have a polymer glass ... a different story ... the flops table you were missing was in my previous mail (see below for another copy), and indeed it tells me that about 65% of the CPU load is "Force" while only 15.5% is PME mesh, and I assume only the latter is what the PP-PME load balancing can shift ... I assume this means there is no way to improve things ... I guess I just have to live with the fact that for this type of system my slow CPU is the bottleneck ... if you have any other ideas please let me know ...

regards,
mic
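A quick back-of-the-envelope check, using the wall-time column of the cycle table quoted below and assuming (optimistically) that the PP-PME tuning could remove the CPU PME-mesh time entirely:

  # wall times (s) taken from the cycle accounting table quoted below
  total=26.973
  pme_mesh=4.172
  # upper bound on the speed-up if the CPU PME mesh work went to zero
  echo "scale=2; $total / ($total - $pme_mesh)" | bc    # prints ~1.18

So even in the best case the run would only get about 18% faster, which is consistent with the bonded ("Force") work on the CPU being the real limit.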
 Computing:          Num   Num      Call    Wall time      Giga-Cycles
                     Ranks Threads  Count      (s)       total sum    %
 -----------------------------------------------------------------------------
 Neighbor search        1    12       251       0.574        23.403    2.1
 Launch GPU ops.        1    12     10001       0.627        25.569    2.3
 Force                  1    12     10001      17.392       709.604   64.5
 PME mesh               1    12     10001       4.172       170.234   15.5
 Wait GPU local         1    12     10001       0.206         8.401    0.8
 NB X/F buffer ops.     1    12     19751       0.239         9.736    0.9
 Write traj.            1    12        11       0.381        15.554    1.4
 Update                 1    12     10001       0.303        12.365    1.1
 Constraints            1    12     10001       1.458        59.489    5.4
 Rest                                            1.621        66.139    6.0
 -----------------------------------------------------------------------------
 Total                                          26.973      1100.493  100.0
===============================
Why be happy when you could be normal?

--------------------------------------------
On Tue, 9/16/14, Szilárd Páll <pall.szil...@gmail.com> wrote:

 Subject: Re: [gmx-users] GPU waits for CPU, any remedies?
 To: "Michael Brunsteiner" <mbx0...@yahoo.com>
 Cc: "Discussion list for GROMACS users" <gmx-us...@gromacs.org>, "gromacs.org_gmx-users@maillist.sys.kth.se" <gromacs.org_gmx-users@maillist.sys.kth.se>
 Date: Tuesday, September 16, 2014, 6:52 PM

 Well, it looks like you are i) unlucky and ii) limited by the huge bonded workload.

 i) As your system is quite small, mdrun thinks there are no convenient grids between 32x32x32 and 28x28x28 (see the PP-PME tuning output). Since the latter corresponds to quite a big jump in cut-off (from 1.296 to 1.482 nm), which more than doubles the non-bonded workload and is slower than the former, mdrun sticks to using 1.296 nm as the Coulomb cut-off. You may be able to gain some performance by tweaking your Fourier grid spacing a bit to help mdrun generate some additional grids that could give more cut-off settings in the 1.3-1.48 nm range. On second thought, though, I guess there aren't any more convenient grid sizes between 28 and 32.

 ii) The primary issue, however, is that your bonded workload is much higher than it normally is. I'm not fully familiar with the implementation, but I think this may be due to the RB term, which is quite slow. This time it's the flops table that could confirm this, but as you still have not shared the entire log file, we/I can't tell.

 Cheers,
 --
 Szilárd
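In case it is worth trying anyway, a minimal sketch of the Fourier-spacing tweak Szilárd mentions (and partly walks back), assuming a GROMACS 5.0-style gmx wrapper; the file names are placeholders and the 0.11 nm spacing is only an illustrative guess, not a value taken from this run:

  # placeholder file names; spacing value is only an illustrative guess
  # (the default fourierspacing is 0.12 nm; this assumes md.mdp sets it explicitly)
  sed -i 's/^ *fourierspacing.*/fourierspacing = 0.11/' md.mdp
  gmx grompp -f md.mdp -c conf.gro -p topol.top -o tuned.tpr
  gmx mdrun -deffnm tuned
  # then check the PP-PME tuning output and the cycle accounting table in tuned.log
  # to see whether an intermediate grid / cut-off in the 1.3-1.48 nm range gets picked

If no intermediate grid shows up, Szilárd's second point stands and the bonded work on the CPU remains the limit.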