Hi Henk, Thanks for the useful comments!
When you run on a single GPU, you do get full timing details for both the CPU and the GPU; just have a look at the performance tables at the end of the log file. Alternatively, you can simply run nvprof mdrun ..., which by default gives you a nice overview of the profiling output for CUDA device and API calls.

Regarding the performance improvement, I suspect that you are seeing the full speed improvement from 5GT/s->8GT/s because of the CPU-GPU load imbalance in your run: the CPU is probably waiting >20% of the runtime for the GPU to finish. Hence, in these imbalanced cases any improvement on the GPU side, whether transfer or kernel, will translate straight into a decrease in wall-time.

We are working on a few things that should improve performance in this scenario, like using multiple weakly dependent non-bonded tasks to get some transfer/kernel overlap, and non-bonded task splitting for a better load balance.

Cheers,
--
Szilárd

On Wed, Dec 4, 2013 at 8:28 AM, Henk Neefs <henk.ne...@gmail.com> wrote:
> Below information might be of interest to the Gromacs
> development/optimization team.
>
> What can we derive from the 10% md_run speedup when PCIE3.0 speed increases
> from 5GT/s->8GT/s?
>
> A 60% PCIE speed increase results in a 10% run time reduction.
> Hence about 10/60=17% of the run time gets spent in (non-overlapping) PCIE
> bus communication for this particular configuration and for this particular
> simulated molecular system.
> I'm referring to the "non-overlapping" part as this is the part that is not
> hidden by (not overlapped with) calculations.
>
> So changing the PCIE speed provides a (non-user-friendly) knob to the
> gromacs developers to estimate the part of the run time that is determined
> by the (non-overlapping) PCIE bus communication.
>
> Not sure whether the Nvidia CUDA profiling environment provides a better way
> to quantify this.
> In case there isn't a better way, above method is a poor
> man's flow (for which you likely need root access) to provide this
> quantification.
> --
> Henk Neefs
> Gromacs user
>
>
> --
> View this message in context:
> http://gromacs.5086.x6.nabble.com/Updating-GTX670-PCIE-speed-from-5GT-s-to-8GT-s-resulted-in-about-10-speedup-of-md-run-tp5012945p5013031.html
> Sent from the GROMACS Users Forum mailing list archive at Nabble.com.
> --
> Gromacs Users mailing list
>
> * Please search the archive at
> http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
>
> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
>
> * For (un)subscribe requests visit
> https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a
> mail to gmx-users-requ...@gromacs.org.
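
The back-of-the-envelope estimate in the quoted message divides the 10% run-time reduction by the 60% transfer-rate increase. As a rough cross-check, the same arithmetic can be done with a simple two-part model, sketched below in Python. It assumes that the non-overlapped transfer time scales inversely with the raw transfer rate (ignoring latency and link encoding overhead) and that the rest of the run time is unchanged. The function and parameter names are illustrative only and are not part of GROMACS or CUDA.

```python
def exposed_transfer_fraction(runtime_reduction, rate_ratio):
    """Estimate the fraction of the ORIGINAL run time spent in
    non-overlapped bus transfers.

    Assumed model (not from GROMACS): T = T_other + T_xfer, and after the
    bus speed-up T' = T_other + T_xfer / rate_ratio.
    Then (T - T') / T = f * (1 - 1 / rate_ratio), where f = T_xfer / T.
    """
    if rate_ratio <= 1.0:
        raise ValueError("rate_ratio must be greater than 1")
    return runtime_reduction / (1.0 - 1.0 / rate_ratio)


# Numbers from the thread: 10% shorter run time, 5 GT/s -> 8 GT/s.
fraction = exposed_transfer_fraction(0.10, 8.0 / 5.0)
print(f"estimated non-overlapped transfer share: {fraction:.1%}")
```

Under these assumptions the estimate comes out at roughly 27% of the original run time rather than 17%, because a 1.6x faster bus removes only 37.5% of the transfer time, not 60% of it. Either figure is only as good as the model, so the timing tables in the log file or a profiler run remain the more direct measurement.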