Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.
Just to reiterate what Henk said: this tweak is rather safe, and if it does cause problems, from what I've read and heard they will become obvious quite quickly. Hence, it is worth trying, especially if you have a CPU-GPU imbalance with your hardware and simulation system.

I have personally used this tweak successfully with X79 and Z77 motherboards and various GeForce cards.

Cheers,
--
Szilárd

On Wed, Dec 4, 2013 at 6:58 PM, Henk Neefs henk.ne...@gmail.com wrote:

Your suspicion is correct, the CPU is waiting 20% of the runtime for the GPU to finish.

-- Henk

--
View this message in context: http://gromacs.5086.x6.nabble.com/Updating-GTX670-PCIE-speed-from-5GT-s-to-8GT-s-resulted-in-about-10-speedup-of-md-run-tp5012945p5013082.html
Sent from the GROMACS Users Forum mailing list archive at Nabble.com.

--
Gromacs Users mailing list
* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists
* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to gmx-users-requ...@gromacs.org.
Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.
Hi Henk,

Thanks for the useful comments!

When you run on a single GPU, you do get full timing details for both the CPU and the GPU: just have a look at the performance tables at the end of the log file. Alternatively, you can simply run nvprof mdrun, which will by default give you a nice overview of the profiling output for CUDA device and API calls.

Regarding the performance improvement, I suspect that you are seeing the full benefit of going from 5 GT/s to 8 GT/s because of the CPU-GPU load imbalance in your run: the CPU is probably waiting 20% of the runtime for the GPU to finish. Hence, in such imbalanced cases any improvement on the GPU side, whether transfer or kernel, will translate straight into a decrease in wall-time.

We are working on a few things that should improve performance in this scenario, such as using multiple weakly dependent non-bonded tasks to get some transfer/kernel overlap, and non-bonded task splitting for a better load balance.

Cheers,
--
Szilárd

On Wed, Dec 4, 2013 at 8:28 AM, Henk Neefs henk.ne...@gmail.com wrote:

The information below might be of interest to the Gromacs development/optimization team.

What can we derive from the 10% mdrun speedup when the PCIe 3.0 speed increases from 5 GT/s to 8 GT/s? A 60% PCIe speed increase results in a 10% run time reduction. Hence about 10/60=17% of the run time gets spent in (non-overlapping) PCIe bus communication for this particular configuration and for this particular simulated molecular system. I'm referring to the non-overlapping part, as this is the part that is not hidden by (not overlapped with) calculations.

So changing the PCIe speed provides a (non-user-friendly) knob for the Gromacs developers to estimate the part of the run time that is determined by the (non-overlapping) PCIe bus communication. I'm not sure whether the Nvidia CUDA profiling environment provides a better way to quantify this. In case there isn't a better way, the above method is a poor man's flow (for which you likely need root access) to provide this quantification.

--
Henk Neefs
Gromacs user

--
View this message in context: http://gromacs.5086.x6.nabble.com/Updating-GTX670-PCIE-speed-from-5GT-s-to-8GT-s-resulted-in-about-10-speedup-of-md-run-tp5012945p5013031.html
Sent from the GROMACS Users Forum mailing list archive at Nabble.com.
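The estimate quoted above (10/60, about 17%) is a linear approximation. As a rough sketch only, the same question can be posed as a small model: assume that a fraction f of the run time is non-overlapped PCIe transfer, and that only this part shrinks, in inverse proportion to the link rate. Both assumptions, and the function names, are illustrative and not taken from Gromacs or from the posts. The 45 and 50 ns/day figures come from the original post in this thread; the encoding-adjusted ratio uses the standard line codes of PCIe 2.0 (8b/10b) and PCIe 3.0 (128b/130b).

```python
def runtime_reduction(old_ns_per_day: float, new_ns_per_day: float) -> float:
    """Fractional wall-time reduction implied by a throughput increase."""
    return 1.0 - old_ns_per_day / new_ns_per_day


def non_overlapped_fraction(reduction: float, link_speedup: float) -> float:
    """Solve f * (1 - 1/link_speedup) = reduction for f.

    Assumption (hypothetical model): only the non-overlapped transfer time
    shrinks, and it shrinks in inverse proportion to the link rate.
    """
    return reduction / (1.0 - 1.0 / link_speedup)


# Throughput figures reported in the original post: 45 -> 50 ns/day.
reduction = runtime_reduction(45.0, 50.0)

# Ratio of raw transfer rates, 5 GT/s -> 8 GT/s.
raw_ratio = 8.0 / 5.0

# Ratio of payload rates once line encoding is included (assumption that
# encoding overhead is the only correction needed):
# 5 GT/s with 8b/10b versus 8 GT/s with 128b/130b.
payload_ratio = (8.0 * 128.0 / 130.0) / (5.0 * 8.0 / 10.0)

print(f"run time reduction:            {reduction:.1%}")
print(f"share of run time (raw rates): {non_overlapped_fraction(reduction, raw_ratio):.1%}")
print(f"share of run time (payload):   {non_overlapped_fraction(reduction, payload_ratio):.1%}")
```

Under these assumptions the non-overlapped share comes out somewhat higher than the 10/60 figure, roughly 27% with raw rates and roughly 20% with payload rates. Which number is closer to reality depends on how much of the transfer is latency-bound rather than bandwidth-bound, which this sketch does not model.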
[gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.
By configuring 8 GT/s PCIe 3.0 for the Nvidia driver, I got a 10% speedup of Gromacs mdrun: from 45 ns/day to 50 ns/day (1AKI protein).

This posting is purely informational: these are my findings on how to do this, so that others can exploit it if they wish. I am not asking a question here.

Config:
  Intel Ivytown (Ivybridge family) processor (i7-4960X).
  Nvidia GeForce GTX670. Single GPU card installed.
  ASUS X79 Deluxe motherboard. Single socket system.
  Fedora 19
  Nvidia driver: 331.20

Application:
  mdrun, Gromacs 4.6.4
  1AKI protein from the tutorial http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/index.html

To measure the PCIe link speed, do either of:

1. nvidia-settings (from the command line; this is part of the Nvidia driver package).
   Look under PowerMizer to see which PCIe link speed is presently used (it changes under load). Run mdrun to generate a load. If 8 GT/s is not enabled, expect to see 5 GT/s during an mdrun run.

2. lspci -vv | grep -i nvidia
   Use the device's PCI address to get details (needs root privileges):
     lspci -vv -s 01:00.0
   You would see something like:
     LnkCap: Port #0, Speed 8GT/s, Width x16
       = PCIe capability is set to 8 GT/s here (this is after applying the settings below).
     LnkSta: Speed 2.5GT/s, Width x16
       = PCIe link state is presently low speed, as mdrun is not running (to save power).

If the capability shows 5 GT/s (or the speed is just 5 GT/s under mdrun load), you can try the setting below to raise it to 8 GT/s. I'm using the parameter NVreg_EnablePCIeGen3=1 as provided by the Nvidia driver.

1. I first tried it for a single Fedora boot and ran mdrun (a 30 minute wall-time example) to determine whether it is stable.
   During boot, when the boot options show up for the images:
     Press 'e' to edit the command line.
     Add to the command line: nvidia.NVreg_EnablePCIeGen3=1
     Then press 'Ctrl-X' to boot with that command line (note: this option will be forgotten on the next boot).
   Run mdrun or some other tests that heavily exercise the graphics card to gauge the stability.

2. To make the change permanent (as root):
   Edit /etc/default/grub and add to the GRUB_CMDLINE_LINUX option:
     nvidia.NVreg_EnablePCIeGen3=1
   Then run:
     grub2-mkconfig -o /boot/grub2/grub.cfg
   Now reboot, and the setting will take effect from then onwards (for every boot).

FYI: Note that I'm using an Intel i7-4960X (Ivybridge family: 15MB, 6 cores). It seems to support PCIe 3.0 with 8 GT/s (I'm not speaking for Intel).

Disclaimer: Nvidia does not guarantee link/system stability when doing this. I don't either. It works for me. Your mileage may vary. I only have a single GPU card on the PCIe bus.

--
Henk Neefs
Gromacs User
Computer Architect (Ivytown chip architect at Intel). Not speaking for Intel, these are all personal opinions.
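The link-speed check described above can also be scripted. The following is a minimal sketch, not part of the original post: it parses the LnkCap and LnkSta lines of lspci -vv output and reports whether the capability is below 8 GT/s, which is the case where the post suggests trying the driver parameter. The sample text mirrors the lines quoted in the post; the function names and the regular expression are illustrative assumptions, and real lspci output carries extra fields on these lines that the pattern simply skips.

```python
import re

# Sample mirroring the two lines quoted in the post (not live output).
SAMPLE = """\
LnkCap: Port #0, Speed 8GT/s, Width x16
LnkSta: Speed 2.5GT/s, Width x16
"""

# Matches "LnkCap:" or "LnkSta:" lines and captures the speed in GT/s and
# the lane width. The colon right after the name keeps LnkCap2/LnkSta2 out.
LINK_RE = re.compile(
    r"^[ \t]*(LnkCap|LnkSta):.*?Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)",
    re.MULTILINE,
)


def parse_link_info(lspci_text: str) -> dict:
    """Return {'LnkCap': (speed_gts, width), 'LnkSta': (speed_gts, width)}."""
    info = {}
    for field, speed, width in LINK_RE.findall(lspci_text):
        info[field] = (float(speed), int(width))
    return info


def capability_below_gen3(info: dict) -> bool:
    """True if the advertised link capability is below 8 GT/s."""
    return info["LnkCap"][0] < 8.0


info = parse_link_info(SAMPLE)
print(info)
print("capability below 8 GT/s:", capability_below_gen3(info))
```

To run this against a live system, the sample string could be replaced with the output of lspci -vv -s 01:00.0 (the device address used in the post; it may differ on other machines), captured for example with subprocess.run. Note that LnkSta drops to a lower speed when the card is idle, so the current link state is only meaningful while mdrun is generating load.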