Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-12-05 Thread Szilárd Páll
Just to reiterate what Henk said: this tweak is rather safe, and if it
happens to cause problems, from what I've read and heard, that will become
obvious quite quickly. Hence it is worth trying - especially if you
have a CPU-GPU imbalance with your hardware and simulation system.

I have personally used this tweak successfully with X79 and Z77
motherboards and various GeForce cards.

Cheers,
--
Szilárd


On Wed, Dec 4, 2013 at 6:58 PM, Henk Neefs henk.ne...@gmail.com wrote:
 Your suspicion is correct, the CPU is waiting 20% of the runtime for the GPU
 to finish.
 --
 Henk


 --
 View this message in context: 
 http://gromacs.5086.x6.nabble.com/Updating-GTX670-PCIE-speed-from-5GT-s-to-8GT-s-resulted-in-about-10-speedup-of-md-run-tp5012945p5013082.html
 Sent from the GROMACS Users Forum mailing list archive at Nabble.com.
 --
 Gromacs Users mailing list

 * Please search the archive at 
 http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

 * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

 * For (un)subscribe requests visit
 https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
 mail to gmx-users-requ...@gromacs.org.


Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-12-04 Thread Szilárd Páll
Hi Henk,

Thanks for the useful comments!

When you run on a single GPU, you do get full timing details for both the
CPU and the GPU - just have a look at the performance tables at the end of
the log file. Alternatively, you can simply run nvprof mdrun, which
will by default give you a nice overview of profiling output for CUDA
device and API calls.
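A concrete invocation might look like the sketch below (the .tpr file name and
the step count are placeholders for your own run, and it assumes a CUDA
toolkit with nvprof on the PATH; with GROMACS 4.6 the binary is mdrun):

```shell
# Profile a short single-GPU run; nvprof's default output summarizes
# CUDA kernel and memcpy (host<->device transfer) times and API calls.
# topol.tpr and -nsteps 2000 are placeholders - use your own run setup.
nvprof mdrun -s topol.tpr -nsteps 2000
```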

Regarding the performance improvement, I suspect that you are seeing the
full speed improvement from going 5 GT/s to 8 GT/s because of the CPU-GPU
load imbalance in your run - the CPU is probably waiting 20% of the runtime
for the GPU to finish. Hence, in these imbalanced cases any improvement on
the GPU side, transfer or kernel, translates straight into a decrease in
wall-time.

We are working on a few things that should improve performance in this
scenario, such as using multiple weakly dependent non-bonded tasks to get
some transfer/kernel overlap, and non-bonded task splitting for a better
load balance.

Cheers,
--
Szilárd


On Wed, Dec 4, 2013 at 8:28 AM, Henk Neefs henk.ne...@gmail.com wrote:
 Below information might be of interest to the Gromacs
 development/optimization team.

 What can we derive from the 10% md_run speedup when the PCIE 3.0 speed
 increases from 5 GT/s to 8 GT/s?

 A 60% PCIE speed increase results in a 10% run-time reduction.
 Hence roughly 10/60 = 17% of the run time is spent in (non-overlapping) PCIE
 bus communication for this particular configuration and this particular
 simulated molecular system.
 I'm referring to the non-overlapping part, as this is the part that is not
 hidden by (i.e. not overlapped with) calculations.

 So changing the PCIE speed gives the Gromacs developers a (non-user-friendly)
 knob to estimate the part of the run time that is determined by the
 (non-overlapping) PCIE bus communication.

 I'm not sure whether the Nvidia CUDA profiling environment provides a better
 way to quantify this. If there isn't one, the above method is a poor man's
 workflow (for which you likely need root access) to provide this
 quantification.
 --
 Henk Neefs
 Gromacs user




Re: [gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-12-03 Thread Henk Neefs
Below information might be of interest to the Gromacs
development/optimization team.

What can we derive from the 10% md_run speedup when the PCIE 3.0 speed
increases from 5 GT/s to 8 GT/s?

A 60% PCIE speed increase results in a 10% run-time reduction.
Hence roughly 10/60 = 17% of the run time is spent in (non-overlapping) PCIE
bus communication for this particular configuration and this particular
simulated molecular system.
I'm referring to the non-overlapping part, as this is the part that is not
hidden by (i.e. not overlapped with) calculations.

So changing the PCIE speed gives the Gromacs developers a (non-user-friendly)
knob to estimate the part of the run time that is determined by the
(non-overlapping) PCIE bus communication.

I'm not sure whether the Nvidia CUDA profiling environment provides a better
way to quantify this. If there isn't one, the above method is a poor man's
workflow (for which you likely need root access) to provide this
quantification.
--
Henk Neefs
Gromacs user
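The back-of-the-envelope estimate above can be sharpened slightly. Under the
same simple model (an assumption: a fraction f of wall-time is serial,
non-overlapped PCIe transfer), a 1.6x link-speed increase shrinks that part
to f/1.6, so the observed saving is f*(1 - 1/1.6). Solving for f with the
10% saving reported in the posts gives a somewhat larger fraction than the
10/60 shortcut:

```shell
# Estimate the non-overlapped PCIe fraction f from the observed speedup.
# Model: saving = f * (1 - 1/s), with s the link-speed ratio (8/5 = 1.6)
# and saving the observed 10% run-time reduction from the original post.
awk 'BEGIN {
  s = 8.0 / 5.0        # link-speed ratio, 5 GT/s -> 8 GT/s
  saving = 0.10        # observed 10% run-time reduction
  f = saving / (1 - 1/s)
  printf "non-overlapped PCIe fraction ~ %.0f%%\n", 100 * f
}'
# prints: non-overlapped PCIe fraction ~ 27%
```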




[gmx-users] Updating GTX670 PCIE speed from 5GT/s to 8GT/s resulted in about 10% speedup of md_run.

2013-11-30 Thread Henk Neefs
By configuring 8 GT/s PCIE 3.0 for the Nvidia driver, I got a 10% speedup
on Gromacs md_run:
   45 ns/day -> 50 ns/day (1AKI protein).

This posting is just informational: my findings on how to do this, so that
others can also exploit it if they so desire. I'm not asking a question
here.

Config: Intel Ivytown (Ivybridge-family) processor, i7-4960X.
Nvidia GeForce GTX670. Single GPU card installed.
ASUS X79 Deluxe motherboard. Single-socket system.
Fedora 19
Nvidia driver: 331.20
Application: md_run
Gromacs 4.6.4
1AKI protein from tutorial
http://www.bevanlab.biochem.vt.edu/Pages/Personal/justin/gmx-tutorials/lysozyme/index.html

To measure the PCIe link speed, do either of:
  1. nvidia-settings (run from the command line; this is part of the Nvidia
     driver package).
     Look under PowerMizer to see which PCIe link speed is presently in use
     (it will change under load).
     Run md_run to generate a load. If 8 GT/s is not enabled, expect to
     see 5 GT/s during an md_run.

  2. lspci -vv | grep -i nvidia
     Use the device's PCI address to get details (needs root privileges):
       lspci -vv -s 01:00.0
     You should see something like:
       LnkCap: Port #0, Speed 8GT/s, Width x16   <= PCIE capability is
         set to 8 GT/s here (this is after applying the settings below).
       LnkSta: Speed 2.5GT/s, Width x16          <= PCIE link state is
         presently at low speed because md_run is not running (to save power).
If the capability shows 5 GT/s (or the speed stays at 5 GT/s under md_run
load), then you can try the setting below to raise it to 8 GT/s.

I'm using the parameter NVreg_EnablePCIeGen3=1 as provided by the Nvidia
driver.

1. I first tried it during a single Fedora boot and ran md_run (a 30-minute
   wall-time example) to determine whether it's stable.
   During boot, when the boot options show up for the images:
     Press 'e' to edit the command line.
     Add to the command line:
       nvidia.NVreg_EnablePCIeGen3=1
     Press 'Ctrl-X' to boot with that command line (note: this option will
     be forgotten on the next boot).
   Run md_run or some other test that heavily exercises the graphics card
   to gauge stability.

2. To make the change permanent (as root):
     Edit /etc/default/grub and add to the GRUB_CMDLINE_LINUX option:
       nvidia.NVreg_EnablePCIeGen3=1
     Then run:
       grub2-mkconfig -o /boot/grub2/grub.cfg
   Reboot; the setting will take effect from then onwards (for every boot).
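Step 2 can be sketched as a script. The demonstration below works on a sample
file with a made-up GRUB_CMDLINE_LINUX line; on the real system you would run
the sed against /etc/default/grub as root and then regenerate the config with
grub2-mkconfig as above:

```shell
# Append the module option to GRUB_CMDLINE_LINUX (shown on a sample copy;
# the existing "rhgb quiet" contents are an illustrative assumption).
echo 'GRUB_CMDLINE_LINUX="rhgb quiet"' > /tmp/grub.sample
sed -i 's/^\(GRUB_CMDLINE_LINUX=".*\)"$/\1 nvidia.NVreg_EnablePCIeGen3=1"/' /tmp/grub.sample
cat /tmp/grub.sample
# prints: GRUB_CMDLINE_LINUX="rhgb quiet nvidia.NVreg_EnablePCIeGen3=1"
```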

FYI: note that I'm using an Intel i7-4960X (Ivybridge family: 15 MB cache,
6 cores). It appears to support PCIE 3.0 at 8 GT/s (I'm not speaking for
Intel).

Disclaimer: Nvidia does not guarantee link/system stability when doing
this, and neither do I. It works for me; your mileage may vary. I only have
a single GPU card on the PCIE bus.

--
Henk Neefs
Gromacs User
Computer Architect (Ivytown chip architect at Intel).
Not speaking for Intel, these are all personal opinions.