Re: [gmx-users] GPU performance

2013-04-09 Thread Benjamin Bobay
Szilárd -

First, many thanks for the reply.

Second, I am glad that I am not crazy.

OK, so based on your suggestions, I think I know what the problem was:
there was a sander process running on one of the CPU cores, while GROMACS
was clearly trying to use four ("Using 4 OpenMP threads"). I just did not
catch that. Sorry! Rookie mistake.

Which I guess leads me to my next question (sorry if it's too naive):

(1) When running GROMACS (or, I guess, any other CUDA-based program), it's
best to have all the CPU cores free, right? I suppose my results pretty
much answer that question. Although I thought that as long as I had one
core available to drive the GPU it would be fine: would setting
"-ntmpi 1 -ntomp 1" help, or would I take a major hit in ns/day as well?
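For reference, a sketch of how I would try that (assuming the GROMACS 4.6
mdrun is on the PATH; flag names are from the 4.6 docs, so please correct
me if I have them wrong):

```shell
# One thread-MPI rank, one OpenMP thread, with thread pinning enabled
# so mdrun stays on a single core while sander uses the others.
# "topol" is just a placeholder run name here.
mdrun -ntmpi 1 -ntomp 1 -pin on -deffnm topol
```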

If I try the benchmarks again just to see (for fun) with "Using 4 OpenMP
threads", under top I get the following, so I think the CPU side is fine:

  PID USER    PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
24791 bobayb  20   0 48.3g  51m 7576 R 299.1  0.2 11:32.90  mdrun
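As a side note, here is a quick way I could have spotted the competing job
(a generic sketch, not GROMACS-specific; the 50% threshold is arbitrary):

```shell
# List the header plus any process using more than half a core,
# sorted by CPU usage, so a stray sander run stands out immediately.
ps -eo pid,pcpu,comm --sort=-pcpu | awk 'NR == 1 || $2 > 50'
```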


When I have a chance (after this sander run is done - hopefully soon) I can
try the benchmarks again.

Thanks again for the help!

Ben
--
gmx-users mailing list    gmx-users@gromacs.org
http://lists.gromacs.org/mailman/listinfo/gmx-users
* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
* Please don't post (un)subscribe requests to the list. Use the
www interface or send it to gmx-users-requ...@gromacs.org.
* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists


[gmx-users] GPU performance

2013-04-09 Thread Benjamin Bobay
Good afternoon -

I recently installed gromacs-4.6 on CentOS 6.3, and the installation went
just fine.

I have a Tesla C2075 GPU.

I then downloaded the benchmark directories and ran a benchmark on
GPU/dhfr-solv-PME.bench.

This is what I got:

Using 1 MPI thread
Using 4 OpenMP threads

1 GPU detected:
  #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible

1 GPU user-selected for this run: #0


Back Off! I just backed up ener.edr to ./#ener.edr.1#
starting mdrun 'Protein in water'
-1 steps, infinite ps.
step   40: timed with pme grid 64 64 64, coulomb cutoff 1.000: 4122.9 M-cycles
step   80: timed with pme grid 56 56 56, coulomb cutoff 1.143: 3685.9 M-cycles
step  120: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3110.8 M-cycles
step  160: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3365.1 M-cycles
step  200: timed with pme grid 40 40 40, coulomb cutoff 1.600: 3499.0 M-cycles
step  240: timed with pme grid 52 52 52, coulomb cutoff 1.231: 3982.2 M-cycles
step  280: timed with pme grid 48 48 48, coulomb cutoff 1.333: 3129.2 M-cycles
step  320: timed with pme grid 44 44 44, coulomb cutoff 1.455: 3425.4 M-cycles
step  360: timed with pme grid 42 42 42, coulomb cutoff 1.524: 2979.1 M-cycles
  optimal pme grid 42 42 42, coulomb cutoff 1.524
step 4300 performance: 1.8 ns/day

and from the nvidia-smi output:
Tue Apr  9 10:13:46 2013
+------------------------------------------------------+
| NVIDIA-SMI 4.304.37          Driver Version: 304.37  |
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla C2075              | :03:00.0          On |                    0 |
| 30%   67C    P0    80W / 225W |   4%  200MB / 5375MB |      4%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0     22568  mdrun                                                 59MB  |
+-----------------------------------------------------------------------------+


So I am only getting 1.8 ns/day! Is that right? It seems very slow,
especially since the CPU-only test gives me the same number:

step 200 performance: 1.8 ns/dayvol 0.79  imb F 14%

From the md.log of the GPU test:
Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz
Family:  6  Model: 45  Stepping:  7
Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc
pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
tdt x2apic
Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: AVX_256


1 GPU detected:
  #0: NVIDIA Tesla C2075, compute cap.: 2.0, ECC: yes, stat: compatible

1 GPU user-selected for this run: #0

Will do PME sum in reciprocal space.

Any thoughts as to why it is so slow?

many thanks!
Ben

-- 

Research Assistant Professor
North Carolina State University
Department of Molecular and Structural Biochemistry
128 Polk Hall
Raleigh, NC 27695
Phone: (919)-513-0698
Fax: (919)-515-2047
