Hi Peter,

Thanks for your response. I have also realized that the GT 610 cannot keep up with the faster CPU (Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz). I tried the combined CPU-GPU mode for the -nb option (-nb gpu_cpu); it improves things slightly, but not by much. So we are planning to replace the GPU cards. At this point we have two options: a single 4 GB GTX 970, or two 2 GB GTX 960s. I was wondering whether you could comment on which option would be better as far as performance is concerned; I have sketched below how I would expect to drive each setup.

Thanks for your input,
Jagannath
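P.S. In case it helps: here is roughly how I would expect to drive each
candidate setup in GROMACS 5.0 (only a sketch; topol.tpr stands in for my
actual run input, and the thread counts assume the same 12-thread i7-5930K):

    # Option 1: single GTX 970 -- one PP rank owning the one GPU,
    # with all 12 hardware threads as OpenMP threads on that rank
    gmx mdrun -s topol.tpr -nb gpu -ntmpi 1 -ntomp 12 -gpu_id 0 -v

    # Option 2: two GTX 960s -- one PP rank per GPU, 6 OpenMP threads
    # each; the -gpu_id string maps GPU 0 to rank 0 and GPU 1 to rank 1
    gmx mdrun -s topol.tpr -nb gpu -ntmpi 2 -ntomp 6 -gpu_id 01 -v

Please correct me if that mapping is not how you would set it up.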
On Fri, Sep 4, 2015 at 6:45 PM, Peter Kroon <p.c.kr...@rug.nl> wrote:

> Hi Jagannath,
>
> AFAIK GT 610s are rather slow. What you could try is using both CPU and
> GPU for the non-bonded interactions (-nb gpu_cpu).
>
> Peter
>
> On 04/09/15 15:01, jagannath mondal wrote:
> > Dear Gromacs Users,
> >
> > I am trying to run the GPU version of GROMACS 5.0.6 on a workstation
> > with a hexacore processor that can be multithreaded to 12. The
> > workstation has 2 GeForce GT 610 GPUs. I am finding that a simulation
> > using -nb gpu is exceedingly slower than one using -nb cpu (i.e. with
> > the GPUs turned off).
> >
> > I installed CUDA 7.0, and with it I could build the GPU version of
> > GROMACS 5.0.6 as follows:
> >
> > cmake ../ -DGMX_BUILD_OWN_FFTW=ON \
> >   -DCMAKE_INSTALL_PREFIX=/home/jmondal/UTIL/GROMACS_5.0.6_gpu/ \
> >   -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DGMX_GPU=ON \
> >   -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda/
> >
> > However, the performance with the GPUs is very strange. If I run mdrun
> > with the following command:
> >
> > 1) gmx mdrun -s topol.tpr -nb gpu -v &> log_run
> >
> > and then repeat the same run with GPU usage turned off:
> >
> > 2) gmx mdrun -s topol.tpr -nb cpu -v >& log_run
> >
> > the performance with the GPUs drops by about a factor of 3! Using both
> > GPUs along with the CPU, the performance is 1.620 ns/day; using only
> > the CPU, it is 4.6 ns/day. Using the GPUs is frustratingly slowing
> > down the simulation.
> >
> > When run with the -nb gpu option, the GROMACS md.log correctly detects
> > the GPUs and CPU as follows:
> >
> > Using 2 MPI threads
> > Using 6 OpenMP threads per tMPI thread
> >
> > Detecting CPU SIMD instructions.
> > Present hardware specification:
> > Vendor: GenuineIntel
> > Brand:  Intel(R) Core(TM) i7-5930K CPU @ 3.50GHz
> > Family:  6  Model: 63  Stepping:  2
> > Features: aes apic avx avx2 clfsh cmov cx8 cx16 f16c fma htt lahf_lm
> > mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp
> > sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > SIMD instructions most likely to fit this hardware: AVX2_256
> > SIMD instructions selected at GROMACS compile time: AVX2_256
> >
> > 2 GPUs detected:
> >   #0: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC: no, stat: compatible
> >   #1: NVIDIA GeForce GT 610, compute cap.: 2.1, ECC: no, stat: compatible
> >
> > 2 GPUs auto-selected for this run.
> > Mapping of GPUs to the 2 PP ranks in this node: #0, #1
> >
> > However, when I look at the cycle accounting at the end of the
> > simulation, 'Wait GPU nonlocal' takes an awfully long time. I also
> > tried a few other options (such as using only 1 GPU via -gpu_id 0) and
> > played with the -ntmpi and -ntomp options, but the GPU performance is
> > drastically poor (surprisingly, 3 times slower than the CPU-only
> > simulation).
> >
> > I am struggling to figure out whether it is a hardware issue, a
> > GPU-driver issue, or whether I am not using the optimal options. Your
> > suggestions will be useful in solving the issue.
> >
> > Jagannath
> >
> >      R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G
> >
> > On 2 MPI ranks, each using 6 OpenMP threads
> >
> > Computing:          Num    Num     Call    Wall time   Giga-Cycles
> >                     Ranks  Threads Count   (s)         total sum      %
> > ---------------------------------------------------------------------------
> > Domain decomp.      2      6       63      0.270       11.322       0.2
> > DD comm. load       2      6       13      0.000       0.002        0.0
> > Neighbor search     2      6       63      0.311       13.062       0.2
> > Launch GPU ops.     2      6       5002    0.205       8.614        0.2
> > Comm. coord.        2      6       2438    0.239       10.016       0.2
> > Force               2      6       2501    1.358       57.011       1.0
> > Wait + Comm. F      2      6       2501    0.404       16.954       0.3
> > PME mesh            2      6       2501    9.734       408.587      7.3
> > Wait GPU nonlocal   2      6       2501    117.798     4944.651    88.3
> > Wait GPU local      2      6       2501    0.005       0.206        0.0
> > NB X/F buffer ops.  2      6       9878    0.255       10.683       0.2
> > Write traj.         2      6       4       0.180       7.558        0.1
> > Update              2      6       2501    0.807       33.886       0.6
> > Constraints         2      6       2501    1.216       51.025       0.9
> > Comm. energies      2      6       126     0.001       0.055        0.0
> > Rest                                       0.609       25.573       0.5
> > ---------------------------------------------------------------------------
> > Total                                      133.392     5599.205   100.0
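P.P.S. For the archive: the hybrid and single-GPU tests mentioned above
were invoked along these lines (a sketch; actual file names abbreviated):

    # Hybrid mode suggested by Peter: local non-bonded work on the GPU,
    # non-local non-bonded work on the CPU
    gmx mdrun -s topol.tpr -nb gpu_cpu -v

    # Restricting the run to a single GT 610 (device 0) on one PP rank
    gmx mdrun -s topol.tpr -nb gpu -ntmpi 1 -gpu_id 0 -v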