On Thu, Jan 10, 2013 at 7:25 AM, James Starlight <jmsstarli...@gmail.com> wrote:
> Szilárd,
>
> thanks again for the explanation!
>
> Today I've performed some tests on my calmodulin-in-water system with
> different cut-offs (I set all cut-offs to 1.0, 0.9 and 0.8 nm,
> respectively).
>
> Below you can see that the highest performance was in the 0.8 nm case.
>
> all cut-offs 1.0
> Force evaluation time GPU/CPU: 6.134 ms/4.700 ms = 1.305
>
> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>       performance loss, consider using a shorter cut-off and a finer PME
>       grid.
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     1313.420      464.035      283.0
>                  (ns/day)    (hour/ns)
> Performance:        9.310        2.578
> Finished mdrun on node 0 Thu Jan 10 09:39:23 2013
>
> all cut-offs 0.9
> Force evaluation time GPU/CPU: 4.951 ms/4.675 ms = 1.059
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     2414.930      856.179      282.1
>                  (ns/day)    (hour/ns)
> Performance:       10.092        2.378
> Finished mdrun on node 0 Thu Jan 10 10:09:52 2013
>
> all cut-offs 0.8
> Force evaluation time GPU/CPU: 4.001 ms/4.659 ms = 0.859
>
>                Core t (s)   Wall t (s)        (%)
>        Time:     1166.390      413.598      282.0
>                  (ns/day)    (hour/ns)
> Performance:       10.445        2.298
> Finished mdrun on node 0 Thu Jan 10 09:50:33 2013
>
> Also, I've noticed that the usage of CPU cores 2-4 in the second and
> third cases was only 67%. Are there any other ways to increase
> performance by means of neighbor-search parameters (e.g. nstlist)?

You can tweak nstlist, and with GPUs it often helps to increase it,
especially in parallel. However, as increasing nstlist requires a larger
rlist and therefore more non-bonded calculations, that will not help you
here. You can try decreasing it to 10-15, which will increase the
neighbor-search cost but decrease the GPU time; it won't change the
performance dramatically, though.

What's strange is that your Core time/Wall time = (%) is quite low. If
you're running four threads on an otherwise empty machine, you should get
close to 400 if the threads are not idling, e.g. waiting for the GPU. For
instance, in the rc=0.8 case you can see that the GPU/CPU balance is <1.0,
meaning that the GPU has less work than the CPU, in which case there
should be no idling and you should be getting (%) = 400.

Long story short: are you sure you're not running anything else on the
computer while simulating? What do you get if you run on the CPU only?
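For a quick check along those lines, something like the following could be
used - only a sketch, and the file names are illustrative rather than taken
from your setup:

  # comparison run with the non-bonded kernels forced onto the CPU
  mdrun -nb cpu -ntomp 4 -s topol.tpr -deffnm cpuonly_test

and, for the nstlist experiment, a single line in the .mdp (followed by
re-running grompp):

  nstlist = 10    ; neighbor-list update interval, e.g. 10-15 as suggested above

If the CPU-only run also shows (%) well below 400, then something other
than waiting for the GPU is keeping the cores idle.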
> Might such a reduced cut-off be used with force fields (e.g. CHARMM)
> where the use of longer cut-offs originally gave better results
> (e.g. in charmm27 and gromos56 I always use 1.2 and 1.4 nm for rvdw,
> respectively)?

No, at least not without *carefully* checking whether a shorter LJ cut-off
makes sense and does not break the physics of your simulation. Although we
do suggest considering a shorter cut-off - mostly because these days a
large number of simulations are carried out with overly long cut-offs
chosen by rule of thumb or folklore - you should always either make sure
that the change makes sense before doing it, or not do it at all.

Cheers,
--
Szilárd

> James
>
> 2013/1/10 Szilárd Páll <szilard.p...@cbr.su.se>:
> > Hi James,
> >
> > The build looks mostly fine, except that you are using fftw3 compiled
> > with AVX, which is slower than an SSE-only build (even on AVX-capable
> > CPUs) - you should have been warned about this at configure time.
> >
> > Now, performance-wise everything looks fine, except that with a 1.2 nm
> > cut-off your GPU is not able to keep up with the CPU and finish the
> > non-bonded work before the CPU is done with bonded + PME. That's why you
> > see "Wait GPU" taking 20% of the total time, and that's also why you see
> > some cores idling (because for 20% of the run time thread 0 on core 0
> > is blocked waiting for the GPU while the rest idle).
> >
> > As the suggestion at the end of the log file points out, you can consider
> > using a shorter cut-off, which will push more work back to the PME on the
> > CPU, but whether you can do this depends on your particular problem.
> >
> > There is one more alternative: running two MPI processes on the GPU
> > (mpirun -np 2 mdrun -gpu_id 00) and using the -nb gpu_cpu mode, which
> > will execute part of the non-bonded work on the CPU, but this might not
> > help.
> >
> > Cheers,
> >
> > --
> > Szilárd
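For reference, the two-rank alternative mentioned just above would look
roughly like this (a sketch; topol.tpr is an illustrative name, and the
second form assumes the default thread-MPI build rather than an external
MPI library):

  # two ranks sharing GPU 0, with part of the non-bonded work kept on the CPU
  mpirun -np 2 mdrun -gpu_id 00 -nb gpu_cpu -s topol.tpr

  # the same with thread-MPI:
  mdrun -ntmpi 2 -gpu_id 00 -nb gpu_cpu -s topol.tpr

Whether this helps depends on how much of the non-bonded work ends up back
on the CPU, so it is worth checking the timing breakdown at the end of
md.log afterwards.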
> > On Wed, Jan 9, 2013 at 8:27 PM, James Starlight <jmsstarli...@gmail.com> wrote:
> >
> >> Dear Szilárd, thanks for the help again!
> >>
> >> 2013/1/9 Szilárd Páll <szilard.p...@cbr.su.se>:
> >> >
> >> > There could be, but I/we can't tell without more information on what
> >> > and how you compiled and ran. The minimum we need is a log file.
> >> >
> >> I've compiled gromacs 4.6-beta3 via a simple
> >>
> >> cmake CMakeLists.txt -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.0
> >> make
> >> sudo make install
> >>
> >> I have not added any special parameters to grompp or mdrun.
> >>
> >> After that I ran a test simulation of calmodulin in explicit water
> >> (60k atoms) for 100 ps and obtained the following output:
> >>
> >> Host: starlight  pid: 21028  nodeid: 0  nnodes: 1
> >> Gromacs version:    VERSION 4.6-beta3
> >> Precision:          single
> >> MPI library:        thread_mpi
> >> OpenMP support:     enabled
> >> GPU support:        enabled
> >> invsqrt routine:    gmx_software_invsqrt(x)
> >> CPU acceleration:   AVX_256
> >> FFT library:        fftw-3.3.2-sse2-avx
> >> Large file support: enabled
> >> RDTSCP usage:       enabled
> >> Built on:           Wed Jan 9 20:44:51 MSK 2013
> >> Built by:           own@starlight [CMAKE]
> >> Build OS/arch:      Linux 3.2.0-2-amd64 x86_64
> >> Build CPU vendor:   GenuineIntel
> >> Build CPU brand:    Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
> >> Build CPU family:   6   Model: 58   Stepping: 9
> >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
> >>                     mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse
> >>                     rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> >> C compiler:         /usr/bin/gcc GNU gcc (Debian 4.6.3-11) 4.6.3
> >> C compiler flags:   -mavx -Wextra -Wno-missing-field-initializers
> >>                     -Wno-sign-compare -Wall -Wno-unused -Wunused-value
> >>                     -fomit-frame-pointer -funroll-all-loops
> >>                     -fexcess-precision=fast -O3 -DNDEBUG
> >> C++ compiler:       /usr/bin/c++ GNU c++ (Debian 4.6.3-11) 4.6.3
> >> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
> >>                     -Wno-sign-compare -Wall -Wno-unused -Wunused-value
> >>                     -fomit-frame-pointer -funroll-all-loops
> >>                     -fexcess-precision=fast -O3 -DNDEBUG
> >> CUDA compiler:      nvcc: NVIDIA (R) Cuda compiler driver; Copyright
> >>                     (c) 2005-2012 NVIDIA Corporation; Built on
> >>                     Fri_Sep_21_17:28:58_PDT_2012; Cuda compilation tools,
> >>                     release 5.0, V0.2.1221
> >> CUDA driver:        5.0
> >> CUDA runtime:       5.0
> >>
> >> ****************
> >>
> >>                Core t (s)   Wall t (s)        (%)
> >>        Time:     2770.700     1051.927      263.4
> >>                  (ns/day)    (hour/ns)
> >> Performance:        8.214        2.922
> >>
> >> The full log can be found here: http://www.sendspace.com/file/inum84
> >>
> >> Finally, when I checked CPU usage I noticed that only one CPU core was
> >> fully loaded (100%) and cores 2-4 were loaded at only ~60%, and
> >> nvidia-smi gave me the strange result that the GPU is not used (I've
> >> only monitored the temperature of the video card and noticed it rise
> >> up to 65 degrees):
> >>
> >> +------------------------------------------------------+
> >> | NVIDIA-SMI 4.304.54          Driver Version: 304.54  |
> >> |-------------------------------+----------------------+----------------------+
> >> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
> >> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
> >> |===============================+======================+======================|
> >> |   0  GeForce GTX 670          | 0000:02:00.0     N/A |                  N/A |
> >> | 38%   63C  N/A     N/A /  N/A |   9%  174MB / 2047MB |      N/A     Default |
> >> +-------------------------------+----------------------+----------------------+
> >>
> >> +-----------------------------------------------------------------------------+
> >> | Compute processes:                                               GPU Memory |
> >> |  GPU       PID  Process name                                     Usage      |
> >> |=============================================================================|
> >> |    0                 Not Supported                                          |
> >> +-----------------------------------------------------------------------------+
> >>
> >> Thanks for the help again,
> >>
> >> James
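As a side note on the FFTW/AVX warning mentioned earlier in the thread: one
way to avoid linking against an AVX-enabled FFTW could be to let GROMACS
download and build its own SSE-only FFTW at configure time. A rough sketch
only - the GMX_BUILD_OWN_FFTW option should be available in 4.6, but please
check it against the install guide for your exact version:

  cmake . -DGMX_GPU=ON \
          -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.0 \
          -DGMX_BUILD_OWN_FFTW=ON
  make
  sudo make install

Alternatively, a system FFTW built with --enable-sse2 (and --enable-float
for single precision) but without --enable-avx should also avoid the
configure-time warning.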