Szilárd, the configuration with 4 cores + 0.8 nm cut-offs is still the best one.
 PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
1652 own       20   0 28.4g  135m  33m R 288.8  0.8   4:30.33 mdrun

Force evaluation time GPU/CPU: 5.257 ms/5.187 ms = 1.013
For optimal performance this ratio should be close to 1!

               Core t (s)   Wall t (s)        (%)
       Time:      494.240      171.719      287.8
                 (ns/day)    (hour/ns)
Performance:       10.064        2.385
Finished mdrun on node 0 Fri Jan 11 09:38:38 2013

I've tried combinations of different core numbers, but the results were the same (below is an example with 2 cores + GPU):

 PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
1578 own       20   0 28.3g  163m  33m R 170.7  1.0   1:50.68 mdrun

Cut-offs lower than 0.8 also produced the same results. When I used a cut-off of 0.1 the simulation crashed :) Finally, increasing nstlist up to 50 also gave slightly better results (CPU usage up to 295%), but I'm not sure about the influence of such a large nstlist (and the correspondingly larger rlist) on other aspects of the simulation.

For these tests I'm using an Intel Core i5-3570 CPU (3.4 GHz, 4 cores, HD Graphics 2500, 1+6 MB cache) as well as a GeForce GTX 670 GPU. I also want to point out that all simulations have been run from the Debian GNOME desktop. Should I run simulations from console mode only, to kill all hidden CPU-dependent processes?

By the way, I wonder whether it is possible to use 2 GPUs at the same time (in SLI mode)? How might it increase overall performance? In the future I'd like to build a new workstation with an 8-core i7 CPU + 2 GPUs. What would the performance of such a workstation be (in comparison to a typical cluster of several nodes with 8-12 CPUs)?

Thanks for suggestions,

James

2013/1/11 Szilárd Páll <szilard.p...@cbr.su.se>:
> Hi,
>
> On Thu, Jan 10, 2013 at 8:30 PM, James Starlight
> <jmsstarli...@gmail.com> wrote:
>
>> Szilárd,
>>
>> There are no other CPU-intensive tasks running. Below you can see the log from
>> TOP.
>>
>> 26553 own       20   0 28.4g  106m  33m S 285.6  0.7 2263:57 mdrun
>>
>
> This still shows that the average CPU utilization is only 285.6 instead of 400, and
> that matches what mdrun's log shows. Try a run with a very short
> cut-off, one which leads to a <=1 GPU/CPU balance (i.e. no waiting), and if you
> still don't get 400, something weird is going on.
>
>
>>  1611 root      20   0  171m  65m  24m S  3.0  0.4   7:43.05 Xorg
>> 29647 own       20   0  381m  22m  17m S  3.0  0.1   0:01.77 mate-system-mon
>>  2344 own       20   0  358m  17m  11m S  1.3  0.1   0:33.76 mate-terminal
>> 29018 root      20   0     0    0    0 S  0.3  0.0   0:04.99 kworker/0:0
>> 29268 root      20   0     0    0    0 S  0.3  0.0   0:00.22 kworker/u:2
>> 29705 root      20   0     0    0    0 S  0.3  0.0   0:00.03 kworker/3:0
>> 29706 own       20   0 23284 1648 1188 R  0.3  0.0   0:00.05 top
>>     1 root      20   0  8584  872  736 S  0.0  0.0   0:02.34 init
>>     2 root      20   0     0    0    0 S  0.0  0.0   0:00.02 kthreadd
>>     3 root      20   0     0    0    0 S  0.0  0.0   0:00.57 ksoftirqd/0
>>     6 root      rt   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
>>     7 root      rt   0     0    0    0 S  0.0  0.0   0:00.17 watchdog/0
>>     8 root      rt   0     0    0    0 S  0.0  0.0   0:00.00 migration/1
>>    10 root      20   0     0    0    0 S  0.0  0.0   0:00.43 ksoftirqd/1
>>    12 root      rt   0     0    0    0 S  0.0  0.0   0:00.17 watchdog/1
>>    13 root      rt   0     0    0    0 S  0.0  0.0   0:00.00 migration/2
>>    15 root      20   0     0    0    0 S  0.0  0.0   0:00.37 ksoftirqd/2
>>    16 root      rt   0     0    0    0 S  0.0  0.0   0:00.16 watchdog/2
>>    17 root      rt   0     0    0    0 S  0.0  0.0   0:00.00 migration/3
>>    19 root      20   0     0    0    0 S  0.0  0.0   0:00.38 ksoftirqd/3
>>    20 root      rt   0     0    0    0 S  0.0  0.0   0:00.16 watchdog/3
>>    21 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 cpuset
>>    22 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper
>>    23 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kdevtmpfs
>>
>>
>> Usually I run my simulations with a simple mdrun -v -deffnm md.
>> Should I specify the number of cores manually by means of the -nt (or -ntmpi)
>
> If you just want to run on the full machine, simply running like that
> should in most cases still be the optimal run configuration, or very close
> to the optimal, i.e.
in your case:
> mdrun
> <=>
> mdrun -ntmpi 1 -ntomp 4 -gpu_id 0 -pinht
>
>
>> flag? Also, I noticed that the -pinht flag could give me Hyper-Threading
>> support. Is it reasonable for a simulation on CPU+GPU? What
>
> Correct use of HT is also fully automatic and optimal as long as you are
> using the full machine.
>
>
>> other possible mdrun options should I consider? Finally, is it
>> possible that the problems are due to the OpenMP (4.7.2) or Open MPI (1.4.5)
>> libraries?
>
> No, you are using the latest compiler versions, which is good. Other than
> my earlier suggestions, there isn't much you can do to eliminate the idling
> on the CPU (I assume that's what bugs you) - except getting a faster GPU.
> Btw, have you tried the hybrid GPU-CPU mode (although I expect it not to be
> faster)?
>
> Cheers,
> --
> Szilárd
>
>
>>
>> Thanks for help
>>
>> James
>>
>>
>> 2013/1/10 Szilárd Páll <szilard.p...@cbr.su.se>:
>> > On Thu, Jan 10, 2013 at 7:25 AM, James Starlight <jmsstarli...@gmail.com> wrote:
>> >
>> >> Szilárd,
>> >>
>> >> thanks again for the explanation!
>> >>
>> >> Today I've performed some tests on my calmodulin-in-water system with
>> >> different cut-offs (I used cut-offs of 1.0, 0.9 and 0.8,
>> >> respectively).
>> >>
>> >> Below you can see that the highest performance was in the case of the 0.8 cut-offs.
>> >>
>> >> all cut-offs 1.0
>> >> Force evaluation time GPU/CPU: 6.134 ms/4.700 ms = 1.305
>> >>
>> >> NOTE: The GPU has >20% more load than the CPU. This imbalance causes
>> >> performance loss, consider using a shorter cut-off and a finer PME
>> >> grid.
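The NOTE quoted above translates into a handful of .mdp settings. A minimal sketch, assuming the Verlet cut-off scheme and the 0.8 nm value tested later in this thread; the fourierspacing value is an assumption, scaled down by the same factor as rcoulomb so the PME grid becomes correspondingly finer (verify both choices against your force field before using them):

```
; sketch: shorter cut-offs with a finer PME grid (values are assumptions)
cutoff-scheme   = Verlet
nstlist         = 20        ; neighbour-list update interval; larger often helps with GPUs
rcoulomb        = 0.8       ; shortened from 1.0 nm
rvdw            = 0.8
coulombtype     = PME
fourierspacing  = 0.096     ; 0.12 * (0.8/1.0): finer grid compensates the shorter rcoulomb
```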
>> >>
>> >>
>> >>                Core t (s)   Wall t (s)        (%)
>> >>        Time:     1313.420      464.035      283.0
>> >>                  (ns/day)    (hour/ns)
>> >> Performance:        9.310        2.578
>> >> Finished mdrun on node 0 Thu Jan 10 09:39:23 2013
>> >>
>> >>
>> >> all cut-offs 0.9
>> >> Force evaluation time GPU/CPU: 4.951 ms/4.675 ms = 1.059
>> >>
>> >>                Core t (s)   Wall t (s)        (%)
>> >>        Time:     2414.930      856.179      282.1
>> >>                  (ns/day)    (hour/ns)
>> >> Performance:       10.092        2.378
>> >> Finished mdrun on node 0 Thu Jan 10 10:09:52 2013
>> >>
>> >> all cut-offs 0.8
>> >> Force evaluation time GPU/CPU: 4.001 ms/4.659 ms = 0.859
>> >>
>> >>                Core t (s)   Wall t (s)        (%)
>> >>        Time:     1166.390      413.598      282.0
>> >>                  (ns/day)    (hour/ns)
>> >> Performance:       10.445        2.298
>> >> Finished mdrun on node 0 Thu Jan 10 09:50:33 2013
>> >>
>> >> I've also noticed that the usage of CPU cores 2-4 in the 2nd and 3rd cases was only
>> >> 67%. Are there any other ways to increase performance by means of the
>> >> neighbor-search parameters (e.g. nstlist etc.)?
>> >>
>> >
>> > You can tweak nstlist, and it often helps to increase it with GPUs,
>> > especially in parallel. However, as increasing nstlist requires a larger
>> > rlist and more non-bonded calculations, this will not help you. You can try
>> > to decrease it to 10-15, which will increase the NS cost but decrease the
>> > GPU time, but it won't change the performance dramatically.
>> >
>> > What's strange is that your Core time/Wall time = (%) is quite low. If
>> > you're running four threads on an otherwise empty machine, you should
>> > get close to 400 if the threads are not idling, e.g. waiting for the GPU.
>> > For instance, in the rc=0.8 case you can see that the GPU/CPU balance is
>> > <1.0, meaning that the GPU has less work than the CPU, in which case there
>> > should be no idling and you should be getting (%) = 400.
>> >
>> > Long story short: are you sure you're not running anything else on the
>> > computer while simulating? What do you get if you run on CPU only?
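Szilárd's point about the (%) column can be made concrete with a small calculation. A sketch (the numbers are the rc=0.8 timings quoted above; the function names are mine, not mdrun's):

```python
# Sketch: reproduce the "(%)" column from the mdrun log and estimate how
# much of the CPU time the threads spent idling (e.g. waiting for the GPU).

def cpu_utilization(core_t_s, wall_t_s):
    """mdrun reports (%) = core time / wall time * 100; the ideal value on
    an otherwise empty machine is n_threads * 100 (here: 400)."""
    return 100.0 * core_t_s / wall_t_s

def idle_fraction(core_t_s, wall_t_s, n_threads):
    """Fraction of the available CPU time that was not spent computing."""
    return 1.0 - core_t_s / (wall_t_s * n_threads)

# numbers from the rc=0.8 run quoted above
print(cpu_utilization(1166.390, 413.598))    # ~282, instead of the ideal 400
print(idle_fraction(1166.390, 413.598, 4))   # ~0.29 -> roughly 29% idle
```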
>> >>
>> >> Might such a reduced cut-off be used with force fields (e.g. CHARMM)
>> >> where longer cut-offs have originally given better results
>> >> (e.g. in charmm27 and gromos56 I always use 1.2 and 1.4 nm for rvdw,
>> >> respectively)?
>> >>
>> >
>> > No, at least not without *carefully* checking whether a shorter LJ cut-off
>> > makes sense and that it does not break the physics of your simulation.
>> >
>> > Although we advise you to consider decreasing your cut-off - mostly because
>> > these days a large number of simulations are carried out with an overly long
>> > cut-off chosen by rule of thumb or folklore - you should always either
>> > make sure that this makes sense before doing it, or not do it at all.
>> >
>> > Cheers,
>> > --
>> > Szilárd
>> >
>> >
>> >>
>> >>
>> >> James
>> >>
>> >> 2013/1/10 Szilárd Páll <szilard.p...@cbr.su.se>:
>> >> > Hi James,
>> >> >
>> >> > The build looks mostly fine, except that you are using fftw3 compiled with
>> >> > AVX, which is slower than with only SSE (even on AVX-capable CPUs) - you
>> >> > should have been warned about this at configure time.
>> >> >
>> >> > Now, performance-wise everything looks fine except that with a 1.2 nm
>> >> > cut-off your GPU is not able to keep up with the CPU and finish the
>> >> > non-bonded work before the CPU is done with Bonded + PME. That's why you
>> >> > see the "Wait GPU" taking 20% of the total time, and that's also why you see
>> >> > some cores idling (because for 20% of the run-time thread 0 on core 0
>> >> > is blocked waiting for the GPU while the rest idle).
>> >> >
>> >> > As the suggestion at the end of the log file points out, you can consider
>> >> > using a shorter cut-off, which will push more work back to the PME on the
>> >> > CPU, but whether you can do this depends on your particular problem.
>> >> >
>> >> > There is one more alternative: running two MPI processes on the GPU
>> >> > (mpirun -np 2 mdrun -gpu_id 00) and using the -nb gpu_cpu mode, which will
>> >> > execute part of the non-bonded work on the CPU, but this might not help.
>> >> >
>> >> > Cheers,
>> >> >
>> >> > --
>> >> > Szilárd
>> >> >
>> >> >
>> >> > On Wed, Jan 9, 2013 at 8:27 PM, James Starlight <jmsstarli...@gmail.com> wrote:
>> >> >
>> >> >> Dear Szilárd, thanks for the help again!
>> >> >>
>> >> >> 2013/1/9 Szilárd Páll <szilard.p...@cbr.su.se>:
>> >> >>
>> >> >> >
>> >> >> > There could be, but I/we can't tell without more information on what and
>> >> >> > how you compiled and ran. The minimum we need is a log file.
>> >> >> >
>> >> >> I've compiled GROMACS 4.6-beta3 via a simple
>> >> >>
>> >> >> cmake CMakeLists.txt -DGMX_GPU=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.0
>> >> >> make
>> >> >> sudo make install
>> >> >>
>> >> >> I have not added any special params to grompp or mdrun.
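Given the remark earlier in the thread that fftw3 built with AVX is slower than SSE-only, one possible variant of that configure step is to let GROMACS build its own FFTW with the recommended SIMD kernels. A sketch only, assuming the GMX_BUILD_OWN_FFTW option available in 4.6 and the CUDA path used above (check your exact beta supports it):

```
# Sketch: out-of-source build where GROMACS 4.6 downloads and builds its own
# FFTW instead of linking the system fftw-3.3.2-sse2-avx
mkdir build && cd build
cmake .. -DGMX_GPU=ON \
         -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-5.0 \
         -DGMX_BUILD_OWN_FFTW=ON
make
sudo make install
```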
>> >> >>
>> >> >> After that I ran a test simulation of calmodulin in explicit
>> >> >> water (60k atoms, 100 ps) and obtained the following output:
>> >> >>
>> >> >> Host: starlight  pid: 21028  nodeid: 0  nnodes: 1
>> >> >> Gromacs version:    VERSION 4.6-beta3
>> >> >> Precision:          single
>> >> >> MPI library:        thread_mpi
>> >> >> OpenMP support:     enabled
>> >> >> GPU support:        enabled
>> >> >> invsqrt routine:    gmx_software_invsqrt(x)
>> >> >> CPU acceleration:   AVX_256
>> >> >> FFT library:        fftw-3.3.2-sse2-avx
>> >> >> Large file support: enabled
>> >> >> RDTSCP usage:       enabled
>> >> >> Built on:           Wed Jan  9 20:44:51 MSK 2013
>> >> >> Built by:           own@starlight [CMAKE]
>> >> >> Build OS/arch:      Linux 3.2.0-2-amd64 x86_64
>> >> >> Build CPU vendor:   GenuineIntel
>> >> >> Build CPU brand:    Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz
>> >> >> Build CPU family:   6   Model: 58   Stepping: 9
>> >> >> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm
>> >> >>   mmx msr nonstop_tsc pcid pclmuldq pdcm popcnt pse rdrnd rdtscp sse2
>> >> >>   sse3 sse4.1 sse4.2 ssse3 tdt x2apic
>> >> >> C compiler:         /usr/bin/gcc GNU gcc (Debian 4.6.3-11) 4.6.3
>> >> >> C compiler flags:   -mavx -Wextra -Wno-missing-field-initializers
>> >> >>   -Wno-sign-compare -Wall -Wno-unused -Wunused-value
>> >> >>   -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3
>> >> >>   -DNDEBUG
>> >> >> C++ compiler:       /usr/bin/c++ GNU c++ (Debian 4.6.3-11) 4.6.3
>> >> >> C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers
>> >> >>   -Wno-sign-compare -Wall -Wno-unused -Wunused-value
>> >> >>   -fomit-frame-pointer -funroll-all-loops -fexcess-precision=fast -O3
>> >> >>   -DNDEBUG
>> >> >> CUDA compiler:      nvcc: NVIDIA (R) Cuda compiler driver; Copyright
>> >> >>   (c) 2005-2012 NVIDIA Corporation; Built on
>> >> >>   Fri_Sep_21_17:28:58_PDT_2012; Cuda compilation tools, release 5.0,
>> >> >>   V0.2.1221
>> >> >> CUDA driver:        5.0
>> >> >> CUDA runtime:       5.0
>> >> >>
>> >> >> ****************
>> >> >>
>> >> >>                Core t (s)   Wall t (s)        (%)
>> >> >>        Time:     2770.700     1051.927      263.4
>> >> >>                  (ns/day)    (hour/ns)
>> >> >> Performance:        8.214        2.922
>> >> >>
>> >> >> The full log can be found here: http://www.sendspace.com/file/inum84
>> >> >>
>> >> >> Finally, when I checked CPU usage I noticed that only one core was fully
>> >> >> loaded (100%) and cores 2-4 were loaded at only 60%. nvidia-smi also gave me
>> >> >> the strange result that the GPU is not used (although I monitored the
>> >> >> temperature of the video card and noticed an increase up to 65 degrees):
>> >> >>
>> >> >> +------------------------------------------------------+
>> >> >> | NVIDIA-SMI 4.304.54    Driver Version: 304.54        |
>> >> >> |-------------------------------+----------------------+----------------------+
>> >> >> | GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
>> >> >> | Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
>> >> >> |===============================+======================+======================|
>> >> >> |   0  GeForce GTX 670          | 0000:02:00.0     N/A |                  N/A |
>> >> >> | 38%   63C  N/A    N/A / N/A   | 9%   174MB / 2047MB  | N/A          Default |
>> >> >> +-------------------------------+----------------------+----------------------+
>> >> >>
>> >> >> +-----------------------------------------------------------------------------+
>> >> >> | Compute processes:                                               GPU Memory |
>> >> >> |  GPU       PID  Process name                                     Usage      |
>> >> >> |=============================================================================|
>> >> >> |    0            Not Supported                                               |
>> >> >> +-----------------------------------------------------------------------------+
>> >> >>
>> >> >> Thanks for help again,
>> >> >>
>> >> >> James
>> >> >> --
>> >> >> gmx-users mailing list    gmx-users@gromacs.org
>> >> >> http://lists.gromacs.org/mailman/listinfo/gmx-users
>> >> >> * Please search the
archive at
>> >> >> http://www.gromacs.org/Support/Mailing_Lists/Search before posting!
>> >> >> * Please don't post (un)subscribe requests to the list. Use the
>> >> >> www interface or send it to gmx-users-requ...@gromacs.org.
>> >> >> * Can't post? Read http://www.gromacs.org/Support/Mailing_Lists