Hi,
On the new machine your CUDA driver and runtime versions (6.5 and 5.5) are lower than on the old machine (7.5 for both). Maybe that could explain it? (Is the GPU even used with -rerun?) To test this, you would need to update CUDA and recompile GROMACS.

Peter

On 27-03-17 15:51, Michael Brunsteiner wrote:
> Hi,
> I have to run a lot (many thousands) of very short MD reruns with gmx.
> Using gmx-2016.3 it works without problems. However, what I see is that
> the overall performance (in terms of REAL execution time as measured with
> the unix time command) which I get on a relatively new computer is poorer
> than what I get with a much older machine (by a factor of about 2, this in
> spite of gmx reporting a better performance of the new machine in the
> log file).
>
> Both machines run Linux (Debian); the old one has eight Intel cores, the
> newer one 12. On the newer machine gmx uses a supposedly faster SIMD
> instruction set; otherwise the hardware (including hard drives) is
> comparable.
>
> Below is the output of a typical job (gmx mdrun -rerun with a trajectory
> containing not more than a couple of thousand conformations of a single
> small molecule) on both machines (mdp file content below).
>
> old machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        4.527        0.566      800.0
>                  (ns/day)    (hour/ns)
> Performance:        1.527       15.719
> on the command line:
> real 2m45.562s <====================================
> user 15m40.901s
> sys 0m33.319s
>
> new machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        6.030        0.502     1200.0
>                  (ns/day)    (hour/ns)
> Performance:        1.719       13.958
> on the command line:
> real 5m30.962s <====================================
> user 20m2.208s
> sys 3m28.676s
>
> The specs of the two gmx installations are given below. I'd be grateful if
> anyone could suggest ways to improve performance on the newer machine!
> cheers,
> Michael
>
>
> the older machine (here the jobs run faster): gmx --version
>
> GROMACS version: 2016.3
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support: CUDA
> SIMD instructions: SSE4.1
> FFT library: fftw-3.3.5-sse2
> RDTSCP usage: enabled
> TNG support: enabled
> Hwloc support: hwloc-1.8.0
> Tracing support: disabled
> Built on: Tue Mar 21 11:24:42 CET 2017
> Built by: root@rcpetemp1 [CMAKE]
> Build OS/arch: Linux 3.13.0-79-generic x86_64
> Build CPU vendor: Intel
> Build CPU brand: Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
> Build CPU family: 6 Model: 26 Stepping: 5
> Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc
> pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> C compiler: /usr/bin/cc GNU 4.8.4
> C compiler flags: -msse4.1 -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler: /usr/bin/c++ GNU 4.8.4
> C++ compiler flags: -msse4.1 -std=c++0x -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> CUDA compiler: /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on
> Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
> CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
> CUDA driver: 7.50
> CUDA runtime: 7.50
>
>
> the newer machine (here execution is slower by a factor 2): gmx --version
>
> GROMACS version: 2016.3
> Precision: single
> Memory model: 64 bit
> MPI library: thread_mpi
> OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support: CUDA
> SIMD instructions: AVX_256
> FFT library: fftw-3.3.5
> RDTSCP usage: enabled
> TNG support: enabled
> Hwloc support: hwloc-1.10.0
> Tracing support: disabled
> Built on: Fri Mar 24 11:18:29 CET 2017
> Built by: root@rcpe-sbd-node01 [CMAKE]
> Build OS/arch: Linux 3.14-2-amd64 x86_64
> Build CPU vendor: Intel
> Build CPU brand: Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
> Build CPU family: 6 Model: 62 Stepping: 4
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3
> sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler: /usr/bin/cc GNU 4.9.2
> C compiler flags: -mavx -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> C++ compiler: /usr/bin/c++ GNU 4.9.2
> C++ compiler flags: -mavx -std=c++0x -O3 -DNDEBUG -funroll-all-loops
> -fexcess-precision=fast
> CUDA compiler: /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler
> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on
> Wed_Jul_17_18:36:13_PDT_2013;Cuda compilation tools, release 5.5, V5.5.0
> CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;;;-Xcompiler;,-mavx,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
> CUDA driver: 6.50
> CUDA runtime: 5.50
>
>
> mdp-file:
>
> integrator = md
> dt = 0.001
> nsteps = 0
> comm-grps = System
> cutoff-scheme = verlet
> ;
> nstxout = 0
> nstvout = 0
> nstfout = 0
> nstlog = 0
> nstenergy = 1
> ;
> nstlist = 10000
> ns_type = grid
> pbc = xyz
> rlist = 3.9
> ;
> coulombtype = cut-off
> rcoulomb = 3.9
> vdw_type = cut-off
> rvdw = 3.9
> DispCorr = no
> ;
> constraints = none
> ;
> continuation = yes
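
A quick sanity check on the timings quoted above, as a rough sketch only: it compares the wall time gmx writes to the log with the real time reported by `time`. The helper name `to_seconds` and the `runs` dictionary are made up for illustration; the numbers are copied from the quoted output. If the arithmetic is right, almost all of the real time falls outside the wall time shown in the log, which may point at overhead around the timed part of the run rather than at the force calculation itself. This does not prove where the time goes.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '2m45.562s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the quoted output of both machines.
runs = {
    "old": {
        "real": to_seconds("2m45.562s"),
        "user": to_seconds("15m40.901s"),
        "sys": to_seconds("0m33.319s"),
        "log_wall": 0.566,  # "Wall t (s)" from the log file
    },
    "new": {
        "real": to_seconds("5m30.962s"),
        "user": to_seconds("20m2.208s"),
        "sys": to_seconds("3m28.676s"),
        "log_wall": 0.502,  # "Wall t (s)" from the log file
    },
}

for name, r in runs.items():
    # Real time that is not covered by the wall time reported in the log.
    outside = r["real"] - r["log_wall"]
    share = 100 * outside / r["real"]
    print(f"{name}: real {r['real']:.1f} s, log wall {r['log_wall']:.3f} s, "
          f"not covered by log wall time: {outside:.1f} s ({share:.1f}%)")

# New-to-old ratios for each column reported by `time`.
ratio_real = runs["new"]["real"] / runs["old"]["real"]
ratio_user = runs["new"]["user"] / runs["old"]["user"]
ratio_sys = runs["new"]["sys"] / runs["old"]["sys"]
print(f"new/old  real: {ratio_real:.2f}x  user: {ratio_user:.2f}x  sys: {ratio_sys:.2f}x")
```

On these numbers the sys time grows much more than the user time on the new machine, so time spent in the kernel (driver calls, for example) might be the first thing to look at.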
