Hi,

I have to run a large number (many thousands) of very short MD reruns with gmx. Using gmx-2016.3 this works without problems; however, the overall performance (in terms of REAL execution time as measured with the Unix time command) that I get on a relatively new computer is poorer than what I get on a much older machine, by a factor of about 2 -- this in spite of gmx reporting better performance for the new machine in the log file.
Both machines run Linux (Debian); the old one has eight Intel cores, the newer one 12. On the newer machine gmx uses a supposedly faster SIMD instruction set; otherwise the hardware (including hard drives) is comparable. Below is the output of a typical job (gmx mdrun -rerun with a trajectory containing not more than a couple of thousand conformations of a single small molecule) on both machines (mdp file content below).

old machine:

    prompt> time gmx mdrun ...

in the log file:

                   Core t (s)   Wall t (s)        (%)
           Time:        4.527        0.566      800.0
                     (ns/day)    (hour/ns)
    Performance:        1.527       15.719

on the command line:

    real    2m45.562s   <====================================
    user    15m40.901s
    sys     0m33.319s

new machine:

    prompt> time gmx mdrun ...

in the log file:

                   Core t (s)   Wall t (s)        (%)
           Time:        6.030        0.502     1200.0
                     (ns/day)    (hour/ns)
    Performance:        1.719       13.958

on the command line:

    real    5m30.962s   <====================================
    user    20m2.208s
    sys     3m28.676s

The specs of the two gmx installations are given below. I'd be grateful if anyone could suggest ways to improve performance on the newer machine!
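To put the two measurements side by side: the MD loop that the log file times accounts for well under 1% of the real time reported by the time command on either machine, so almost all of the difference must come from whatever happens outside that loop (my own quick arithmetic check on the figures above):

```python
# Figures copied from the two runs above, in seconds.
def to_seconds(minutes, seconds):
    """Convert a 'time'-style MmSS.SSSs reading to plain seconds."""
    return 60 * minutes + seconds

old_real = to_seconds(2, 45.562)   # 'real' time, old machine
new_real = to_seconds(5, 30.962)   # 'real' time, new machine
old_loop = 0.566                   # 'Wall t (s)' from the log, old machine
new_loop = 0.502                   # 'Wall t (s)' from the log, new machine

# Fraction of the real execution time spent inside the timed MD loop.
old_frac = old_loop / old_real
new_frac = new_loop / new_real

print(f"old machine: {100 * old_frac:.2f}% of real time in the MD loop")
print(f"new machine: {100 * new_frac:.2f}% of real time in the MD loop")
# -> old machine: 0.34% of real time in the MD loop
# -> new machine: 0.15% of real time in the MD loop
```

So the better ns/day reported in the new machine's log is consistent with its faster SIMD, but irrelevant to the overall runtime of these very short jobs.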
cheers,
Michael

the older machine (here the jobs run faster):

    gmx --version
    GROMACS version:    2016.3
    Precision:          single
    Memory model:       64 bit
    MPI library:        thread_mpi
    OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
    GPU support:        CUDA
    SIMD instructions:  SSE4.1
    FFT library:        fftw-3.3.5-sse2
    RDTSCP usage:       enabled
    TNG support:        enabled
    Hwloc support:      hwloc-1.8.0
    Tracing support:    disabled
    Built on:           Tue Mar 21 11:24:42 CET 2017
    Built by:           root@rcpetemp1 [CMAKE]
    Build OS/arch:      Linux 3.13.0-79-generic x86_64
    Build CPU vendor:   Intel
    Build CPU brand:    Intel(R) Core(TM) i7 CPU 960 @ 3.20GHz
    Build CPU family:   6   Model: 26   Stepping: 5
    Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
    C compiler:         /usr/bin/cc GNU 4.8.4
    C compiler flags:   -msse4.1 -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
    C++ compiler:       /usr/bin/c++ GNU 4.8.4
    C++ compiler flags: -msse4.1 -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
    CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2015 NVIDIA Corporation; Built on Tue_Aug_11_14:27:32_CDT_2015; Cuda compilation tools, release 7.5, V7.5.17
    CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
    CUDA driver:        7.50
    CUDA runtime:       7.50

the newer machine (here execution is slower by a factor of 2):

    gmx --version
    GROMACS version:    2016.3
    Precision:          single
    Memory model:       64 bit
    MPI library:        thread_mpi
    OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
    GPU support:        CUDA
    SIMD instructions:  AVX_256
    FFT library:        fftw-3.3.5
    RDTSCP usage:       enabled
    TNG support:        enabled
    Hwloc support:      hwloc-1.10.0
    Tracing support:    disabled
    Built on:           Fri Mar 24 11:18:29 CET 2017
    Built by:           root@rcpe-sbd-node01 [CMAKE]
    Build OS/arch:      Linux 3.14-2-amd64 x86_64
    Build CPU vendor:   Intel
    Build CPU brand:    Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
    Build CPU family:   6   Model: 62   Stepping: 4
    Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    C compiler:         /usr/bin/cc GNU 4.9.2
    C compiler flags:   -mavx -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
    C++ compiler:       /usr/bin/c++ GNU 4.9.2
    C++ compiler flags: -mavx -std=c++0x -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
    CUDA compiler:      /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2013 NVIDIA Corporation; Built on Wed_Jul_17_18:36:13_PDT_2013; Cuda compilation tools, release 5.5, V5.5.0
    CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;;;-Xcompiler;,-mavx,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
    CUDA driver:        6.50
    CUDA runtime:       5.50

mdp-file:

    integrator    = md
    dt            = 0.001
    nsteps        = 0
    comm-grps     = System
    cutoff-scheme = verlet
    ;
    nstxout       = 0
    nstvout       = 0
    nstfout       = 0
    nstlog        = 0
    nstenergy     = 1
    ;
    nstlist       = 10000
    ns_type       = grid
    pbc           = xyz
    rlist         = 3.9
    ;
    coulombtype   = cut-off
    rcoulomb      = 3.9
    vdw_type      = cut-off
    rvdw          = 3.9
    DispCorr      = no
    ;
    constraints   = none
    ;
    continuation  = yes
