Re: [gmx-users] performance issue with many short MD runs

Peter Kroon Mon, 27 Mar 2017 07:25:48 -0700

Hi,


On the new machine your CUDA runtime and driver versions are lower than
on the old machine (runtime 5.50 / driver 6.50 versus 7.50 / 7.50). Maybe
that could explain it? (Is the GPU even used with -rerun?) You would need
to recompile GROMACS.
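
It may also help to quantify how much of the real time falls outside what
the log file reports. The sketch below only redoes the arithmetic on the
numbers from the quoted message; the helper names (parse_time, overhead)
and the dictionary layout are made up for illustration and are not part of
GROMACS.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '2m45.562s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes = int(match.group(1) or 0)
    return 60 * minutes + float(match.group(2))


# Values copied from the quoted message; log_wall is "Wall t (s)" in md.log.
runs = {
    "old": {"real": "2m45.562s", "sys": "0m33.319s", "log_wall": 0.566},
    "new": {"real": "5m30.962s", "sys": "3m28.676s", "log_wall": 0.502},
}


def overhead(run):
    """Seconds of real time not accounted for by the wall time in the log."""
    return parse_time(run["real"]) - run["log_wall"]


for name, run in runs.items():
    real = parse_time(run["real"])
    sys_t = parse_time(run["sys"])
    print(f"{name}: real {real:.1f} s, log wall {run['log_wall']:.3f} s, "
          f"unaccounted {overhead(run):.1f} s, sys {sys_t:.1f} s")
```

On both machines well over 99% of the real time is not covered by the wall
time in the log, and the sys time on the new machine is roughly six times
that of the old one. So the ns/day figure says little about this workload,
and whatever differs between the machines probably sits outside the timed
MD loop.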


Peter


On 27-03-17 15:51, Michael Brunsteiner wrote:
> Hi,
>
> I have to run a lot (many thousands) of very short MD reruns with gmx.
> Using gmx-2016.3 it works without problems. However, the overall
> performance (in terms of REAL execution time as measured with the Unix
> time command) that I get on a relatively new computer is poorer than what
> I get on a much older machine (by a factor of about 2, in spite of gmx
> reporting better performance for the new machine in the log file).
>
> Both machines run Linux (Debian); the old one has eight Intel cores, the
> newer one 12. On the newer machine gmx uses a supposedly faster SIMD
> instruction set; otherwise the hardware (including hard drives) is
> comparable.
>
> Below is the output of a typical job (gmx mdrun -rerun with a trajectory
> containing not more than a couple of thousand conformations of a single
> small molecule) on both machines (mdp file content below).
>
> old machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        4.527        0.566      800.0
>                  (ns/day)    (hour/ns)
> Performance:        1.527       15.719
> on the command line:
> real    2m45.562s  <====================================
> user    15m40.901s
> sys     0m33.319s
>
> new machine:
> prompt> time gmx mdrun ...
> in the log file:
>                Core t (s)   Wall t (s)        (%)
>        Time:        6.030        0.502     1200.0
>                  (ns/day)    (hour/ns)
> Performance:        1.719       13.958
>
> on the command line:
> real    5m30.962s  <====================================
> user    20m2.208s
> sys     3m28.676s
>
> The specs of the two gmx installations are given below. I'd be grateful if
> anyone could suggest ways to improve performance on the newer machine!
>
> cheers,
> Michael
>
>
> The older machine (here the jobs run faster): gmx --version
>
> GROMACS version:    2016.3
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:        CUDA
> SIMD instructions:  SSE4.1
> FFT library:        fftw-3.3.5-sse2
> RDTSCP usage:       enabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.8.0
> Tracing support:    disabled
> Built on:           Tue Mar 21 11:24:42 CET 2017
> Built by:           root@rcpetemp1 [CMAKE]
> Build OS/arch:      Linux 3.13.0-79-generic x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel(R) Core(TM) i7 CPU         960  @ 3.20GHz
> Build CPU family:   6   Model: 26   Stepping: 5
> Build CPU features: apic clfsh cmov cx8 cx16 htt lahf mmx msr nonstop_tsc 
> pdcm popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3
> C compiler:         /usr/bin/cc GNU 4.8.4
> C compiler flags:    -msse4.1     -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> C++ compiler:       /usr/bin/c++ GNU 4.8.4
> C++ compiler flags:  -msse4.1    -std=c++0x   -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> CUDA compiler:      /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler 
> driver;Copyright (c) 2005-2015 NVIDIA Corporation;Built on 
> Tue_Aug_11_14:27:32_CDT_2015;Cuda compilation tools, release 7.5, V7.5.17
> CUDA compiler 
> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_52,code=compute_52;-use_fast_math;;;-Xcompiler;,-msse4.1,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
>  
> CUDA driver:        7.50
> CUDA runtime:       7.50
>
>
>
> The newer machine (here execution is slower by a factor of 2): gmx --version
>
> GROMACS version:    2016.3
> Precision:          single
> Memory model:       64 bit
> MPI library:        thread_mpi
> OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 32)
> GPU support:        CUDA
> SIMD instructions:  AVX_256
> FFT library:        fftw-3.3.5
> RDTSCP usage:       enabled
> TNG support:        enabled
> Hwloc support:      hwloc-1.10.0
> Tracing support:    disabled
> Built on:           Fri Mar 24 11:18:29 CET 2017
> Built by:           root@rcpe-sbd-node01 [CMAKE]
> Build OS/arch:      Linux 3.14-2-amd64 x86_64
> Build CPU vendor:   Intel
> Build CPU brand:    Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
> Build CPU family:   6   Model: 62   Stepping: 4
> Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf mmx msr 
> nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 
> sse4.1 sse4.2 ssse3 tdt x2apic
> C compiler:         /usr/bin/cc GNU 4.9.2
> C compiler flags:    -mavx     -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> C++ compiler:       /usr/bin/c++ GNU 4.9.2
> C++ compiler flags:  -mavx    -std=c++0x   -O3 -DNDEBUG -funroll-all-loops 
> -fexcess-precision=fast  
> CUDA compiler:      /usr/bin/nvcc nvcc: NVIDIA (R) Cuda compiler 
> driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on 
> Wed_Jul_17_18:36:13_PDT_2013;Cuda compilation tools, release 5.5, V5.5.0
> CUDA compiler 
> flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;;;-Xcompiler;,-mavx,,,,,,;-Xcompiler;-O3,-DNDEBUG,-funroll-all-loops,-fexcess-precision=fast,,;
>  
> CUDA driver:        6.50
> CUDA runtime:       5.50
>
>
>
> mdp-file:
>
> integrator               = md
> dt                       = 0.001
> nsteps                   = 0
> comm-grps                = System
> cutoff-scheme            = verlet
> ;
> nstxout                  = 0
> nstvout                  = 0
> nstfout                  = 0
> nstlog                   = 0
> nstenergy                = 1
> ;
> nstlist                  = 10000
> ns_type                  = grid
> pbc                      = xyz
> rlist                    = 3.9
> ;
> coulombtype              = cut-off
> rcoulomb                 = 3.9
> vdw_type                 = cut-off
> rvdw                     = 3.9
> DispCorr                 = no
> ;
> constraints              = none
> ;
> continuation             = yes

-- 
Gromacs Users mailing list

* Please search the archive at 
http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit
https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a 
mail to [email protected].
