Hi. I wonder whether GROMACS 4.6.7 can run faster on XSEDE (xsede.org), because the log shows the CPU waiting for the GPU.
Each node has 16 CPU cores (2.7 GHz), 1 Xeon Phi coprocessor, and 1 GPU. I compiled GROMACS with GPU support and without Phi support, using the Intel compiler and MKL. I did not install 5.0.1 because I worry that this bug might disrupt equilibration when I switch from one ensemble to another (http://redmine.gromacs.org/issues/1603).

Below are excerpts from the log:

Gromacs version:    VERSION 4.6.7
Precision:          single
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled
GPU support:        enabled
invsqrt routine:    gmx_software_invsqrt(x)
CPU acceleration:   AVX_256
FFT library:        MKL
Large file support: enabled
RDTSCP usage:       enabled
Built on:           Wed Sep 24 08:33:22 CDT 2014
Built by:           jlu...@login2.stampede.tacc.utexas.edu [CMAKE]
Build OS/arch:      Linux 2.6.32-431.17.1.el6.x86_64 x86_64
Build CPU vendor:   GenuineIntel
Build CPU brand:    Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Build CPU family:   6   Model: 45   Stepping: 7
Build CPU features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
C compiler:         /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC) 13.1.1 20130313
C compiler flags:   -mavx -mkl=sequential -std=gnu99 -Wall -ip -funroll-all-loops -O3 -DNDEBUG
C++ compiler:       /opt/apps/intel/13/composer_xe_2013.3.163/bin/intel64/icc Intel icc (ICC) 13.1.1 20130313
C++ compiler flags: -mavx -Wall -ip -funroll-all-loops -O3 -DNDEBUG
Linked with Intel MKL version 11.0.3.
CUDA compiler:      /opt/apps/cuda/6.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2013 NVIDIA Corporation;Built on Thu_Mar_13_11:58:58_PDT_2014;Cuda compilation tools, release 6.0, V6.0.1
CUDA compiler flags:-gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_20,code=sm_21;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_35,code=compute_35;-use_fast_math;; -mavx;-Wall;-ip;-funroll-all-loops;-O3;-DNDEBUG
CUDA driver:        6.0
CUDA runtime:       6.0
...
Using 1 MPI thread
Using 16 OpenMP threads

Detecting CPU-specific acceleration.
Present hardware specification:
Vendor: GenuineIntel
Brand:  Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
Family: 6   Model: 45   Stepping: 7
Features: aes apic avx clfsh cmov cx8 cx16 htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
Acceleration most likely to fit this hardware: AVX_256
Acceleration selected at GROMACS compile time: AVX_256

1 GPU detected:
  #0: NVIDIA Tesla K20m, compute cap.: 3.5, ECC: yes, stat: compatible

1 GPU auto-selected for this run.
Mapping of GPU to the 1 PP rank in this node: #0

Will do PME sum in reciprocal space.
...

        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                            M-Number           M-Flops % Flops
-----------------------------------------------------------------------------
 Pair Search distance check      1517304.154000      13655737.386     0.1
 NxN Ewald Elec. + VdW [F]     370461474.587968   24450457322.806    92.7
 NxN Ewald Elec. + VdW [V&F]     3742076.012672     400402133.356     1.5
 1,4 nonbonded interactions       101910.006794       9171900.611     0.0
 Calc Weights                    1343655.089577      48371583.225     0.2
 Spread Q Bspline               28664641.910976      57329283.822     0.2
 Gather F Bspline               28664641.910976     171987851.466     0.7
 3D-FFT                        141557361.449024    1132458891.592     4.3
 Solve PME                         61439.887616       3932152.807     0.0
 Shift-X                           11197.154859         67182.929     0.0
 Angles                            71010.004734      11929680.795     0.0
 Propers                          108285.007219      24797266.653     0.1
 Impropers                          8145.000543       1694160.113     0.0
 Virial                            44856.029904        807408.538     0.0
 Stop-CM                            4478.909718         44789.097     0.0
 Calc-Ekin                         89577.059718       2418580.612     0.0
 Lincs                             39405.002627       2364300.158     0.0
 Lincs-Mat                        852120.056808       3408480.227     0.0
 Constraint-V                     487680.032512       3901440.260     0.0
 Constraint-Vir                    44827.529885       1075860.717     0.0
 Settle                           136290.009086      44021672.935     0.2
-----------------------------------------------------------------------------
 Total                                            26384297680.107   100.0
-----------------------------------------------------------------------------

     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

 Computing:           Nodes  Th.     Count  Wall t (s)     G-Cycles      %
-----------------------------------------------------------------------------
 Neighbor search          1   16    375001     578.663    24997.882    1.7
 Launch GPU ops.          1   16  15000001     814.410    35181.984    2.3
 Force                    1   16  15000001    2954.603   127637.010    8.5
 PME mesh                 1   16  15000001   11736.454   507007.492   33.7
 Wait GPU local           1   16  15000001   11159.455   482081.496   32.0
 NB X/F buffer ops.       1   16  29625001    1061.959    45875.952    3.0
 Write traj.              1   16        39       5.207      224.956    0.0
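
To put a number on the CPU wait, the timing rows quoted above can be read with a short script. This is only a minimal sketch: the helper name `parse_cycle_rows`, the regular expression, and the variable names are my own assumptions and not part of GROMACS, and the rows are just the ones pasted in this post (the table here is incomplete). The total run time is therefore estimated from one row's wall time and its rounded percentage, so treat it as approximate.

```python
import re

# Rows copied from the "REAL CYCLE AND TIME ACCOUNTING" excerpt in this post.
# Columns: name, Nodes, Th., Count, Wall t (s), G-Cycles, %
CYCLE_ROWS = """\
Neighbor search 1 16 375001 578.663 24997.882 1.7
Launch GPU ops. 1 16 15000001 814.410 35181.984 2.3
Force 1 16 15000001 2954.603 127637.010 8.5
PME mesh 1 16 15000001 11736.454 507007.492 33.7
Wait GPU local 1 16 15000001 11159.455 482081.496 32.0
NB X/F buffer ops. 1 16 29625001 1061.959 45875.952 3.0
Write traj. 1 16 39 5.207 224.956 0.0
"""

# Hypothetical pattern: a free-text name followed by three integers
# and three decimal numbers, matching the row layout shown above.
ROW_RE = re.compile(
    r"^\s*(?P<name>.+?)\s+(?P<nodes>\d+)\s+(?P<threads>\d+)\s+(?P<count>\d+)"
    r"\s+(?P<wall>\d+\.\d+)\s+(?P<gcycles>\d+\.\d+)\s+(?P<pct>\d+\.\d+)\s*$"
)


def parse_cycle_rows(text):
    """Return {row name: {"wall_s": float, "pct": float}} for matching lines."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            rows[match.group("name")] = {
                "wall_s": float(match.group("wall")),
                "pct": float(match.group("pct")),
            }
    return rows


rows = parse_cycle_rows(CYCLE_ROWS)
wait = rows["Wait GPU local"]
pme = rows["PME mesh"]

# Estimate the total wall time from one row; the log rounds the percentage
# to one decimal, so this is approximate.
est_total_s = wait["wall_s"] / (wait["pct"] / 100.0)

# How long the CPU sat waiting for the GPU relative to its own PME work.
wait_vs_pme = wait["wall_s"] / pme["wall_s"]

if __name__ == "__main__":
    for name, row in rows.items():
        print(f"{name:<20s} {row['wall_s']:>12.3f} s {row['pct']:>6.1f} %")
    print(f"Estimated total wall time: {est_total_s:.0f} s")
    print(f"Wait GPU local / PME mesh: {wait_vs_pme:.2f}")
```

The ratio at the end compares the time the CPU spent idle waiting for GPU results with the time it spent on the PME mesh, which are the two largest entries among the rows shown.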