Hi,

It's a bit curious to want to run two 8-thread jobs on a machine with 10 physical cores: you'll get performance imbalance, because some threads must share a physical core while others get one to themselves. But I guess it's a free world. As I suggested the other day, http://manual.gromacs.org/documentation/2016/user-guide/mdrun-performance.html#examples-for-mdrun-on-one-node has some examples.

The fact that you've compiled and linked with an MPI library means it may be involving itself in the thread-affinity management; whether it actually does is something between you, it, the docs and the cluster admins. If you just want to run on a single node, do yourself a favour and build the thread-MPI flavour.
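For example, an untested sketch of such a build, reusing the FFTW and CUDA paths from the cmake line you posted earlier (the install prefix here is just an example; adjust everything to your machine):

    # Thread-MPI is the default when GMX_MPI is off; no MPI compilers needed
    cmake .. \
      -DGMX_MPI=OFF \
      -DGMX_GPU=ON \
      -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
      -DCMAKE_PREFIX_PATH=/soft/gromacs/fftw-3.3.4 \
      -DCMAKE_INSTALL_PREFIX=/soft/gromacs/5.1.3_tmpi   # example prefix
    make -j && make install

That gives you a plain gmx binary (not gmx_mpi) whose mdrun handles all the intra-node parallelism itself.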
With that build, running your two jobs on the one node would look more like:

    gmx mdrun -ntomp 10 -pin on -pinoffset 0 -gpu_id 0 -s run1
    gmx mdrun -ntomp 10 -pin on -pinoffset 10 -gpu_id 1 -s run2

If you want to stick with the MPI build, then I suggest you read up on how its mpirun will let you manage keeping the threads of the two processes where you want them (i.e. apart).

Mark

On Thu, Aug 18, 2016 at 7:57 AM Albert <[email protected]> wrote:
> anybody has more suggestions?
>
> thx a lot
>
> On 08/17/2016 09:07 AM, Albert wrote:
> > Hello:
> >
> > Here is the information that you asked for.
> >
> > gmx_mpi mdrun -s 7.tpr -v -g 7.log -c 7.gro -x 7.xtc -ntomp 8 -gpu_id 0 -pin on
> > ----------------------------------------------------------------------
> > GROMACS:           gmx mdrun, VERSION 5.1.3
> > Executable:        /soft/gromacs/5.1.3_intel/bin/gmx_mpi
> > Data prefix:       /soft/gromacs/5.1.3_intel
> > Command line:
> >   gmx_mpi mdrun -s 7.tpr -v -g 7.log -c 7.gro -x 7.xtc -ntomp 8 -gpu_id 0 -pin on
> >
> > GROMACS version:   VERSION 5.1.3
> > Precision:         single
> > Memory model:      64 bit
> > MPI library:       MPI
> > OpenMP support:    enabled (GMX_OPENMP_MAX_THREADS = 32)
> > GPU support:       enabled
> > OpenCL support:    disabled
> > invsqrt routine:   gmx_software_invsqrt(x)
> > SIMD instructions: AVX_256
> > FFT library:       fftw-3.3.4-sse2
> > RDTSCP usage:      enabled
> > C++11 compilation: disabled
> > TNG support:       enabled
> > Tracing support:   disabled
> > Built on:          Thu Aug 11 16:15:26 CEST 2016
> > Built by:          albert@cudaB [CMAKE]
> > Build OS/arch:     Linux 3.16.7-35-desktop x86_64
> > Build CPU vendor:  GenuineIntel
> > Build CPU brand:   Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
> > Build CPU family:  6  Model: 62  Stepping: 4
> > Build CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> > C compiler:        /soft/intel/impi/5.1.3.223/bin64/mpicc GNU 4.8.3
> > C compiler flags:  -mavx -Wextra -Wno-missing-field-initializers -Wno-sign-compare -Wpointer-arith -Wall -Wno-unused -Wunused-value -Wunused-parameter -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
> > C++ compiler:      /soft/intel/impi/5.1.3.223/bin64/mpicxx GNU 4.8.3
> > C++ compiler flags: -mavx -Wextra -Wno-missing-field-initializers -Wpointer-arith -Wall -Wno-unused-function -O3 -DNDEBUG -funroll-all-loops -fexcess-precision=fast -Wno-array-bounds
> > Boost version:     1.54.0 (external)
> > CUDA compiler:     /usr/local/cuda/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver; Copyright (c) 2005-2016 NVIDIA Corporation; Built on Wed_May__4_21:01:56_CDT_2016; Cuda compilation tools, release 8.0, V8.0.26
> > CUDA compiler flags: -gencode;arch=compute_20,code=sm_20;-gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_60,code=compute_60;-gencode;arch=compute_61,code=compute_61;-use_fast_math;;-mavx;-Wextra;-Wno-missing-field-initializers;-Wpointer-arith;-Wall;-Wno-unused-function;-O3;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;-Wno-array-bounds;
> > CUDA driver:       8.0
> > CUDA runtime:      8.0
> >
> > Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs
> > Hardware detected on host cudaB (the node of MPI rank 0):
> >   CPU info:
> >     Vendor: GenuineIntel
> >     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
> >     Family: 6  model: 62  stepping: 4
> >     CPU features: aes apic avx clfsh cmov cx8 cx16 f16c htt lahf_lm mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
> >     SIMD instructions most likely to fit this hardware: AVX_256
> >     SIMD instructions selected at GROMACS compile time: AVX_256
> >   GPU info:
> >     Number of GPUs detected: 2
> >     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC: no, stat: compatible
> >     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC: no, stat: compatible
> > ----------------------------------------------------------------------
> >
> > gmx_mpi mdrun -s 7.tpr -v -g 7.log -c 7.gro -x 7.xtc -ntomp 8 -gpu_id 1 -pin on -cpi -append -pinoffset 8
> > ----------------------------------------------------------------------
> > GROMACS:           gmx mdrun, VERSION 5.1.3
> > Executable:        /soft/gromacs/5.1.3_intel/bin/gmx_mpi
> > Data prefix:       /soft/gromacs/5.1.3_intel
> > Command line:
> >   gmx_mpi mdrun -s 7.tpr -v -g 7.log -c 7.gro -x 7.xtc -ntomp 8 -gpu_id 1 -pin on -cpi -append -pinoffset 8
> >
> > Running on 1 node with total 10 cores, 20 logical cores, 2 compatible GPUs
> > Hardware detected on host cudaB (the node of MPI rank 0):
> >   CPU info:
> >     Vendor: GenuineIntel
> >     Brand:  Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
> >     SIMD instructions most likely to fit this hardware: AVX_256
> >     SIMD instructions selected at GROMACS compile time: AVX_256
> >   GPU info:
> >     Number of GPUs detected: 2
> >     #0: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC: no, stat: compatible
> >     #1: NVIDIA GeForce GTX 780 Ti, compute cap.: 3.5, ECC: no, stat: compatible
> >
> > Reading file 7.tpr, VERSION 5.1.3 (single precision)
> > Reading checkpoint file state.cpt generated: Wed Aug 17 09:01:46 2016
> >
> > Using 1 MPI process
> > Using 8 OpenMP threads
> >
> > 1 GPU user-selected for this run.
> > Mapping of GPU ID to the 1 PP rank in this node: 1
> >
> > Applying core pinning offset 8
> > starting mdrun 'Title'
> > 50000000 steps, 100000.0 ps (continuing from step 5746000, 11492.0 ps).
> > step 5746080: timed with pme grid 60 60 84, coulomb cutoff 1.000: 2451.9 M-cycles
> >
> > On 08/16/2016 05:27 PM, Szilárd Páll wrote:
> >> Most of that copy-pasted info is not what I asked for, and overall not very useful. You have still not shown any log files (or details on the hardware). Share the *relevant* stuff, please!
> >> --
> >> Szilárd
> >>
> >> On Tue, Aug 16, 2016 at 5:07 PM, Albert <[email protected]> wrote:
> >>> Hello:
> >>>
> >>> Here is my MDP file:
> >>>
> >>> define               = -DREST_ON -DSTEP6_4
> >>> integrator           = md
> >>> dt                   = 0.002
> >>> nsteps               = 1000000
> >>> nstlog               = 1000
> >>> nstxout              = 0
> >>> nstvout              = 0
> >>> nstfout              = 0
> >>> nstcalcenergy        = 100
> >>> nstenergy            = 1000
> >>> nstxout-compressed   = 10000
> >>> ;
> >>> cutoff-scheme        = Verlet
> >>> nstlist              = 20
> >>> rlist                = 1.0
> >>> coulombtype          = pme
> >>> rcoulomb             = 1.0
> >>> vdwtype              = Cut-off
> >>> vdw-modifier         = Force-switch
> >>> rvdw_switch          = 0.9
> >>> rvdw                 = 1.0
> >>> ;
> >>> tcoupl               = berendsen
> >>> tc_grps              = PROT MEMB SOL_ION
> >>> tau_t                = 1.0 1.0 1.0
> >>> ref_t                = 310 310 310
> >>> ;
> >>> pcoupl               = berendsen
> >>> pcoupltype           = semiisotropic
> >>> tau_p                = 5.0
> >>> compressibility      = 4.5e-5 4.5e-5
> >>> ref_p                = 1.0 1.0
> >>> ;
> >>> constraints          = h-bonds
> >>> constraint_algorithm = LINCS
> >>> continuation         = yes
> >>> ;
> >>> nstcomm              = 100
> >>> comm_mode            = linear
> >>> comm_grps            = PROT MEMB SOL_ION
> >>> ;
> >>> refcoord_scaling     = com
> >>>
> >>> I compiled GROMACS with the following settings, using Intel MPI:
> >>>
> >>> env CC=mpicc CXX=mpicxx F77=mpif90 FC=mpif90 LDF90=mpif90 \
> >>>   CMAKE_PREFIX_PATH=/soft/gromacs/fftw-3.3.4:/soft/intel/impi/5.1.3.223 \
> >>>   cmake .. -DBUILD_SHARED_LIB=OFF -DBUILD_TESTING=OFF \
> >>>   -DCMAKE_INSTALL_PREFIX=/soft/gromacs/5.1.3_intel -DGMX_MPI=ON -DGMX_GPU=ON \
> >>>   -DGMX_PREFER_STATIC_LIBS=ON -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
> >>>
> >>> I tried it again with one of the jobs using the options:
> >>>
> >>> -ntomp 8 -pin on -pinoffset 8
> >>>
> >>> The two submitted jobs still only use 8 CPUs between them, and the speed is extremely slow (10 ns/day). When I remove the option "-pin on" from one of the jobs, it speeds up a lot (32 ns/day) and 16 CPUs are used. If I submit only one job with the option "-pin on", I get 52 ns/day.
> >>>
> >>> thx a lot
> >>>
> >>> On 08/16/2016 04:59 PM, Szilárd Páll wrote:
> >>>> Hi,
> >>>>
> >>>> Without logs and hardware configs, it's hard to tell what's happening.
> >>>>
> >>>> By turning off pinning, the OS is free to move threads around, and it will try to ensure the cores are utilized. However, by leaving threads un-pinned you risk taking a significant performance hit. So I'd recommend that you run with correct settings.
> >>>>
> >>>> If you start with "-ntomp 8 -pin on -pinoffset 8" (and you indeed have 16 cores, no HT), you should be able to see in htop the first eight cores empty while the last eight are occupied.
> >>>>
> >>>> Cheers,
> >>>> --
> >>>> Szilárd
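P.S. If you do stay with the MPI build, one possible approach (an untested sketch; I haven't checked how Intel MPI's own pinning interacts with it) is to launch each rank through taskset and let the OS schedule within that mask:

    # First job on the first ten hardware threads, second job on the rest;
    # adjust the core lists to your actual topology (see e.g. lscpu)
    mpirun -np 1 taskset -c 0-9   gmx_mpi mdrun -ntomp 10 -pin off -gpu_id 0 -s run1 &
    mpirun -np 1 taskset -c 10-19 gmx_mpi mdrun -ntomp 10 -pin off -gpu_id 1 -s run2 &

Note the -pin off, so that mdrun doesn't fight the external mask. Intel MPI also has its own pinning controls (e.g. I_MPI_PIN_PROCESSOR_LIST); its documentation has the details.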
--
Gromacs Users mailing list

* Please search the archive at http://www.gromacs.org/Support/Mailing_Lists/GMX-Users_List before posting!

* Can't post? Read http://www.gromacs.org/Support/Mailing_Lists

* For (un)subscribe requests visit https://maillist.sys.kth.se/mailman/listinfo/gromacs.org_gmx-users or send a mail to [email protected].
