Yup, your assessment agrees with our guess. Our HPC guru will be taking his findings, along with your quote, to the admins.
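In the meantime, here is roughly what we'll check from inside the interactive srun session to see what CPU set the scheduler is actually handing the job. These are standard Linux/Slurm tools rather than anything taken from the log below, so treat it as a sketch:

    grep Cpus_allowed_list /proc/self/status   # CPUs the job's cgroup actually allows
    taskset -cp $$                             # affinity mask already applied to the shell
    nvidia-smi -L                              # confirm the allocated V100 is visible

If the allowed CPU list turns out to be only one or two hardware threads, that alone would explain both the failed pinning and the single-threaded minimization.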
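For your suggestion (ii), the retry would look roughly like the original test, just with CPU binding turned off on the srun side so that mdrun's -pin on can take effect. I'm assuming our Slurm accepts --cpu-bind=none (older releases spell it --cpu_bind=none); the admins will have to confirm the exact option for this cluster:

    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 --cpu-bind=none bash
    module load cuda
    gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
        -s em.tpr -o traj.trr -g md.log -c after_em.pdb

If mdrun still reports "Thread affinity was not set", we'll know the problem is elsewhere.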
Thank you,
Alex

On Thu, May 9, 2019 at 2:51 PM Szilárd Páll <pall.szil...@gmail.com> wrote:
> On Thu, May 9, 2019 at 10:01 PM Alex <nedoma...@gmail.com> wrote:
> >
> > Okay, we're positively unable to run a GROMACS (2019.1) test on Power9.
> > The test procedure is simple, using Slurm:
> > 1. Request an interactive session:
> >    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> > 2. Load the CUDA library: module load cuda
> > 3. Run the test batch. This starts with a CPU-only static EM, which,
> >    despite the mdrun variables, runs on a single thread. Any help will be
> >    highly appreciated.
> >
> > md.log below:
> >
> > GROMACS: gmx mdrun, version 2019.1
> > Executable: /home/reida/ppc64le/stow/gromacs/bin/gmx
> > Data prefix: /home/reida/ppc64le/stow/gromacs
> > Working dir: /home/smolyan/gmx_test1
> > Process ID: 115831
> > Command line:
> >   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s em.tpr -o traj.trr -g md.log -c after_em.pdb
> >
> > GROMACS version: 2019.1
> > Precision: single
> > Memory model: 64 bit
> > MPI library: thread_mpi
> > OpenMP support: enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support: CUDA
> > SIMD instructions: IBM_VSX
> > FFT library: fftw-3.3.8
> > RDTSCP usage: disabled
> > TNG support: enabled
> > Hwloc support: hwloc-1.11.8
> > Tracing support: disabled
> > C compiler: /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
> > C compiler flags: -mcpu=power9 -mtune=power9 -mvsx -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > C++ compiler: /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
> > C++ compiler flags: -mcpu=power9 -mtune=power9 -mvsx -std=c++11 -O2 -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler: /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0, V10.0.130
> > CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
> >   -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver: 10.10
> > CUDA runtime: 10.0
> >
> > Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
> > Hardware detected:
> >   CPU info:
> >     Vendor: IBM
> >     Brand: POWER9, altivec supported
> >     Family: 0 Model: 0 Stepping: 0
> >     Features: vmx vsx
> >   Hardware topology: Only logical processor count
> >   GPU info:
> >     Number of GPUs detected: 1
> >     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat: compatible
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > *SKIPPED*
> >
> > Input Parameters:
> >    integrator = steep
> >    tinit = 0
> >    dt = 0.001
> >    nsteps = 50000
> >    init-step = 0
> >    simulation-part = 1
> >    comm-mode = Linear
> >    nstcomm = 100
> >    bd-fric = 0
> >    ld-seed = 1941752878
> >    emtol = 100
> >    emstep = 0.01
> >    niter = 20
> >    fcstep = 0
> >    nstcgsteep = 1000
> >    nbfgscorr = 10
> >    rtpi = 0.05
> >    nstxout = 0
> >    nstvout = 0
> >    nstfout = 0
> >    nstlog = 1000
> >    nstcalcenergy = 100
> >    nstenergy = 1000
> >    nstxout-compressed = 0
> >    compressed-x-precision = 1000
> >    cutoff-scheme = Verlet
> >    nstlist = 1
> >    ns-type = Grid
> >    pbc = xyz
> >    periodic-molecules = true
> >    verlet-buffer-tolerance = 0.005
> >    rlist = 1.2
> >    coulombtype = PME
> >    coulomb-modifier = Potential-shift
> >    rcoulomb-switch = 0
> >    rcoulomb = 1.2
> >    epsilon-r = 1
> >    epsilon-rf = inf
> >    vdw-type = Cut-off
> >    vdw-modifier = Potential-shift
> >    rvdw-switch = 0
> >    rvdw = 1.2
> >    DispCorr = No
> >    table-extension = 1
> >    fourierspacing = 0.12
> >    fourier-nx = 52
> >    fourier-ny = 52
> >    fourier-nz = 52
> >    pme-order = 4
> >    ewald-rtol = 1e-05
> >    ewald-rtol-lj = 0.001
> >    lj-pme-comb-rule = Geometric
> >    ewald-geometry = 0
> >    epsilon-surface = 0
> >    tcoupl = No
> >    nsttcouple = -1
> >    nh-chain-length = 0
> >    print-nose-hoover-chain-variables = false
> >    pcoupl = No
> >    pcoupltype = Isotropic
> >    nstpcouple = -1
> >    tau-p = 1
> >    compressibility (3x3):
> >       compressibility[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       compressibility[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       compressibility[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >    ref-p (3x3):
> >       ref-p[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       ref-p[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >    refcoord-scaling = No
> >    posres-com (3):
> >       posres-com[0]= 0.00000e+00
> >       posres-com[1]= 0.00000e+00
> >       posres-com[2]= 0.00000e+00
> >    posres-comB (3):
> >       posres-comB[0]= 0.00000e+00
> >       posres-comB[1]= 0.00000e+00
> >       posres-comB[2]= 0.00000e+00
> >    QMMM = false
> >    QMconstraints = 0
> >    QMMMscheme = 0
> >    MMChargeScaleFactor = 1
> >    qm-opts:
> >      ngQM = 0
> >    constraint-algorithm = Lincs
> >    continuation = false
> >    Shake-SOR = false
> >    shake-tol = 0.0001
> >    lincs-order = 4
> >    lincs-iter = 1
> >    lincs-warnangle = 30
> >    nwall = 0
> >    wall-type = 9-3
> >    wall-r-linpot = -1
> >    wall-atomtype[0] = -1
> >    wall-atomtype[1] = -1
> >    wall-density[0] = 0
> >    wall-density[1] = 0
> >    wall-ewald-zfac = 3
> >    pull = false
> >    awh = false
> >    rotation = false
> >    interactiveMD = false
> >    disre = No
> >    disre-weighting = Conservative
> >    disre-mixed = false
> >    dr-fc = 1000
> >    dr-tau = 0
> >    nstdisreout = 100
> >    orire-fc = 0
> >    orire-tau = 0
> >    nstorireout = 100
> >    free-energy = no
> >    cos-acceleration = 0
> >    deform (3x3):
> >       deform[ 0]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       deform[ 1]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >       deform[ 2]={ 0.00000e+00, 0.00000e+00, 0.00000e+00}
> >    simulated-tempering = false
> >    swapcoords = no
> >    userint1 = 0
> >    userint2 = 0
> >    userint3 = 0
> >    userint4 = 0
> >    userreal1 = 0
> >    userreal2 = 0
> >    userreal3 = 0
> >    userreal4 = 0
> >    applied-forces:
> >      electric-field:
> >        x:
> >          E0 = 0
> >          omega = 0
> >          t0 = 0
> >          sigma = 0
> >        y:
> >          E0 = 0
> >          omega = 0
> >          t0 = 0
> >          sigma = 0
> >        z:
> >          E0 = 0
> >          omega = 0
> >          t0 = 0
> >          sigma = 0
> >    grpopts:
> >      nrdf: 47805
> >      ref-t: 0
> >      tau-t: 0
> >    annealing: No
> >    annealing-npoints: 0
> >    acc: 0 0 0
> >    nfreeze: N N N
> >    energygrp-flags[ 0]: 0
> >
> > Initializing Domain Decomposition on 4 ranks
> > NOTE: disabling dynamic load balancing as it is only supported with
> > dynamics, not with integrator 'steep'.
> > Dynamic load balancing: auto
> > Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
> > Minimum cell size due to atom displacement: 0.000 nm
> > NOTE: Periodic molecules are present in this system. Because of this, the
> > domain decomposition algorithm cannot easily determine the minimum cell
> > size that it requires for treating bonded interactions.
> > Instead, domain decomposition will assume that half the non-bonded
> > cut-off will be a suitable lower bound.
> > Minimum cell size due to bonded interactions: 0.678 nm
> > Using 0 separate PME ranks, as there are too few total ranks for
> > efficient splitting
> > Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
> > The maximum allowed number of cells is: X 8 Y 8 Z 8
> > Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
> > PME domain decomposition: 1 x 4 x 1
> > Domain decomposition rank 0, coordinates 0 0 0
> >
> > The initial number of communication pulses is: Y 1
> > The initial domain decomposition cell size is: Y 1.50 nm
> >
> > The maximum allowed distance for atom groups involved in interactions is:
> >   non-bonded interactions 1.356 nm
> >   two-body bonded interactions (-rdd) 1.356 nm
> >   multi-body bonded interactions (-rdd) 1.356 nm
> >   virtual site constructions (-rcon) 1.503 nm
> >
> > Using 4 MPI threads
> > Using 4 OpenMP threads per tMPI thread
> >
> > Overriding thread affinity set outside gmx mdrun
> >
> > Pinning threads with a user-specified logical core stride of 2
> >
> > NOTE: Thread affinity was not set.
>
> The threads are not pinned (see above), but I can't say why. I suggest:
> i) talk to your admins; ii) try telling the job scheduler not to set
> affinities and let mdrun set them.
>
> > System total charge: 0.000
> > Will do PME sum in reciprocal space for electrostatic interactions.
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
> > A smooth particle mesh Ewald method
> > J. Chem. Phys. 103 (1995) pp. 8577-8592
> > -------- -------- --- Thank You --- -------- --------
> >
> > Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> > Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
> > Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176
> >
> > Generated table with 1100 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> >
> > Using SIMD 4x4 nonbonded short-range kernels
> >
> > Using a 4x4 pair-list setup:
> >   updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
> >
> > Using geometric Lennard-Jones combination rule
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > S. Miyamoto and P. A. Kollman
> > SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for
> > Rigid Water Models
> > J. Comp. Chem. 13 (1992) pp. 952-962
> > -------- -------- --- Thank You --- -------- --------
> >
> > Linking all bonded interactions to atoms
> > There are 5407 inter charge-group virtual sites,
> > will an extra communication step for selected coordinates and forces
> >
> > Note that activating steepest-descent energy minimization via the
> > integrator .mdp option and the command gmx mdrun may be available in a
> > different form in a future version of GROMACS, e.g. gmx minimize and an
> > .mdp option.
> > Initiating Steepest Descents
> >
> > Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
> > Started Steepest Descents on rank 0 Thu May 9 15:49:36 2019