Yup, your assessment agrees with our guess. Our HPC guru will be taking his
findings, along with your quote, to the admins.

Thank you,

Alex

On Thu, May 9, 2019 at 2:51 PM Szilárd Páll <pall.szil...@gmail.com> wrote:

> On Thu, May 9, 2019 at 10:01 PM Alex <nedoma...@gmail.com> wrote:
>
> > Okay, we're positively unable to run a GROMACS (2019.1) test on POWER9.
> > The test procedure is simple, using Slurm:
> > 1. Request an interactive session:
> >    srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> > 2. Load the CUDA library: module load cuda
> > 3. Run the test batch (the full sequence is sketched after this list). It
> >    starts with a CPU-only static EM, which, despite the mdrun variables,
> >    runs on a single thread. Any help will be highly appreciated.
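> >
> > Putting the steps together, the sequence looks roughly like this (a sketch
> > of what the test batch does; em.tpr is the EM input, and the mdrun line is
> > the one reported in md.log below):
> >
> >   # step 1: interactive allocation on the debug partition with one GPU
> >   srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 bash
> >   # step 2: load the CUDA toolchain
> >   module load cuda
> >   # step 3: CPU-only static EM, first stage of the test batch
> >   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu \
> >       -s em.tpr -o traj.trr -g md.log -c after_em.pdb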
> >
> >  md.log below:
> >
> > GROMACS:      gmx mdrun, version 2019.1
> > Executable:   /home/reida/ppc64le/stow/gromacs/bin/gmx
> > Data prefix:  /home/reida/ppc64le/stow/gromacs
> > Working dir:  /home/smolyan/gmx_test1
> > Process ID:   115831
> > Command line:
> >   gmx mdrun -pin on -pinstride 2 -ntomp 4 -ntmpi 4 -pme cpu -nb cpu -s em.tpr -o traj.trr -g md.log -c after_em.pdb
> >
> > GROMACS version:    2019.1
> > Precision:          single
> > Memory model:       64 bit
> > MPI library:        thread_mpi
> > OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
> > GPU support:        CUDA
> > SIMD instructions:  IBM_VSX
> > FFT library:        fftw-3.3.8
> > RDTSCP usage:       disabled
> > TNG support:        enabled
> > Hwloc support:      hwloc-1.11.8
> > Tracing support:    disabled
> > C compiler:         /opt/rh/devtoolset-7/root/usr/bin/cc GNU 7.3.1
> > C compiler flags:   -mcpu=power9 -mtune=power9  -mvsx     -O2 -DNDEBUG
> > -funroll-all-loops -fexcess-precision=fast
> > C++ compiler:       /opt/rh/devtoolset-7/root/usr/bin/c++ GNU 7.3.1
> > C++ compiler flags: -mcpu=power9 -mtune=power9  -mvsx    -std=c++11   -O2
> > -DNDEBUG -funroll-all-loops -fexcess-precision=fast
> > CUDA compiler:      /usr/local/cuda-10.0/bin/nvcc nvcc: NVIDIA (R) Cuda
> > compiler driver;Copyright (c) 2005-2018 NVIDIA Corporation;Built on
> > Sat_Aug_25_21:10:00_CDT_2018;Cuda compilation tools, release 10.0,
> > V10.0.130
> > CUDA compiler flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=compute_75;-use_fast_math;;;
> > -mcpu=power9;-mtune=power9;-mvsx;-std=c++11;-O2;-DNDEBUG;-funroll-all-loops;-fexcess-precision=fast;
> > CUDA driver:        10.10
> > CUDA runtime:       10.0
> >
> >
> > Running on 1 node with total 160 cores, 160 logical cores, 1 compatible GPU
> > Hardware detected:
> >   CPU info:
> >     Vendor: IBM
> >     Brand:  POWER9, altivec supported
> >     Family: 0   Model: 0   Stepping: 0
> >     Features: vmx vsx
> >   Hardware topology: Only logical processor count
> >   GPU info:
> >     Number of GPUs detected: 1
> >     #0: NVIDIA Tesla V100-SXM2-16GB, compute cap.: 7.0, ECC: yes, stat:
> > compatible
> >
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> >
> > *SKIPPED*
> >
> > Input Parameters:
> >    integrator                     = steep
> >    tinit                          = 0
> >    dt                             = 0.001
> >    nsteps                         = 50000
> >    init-step                      = 0
> >    simulation-part                = 1
> >    comm-mode                      = Linear
> >    nstcomm                        = 100
> >    bd-fric                        = 0
> >    ld-seed                        = 1941752878
> >    emtol                          = 100
> >    emstep                         = 0.01
> >    niter                          = 20
> >    fcstep                         = 0
> >    nstcgsteep                     = 1000
> >    nbfgscorr                      = 10
> >    rtpi                           = 0.05
> >    nstxout                        = 0
> >    nstvout                        = 0
> >    nstfout                        = 0
> >    nstlog                         = 1000
> >    nstcalcenergy                  = 100
> >    nstenergy                      = 1000
> >    nstxout-compressed             = 0
> >    compressed-x-precision         = 1000
> >    cutoff-scheme                  = Verlet
> >    nstlist                        = 1
> >    ns-type                        = Grid
> >    pbc                            = xyz
> >    periodic-molecules             = true
> >    verlet-buffer-tolerance        = 0.005
> >    rlist                          = 1.2
> >    coulombtype                    = PME
> >    coulomb-modifier               = Potential-shift
> >    rcoulomb-switch                = 0
> >    rcoulomb                       = 1.2
> >    epsilon-r                      = 1
> >    epsilon-rf                     = inf
> >    vdw-type                       = Cut-off
> >    vdw-modifier                   = Potential-shift
> >    rvdw-switch                    = 0
> >    rvdw                           = 1.2
> >    DispCorr                       = No
> >    table-extension                = 1
> >    fourierspacing                 = 0.12
> >    fourier-nx                     = 52
> >    fourier-ny                     = 52
> >    fourier-nz                     = 52
> >    pme-order                      = 4
> >    ewald-rtol                     = 1e-05
> >    ewald-rtol-lj                  = 0.001
> >    lj-pme-comb-rule               = Geometric
> >    ewald-geometry                 = 0
> >    epsilon-surface                = 0
> >    tcoupl                         = No
> >    nsttcouple                     = -1
> >    nh-chain-length                = 0
> >    print-nose-hoover-chain-variables = false
> >    pcoupl                         = No
> >    pcoupltype                     = Isotropic
> >    nstpcouple                     = -1
> >    tau-p                          = 1
> >    compressibility (3x3):
> >       compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    ref-p (3x3):
> >       ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    refcoord-scaling               = No
> >    posres-com (3):
> >       posres-com[0]= 0.00000e+00
> >       posres-com[1]= 0.00000e+00
> >       posres-com[2]= 0.00000e+00
> >    posres-comB (3):
> >       posres-comB[0]= 0.00000e+00
> >       posres-comB[1]= 0.00000e+00
> >       posres-comB[2]= 0.00000e+00
> >    QMMM                           = false
> >    QMconstraints                  = 0
> >    QMMMscheme                     = 0
> >    MMChargeScaleFactor            = 1
> > qm-opts:
> >    ngQM                           = 0
> >    constraint-algorithm           = Lincs
> >    continuation                   = false
> >    Shake-SOR                      = false
> >    shake-tol                      = 0.0001
> >    lincs-order                    = 4
> >    lincs-iter                     = 1
> >    lincs-warnangle                = 30
> >    nwall                          = 0
> >    wall-type                      = 9-3
> >    wall-r-linpot                  = -1
> >    wall-atomtype[0]               = -1
> >    wall-atomtype[1]               = -1
> >    wall-density[0]                = 0
> >    wall-density[1]                = 0
> >    wall-ewald-zfac                = 3
> >    pull                           = false
> >    awh                            = false
> >    rotation                       = false
> >    interactiveMD                  = false
> >    disre                          = No
> >    disre-weighting                = Conservative
> >    disre-mixed                    = false
> >    dr-fc                          = 1000
> >    dr-tau                         = 0
> >    nstdisreout                    = 100
> >    orire-fc                       = 0
> >    orire-tau                      = 0
> >    nstorireout                    = 100
> >    free-energy                    = no
> >    cos-acceleration               = 0
> >    deform (3x3):
> >       deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >       deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
> >    simulated-tempering            = false
> >    swapcoords                     = no
> >    userint1                       = 0
> >    userint2                       = 0
> >    userint3                       = 0
> >    userint4                       = 0
> >    userreal1                      = 0
> >    userreal2                      = 0
> >    userreal3                      = 0
> >    userreal4                      = 0
> >    applied-forces:
> >      electric-field:
> >        x:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> >        y:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> >        z:
> >          E0                       = 0
> >          omega                    = 0
> >          t0                       = 0
> >          sigma                    = 0
> > grpopts:
> >    nrdf:       47805
> >    ref-t:           0
> >    tau-t:           0
> > annealing:          No
> > annealing-npoints:           0
> >    acc:            0           0           0
> >    nfreeze:           N           N           N
> >    energygrp-flags[  0]: 0
> >
> >
> > Initializing Domain Decomposition on 4 ranks
> > NOTE: disabling dynamic load balancing as it is only supported with
> > dynamics, not with integrator 'steep'.
> > Dynamic load balancing: auto
> > Using update groups, nr 10529, average size 2.5 atoms, max. radius 0.078 nm
> > Minimum cell size due to atom displacement: 0.000 nm
> > NOTE: Periodic molecules are present in this system. Because of this, the
> > domain decomposition algorithm cannot easily determine the minimum cell
> > size that it requires for treating bonded interactions. Instead, domain
> > decomposition will assume that half the non-bonded cut-off will be a
> > suitable lower bound.
> > Minimum cell size due to bonded interactions: 0.678 nm
> > Using 0 separate PME ranks, as there are too few total ranks for efficient splitting
> > Optimizing the DD grid for 4 cells with a minimum initial size of 0.678 nm
> > The maximum allowed number of cells is: X 8 Y 8 Z 8
> > Domain decomposition grid 1 x 4 x 1, separate PME ranks 0
> > PME domain decomposition: 1 x 4 x 1
> > Domain decomposition rank 0, coordinates 0 0 0
> >
> > The initial number of communication pulses is: Y 1
> > The initial domain decomposition cell size is: Y 1.50 nm
> >
> > The maximum allowed distance for atom groups involved in interactions is:
> >                  non-bonded interactions           1.356 nm
> >             two-body bonded interactions  (-rdd)   1.356 nm
> >           multi-body bonded interactions  (-rdd)   1.356 nm
> >               virtual site constructions  (-rcon)  1.503 nm
> >
> > Using 4 MPI threads
> > Using 4 OpenMP threads per tMPI thread
> >
> >
> > Overriding thread affinity set outside gmx mdrun
> >
> > Pinning threads with a user-specified logical core stride of 2
> >
> > NOTE: Thread affinity was not set.
> >
>
> The threads are not pinned (see the note above), but I can't say why. I
> suggest: (i) talk to your admins; (ii) try telling the job scheduler not to
> set affinities and let mdrun set them.
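>
> For example (untested on your setup, and assuming a reasonably recent
> Slurm), something along these lines should leave the binding entirely to
> mdrun:
>
>   # same interactive request as before, but with Slurm CPU binding disabled
>   # (older Slurm releases spell the option --cpu_bind=none)
>   srun -N 1 -n 20 --pty --partition=debug --time=1:00:00 --gres=gpu:1 --cpu-bind=none bash
>   module load cuda
>   # let mdrun do its own pinning
>   gmx mdrun -pin on -pinstride 2 -ntmpi 4 -ntomp 4 -pme cpu -nb cpu -s em.tpr
>
> Your admins will know which binding options your Slurm setup exposes.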
>
>
> > System total charge: 0.000
> > Will do PME sum in reciprocal space for electrostatic interactions.
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G.
> > Pedersen
> > A smooth particle mesh Ewald method
> > J. Chem. Phys. 103 (1995) pp. 8577-8592
> > -------- -------- --- Thank You --- -------- --------
> >
> > Using a Gaussian width (1/beta) of 0.384195 nm for Ewald
> > Potential shift: LJ r^-12: -1.122e-01 r^-6: -3.349e-01, Ewald -8.333e-06
> > Initialized non-bonded Ewald correction tables, spacing: 1.02e-03 size: 1176
> >
> > Generated table with 1100 data points for 1-4 COUL.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ6.
> > Tabscale = 500 points/nm
> > Generated table with 1100 data points for 1-4 LJ12.
> > Tabscale = 500 points/nm
> >
> > Using SIMD 4x4 nonbonded short-range kernels
> >
> > Using a 4x4 pair-list setup:
> >   updated every 1 steps, buffer 0.000 nm, rlist 1.200 nm
> >
> > Using geometric Lennard-Jones combination rule
> >
> >
> > ++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
> > S. Miyamoto and P. A. Kollman
> > SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid Water Models
> > J. Comp. Chem. 13 (1992) pp. 952-962
> > -------- -------- --- Thank You --- -------- --------
> >
> >
> > Linking all bonded interactions to atoms
> > There are 5407 inter charge-group virtual sites,
> > will an extra communication step for selected coordinates and forces
> >
> >
> > Note that activating steepest-descent energy minimization via the
> > integrator .mdp option and the command gmx mdrun may be available in a
> > different form in a future version of GROMACS, e.g. gmx minimize and an
> > .mdp option.
> > Initiating Steepest Descents
> >
> > Atom distribution over 4 domains: av 6687 stddev 134 min 6515 max 6792
> > Started Steepest Descents on rank 0 Thu May  9 15:49:36 2019