Hi, I’m Rohan, a student working on compilation techniques for distributed tensor computations. I’m looking at using PETSc as a baseline for my experiments, and I want to understand whether I’m using PETSc as intended to achieve high performance, and whether the performance I’m seeing is expected. Currently, I’m looking only at SpMV operations.
My experiments are run on the Lassen supercomputer (https://hpc.llnl.gov/hardware/platforms/lassen). Each node has 40 CPU cores, 4 V100 GPUs, and an InfiniBand interconnect; a diagram of the node architecture is here: https://hpc.llnl.gov/sites/default/files/power9-AC922systemDiagram2_1.png. Right now I’m trying to understand the single-node performance of PETSc, since the scaling behavior across multiple nodes looks like what I expect. I’m using the arabic-2005 sparse matrix from the SuiteSparse matrix collection, described here: https://sparse.tamu.edu/LAW/arabic-2005. As a trusted baseline, I am comparing against SpMV code generated by the TACO compiler ( http://tensor-compiler.org/codegen.html?expr=y(i)%20=%20A(i,j)%20*%20x(j)&format=y:d:0;A:ds:0,1;x:d:0&sched=split:i:i0:i1:32;reorder:i0:i1:j;parallelize:i0:CPU%20Thread:No%20Races ).

My experiments find that PETSc is roughly 4x slower than the TACO-generated kernel on a single core, and roughly 3x slower on a full node:

PETSc (MPI ranks): 1 rank: 5694.72 ms; 1 node, 40 ranks: 262.6 ms.
TACO (CPU threads): 1 thread: 1341 ms; 1 node, 40 threads: 86 ms.

My PETSc benchmark code is here: https://github.com/rohany/taco/blob/9e0e30b16bfba5319b15b2d1392f35376952f838/petsc/benchmark.cpp#L38 . The -log_view output from the 1-rank and 40-rank runs is attached to this email. The command lines were:

1 node, 1 rank:   `jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`
1 node, 40 ranks: `jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix $TENSOR_DIR/arabic-2005.petsc -log_view`

In addition to these benchmarking questions, I wanted to share my experience loading data from Matrix Market files into PETSc, which ended up being much more difficult than I anticipated. Iterating through the Matrix Market file and inserting entries one at a time into a `Mat` was extremely slow. To get reasonable performance, I had to use an external utility to construct the CSR arrays myself and then hand them to PETSc with `MatCreateSeqAIJWithArrays`. I couldn’t find any further guidance on the PETSc forums or on Google, so I wanted to know if this is the right way to go. Stripped-down sketches of my benchmark driver and of this conversion path are included below my signature for reference.

Thanks,

Rohan Yadav
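Sketch 1: the structure of my benchmark driver, heavily simplified. This is not the exact code from the repository linked above; error checking is omitted, and only the option names (-matrix, -n, -warmup) and the overall load-then-time-MatMult structure are meant to match what I run:

#include <petscmat.h>
#include <petsctime.h>

int main(int argc, char **argv)
{
  Mat            A;
  Vec            x, y;
  PetscViewer    viewer;
  char           file[PETSC_MAX_PATH_LEN];
  PetscInt       i, n = 20, warmup = 10;
  PetscLogDouble start, end;

  PetscInitialize(&argc, &argv, NULL, NULL);
  PetscOptionsGetString(NULL, NULL, "-matrix", file, sizeof(file), NULL);
  PetscOptionsGetInt(NULL, NULL, "-n", &n, NULL);
  PetscOptionsGetInt(NULL, NULL, "-warmup", &warmup, NULL);

  /* Load the pre-converted PETSc binary matrix (e.g. arabic-2005.petsc). */
  PetscViewerBinaryOpen(PETSC_COMM_WORLD, file, FILE_MODE_READ, &viewer);
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetType(A, MATAIJ);
  MatLoad(A, viewer);
  PetscViewerDestroy(&viewer);

  /* x and y laid out to match A's row/column distribution; y = A * x. */
  MatCreateVecs(A, &x, &y);
  VecSet(x, 1.0);

  for (i = 0; i < warmup; i++) MatMult(A, x, y);   /* untimed warm-up SpMVs */

  PetscTime(&start);
  for (i = 0; i < n; i++) MatMult(A, x, y);        /* timed SpMVs */
  PetscTime(&end);
  PetscPrintf(PETSC_COMM_WORLD, "Average time: %f ms.\n", 1000.0 * (end - start) / n);

  VecDestroy(&x);
  VecDestroy(&y);
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}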
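Sketch 2: the Matrix Market -> PETSc conversion path I described above. My real converter builds the CSR arrays for arabic-2005 with an external utility; here a toy 3x3 CSR matrix and a made-up output filename stand in for that, since the point is just the MatCreateSeqAIJWithArrays + binary MatView step that I am asking about:

#include <petscmat.h>

int main(int argc, char **argv)
{
  /* Toy 3x3 CSR matrix (0-based indices) standing in for the arrays that the
   * external Matrix Market -> CSR converter produces. */
  PetscInt    rowptr[] = {0, 2, 3, 5};
  PetscInt    colidx[] = {0, 2, 1, 0, 2};
  PetscScalar vals[]   = {2.0, 3.0, 1.0, 4.0, 5.0};
  Mat         A;
  PetscViewer viewer;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Wrap the CSR arrays directly; PETSc does not copy them, which avoids the
   * slow entry-by-entry insertion and assembly path. */
  MatCreateSeqAIJWithArrays(PETSC_COMM_SELF, 3, 3, rowptr, colidx, vals, &A);

  /* Write the matrix in PETSc binary format so the benchmark can MatLoad it. */
  PetscViewerBinaryOpen(PETSC_COMM_SELF, "matrix.petsc", FILE_MODE_WRITE, &viewer);
  MatView(A, viewer);
  PetscViewerDestroy(&viewer);

  /* MatDestroy does not free rowptr/colidx/vals; the caller still owns them. */
  MatDestroy(&A);
  PetscFinalize();
  return 0;
}

In the real pipeline the output of this step is the arabic-2005.petsc file that the benchmark loads via -matrix.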
Before matrix load
After matrix load
Average time: 5652.444561 ms.

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./bin/benchmark on a named lassen776 with 1 processor, by yadav2 Fri Dec 10 15:28:04 2021
Using Petsc Release Version 3.13.0, Mar 29, 2020

                         Max       Max/Min     Avg       Total
Time (sec):           2.731e+02     1.000   2.731e+02
Objects:              5.000e+00     1.000   5.000e+00
Flop:                 3.782e+10     1.000   3.782e+10  3.782e+10
Flop/sec:             1.385e+08     1.000   1.385e+08  1.385e+08
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 2.7308e+02 100.0%  3.7817e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

MatMult               30 1.0 1.6957e+02 1.0 3.78e+10 1.0 0.0e+00 0.0e+00 0.0e+00 62100  0  0  0  62100  0  0  0   223
MatAssemblyBegin       1 1.0 7.2800e-07 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 4.9849e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
MatLoad                1 1.0 1.0329e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 38  0  0  0  0  38  0  0  0  0     0
VecSet                 4 1.0 2.0869e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              0            0     0.
              Viewer     2              0            0     0.
              Vector     2              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.98e-08
#PETSc Option Table entries:
-log_view
-matload_block_size 1
-matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc
-n 20
-warmup 10
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-g -DNoChange -qfullpath" FFLAGS="-g -qfullpath -qzerosize -qxlf2003=polymorphic" CXXFLAGS= --with-cc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc --with-cxx=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlC --with-fc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib="/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/liblapack.so /usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/libblas.so" --with-x=0 --with-clanguage=C --with-scalapack=0 --with-metis=1 --with-metis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv --with-hdf5=1 --with-hdf5-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4 --with-hypre=1 --with-hypre-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk --with-parmetis=1
--with-parmetis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include --with-superlu_dist-lib=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-zlib-include=/usr/include --with-zlib-lib=/usr/lib64/libz.so --with-zlib=1
-----------------------------------------
Libraries compiled on 2020-04-09 16:35:17 on rzansel18
Machine characteristics: Linux-4.14.0-115.10.1.1chaos.ch6a.ppc64le-ppc64le-with-redhat-7.6-Maipo
Using PETSc directory: /usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q
Using PETSc arch:
-----------------------------------------
Using C compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc -g -DNoChange -qfullpath
Using Fortran compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf -g -qfullpath -qzerosize -qxlf2003=polymorphic
-----------------------------------------
Using include paths: -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/include -I/usr/include
-----------------------------------------
Using C linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc
Using Fortran linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf
Using libraries: -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -lpetsc -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib
-Wl,-rpath,/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -L/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib /usr/lib64/libz.so -Wl,-rpath,/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -L/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib:/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -lHYPRE -lsuperlu_dist -llapack -lblas -lhdf5_hl -lhdf5 -lparmetis -lmetis -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

logout
------------------------------------------------------------
Sender: LSF System <lsfadmin@lassen710>
Subject: Job 3035809: <jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> in cluster <lassen> Done

Job <jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> was submitted from host <lassen627> by user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:21 2021
Job was executed on host(s) <1*lassen710>, in queue <pbatch>, as user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:24 2021
                            <40*lassen776>
</g/g15/yadav2> was used as the home directory.
</g/g15/yadav2/taco/petsc> was used as the working directory.
Started at Fri Dec 10 15:23:24 2021
Terminated at Fri Dec 10 15:28:35 2021
Results reported at Fri Dec 10 15:28:35 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
jsrun -n 1 -c 1 -r 1 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                 0.41 sec.
    Max Memory :               158 MB
    Average Memory :           68.49 MB
    Total Requested Memory :   -
    Delta Memory :             -
    Max Swap :                 1060 MB
    Max Processes :            4
    Max Threads :              27
    Run time :                 311 sec.
    Turnaround time :          314 sec.

The output (if any) is above this job summary.
Before matrix load
After matrix load
Average time: 262.627921 ms.

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./bin/benchmark on a named lassen772 with 40 processors, by yadav2 Fri Dec 10 15:25:40 2021
Using Petsc Release Version 3.13.0, Mar 29, 2020

                         Max       Max/Min     Avg       Total
Time (sec):           1.093e+02     1.000   1.093e+02
Objects:              1.300e+01     1.000   1.300e+01
Flop:                 1.715e+09     3.071   9.456e+08  3.783e+10
Flop/sec:             1.569e+07     3.071   8.652e+06  3.461e+08
MPI Messages:         1.365e+03     1.233   1.247e+03  4.987e+04
MPI Message Lengths:  3.821e+09    58.505   1.632e+05  8.137e+09
MPI Reductions:       3.200e+01     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 1.0929e+02 100.0%  3.7825e+10 100.0%  4.987e+04 100.0%  1.632e+05      100.0%  2.500e+01  78.1%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 9.2847e-0322.5 0.00e+00 0.0 1.6e+03 4.0e+00 1.0e+00  0  0  3  0  3   0  0  3  0  4     0
MatMult               30 1.0 7.8912e+00 1.0 1.71e+09 3.1 4.7e+04 1.1e+04 0.0e+00  7100 93  6  0   7100 93  6  0  4793
MatAssemblyBegin       1 1.0 1.0711e-05 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 2.3071e+00 6.3 0.00e+00 0.0 3.1e+03 2.8e+03 5.0e+00  2  0  6  0 16   2  0  6  0 20     0
MatLoad                1 1.0 1.0140e+02 1.0 0.00e+00 0.0 3.3e+03 2.3e+06 1.9e+01 93  0  7 94 59  93  0  7 94 76     0
VecSet                 3 1.0 5.3865e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       30 1.0 4.3584e-02 2.7 0.00e+00 0.0 4.7e+04 1.1e+04 0.0e+00  0  0 93  6  0   0  0 93  6  0     0
VecScatterEnd         30 1.0 4.9138e+00853.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
SFSetGraph             1 1.0 9.3534e-04 3.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetUp                1 1.0 1.1334e-02 2.1 0.00e+00 0.0 3.1e+03 2.8e+03 1.0e+00  0  0  6  0  3   0  0  6  0  4     0
SFBcastOpBegin        30 1.0 4.3470e-02 2.7 0.00e+00 0.0 4.7e+04 1.1e+04 0.0e+00  0  0 93  6  0   0  0 93  6  0     0
SFBcastOpEnd          30 1.0 4.9136e+00862.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
SFPack                30 1.0 3.5817e-02 2.7 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFUnpack              30 1.0 3.0848e-05 5.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     3              0            0     0.
              Viewer     2              0            0     0.
         Vec Scatter     1              0            0     0.
              Vector     4              1         1696     0.
           Index Set     2              2       312812     0.
   Star Forest Graph     1              0            0     0.
========================================================================================================================
Average time to get PetscTime(): 4.2e-08
Average time for MPI_Barrier(): 1.933e-06
Average time for zero size MPI_Send(): 2.3585e-06
#PETSc Option Table entries:
-log_view
-matload_block_size 1
-matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc
-n 20
-warmup 10
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q --with-ssl=0 --download-c2html=0 --download-sowing=0 --download-hwloc=0 CFLAGS="-g -DNoChange -qfullpath" FFLAGS="-g -qfullpath -qzerosize -qxlf2003=polymorphic" CXXFLAGS= --with-cc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc --with-cxx=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlC --with-fc=/usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf --with-precision=double --with-scalar-type=real --with-shared-libraries=1 --with-debugging=0 --with-64-bit-indices=0 COPTFLAGS= FOPTFLAGS= CXXOPTFLAGS= --with-blaslapack-lib="/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/liblapack.so /usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib/libblas.so" --with-x=0 --with-clanguage=C --with-scalapack=0 --with-metis=1 --with-metis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv --with-hdf5=1 --with-hdf5-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4 --with-hypre=1 --with-hypre-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk --with-parmetis=1 --with-parmetis-dir=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz --with-mumps=0 --with-trilinos=0 --with-fftw=0 --with-valgrind=0 --with-cxx-dialect=C++11 --with-superlu_dist-include=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include --with-superlu_dist-lib=/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib/libsuperlu_dist.a --with-superlu_dist=1 --with-suitesparse=0 --with-zlib-include=/usr/include --with-zlib-lib=/usr/lib64/libz.so --with-zlib=1
-----------------------------------------
Libraries compiled on 2020-04-09 16:35:17 on rzansel18
Machine characteristics: Linux-4.14.0-115.10.1.1chaos.ch6a.ppc64le-ppc64le-with-redhat-7.6-Maipo
Using PETSc directory: /usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q
Using PETSc arch:
-----------------------------------------
Using C compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc -g -DNoChange -qfullpath
Using Fortran compiler: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf -g -qfullpath -qzerosize -qxlf2003=polymorphic
-----------------------------------------
Using include paths: -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/include -I/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/include -I/usr/include
-----------------------------------------
Using C linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlc
Using Fortran linker: /usr/tce/packages/spectrum-mpi/spectrum-mpi-rolling-release-xl-2020.03.18/bin/mpixlf
Using libraries: -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/petsc-3.13.0-p3r7bdjqozrknezyzslglns7czzjnx2q/lib -lpetsc -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hypre-2.18.2-xqkfpph37m6w5orp7njh4xwa7cxdarnk/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/superlu-dist-6.3.0-ez2nmnbchitslys66t2pqyah42stvi3u/lib -Wl,-rpath,/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -L/usr/tcetmp/packages/lapack/lapack-3.8.0-xl-2019.08.20/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/hdf5-1.10.6-e7e7urb5k7va3ib7j4uro56grvzmcmd4/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/parmetis-4.0.3-csuosao5j54uozjs6dkt3vyyiotjxlcz/lib -Wl,-rpath,/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib -L/usr/tcetmp/packages/petsc/build/3.13.0/spack/opt/spack/linux-rhel7-power9le/xl_r-16.1/metis-5.1.0-4n7lo2n2smlcxs7lcywnifyjjbdsmslv/lib /usr/lib64/libz.so -Wl,-rpath,/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -L/usr/tce/packages/spectrum-mpi/ibm/spectrum-mpi-rolling-release/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlsmp/5.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlmass/9.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib
-L/usr/tce/packages/xl/xl-2020.03.18/xlf/16.1.1/lib -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/lib -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64/gcc/powerpc64le-unknown-linux-gnu/4.9.3 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -L/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib:/usr/tce/packages/gcc/gcc-4.9.3/gnu/lib64 -Wl,-rpath,/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -L/usr/tce/packages/xl/xl-2020.03.18/xlC/16.1.1/lib -lHYPRE -lsuperlu_dist -llapack -lblas -lhdf5_hl -lhdf5 -lparmetis -lmetis -ldl -lmpiprofilesupport -lmpi_ibm_usempi -lmpi_ibm_mpifh -lmpi_ibm -lxlf90_r -lxlopt -lxl -lxlfmath -lgcc_s -lrt -lpthread -lm -ldl -lmpiprofilesupport -lmpi_ibm -lxlopt -lxl -libmc++ -lstdc++ -lm -lgcc_s -lpthread -ldl
-----------------------------------------

logout
------------------------------------------------------------
Sender: LSF System <lsfadmin@lassen710>
Subject: Job 3035811: <jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> in cluster <lassen> Done

Job <jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view> was submitted from host <lassen627> by user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:41 2021
Job was executed on host(s) <1*lassen710>, in queue <pbatch>, as user <yadav2> in cluster <lassen> at Fri Dec 10 15:23:43 2021
                            <40*lassen772>
</g/g15/yadav2> was used as the home directory.
</g/g15/yadav2/taco/petsc> was used as the working directory.
Started at Fri Dec 10 15:23:43 2021
Terminated at Fri Dec 10 15:26:21 2021
Results reported at Fri Dec 10 15:26:21 2021

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
jsrun -n 40 -c 1 -r 40 -b rs ./bin/benchmark -n 20 -warmup 10 -matrix /p/gpfs1/yadav2/tensors//arabic-2005.petsc -log_view
------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time :                 0.36 sec.
    Max Memory :               59 MB
    Average Memory :           57.24 MB
    Total Requested Memory :   -
    Delta Memory :             -
    Max Swap :                 1425 MB
    Max Processes :            4
    Max Threads :              27
    Run time :                 157 sec.
    Turnaround time :          160 sec.

The output (if any) is above this job summary.