> > VecPointwiseMult     402 1.0 2.9605e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  1  0  0  0 22515   70608      0 0.00e+00    0 0.00e+00 100
> > VecScatterBegin      400 1.0 1.6791e-01 6.0 0.00e+00 0.0 3.7e+05 1.6e+04 0.0e+00  0  0 62 54  0   2  0 100 100  0     0       0      0 0.00e+00    0 0.00e+00   0
> > VecScatterEnd        400 1.0 1.0057e+00 7.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00   0
> > PCApply              402 1.0 2.9638e-01 3.6 1.05e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   5  1  0  0  0 22490   70608      0 0.00e+00    0 0.00e+00 100
> >
> > Most of the MatMult time is attributed to VecScatterEnd here. Can you
> > share a run of the same total problem size on 8 ranks (one rank per GPU)?
>
> Attached. I ran out of memory with the same problem size, so this is the 262K-cells-per-GPU version.
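The quoted rows follow the standard `-log_view` event-line layout (the column legend appears further down in the attached log). A minimal sketch of pulling numbers out of such a line; it assumes plain space-separated fields, which very large ratios can break by fusing two columns together (e.g. `1.0068e-0417.5`):

```python
# Parse one PETSc -log_view event line (whitespace-separated fields).
# Field positions per the -log_view legend: event name, count, count
# ratio, max time, time ratio (max/min over ranks), max flop, ...
line = ("VecScatterEnd 400 1.0 1.0057e+00 7.0 0.00e+00 0.0 "
        "0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 5 0 0 0 0 0 0 0 "
        "0.00e+00 0 0.00e+00 0")

fields = line.split()
event      = fields[0]
count      = int(fields[1])
max_time   = float(fields[3])   # max over all ranks, in seconds
time_ratio = float(fields[4])   # max/min across ranks: load imbalance

print(f"{event}: {count} calls, {max_time:.3f} s max, "
      f"{time_ratio:.1f}x max/min imbalance")
# prints "VecScatterEnd: 400 calls, 1.006 s max, 7.0x max/min imbalance"
```

The 7.0x max/min time ratio on VecScatterEnd is what makes it absorb so much of the MatMult time: some ranks wait far longer than others for the halo exchange to complete.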
> From the other log file (the 10x bigger problem):
>
> ????
DM Object: box 8 MPI processes
  type: plex
box in 3 dimensions:
  Number of 0-cells per rank: 274625 274625 274625 274625 274625 274625 274625 274625
  Number of 1-cells per rank: 811200 811200 811200 811200 811200 811200 811200 811200
  Number of 2-cells per rank: 798720 798720 798720 798720 798720 798720 798720 798720
  Number of 3-cells per rank: 262144 262144 262144 262144 262144 262144 262144 262144
Labels:
  celltype: 4 strata with value/size (0 (274625), 1 (811200), 4 (798720), 7 (262144))
  depth: 4 strata with value/size (0 (274625), 1 (811200), 2 (798720), 3 (262144))
  marker: 1 strata with value/size (1 (49530))
  Face Sets: 3 strata with value/size (1 (16129), 3 (16129), 6 (16129))
Linear solve did not converge due to DIVERGED_ITS iterations 200
KSP Object: 8 MPI processes
  type: cg
  maximum iterations=200, initial guess is zero
  tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
  left preconditioning
  using UNPRECONDITIONED norm type for convergence test
PC Object: 8 MPI processes
  type: jacobi
    type DIAGONAL
  linear system matrix = precond matrix:
  Mat Object: 8 MPI processes
    type: mpiaijkokkos
    rows=16581375, cols=16581375
    total: nonzeros=1045678375, allocated nonzeros=1045678375
    total number of mallocs used during MatSetValues calls=0
      not using I-node (on process 0) routines
Linear solve did not converge due to DIVERGED_ITS iterations 200
Linear solve did not converge due to DIVERGED_ITS iterations 200
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/gpfs/alpine/csc314/scratch/adams/petsc/src/snes/tests/data/../ex13 on a arch-olcf-crusher named crusher003 with 8 processors, by adams Sat Jan 22 12:15:11 2022
Using Petsc Development GIT revision: v3.16.3-682-g5f40ebe68c  GIT Date: 2022-01-22 09:12:56 -0500

                         Max       Max/Min     Avg       Total
Time (sec):           3.812e+02     1.000   3.812e+02
Objects:              1.990e+03     1.027   1.947e+03
Flop:                 1.940e+11     1.027   1.915e+11  1.532e+12
Flop/sec:             5.088e+08     1.027   5.022e+08  4.018e+09
MPI Messages:         4.806e+03     1.066   4.571e+03  3.657e+04
MPI Message Lengths:  4.434e+08     1.015   9.611e+04  3.515e+09
MPI Reductions:       1.991e+03     1.000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flop
                            and VecAXPY() for complex vectors of length N --> 8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     Avg         %Total    Count   %Total
 0:      Main Stage: 3.6874e+02  96.7%  6.0875e+11  39.7%  1.417e+04  38.7%  1.143e+05       46.1%  7.660e+02  38.5%
 1:         PCSetUp: 1.7459e-01   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2:  KSP Solve only: 1.2303e+01   3.2%  9.2287e+11  60.3%  2.240e+04  61.3%  8.459e+04       53.9%  1.206e+03  60.6%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
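The global totals above are internally consistent: the Total column is the per-rank Avg times the 8 ranks, and Flop/sec is the flop count over the wall time. A quick check, using only the numbers printed in the summary:

```python
# Cross-check the -log_view global summary for the 8-rank run.
nranks = 8

flop_max, flop_avg, flop_total = 1.940e11, 1.915e11, 1.532e12
time_max  = 3.812e2   # Time (sec), Max
flops_max = 5.088e8   # Flop/sec, Max

# Total is the sum over ranks, i.e. Avg * nranks.
assert abs(flop_avg * nranks - flop_total) / flop_total < 0.01

# Each rank's Flop/sec is its flop count over the (shared) wall time.
assert abs(flop_max / time_max - flops_max) / flops_max < 0.01

print("summary totals are self-consistent")
```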
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total    GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

PetscBarrier           6 1.0 1.1378e+00 1.0 0.00e+00 0.0 9.3e+02 3.2e+03 2.1e+01  0  0  3  0  1   0  0  7  0  3     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSided         42 1.0 1.0037e+00 6.5 0.00e+00 0.0 7.5e+02 4.0e+00 4.2e+01  0  0  2  0  2   0  0  5  0  5     0       0      0 0.00e+00    0 0.00e+00  0
BuildTwoSidedF         6 1.0 8.8592e-01 7.4 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatMult            48589 1.0 4.5152e+00 1.0 5.31e+10 1.0 1.1e+04 8.3e+04 2.0e+00  1 27 31 27  0   1 69 81 59  0 92638  131974      1 2.96e-01    0 0.00e+00 100
MatAssemblyBegin      43 1.0 1.0505e+00 2.4 0.00e+00 0.0 1.5e+02 2.0e+06 6.0e+00  0  0  0  8  0   0  0  1 18  1     0       0      0 0.00e+00    0 0.00e+00  0
MatAssemblyEnd        43 1.0 1.2382e+00 2.5 4.67e+06 0.0 0.0e+00 0.0e+00 9.0e+00  0  0  0  0  0   0  0  0  0  1    15       0      0 0.00e+00    0 0.00e+00  0
MatZeroEntries         3 1.0 8.0263e-03 3.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
MatView                1 1.0 6.0014e-04 2.0 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSetUp               1 1.0 6.6280e-03 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               1 1.0 6.7075e+00 1.0 5.85e+10 1.0 1.1e+04 8.4e+04 6.0e+02  2 30 31 27 30   2 76 80 59 79 68793  112584      1 2.96e-01    0 0.00e+00 100
SNESSolve              1 1.0 1.2357e+02 1.0 6.79e+10 1.0 1.1e+04 9.6e+04 6.1e+02 32 35 31 31 31  34 88 81 68 80  4342  112578      3 1.70e+01    2 3.32e+01 86
SNESSetUp              1 1.0 4.4082e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 12  0  1 10  1  12  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
SNESFunctionEval       2 1.0 2.1082e+01 1.0 6.33e+09 1.0 1.1e+02 6.2e+04 3.0e+00  5  3  0  0  0   6  8  1  0  0  2402   34866      3 3.33e+01    2 3.32e+01  0
SNESJacobianEval       2 1.0 2.9580e+02 1.0 1.21e+10 1.0 1.1e+02 2.6e+06 2.0e+00 78  6  0  9  0  80 16  1 19  0   327       0      0 0.00e+00    2 3.32e+01  0
DMCreateInterp         1 1.0 7.4460e-04 1.1 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   891       0      0 0.00e+00    0 0.00e+00  0
DMCreateMat            1 1.0 4.4074e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.8e+01 12  0  1 10  1  12  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Partition         1 1.0 5.6438e-04 1.0 0.00e+00 0.0 3.5e+01 1.1e+02 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
Mesh Migration         1 1.0 3.9047e-03 1.0 0.00e+00 0.0 2.0e+02 8.2e+01 2.9e+01  0  0  1  0  1   0  0  1  0  4     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartSelf         1 1.0 1.0068e-0417.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblInv       1 1.0 2.0813e-04 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartLblSF        1 1.0 1.1349e-04 1.8 0.00e+00 0.0 1.4e+01 5.6e+01 1.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPartStrtSF       1 1.0 1.0916e-04 1.1 0.00e+00 0.0 7.0e+00 2.2e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPointSF          1 1.0 2.0593e-04 1.0 0.00e+00 0.0 1.4e+01 2.7e+02 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterp          19 1.0 5.8542e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistribute       1 1.0 4.6947e-03 1.0 0.00e+00 0.0 2.5e+02 9.7e+01 3.7e+01  0  0  1  0  2   0  0  2  0  5     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistCones        1 1.0 1.0043e-04 1.0 0.00e+00 0.0 4.2e+01 1.4e+02 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistLabels       1 1.0 2.0351e-04 1.1 0.00e+00 0.0 1.0e+02 6.6e+01 2.4e+01  0  0  0  0  1   0  0  1  0  3     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexDistField        1 1.0 3.4538e-03 1.0 0.00e+00 0.0 4.9e+01 5.9e+01 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexStratify        34 1.0 4.0886e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 8.0e+00  0  0  0  0  0   0  0  0  0  1     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexSymmetrize      34 1.0 1.0803e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexPrealloc         1 1.0 4.4024e+01 1.0 0.00e+00 0.0 3.6e+02 9.4e+05 1.6e+01 12  0  1 10  1  12  0  3 21  2     0       0      0 0.00e+00    0 0.00e+00  0
DMPlexResidualFE       2 1.0 1.9932e+01 1.0 6.29e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5  3  0  0  0   5  8  0  0  0  2526       0      0 0.00e+00    0 0.00e+00  0
DMPlexJacobianFE       2 1.0 2.9515e+02 1.0 1.21e+10 1.0 7.6e+01 3.9e+06 2.0e+00 77  6  0  8  0  80 16  1 18  0   327       0      0 0.00e+00    0 0.00e+00  0
DMPlexInterpFE         1 1.0 7.2838e-04 1.2 8.29e+04 1.0 7.6e+01 1.1e+03 1.6e+01  0  0  0  0  1   0  0  1  0  2   911       0      0 0.00e+00    0 0.00e+00  0
SFSetGraph            46 1.0 3.3543e-03 2.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFSetUp               36 1.0 3.6481e-01 1.2 0.00e+00 0.0 1.3e+03 9.1e+04 3.6e+01  0  0  4  3  2   0  0  9  7  5     0       0      0 0.00e+00    0 0.00e+00  0
SFBcastBegin          68 1.0 3.1905e-0118.9 0.00e+00 0.0 1.0e+03 5.4e+04 0.0e+00  0  0  3  2  0   0  0  7  3  0     0       0      1 9.79e-02    4 6.63e+01  0
SFBcastEnd            68 1.0 1.0374e+0011.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFReduceBegin         17 1.0 1.5238e-0113.3 4.19e+06 1.0 3.1e+02 3.9e+05 0.0e+00  0  0  1  3  0   0  0  2  7  0   218       0      2 3.32e+01    0 0.00e+00 100
SFReduceEnd           17 1.0 1.1112e+0029.0 9.91e+04 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00 100
SFFetchOpBegin         2 1.0 5.6017e-03166.3 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFFetchOpEnd           2 1.0 3.6141e-02 2.7 0.00e+00 0.0 3.8e+01 1.0e+06 0.0e+00  0  0  0  1  0   0  0  0  2  0     0       0      0 0.00e+00    0 0.00e+00  0
SFCreateEmbed          9 1.0 3.6284e-0151.8 0.00e+00 0.0 1.6e+02 2.9e+03 0.0e+00  0  0  0  0  0   0  0  1  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFDistSection          9 1.0 3.3711e-02 3.3 0.00e+00 0.0 3.1e+02 2.6e+04 1.1e+01  0  0  1  0  1   0  0  2  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFSectionSF           17 1.0 1.0259e-01 2.3 0.00e+00 0.0 5.2e+02 7.6e+04 1.7e+01  0  0  1  1  1   0  0  4  2  2     0       0      0 0.00e+00    0 0.00e+00  0
SFRemoteOff            8 1.0 3.7808e-0117.2 0.00e+00 0.0 4.9e+02 5.3e+03 5.0e+00  0  0  1  0  0   0  0  3  0  1     0       0      0 0.00e+00    0 0.00e+00  0
SFPack               294 1.0 2.8500e-0119.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      2 3.94e-01    0 0.00e+00  0
SFUnpack             296 1.0 1.7092e-01 5.1 4.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   196       0      0 0.00e+00    0 0.00e+00 100
VecTDot              401 1.0 7.4630e-01 1.2 1.68e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   0  2  0  0 52 17819   44903      0 0.00e+00    0 0.00e+00 100
VecNorm              201 1.0 5.0483e-01 2.2 8.43e+08 1.0 0.0e+00 0.0e+00 2.0e+02  0  0  0  0 10   0  1  0  0 26 13204  121237      0 0.00e+00    0 0.00e+00 100
VecCopy                2 1.0 1.3590e-03 6.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                55 1.0 1.8782e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              400 1.0 4.0624e-01 1.0 1.68e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  2  0  0  0 32653   63482      0 0.00e+00    0 0.00e+00 100
VecAYPX              199 1.0 4.9216e-01 2.0 8.35e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 13409   16198      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     201 1.0 1.8710e-01 1.1 4.22e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  1  0  0  0 17813   34162      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      201 1.0 6.7449e-01 2.7 0.00e+00 0.0 1.1e+04 8.3e+04 2.0e+00  0  0 31 27  0   0  0 81 59  0     0       0      1 2.96e-01    0 0.00e+00  0
VecScatterEnd        201 1.0 7.5789e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
DualSpaceSetUp         2 1.0 2.5864e-03 1.0 1.80e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     6       0      0 0.00e+00    0 0.00e+00  0
FESetUp                2 1.0 1.8056e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCSetUp                1 1.0 8.7270e-06 3.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              201 1.0 4.0403e-01 1.0 4.22e+08 1.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  1  0  0  0  8249   28661      0 0.00e+00    0 0.00e+00 100

--- Event Stage 1: PCSetUp

PCSetUp                1 1.0 1.7966e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0 100  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0

--- Event Stage 2: KSP Solve only

MatMult              400 1.0 8.8003e+00 1.1 1.06e+11 1.0 2.2e+04 8.5e+04 0.0e+00  2 55 61 54  0  70 91 100 100  0 95058  132242      0 0.00e+00    0 0.00e+00 100
MatView                2 1.0 1.1643e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
KSPSolve               2 1.0 1.2540e+01 1.0 1.17e+11 1.0 2.2e+04 8.5e+04 1.2e+03  3 60 61 54 60 100 100 100 100 100 73592  116796      0 0.00e+00    0 0.00e+00 100
SFPack               400 1.0 1.8276e-03 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
SFUnpack             400 1.0 6.2653e-05 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecTDot              802 1.0 1.3551e+00 1.2 3.36e+09 1.0 0.0e+00 0.0e+00 8.0e+02  0  2  0  0 40  10  3  0  0 67 19627   52599      0 0.00e+00    0 0.00e+00 100
VecNorm              402 1.0 9.0151e-01 2.2 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   5  1  0  0 33 14788  125477      0 0.00e+00    0 0.00e+00 100
VecCopy                4 1.0 7.3905e-03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecSet                 4 1.0 3.1814e-03 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
VecAXPY              800 1.0 8.2617e-01 1.0 3.36e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   7  3  0  0  0 32112   61644      0 0.00e+00    0 0.00e+00 100
VecAYPX              398 1.0 8.1525e-01 1.6 1.67e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   5  1  0  0  0 16190   20689      0 0.00e+00    0 0.00e+00 100
VecPointwiseMult     402 1.0 3.5694e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 18675   38633      0 0.00e+00    0 0.00e+00 100
VecScatterBegin      400 1.0 1.3391e+00 2.6 0.00e+00 0.0 2.2e+04 8.5e+04 0.0e+00  0  0 61 54  0   7  0 100 100  0     0       0      0 0.00e+00    0 0.00e+00  0
VecScatterEnd        400 1.0 1.3240e+00 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   9  0  0  0  0     0       0      0 0.00e+00    0 0.00e+00  0
PCApply              402 1.0 3.5712e-01 1.0 8.43e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  1  0  0  0 18665   38633      0 0.00e+00    0 0.00e+00 100
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container    33             33        19008     0.
                SNES     1              1         1540     0.
              DMSNES     1              1          688     0.
       Krylov Solver     1              1         1664     0.
      DMKSP interface     1              1          656     0.
              Matrix    76             76   1627827176     0.
    Distributed Mesh    72             72     58958528     0.
            DM Label   180            180       113760     0.
          Quadrature   148            148        87616     0.
      Mesh Transform     6              6         4536     0.
           Index Set   665            665      4081364     0.
   IS L to G Mapping     2              2      8588672     0.
             Section   256            256       182272     0.
   Star Forest Graph   179            179       195360     0.
     Discrete System   121            121       116164     0.
           Weak Form   122            122        75152     0.
    GraphPartitioner    34             34        23392     0.
              Vector    55             55    157135208     0.
        Linear Space     5              5         3416     0.
          Dual Space    26             26        24336     0.
            FE Space     2              2         1576     0.
              Viewer     2              1          840     0.
      Preconditioner     1              1          872     0.
       Field over DM     1              1          704     0.

--- Event Stage 1: PCSetUp

--- Event Stage 2: KSP Solve only

========================================================================================================================
Average time to get PetscTime(): 3.9e-08
Average time for MPI_Barrier(): 8.136e-07
Average time for zero size MPI_Send(): 7.50075e-06
#PETSc Option Table entries:
-benchmark_it 2
-dm_distribute
-dm_mat_type aijkokkos
-dm_plex_box_faces 2,2,2
-dm_plex_box_lower 0,0,0
-dm_plex_box_upper 1,1,1
-dm_plex_dim 3
-dm_plex_simplex 0
-dm_refine 6
-dm_vec_type kokkos
-dm_view
-ksp_converged_reason
-ksp_max_it 200
-ksp_norm_type unpreconditioned
-ksp_rtol 1.e-12
-ksp_type cg
-ksp_view
-log_view
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_ksp_type chebyshev
-mg_levels_pc_type jacobi
-options_left
-pc_gamg_coarse_eq_limit 100
-pc_gamg_coarse_grid_layout_type compact
-pc_gamg_esteig_ksp_max_it 10
-pc_gamg_esteig_ksp_type cg
-pc_gamg_process_eq_limit 400
-pc_gamg_repartition false
-pc_gamg_reuse_interpolation true
-pc_gamg_square_graph 0
-pc_gamg_threshold -0.01
-pc_type jacobi
-petscpartitioner_simple_node_grid 1,1,1
-petscpartitioner_simple_process_grid 2,2,2
-petscpartitioner_type simple
-potential_petscspace_degree 2
-snes_max_it 1
-snes_rtol 1.e-8
-snes_type ksponly
-use_gpu_aware_mpi true
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=cc --with-cxx=CC --with-fc=ftn --with-fortran-bindings=0 LIBS="-L/opt/cray/pe/mpich/8.1.12/gtl/lib -lmpi_gtl_hsa" --with-debugging=0 --with-mpiexec="srun -p batch -N 1 -A csc314_crusher -t 00:10:00" --with-hip --with-hipc=hipcc --download-hypre --download-hypre-configure-arguments=--enable-unified-memory --with-hip-arch=gfx90a --download-kokkos --download-kokkos-kernels --with-kokkos-kernels-tpl=0 PETSC_ARCH=arch-olcf-crusher
-----------------------------------------
Libraries compiled on 2022-01-22 14:37:56 on login2
Machine characteristics: Linux-5.3.18-59.16_11.0.39-cray_shasta_c-x86_64-with-glibc2.3.4
Using PETSc directory: /gpfs/alpine/csc314/scratch/adams/petsc
Using PETSc arch: arch-olcf-crusher
-----------------------------------------
Using C compiler: cc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fstack-protector -Qunused-arguments -fvisibility=hidden -g -O3
Using Fortran compiler: ftn -fPIC
-----------------------------------------
Using include paths: -I/gpfs/alpine/csc314/scratch/adams/petsc/include -I/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/include -I/opt/rocm-4.5.0/include
-----------------------------------------
Using C linker: cc
Using Fortran linker: ftn
Using libraries: -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -lpetsc -Wl,-rpath,/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -L/gpfs/alpine/csc314/scratch/adams/petsc/arch-olcf-crusher/lib -Wl,-rpath,/opt/rocm-4.5.0/lib -L/opt/rocm-4.5.0/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/gtl/lib -L/opt/cray/pe/mpich/8.1.12/gtl/lib -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib64 -L/opt/cray/pe/gcc/8.1.0/snos/lib64 -Wl,-rpath,/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -L/opt/cray/pe/libsci/21.08.1.2/CRAY/9.0/x86_64/lib -Wl,-rpath,/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -L/opt/cray/pe/mpich/8.1.12/ofi/cray/10.0/lib -Wl,-rpath,/opt/cray/pe/dsmml/0.2.2/dsmml/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml/lib -Wl,-rpath,/opt/cray/pe/pmi/6.0.16/lib -L/opt/cray/pe/pmi/6.0.16/lib -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -L/opt/cray/pe/cce/13.0.0/cce/x86_64/lib -Wl,-rpath,/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -L/opt/cray/xpmem/2.3.2-2.2_1.16__g9ea452c.shasta/lib64 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -L/opt/cray/pe/cce/13.0.0/cce-clang/x86_64/lib/clang/13.0.0/lib/linux -Wl,-rpath,/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -L/opt/cray/pe/gcc/8.1.0/snos/lib/gcc/x86_64-suse-linux/8.1.0 -Wl,-rpath,/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -L/opt/cray/pe/cce/13.0.0/binutils/x86_64/x86_64-unknown-linux-gnu/lib -lHYPRE -lkokkoskernels -lkokkoscontainers -lkokkoscore -lhipsparse -lhipblas -lrocsparse -lrocsolver -lrocblas -lrocrand -lamdhip64 -ldl -lmpi_gtl_hsa -lmpifort_cray -lmpi_cray -ldsmml -lpmi -lpmi2 -lxpmem -lstdc++ -lpgas-shmem -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -lgfortran -lpthread -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lclang_rt.builtins-x86_64 -lquadmath -ldl -lmpi_gtl_hsa
-----------------------------------------
WARNING! There are options you set that were not used!
WARNING! could be spelling mistake, etc!
There are 14 unused database options. They are:
Option left: name:-mg_levels_esteig_ksp_max_it value: 10
Option left: name:-mg_levels_esteig_ksp_type value: cg
Option left: name:-mg_levels_ksp_chebyshev_esteig value: 0,0.05,0,1.05
Option left: name:-mg_levels_ksp_type value: chebyshev
Option left: name:-mg_levels_pc_type value: jacobi
Option left: name:-pc_gamg_coarse_eq_limit value: 100
Option left: name:-pc_gamg_coarse_grid_layout_type value: compact
Option left: name:-pc_gamg_esteig_ksp_max_it value: 10
Option left: name:-pc_gamg_esteig_ksp_type value: cg
Option left: name:-pc_gamg_process_eq_limit value: 400
Option left: name:-pc_gamg_repartition value: false
Option left: name:-pc_gamg_reuse_interpolation value: true
Option left: name:-pc_gamg_square_graph value: 0
Option left: name:-pc_gamg_threshold value: -0.01
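The option table pins down the discretization, and the matrix size can be recovered from it: `-dm_plex_box_faces 2,2,2` with `-dm_refine 6` gives a 128^3 hex grid (split over a 2x2x2 process grid, so 64^3 = 262144 cells per rank), and `-potential_petscspace_degree 2` puts Q2 nodes on a 257^3 lattice. The reported rows=16581375 equals 255^3, i.e. that lattice with the outermost node layer removed (consistent with Dirichlet conditions on the box boundary; that last step is an inference, not stated in the log). A quick check of the arithmetic:

```python
# Back out the problem size implied by the run options above.
faces, refine, pgrid, degree = 2, 6, 2, 2   # values from the option table

cells_per_side = faces * 2**refine           # 2 * 2^6 = 128
cells_per_rank = (cells_per_side // pgrid) ** 3
nodes_per_side = degree * cells_per_side + 1  # Q2 node lattice: 257

# Matrix rows match the node lattice with the boundary layer removed.
interior_rows = (nodes_per_side - 2) ** 3

print(cells_per_rank)   # prints 262144, the 3-cells per rank in the DM view
print(interior_rows)    # prints 16581375, the rows in the Mat view
```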