> On Jan 14, 2016, at 2:01 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>
>> On Jan 14, 2016, at 2:24 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>   Matt is right, there is a lot of "missing" time in the output. Please send the output from -ksp_view so we can see exactly what solver is being used.
>>
>>   From the output we have:
>>
>>   Nonlinear solver: 78 % of the time (so your "setup code" outside of PETSc is taking about 22 % of the time)
>>   Linear solver: 77 % of the time (this is reasonable; pretty much the entire cost of the nonlinear solve is the linear solve)
>>   Time to set up the preconditioner: 19 % (10 + 9)
>>   Time of iteration in KSP: 35 % (this is the sum of the vector operations plus MatMult() and MatSolve())
>>
>>   So 77 - (19 + 35) = 23 % unexplained time inside the linear solver (custom preconditioner???)
>>
>>   Also, getting the results with Instruments or HPCToolkit would be useful (so long as we don't need to install HPCToolkit ourselves to see the results).
>
> Thanks, Barry (& Matt & Dave) --- This is a solver that mixes some matrix-based stuff implemented using PETSc with some matrix-free stuff built on top of SAMRAI. Amneet and I should take a look at the performance off-list first.
   Just put a PetscLogEvent() in (or several) to track that part. Plus put an event or two outside the SNESSolve to track the setup time outside of PETSc (a minimal sketch of registering such events is appended after the quoted log output below). The PETSc time looks reasonable; at most I can only imagine any optimizations we could do bringing it down by a small percentage.

   Barry

>
> -- Boyce
>
>>
>>   Barry
>>
>>> On Jan 14, 2016, at 1:26 AM, Bhalla, Amneet Pal S <amne...@live.unc.edu> wrote:
>>>
>>>> On Jan 13, 2016, at 9:17 PM, Griffith, Boyce Eugene <boy...@email.unc.edu> wrote:
>>>>
>>>> I see one hot spot:
>>>
>>> Here is with the opt build:
>>>
>>> ************************************************************************************************************************
>>> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document        ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>>
>>> ./main2d on a linux-opt named aorta with 1 processor, by amneetb Thu Jan 14 02:24:43 2016
>>> Using Petsc Development GIT revision: v3.6.3-3098-ga3ecda2  GIT Date: 2016-01-13 21:30:26 -0600
>>>
>>>                          Max        Max/Min   Avg        Total
>>> Time (sec):              1.018e+00  1.00000   1.018e+00
>>> Objects:                 2.935e+03  1.00000   2.935e+03
>>> Flops:                   4.957e+08  1.00000   4.957e+08  4.957e+08
>>> Flops/sec:               4.868e+08  1.00000   4.868e+08  4.868e+08
>>> MPI Messages:            0.000e+00  0.00000   0.000e+00  0.000e+00
>>> MPI Message Lengths:     0.000e+00  0.00000   0.000e+00  0.000e+00
>>> MPI Reductions:          0.000e+00  0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>>>                           and VecAXPY() for complex vectors of length N --> 8N flops
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>                         Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
>>>  0:      Main Stage: 1.0183e+00 100.0%  4.9570e+08 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>>> Phase summary info:
>>>    Count: number of times phase was executed
>>>    Time and Flops: Max - maximum over all processors
>>>                    Ratio - ratio of maximum to minimum over all processors
>>>    Mess: number of messages sent
>>>    Avg. len: average message length (bytes)
>>>    Reduct: number of global reductions
>>>    Global: entire computation
>>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>>       %T - percent time in this phase         %F - percent flops in this phase
>>>       %M - percent messages in this phase     %L - percent message lengths in this phase
>>>       %R - percent reductions in this phase
>>>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>>>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> VecDot                 4 1.0 2.9564e-05 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1120
>>> VecDotNorm2          272 1.0 1.4565e-03 1.0 4.25e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2920
>>> VecMDot              624 1.0 8.4300e-03 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   627
>>> VecNorm              565 1.0 3.8033e-03 1.0 4.38e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1151
>>> VecScale              86 1.0 5.5480e-04 1.0 1.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   279
>>> VecCopy               28 1.0 5.2261e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet             14567 1.0 1.2443e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> VecAXPY              903 1.0 4.2996e-03 1.0 6.66e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1550
>>> VecAYPX              225 1.0 1.2550e-03 1.0 8.55e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   681
>>> VecAXPBYCZ            42 1.0 1.7118e-04 1.0 3.45e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2014
>>> VecWAXPY              70 1.0 1.9503e-04 1.0 2.98e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1528
>>> VecMAXPY             641 1.0 1.1136e-02 1.0 5.29e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0   475
>>> VecSwap              135 1.0 4.5896e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyBegin     745 1.0 4.9477e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd       745 1.0 9.2411e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin    40831 1.0 3.4502e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> BuildTwoSidedF       738 1.0 2.6712e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatMult              513 1.0 9.1235e-02 1.0 7.75e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9 16  0  0  0   9 16  0  0  0   849
>>> MatSolve           13568 1.0 2.3605e-01 1.0 3.45e+08 1.0 0.0e+00 0.0e+00 0.0e+00 23 70  0  0  0  23 70  0  0  0  1460
>>> MatLUFactorSym        84 1.0 3.7430e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>> MatLUFactorNum        85 1.0 3.9623e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  8  0  0  0   4  8  0  0  0  1058
>>> MatILUFactorSym        1 1.0 3.3617e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatScale               4 1.0 2.5511e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   984
>>> MatAssemblyBegin     108 1.0 6.3658e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd       108 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetRow          33120 1.0 2.0157e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> MatGetRowIJ           85 1.0 1.2145e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetSubMatrice       4 1.0 8.4379e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatGetOrdering        85 1.0 7.7887e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatAXPY                4 1.0 4.9596e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   5  0  0  0  0     0
>>> MatPtAP                4 1.0 4.4426e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0   112
>>> MatPtAPSymbolic        4 1.0 2.7664e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> MatPtAPNumeric         4 1.0 1.6732e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  1  0  0  0   2  1  0  0  0   298
>>> MatGetSymTrans         4 1.0 3.6621e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPGMRESOrthog        16 1.0 9.7778e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> KSPSetUp              90 1.0 5.7650e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               1 1.0 7.8831e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 77 99  0  0  0  77 99  0  0  0   622
>>> PCSetUp               90 1.0 9.9725e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00 10  8  0  0  0  10  8  0  0  0   420
>>> PCSetUpOnBlocks      112 1.0 8.7547e-02 1.0 4.19e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9  8  0  0  0   9  8  0  0  0   479
>>> PCApply               16 1.0 7.1952e-01 1.0 4.89e+08 1.0 0.0e+00 0.0e+00 0.0e+00 71 99  0  0  0  71 99  0  0  0   680
>>> SNESSolve              1 1.0 7.9225e-01 1.0 4.90e+08 1.0 0.0e+00 0.0e+00 0.0e+00 78 99  0  0  0  78 99  0  0  0   619
>>> SNESFunctionEval       2 1.0 3.2940e-03 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    14
>>> SNESJacobianEval       1 1.0 4.7255e-04 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     9
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>>                      Vector   971            839     15573352     0.
>>>              Vector Scatter   290            289       189584     0.
>>>                   Index Set  1171            823       951928     0.
>>>           IS L to G Mapping   110            109      2156656     0.
>>>           Application Order     6              6        99952     0.
>>>                     MatMFFD     1              1          776     0.
>>>                      Matrix   189            189     24083332     0.
>>>           Matrix Null Space     4              4         2432     0.
>>>               Krylov Solver    90             90       122720     0.
>>>             DMKSP interface     1              1          648     0.
>>>              Preconditioner    90             90        89872     0.
>>>                        SNES     1              1         1328     0.
>>>              SNESLineSearch     1              1          984     0.
>>>                      DMSNES     1              1          664     0.
>>>            Distributed Mesh     2              2         9168     0.
>>> Star Forest Bipartite Graph     4              4         3168     0.
>>>             Discrete System     2              2         1712     0.
>>>                      Viewer     1              0            0     0.
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 9.53674e-07
>>> #PETSc Option Table entries:
>>> -ib_ksp_converged_reason
>>> -ib_ksp_monitor_true_residual
>>> -ib_snes_type ksponly
>>> -log_summary
>>> -stokes_ib_pc_level_0_sub_pc_factor_nonzeros_along_diagonal
>>> -stokes_ib_pc_level_0_sub_pc_type ilu
>>> -stokes_ib_pc_level_ksp_richardson_self_scale
>>> -stokes_ib_pc_level_ksp_type richardson
>>> -stokes_ib_pc_level_pc_asm_local_type additive
>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>>> -stokes_ib_pc_level_sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --with-default-arch=0 --PETSC_ARCH=linux-opt --with-debugging=0 --with-c++-support=1 --with-hypre=1 --download-hypre=1 --with-hdf5=yes --COPTFLAGS=-O3 --CXXOPTFLAGS=-O3 --FOPTFLAGS=-O3
>>> -----------------------------------------
>>> Libraries compiled on Thu Jan 14 01:29:56 2016 on aorta
>>> Machine characteristics: Linux-3.13.0-63-generic-x86_64-with-Ubuntu-14.04-trusty
>>> Using PETSc directory: /not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc
>>> Using PETSc arch: linux-opt
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -Qunused-arguments -O3 ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: mpif90 -fPIC -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O3 ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>>
>>> Using include paths: -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/include -I/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/include -I/not_backed_up/softwares/MPICH/include
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpif90
>>> Using libraries: -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lpetsc -Wl,-rpath,/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -L/not_backed_up/amneetb/softwares/PETSc-BitBucket/PETSc/linux-opt/lib -lHYPRE -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpicxx -lstdc++ -llapack -lblas -lpthread -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -lX11 -lm -lmpifort -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpicxx -lstdc++ -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -L/not_backed_up/softwares/MPICH/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/not_backed_up/softwares/MPICH/lib -lmpi -lgcc_s -ldl
>>> -----------------------------------------
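
[Editor's note] Below is a minimal sketch of the kind of instrumentation Barry suggests above: register a custom PetscLogEvent around the matrix-free preconditioner code and a PetscLogStage around the setup that runs outside SNESSolve(), so both show up as separate rows/stages in the -log_summary output. The event and stage names, and where the begin/end calls are placed, are illustrative placeholders and not the actual IBAMR/SAMRAI code.

#include <petscsnes.h>   /* also pulls in petscsys.h, petscksp.h, petscpc.h */

int main(int argc, char **argv)
{
  PetscErrorCode ierr;
  PetscLogEvent  CUSTOM_PC_APPLY;   /* placeholder event handle */
  PetscLogStage  OUTSIDE_SETUP;     /* placeholder stage handle */

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Register once, after PetscInitialize() */
  ierr = PetscLogEventRegister("CustomPCApply", PC_CLASSID, &CUSTOM_PC_APPLY);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("OutsidePETScSetup", &OUTSIDE_SETUP);CHKERRQ(ierr);

  /* Wrap the application/SAMRAI setup that runs outside SNESSolve() in its own stage */
  ierr = PetscLogStagePush(OUTSIDE_SETUP);CHKERRQ(ierr);
  /* ... grid, operator, and preconditioner setup here ... */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  /* Inside the matrix-free preconditioner's apply routine, bracket the work with the event */
  ierr = PetscLogEventBegin(CUSTOM_PC_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);
  /* ... matrix-free preconditioner work here ... */
  ierr = PetscLogEventEnd(CUSTOM_PC_APPLY, 0, 0, 0, 0);CHKERRQ(ierr);

  ierr = PetscFinalize();
  return ierr;
}

Since the run above already passes -log_summary, the new event and stage would appear automatically in the next report, which should help pin down where the otherwise unaccounted time inside (and outside) the linear solve is going.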