Attached is the -log_view output (from Firedrake). Event Stage 1 (Linear_solver) is where I assemble and solve the linear system of equations.
I am using the HYPRE BoomerAMG preconditioner, so log_view cannot "see into" its internal steps, but based on what it can see, how do I distinguish between these various setup and timing phases? For example, when I look at these lines:

PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0
PCApply               38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  66  0  0  0  0     0
KSPSetUp               1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70  7  0  0  0  82  7  0  0  0   139
SNESSolve              1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84 100 0  0  0  99 100 0  0  0  1781
SNESFunctionEval       1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
SNESJacobianEval       1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65  0  0  0  12 65  0  0  0  9937

how do I break the timing down into "mesh setup", "matrix setup", and "solve time" phases? I am guessing PCSetUp has to do with one of the first two phases, but how would I categorize the rest of the events? I see that HYPRE doesn't expose as much information as other PCs like GAMG and ML, but can one still break down the timing phases through log_view alone?

Thanks,
Justin

On Tue, Aug 30, 2016 at 11:14 PM, Jed Brown <j...@jedbrown.org> wrote:
> Mark Adams <mfad...@lbl.gov> writes:
>
> >> Anyway, what I really wanted to say is, it's good to know that these
> >> "dynamic range/performance spectrum/static scaling" plots are designed to
> >> go past the sweet spots. I also agree that it would be interesting to see a
> >> time vs dofs*iterations/time plot. Would it then also be useful to look at
> >> the step to setting up the preconditioner?
> >>
> > Yes, I generally split up timing between "mesh setup" (symbolic
> > factorization of LU), "matrix setup" (eg, factorizations), and solve time.
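To make the numbers above concrete, here is how I would tentatively tally them (a sketch, not Firedrake/PETSc API; it assumes PETSc's usual lazy setup, i.e. that PCSetUp is triggered inside the first KSPSolve, so the BoomerAMG setup cost is counted once under "matrix setup" and subtracted back out of the solve):

```python
# Timings (seconds) copied from the -log_view event lines above (serial run).
timings = {
    "SNESSolve":        21.056,   # whole solver driver
    "SNESFunctionEval":  1.0763,  # residual assembly
    "SNESJacobianEval":  2.4495,  # Jacobian (matrix) assembly
    "KSPSolve":         17.529,   # linear solve, includes PC setup and applies
    "PCSetUp":           2.2858,  # BoomerAMG setup (coarsening, interpolation)
    "PCApply":          14.102,   # 38 AMG V-cycles
}

# "matrix setup" = assembling the operator + building the preconditioner;
# "solve" = Krylov iteration time with the setup cost taken back out.
matrix_setup = timings["SNESJacobianEval"] + timings["PCSetUp"]
solve = timings["KSPSolve"] - timings["PCSetUp"]

print(f"matrix setup: {matrix_setup:.3f} s")  # 4.735 s
print(f"solve:        {solve:.3f} s")         # 15.243 s
```

Mesh setup would come from the Main Stage events (CreateMesh, CreateExtMesh, CreateFunctionSpace, etc.), which this sketch leaves out.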
> > The degree of amortization that you get for the two setup phases depends on
> > your problem and so it is useful to separate them.
>
> Right, there is nothing wrong with splitting up the phases, but if you
> never show a spectrum for the total, then I will be suspicious. And if
> you only show "per iteration" instead of for a complete solve, then I
> will assume that you're only doing that because convergence is unusably
> slow.
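As an aside for anyone doing this breakdown by hand: the event rows of a -log_view dump can be pulled apart mechanically. A minimal sketch in Python; the regex assumes the single-process column layout of the log attached below, and parse_event is an illustrative helper, not part of PETSc or Firedrake:

```python
import re

# Matches: event name, call count, min/max count ratio, max time (seconds).
# Column order follows the "Event Count Time (sec) ..." header of -log_view.
EVENT_RE = re.compile(
    r"^(?P<name>\S.*?)\s+(?P<count>\d+)\s+\d+\.\d+\s+(?P<time>\d\.\d+e[+-]\d+)"
)

def parse_event(line):
    """Return (event name, call count, max time in seconds), or None if the
    line is not an event row (headers, stage separators, etc.)."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("count")), float(m.group("time"))

line = ("PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 "
        "0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0")
print(parse_event(line))  # → ('PCSetUp', 1, 2.2858)
```

The non-greedy name group lets event names with spaces (e.g. "Mesh: reorder") parse correctly, since the name only grows until an integer count column follows.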
Residual norms for linear_ solve.
  0 KSP Residual norm 5.261660052036e+02
  1 KSP Residual norm 1.356995663739e+02
  2 KSP Residual norm 4.098866223191e+01
  3 KSP Residual norm 1.600475709119e+01
  4 KSP Residual norm 6.956667251063e+00
  5 KSP Residual norm 3.861942754258e+00
  6 KSP Residual norm 2.331981130299e+00
  7 KSP Residual norm 1.404876311943e+00
  8 KSP Residual norm 8.215556397889e-01
  9 KSP Residual norm 5.226439657305e-01
 10 KSP Residual norm 3.421520551962e-01
 11 KSP Residual norm 2.382992002722e-01
 12 KSP Residual norm 1.743249670147e-01
 13 KSP Residual norm 1.277911689618e-01
 14 KSP Residual norm 9.453802371730e-02
 15 KSP Residual norm 7.022732618304e-02
 16 KSP Residual norm 5.276835142527e-02
 17 KSP Residual norm 3.966717849679e-02
 18 KSP Residual norm 2.987708356527e-02
 19 KSP Residual norm 2.221046390150e-02
 20 KSP Residual norm 1.631262945106e-02
 21 KSP Residual norm 1.188030506469e-02
 22 KSP Residual norm 8.655984108945e-03
 23 KSP Residual norm 6.239072936196e-03
 24 KSP Residual norm 4.455419528387e-03
 25 KSP Residual norm 3.235023376588e-03
 26 KSP Residual norm 2.345588803418e-03
 27 KSP Residual norm 1.668600898579e-03
 28 KSP Residual norm 1.180578845647e-03
 29 KSP Residual norm 8.327223711005e-04
 30 KSP Residual norm 5.853054571413e-04
 31 KSP Residual norm 4.038722556707e-04
 32 KSP Residual norm 2.731786184181e-04
 33 KSP Residual norm 1.853188978548e-04
 34 KSP Residual norm 1.277834040044e-04
 35 KSP Residual norm 8.853670330190e-05
 36 KSP Residual norm 6.151569062192e-05
 37 KSP Residual norm 4.247283089736e-05
Linear linear_ solve converged due to CONVERGED_RTOL iterations 37
Wall-clock time: 2.126e+01 seconds

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

3D_ex1.py on a arch-python-linux-x86_64 named pacotaco-xps with 1 processor, by justin Tue Aug 30 23:34:47 2016
Using Petsc Development GIT revision: v3.4.2-13575-gc28f300  GIT Date: 2016-07-10 20:22:41 -0500

                         Max       Max/Min        Avg      Total
Time (sec):           2.497e+01      1.00000   2.497e+01
Objects:              1.310e+02      1.00000   1.310e+02
Flops:                3.749e+10      1.00000   3.749e+10  3.749e+10
Flops/sec:            1.502e+09      1.00000   1.502e+09  1.502e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:       Main Stage: 3.7065e+00  14.8%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 1:    Linear_solver: 2.1265e+01  85.2%  3.7494e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2: Nonlinear_solver: 9.5367e-07   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 4 1.0 3.1900e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 2.7848e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
DMPlexInterp           1 1.0 1.2410e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexStratify         2 1.0 4.7100e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             7 1.0 2.5320e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
CreateMesh             8 1.0 1.2974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateExtMesh          1 1.0 4.5982e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  12  0  0  0  0     0
Mesh: reorder          1 1.0 1.7152e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
Mesh: numbering        1 1.0 7.5190e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateFunctionSpace    5 1.0 4.4637e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  12  0  0  0  0     0
Trace: eval            4 1.0 1.3766e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopExecute         2 1.0 1.3765e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopCKernel         6 1.0 1.3747e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopReductionBegin  2 1.0 9.0599e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopReductionEnd    2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateSparsity         1 1.0 6.4163e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  17  0  0  0  0     0
MatZeroInitial         1 1.0 6.9048e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  19  0  0  0  0     0

--- Event Stage 1: Linear_solver

VecTDot               74 1.0 6.9256e-02 1.0 1.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2202
VecNorm               38 1.0 1.8549e-02 1.0 7.83e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4221
VecCopy                4 1.0 4.4966e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                45 1.0 1.4026e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               75 1.0 9.5319e-02 1.0 1.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1621
VecAYPX               36 1.0 4.9965e-02 1.0 7.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1485
MatMult               37 1.0 9.0438e-01 1.0 1.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   4  5  0  0  0  2189
MatConvert             1 1.0 1.0125e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.6134e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 8.9929e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0
PCApply               38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  66  0  0  0  0     0
KSPSetUp               1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70  7  0  0  0  82  7  0  0  0   139
SNESSolve              1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84 100 0  0  0  99 100 0  0  0  1781
SNESFunctionEval       1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
SNESJacobianEval       1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65  0  0  0  12 65  0  0  0  9937
Trace: eval           11 1.0 3.6623e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9572
ParLoopExecute        14 1.0 3.6407e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9629
ParLoopCKernel        31 1.0 3.6314e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9653
ParLoopReductionBegin 14 1.0 4.6015e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopReductionEnd   14 1.0 2.2411e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ApplyBC                6 1.0 1.6722e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 2: Nonlinear_solver

------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container        10              8         4608     0.
              Viewer         2              0            0     0.
           Index Set        36             27        21384     0.
   IS L to G Mapping         1              0            0     0.
             Section        24              8         5568     0.
              Vector         5              4     32975776     0.
              Matrix         2              0            0     0.
      Preconditioner         1              1         1400     0.
       Krylov Solver         1              1         1248     0.
                SNES         1              1         1344     0.
      SNESLineSearch         1              1          992     0.
              DMSNES         1              0            0     0.
    Distributed Mesh         8              4        19008     0.
    GraphPartitioner         2              1          612     0.
Star Forest Bipartite Graph   19             11         8888     0.
     Discrete System         8              4         3520     0.

--- Event Stage 1: Linear_solver

              Vector         8              2     16487888     0.
     DMKSP interface         1              0            0     0.

--- Event Stage 2: Nonlinear_solver

========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_rtol 1e-3
-linear_ksp_atol 1e-50
-linear_ksp_converged_reason
-linear_ksp_monitor
-linear_ksp_rtol 1e-7
-linear_ksp_type cg
-linear_pc_hypre_boomeramg_agg_nl 2
-linear_pc_hypre_boomeramg_strong_threshold 0.75
-linear_pc_hypre_type boomeramg
-linear_pc_type hypre
-linear_snes_atol 1e-8
-linear_snes_type ksponly
-log_view
-snes_converged_reason
-snes_max_it 1000
-tao_converged_reason
-tao_max_it 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc PETSC_ARCH=arch-python-linux-x86_64 --with-shared-libraries=1 --with-debugging=0 --with-c2html=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-fc=/usr/bin/mpif90 --download-ml --download-ctetgen --download-triangle --download-chaco --download-metis --download-parmetis --download-scalapack --download-hypre --download-mumps --download-netcdf --download-hdf5 --download-exodusii
-----------------------------------------
Libraries compiled on Fri Aug 5 02:51:37 2016 on pacotaco-xps
Machine characteristics: Linux-4.4.0-31-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /tmp/pip-kA7m2r-build
Using PETSc arch: arch-python-linux-x86_64
-----------------------------------------
Using C compiler: /usr/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: /usr/bin/mpicc
Using Fortran linker: /usr/bin/mpif90
Using libraries: -Wl,-rpath,/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -L/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -lpetsc -Wl,-rpath,/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -L/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lml -lmpi_cxx -lstdc++ -llapack -lblas -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -ltriangle -lX11 -lhwloc -lctetgen -lssl -lcrypto -lchaco -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
-----------------------------------------