Attached is the -log_view output (from Firedrake). Event Stage 1 (Linear_solver) is where I assemble and solve the linear system of equations.
I am using the HYPRE BoomerAMG preconditioner, so log_view cannot "see into" its internal steps, but based on what it can see, how do I distinguish between these various setup and timing phases? For example, when I look at these lines:

PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0
PCApply               38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  66  0  0  0  0     0
KSPSetUp               1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70  7  0  0  0  82  7  0  0  0   139
SNESSolve              1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84 100 0  0  0  99 100 0  0  0  1781
SNESFunctionEval       1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
SNESJacobianEval       1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65  0  0  0  12 65  0  0  0  9937

how do I break the timing down into "mesh setup", "matrix setup", and "solve time" phases? I am guessing PCSetUp has to do with one of the first two phases, but how would I categorize the rest of the events? I see that HYPRE doesn't expose as much information as other PCs like GAMG and ML, but can one still break down the timing phases through log_view alone?

Thanks,
Justin

On Tue, Aug 30, 2016 at 11:14 PM, Jed Brown <j...@jedbrown.org> wrote:
> Mark Adams <mfad...@lbl.gov> writes:
>
> >> Anyway, what I really wanted to say is, it's good to know that these
> >> "dynamic range/performance spectrum/static scaling" plots are designed to
> >> go past the sweet spots. I also agree that it would be interesting to see a
> >> time vs dofs*iterations/time plot. Would it then also be useful to look at
> >> the step to setting up the preconditioner?
> >>
> > Yes, I generally split up timing between "mesh setup" (symbolic
> > factorization of LU), "matrix setup" (eg, factorizations), and solve time.
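To make the numbers above concrete, here is how I would tentatively tally them (a sketch, not Firedrake/PETSc API; it assumes PETSc's usual lazy setup, i.e. that PCSetUp is triggered inside the first KSPSolve, so the BoomerAMG setup cost is counted once under "matrix setup" and subtracted back out of the solve):

```python
# Timings (seconds) copied from the -log_view event lines above (serial run).
timings = {
    "SNESSolve":        21.056,   # whole solver driver
    "SNESFunctionEval":  1.0763,  # residual assembly
    "SNESJacobianEval":  2.4495,  # Jacobian (matrix) assembly
    "KSPSolve":         17.529,   # linear solve, includes PC setup and applies
    "PCSetUp":           2.2858,  # BoomerAMG setup (coarsening, interpolation)
    "PCApply":          14.102,   # 38 AMG V-cycles
}

# "matrix setup" = assembling the operator + building the preconditioner;
# "solve" = Krylov iteration time with the setup cost taken back out.
matrix_setup = timings["SNESJacobianEval"] + timings["PCSetUp"]
solve = timings["KSPSolve"] - timings["PCSetUp"]

print(f"matrix setup: {matrix_setup:.3f} s")  # 4.735 s
print(f"solve:        {solve:.3f} s")         # 15.243 s
```

Mesh setup would come from the Main Stage events (CreateMesh, CreateExtMesh, CreateFunctionSpace, etc.), which this sketch leaves out.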
> > The degree of amortization that you get for the two setup phases depends on
> > your problem and so it is useful to separate them.
>
> Right, there is nothing wrong with splitting up the phases, but if you
> never show a spectrum for the total, then I will be suspicious. And if
> you only show "per iteration" instead of for a complete solve, then I
> will assume that you're only doing that because convergence is unusably
> slow.
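As an aside for anyone doing this breakdown by hand: the event rows of a -log_view dump can be pulled apart mechanically. A minimal sketch in Python; the regex assumes the single-process column layout of the log attached below, and parse_event is an illustrative helper, not part of PETSc or Firedrake:

```python
import re

# Matches: event name, call count, min/max count ratio, max time (seconds).
# Column order follows the "Event Count Time (sec) ..." header of -log_view.
EVENT_RE = re.compile(
    r"^(?P<name>\S.*?)\s+(?P<count>\d+)\s+\d+\.\d+\s+(?P<time>\d\.\d+e[+-]\d+)"
)

def parse_event(line):
    """Return (event name, call count, max time in seconds), or None if the
    line is not an event row (headers, stage separators, etc.)."""
    m = EVENT_RE.match(line)
    if m is None:
        return None
    return m.group("name"), int(m.group("count")), float(m.group("time"))

line = ("PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 "
        "0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0")
print(parse_event(line))  # → ('PCSetUp', 1, 2.2858)
```

The non-greedy name group lets event names with spaces (e.g. "Mesh: reorder") parse correctly, since the name only grows until an integer count column follows.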
Residual norms for linear_ solve.
  0 KSP Residual norm 5.261660052036e+02
  1 KSP Residual norm 1.356995663739e+02
  2 KSP Residual norm 4.098866223191e+01
  3 KSP Residual norm 1.600475709119e+01
  4 KSP Residual norm 6.956667251063e+00
  5 KSP Residual norm 3.861942754258e+00
  6 KSP Residual norm 2.331981130299e+00
  7 KSP Residual norm 1.404876311943e+00
  8 KSP Residual norm 8.215556397889e-01
  9 KSP Residual norm 5.226439657305e-01
 10 KSP Residual norm 3.421520551962e-01
 11 KSP Residual norm 2.382992002722e-01
 12 KSP Residual norm 1.743249670147e-01
 13 KSP Residual norm 1.277911689618e-01
 14 KSP Residual norm 9.453802371730e-02
 15 KSP Residual norm 7.022732618304e-02
 16 KSP Residual norm 5.276835142527e-02
 17 KSP Residual norm 3.966717849679e-02
 18 KSP Residual norm 2.987708356527e-02
 19 KSP Residual norm 2.221046390150e-02
 20 KSP Residual norm 1.631262945106e-02
 21 KSP Residual norm 1.188030506469e-02
 22 KSP Residual norm 8.655984108945e-03
 23 KSP Residual norm 6.239072936196e-03
 24 KSP Residual norm 4.455419528387e-03
 25 KSP Residual norm 3.235023376588e-03
 26 KSP Residual norm 2.345588803418e-03
 27 KSP Residual norm 1.668600898579e-03
 28 KSP Residual norm 1.180578845647e-03
 29 KSP Residual norm 8.327223711005e-04
 30 KSP Residual norm 5.853054571413e-04
 31 KSP Residual norm 4.038722556707e-04
 32 KSP Residual norm 2.731786184181e-04
 33 KSP Residual norm 1.853188978548e-04
 34 KSP Residual norm 1.277834040044e-04
 35 KSP Residual norm 8.853670330190e-05
 36 KSP Residual norm 6.151569062192e-05
 37 KSP Residual norm 4.247283089736e-05
Linear linear_ solve converged due to CONVERGED_RTOL iterations 37
Wall-clock time: 2.126e+01 seconds

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

3D_ex1.py on a arch-python-linux-x86_64 named pacotaco-xps with 1 processor, by justin Tue Aug 30 23:34:47 2016
Using Petsc Development GIT revision: v3.4.2-13575-gc28f300  GIT Date: 2016-07-10 20:22:41 -0500

                         Max       Max/Min        Avg      Total
Time (sec):           2.497e+01      1.00000   2.497e+01
Objects:              1.310e+02      1.00000   1.310e+02
Flops:                3.749e+10      1.00000   3.749e+10  3.749e+10
Flops/sec:            1.502e+09      1.00000   1.502e+09  1.502e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:       Main Stage: 3.7065e+00  14.8%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 1:    Linear_solver: 2.1265e+01  85.2%  3.7494e+10 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%
 2: Nonlinear_solver: 9.5367e-07   0.0%  0.0000e+00   0.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecSet                 4 1.0 3.1900e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         1 1.0 2.7848e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
DMPlexInterp           1 1.0 1.2410e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
DMPlexStratify         2 1.0 4.7100e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             7 1.0 2.5320e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
CreateMesh             8 1.0 1.2974e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateExtMesh          1 1.0 4.5982e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  12  0  0  0  0     0
Mesh: reorder          1 1.0 1.7152e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
Mesh: numbering        1 1.0 7.5190e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateFunctionSpace    5 1.0 4.4637e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  12  0  0  0  0     0
Trace: eval            4 1.0 1.3766e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopExecute         2 1.0 1.3765e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopCKernel         6 1.0 1.3747e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0  37  0  0  0  0     0
ParLoopReductionBegin  2 1.0 9.0599e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopReductionEnd    2 1.0 5.0068e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
CreateSparsity         1 1.0 6.4163e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  17  0  0  0  0     0
MatZeroInitial         1 1.0 6.9048e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0  19  0  0  0  0     0

--- Event Stage 1: Linear_solver

VecTDot               74 1.0 6.9256e-02 1.0 1.52e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  2202
VecNorm               38 1.0 1.8549e-02 1.0 7.83e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  4221
VecCopy                4 1.0 4.4966e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet                45 1.0 1.4026e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               75 1.0 9.5319e-02 1.0 1.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1621
VecAYPX               36 1.0 4.9965e-02 1.0 7.42e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1485
MatMult               37 1.0 9.0438e-01 1.0 1.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   4  5  0  0  0  2189
MatConvert             1 1.0 1.0125e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         2 1.0 1.6134e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            1 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         1 1.0 8.9929e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp                1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  9  0  0  0  0  11  0  0  0  0     0
PCApply               38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 56  0  0  0  0  66  0  0  0  0     0
KSPSetUp               1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00 0.0e+00 70  7  0  0  0  82  7  0  0  0   139
SNESSolve              1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00 0.0e+00 84 100 0  0  0  99 100 0  0  0  1781
SNESFunctionEval       1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00 0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
SNESJacobianEval       1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00 0.0e+00 10 65  0  0  0  12 65  0  0  0  9937
Trace: eval           11 1.0 3.6623e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9572
ParLoopExecute        14 1.0 3.6407e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9629
ParLoopCKernel        31 1.0 3.6314e+00 1.0 3.51e+10 1.0 0.0e+00 0.0e+00 0.0e+00 15 93  0  0  0  17 93  0  0  0  9653
ParLoopReductionBegin 14 1.0 4.6015e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ParLoopReductionEnd   14 1.0 2.2411e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
ApplyBC                6 1.0 1.6722e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 2: Nonlinear_solver

------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Container        10              8         4608     0.
              Viewer         2              0            0     0.
           Index Set        36             27        21384     0.
   IS L to G Mapping         1              0            0     0.
             Section        24              8         5568     0.
              Vector         5              4     32975776     0.
              Matrix         2              0            0     0.
      Preconditioner         1              1         1400     0.
       Krylov Solver         1              1         1248     0.
                SNES         1              1         1344     0.
      SNESLineSearch         1              1          992     0.
              DMSNES         1              0            0     0.
    Distributed Mesh         8              4        19008     0.
    GraphPartitioner         2              1          612     0.
Star Forest Bipartite Graph   19             11         8888     0.
     Discrete System         8              4         3520     0.

--- Event Stage 1: Linear_solver

              Vector         8              2     16487888     0.
     DMKSP interface         1              0            0     0.

--- Event Stage 2: Nonlinear_solver

========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_rtol 1e-3
-linear_ksp_atol 1e-50
-linear_ksp_converged_reason
-linear_ksp_monitor
-linear_ksp_rtol 1e-7
-linear_ksp_type cg
-linear_pc_hypre_boomeramg_agg_nl 2
-linear_pc_hypre_boomeramg_strong_threshold 0.75
-linear_pc_hypre_type boomeramg
-linear_pc_type hypre
-linear_snes_atol 1e-8
-linear_snes_type ksponly
-log_view
-snes_converged_reason
-snes_max_it 1000
-tao_converged_reason
-tao_max_it 1000
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --prefix=/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc PETSC_ARCH=arch-python-linux-x86_64 --with-shared-libraries=1 --with-debugging=0 --with-c2html=0 --with-cc=/usr/bin/mpicc --with-cxx=/usr/bin/mpicxx --with-fc=/usr/bin/mpif90 --download-ml --download-ctetgen --download-triangle --download-chaco --download-metis --download-parmetis --download-scalapack --download-hypre --download-mumps --download-netcdf --download-hdf5 --download-exodusii
-----------------------------------------
Libraries compiled on Fri Aug 5 02:51:37 2016 on pacotaco-xps
Machine characteristics: Linux-4.4.0-31-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /tmp/pip-kA7m2r-build
Using PETSc arch: arch-python-linux-x86_64
-----------------------------------------
Using C compiler: /usr/bin/mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /usr/bin/mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/include -I/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/include -I/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: /usr/bin/mpicc
Using Fortran linker: /usr/bin/mpif90
Using libraries: -Wl,-rpath,/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -L/tmp/pip-kA7m2r-build/arch-python-linux-x86_64/lib -lpetsc -Wl,-rpath,/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -L/home/justin/Software/firedrake/lib/python2.7/site-packages/petsc/lib -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lml -lmpi_cxx -lstdc++ -llapack -lblas -lparmetis -lmetis -lexoIIv2for -lexodus -lnetcdf -lhdf5hl_fortran -lhdf5_fortran -lhdf5_hl -lhdf5 -ltriangle -lX11 -lhwloc -lctetgen -lssl -lcrypto -lchaco -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
-----------------------------------------