On Thu, Jan 14, 2016 at 7:37 AM, Dave May <dave.mayhe...@gmail.com> wrote:
>
> On 14 January 2016 at 14:24, Matthew Knepley <knep...@gmail.com> wrote:
>
>> On Wed, Jan 13, 2016 at 11:12 PM, Bhalla, Amneet Pal S <
>> amne...@live.unc.edu> wrote:
>>
>>> On Jan 13, 2016, at 6:22 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>> Can you mail us a -log_summary for a rough cut? Sometimes it's hard
>>> to interpret the data avalanche from one of those tools without a simple
>>> map.
>>>
>>> Does this indicate some hot spots?
>>>
>>
>> 1) There is a misspelled option -stokes_ib_pc_level_ksp_richardson_self_scae
>>
>> You can try to avoid this by giving -options_left
>>
>> 2) Are you using any custom code during the solve? There is a gaping
>> hole in the timing. It takes 9s to do PCApply(), but everything we
>> time under that accounts for something like a collective 1s.
>>
>
> You are looking at the timing from a debug build.
> The results from the optimized build don't have such a gaping hole.

It still looks like 50% of the runtime to me.

   Matt

>> Since this is serial, we can use something like kcachegrind to look at
>> performance as well, which should at least tell us what is sucking up
>> this time so we can put a PETSc event on it.
>>
>> Thanks,
>>
>>    Matt
>>
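For concreteness, putting a PETSc event on the suspect code just means
registering it once and bracketing the block we currently cannot see. A
minimal sketch, in which the class name, event name, and apply routine are
hypothetical placeholders rather than actual IBAMR code:

    #include <petscsys.h>

    static PetscLogEvent USER_IBApply;  /* hypothetical event handle */

    /* once, after PetscInitialize() */
    PetscErrorCode RegisterUserEvents(void)
    {
      PetscClassId   classid;
      PetscErrorCode ierr;

      ierr = PetscClassIdRegister("UserIB", &classid);CHKERRQ(ierr);
      ierr = PetscLogEventRegister("UserIBApply", classid, &USER_IBApply);CHKERRQ(ierr);
      return 0;
    }

    /* then, around whatever the custom preconditioner does per apply */
    PetscErrorCode UserApply(PC pc, Vec x, Vec y)
    {
      PetscErrorCode ierr;

      ierr = PetscLogEventBegin(USER_IBApply, 0, 0, 0, 0);CHKERRQ(ierr);
      /* ... custom work that is currently invisible to the log ... */
      ierr = PetscLogEventEnd(USER_IBApply, 0, 0, 0, 0);CHKERRQ(ierr);
      return 0;
    }

With the event in place, the missing seconds inside PCApply() should show
up as their own line in -log_summary.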
>>> ************************************************************************************************************************
>>> ***        WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document                ***
>>> ************************************************************************************************************************
>>>
>>> ---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
>>>
>>> ./main2d on a darwin-dbg named Amneets-MBP.attlocal.net with 1 processor, by Taylor Wed Jan 13 21:07:43 2016
>>> Using Petsc Development GIT revision: v3.6.1-2556-g6721a46  GIT Date: 2015-11-16 13:07:08 -0600
>>>
>>>                       Max       Max/Min   Avg        Total
>>> Time (sec):           1.039e+01 1.00000   1.039e+01
>>> Objects:              2.834e+03 1.00000   2.834e+03
>>> Flops:                3.552e+08 1.00000   3.552e+08  3.552e+08
>>> Flops/sec:            3.418e+07 1.00000   3.418e+07  3.418e+07
>>> Memory:               3.949e+07 1.00000              3.949e+07
>>> MPI Messages:         0.000e+00 0.00000   0.000e+00  0.000e+00
>>> MPI Message Lengths:  0.000e+00 0.00000   0.000e+00  0.000e+00
>>> MPI Reductions:       0.000e+00 0.00000
>>>
>>> Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
>>>                           e.g., VecAXPY() for real vectors of length N --> 2N flops
>>>                           and VecAXPY() for complex vectors of length N --> 8N flops
>>>
>>> Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
>>>                        Avg     %Total     Avg     %Total   counts   %Total     Avg        %Total   counts   %Total
>>>  0:      Main Stage: 1.0391e+01 100.0%  3.5520e+08 100.0%  0.000e+00   0.0%  0.000e+00       0.0%  0.000e+00   0.0%
>>>
>>> ------------------------------------------------------------------------------------------------------------------------
>>> See the 'Profiling' chapter of the users' manual for details on interpreting output.
>>> Phase summary info:
>>>    Count: number of times phase was executed
>>>    Time and Flops: Max - maximum over all processors
>>>                    Ratio - ratio of maximum to minimum over all processors
>>>    Mess: number of messages sent
>>>    Avg. len: average message length (bytes)
>>>    Reduct: number of global reductions
>>>    Global: entire computation
>>>    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
>>>       %T - percent time in this phase         %F - percent flops in this phase
>>>       %M - percent messages in this phase     %L - percent message lengths in this phase
>>>       %R - percent reductions in this phase
>>>    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>>       ##########################################################
>>>       #                                                        #
>>>       #                       WARNING!!!                       #
>>>       #                                                        #
>>>       #   This code was compiled with a debugging option,      #
>>>       #   To get timing results run ./configure                #
>>>       #   using --with-debugging=no, the performance will      #
>>>       #   be generally two or three times faster.              #
>>>       #                                                        #
>>>       ##########################################################
>>>
>>>
>>> Event                Count      Time (sec)     Flops                            --- Global ---  --- Stage ---   Total
>>>                        Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>> VecDot                 4 1.0 9.0525e-04 1.0 3.31e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    37
>>> VecMDot              533 1.0 1.5936e-02 1.0 5.97e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   375
>>> VecNorm              412 1.0 9.2107e-03 1.0 3.57e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   388
>>> VecScale             331 1.0 5.8195e-01 1.0 1.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00  6  0  0  0  0   6  0  0  0  0     2
>>> VecCopy              116 1.0 1.9983e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecSet             18362 1.0 1.5249e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> VecAXPY              254 1.0 4.3961e-01 1.0 1.95e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  1  0  0  0   4  1  0  0  0     4
>>> VecAYPX               92 1.0 2.5167e-03 1.0 2.66e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   106
>>> VecAXPBYCZ            36 1.0 8.6242e-04 1.0 2.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   341
>>> VecWAXPY              58 1.0 1.2539e-03 1.0 2.47e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   197
>>> VecMAXPY             638 1.0 2.3439e-02 1.0 7.68e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0   328
>>> VecSwap              111 1.0 1.9721e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>>> VecAssemblyBegin     607 1.0 3.8150e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecAssemblyEnd       607 1.0 8.3705e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> VecScatterBegin    26434 1.0 3.0096e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>>> VecNormalize         260 1.0 4.9754e-01 1.0 3.84e+06 1.0 0.0e+00 0.0e+00 0.0e+00  5  1  0  0  0   5  1  0  0  0     8
>>> BuildTwoSidedF       600 1.0 1.8942e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatMult              365 1.0 6.0306e-01 1.0 6.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6 18  0  0  0   6 18  0  0  0   104
>>> MatSolve            8775 1.0 6.8506e-01 1.0 2.25e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7 63  0  0  0   7 63  0  0  0   328
>>> MatLUFactorSym        85 1.0 1.0664e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatLUFactorNum        85 1.0 1.2066e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1 12  0  0  0   1 12  0  0  0   350
>>> MatScale               4 1.0 4.0145e-04 1.0 2.51e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   625
>>> MatAssemblyBegin     108 1.0 4.8849e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAssemblyEnd       108 1.0 9.8455e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetRow          33120 1.0 1.4157e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatGetRowIJ           85 1.0 2.6060e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetSubMatrice       4 1.0 4.2922e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatGetOrdering        85 1.0 3.1230e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> MatAXPY                4 1.0 4.0459e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0   4  0  0  0  0     0
>>> MatPtAP                4 1.0 1.1362e-01 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0    44
>>> MatPtAPSymbolic        4 1.0 6.4973e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>>> MatPtAPNumeric         4 1.0 4.8521e-02 1.0 4.99e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   103
>>> MatGetSymTrans         4 1.0 5.9780e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPGMRESOrthog       182 1.0 2.0538e-02 1.0 5.11e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   249
>>> KSPSetUp              90 1.0 2.1210e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>>> KSPSolve               1 1.0 9.5567e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>>> PCSetUp               90 1.0 4.0597e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 12  0  0  0   4 12  0  0  0   104
>>> PCSetUpOnBlocks       91 1.0 2.9886e-01 1.0 4.23e+07 1.0 0.0e+00 0.0e+00 0.0e+00  3 12  0  0  0   3 12  0  0  0   141
>>> PCApply               13 1.0 9.0558e+00 1.0 3.49e+08 1.0 0.0e+00 0.0e+00 0.0e+00 87 98  0  0  0  87 98  0  0  0    39
>>> SNESSolve              1 1.0 9.5729e+00 1.0 3.50e+08 1.0 0.0e+00 0.0e+00 0.0e+00 92 98  0  0  0  92 98  0  0  0    37
>>> SNESFunctionEval       2 1.0 1.3347e-02 1.0 4.68e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     4
>>> SNESJacobianEval       1 1.0 2.4613e-03 1.0 4.26e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     2
>>> ------------------------------------------------------------------------------------------------------------------------
>>>
>>> Memory usage is given in bytes:
>>>
>>> Object Type          Creations   Destructions     Memory  Descendants' Mem.
>>> Reports information only for process 0.
>>>
>>> --- Event Stage 0: Main Stage
>>>
>>>                          Vector   870            762     13314200     0.
>>>                  Vector Scatter   290            289       189584     0.
>>>                       Index Set  1171            823       951096     0.
>>>               IS L to G Mapping   110            109      2156656     0.
>>>               Application Order     6              6        99952     0.
>>>                         MatMFFD     1              1          776     0.
>>>                          Matrix   189            189     24202324     0.
>>>               Matrix Null Space     4              4         2432     0.
>>>                   Krylov Solver    90             90       190080     0.
>>>                 DMKSP interface     1              1          648     0.
>>>                  Preconditioner    90             90        89128     0.
>>>                            SNES     1              1         1328     0.
>>>                  SNESLineSearch     1              1          856     0.
>>>                          DMSNES     1              1          664     0.
>>>                Distributed Mesh     2              2         9024     0.
>>>     Star Forest Bipartite Graph     4              4         3168     0.
>>>                 Discrete System     2              2         1696     0.
>>>                          Viewer     1              0            0     0.
>>> ========================================================================================================================
>>> Average time to get PetscTime(): 4.74e-08
>>> #PETSc Option Table entries:
>>> -ib_ksp_converged_reason
>>> -ib_ksp_monitor_true_residual
>>> -ib_snes_type ksponly
>>> -log_summary
>>> -stokes_ib_pc_level_ksp_richardson_self_scae
>>> -stokes_ib_pc_level_ksp_type gmres
>>> -stokes_ib_pc_level_pc_asm_local_type additive
>>> -stokes_ib_pc_level_pc_asm_type interpolate
>>> -stokes_ib_pc_level_sub_pc_factor_nonzeros_along_diagonal
>>> -stokes_ib_pc_level_sub_pc_type lu
>>> #End of PETSc Option Table entries
>>> Compiled without FORTRAN kernels
>>> Compiled with full precision matrices (default)
>>> sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
>>> Configure options: --CC=mpicc --CXX=mpicxx --FC=mpif90 --PETSC_ARCH=darwin-dbg
>>> --with-debugging=1 --with-c++-support=1 --with-hypre=1 --download-hypre=1
>>> --with-hdf5=yes --with-hdf5-dir=/Users/Taylor/Documents/SOFTWARES/HDF5/
>>> -----------------------------------------
>>> Libraries compiled on Mon Nov 16 15:11:21 2015 on d209.math.ucdavis.edu
>>> Machine characteristics: Darwin-14.5.0-x86_64-i386-64bit
>>> Using PETSc directory: /Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc
>>> Using PETSc arch: darwin-dbg
>>> -----------------------------------------
>>>
>>> Using C compiler: mpicc -g ${COPTFLAGS} ${CFLAGS}
>>> Using Fortran compiler: mpif90 -g ${FOPTFLAGS} ${FFLAGS}
>>> -----------------------------------------
>>>
>>> Using include paths:
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/include
>>> -I/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/include
>>> -I/opt/X11/include -I/Users/Taylor/Documents/SOFTWARES/HDF5/include
>>> -I/opt/local/include -I/Users/Taylor/Documents/SOFTWARES/MPICH/include
>>> -----------------------------------------
>>>
>>> Using C linker: mpicc
>>> Using Fortran linker: mpif90
>>> Using libraries:
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lpetsc
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/PETSc-BitBucket/PETSc/darwin-dbg/lib
>>> -lHYPRE -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -lmpicxx -lc++
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -llapack -lblas -Wl,-rpath,/opt/X11/lib -L/opt/X11/lib -lX11
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/HDF5/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/HDF5/lib -lhdf5_hl -lhdf5 -lssl
>>> -lcrypto -lmpifort -lgfortran
>>> -Wl,-rpath,/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -L/opt/local/lib/gcc49/gcc/x86_64-apple-darwin14/4.9.1
>>> -Wl,-rpath,/opt/local/lib/gcc49 -L/opt/local/lib/gcc49 -lgfortran
>>> -lgcc_ext.10.5 -lquadmath -lm -lclang_rt.osx -lmpicxx -lc++ -lclang_rt.osx
>>> -Wl,-rpath,/Users/Taylor/Documents/SOFTWARES/MPICH/lib
>>> -L/Users/Taylor/Documents/SOFTWARES/MPICH/lib -ldl -lmpi -lpmpi -lSystem
>>> -Wl,-rpath,/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -L/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/6.1.0/lib/darwin
>>> -lclang_rt.osx -ldl
>>> -----------------------------------------
>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
--
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener
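P.S. For the kcachegrind route on a serial run, the usual recipe is
something like the following (a sketch, assuming valgrind and kcachegrind
are installed; on OS X, Instruments is the nearer equivalent):

    # run under callgrind; PETSc options stay on the command line as usual
    valgrind --tool=callgrind ./main2d <your usual options>
    # browse the annotated call graph in the generated output file
    kcachegrind callgrind.out.<pid>

That should show exactly which user routine is eating the unaccounted time
inside PCApply().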