It looks like the Schur solve is requiring a huge number of iterations to converge (based on the number of MatMult calls in the log). This is killing the performance.

Are you sure that A11 is a good approximation to S? You might consider trying the selfp option:

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre

Note that the best approximation to S is likely both problem- and discretisation-dependent, so if selfp is also terrible, you might want to consider coding up your own approximation to S for your specific system.
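For example, something like this would switch the Schur preconditioner (a minimal sketch against the PETSc 3.7 C API; pc is assumed to be your fieldsplit PC, and Sp a matrix you assemble yourself to approximate S, both hypothetical names):

    /* Build the Schur preconditioner from A11 - A10 inv(diag(A00)) A01 (selfp): */
    PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_SELFP, NULL);
    /* (command-line equivalent: -pc_fieldsplit_schur_precondition selfp) */

    /* Or supply your own approximation Sp to the Schur complement: */
    PCFieldSplitSetSchurPre(pc, PC_FIELDSPLIT_SCHUR_PRE_USER, Sp);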
Thanks,
Dave

On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.kneze...@akselos.com> wrote:

I have a definite 2x2 block system, and I figured it'd be good to apply the PCFIELDSPLIT functionality with a Schur complement, as described in Section 4.5 of the manual.

The A00 block of my matrix is very small, so I figured I'd specify a direct solver (i.e. MUMPS) for that block. So I did the following (a code sketch of this setup is appended after the log at the end of this message):

- PCFieldSplitSetIS to specify the indices of the two splits
- PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver and PC types for each (MUMPS for A00, ILU+CG for A11)
- I set -pc_fieldsplit_schur_fact_type full

Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a test case. It seems to converge well, but I'm concerned about the speed (about 90 seconds, vs. about 1 second if I use a direct solver for the entire system). I just wanted to check whether I'm setting this up in a good way?

Many thanks,
David

-----------------------------------------------------------------------------------

  0 KSP Residual norm 5.405774214400e+04
  1 KSP Residual norm 1.849649014371e+02
  2 KSP Residual norm 7.462775074989e-02
  3 KSP Residual norm 2.680497175260e-04

KSP Object: 1 MPI processes
  type: cg
  maximum iterations=1000
  tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
  left preconditioning
  using nonzero initial guess
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: fieldsplit
    FieldSplit with Schur preconditioner, factorization FULL
    Preconditioner for the Schur complement formed from A11
    Split info:
    Split number 0 Defined by IS
    Split number 1 Defined by IS
    KSP solver for A00 block
      KSP Object: (fieldsplit_RB_split_) 1 MPI processes
        type: preonly
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using NONE norm type for convergence test
      PC Object: (fieldsplit_RB_split_) 1 MPI processes
        type: cholesky
          Cholesky: out-of-place factorization
          tolerance for zero pivot 2.22045e-14
          matrix ordering: natural
          factor fill ratio given 0., needed 0.
          Factored matrix follows:
            Mat Object: 1 MPI processes
              type: seqaij
              rows=324, cols=324
              package used to perform factorization: mumps
              total: nonzeros=3042, allocated nonzeros=3042
              total number of mallocs used during MatSetValues calls =0
              MUMPS run parameters:
                SYM (matrix type): 2
                PAR (host participation): 1
                ICNTL(1) (output for error): 6
                ICNTL(2) (output of diagnostic msg): 0
                ICNTL(3) (output for global info): 0
                ICNTL(4) (level of printing): 0
                ICNTL(5) (input mat struct): 0
                ICNTL(6) (matrix prescaling): 7
                ICNTL(7) (sequential matrix ordering): 7
                ICNTL(8) (scaling strategy): 77
                ICNTL(10) (max num of refinements): 0
                ICNTL(11) (error analysis): 0
                ICNTL(12) (efficiency control): 0
                ICNTL(13) (efficiency control): 0
                ICNTL(14) (percentage of estimated workspace increase): 20
                ICNTL(18) (input mat struct): 0
                ICNTL(19) (Schur complement info): 0
                ICNTL(20) (rhs sparse pattern): 0
                ICNTL(21) (solution struct): 0
                ICNTL(22) (in-core/out-of-core facility): 0
                ICNTL(23) (max size of memory that can be allocated locally): 0
                ICNTL(24) (detection of null pivot rows): 0
                ICNTL(25) (computation of a null space basis): 0
                ICNTL(26) (Schur options for rhs or solution): 0
                ICNTL(27) (experimental parameter): -24
                ICNTL(28) (use parallel or sequential ordering): 1
                ICNTL(29) (parallel ordering): 0
                ICNTL(30) (user-specified set of entries in inv(A)): 0
                ICNTL(31) (factors discarded in the solve phase): 0
                ICNTL(33) (compute determinant): 0
                CNTL(1) (relative pivoting threshold): 0.01
                CNTL(2) (stopping criterion of refinement): 1.49012e-08
                CNTL(3) (absolute pivoting threshold): 0.
                CNTL(4) (value of static pivoting): -1.
                CNTL(5) (fixation for null pivots): 0.
                RINFO(1) (local estimated flops for the elimination after analysis): [0] 29394.
                RINFO(2) (local estimated flops for the assembly after factorization): [0] 1092.
                RINFO(3) (local estimated flops for the elimination after factorization): [0] 29394.
                INFO(15) (estimated size (in MB) of MUMPS internal data for running numerical factorization): [0] 1
                INFO(16) (size (in MB) of MUMPS internal data used during numerical factorization): [0] 1
                INFO(23) (num of pivots eliminated on this processor after factorization): [0] 324
                RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
                RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
                RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
                (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
                INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
                INFOG(5) (estimated maximum front size in the complete tree): 12
                INFOG(6) (number of nodes in the complete tree): 53
                INFOG(7) (ordering option effectively used after analysis): 2
                INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
                INFOG(10) (total integer space to store the matrix factors after factorization): 2067
                INFOG(11) (order of largest frontal matrix after factorization): 12
                INFOG(12) (number of off-diagonal pivots): 0
                INFOG(13) (number of delayed pivots after factorization): 0
                INFOG(14) (number of memory compresses after factorization): 0
                INFOG(15) (number of steps of iterative refinement after solution): 0
                INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
                INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
                INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
                INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
                INFOG(20) (estimated number of entries in the factors): 3042
                INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
                INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
                INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
                INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                INFOG(28) (after factorization: number of null pivots encountered): 0
                INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
                INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
                INFOG(32) (after analysis: type of analysis done): 1
                INFOG(33) (value used for ICNTL(8)): -2
                INFOG(34) (exponent of the determinant if determinant is requested): 0
      linear system matrix = precond matrix:
      Mat Object: (fieldsplit_RB_split_) 1 MPI processes
        type: seqaij
        rows=324, cols=324
        total: nonzeros=5760, allocated nonzeros=5760
        total number of mallocs used during MatSetValues calls =0
          using I-node routines: found 108 nodes, limit used is 5
    KSP solver for S = A11 - A10 inv(A00) A01
      KSP Object: (fieldsplit_FE_split_) 1 MPI processes
        type: cg
        maximum iterations=10000, initial guess is zero
        tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
        left preconditioning
        using PRECONDITIONED norm type for convergence test
      PC Object: (fieldsplit_FE_split_) 1 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 1
          Local solve is same for all blocks, in the following KSP and PC objects:
          KSP Object: (fieldsplit_FE_split_sub_) 1 MPI processes
            type: preonly
            maximum iterations=10000, initial guess is zero
            tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
            left preconditioning
            using NONE norm type for convergence test
          PC Object: (fieldsplit_FE_split_sub_) 1 MPI processes
            type: ilu
              ILU: out-of-place factorization
              0 levels of fill
              tolerance for zero pivot 2.22045e-14
              matrix ordering: natural
              factor fill ratio given 1., needed 1.
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=28476, cols=28476
                  package used to perform factorization: petsc
                  total: nonzeros=1017054, allocated nonzeros=1017054
                  total number of mallocs used during MatSetValues calls =0
                    using I-node routines: found 9492 nodes, limit used is 5
          linear system matrix = precond matrix:
          Mat Object: (fieldsplit_FE_split_) 1 MPI processes
            type: seqaij
            rows=28476, cols=28476
            total: nonzeros=1017054, allocated nonzeros=1017054
            total number of mallocs used during MatSetValues calls =0
              using I-node routines: found 9492 nodes, limit used is 5
      linear system matrix followed by preconditioner matrix:
      Mat Object: (fieldsplit_FE_split_) 1 MPI processes
        type: schurcomplement
        rows=28476, cols=28476
          Schur complement A11 - A10 inv(A00) A01
          A11
            Mat Object: (fieldsplit_FE_split_) 1 MPI processes
              type: seqaij
              rows=28476, cols=28476
              total: nonzeros=1017054, allocated nonzeros=1017054
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 9492 nodes, limit used is 5
          A10
            Mat Object: 1 MPI processes
              type: seqaij
              rows=28476, cols=324
              total: nonzeros=936, allocated nonzeros=936
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 5717 nodes, limit used is 5
          KSP of A00
            KSP Object: (fieldsplit_RB_split_) 1 MPI processes
              type: preonly
              maximum iterations=10000, initial guess is zero
              tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
              left preconditioning
              using NONE norm type for convergence test
            PC Object: (fieldsplit_RB_split_) 1 MPI processes
              type: cholesky
                Cholesky: out-of-place factorization
                tolerance for zero pivot 2.22045e-14
                matrix ordering: natural
                factor fill ratio given 0., needed 0.
                Factored matrix follows:
                  Mat Object: 1 MPI processes
                    type: seqaij
                    rows=324, cols=324
                    package used to perform factorization: mumps
                    total: nonzeros=3042, allocated nonzeros=3042
                    total number of mallocs used during MatSetValues calls =0
                    MUMPS run parameters:
                      SYM (matrix type): 2
                      PAR (host participation): 1
                      ICNTL(1) (output for error): 6
                      ICNTL(2) (output of diagnostic msg): 0
                      ICNTL(3) (output for global info): 0
                      ICNTL(4) (level of printing): 0
                      ICNTL(5) (input mat struct): 0
                      ICNTL(6) (matrix prescaling): 7
                      ICNTL(7) (sequential matrix ordering): 7
                      ICNTL(8) (scaling strategy): 77
                      ICNTL(10) (max num of refinements): 0
                      ICNTL(11) (error analysis): 0
                      ICNTL(12) (efficiency control): 0
                      ICNTL(13) (efficiency control): 0
                      ICNTL(14) (percentage of estimated workspace increase): 20
                      ICNTL(18) (input mat struct): 0
                      ICNTL(19) (Schur complement info): 0
                      ICNTL(20) (rhs sparse pattern): 0
                      ICNTL(21) (solution struct): 0
                      ICNTL(22) (in-core/out-of-core facility): 0
                      ICNTL(23) (max size of memory that can be allocated locally): 0
                      ICNTL(24) (detection of null pivot rows): 0
                      ICNTL(25) (computation of a null space basis): 0
                      ICNTL(26) (Schur options for rhs or solution): 0
                      ICNTL(27) (experimental parameter): -24
                      ICNTL(28) (use parallel or sequential ordering): 1
                      ICNTL(29) (parallel ordering): 0
                      ICNTL(30) (user-specified set of entries in inv(A)): 0
                      ICNTL(31) (factors discarded in the solve phase): 0
                      ICNTL(33) (compute determinant): 0
                      CNTL(1) (relative pivoting threshold): 0.01
                      CNTL(2) (stopping criterion of refinement): 1.49012e-08
                      CNTL(3) (absolute pivoting threshold): 0.
                      CNTL(4) (value of static pivoting): -1.
                      CNTL(5) (fixation for null pivots): 0.
                      RINFO(1) (local estimated flops for the elimination after analysis): [0] 29394.
                      RINFO(2) (local estimated flops for the assembly after factorization): [0] 1092.
                      RINFO(3) (local estimated flops for the elimination after factorization): [0] 29394.
                      INFO(15) (estimated size (in MB) of MUMPS internal data for running numerical factorization): [0] 1
                      INFO(16) (size (in MB) of MUMPS internal data used during numerical factorization): [0] 1
                      INFO(23) (num of pivots eliminated on this processor after factorization): [0] 324
                      RINFOG(1) (global estimated flops for the elimination after analysis): 29394.
                      RINFOG(2) (global estimated flops for the assembly after factorization): 1092.
                      RINFOG(3) (global estimated flops for the elimination after factorization): 29394.
                      (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
                      INFOG(3) (estimated real workspace for factors on all processors after analysis): 3888
                      INFOG(4) (estimated integer workspace for factors on all processors after analysis): 2067
                      INFOG(5) (estimated maximum front size in the complete tree): 12
                      INFOG(6) (number of nodes in the complete tree): 53
                      INFOG(7) (ordering option effectively used after analysis): 2
                      INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
                      INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 3888
                      INFOG(10) (total integer space to store the matrix factors after factorization): 2067
                      INFOG(11) (order of largest frontal matrix after factorization): 12
                      INFOG(12) (number of off-diagonal pivots): 0
                      INFOG(13) (number of delayed pivots after factorization): 0
                      INFOG(14) (number of memory compresses after factorization): 0
                      INFOG(15) (number of steps of iterative refinement after solution): 0
                      INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 1
                      INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 1
                      INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 1
                      INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 1
                      INFOG(20) (estimated number of entries in the factors): 3042
                      INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 1
                      INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 1
                      INFOG(23) (after analysis: value of ICNTL(6) effectively used): 5
                      INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
                      INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
                      INFOG(28) (after factorization: number of null pivots encountered): 0
                      INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 3042
                      INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 0, 0
                      INFOG(32) (after analysis: type of analysis done): 1
                      INFOG(33) (value used for ICNTL(8)): -2
                      INFOG(34) (exponent of the determinant if determinant is requested): 0
            linear system matrix = precond matrix:
            Mat Object: (fieldsplit_RB_split_) 1 MPI processes
              type: seqaij
              rows=324, cols=324
              total: nonzeros=5760, allocated nonzeros=5760
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 108 nodes, limit used is 5
          A01
            Mat Object: 1 MPI processes
              type: seqaij
              rows=324, cols=28476
              total: nonzeros=936, allocated nonzeros=936
              total number of mallocs used during MatSetValues calls =0
                using I-node routines: found 67 nodes, limit used is 5
      Mat Object: (fieldsplit_FE_split_) 1 MPI processes
        type: seqaij
        rows=28476, cols=28476
        total: nonzeros=1017054, allocated nonzeros=1017054
        total number of mallocs used during MatSetValues calls =0
          using I-node routines: found 9492 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object: () 1 MPI processes
    type: seqaij
    rows=28800, cols=28800
    total: nonzeros=1024686, allocated nonzeros=1024794
    total number of mallocs used during MatSetValues calls =0
      using I-node routines: found 9600 nodes, limit used is 5

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan 11 16:16:47 2017
Using Petsc Release Version 3.7.3, unknown

                         Max       Max/Min        Avg      Total
Time (sec):           9.179e+01      1.00000   9.179e+01
Objects:              1.990e+02      1.00000   1.990e+02
Flops:                1.634e+11      1.00000   1.634e+11  1.634e+11
Flops/sec:            1.780e+09      1.00000   1.780e+09  1.780e+09
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                          e.g., VecAXPY() for real vectors of length N --> 2N flops
                          and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 9.1787e+01 100.0%  1.6336e+11 100.0%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   354
VecTDot            74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  3388
VecNorm            37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  2523
VecScale           37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  2944
VecCopy            37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet             74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY            74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 0.0e+00  2  3  0  0  0   2  3  0  0  0  2446
VecAYPX            37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 0.0e+00  1  1  0  0  0   1  1  0  0  0  1725
VecAssemblyBegin      68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd        68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin       48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult            37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00 0.0e+00 45 47  0  0  0  45 47  0  0  0  1853
MatMultAdd         37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00 0.0e+00 37 46  0  0  0  37 46  0  0  0  2238
MatSolve           74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00 0.0e+00 51 45  0  0  0  51 45  0  0  0  1593
MatLUFactorNum         1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1420
MatCholFctrSym         1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatCholFctrNum         1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatILUFactorSym        1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRow          58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ            2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice       6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering         2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries         6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView                7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp               4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve               1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0  1840
PCSetUp                4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   637
PCSetUpOnBlocks        5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1150
PCApply                5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0  1840
KSPSolve_FS_0          5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve_FS_Schu       5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 0.0e+00 97 100 0  0  0  97 100 0  0  0  1840
KSPSolve_FS_Low        5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector        91             91      9693912     0.
      Vector Scatter        24             24        15936     0.
           Index Set        51             51       537888     0.
   IS L to G Mapping         3              3       240408     0.
              Matrix        13             13     64097868     0.
       Krylov Solver         6              6         7888     0.
      Preconditioner         6              6         6288     0.
              Viewer         1              0            0     0.
    Distributed Mesh         1              1         4624     0.
Star Forest Bipartite Graph  2              2         1616     0.
     Discrete System         1              1          872     0.
========================================================================================================================
Average time to get PetscTime(): 0.
#PETSc Option Table entries:
-ksp_monitor
-ksp_view
-log_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2  sizeof(int) 4  sizeof(long) 8  sizeof(void*) 8  sizeof(PetscScalar) 8  sizeof(PetscInt) 4
Configure options: --with-shared-libraries=1 --with-debugging=0 --download-suitesparse --download-blacs --download-ptscotch=yes --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps --download-metis --prefix=/home/dknez/software/libmesh_install/opt_real/petsc --download-hypre --download-ml
-----------------------------------------
Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo
Machine characteristics: Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial
Using PETSc directory: /home/dknez/software/petsc-src
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: mpicc -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/include -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include -I/home/dknez/software/libmesh_install/opt_real/petsc/include -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi
-----------------------------------------
Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi -lgcc_s -lpthread -ldl
-----------------------------------------
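For reference, here is a minimal sketch (not the original poster's actual code) of the setup described in the message above, using the PETSc 3.7 C API. The function name SetupSchurFieldSplit and the index sets is_rb / is_fe are hypothetical, and ksp is assumed to already carry the assembled operator:

    #include <petscksp.h>

    static PetscErrorCode SetupSchurFieldSplit(KSP ksp, IS is_rb, IS is_fe)
    {
      PC             pc, subpc;
      KSP           *subksp;
      PetscInt       nsplits;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCFIELDSPLIT);CHKERRQ(ierr);
      ierr = PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);CHKERRQ(ierr);
      /* same effect as -pc_fieldsplit_schur_fact_type full */
      ierr = PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);CHKERRQ(ierr);
      ierr = PCFieldSplitSetIS(pc, "RB", is_rb);CHKERRQ(ierr);  /* split 0: small A00 block */
      ierr = PCFieldSplitSetIS(pc, "FE", is_fe);CHKERRQ(ierr);  /* split 1: large A11 block */

      /* the sub-KSPs exist only after the preconditioner is set up */
      ierr = KSPSetUp(ksp);CHKERRQ(ierr);
      ierr = PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);CHKERRQ(ierr);

      /* A00 block: direct solve via MUMPS Cholesky */
      ierr = KSPSetType(subksp[0], KSPPREONLY);CHKERRQ(ierr);
      ierr = KSPGetPC(subksp[0], &subpc);CHKERRQ(ierr);
      ierr = PCSetType(subpc, PCCHOLESKY);CHKERRQ(ierr);
      ierr = PCFactorSetMatSolverPackage(subpc, MATSOLVERMUMPS);CHKERRQ(ierr);

      /* Schur complement: CG preconditioned with ILU */
      ierr = KSPSetType(subksp[1], KSPCG);CHKERRQ(ierr);
      ierr = KSPGetPC(subksp[1], &subpc);CHKERRQ(ierr);
      ierr = PCSetType(subpc, PCILU);CHKERRQ(ierr);

      ierr = PetscFree(subksp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With a setup along these lines, KSPSolve on the full system applies the FULL Schur factorization, and the inner CG iterations on S account for the large MatMult counts seen in the log above.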