On Wed, Mar 2, 2016 at 7:15 PM, Justin Chang <[email protected]> wrote:

> Barry,
>
> Attached are the log_summary outputs for each preconditioner.
>

MatPtAP takes all the time. It looks like there is no coarsening at all at
the first level. Mark, can you see what is going on here?

   Matt

> Thanks,
> Justin
>
>
> On Wednesday, March 2, 2016, Barry Smith <[email protected]> wrote:
>
>>
>>   Justin,
>>
>>     Do you have the -log_summary output for these runs?
>>
>>   Barry
>>
>> > On Mar 2, 2016, at 4:28 PM, Justin Chang <[email protected]> wrote:
>> >
>> > Dear all,
>> >
>> > Using the firedrake project, I am solving this simple mixed Poisson problem:
>> >
>> > mesh = UnitCubeMesh(40,40,40)
>> > V = FunctionSpace(mesh,"RT",1)
>> > Q = FunctionSpace(mesh,"DG",0)
>> > W = V*Q
>> >
>> > v, p = TrialFunctions(W)
>> > w, q = TestFunctions(W)
>> >
>> > f = Function(Q)
>> > f.interpolate(Expression("12*pi*pi*sin(pi*x[0]*2)*sin(pi*x[1]*2)*sin(2*pi*x[2])"))
>> >
>> > a = dot(v,w)*dx - p*div(w)*dx + div(v)*q*dx
>> > L = f*q*dx
>> >
>> > u = Function(W)
>> > solve(a==L,u,solver_parameters={...})
>> >
>> > This problem has 1161600 degrees of freedom. The solver_parameters are:
>> >
>> > -ksp_type gmres
>> > -pc_type fieldsplit
>> > -pc_fieldsplit_type schur
>> > -pc_fieldsplit_schur_fact_type upper
>> > -pc_fieldsplit_schur_precondition selfp
>> > -fieldsplit_0_ksp_type preonly
>> > -fieldsplit_0_pc_type bjacobi
>> > -fieldsplit_1_ksp_type preonly
>> > -fieldsplit_1_pc_type hypre/ml/gamg
>> >
>> > For the last option, I compared the wall-clock timings for hypre, ml, and gamg. Here are the strong-scaling results (from 1 up to 64 cores, with 8 cores per Intel Xeon E5-2670 node):
>> >
>> > hypre:
>> > 1 core:   47.5 s, 12 solver iters
>> > 2 cores:  34.1 s, 15 solver iters
>> > 4 cores:  21.5 s, 15 solver iters
>> > 8 cores:  16.6 s, 15 solver iters
>> > 16 cores: 10.2 s, 15 solver iters
>> > 24 cores: 7.66 s, 15 solver iters
>> > 32 cores: 6.31 s, 15 solver iters
>> > 40 cores: 5.68 s, 15 solver iters
>> > 48 cores: 5.36 s, 16 solver iters
>> > 56 cores: 5.12 s, 16 solver iters
>> > 64 cores: 4.99 s, 16 solver iters
>> >
>> > ml:
>> > 1 core:   4.44 s, 14 solver iters
>> > 2 cores:  2.85 s, 16 solver iters
>> > 4 cores:  1.6 s, 17 solver iters
>> > 8 cores:  0.966 s, 17 solver iters
>> > 16 cores: 0.585 s, 18 solver iters
>> > 24 cores: 0.440 s, 18 solver iters
>> > 32 cores: 0.375 s, 18 solver iters
>> > 40 cores: 0.332 s, 18 solver iters
>> > 48 cores: 0.307 s, 17 solver iters
>> > 56 cores: 0.290 s, 18 solver iters
>> > 64 cores: 0.281 s, 18 solver iters
>> >
>> > gamg:
>> > 1 core:   613 s, 12 solver iters
>> > 2 cores:  204 s, 15 solver iters
>> > 4 cores:  77.1 s, 15 solver iters
>> > 8 cores:  38.1 s, 15 solver iters
>> > 16 cores: 15.9 s, 16 solver iters
>> > 24 cores: 9.24 s, 16 solver iters
>> > 32 cores: 5.92 s, 16 solver iters
>> > 40 cores: 4.72 s, 16 solver iters
>> > 48 cores: 3.89 s, 16 solver iters
>> > 56 cores: 3.65 s, 16 solver iters
>> > 64 cores: 3.46 s, 16 solver iters
>> >
>> > The performance difference between ML and HYPRE makes sense to me, but what I am really confused about is GAMG. It seems GAMG is really slow on a single core, but something internally is causing it to speed up super-linearly as I increase the number of MPI processes. Shouldn't ML and GAMG have the same performance?
>> > I am not sure what log outputs to give you guys, but for starters, below is -ksp_view for the single core case with GAMG
>> >
>> > KSP Object:(solver_) 1 MPI processes
>> >   type: gmres
>> >     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> >     GMRES: happy breakdown tolerance 1e-30
>> >   maximum iterations=10000, initial guess is zero
>> >   tolerances: relative=1e-07, absolute=1e-50, divergence=10000.
>> >   left preconditioning
>> >   using PRECONDITIONED norm type for convergence test
>> > PC Object:(solver_) 1 MPI processes
>> >   type: fieldsplit
>> >     FieldSplit with Schur preconditioner, factorization UPPER
>> >     Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses (lumped, if requested) A00's diagonal's inverse
>> >     Split info:
>> >     Split number 0 Defined by IS
>> >     Split number 1 Defined by IS
>> >     KSP solver for A00 block
>> >       KSP Object: (solver_fieldsplit_0_) 1 MPI processes
>> >         type: preonly
>> >         maximum iterations=10000, initial guess is zero
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_0_) 1 MPI processes
>> >         type: bjacobi
>> >           block Jacobi: number of blocks = 1
>> >           Local solve is same for all blocks, in the following KSP and PC objects:
>> >           KSP Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >             type: preonly
>> >             maximum iterations=10000, initial guess is zero
>> >             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using NONE norm type for convergence test
>> >           PC Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >             type: ilu
>> >               ILU: out-of-place factorization
>> >               0 levels of fill
>> >               tolerance for zero pivot 2.22045e-14
>> >               matrix ordering: natural
>> >               factor fill ratio given 1., needed 1.
>> >                 Factored matrix follows:
>> >                   Mat Object: 1 MPI processes
>> >                     type: seqaij
>> >                     rows=777600, cols=777600
>> >                     package used to perform factorization: petsc
>> >                     total: nonzeros=5385600, allocated nonzeros=5385600
>> >                     total number of mallocs used during MatSetValues calls =0
>> >                       not using I-node routines
>> >             linear system matrix = precond matrix:
>> >             Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >               type: seqaij
>> >               rows=777600, cols=777600
>> >               total: nonzeros=5385600, allocated nonzeros=5385600
>> >               total number of mallocs used during MatSetValues calls =0
>> >                 not using I-node routines
>> >         linear system matrix = precond matrix:
>> >         Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >           type: seqaij
>> >           rows=777600, cols=777600
>> >           total: nonzeros=5385600, allocated nonzeros=5385600
>> >           total number of mallocs used during MatSetValues calls =0
>> >             not using I-node routines
>> >     KSP solver for S = A11 - A10 inv(A00) A01
>> >       KSP Object: (solver_fieldsplit_1_) 1 MPI processes
>> >         type: preonly
>> >         maximum iterations=10000, initial guess is zero
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_) 1 MPI processes
>> >         type: gamg
>> >           MG: type is MULTIPLICATIVE, levels=5 cycles=v
>> >             Cycles per PCApply=1
>> >             Using Galerkin computed coarse grid matrices
>> >             GAMG specific options
>> >               Threshold for dropping small values from graph 0.
>> >               AGG specific options
>> >                 Symmetric graph false
>> >     Coarse grid solver -- level -------------------------------
>> >       KSP Object: (solver_fieldsplit_1_mg_coarse_) 1 MPI processes
>> >         type: preonly
>> >         maximum iterations=1, initial guess is zero
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_mg_coarse_) 1 MPI processes
>> >         type: bjacobi
>> >           block Jacobi: number of blocks = 1
>> >           Local solve is same for all blocks, in the following KSP and PC objects:
>> >           KSP Object: (solver_fieldsplit_1_mg_coarse_sub_) 1 MPI processes
>> >             type: preonly
>> >             maximum iterations=1, initial guess is zero
>> >             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using NONE norm type for convergence test
>> >           PC Object: (solver_fieldsplit_1_mg_coarse_sub_) 1 MPI processes
>> >             type: lu
>> >               LU: out-of-place factorization
>> >               tolerance for zero pivot 2.22045e-14
>> >               using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>> >               matrix ordering: nd
>> >               factor fill ratio given 5., needed 1.
>> >                 Factored matrix follows:
>> >                   Mat Object: 1 MPI processes
>> >                     type: seqaij
>> >                     rows=9, cols=9
>> >                     package used to perform factorization: petsc
>> >                     total: nonzeros=81, allocated nonzeros=81
>> >                     total number of mallocs used during MatSetValues calls =0
>> >                       using I-node routines: found 2 nodes, limit used is 5
>> >             linear system matrix = precond matrix:
>> >             Mat Object: 1 MPI processes
>> >               type: seqaij
>> >               rows=9, cols=9
>> >               total: nonzeros=81, allocated nonzeros=81
>> >               total number of mallocs used during MatSetValues calls =0
>> >                 using I-node routines: found 2 nodes, limit used is 5
>> >         linear system matrix = precond matrix:
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=9, cols=9
>> >           total: nonzeros=81, allocated nonzeros=81
>> >           total number of mallocs used during MatSetValues calls =0
>> >             using I-node routines: found 2 nodes, limit used is 5
>> >     Down solver (pre-smoother) on level 1 -------------------------------
>> >       KSP Object: (solver_fieldsplit_1_mg_levels_1_) 1 MPI processes
>> >         type: chebyshev
>> >           Chebyshev: eigenvalue estimates: min = 0.0999525, max = 1.09948
>> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> >           KSP Object: (solver_fieldsplit_1_mg_levels_1_esteig_) 1 MPI processes
>> >             type: gmres
>> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> >               GMRES: happy breakdown tolerance 1e-30
>> >             maximum iterations=10, initial guess is zero
>> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using PRECONDITIONED norm type for convergence test
>> >         maximum iterations=2
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using nonzero initial guess
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_mg_levels_1_) 1 MPI processes
>> >         type: sor
>> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >         linear system matrix = precond matrix:
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=207, cols=207
>> >           total: nonzeros=42849, allocated nonzeros=42849
>> >           total number of mallocs used during MatSetValues calls =0
>> >             using I-node routines: found 42 nodes, limit used is 5
>> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >     Down solver (pre-smoother) on level 2 -------------------------------
>> >       KSP Object: (solver_fieldsplit_1_mg_levels_2_) 1 MPI processes
>> >         type: chebyshev
>> >           Chebyshev: eigenvalue estimates: min = 0.0996628, max = 1.09629
>> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> >           KSP Object: (solver_fieldsplit_1_mg_levels_2_esteig_) 1 MPI processes
>> >             type: gmres
>> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> >               GMRES: happy breakdown tolerance 1e-30
>> >             maximum iterations=10, initial guess is zero
>> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using PRECONDITIONED norm type for convergence test
>> >         maximum iterations=2
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using nonzero initial guess
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_mg_levels_2_) 1 MPI processes
>> >         type: sor
>> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >         linear system matrix = precond matrix:
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=5373, cols=5373
>> >           total: nonzeros=28852043, allocated nonzeros=28852043
>> >           total number of mallocs used during MatSetValues calls =0
>> >             using I-node routines: found 1481 nodes, limit used is 5
>> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >     Down solver (pre-smoother) on level 3 -------------------------------
>> >       KSP Object: (solver_fieldsplit_1_mg_levels_3_) 1 MPI processes
>> >         type: chebyshev
>> >           Chebyshev: eigenvalue estimates: min = 0.0994294, max = 1.09372
>> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> >           KSP Object: (solver_fieldsplit_1_mg_levels_3_esteig_) 1 MPI processes
>> >             type: gmres
>> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> >               GMRES: happy breakdown tolerance 1e-30
>> >             maximum iterations=10, initial guess is zero
>> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using PRECONDITIONED norm type for convergence test
>> >         maximum iterations=2
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using nonzero initial guess
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_mg_levels_3_) 1 MPI processes
>> >         type: sor
>> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >         linear system matrix = precond matrix:
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=52147, cols=52147
>> >           total: nonzeros=38604909, allocated nonzeros=38604909
>> >           total number of mallocs used during MatSetValues calls =2
>> >             not using I-node routines
>> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >     Down solver (pre-smoother) on level 4 -------------------------------
>> >       KSP Object: (solver_fieldsplit_1_mg_levels_4_) 1 MPI processes
>> >         type: chebyshev
>> >           Chebyshev: eigenvalue estimates: min = 0.158979, max = 1.74876
>> >           Chebyshev: eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
>> >           KSP Object: (solver_fieldsplit_1_mg_levels_4_esteig_) 1 MPI processes
>> >             type: gmres
>> >               GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>> >               GMRES: happy breakdown tolerance 1e-30
>> >             maximum iterations=10, initial guess is zero
>> >             tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using PRECONDITIONED norm type for convergence test
>> >         maximum iterations=2
>> >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >         left preconditioning
>> >         using nonzero initial guess
>> >         using NONE norm type for convergence test
>> >       PC Object: (solver_fieldsplit_1_mg_levels_4_) 1 MPI processes
>> >         type: sor
>> >           SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
>> >         linear system matrix followed by preconditioner matrix:
>> >         Mat Object: (solver_fieldsplit_1_) 1 MPI processes
>> >           type: schurcomplement
>> >           rows=384000, cols=384000
>> >             Schur complement A11 - A10 inv(A00) A01
>> >             A11
>> >               Mat Object: (solver_fieldsplit_1_) 1 MPI processes
>> >                 type: seqaij
>> >                 rows=384000, cols=384000
>> >                 total: nonzeros=384000, allocated nonzeros=384000
>> >                 total number of mallocs used during MatSetValues calls =0
>> >                   not using I-node routines
>> >             A10
>> >               Mat Object: 1 MPI processes
>> >                 type: seqaij
>> >                 rows=384000, cols=777600
>> >                 total: nonzeros=1919999, allocated nonzeros=1919999
>> >                 total number of mallocs used during MatSetValues calls =0
>> >                   not using I-node routines
>> >             KSP of A00
>> >               KSP Object: (solver_fieldsplit_0_) 1 MPI processes
>> >                 type: preonly
>> >                 maximum iterations=10000, initial guess is zero
>> >                 tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >                 left preconditioning
>> >                 using NONE norm type for convergence test
>> >               PC Object: (solver_fieldsplit_0_) 1 MPI processes
>> >                 type: bjacobi
>> >                   block Jacobi: number of blocks = 1
>> >                   Local solve is same for all blocks, in the following KSP and PC objects:
>> >                   KSP Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >                     type: preonly
>> >                     maximum iterations=10000, initial guess is zero
>> >                     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >                     left preconditioning
>> >                     using NONE norm type for convergence test
>> >                   PC Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >                     type: ilu
>> >                       ILU: out-of-place factorization
>> >                       0 levels of fill
>> >                       tolerance for zero pivot 2.22045e-14
>> >                       matrix ordering: natural
>> >                       factor fill ratio given 1., needed 1.
>> >                         Factored matrix follows:
>> >                           Mat Object: 1 MPI processes
>> >                             type: seqaij
>> >                             rows=777600, cols=777600
>> >                             package used to perform factorization: petsc
>> >                             total: nonzeros=5385600, allocated nonzeros=5385600
>> >                             total number of mallocs used during MatSetValues calls =0
>> >                               not using I-node routines
>> >                     linear system matrix = precond matrix:
>> >                     Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >                       type: seqaij
>> >                       rows=777600, cols=777600
>> >                       total: nonzeros=5385600, allocated nonzeros=5385600
>> >                       total number of mallocs used during MatSetValues calls =0
>> >                         not using I-node routines
>> >                 linear system matrix = precond matrix:
>> >                 Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >                   type: seqaij
>> >                   rows=777600, cols=777600
>> >                   total: nonzeros=5385600, allocated nonzeros=5385600
>> >                   total number of mallocs used during MatSetValues calls =0
>> >                     not using I-node routines
>> >             A01
>> >               Mat Object: 1 MPI processes
>> >                 type: seqaij
>> >                 rows=777600, cols=384000
>> >                 total: nonzeros=1919999, allocated nonzeros=1919999
>> >                 total number of mallocs used during MatSetValues calls =0
>> >                   not using I-node routines
>> >         Mat Object: 1 MPI processes
>> >           type: seqaij
>> >           rows=384000, cols=384000
>> >           total: nonzeros=3416452, allocated nonzeros=3416452
>> >           total number of mallocs used during MatSetValues calls =0
>> >             not using I-node routines
>> >     Up solver (post-smoother) same as down solver (pre-smoother)
>> >     linear system matrix followed by preconditioner matrix:
>> >     Mat Object: (solver_fieldsplit_1_) 1 MPI processes
>> >       type: schurcomplement
>> >       rows=384000, cols=384000
>> >         Schur complement A11 - A10 inv(A00) A01
>> >         A11
>> >           Mat Object: (solver_fieldsplit_1_) 1 MPI processes
>> >             type: seqaij
>> >             rows=384000, cols=384000
>> >             total: nonzeros=384000, allocated nonzeros=384000
>> >             total number of mallocs used during MatSetValues calls =0
>> >               not using I-node routines
>> >         A10
>> >           Mat Object: 1 MPI processes
>> >             type: seqaij
>> >             rows=384000, cols=777600
>> >             total: nonzeros=1919999, allocated nonzeros=1919999
>> >             total number of mallocs used during MatSetValues calls =0
>> >               not using I-node routines
>> >         KSP of A00
>> >           KSP Object: (solver_fieldsplit_0_) 1 MPI processes
>> >             type: preonly
>> >             maximum iterations=10000, initial guess is zero
>> >             tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >             left preconditioning
>> >             using NONE norm type for convergence test
>> >           PC Object: (solver_fieldsplit_0_) 1 MPI processes
>> >             type: bjacobi
>> >               block Jacobi: number of blocks = 1
>> >               Local solve is same for all blocks, in the following KSP and PC objects:
>> >               KSP Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >                 type: preonly
>> >                 maximum iterations=10000, initial guess is zero
>> >                 tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>> >                 left preconditioning
>> >                 using NONE norm type for convergence test
>> >               PC Object: (solver_fieldsplit_0_sub_) 1 MPI processes
>> >                 type: ilu
>> >                   ILU: out-of-place factorization
>> >                   0 levels of fill
>> >                   tolerance for zero pivot 2.22045e-14
>> >                   matrix ordering: natural
>> >                   factor fill ratio given 1., needed 1.
>> >                     Factored matrix follows:
>> >                       Mat Object: 1 MPI processes
>> >                         type: seqaij
>> >                         rows=777600, cols=777600
>> >                         package used to perform factorization: petsc
>> >                         total: nonzeros=5385600, allocated nonzeros=5385600
>> >                         total number of mallocs used during MatSetValues calls =0
>> >                           not using I-node routines
>> >                 linear system matrix = precond matrix:
>> >                 Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >                   type: seqaij
>> >                   rows=777600, cols=777600
>> >                   total: nonzeros=5385600, allocated nonzeros=5385600
>> >                   total number of mallocs used during MatSetValues calls =0
>> >                     not using I-node routines
>> >             linear system matrix = precond matrix:
>> >             Mat Object: (solver_fieldsplit_0_) 1 MPI processes
>> >               type: seqaij
>> >               rows=777600, cols=777600
>> >               total: nonzeros=5385600, allocated nonzeros=5385600
>> >               total number of mallocs used during MatSetValues calls =0
>> >                 not using I-node routines
>> >         A01
>> >           Mat Object: 1 MPI processes
>> >             type: seqaij
>> >             rows=777600, cols=384000
>> >             total: nonzeros=1919999, allocated nonzeros=1919999
>> >             total number of mallocs used during MatSetValues calls =0
>> >               not using I-node routines
>> >     Mat Object: 1 MPI processes
>> >       type: seqaij
>> >       rows=384000, cols=384000
>> >       total: nonzeros=3416452, allocated nonzeros=3416452
>> >       total number of mallocs used during MatSetValues calls =0
>> >         not using I-node routines
>> >   linear system matrix = precond matrix:
>> >   Mat Object: 1 MPI processes
>> >     type: nest
>> >     rows=1161600, cols=1161600
>> >       Matrix object:
>> >         type=nest, rows=2, cols=2
>> >         MatNest structure:
>> >         (0,0) : prefix="solver_fieldsplit_0_", type=seqaij, rows=777600, cols=777600
>> >         (0,1) : type=seqaij, rows=777600, cols=384000
>> >         (1,0) : type=seqaij, rows=384000, cols=777600
>> >         (1,1) : prefix="solver_fieldsplit_1_", type=seqaij, rows=384000, cols=384000
>> >
>> > Any insight as to what's happening? Btw, this firedrake/petsc-mapdes is from way back in October 2015 (yes, much has changed since, but reinstalling/updating firedrake and petsc on LANL's firewalled HPC machines is a big pain in the ass).
>> >
>> > Thanks,
>> > Justin
>>
>>

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
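
For anyone trying to reproduce the setup: the dict that is elided as solver_parameters={...} in the quoted script would, for the options Justin lists, look roughly like the sketch below. This is only an illustration assembled from the quoted options, not taken from the original script; the variable name "parameters" is made up here, and the solver_ prefix that appears in the -ksp_view output comes from whatever options prefix the solver was created with, so it does not show up in the dict keys.

# Sketch of the elided solver_parameters dict, assembled from the options
# quoted above (gamg shown for the Schur block; swap in "hypre" or "ml"
# for the other two runs).
parameters = {
    "ksp_type": "gmres",
    "pc_type": "fieldsplit",
    "pc_fieldsplit_type": "schur",
    "pc_fieldsplit_schur_fact_type": "upper",
    "pc_fieldsplit_schur_precondition": "selfp",
    "fieldsplit_0_ksp_type": "preonly",
    "fieldsplit_0_pc_type": "bjacobi",
    "fieldsplit_1_ksp_type": "preonly",
    "fieldsplit_1_pc_type": "gamg",
}

# a, L, u are the forms and Function from the quoted script.
solve(a == L, u, solver_parameters=parameters)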

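A quick back-of-the-envelope check on the "super-linear" observation, using only the 1-core and 64-core timings quoted in Justin's message (illustrative arithmetic, nothing more):

# Strong-scaling speedup = 1-core time / 64-core time, from the quoted timings.
timings = {
    "hypre": (47.5, 4.99),
    "ml": (4.44, 0.281),
    "gamg": (613.0, 3.46),
}
for pc, (t1, t64) in timings.items():
    print(f"{pc}: {t1 / t64:.1f}x speedup on 64 cores")
# hypre ~9.5x, ml ~15.8x, gamg ~177x: GAMG's apparently super-linear scaling
# comes from its unusually slow 1-core run (the MatPtAP setup cost noted
# above), not from better parallel efficiency at 64 cores.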