On Sun, Jun 11, 2017 at 11:06 PM, David Nolte <dno...@dim.uchile.cl> wrote:
> Thanks Matt, makes sense to me! > > I skipped direct solvers at first because for these 'real' configurations > LU (mumps/superlu_dist) usally goes out of memory (got 32GB RAM). It would > be reasonable to take one more step back and play with synthetic examples. > I managed to run one case though with 936k dofs using: ("user" =pressure > mass matrix) > > <...> > -pc_fieldsplit_schur_fact_type upper > -pc_fieldsplit_schur_precondition user > -fieldsplit_0_ksp_type preonly > -fieldsplit_0_pc_type lu > -fieldsplit_0_pc_factor_mat_solver_package mumps > > -fieldsplit_1_ksp_type gmres > -fieldsplit_1_ksp_monitor_true_residuals > -fieldsplit_1_ksp_rtol 1e-10 > -fieldsplit_1_pc_type lu > -fieldsplit_1_pc_factor_mat_solver_package mumps > > It takes 2 outer iterations, as expected. However the fieldsplit_1 solve > takes very long. > 1) It should take 1 outer iterate, not two. The problem is that your Schur tolerance is way too high. Use -fieldsplit_1_ksp_rtol 1e-10 or something like that. Then it will take 1 iterate. 2) There is a problem with the Schur solve. Now from the iterates 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 it is clear that the preconditioner is really screwing stuff up. For testing, you can use -pc_fieldsplit_schur_precondition full and your same setup here. It should take one iterate. I think there is something wrong with your mass matrix. Thanks, Matt > 0 KSP unpreconditioned resid norm 4.038466809302e-03 true resid norm > 4.038466809302e-03 ||r(i)||/||b|| 1.000000000000e+00 > Residual norms for fieldsplit_1_ solve. > 0 KSP preconditioned resid norm 0.000000000000e+00 true resid norm > 0.000000000000e+00 ||r(i)||/||b|| -nan > Linear fieldsplit_1_ solve converged due to CONVERGED_ATOL iterations 0 > 1 KSP unpreconditioned resid norm 4.860095964831e-06 true resid norm > 4.860095964831e-06 ||r(i)||/||b|| 1.203450763452e-03 > Residual norms for fieldsplit_1_ solve. > 0 KSP preconditioned resid norm 2.965546249872e+08 true resid norm > 1.000000000000e+00 ||r(i)||/||b|| 1.000000000000e+00 > 1 KSP preconditioned resid norm 1.347596594634e+08 true resid norm > 3.599678801575e-01 ||r(i)||/||b|| 3.599678801575e-01 > 2 KSP preconditioned resid norm 5.913230136403e+07 true resid norm > 2.364916760834e-01 ||r(i)||/||b|| 2.364916760834e-01 > 3 KSP preconditioned resid norm 4.629700028930e+07 true resid norm > 1.984444715595e-01 ||r(i)||/||b|| 1.984444715595e-01 > 4 KSP preconditioned resid norm 3.804431276819e+07 true resid norm > 1.747224559120e-01 ||r(i)||/||b|| 1.747224559120e-01 > 5 KSP preconditioned resid norm 3.178769422140e+07 true resid norm > 1.402254864444e-01 ||r(i)||/||b|| 1.402254864444e-01 > 6 KSP preconditioned resid norm 2.648669043919e+07 true resid norm > 1.191164310866e-01 ||r(i)||/||b|| 1.191164310866e-01 > 7 KSP preconditioned resid norm 2.203522108614e+07 true resid norm > 9.690500018007e-02 ||r(i)||/||b|| 9.690500018007e-02 > <...> > 422 KSP preconditioned resid norm 2.984888715147e-02 true resid norm > 8.598401046494e-11 ||r(i)||/||b|| 8.598401046494e-11 > 423 KSP preconditioned resid norm 2.638419658982e-02 true resid norm > 7.229653211635e-11 ||r(i)||/||b|| 7.229653211635e-11 > Linear fieldsplit_1_ solve converged due to CONVERGED_RTOL iterations 423 > 2 KSP unpreconditioned resid norm 3.539889585599e-16 true resid norm > 3.542279617063e-16 ||r(i)||/||b|| 8.771347603759e-14 > Linear solve converged due to CONVERGED_RTOL iterations 2 > > > Does the slow convergence of the Schur block mean that my preconditioning > matrix Sp is a poor choice? > > Thanks, > David > > > On 06/11/2017 08:53 AM, Matthew Knepley wrote: > > On Sat, Jun 10, 2017 at 8:25 PM, David Nolte <dno...@dim.uchile.cl> wrote: > >> Dear all, >> >> I am solving a Stokes problem in 3D aorta geometries, using a P2/P1 >> finite elements discretization on tetrahedral meshes resulting in >> ~1-1.5M DOFs. Viscosity is uniform (can be adjusted arbitrarily), and >> the right hand side is a function of noisy measurement data. >> >> In other settings of "standard" Stokes flow problems I have obtained >> good convergence with an "upper" Schur complement preconditioner, using >> AMG (ML or Hypre) on the velocity block and approximating the Schur >> complement matrix by the diagonal of the pressure mass matrix: >> >> -ksp_converged_reason >> -ksp_monitor_true_residual >> -ksp_initial_guess_nonzero >> -ksp_diagonal_scale >> -ksp_diagonal_scale_fix >> -ksp_type fgmres >> -ksp_rtol 1.0e-8 >> >> -pc_type fieldsplit >> -pc_fieldsplit_type schur >> -pc_fieldsplit_detect_saddle_point >> -pc_fieldsplit_schur_fact_type upper >> -pc_fieldsplit_schur_precondition user # <-- pressure mass matrix >> >> -fieldsplit_0_ksp_type preonly >> -fieldsplit_0_pc_type ml >> >> -fieldsplit_1_ksp_type preonly >> -fieldsplit_1_pc_type jacobi >> > > 1) I always recommend starting from an exact solver and backing off in > small steps for optimization. Thus > I would start with LU on the upper block and GMRES/LU with toelrance > 1e-10 on the Schur block. > This should converge in 1 iterate. > > 2) I don't think you want preonly on the Schur system. You might want > GMRES/Jacobi to invert the mass matrix. > > 3) You probably want to tighten the tolerance on the Schur solve, at least > to start, and then slowly let it out. The > tight tolerance will show you how effective the preconditioner is > using that Schur operator. Then you can start > to evaluate how effective the Schur linear sovler is. > > Does this make sense? > > Thanks, > > Matt > > >> In my present case this setup gives rather slow convergence (varies for >> different geometries between 200-500 or several thousands!). I obtain >> better convergence with "-pc_fieldsplit_schur_precondition selfp"and >> using multigrid on S, with "-fieldsplit_1_pc_type ml" (I don't think >> this is optimal, though). >> >> I don't understand why the pressure mass matrix approach performs so >> poorly and wonder what I could try to improve the convergence. Until now >> I have been using ML and Hypre BoomerAMG mostly with default parameters. >> Surely they can be improved by tuning some parameters. Which could be a >> good starting point? Are there other options I should consider? >> >> With the above setup (jacobi) for a case that works better than others, >> the KSP terminates with >> 467 KSP unpreconditioned resid norm 2.072014323515e-09 true resid norm >> 2.072014322600e-09 ||r(i)||/||b|| 9.939098100674e-09 >> >> You can find the output of -ksp_view below. Let me know if you need more >> details. >> >> Thanks in advance for your advice! >> Best wishes >> David >> >> >> KSP Object: 1 MPI processes >> type: fgmres >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt >> Orthogonalization with no iterative refinement >> GMRES: happy breakdown tolerance 1e-30 >> maximum iterations=10000 >> tolerances: relative=1e-08, absolute=1e-50, divergence=10000. >> right preconditioning >> diagonally scaled system >> using nonzero initial guess >> using UNPRECONDITIONED norm type for convergence test >> PC Object: 1 MPI processes >> type: fieldsplit >> FieldSplit with Schur preconditioner, factorization UPPER >> Preconditioner for the Schur complement formed from user provided >> matrix >> Split info: >> Split number 0 Defined by IS >> Split number 1 Defined by IS >> KSP solver for A00 block >> KSP Object: (fieldsplit_0_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------- >> KSP Object: (fieldsplit_0_mg_coarse_) 1 MPI >> processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_coarse_) 1 MPI >> processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero pivot >> [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_1_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_1_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_2_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_2_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_3_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_3_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: (fieldsplit_0_mg_levels_4_) 1 >> MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_mg_levels_4_) 1 >> MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI >> processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> KSP solver for S = A11 - A10 inv(A00) A01 >> KSP Object: (fieldsplit_1_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_1_) 1 MPI processes >> type: jacobi >> linear system matrix followed by preconditioner matrix: >> Mat Object: (fieldsplit_1_) 1 MPI processes >> type: schurcomplement >> rows=42025, cols=42025 >> Schur complement A11 - A10 inv(A00) A01 >> A11 >> Mat Object: (fieldsplit_1_) 1 >> MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> A10 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=894132 >> total: nonzeros=6850107, allocated nonzeros=6850107 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> KSP of A00 >> KSP Object: (fieldsplit_0_) 1 >> MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: (fieldsplit_0_) 1 >> MPI processes >> type: ml >> MG: type is MULTIPLICATIVE, levels=5 cycles=v >> Cycles per PCApply=1 >> Using Galerkin computed coarse grid matrices >> Coarse grid solver -- level ------------------------------ >> - >> KSP Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: preonly >> maximum iterations=10000, initial guess is zero >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_coarse_) 1 MPI processes >> type: lu >> LU: out-of-place factorization >> tolerance for zero pivot 2.22045e-14 >> using diagonal shift on blocks to prevent zero >> pivot [INBLOCKS] >> matrix ordering: nd >> factor fill ratio given 5., needed 1. >> Factored matrix follows: >> Mat Object: 1 MPI >> processes >> type: seqaij >> rows=3, cols=3 >> package used to perform factorization: petsc >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during >> MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=3, cols=3 >> total: nonzeros=3, allocated nonzeros=3 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Down solver (pre-smoother) on level 1 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_1_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=15, cols=15 >> total: nonzeros=69, allocated nonzeros=69 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 2 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_2_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=304, cols=304 >> total: nonzeros=7354, allocated nonzeros=7354 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 3 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_3_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=30236, cols=30236 >> total: nonzeros=2730644, allocated nonzeros=2730644 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> Down solver (pre-smoother) on level 4 >> ------------------------------- >> KSP Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: richardson >> Richardson: damping factor=1. >> maximum iterations=2 >> tolerances: relative=1e-05, absolute=1e-50, >> divergence=10000. >> left preconditioning >> using nonzero initial guess >> using NONE norm type for convergence test >> PC Object: >> (fieldsplit_0_mg_levels_4_) 1 MPI processes >> type: sor >> SOR: type = local_symmetric, iterations = 1, local >> iterations = 1, omega = 1. >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated >> nonzeros=70684164 >> total number of mallocs used during MatSetValues >> calls =0 >> not using I-node routines >> Up solver (post-smoother) same as down solver >> (pre-smoother) >> linear system matrix = precond matrix: >> Mat Object: >> (fieldsplit_0_) 1 MPI processes >> type: seqaij >> rows=894132, cols=894132 >> total: nonzeros=70684164, allocated nonzeros=70684164 >> total number of mallocs used during MatSetValues calls >> =0 >> not using I-node routines >> A01 >> Mat Object: 1 MPI processes >> type: seqaij >> rows=894132, cols=42025 >> total: nonzeros=6850107, allocated nonzeros=6850107 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> Mat Object: 1 MPI processes >> type: seqaij >> rows=42025, cols=42025 >> total: nonzeros=554063, allocated nonzeros=554063 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> linear system matrix = precond matrix: >> Mat Object: 1 MPI processes >> type: seqaij >> rows=936157, cols=936157 >> total: nonzeros=84938441, allocated nonzeros=84938441 >> total number of mallocs used during MatSetValues calls =0 >> not using I-node routines >> >> >> > > > -- > What most experimenters take for granted before they begin their > experiments is infinitely more interesting than any results to which their > experiments lead. > -- Norbert Wiener > > http://www.caam.rice.edu/~mk51/ > > > -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener http://www.caam.rice.edu/~mk51/