It looks like you have 531441 equations and 2K processors. This is pretty small (only about 260 unknowns per process).
You have ~23 non-zeros per row. What kind of discretization is this?

On Oct 9, 2013, at 9:34 AM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:

> Mark and Barry,
> You will find attached the log for BoomerAMG (better, but still slow imho), ML (still lost), GAMG (better, I took Jed's advice and recompiled petsc-maint but forgot to relink my app, so please discard the time spent in MatView again) and GASM (best, really? For a Poisson equation?).
>
> I'll try bigger matrices (that is likely the real problem now, at least for GAMG), but if you still see something fishy that I might need to adjust in the parameters, please tell me.
>
> Also, the first results I got for elasticity (before going back to plain scalar diffusion) were at least as bad. Do you have any tips for such problems besides setting the correct BlockSize and MatNearNullSpace and using parameters similar to the ones you just gave me, or the ones that can be found here: http://lists.mcs.anl.gov/pipermail/petsc-users/2012-April/012790.html ?
>
> Thanks for your help,
> Pierre
>
>> On Oct 8, 2013, at 8:18 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>> MatView 6 1.0 3.4042e+01 269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18 0 0 0 0 25 0 0 0 0 0
>>>
>>> Something is seriously wrong with the default matview (or pcview) for PC GAMG? It is printing way too much for the default view and thus totally hosing the timings. The default PCView() is supposed to be very lightweight (not do excessive communication) and provide very high-level information.
>>
>> Oh, I think the problem is that GAMG sets the coarse grid solver explicitly as a block Jacobi with LU local. GAMG ensures that all equations are on one PE for the coarsest grid. ML uses redundant. You should be able to use redundant in GAMG, it is just not the default. This is not tested. So I'm guessing the problem is that block Jacobi is noisy.
>>
>>> Barry
>>>
>>> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfad...@lbl.gov> wrote:
>>>
>>>> Something is going terribly wrong with the setup in hypre and ML. hypre's default parameters are not set up well for 3D. I use:
>>>>
>>>> -pc_hypre_boomeramg_no_CF
>>>> -pc_hypre_boomeramg_agg_nl 1
>>>> -pc_hypre_boomeramg_coarsen_type HMIS
>>>> -pc_hypre_boomeramg_interp_type ext+i
>>>>
>>>> I'm not sure what is going wrong with ML's setup.
>>>>
>>>> GAMG is converging terribly. Is this just a simple 7-point Laplacian? It looks like the eigenvalue estimate is low on the finest grid, which messes up the smoother. Try running with these parameters and send the output:
>>>>
>>>> -pc_gamg_agg_nsmooths 1
>>>> -pc_gamg_verbose 2
>>>> -mg_levels_ksp_type richardson
>>>> -mg_levels_pc_type sor
>>>>
>>>> Mark
>>>>
>>>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:
>>>>
>>>>> Please find the log for BoomerAMG, ML and GAMG attached. The setup for GAMG doesn't look so bad compared to the other packages, so I'm wondering what is going on with those?
>>>>>
>>>>>> We need the output from running with -log_summary -pc_mg_log
>>>>>>
>>>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type gamg). This will give the most useful information about where it is spending the time.
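For completeness, options like -pc_type gamg are only picked up if the application calls KSPSetFromOptions(), and the -log_summary table is printed from PetscFinalize(). An untested sketch of such a driver (A, b and x are assumed to be an already assembled Mat and Vecs, and the two-argument KSPSetOperators() of PETSc 3.5+ is used):

#include <petscksp.h>

/* Untested sketch: a solve driven entirely by run-time options.
   Assumes the program calls PetscInitialize()/PetscFinalize(),
   which is what makes -log_summary print its timing table.       */
PetscErrorCode SolveWithRuntimeOptions(Mat A, Vec b, Vec x)
{
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp,A,A);CHKERRQ(ierr); /* same Mat as operator and preconditioner */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* honors -ksp_type, -pc_type gamg and the PC's own options */
  ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  return 0;
}

Running such a driver with -ksp_type gmres -pc_type gamg -log_summary -pc_mg_log then gives the timing breakdown discussed above.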
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:
>>>>>>
>>>>>>> Dear all,
>>>>>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D. I thought that MG was the way to go, but looking at my log, the performance looks abysmal (I know that the matrices are way too small, but if I go bigger, it just never performs a single iteration...). Even though this is neither the BoomerAMG nor the ML mailing list, could you please tell me if PETSc sets some default flags that make the setup for those solvers so slow for this simple problem? The performance of (G)ASM is in comparison much better.
>>>>>>>
>>>>>>> Thanks in advance for your help.
>>>>>>>
>>>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>>>>>>
>>>>>>> 0 KSP Residual norm 1.599647112604e+00
>>>>>>> 1 KSP Residual norm 5.450838232404e-02
>>>>>>> 2 KSP Residual norm 3.549673478318e-03
>>>>>>> 3 KSP Residual norm 2.901826808841e-04
>>>>>>> 4 KSP Residual norm 2.574235778729e-05
>>>>>>> 5 KSP Residual norm 2.253410171682e-06
>>>>>>> 6 KSP Residual norm 1.871067784877e-07
>>>>>>> 7 KSP Residual norm 1.681162800670e-08
>>>>>>> 8 KSP Residual norm 2.120841512414e-09
>>>>>>> KSP Object: 2048 MPI processes
>>>>>>>   type: gmres
>>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>>   maximum iterations=200, initial guess is zero
>>>>>>>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>>>>>>>   left preconditioning
>>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>>> PC Object: 2048 MPI processes
>>>>>>>   type: hypre
>>>>>>>     HYPRE BoomerAMG preconditioning
>>>>>>>     HYPRE BoomerAMG: Cycle type V
>>>>>>>     HYPRE BoomerAMG: Maximum number of levels 25
>>>>>>>     HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>>>>     HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>>>>     HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>>>>     HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>>>>     HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>>>>     HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>>>>     HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>>>>     HYPRE BoomerAMG: Maximum row sums 0.9
>>>>>>>     HYPRE BoomerAMG: Sweeps down 1
>>>>>>>     HYPRE BoomerAMG: Sweeps up 1
>>>>>>>     HYPRE BoomerAMG: Sweeps on coarse 1
>>>>>>>     HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>>>>     HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>>>>     HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>>>>     HYPRE BoomerAMG: Relax weight (all) 1
>>>>>>>     HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>>>>     HYPRE BoomerAMG: Using CF-relaxation
>>>>>>>     HYPRE BoomerAMG: Measure type local
>>>>>>>     HYPRE BoomerAMG: Coarsen type Falgout
>>>>>>>     HYPRE BoomerAMG: Interpolation type classical
>>>>>>>   linear system matrix = precond matrix:
>>>>>>>   Matrix Object: 2048 MPI processes
>>>>>>>     type: mpiaij
>>>>>>>     rows=4173281, cols=4173281
>>>>>>>     total: nonzeros=102576661, allocated nonzeros=102576661
>>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>>       not using I-node (on process 0) routines
>>>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
>>>>>>>
>>>>>>> 0 KSP Residual norm 2.368804472986e-01
>>>>>>> 1 KSP Residual norm 5.676430019132e-02
>>>>>>> 2 KSP Residual norm 1.898005876002e-02
>>>>>>> 3 KSP Residual norm 6.193922902926e-03
>>>>>>> 4 KSP Residual norm 2.008448794493e-03
>>>>>>> 5 KSP Residual norm 6.390465670228e-04
>>>>>>> 6 KSP Residual norm 2.157709394389e-04
>>>>>>> 7 KSP Residual norm 7.295973819979e-05
>>>>>>> 8 KSP Residual norm 2.358343271482e-05
>>>>>>> 9 KSP Residual norm 7.489696222066e-06
>>>>>>> 10 KSP Residual norm 2.390946857593e-06
>>>>>>> 11 KSP Residual norm 8.068086385140e-07
>>>>>>> 12 KSP Residual norm 2.706607789749e-07
>>>>>>> 13 KSP Residual norm 8.636910863376e-08
>>>>>>> 14 KSP Residual norm 2.761981175852e-08
>>>>>>> 15 KSP Residual norm 8.755459874369e-09
>>>>>>> 16 KSP Residual norm 2.708848598341e-09
>>>>>>> 17 KSP Residual norm 8.968748876265e-10
>>>>>>> KSP Object: 2048 MPI processes
>>>>>>>   type: gmres
>>>>>>>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>>>>     GMRES: happy breakdown tolerance 1e-30
>>>>>>>   maximum iterations=200, initial guess is zero
>>>>>>>   tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>>>>>>>   left preconditioning
>>>>>>>   using PRECONDITIONED norm type for convergence test
>>>>>>> PC Object: 2048 MPI processes
>>>>>>>   type: ml
>>>>>>>     MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>>>>>       Cycles per PCApply=1
>>>>>>>       Using Galerkin computed coarse grid matrices
>>>>>>>   Coarse grid solver -- level -------------------------------
>>>>>>>     KSP Object: (mg_coarse_) 2048 MPI processes
>>>>>>>       type: preonly
>>>>>>>       maximum iterations=1, initial guess is zero
>>>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>       left preconditioning
>>>>>>>       using NONE norm type for convergence test
>>>>>>>     PC Object: (mg_coarse_) 2048 MPI processes
>>>>>>>       type: redundant
>>>>>>>         Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>>>>>       KSP Object: (mg_coarse_redundant_) 1 MPI processes
>>>>>>>         type: preonly
>>>>>>>         maximum iterations=10000, initial guess is zero
>>>>>>>         tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>         left preconditioning
>>>>>>>         using NONE norm type for convergence test
>>>>>>>       PC Object: (mg_coarse_redundant_) 1 MPI processes
>>>>>>>         type: lu
>>>>>>>           LU: out-of-place factorization
>>>>>>>           tolerance for zero pivot 2.22045e-14
>>>>>>>           using diagonal shift on blocks to prevent zero pivot
>>>>>>>           matrix ordering: nd
>>>>>>>           factor fill ratio given 5, needed 4.38504
>>>>>>>             Factored matrix follows:
>>>>>>>               Matrix Object: 1 MPI processes
>>>>>>>                 type: seqaij
>>>>>>>                 rows=2055, cols=2055
>>>>>>>                 package used to perform factorization: petsc
>>>>>>>                 total: nonzeros=2476747, allocated nonzeros=2476747
>>>>>>>                 total number of mallocs used during MatSetValues calls =0
>>>>>>>                   using I-node routines: found 1638 nodes, limit used is 5
>>>>>>>         linear system matrix = precond matrix:
>>>>>>>         Matrix Object: 1 MPI processes
>>>>>>>           type: seqaij
>>>>>>>           rows=2055, cols=2055
>>>>>>>           total: nonzeros=564817, allocated nonzeros=1093260
>>>>>>>           total number of mallocs used during MatSetValues calls =0
>>>>>>>             not using I-node routines
>>>>>>>       linear system matrix = precond matrix:
>>>>>>>       Matrix Object: 2048 MPI processes
>>>>>>>         type: mpiaij
>>>>>>>         rows=2055, cols=2055
>>>>>>>         total: nonzeros=564817, allocated nonzeros=564817
>>>>>>>         total number of mallocs used during MatSetValues calls =0
>>>>>>>           not using I-node (on process 0) routines
>>>>>>>   Down solver (pre-smoother) on level 1 -------------------------------
>>>>>>>     KSP Object: (mg_levels_1_) 2048 MPI processes
>>>>>>>       type: richardson
>>>>>>>         Richardson: damping factor=1
>>>>>>>       maximum iterations=2
>>>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>       left preconditioning
>>>>>>>       using nonzero initial guess
>>>>>>>       using NONE norm type for convergence test
>>>>>>>     PC Object: (mg_levels_1_) 2048 MPI processes
>>>>>>>       type: sor
>>>>>>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>>>>       linear system matrix = precond matrix:
>>>>>>>       Matrix Object: 2048 MPI processes
>>>>>>>         type: mpiaij
>>>>>>>         rows=30194, cols=30194
>>>>>>>         total: nonzeros=3368414, allocated nonzeros=3368414
>>>>>>>         total number of mallocs used during MatSetValues calls =0
>>>>>>>           not using I-node (on process 0) routines
>>>>>>>   Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>   Down solver (pre-smoother) on level 2 -------------------------------
>>>>>>>     KSP Object: (mg_levels_2_) 2048 MPI processes
>>>>>>>       type: richardson
>>>>>>>         Richardson: damping factor=1
>>>>>>>       maximum iterations=2
>>>>>>>       tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>>>>       left preconditioning
>>>>>>>       using nonzero initial guess
>>>>>>>       using NONE norm type for convergence test
>>>>>>>     PC Object: (mg_levels_2_) 2048 MPI processes
>>>>>>>       type: sor
>>>>>>>         SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>>>>       linear system matrix = precond matrix:
>>>>>>>       Matrix Object: 2048 MPI processes
>>>>>>>         type: mpiaij
>>>>>>>         rows=531441, cols=531441
>>>>>>>         total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>>>         total number of mallocs used during MatSetValues calls =0
>>>>>>>           not using I-node (on process 0) routines
>>>>>>>   Up solver (post-smoother) same as down solver (pre-smoother)
>>>>>>>   linear system matrix = precond matrix:
>>>>>>>   Matrix Object: 2048 MPI processes
>>>>>>>     type: mpiaij
>>>>>>>     rows=531441, cols=531441
>>>>>>>     total: nonzeros=12476324, allocated nonzeros=12476324
>>>>>>>     total number of mallocs used during MatSetValues calls =0
>>>>>>>       not using I-node (on process 0) routines
>>>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
>>>>>>>
>>>>> <log-GAMG><log-ML><log-BoomerAMG>
>
> <log-ML><log-GASM><log-GAMG><log-BoomerAMG>
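Regarding the elasticity tips asked about above: the usual first step is exactly what was mentioned, give the matrix its block size and a rigid-body near-null space so the aggregation-based preconditioners can use it. A rough, untested sketch, assuming 3D elasticity with 3 displacement dofs per node and a Vec "coords" holding the interlaced nodal coordinates with block size 3 (the names and setup here are assumptions, not taken from this thread):

#include <petscmat.h>

/* Untested sketch: attach block size and rigid-body modes to the
   elasticity operator.  "coords" must have block size 3 and contain
   (x0,y0,z0, x1,y1,z1, ...) for the locally owned nodes.            */
PetscErrorCode AttachElasticityNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  ierr = MatSetBlockSize(A,3);CHKERRQ(ierr);                          /* 3 dofs per node; call before preallocation/assembly */
  ierr = MatNullSpaceCreateRigidBody(coords,&nearnull);CHKERRQ(ierr); /* 6 rigid-body modes in 3D */
  ierr = MatSetNearNullSpace(A,nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);                /* the Mat keeps its own reference */
  return 0;
}

GAMG uses these near-null-space vectors when it builds its aggregates and prolongation, which for elasticity usually matters far more than tuning the smoother options.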