On Oct 8, 2013, at 8:18 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

> 
> MatView 6 1.0 3.4042e+01269.1 0.00e+00 0.0 0.0e+00 0.0e+00 4.0e+00 18 0 0 0 0 25 0 0 0 0 0 
> 
> Something is seriously wrong with the default matview (or pcview) for PC GAMG?
GAMG does not set a pcview. I never touch view functions as far as I can tell.

> It is printing way too much for the default view and thus totally hosing the timings.

It might be catching load imbalance from someplace upstream (look at the ratio).

> The default PCView() is supposed to be very lightweight (not do excessive communication) and provide very high-level information.
> 
> 
> Barry
> 
> 
> On Oct 8, 2013, at 6:50 PM, "Mark F. Adams" <mfad...@lbl.gov> wrote:
> 
>> Something is going terribly wrong with the setup in hypre and ML. hypre's default parameters are not set up well for 3D. I use:
>> 
>> -pc_hypre_boomeramg_no_CF
>> -pc_hypre_boomeramg_agg_nl 1
>> -pc_hypre_boomeramg_coarsen_type HMIS
>> -pc_hypre_boomeramg_interp_type ext+i
>> 
>> I'm not sure what is going wrong with ML's setup.
>> 
>> GAMG is converging terribly. Is this just a simple 7-point Laplacian? It looks like the eigenvalue estimate is low on the finest grid, which messes up the smoother. Try running with these parameters and send the output:
>> 
>> -pc_gamg_agg_nsmooths 1
>> -pc_gamg_verbose 2
>> -mg_levels_ksp_type richardson
>> -mg_levels_pc_type sor
>> 
>> Mark
>> 
>> On Oct 8, 2013, at 5:46 PM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:
>> 
>>> Please find the logs for BoomerAMG, ML and GAMG attached. The setup for GAMG doesn't look so bad compared to the other packages, so I'm wondering what is going on with those?
>>> 
>>>> 
>>>> We need the output from running with -log_summary -pc_mg_log
>>>> 
>>>> Also you can run with PETSc's AMG called GAMG (run with -pc_type gamg). This will give the most useful information about where it is spending the time.
>>>> 
>>>> 
>>>> Barry
>>>> 
>>>> 
>>>> On Oct 8, 2013, at 4:11 PM, Pierre Jolivet <joli...@ann.jussieu.fr> wrote:
>>>> 
>>>>> Dear all,
>>>>> I'm trying to compare linear solvers for a simple Poisson equation in 3D. I thought that MG was the way to go, but looking at my log, the performance looks abysmal (I know that the matrices are way too small, but if I go bigger, it just never performs a single iteration...). Even though this is neither the BoomerAMG nor the ML mailing list, could you please tell me whether PETSc sets some default flags that make the setup for those solvers so slow for this simple problem? The performance of (G)ASM is in comparison much better.
>>>>> 
>>>>> Thanks in advance for your help.
>>>>> 
>>>>> PS: first the BoomerAMG log, then ML (much more verbose, sorry).
>>>>> 
>>>>> 0 KSP Residual norm 1.599647112604e+00
>>>>> 1 KSP Residual norm 5.450838232404e-02
>>>>> 2 KSP Residual norm 3.549673478318e-03
>>>>> 3 KSP Residual norm 2.901826808841e-04
>>>>> 4 KSP Residual norm 2.574235778729e-05
>>>>> 5 KSP Residual norm 2.253410171682e-06
>>>>> 6 KSP Residual norm 1.871067784877e-07
>>>>> 7 KSP Residual norm 1.681162800670e-08
>>>>> 8 KSP Residual norm 2.120841512414e-09
>>>>> KSP Object: 2048 MPI processes
>>>>> type: gmres
>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>> maximum iterations=200, initial guess is zero
>>>>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using PRECONDITIONED norm type for convergence test
>>>>> PC Object: 2048 MPI processes
>>>>> type: hypre
>>>>> HYPRE BoomerAMG preconditioning
>>>>> HYPRE BoomerAMG: Cycle type V
>>>>> HYPRE BoomerAMG: Maximum number of levels 25
>>>>> HYPRE BoomerAMG: Maximum number of iterations PER hypre call 1
>>>>> HYPRE BoomerAMG: Convergence tolerance PER hypre call 0
>>>>> HYPRE BoomerAMG: Threshold for strong coupling 0.25
>>>>> HYPRE BoomerAMG: Interpolation truncation factor 0
>>>>> HYPRE BoomerAMG: Interpolation: max elements per row 0
>>>>> HYPRE BoomerAMG: Number of levels of aggressive coarsening 0
>>>>> HYPRE BoomerAMG: Number of paths for aggressive coarsening 1
>>>>> HYPRE BoomerAMG: Maximum row sums 0.9
>>>>> HYPRE BoomerAMG: Sweeps down 1
>>>>> HYPRE BoomerAMG: Sweeps up 1
>>>>> HYPRE BoomerAMG: Sweeps on coarse 1
>>>>> HYPRE BoomerAMG: Relax down symmetric-SOR/Jacobi
>>>>> HYPRE BoomerAMG: Relax up symmetric-SOR/Jacobi
>>>>> HYPRE BoomerAMG: Relax on coarse Gaussian-elimination
>>>>> HYPRE BoomerAMG: Relax weight (all) 1
>>>>> HYPRE BoomerAMG: Outer relax weight (all) 1
>>>>> HYPRE BoomerAMG: Using CF-relaxation
>>>>> HYPRE BoomerAMG: Measure type local
>>>>> HYPRE BoomerAMG: Coarsen type Falgout
>>>>> HYPRE BoomerAMG: Interpolation type classical
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=4173281, cols=4173281
>>>>> total: nonzeros=102576661, allocated nonzeros=102576661
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node (on process 0) routines
>>>>> --- system solved with PETSc (in 1.005199e+02 seconds)
>>>>> 
>>>>> 0 KSP Residual norm 2.368804472986e-01
>>>>> 1 KSP Residual norm 5.676430019132e-02
>>>>> 2 KSP Residual norm 1.898005876002e-02
>>>>> 3 KSP Residual norm 6.193922902926e-03
>>>>> 4 KSP Residual norm 2.008448794493e-03
>>>>> 5 KSP Residual norm 6.390465670228e-04
>>>>> 6 KSP Residual norm 2.157709394389e-04
>>>>> 7 KSP Residual norm 7.295973819979e-05
>>>>> 8 KSP Residual norm 2.358343271482e-05
>>>>> 9 KSP Residual norm 7.489696222066e-06
>>>>> 10 KSP Residual norm 2.390946857593e-06
>>>>> 11 KSP Residual norm 8.068086385140e-07
>>>>> 12 KSP Residual norm 2.706607789749e-07
>>>>> 13 KSP Residual norm 8.636910863376e-08
>>>>> 14 KSP Residual norm 2.761981175852e-08
>>>>> 15 KSP Residual norm 8.755459874369e-09
>>>>> 16 KSP Residual norm 2.708848598341e-09
>>>>> 17 KSP Residual norm 8.968748876265e-10
>>>>> KSP Object: 2048 MPI processes
>>>>> type: gmres
>>>>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
>>>>> GMRES: happy breakdown tolerance 1e-30
>>>>> maximum iterations=200, initial guess is zero
>>>>> tolerances: relative=1e-08, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using PRECONDITIONED norm type for convergence test
>>>>> PC Object: 2048 MPI processes
>>>>> type: ml
>>>>> MG: type is MULTIPLICATIVE, levels=3 cycles=v
>>>>> Cycles per PCApply=1
>>>>> Using Galerkin computed coarse grid matrices
>>>>> Coarse grid solver -- level -------------------------------
>>>>> KSP Object: (mg_coarse_) 2048 MPI processes
>>>>> type: preonly
>>>>> maximum iterations=1, initial guess is zero
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_coarse_) 2048 MPI processes
>>>>> type: redundant
>>>>> Redundant preconditioner: First (color=0) of 2048 PCs follows
>>>>> KSP Object: (mg_coarse_redundant_) 1 MPI processes
>>>>> type: preonly
>>>>> maximum iterations=10000, initial guess is zero
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_coarse_redundant_) 1 MPI processes
>>>>> type: lu
>>>>> LU: out-of-place factorization
>>>>> tolerance for zero pivot 2.22045e-14
>>>>> using diagonal shift on blocks to prevent zero pivot
>>>>> matrix ordering: nd
>>>>> factor fill ratio given 5, needed 4.38504
>>>>> Factored matrix follows:
>>>>> Matrix Object: 1 MPI processes
>>>>> type: seqaij
>>>>> rows=2055, cols=2055
>>>>> package used to perform factorization: petsc
>>>>> total: nonzeros=2476747, allocated nonzeros=2476747
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> using I-node routines: found 1638 nodes, limit used is 5
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 1 MPI processes
>>>>> type: seqaij
>>>>> rows=2055, cols=2055
>>>>> total: nonzeros=564817, allocated nonzeros=1093260
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node routines
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=2055, cols=2055
>>>>> total: nonzeros=564817, allocated nonzeros=564817
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node (on process 0) routines
>>>>> Down solver (pre-smoother) on level 1 -------------------------------
>>>>> KSP Object: (mg_levels_1_) 2048 MPI processes
>>>>> type: richardson
>>>>> Richardson: damping factor=1
>>>>> maximum iterations=2
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using nonzero initial guess
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_levels_1_) 2048 MPI processes
>>>>> type: sor
>>>>> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=30194, cols=30194
>>>>> total: nonzeros=3368414, allocated nonzeros=3368414
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node (on process 0) routines
>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>> Down solver (pre-smoother) on level 2 -------------------------------
>>>>> KSP Object: (mg_levels_2_) 2048 MPI processes
>>>>> type: richardson
>>>>> Richardson: damping factor=1
>>>>> maximum iterations=2
>>>>> tolerances: relative=1e-05, absolute=1e-50, divergence=10000
>>>>> left preconditioning
>>>>> using nonzero initial guess
>>>>> using NONE norm type for convergence test
>>>>> PC Object: (mg_levels_2_) 2048 MPI processes
>>>>> type: sor
>>>>> SOR: type = local_symmetric, iterations = 1, local iterations = 1, omega = 1
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=531441, cols=531441
>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node (on process 0) routines
>>>>> Up solver (post-smoother) same as down solver (pre-smoother)
>>>>> linear system matrix = precond matrix:
>>>>> Matrix Object: 2048 MPI processes
>>>>> type: mpiaij
>>>>> rows=531441, cols=531441
>>>>> total: nonzeros=12476324, allocated nonzeros=12476324
>>>>> total number of mallocs used during MatSetValues calls =0
>>>>> not using I-node (on process 0) routines
>>>>> --- system solved with PETSc (in 2.407844e+02 seconds)
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> <log-GAMG><log-ML><log-BoomerAMG>
>> 
> 
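
For anyone reproducing this comparison, below is a minimal, hypothetical driver in the spirit of the problem described in the thread (it is not Pierre's actual code): it assembles a 7-point Laplacian for a Poisson problem on an n x n x n grid and leaves the Krylov method, preconditioner, and every AMG flag to the options database through KSPSetFromOptions, so -pc_type hypre, -pc_type ml, and -pc_type gamg, together with the BoomerAMG and GAMG options quoted above, can be compared at run time. The grid size, right-hand side, and stencil scaling are placeholders.

/* Hypothetical driver: 7-point Laplacian on an n x n x n grid with homogeneous
 * Dirichlet boundaries, solved by a KSP configured entirely from the options
 * database so different preconditioners can be compared without recompiling. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  const PetscInt n = 64;                 /* grid points per direction (illustrative) */
  PetscInt       N = n * n * n, Istart, Iend, row;
  Mat            A;
  Vec            x, b;
  KSP            ksp;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL);if (ierr) return ierr;

  /* Distributed AIJ matrix with at most 7 nonzeros per row */
  ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
  ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);CHKERRQ(ierr);
  ierr = MatSetFromOptions(A);CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 7, NULL, 6, NULL);CHKERRQ(ierr);
  ierr = MatSeqAIJSetPreallocation(A, 7, NULL);CHKERRQ(ierr);
  ierr = MatGetOwnershipRange(A, &Istart, &Iend);CHKERRQ(ierr);

  for (row = Istart; row < Iend; ++row) {          /* standard 7-point stencil */
    PetscInt i = row % n, j = (row / n) % n, k = row / (n * n);
    ierr = MatSetValue(A, row, row, 6.0, INSERT_VALUES);CHKERRQ(ierr);
    if (i > 0)     {ierr = MatSetValue(A, row, row - 1,     -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (i < n - 1) {ierr = MatSetValue(A, row, row + 1,     -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (j > 0)     {ierr = MatSetValue(A, row, row - n,     -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (j < n - 1) {ierr = MatSetValue(A, row, row + n,     -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (k > 0)     {ierr = MatSetValue(A, row, row - n * n, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
    if (k < n - 1) {ierr = MatSetValue(A, row, row + n * n, -1.0, INSERT_VALUES);CHKERRQ(ierr);}
  }
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

  /* Constant right-hand side, zero initial guess */
  ierr = VecCreate(PETSC_COMM_WORLD, &b);CHKERRQ(ierr);
  ierr = VecSetSizes(b, PETSC_DECIDE, N);CHKERRQ(ierr);
  ierr = VecSetFromOptions(b);CHKERRQ(ierr);
  ierr = VecDuplicate(b, &x);CHKERRQ(ierr);
  ierr = VecSet(b, 1.0);CHKERRQ(ierr);

  /* Solver type, preconditioner, and all AMG flags come from the command line */
  ierr = KSPCreate(PETSC_COMM_WORLD, &ksp);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);   /* 3.4-era PETSc takes an extra MatStructure argument */
  ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);

  ierr = KSPDestroy(&ksp);CHKERRQ(ierr);
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  ierr = VecDestroy(&b);CHKERRQ(ierr);
  ierr = MatDestroy(&A);CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}

Running such a driver with -ksp_monitor -log_summary (plus -pc_mg_log for the multigrid cases) and Mark's suggested flags, e.g. -pc_type gamg -pc_gamg_agg_nsmooths 1 -pc_gamg_verbose 2 -mg_levels_ksp_type richardson -mg_levels_pc_type sor, produces both the convergence history Mark asks about and the per-event timings Barry wants; swapping in -pc_type hypre or -pc_type ml gives the other two configurations of the comparison.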
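
If command-line flags are inconvenient (for example when the solve is buried inside a larger application), the same options can be injected programmatically before KSPSetFromOptions is called. A small sketch under the assumption of a recent PETSc, where PetscOptionsSetValue takes an options-database argument (3.4-era releases, current at the time of this thread, use the two-argument form without it); the helper name is made up for illustration:

#include <petscksp.h>

/* Hypothetical helper: preload Mark's BoomerAMG settings for 3-D problems into
 * the options database so that a later KSPSetFromOptions() picks them up.
 * NULL selects the default (global) options database.                        */
static PetscErrorCode SetBoomerAMG3DOptions(void)
{
  PetscErrorCode ierr;
  ierr = PetscOptionsSetValue(NULL, "-pc_type", "hypre");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_type", "boomeramg");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_no_CF", NULL);CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_agg_nl", "1");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_coarsen_type", "HMIS");CHKERRQ(ierr);
  ierr = PetscOptionsSetValue(NULL, "-pc_hypre_boomeramg_interp_type", "ext+i");CHKERRQ(ierr);
  return 0;
}

Calling this helper between KSPCreate and KSPSetFromOptions in the driver above has the same effect as passing Mark's BoomerAMG flags on the command line.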