> On Mar 1, 2023, at 10:30 AM, Paul Grosse-Bley > <paul.grosse-b...@ziti.uni-heidelberg.de> wrote: > > Thank you for the detailed answer, Barry. I had hit a deadend on my side. > If you wish to compare, for example, ex45.c with a code that does not > incorporate the Dirichlet boundary nodes in the linear system you can just > use 0 boundary conditions for both codes. > > Do you mean to implement the boundary conditions explicitly in e.g. > hpgmg-cuda instead of using the ghosted cells for them?
I don't know anything about hpgmg-cuda and what it means by "ghosted cells". I am just saying I think it is reasonable to use the style of ex45.c that retains Dirichlet unknowns in the global matrix (with zero on the those points) in PETSc to compare with other codes that may or may not do something different. But you need to use zero for Dirichlet points to ensure that the "funny" convergence rates do not happen at the beginning making the comparison unbalanced between the two codes. > > Do I go right in the assumption that the PCMG coarsening (using DMDAs > geometric information) will cause the boundary condition on the coarser grids > to be finite (>0)? In general even if the Dirichlet points on the fine grid are zero, during PCMG with DMDA as in ex45.c yes those "boundary" values on the coarser grid may end up during the iterative process as non-zero. But this is "harmless", just part of the algorithm but it does mean the convergence with a code that does not include those points will be different (not necessarily better or worse, just different). > > Ideally I would like to just use some kind of GPU-parallel (colored) > SOR/Gauss-Seidel instead of Jacobi. One can relatively easily implement > Red-Black GS using cuSPARSE's masked matrix vector products, but I have not > found any information on implementing a custom preconditioner in PETSc. You can use https://petsc.org/release/docs/manualpages/PC/PCSHELL/ You can look at src/ksp/pc/impls/jacobi.c for detailed comments on what goes into a preconditioner object. If you write such a GPU-parallel (colored) SOR/Gauss-Seidel we would love to include it in PETSc. Note also https://petsc.org/release/docs/manualpages/KSP/KSPCHEBYSHEV/ and potentially other "polynomial preconditioners" are an alternative approach for having more powerful parallel smoothers. > > Best, > Paul Grosse-Bley > > On Wednesday, March 01, 2023 05:38 CET, Barry Smith <bsm...@petsc.dev> wrote: > >> >> > > Ok, here is the situation. The command line options as given do not result > in multigrid quality convergence in any of the runs; the error contraction > factor is around .94 (meaning that for the modes that the multigrid algorithm > does the worst on it only removes about 6 percent of them per iteration). > > But this is hidden by the initial right hand side for the linear system as > written in ex45.c which has O(h) values on the boundary nodes and O(h^3) > values on the interior nodes. The first iterations are largely working on the > boundary residual and making great progress attacking that so that it looks > like the one has a good error contraction factor. One then sees the error > contraction factor start to get worse and worse for the later iterations. > With the 0 on the boundary the iterations quickly get to the bad regime where > the error contraction factor is near one. One can see this by using a > -ksp_rtol 1.e-12 and having the MG code print the residual decrease for each > iteration. Thought it appears the 0 boundary condition one converges much > slower (since it requires many more iterations) if you factor out the huge > advantage of the nonzero boundary condition case at the beginning (in terms > of decreasing the residual) you see they both have an asymptotic error > contraction factor of around .94 (which is horrible for multigrid). > > I now add -mg_levels_ksp_richardson_scale .9 > -mg_coarse_ksp_richardson_scale .9 and rerun the two cases (nonzero and zero > boundary right hand side) they take 35 and 41 iterations (much better) > > initial residual norm 14.6993 > next residual norm 0.84167 0.0572591 > next residual norm 0.0665392 0.00452668 > next residual norm 0.0307273 0.00209039 > next residual norm 0.0158949 0.00108134 > next residual norm 0.00825189 0.000561378 > next residual norm 0.00428474 0.000291492 > next residual norm 0.00222482 0.000151355 > next residual norm 0.00115522 7.85898e-05 > next residual norm 0.000599836 4.0807e-05 > next residual norm 0.000311459 2.11887e-05 > next residual norm 0.000161722 1.1002e-05 > next residual norm 8.39727e-05 5.71269e-06 > next residual norm 4.3602e-05 2.96626e-06 > next residual norm 2.26399e-05 1.5402e-06 > next residual norm 1.17556e-05 7.99735e-07 > next residual norm 6.10397e-06 4.15255e-07 > next residual norm 3.16943e-06 2.15617e-07 > next residual norm 1.64569e-06 1.11957e-07 > next residual norm 8.54511e-07 5.81326e-08 > next residual norm 4.43697e-07 3.01848e-08 > next residual norm 2.30385e-07 1.56732e-08 > next residual norm 1.19625e-07 8.13815e-09 > next residual norm 6.21143e-08 4.22566e-09 > next residual norm 3.22523e-08 2.19413e-09 > next residual norm 1.67467e-08 1.13928e-09 > next residual norm 8.69555e-09 5.91561e-10 > next residual norm 4.51508e-09 3.07162e-10 > next residual norm 2.34441e-09 1.59491e-10 > next residual norm 1.21731e-09 8.28143e-11 > next residual norm 6.32079e-10 4.30005e-11 > next residual norm 3.28201e-10 2.23276e-11 > next residual norm 1.70415e-10 1.15934e-11 > next residual norm 8.84865e-11 6.01976e-12 > next residual norm 4.59457e-11 3.1257e-12 > next residual norm 2.38569e-11 1.62299e-12 > next residual norm 1.23875e-11 8.42724e-13 > Linear solve converged due to CONVERGED_RTOL iterations 35 > Residual norm 1.23875e-11 > > initial residual norm 172.601 > next residual norm 154.803 0.896887 > next residual norm 66.9409 0.387837 > next residual norm 34.4572 0.199636 > next residual norm 17.8836 0.103612 > next residual norm 9.28582 0.0537995 > next residual norm 4.82161 0.027935 > next residual norm 2.50358 0.014505 > next residual norm 1.29996 0.0075316 > next residual norm 0.674992 0.00391071 > next residual norm 0.350483 0.0020306 > next residual norm 0.181985 0.00105437 > next residual norm 0.094494 0.000547472 > next residual norm 0.0490651 0.000284269 > next residual norm 0.0254766 0.000147604 > next residual norm 0.0132285 7.6642e-05 > next residual norm 0.00686876 3.97956e-05 > next residual norm 0.00356654 2.06635e-05 > next residual norm 0.00185189 1.07293e-05 > next residual norm 0.000961576 5.5711e-06 > next residual norm 0.000499289 2.89274e-06 > next residual norm 0.000259251 1.50203e-06 > next residual norm 0.000134614 7.79914e-07 > next residual norm 6.98969e-05 4.04963e-07 > next residual norm 3.62933e-05 2.10273e-07 > next residual norm 1.88449e-05 1.09182e-07 > next residual norm 9.78505e-06 5.66919e-08 > next residual norm 5.0808e-06 2.94367e-08 > next residual norm 2.63815e-06 1.52847e-08 > next residual norm 1.36984e-06 7.93645e-09 > next residual norm 7.11275e-07 4.12093e-09 > next residual norm 3.69322e-07 2.13975e-09 > next residual norm 1.91767e-07 1.11105e-09 > next residual norm 9.95733e-08 5.769e-10 > next residual norm 5.17024e-08 2.99549e-10 > next residual norm 2.6846e-08 1.55538e-10 > next residual norm 1.39395e-08 8.07615e-11 > next residual norm 7.23798e-09 4.19348e-11 > next residual norm 3.75824e-09 2.17742e-11 > next residual norm 1.95138e-09 1.13058e-11 > next residual norm 1.01327e-09 5.87059e-12 > next residual norm 5.26184e-10 3.04856e-12 > next residual norm 2.73182e-10 1.58274e-12 > next residual norm 1.41806e-10 8.21586e-13 > Linear solve converged due to CONVERGED_RTOL iterations 42 > Residual norm 1.41806e-10 > > Notice in the first run the residual norm still dives much more quickly for > the first 2 iterations than the second run. This is because the first run has > "lucky error" that gets wiped out easily from the big boundary term. After > that you can see that the convergence for both is very similar with both > having a reasonable error contraction factor of .51 > > I' ve attached the modified src/ksp/pc/impls/mg/mg.c that prints the > residuals along the way.