On Mon, Aug 14, 2023 at 11:03 AM Stephan Kramer <s.kra...@imperial.ac.uk> wrote:
> Many thanks for looking into this, Mark
> > My 3D tests were not that different and I see you lowered the threshold.
> > Note, you can set the threshold to zero, but your test is running so much
> > differently than mine there is something else going on.
> > Note, the new, bad, coarsening rate of 30:1 is what we tend to shoot for
> > in 3D.
> >
> > So it is not clear what the problem is. Some questions:
> >
> > * do you have a picture of this mesh to show me?
>
> It's just a standard hexahedral cubed sphere mesh with the refinement
> level giving the number of times each of the six sides has been
> subdivided: so Level_5 means 2^5 x 2^5 squares, which is extruded to 16
> layers. So the total number of elements at Level_5 is 6 x 32 x 32 x 16 =
> 98304 hexes. And everything doubles in all 3 dimensions (so 2^3) going
> to the next Level
>

I see, and I assume these are pretty stretched elements.

> > * what do you mean by Q1-Q2 elements?
>
> Q2-Q1, basically Taylor-Hood on hexes, so (tri)quadratic for velocity
> and (tri)linear for pressure
>
> I guess you could argue we could/should just do good old geometric
> multigrid instead. More generally we do use this solver configuration a
> lot for tetrahedral Taylor-Hood (P2-P1), in particular also for our
> adaptive mesh runs - would it be worth checking whether we have the same
> performance issues with tetrahedral P2-P1?
>

No, you have a clear reproducer, if not minimal.
The first coarsening is very different.

I am working on this and I see that I added a heuristic for thin bodies
where you order the vertices in greedy algorithms with minimum degree
first. This will tend to pick corners first, then edges, then faces, etc.
That may be the problem. I would like to understand it better (see below).

>
> >
> > It would be nice to see if the new and old codes are similar without
> > aggressive coarsening.
> > This was the intended change of the major change in this time frame as you
> > noticed.
> > If these jobs are easy to run, could you check that the old and new
> > versions are similar with "-pc_gamg_square_graph 0" (and you only need
> > one time step).
> > All you need to do is check that the first coarse grid has about the same
> > number of equations (large).
>
> Unfortunately we're seeing some memory errors when we use this option,
> and I'm not entirely clear whether we're just running out of memory and
> need to put it on a special queue.
>
> The run with square_graph 0 using new PETSc managed to get through one
> solve at level 5, and is giving the following mg levels:
>
> rows=174, cols=174, bs=6
>   total: nonzeros=30276, allocated nonzeros=30276
> --
> rows=2106, cols=2106, bs=6
>   total: nonzeros=4238532, allocated nonzeros=4238532
> --
> rows=21828, cols=21828, bs=6
>   total: nonzeros=62588232, allocated nonzeros=62588232
> --
> rows=589824, cols=589824, bs=6
>   total: nonzeros=1082528928, allocated nonzeros=1082528928
> --
> rows=2433222, cols=2433222, bs=3
>   total: nonzeros=456526098, allocated nonzeros=456526098
>
> comparing with square_graph 100 with new PETSc
>
> rows=96, cols=96, bs=6
>   total: nonzeros=9216, allocated nonzeros=9216
> --
> rows=1440, cols=1440, bs=6
>   total: nonzeros=647856, allocated nonzeros=647856
> --
> rows=97242, cols=97242, bs=6
>   total: nonzeros=65656836, allocated nonzeros=65656836
> --
> rows=2433222, cols=2433222, bs=3
>   total: nonzeros=456526098, allocated nonzeros=456526098
>
> and old PETSc with square_graph 100
>
> rows=90, cols=90, bs=6
>   total: nonzeros=8100, allocated nonzeros=8100
> --
> rows=1872, cols=1872, bs=6
>   total: nonzeros=1234080, allocated nonzeros=1234080
> --
> rows=47652, cols=47652, bs=6
>   total: nonzeros=23343264, allocated nonzeros=23343264
> --
> rows=2433222, cols=2433222, bs=3
>   total: nonzeros=456526098, allocated nonzeros=456526098
> --
>
> Unfortunately old PETSc with square_graph 0 did not complete a single
> solve before giving the memory error
>

OK, thanks for trying.
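
For comparing the three hierarchies quoted above, the per-level reduction in the number of equations can be worked out with a few lines of Python. This is only a sketch: the row counts are copied from the listings above, the dictionary keys and the function name `coarsening_rates` are made up for illustration, and the ratio is taken on rows (equations), not on nodes, so the change from bs=3 to bs=6 is not factored out.

```python
# Row counts per level, coarsest first, copied from the mg level listings
# quoted above. The labels are illustrative only.
levels = {
    "new, square_graph 0": [174, 2106, 21828, 589824, 2433222],
    "new, square_graph 100": [96, 1440, 97242, 2433222],
    "old, square_graph 100": [90, 1872, 47652, 2433222],
}


def coarsening_rates(rows_coarsest_first):
    """Return fine/coarse row ratios, starting from the finest level."""
    rows = list(reversed(rows_coarsest_first))
    return [fine / coarse for fine, coarse in zip(rows, rows[1:])]


for name, rows in levels.items():
    rates = ", ".join(f"{r:.1f}" for r in coarsening_rates(rows))
    print(f"{name}: {rates}")
```

On these numbers the first coarsening step reduces the equation count by about 25:1 with new PETSc and about 51:1 with old PETSc (both with square_graph 100), and by about 4:1 with square_graph 0.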
I am working on this and I will give you a branch to test, but if you can
rebuild PETSc here is a quick test that might fix your problem.
In src/ksp/pc/impls/gamg/agg.c you will see:

    PetscCall(PetscSortIntWithArray(nloc, degree, permute));

If you can comment this out in the new code and compare with the old, that
might fix the problem.

Thanks,
Mark

>
> >
> > BTW, I am starting to think I should add the old method back as an option.
> > I did not think this change would cause large differences.
>
> Yes, I think that would be much appreciated. Let us know if we can do
> any testing
>
> Best wishes
> Stephan
>
> >
> > Thanks,
> > Mark
> >
> >> Note that we are providing the rigid body near nullspace,
> >> hence the bs=3 to bs=6.
> >> We have tried different values for the gamg_threshold but it doesn't
> >> really seem to significantly alter the coarsening amount in that first
> >> step.
> >>
> >> Do you have any suggestions for further things we should try/look at?
> >> Any feedback would be much appreciated
> >>
> >> Best wishes
> >> Stephan Kramer
> >>
> >> Full logs including log_view timings available from
> >> https://github.com/stephankramer/petsc-scaling/
> >>
> >> In particular:
> >>
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_5/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_5/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_6/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_6/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/before/Level_7/output_2.dat
> >> https://github.com/stephankramer/petsc-scaling/blob/main/after/Level_7/output_2.dat
> >>
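
The effect of the degree-sorted ordering discussed in this thread (the PetscSortIntWithArray call sorts `degree` in increasing order and reorders `permute` to match) can be illustrated on a toy problem. The following is a rough sketch, not GAMG's aggregation code: it assumes a plain greedy maximal independent set on a small 2D grid graph, and `greedy_mis` and `grid_graph` are hypothetical helper names.

```python
def greedy_mis(adjacency, order):
    """Greedy maximal independent set: visit vertices in `order` and
    select each one unless it or a neighbour was already taken."""
    selected, blocked = [], set()
    for v in order:
        if v in blocked:
            continue
        selected.append(v)
        blocked.add(v)
        blocked.update(adjacency[v])
    return selected


def grid_graph(nx, ny):
    """4-neighbour adjacency of an nx-by-ny grid; vertex id = i * ny + j."""
    adjacency = {}
    for i in range(nx):
        for j in range(ny):
            neighbours = []
            if i > 0:
                neighbours.append((i - 1) * ny + j)
            if i < nx - 1:
                neighbours.append((i + 1) * ny + j)
            if j > 0:
                neighbours.append(i * ny + j - 1)
            if j < ny - 1:
                neighbours.append(i * ny + j + 1)
            adjacency[i * ny + j] = neighbours
    return adjacency


adj = grid_graph(4, 4)
natural = sorted(adj)                               # vertex-id order
by_degree = sorted(adj, key=lambda v: len(adj[v]))  # minimum degree first

mis = greedy_mis(adj, by_degree)
print("natural order:   ", greedy_mis(adj, natural))
print("min degree first:", mis)
```

With the minimum-degree ordering the four corner vertices of the grid are selected before anything else, which is the "corners first" behaviour described earlier in the thread; the natural ordering gives a different selection on the same graph.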