Slightly better results with PCTELESCOPE, but still not scalable, cf. below. Maybe I’ll increase -pc_telescope_reduction_factor.
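For context, a hypothetical sketch of how this kind of composition could be spelled out on the command line (the reduction factor and the executable name are assumptions for illustration, not taken from the actual run; the st_fieldsplit_pressure_sub_1_ prefix is the one visible in the convergence output):

```shell
# Fold the pressure subsolve onto fewer ranks with PCTELESCOPE, then run
# GAMG on the reduced communicator. The factor 8 and ./solver are made up
# for this sketch.
mpiexec -n 2048 ./solver \
  -st_fieldsplit_pressure_sub_1_pc_type telescope \
  -st_fieldsplit_pressure_sub_1_pc_telescope_reduction_factor 8 \
  -st_fieldsplit_pressure_sub_1_telescope_pc_type gamg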
I don’t have much of a problem with the current operator complexity on 512 or 2048 processes, but I do mind MatMultAdd and MatMultTranspose being inefficient when the problem is distributed on the same communicator as the original Mat.

I tried playing around with the -pc_gamg_threshold option, but — I don’t know if it’s due to the fact that the shift is complex (which, by the by, makes BoomerAMG not an option) — for values which have the same effect as -pc_gamg_threshold 0 (i.e., the same coarsening as without the option set), I have a perfectly fine solver:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve converged due to CONVERGED_RTOL iterations 8
and such… and for greater values (i.e., different coarsening), the solver goes wild:
Linear st_fieldsplit_pressure_sub_1_telescope_ solve did not converge due to DIVERGED_ITS iterations 10000

Thanks for your help,
Pierre

Timings:
MatMultAdd       191310 0.0 1.9475e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00  6  0  4  0  0   6  0  4  0  0  3320
MatMultTranspose 191310 0.0 1.3959e+03 0.0 6.78e+09 0.0 1.5e+09 2.0e+02 0.0e+00  0  0  4  0  0   0  0  4  0  0  4632
[..]
KSPSolve_FS_3      6559 1.0 2.3480e+03 1.0 3.79e+1161.0 2.3e+10 1.1e+03 1.7e+05 16 11 60 18 21  16 11 60 18 22 153414

(Just as a reminder, here are the original timings:
MatMultAdd       222360 1.0 2.5904e+0348.0 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00 14  0  4  0  0  14  0  4  0  0  2872
MatMultTranspose 222360 1.0 1.8736e+03421.8 4.31e+09 1.9 2.4e+09 1.3e+02 0.0e+00  0  0  4  0  0   0  0  4  0  0  3970
[..]
KSPSolve_FS_3      7412 1.0 2.8939e+03 1.0 2.66e+11 2.1 3.5e+10 6.1e+02 2.7e+05 17 11 67 14 28  17 11 67 14 28 148175
)

> On 26 Jul 2018, at 8:52 PM, Jed Brown <j...@jedbrown.org> wrote:
>
> Matthew Knepley <knep...@gmail.com> writes:
>
>> On Thu, Jul 26, 2018 at 2:43 PM Jed Brown <j...@jedbrown.org> wrote:
>>
>>> Matthew Knepley <knep...@gmail.com> writes:
>>>
>>>> On Thu, Jul 26, 2018 at 12:56 PM Fande Kong <fdkong...@gmail.com> wrote:
>>>>
>>>>> On Thu, Jul 26, 2018 at 10:35 AM, Junchao Zhang <jczh...@mcs.anl.gov> wrote:
>>>>>
>>>>>> On Thu, Jul 26, 2018 at 11:15 AM, Fande Kong <fdkong...@gmail.com> wrote:
>>>>>>
>>>>>>> On Thu, Jul 26, 2018 at 9:51 AM, Junchao Zhang <jczh...@mcs.anl.gov> wrote:
>>>>>>>
>>>>>>>> Hi, Pierre,
>>>>>>>> From your log_view files, I see you did strong scaling. You used 4x more cores, but the execution time only dropped from 3.9143e+04 to 1.6910e+04.
>>>>>>>> From my previous analysis of a GAMG weak-scaling test, it looks like communication is one of the reasons for the poor scaling. In your case, VecScatterEnd time doubled from 1.5575e+03 to 3.2413e+03, and its time percentage jumped from 1% to 17%. This time can contribute to the big time ratio in MatMultAdd and MatMultTranspose, misleading you into thinking there was load imbalance computation-wise.
>>>>>>>> The reason is that I found the communication pattern in the interpolation and restriction phases of GAMG to be very bad: a few processes communicate with hundreds of neighbors, with message sizes of a few bytes.
>>>>>>>
>>>>>>> We may need to truncate the interpolation/restriction operators, and also do some aggressive coarsening. Unfortunately, GAMG currently does not support that.
>>>>>>
>>>>>> Are these gamg options the truncation you had in mind?
>>>>>> -pc_gamg_threshold[] <thresh,default=0> - Before aggregating the graph GAMG will remove small values from the graph on each level
>>>>>> -pc_gamg_threshold_scale <scale,default=1> - Scaling of threshold on each coarser grid if not specified
>>>>>
>>>>> Nope. Totally different things.
>>>>
>>>> Well, you could use _threshold to do more aggressive coarsening, but not for thinning out the interpolation.
>>>
>>> Increasing the threshold results in slower coarsening.
>>
>> Hmm, I think we have to change the webpage then:
>>
>> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCGAMGSetThreshold.html
>>
>> I read it the opposite way.
>
> More coarse points is "better" (stronger), but higher complexity.
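Since the thread hinges on what -pc_gamg_threshold actually filters, here is a schematic sketch of the idea (this is not PETSc's code; the relative scaling by sqrt(|a_ii|*|a_jj|) is an assumption based on the manual-page description, and the function name is made up): before aggregation, off-diagonal entries that are small relative to their diagonals are dropped from the strength-of-connection graph, which changes which nodes aggregate together.

```python
# Schematic sketch (not PETSc's implementation) of dropping weak entries
# from the graph before aggregation, as -pc_gamg_threshold does.
# An edge (i, j) is kept when |a_ij| > threshold * sqrt(|a_ii| * |a_jj|);
# squared magnitudes are compared to avoid a sqrt.
def filter_graph(A, threshold):
    """Return the set of retained off-diagonal edges of a dense symmetric
    matrix A (list of lists), given a relative drop threshold."""
    n = len(A)
    edges = set()
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if abs(A[i][j]) ** 2 > threshold ** 2 * abs(A[i][i]) * abs(A[j][j]):
                edges.add((i, j))
    return edges

# 1D Laplacian-like stencil [-1, 2, -1] plus one weak 0.01 coupling.
A = [
    [2.0, -1.0, 0.0, 0.01],
    [-1.0, 2.0, -1.0, 0.0],
    [0.0, -1.0, 2.0, -1.0],
    [0.01, 0.0, -1.0, 2.0],
]
print(filter_graph(A, 0.0))   # threshold 0: every nonzero edge survives
print(filter_graph(A, 0.25))  # larger threshold: the weak 0.01 edge is dropped
```

With threshold 0 the coarsening sees the same graph as with the option unset, which matches the behavior Pierre reports; a larger threshold produces a genuinely different (sparser) strength graph and hence different aggregates.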