> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <[email protected]> wrote:
> 
> I have a 3D elasticity problem with heterogeneous properties.

What does your coefficient variation look like? How large is the contrast?

> The grid is unstructured, with element aspect ratios varying from 4 to 25. Zero 
> Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) 
> BCs are imposed on the side faces. A gravity load is also included. The grid I 
> use consists of 500k cells (approximately 1.6M DOFs).
> 
> The best performance and memory usage for a single MPI process were obtained 
> with the HPDDM (BFBCG) solver and bjacobi + ICC(1) in the subdomains as 
> preconditioner: it took 1 m 45 s and 5.0 GB of RAM. A parallel computation with 
> 4 MPI processes took 2 m 46 s and used 5.6 GB of RAM. This is because the 
> number of iterations required to reach the same tolerance increases 
> significantly.

How many iterations do you have in serial (and then in parallel)?
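
(If you are not already recording them, appending

    -ksp_monitor -ksp_converged_reason

to your runs will print the residual history and the final iteration count; both 
are standard PETSc options, though the exact output depends on your version.)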

> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. 
> For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To 
> improve the convergence rate, the near-nullspace was attached using the 
> MatNullSpaceCreateRigidBody and MatSetNearNullSpace routines. This reduced the 
> calculation time to 3 m 58 s using 4.3 GB of RAM, with a peak memory usage of 
> 14.1 GB appearing just before the start of the iterations. A parallel 
> computation with 4 MPI processes took 2 m 53 s and used 8.4 GB of RAM; in that 
> case the peak memory usage is about 22 GB.

Does the number of iterations increase in parallel? Again, how many iterations do 
you have?

> Are there ways to avoid the degradation of the convergence rate for the bjacobi 
> preconditioner in parallel? Does it make sense to use hierarchical or nested 
> Krylov methods with a local GMRES solver (-sub_ksp_type gmres) and some 
> sub-preconditioner (for example, -sub_pc_type bjacobi)?

bjacobi is only a one-level method, so you would not expect a process-independent 
convergence rate for this kind of problem. If the coefficient variation is not 
too extreme, then I would expect GAMG (or some other smoothed-aggregation 
package, e.g. -pc_type ml, which requires configuring PETSc with --download-ml) 
to work well with some tuning.
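
As a starting point for that tuning, and only as a sketch (the values here are 
guesses to experiment with, not recommendations for your particular problem), 
something along the lines of

    -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 \
    -pc_gamg_threshold 0.01 \
    -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi \
    -ksp_view

together with the rigid-body near-nullspace you already attach, is a reasonable 
place to begin. -pc_gamg_threshold controls which weak connections are dropped 
when forming aggregates and is often the first knob to try for heterogeneous 
problems; -ksp_view lets you inspect the resulting multigrid hierarchy.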

If you have extremely high-contrast coefficients, you might need something with 
stronger coarse grids. If you can assemble so-called Neumann matrices 
(https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS), then you 
could try the GenEO scheme offered by PCHPDDM.
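
Only as a sketch (option names as I remember them, so please check them against 
your PETSc version, and note that PCHPDDM needs PETSc configured with 
--download-hpddm, plus SLEPc for the local eigenproblems), a GenEO setup amounts 
to assembling the operator as a MATIS, so that the unassembled local (Neumann) 
matrices are available, and then running with something like

    -mat_type is -pc_type hpddm -pc_hpddm_has_neumann \
    -pc_hpddm_levels_1_eps_nev 20 \
    -pc_hpddm_levels_1_sub_pc_type cholesky \
    -pc_hpddm_coarse_pc_type cholesky

where -pc_hpddm_levels_1_eps_nev controls how many local eigenvectors go into 
the coarse space (20 is an arbitrary value here), and -mat_type is only works 
if your assembly provides the local-to-global mapping that MATIS needs.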

> Is this peak memory usage expected for the GAMG preconditioner? Is there any 
> way to reduce it?

I think that peak memory usage comes from building the coarse grids. Can you 
run with `-info` and grep for GAMG? This will produce some output that more 
expert GAMG users can interpret.
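
Something like

    mpiexec -n 4 ./your_app <your options> -info 2>&1 | grep GAMG

should do it (./your_app and the options stand in for your actual run; recent 
PETSc versions also accept a filename argument to -info if you prefer to dump 
everything to a file).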

Lawrence
