On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <[email protected]> wrote:
> Hello, Lawrence!
>
> Thank you for your response!
>
> I attached the log files (txt files with the convergence behavior and the
> RAM usage log in separate txt files) and the resulting table with the
> convergence investigation data (xls). Data for the main non-regular grid
> with 500K cells and heterogeneous properties are in the 500K folder,
> whereas data for the simple uniform 125K-cell grid with constant
> properties are in the 125K folder.
>
>>> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
>>>
>>> I have a 3D elasticity problem with heterogeneous properties.
>>
>> What does your coefficient variation look like? How large is the contrast?
>
> The Young modulus varies from 1 to 10 GPa, the Poisson ratio from 0.3 to
> 0.44, and the density from 1700 to 2600 kg/m^3.

That is not too bad. Poorly shaped elements are the next thing to worry
about. Try to keep the aspect ratio below 10 if possible.

>>> The grid is unstructured, with aspect ratios ranging from 4 to 25. Zero
>>> Dirichlet BCs are imposed on the bottom face of the mesh, Neumann
>>> (traction) BCs are imposed on the side faces, and the gravity load is
>>> also accounted for. The grid I use consists of 500k cells (approximately
>>> 1.6M DOFs).
>>>
>>> The best performance and memory usage for a single MPI process were
>>> obtained with the HPDDM (BFBCG) solver and block Jacobi + ICC(1) in the
>>> subdomains as the preconditioner: it took 1 m 45 s and 5.0 GB of RAM.
>>> Parallel computation with 4 MPI processes took 2 m 46 s and used 5.6 GB
>>> of RAM. This is because the number of iterations required to reach the
>>> same tolerance increased significantly.
>>
>> How many iterations do you have in serial (and then in parallel)?
>
> The serial run required 112 iterations to reach convergence
> (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt); the parallel run with 4 MPI
> processes required 680 iterations.
>
> I attached log files for all simulations (txt files with the convergence
> behavior and the RAM usage log in separate txt files) and the resulting
> table with the convergence/memory usage data (xls). Data for the main
> non-regular grid with 500K cells and heterogeneous properties are in the
> 500K folder, whereas data for the simple uniform 125K-cell grid with
> constant properties are in the 125K folder.
>
>>> I have also tried the PCGAMG (agg) preconditioner with an ICC(1)
>>> sub-preconditioner. For a single MPI process, the calculation took 10 min
>>> and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace
>>> was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace
>>> subroutines. This reduced the calculation time to 3 m 58 s with 4.3 GB of
>>> RAM. There is also a peak memory usage of 14.1 GB, which appears just
>>> before the start of the iterations. Parallel computation with 4 MPI
>>> processes took 2 m 53 s and used 8.4 GB of RAM. In that case the peak
>>> memory usage is about 22 GB.
>>
>> Does the number of iterates increase in parallel? Again, how many
>> iterations do you have?
>
> The case with 4 MPI processes and the attached nullspace required 177
> iterations to reach convergence (see the detailed log in
> log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, the
> sequential run required 90 iterations
> (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).

Again, do not use ICC. I am surprised to see such a large jump in iteration
count, but get ICC off the table.
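As an aside, attaching the rigid-body near-nullspace mentioned above looks
roughly like the sketch below. This is only illustrative, not the code
actually used here: the function name is made up, an interlaced
3-DOF-per-node layout is assumed, and "coords" is assumed to be a Vec
holding the nodal coordinates with the same parallel layout as the solution
vector.

    #include <petscksp.h>

    /* Minimal sketch: attach the 6 rigid-body modes (3 translations plus
       3 rotations in 3D) as the near-nullspace that PCGAMG uses when it
       builds its coarse spaces. This is a hint for the preconditioner,
       not a true null space of the operator. */
    PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nearnull;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* block size = spatial dimension; set it here only if the Vec was
         not created with it already */
      ierr = VecSetBlockSize(coords, 3);CHKERRQ(ierr);
      ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
      ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same matrix is then handed to KSP as before; preconditioners that can
exploit the attached near-nullspace (such as PCGAMG) pick it up, and the
others simply ignore it.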
You will see variability in the iteration count with processor count with
GAMG. As much as +/-10%, maybe more (random) variability, but usually less.
You can decrease the memory a little, and the setup time a lot, by
coarsening more aggressively, at the expense of higher iteration counts.
It's a balancing act. You can run with the defaults, add '-info', grep on
GAMG and send the ~30 lines of output if you want advice on parameters.

Thanks,
Mark

>>> Are there ways to avoid the decrease of the convergence rate for the
>>> bjacobi preconditioner in parallel mode? Does it make sense to use
>>> hierarchical or nested Krylov methods with a local GMRES solver
>>> (sub_pc_type gmres) and some sub-preconditioner (for example,
>>> sub_pc_type bjacobi)?
>>
>> bjacobi is only a one-level method, so you would not expect a
>> process-independent convergence rate for this kind of problem. If the
>> coefficient variation is not too extreme, then I would expect GAMG (or
>> some other smoothed aggregation package, perhaps -pc_type ml (you need
>> --download-ml)) would work well with some tuning.
>
> Thanks for the idea, but, unfortunately, ML cannot be compiled with 64-bit
> integers (which we absolutely need in order to run computations on meshes
> with more than 10M cells).
>
>> If you have extremely high contrast coefficients you might need something
>> with stronger coarse grids. If you can assemble so-called Neumann matrices
>> (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you
>> could try the geneo scheme offered by PCHPDDM.
>
> I found strange convergence behavior for the HPDDM preconditioner. With 1
> MPI process the BFBCG solver did not converge
> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while with 4 MPI processes the
> computation was successful (1018 iterations to reach convergence,
> log_hpddm(bfbcg)_pchpddm_4_mpi.txt).
>
> It should be mentioned that the stiffness matrix was created in AIJ format
> (our default matrix format in the program). Converting the matrix to MATIS
> format via the MatConvert subroutine resulted in loss of convergence for
> both the serial and the parallel run.
>
>>> Is this peak memory usage expected for the GAMG preconditioner? Is there
>>> any way to reduce it?
>>
>> I think that peak memory usage comes from building the coarse grids. Can
>> you run with `-info` and grep for GAMG, this will provide some output that
>> more expert GAMG users can interpret.
>
> Thanks, I'll try to use a strong threshold only for the coarse grids.
>
> Kind regards,
>
> Viktor Nazdrachev
>
> R&D senior researcher
>
> Geosteering Technologies LLC
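Following up on the '-info' suggestion and on the threshold comment just
above: the GAMG lines can be collected with something along the lines of

    mpiexec -n 4 ./your_app -pc_type gamg -info 2>&1 | grep GAMG

(the executable name is a placeholder for the actual application), and the
usual knobs for trading setup time and memory against iteration count are
roughly

    -pc_gamg_threshold 0.0,0.01    # drop threshold for the aggregation graph;
                                   #   a comma-separated per-level list is accepted
    -pc_gamg_square_graph 1        # square the graph on this many fine levels
                                   #   for more aggressive coarsening
    -pc_gamg_coarse_eq_limit 1000  # stop coarsening once the coarsest grid is
                                   #   about this small

The exact option names change a little between PETSc versions, so it is
worth checking them against -help for the installed release.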
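Regarding the MATIS point above: as far as I understand, converting an
already assembled AIJ matrix with MatConvert cannot recover the unassembled
(Neumann) subdomain matrices that the GenEO construction wants, which may
explain the loss of convergence after the conversion. The usual route is to
assemble directly into a MATIS matrix, roughly as in the sketch below.
Everything application-specific here is a placeholder: the local-to-global
mapping "l2g", the element count "nel", the 24-DOF element size, and the
element data "edofs"/"Ke" would come from the existing FEM assembly.

    #include <petscksp.h>

    /* Rough sketch: assemble the stiffness matrix in MATIS format so that
       every process keeps its own unassembled (Neumann) subdomain matrix. */
    PetscErrorCode AssembleStiffnessMATIS(ISLocalToGlobalMapping l2g, PetscInt nlocal,
                                          PetscInt nel, Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreate(PETSC_COMM_WORLD, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATIS);CHKERRQ(ierr);
      ierr = MatSetLocalToGlobalMapping(*A, l2g, l2g);CHKERRQ(ierr);
      /* rough preallocation: ~81 nonzeros per row for trilinear hex elements */
      ierr = MatISSetPreallocation(*A, 81, NULL, 81, NULL);CHKERRQ(ierr);
      for (PetscInt e = 0; e < nel; ++e) {
        PetscInt    edofs[24];   /* local DOF indices of element e (placeholder) */
        PetscScalar Ke[24*24];   /* element stiffness matrix (placeholder)       */
        /* ... fill edofs and Ke from the application's element kernels ... */
        ierr = MatSetValuesLocal(*A, 24, edofs, 24, edofs, Ke, ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

PCBDDC consumes such a matrix directly, and PCHPDDM should also be able to
use the local Neumann matrices from it (or they can be supplied explicitly
via PCHPDDMSetAuxiliaryMat), but that last point is worth confirming with
the HPDDM developers.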
