On Fri, Sep 3, 2021 at 8:02 AM Mark Adams <[email protected]> wrote:
>
>
> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <[email protected]>
> wrote:
>
>> Hello, Lawrence!
>> Thank you for your response!
>>
>> I attached the log files (txt files with the convergence behavior and the
>> RAM usage log in separate txt files) and the resulting table with the
>> convergence investigation data (xls). Data for the main non-regular grid
>> with 500K cells and heterogeneous properties are in the 500K folder,
>> whereas data for the simple uniform 125K-cell grid with constant
>> properties are in the 125K folder.
>>
>>
>> >> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <[email protected]> wrote:
>>
>> >>
>>
>> >> I have a 3D elasticity problem with heterogeneous properties.
>>
>> >
>>
>> >What does your coefficient variation look like? How large is the contrast?
>>
>>
>>
>> Young's modulus varies from 1 to 10 GPa, Poisson's ratio from 0.3 to 0.44,
>> and density from 1700 to 2600 kg/m^3.
>>
>
> That is not too bad. Poorly shaped elements are the next thing to worry
> about. Try to keep the aspect ratio below 10 if possible.
>
>
>>
>>
>>
>>
>> >> The grid is unstructured, with element aspect ratios varying from 4 to
>> >> 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and
>> >> Neumann (traction) BCs are imposed on the side faces. The gravity load
>> >> is also accounted for. The grid I use consists of 500k cells
>> >> (approximately 1.6M DOFs).
>>
>> >>
>>
>> >> The best performance and memory usage for a single MPI process were
>> >> obtained with the HPDDM (BFBCG) solver and block Jacobi + ICC(1) on the
>> >> subdomains as the preconditioner: it took 1 m 45 s and 5.0 GB of RAM.
>> >> Parallel computation with 4 MPI processes took 2 m 46 s and used 5.6 GB
>> >> of RAM, because the number of iterations required to reach the same
>> >> tolerance increases significantly.
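For reference, the solver/preconditioner combination described above
presumably corresponds to runtime options along these lines (a sketch only;
it assumes PETSc was configured with HPDDM support, and "./your_app" stands
in for the actual executable and its usual arguments):

    ./your_app -ksp_type hpddm -ksp_hpddm_type bfbcg \
               -pc_type bjacobi -sub_pc_type icc -sub_pc_factor_levels 1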
>>
>> >
>>
>> >How many iterations do you have in serial (and then in parallel)?
>>
>>
>>
>> The serial run required 112 iterations to reach convergence
>> (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt); the parallel run with 4 MPI
>> processes required 680 iterations.
>>
>>
>>
>> I attached the log files for all simulations (txt files with the
>> convergence behavior and the RAM usage log in separate txt files) and the
>> resulting table with the convergence/memory usage data (xls). Data for the
>> main non-regular grid with 500K cells and heterogeneous properties are in
>> the 500K folder, whereas data for the simple uniform 125K-cell grid with
>> constant properties are in the 125K folder.
>>
>>
>>
>>
>>
>> >> I have also tried the PCGAMG (agg) preconditioner with an ICC(1)
>> >> sub-preconditioner. For a single MPI process, the calculation took 10 min
>> >> and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace
>> >> was attached using the MatNullSpaceCreateRigidBody and
>> >> MatSetNearNullSpace subroutines. This reduced the calculation time to
>> >> 3 m 58 s, using 4.3 GB of RAM. There is also a peak memory usage of
>> >> 14.1 GB, which appears just before the iterations start. Parallel
>> >> computation with 4 MPI processes took 2 m 53 s, using 8.4 GB of RAM. In
>> >> that case the peak memory usage is about 22 GB.
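A minimal sketch of how the rigid-body near-nullspace is typically attached
in PETSc before handing the operator to GAMG (this is not the poster's
actual code; "A" and "coords" stand for the assembled stiffness matrix and a
Vec of nodal coordinates, blocked three per node):

    #include <petscmat.h>

    /* Attach the six rigid-body modes so GAMG can use them when building
       its coarse spaces.  "coords" must have the same parallel layout as
       the solution vector, with block size 3. */
    static PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nearnull;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
      ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }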
>>
>> >
>>
>> >Does the number of iterates increase in parallel? Again, how many
>> >iterations do you have?
>>
>>
>>
>> For the case with 4 MPI processes and the attached near-nullspace, 177
>> iterations are required to reach convergence (see the detailed log in
>> log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, the
>> sequential run requires 90 iterations
>> (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>>
>>
> Again, do not use ICC. I am surprised to see such a large jump in
> iteration count, but get ICC off the table.
>
> You will see variability in the iteration count with processor count with
> GAMG, as much as +/-10%. Maybe more (random) variability, but usually less.
>
> You can decrease the memory a little, and the setup time a lot, by
> aggressively coarsening, at the expense of higher iteration counts. It's a
> balancing act.
>
> You can run with the defaults, add '-info', grep on GAMG and send the ~30
> lines of output if you want advice on parameters.
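A hedged example of what that run might look like on the command line (the
process count and executable name are placeholders, and the exact
redirection may differ depending on where -info sends its output):

    mpiexec -n 4 ./your_app <your usual options> -pc_type gamg \
        -info 2>&1 | grep GAMG

For the coarsening trade-off mentioned above, -pc_gamg_square_graph <n>
should be the relevant knob in this PETSc release: it squares the graph on
the first n levels, which coarsens more aggressively.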
>
Can you send the output of
-ksp_view -ksp_monitor_true_residual -ksp_converged_reason
Thanks,
Matt
> Thanks,
> Mark
>
>
>>
>>
>>
>>
>>
>>
>> >> Are there ways to avoid the degradation of the convergence rate for the
>> >> bjacobi preconditioner in parallel mode? Does it make sense to use
>> >> hierarchical or nested Krylov methods with a local GMRES solver
>> >> (-sub_ksp_type gmres) and some sub-preconditioner (for example,
>> >> -sub_pc_type bjacobi)?
>>
>> >
>>
>> >bjacobi is only a one-level method, so you would not expect
>> >process-independent convergence rate for this kind of problem. If the
>> >coefficient variation is not too extreme, then I would expect GAMG (or some
>> >other smoothed aggregation package, perhaps -pc_type ml (you need
>> >--download-ml)) would work well with some tuning.
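As a concrete starting point for the smoothed-aggregation route, a typical
option set for 3D elasticity might look something like the following (a
sketch, not a tuned recommendation; all options shown are standard
PETSc/GAMG options):

    -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 \
    -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi

together with the rigid-body near-nullspace attached to the matrix as
sketched earlier.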
>>
>>
>>
>> Thanks for the idea, but unfortunately ML cannot be compiled with 64-bit
>> integers (and 64-bit integers are essential for us, since we need to run
>> computations on meshes with more than 10M cells).
>>
>>
>>
>>
>>
>> >If you have extremely high contrast coefficients you might need something
>> >with stronger coarse grids. If you can assemble so-called Neumann matrices
>> >(https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you
>> >could try the geneo scheme offered by PCHPDDM.
>>
>>
>>
>>
>>
>> I found strange convergence behavior with the HPDDM preconditioner. For 1
>> MPI process the BFBCG solver did not converge
>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes the
>> computation was successful (1018 iterations to reach convergence,
>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt).
>>
>> It should be mentioned that the stiffness matrix was created in AIJ format
>> (our program's default matrix format).
>>
>> Converting the matrix to MATIS format via the MatConvert subroutine
>> resulted in loss of convergence for both the serial and the parallel run.
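One likely explanation: PCHPDDM's GenEO coarse space needs the unassembled
local (Neumann) matrices, which a MatConvert() from an already assembled AIJ
matrix generally cannot recover, so the operator has to be built in MATIS
format from the start. A minimal sketch of such an assembly (not the
poster's code; nldof, l2g, N, nen, erows and Ke are hypothetical names for
the subdomain dof count, the local-to-global index array, the global dof
count, and one element's local row indices and stiffness block):

    #include <petscmat.h>

    /* Assemble the stiffness matrix directly as MATIS so that each process
       keeps its own unassembled (Neumann) subdomain matrix.  The element
       loop itself is only indicated by a comment. */
    static PetscErrorCode AssembleAsMATIS(MPI_Comm comm, PetscInt nldof,
                                          const PetscInt *l2g, PetscInt N,
                                          Mat *A)
    {
      ISLocalToGlobalMapping map;
      PetscErrorCode         ierr;

      PetscFunctionBeginUser;
      ierr = ISLocalToGlobalMappingCreate(comm, 1, nldof, l2g,
                                          PETSC_COPY_VALUES, &map);CHKERRQ(ierr);
      ierr = MatCreateIS(comm, 1, PETSC_DECIDE, PETSC_DECIDE, N, N,
                         map, map, A);CHKERRQ(ierr);
      /* element loop, using the subdomain-local numbering, e.g.
         ierr = MatSetValuesLocal(*A, nen, erows, nen, erows, Ke,
                                  ADD_VALUES);CHKERRQ(ierr);            */
      ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = ISLocalToGlobalMappingDestroy(&map);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With the operator in MATIS format, -pc_type hpddm should then give access to
the GenEO-type coarse space referred to above.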
>>
>>
>> >> Is this peak memory usage expected for the GAMG preconditioner? Is there
>> >> any way to reduce it?
>>
>> >
>>
>> >I think that peak memory usage comes from building the coarse grids. Can
>> >you run with `-info` and grep for GAMG, this will provide some output that
>> >more expert GAMG users can interpret.
>>
>>
>>
>> Thanks, I'll try to use a strong threshold only for the coarse grids.
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Viktor Nazdrachev
>>
>>
>>
>> R&D senior researcher
>>
>>
>>
>> Geosteering Technologies LLC
>>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/