On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <[email protected]> wrote:
> Hello, Lawrence!
>
> Thank you for your response!
>
> I attached the log files (txt files with the convergence behavior and the
> RAM usage log in separate txt files) and the resulting table with the
> convergence investigation data (xls). Data for the main non-regular grid
> with 500K cells and heterogeneous properties are in the 500K folder,
> whereas data for the simple uniform 125K-cell grid with constant
> properties are in the 125K folder.
>
>>> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
>>>
>>> I have a 3D elasticity problem with heterogeneous properties.
>>
>> What does your coefficient variation look like? How large is the contrast?
>
> The Young modulus varies from 1 to 10 GPa, the Poisson ratio from 0.3 to
> 0.44, and the density from 1700 to 2600 kg/m^3.

That is not too bad. Poorly shaped elements are the next thing to worry
about. Try to keep the aspect ratio below 10 if possible.

>>> The grid is unstructured, with aspect ratios ranging from 4 to 25. Zero
>>> Dirichlet BCs are imposed on the bottom face of the mesh, Neumann
>>> (traction) BCs are imposed on the side faces, and the gravity load is
>>> also accounted for. The grid I use consists of 500k cells (approximately
>>> 1.6M DOFs).
>>>
>>> The best performance and memory usage for a single MPI process were
>>> obtained with the HPDDM (BFBCG) solver and block Jacobi + ICC(1) in the
>>> subdomains as the preconditioner: it took 1 m 45 s and 5.0 GB of RAM.
>>> Parallel computation with 4 MPI processes took 2 m 46 s and used 5.6 GB
>>> of RAM. This is because the number of iterations required to reach the
>>> same tolerance increased significantly.
>>
>> How many iterations do you have in serial (and then in parallel)?
>
> The serial run required 112 iterations to reach convergence
> (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt); the parallel run with 4 MPI
> processes required 680 iterations.
>
> I attached log files for all simulations (txt files with the convergence
> behavior and the RAM usage log in separate txt files) and the resulting
> table with the convergence/memory usage data (xls). Data for the main
> non-regular grid with 500K cells and heterogeneous properties are in the
> 500K folder, whereas data for the simple uniform 125K-cell grid with
> constant properties are in the 125K folder.
>
>>> I have also tried the PCGAMG (agg) preconditioner with an ICC(1)
>>> sub-preconditioner. For a single MPI process, the calculation took 10 min
>>> and 3.4 GB of RAM. To improve the convergence rate, the near-nullspace
>>> was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace
>>> subroutines. This reduced the calculation time to 3 m 58 s with 4.3 GB of
>>> RAM. There is also a peak memory usage of 14.1 GB, which appears just
>>> before the start of the iterations. Parallel computation with 4 MPI
>>> processes took 2 m 53 s and used 8.4 GB of RAM. In that case the peak
>>> memory usage is about 22 GB.
>>
>> Does the number of iterates increase in parallel? Again, how many
>> iterations do you have?
>
> The case with 4 MPI processes and the attached nullspace required 177
> iterations to reach convergence (see the detailed log in
> log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, the
> sequential run required 90 iterations
> (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).

Again, do not use ICC. I am surprised to see such a large jump in iteration
count, but get ICC off the table.
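As an aside, attaching the rigid-body near-nullspace mentioned above looks
roughly like the sketch below. This is only illustrative, not the code
actually used here: the function name is made up, an interlaced
3-DOF-per-node layout is assumed, and "coords" is assumed to be a Vec
holding the nodal coordinates with the same parallel layout as the solution
vector.

    #include <petscksp.h>

    /* Minimal sketch: attach the 6 rigid-body modes (3 translations plus
       3 rotations in 3D) as the near-nullspace that PCGAMG uses when it
       builds its coarse spaces. This is a hint for the preconditioner,
       not a true null space of the operator. */
    PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
    {
      MatNullSpace   nearnull;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* block size = spatial dimension; set it here only if the Vec was
         not created with it already */
      ierr = VecSetBlockSize(coords, 3);CHKERRQ(ierr);
      ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
      ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The same matrix is then handed to KSP as before; preconditioners that can
exploit the attached near-nullspace (such as PCGAMG) pick it up, and the
others simply ignore it.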
You will see variability in the iteration count with processor count with
GAMG. As much as +/-10%, maybe more (random) variability, but usually less.
You can decrease the memory a little, and the setup time a lot, by
coarsening more aggressively, at the expense of higher iteration counts.
It's a balancing act. You can run with the defaults, add '-info', grep on
GAMG and send the ~30 lines of output if you want advice on parameters.

Thanks,
Mark

>>> Are there ways to avoid the decrease of the convergence rate for the
>>> bjacobi preconditioner in parallel mode? Does it make sense to use
>>> hierarchical or nested Krylov methods with a local GMRES solver
>>> (sub_pc_type gmres) and some sub-preconditioner (for example,
>>> sub_pc_type bjacobi)?
>>
>> bjacobi is only a one-level method, so you would not expect a
>> process-independent convergence rate for this kind of problem. If the
>> coefficient variation is not too extreme, then I would expect GAMG (or
>> some other smoothed aggregation package, perhaps -pc_type ml (you need
>> --download-ml)) would work well with some tuning.
>
> Thanks for the idea, but, unfortunately, ML cannot be compiled with 64-bit
> integers (which we absolutely need in order to run computations on meshes
> with more than 10M cells).
>
>> If you have extremely high contrast coefficients you might need something
>> with stronger coarse grids. If you can assemble so-called Neumann matrices
>> (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you
>> could try the geneo scheme offered by PCHPDDM.
>
> I found strange convergence behavior for the HPDDM preconditioner. With 1
> MPI process the BFBCG solver did not converge
> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while with 4 MPI processes the
> computation was successful (1018 iterations to reach convergence,
> log_hpddm(bfbcg)_pchpddm_4_mpi.txt).
>
> It should be mentioned that the stiffness matrix was created in AIJ format
> (our default matrix format in the program). Converting the matrix to MATIS
> format via the MatConvert subroutine resulted in loss of convergence for
> both the serial and the parallel run.
>
>>> Is this peak memory usage expected for the GAMG preconditioner? Is there
>>> any way to reduce it?
>>
>> I think that peak memory usage comes from building the coarse grids. Can
>> you run with `-info` and grep for GAMG, this will provide some output that
>> more expert GAMG users can interpret.
>
> Thanks, I'll try to use a strong threshold only for the coarse grids.
>
> Kind regards,
>
> Viktor Nazdrachev
>
> R&D senior researcher
>
> Geosteering Technologies LLC
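Following up on the '-info' suggestion and on the threshold comment just
above: the GAMG lines can be collected with something along the lines of

    mpiexec -n 4 ./your_app -pc_type gamg -info 2>&1 | grep GAMG

(the executable name is a placeholder for the actual application), and the
usual knobs for trading setup time and memory against iteration count are
roughly

    -pc_gamg_threshold 0.0,0.01    # drop threshold for the aggregation graph;
                                   #   a comma-separated per-level list is accepted
    -pc_gamg_square_graph 1        # square the graph on this many fine levels
                                   #   for more aggressive coarsening
    -pc_gamg_coarse_eq_limit 1000  # stop coarsening once the coarsest grid is
                                   #   about this small

The exact option names change a little between PETSc versions, so it is
worth checking them against -help for the installed release.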
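Regarding the MATIS point above: as far as I understand, converting an
already assembled AIJ matrix with MatConvert cannot recover the unassembled
(Neumann) subdomain matrices that the GenEO construction wants, which may
explain the loss of convergence after the conversion. The usual route is to
assemble directly into a MATIS matrix, roughly as in the sketch below.
Everything application-specific here is a placeholder: the local-to-global
mapping "l2g", the element count "nel", the 24-DOF element size, and the
element data "edofs"/"Ke" would come from the existing FEM assembly.

    #include <petscksp.h>

    /* Rough sketch: assemble the stiffness matrix in MATIS format so that
       every process keeps its own unassembled (Neumann) subdomain matrix. */
    PetscErrorCode AssembleStiffnessMATIS(ISLocalToGlobalMapping l2g, PetscInt nlocal,
                                          PetscInt nel, Mat *A)
    {
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      ierr = MatCreate(PETSC_COMM_WORLD, A);CHKERRQ(ierr);
      ierr = MatSetSizes(*A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE);CHKERRQ(ierr);
      ierr = MatSetType(*A, MATIS);CHKERRQ(ierr);
      ierr = MatSetLocalToGlobalMapping(*A, l2g, l2g);CHKERRQ(ierr);
      /* rough preallocation: ~81 nonzeros per row for trilinear hex elements */
      ierr = MatISSetPreallocation(*A, 81, NULL, 81, NULL);CHKERRQ(ierr);
      for (PetscInt e = 0; e < nel; ++e) {
        PetscInt    edofs[24];   /* local DOF indices of element e (placeholder) */
        PetscScalar Ke[24*24];   /* element stiffness matrix (placeholder)       */
        /* ... fill edofs and Ke from the application's element kernels ... */
        ierr = MatSetValuesLocal(*A, 24, edofs, 24, edofs, Ke, ADD_VALUES);CHKERRQ(ierr);
      }
      ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

PCBDDC consumes such a matrix directly, and PCHPDDM should also be able to
use the local Neumann matrices from it (or they can be supplied explicitly
via PCHPDDMSetAuxiliaryMat), but that last point is worth confirming with
the HPDDM developers.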
