That does not seem to be an ASCII file.

On Fri, Sep 3, 2021 at 10:48 AM Viktor Nazdrachev <[email protected]> wrote:
> Hello Mark and Matthew!
>
> I attached log files for the serial and parallel cases and the corresponding information about the GAMG preconditioner (obtained with grep).
>
> I have to note that the global stiffness matrix is assembled in the code with the MatSetValues subroutine (not MatSetValuesBlocked):
>
> !nnds – number of nodes
> !dmn=3
> call MatCreate(Petsc_Comm_World,Mat_K,ierr)
> call MatSetFromOptions(Mat_K,ierr)
> call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m)
> …
> call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)
> …
> call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr)
> …
> do i=1,nels
>    call FormLocalK(i,k,indx,"Kp")  ! find local stiffness matrix
>    indx=indxmap(indx,2)            ! find global indices for DOFs
>    call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr)
> end do
>
> But the null-space vector was created using the VecSetBlockSize subroutine:
>
> call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr)
> call VecSetBlockSize(Vec_NullSpace,dmn,ierr)
> call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr)
> call VecSetUp(Vec_NullSpace,ierr)
> call VecGetArrayF90(Vec_NullSpace,null_space,ierr)
> …
> call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr)
> call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr)
> call MatSetNearNullSpace(Mat_K,matnull,ierr)
>
> I suppose this mismatch may be one of the reasons for the slow GAMG convergence (see the block-size sketch further below).
>
> So I attached log files for a parallel run with a "pure" GAMG preconditioner.
>
> Kind regards,
>
> Viktor Nazdrachev
>
> R&D senior researcher
>
> Geosteering Technologies LLC
>
> On Fri, 3 Sep 2021 at 15:11, Matthew Knepley <[email protected]> wrote:
>
>> On Fri, Sep 3, 2021 at 8:02 AM Mark Adams <[email protected]> wrote:
>>
>>> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <[email protected]> wrote:
>>>
>>>> Hello, Lawrence!
>>>> Thank you for your response!
>>>>
>>>> I attached log files (txt files with the convergence behavior and the RAM usage log in separate txt files) and the resulting table with the convergence investigation data (xls). Data for the main non-regular grid with 500K cells and heterogeneous properties are in the 500K folder, whereas data for the simple uniform 125K-cell grid with constant properties are in the 125K folder.
>>>>
>>>> >> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <[email protected]> wrote:
>>>> >>
>>>> >> I have a 3D elasticity problem with heterogeneous properties.
>>>> >
>>>> > What does your coefficient variation look like? How large is the contrast?
>>>>
>>>> Young's modulus varies from 1 to 10 GPa, Poisson's ratio varies from 0.3 to 0.44, and density from 1700 to 2600 kg/m^3.
>>>
>>> That is not too bad. Poorly shaped elements are the next thing to worry about. Try to keep the aspect ratio below 10 if possible.
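A minimal sketch of the block-size change hinted at above, illustrative only: it assumes dmn = 3 DOFs per node and reuses the Mat_K, dbw, obw and ierr names from the quoted snippet.

    ! Hedged sketch: set the matrix block size before preallocation so that
    ! GAMG and the rigid-body near-null space agree on 3 DOFs per node.
    call MatSetBlockSize(Mat_K,dmn,ierr)
    call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)
    ! Assembly can keep using MatSetValues with global DOF indices;
    ! MatSetValuesBlocked would instead take node (block) indices.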
>>>> >> The grid is unstructured, with element aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (approximately 1.6M DOFs).
>>>> >>
>>>> >> The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) on the subdomains as preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s while using 5.6 GB of RAM, because the number of iterations required to reach the same tolerance increases significantly.
>>>> >
>>>> > How many iterations do you have in serial (and then in parallel)?
>>>>
>>>> The serial run required 112 iterations to reach convergence (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt); the parallel run with 4 MPI processes required 680 iterations.
>>>>
>>>> I attached log files for all simulations (txt files with the convergence behavior and the RAM usage log in separate txt files) and the resulting table with the convergence/memory usage data (xls). Data for the main non-regular grid with 500K cells and heterogeneous properties are in the 500K folder, whereas data for the simple uniform 125K-cell grid with constant properties are in the 125K folder.
>>>>
>>>> >> I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the null space was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s while using 4.3 GB of RAM. There is also a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s while using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>>>> >
>>>> > Does the number of iterates increase in parallel? Again, how many iterations do you have?
>>>>
>>>> For the case with 4 MPI processes and the attached null space, 177 iterations are required to reach convergence (see the detailed log in log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90 iterations are required for the sequential run (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>>>
>>> Again, do not use ICC. I am surprised to see such a large jump in iteration count, but get ICC off the table.
>>>
>>> You will see variability in the iteration count with processor count with GAMG. As much as +/-10%, maybe more (random) variability, but usually less.
>>>
>>> You can decrease the memory a little, and the setup time a lot, by coarsening aggressively, at the expense of higher iteration counts. It's a balancing act.
>>>
>>> You can run with the defaults, add '-info', grep on GAMG, and send the ~30 lines of output if you want advice on parameters.
>>
>> Can you send the output of
>>
>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
>>
>> Thanks,
>>
>>     Matt
>>
>>> Thanks,
>>> Mark
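For completeness, a hedged example of a run that combines Mark's '-info' suggestion with the monitoring options Matt asks for; the executable name, process count and log file name are placeholders, while the PETSc options are the ones quoted above:

    mpiexec -n 4 ./my_solver -pc_type gamg \
        -ksp_view -ksp_monitor_true_residual -ksp_converged_reason \
        -info 2>&1 | tee gamg_run.log
    grep GAMG gamg_run.log    # roughly the ~30 GAMG lines Mark refers to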
>>>> >> Are there ways to avoid the decrease of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_pc_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?
>>>> >
>>>> > bjacobi is only a one-level method, so you would not expect a process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.
>>>>
>>>> Thanks for the idea, but, unfortunately, ML cannot be compiled with 64-bit integers (which is essential for us, since computations on meshes with more than 10M cells are required).
>>>>
>>>> > If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM.
>>>>
>>>> I found strange convergence behavior for the HPDDM preconditioner. For 1 MPI process the BFBCG solver did not converge (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes the computation was successful (1018 iterations to reach convergence, log_hpddm(bfbcg)_pchpddm_4_mpi.txt).
>>>>
>>>> But it should be mentioned that the stiffness matrix was created in AIJ format (our default matrix format in the program). Converting the matrix to MATIS format via the MatConvert subroutine resulted in loss of convergence for both the serial and the parallel run. (A sketch of assembling directly in MATIS format is included at the end of this message.)
>>>>
>>>> >> Is this peak memory usage expected for the gamg preconditioner? Is there any way to reduce it?
>>>> >
>>>> > I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret.
>>>>
>>>> Thanks, I'll try to use a strong threshold only for the coarse grids.
>>>>
>>>> Kind regards,
>>>>
>>>> Viktor Nazdrachev
>>>>
>>>> R&D senior researcher
>>>>
>>>> Geosteering Technologies LLC
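As an aside, a hedged illustration of what "a strong threshold only for the coarse grids" might look like on the command line: PCGAMGSetThreshold accepts one value per level (finest level first), and levels beyond the list can be scaled with -pc_gamg_threshold_scale. The numbers below are placeholders, and the option names should be checked against the installed PETSc version:

    -pc_type gamg -pc_gamg_threshold 0.0,0.0,0.04 -pc_gamg_threshold_scale 2.0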
>>>> On Wed, 1 Sep 2021 at 12:02, Lawrence Mitchell <[email protected]> wrote:
>>>>>
>>>>> > On 1 Sep 2021, at 09:42, Наздрачёв Виктор <[email protected]> wrote:
>>>>> >
>>>>> > I have a 3D elasticity problem with heterogeneous properties.
>>>>>
>>>>> What does your coefficient variation look like? How large is the contrast?
>>>>>
>>>>> > The grid is unstructured, with element aspect ratios varying from 4 to 25. Zero Dirichlet BCs are imposed on the bottom face of the mesh, and Neumann (traction) BCs are imposed on the side faces. Gravity load is also accounted for. The grid I use consists of 500k cells (approximately 1.6M DOFs).
>>>>> >
>>>>> > The best performance and memory usage for a single MPI process was obtained with the HPDDM (BFBCG) solver and bjacobi + ICC(1) on the subdomains as preconditioner; it took 1 m 45 s and 5.0 GB of RAM. Parallel computation with 4 MPI processes took 2 m 46 s while using 5.6 GB of RAM, because the number of iterations required to reach the same tolerance increases significantly.
>>>>>
>>>>> How many iterations do you have in serial (and then in parallel)?
>>>>>
>>>>> > I've also tried the PCGAMG (agg) preconditioner with an ICC(1) sub-preconditioner. For a single MPI process, the calculation took 10 min and 3.4 GB of RAM. To improve the convergence rate, the null space was attached using the MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines. This reduced the calculation time to 3 m 58 s while using 4.3 GB of RAM. There is also a peak memory usage of 14.1 GB, which appears just before the start of the iterations. Parallel computation with 4 MPI processes took 2 m 53 s while using 8.4 GB of RAM. In that case the peak memory usage is about 22 GB.
>>>>>
>>>>> Does the number of iterates increase in parallel? Again, how many iterations do you have?
>>>>>
>>>>> > Are there ways to avoid the decrease of the convergence rate for the bjacobi preconditioner in parallel mode? Does it make sense to use hierarchical or nested Krylov methods with a local GMRES solver (sub_pc_type gmres) and some sub-preconditioner (for example, sub_pc_type bjacobi)?
>>>>>
>>>>> bjacobi is only a one-level method, so you would not expect a process-independent convergence rate for this kind of problem. If the coefficient variation is not too extreme, then I would expect GAMG (or some other smoothed aggregation package, perhaps -pc_type ml (you need --download-ml)) would work well with some tuning.
>>>>>
>>>>> If you have extremely high contrast coefficients you might need something with stronger coarse grids. If you can assemble so-called Neumann matrices (https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then you could try the geneo scheme offered by PCHPDDM.
>>>>>
>>>>> > Is this peak memory usage expected for the gamg preconditioner? Is there any way to reduce it?
>>>>>
>>>>> I think that peak memory usage comes from building the coarse grids. Can you run with `-info` and grep for GAMG, this will provide some output that more expert GAMG users can interpret.
>>>>>
>>>>> Lawrence
>>
>> --
>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
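Returning to the MATIS point above: converting an already assembled AIJ matrix with MatConvert generally does not recover the unassembled local (Neumann) matrices that the GenEO coarse space in PCHPDDM relies on, which may explain the loss of convergence after the conversion. Below is a rough, unverified sketch of assembling directly in MATIS format; nloc, gidx, dnnz, onnz and l2g are hypothetical names, the element loop reuses names from the snippet at the top of the thread, and the exact Fortran signatures should be checked against the PETSc man pages.

    ! Hedged sketch: assemble as MATIS so that every process keeps its unassembled
    ! (Neumann) local matrix. nloc is the number of local nodes and gidx their
    ! global node numbers; the map must also cover interface nodes shared with
    ! neighbouring subdomains.
    call ISLocalToGlobalMappingCreate(Petsc_Comm_World,dmn,nloc,gidx,Petsc_Copy_Values,l2g,ierr)
    call MatCreate(Petsc_Comm_World,Mat_K,ierr)
    call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr)
    call MatSetType(Mat_K,MATIS,ierr)
    call MatSetLocalToGlobalMapping(Mat_K,l2g,l2g,ierr)
    call MatISSetPreallocation(Mat_K,0,dnnz,0,onnz,ierr)
    do i=1,nels
       call FormLocalK(i,k,indx,"Kp")   ! local element stiffness matrix
       ! indx must stay in the local (subdomain) numbering here, i.e. no indxmap()
       call MatSetValuesLocal(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr)
    end do
    call MatAssemblyBegin(Mat_K,Mat_Final_Assembly,ierr)
    call MatAssemblyEnd(Mat_K,Mat_Final_Assembly,ierr)

With a genuine MATIS operator, -pc_type hpddm should then be able to build the GenEO coarse space from the local Neumann matrices, as Lawrence suggests.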
