It is a RAR since this is Windows :)
Viktor, your system looks singular. Is it possible that you somehow have a
zero on the diagonal? That could make SOR problematic as a smoother. You
could replace it with Jacobi using
-mg_levels_pc_type jacobi
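For example, a full run line might look like this (just a sketch; the
executable name and process count are placeholders, the options are standard
PETSc ones):

  mpiexec -n 4 ./your_app -ksp_type cg -pc_type gamg -mg_levels_pc_type jacobi \
      -ksp_monitor_true_residual -ksp_converged_reason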
0 KSP Residual norm 2.980664994991e+02
0 KSP preconditioned resid norm 2.980664994991e+02 true resid norm 7.983356882620e+11 ||r(i)||/||b|| 1.000000000000e+00
1 KSP Residual norm 1.650358505966e+01
1 KSP preconditioned resid norm 1.650358505966e+01 true resid norm 4.601793132543e+12 ||r(i)||/||b|| 5.764233267037e+00
2 KSP Residual norm 2.086911345353e+01
2 KSP preconditioned resid norm 2.086911345353e+01 true resid norm 1.258153657657e+12 ||r(i)||/||b|| 1.575970705250e+00
3 KSP Residual norm 1.909137523120e+01
3 KSP preconditioned resid norm 1.909137523120e+01 true resid norm 2.179275269000e+12 ||r(i)||/||b|| 2.729773077969e+00
Mark, here is the solver
KSP Object: 1 MPI processes
type: cg
maximum iterations=100000, initial guess is zero
tolerances: relative=1e-08, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
type: gamg
type is MULTIPLICATIVE, levels=4 cycles=v
Cycles per PCApply=1
Using externally computed Galerkin coarse grid matrices
GAMG specific options
Threshold for dropping small values in graph on each level = 0. 0. 0. 0.
Threshold scaling factor for each level not specified = 1.
AGG specific options
Symmetric graph false
Number of levels to square graph 1
Number smoothing steps 1
Complexity: grid = 1.0042
Coarse grid solver -- level -------------------------------
KSP Object: (mg_coarse_) 1 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_) 1 MPI processes
type: bjacobi
number of blocks = 1
Local solver information for first block is in the following KSP and PC objects on rank 0:
Use -mg_coarse_ksp_view ::ascii_info_detail to display information for all blocks
KSP Object: (mg_coarse_sub_) 1 MPI processes
type: preonly
maximum iterations=1, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_coarse_sub_) 1 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
matrix ordering: nd
factor fill ratio given 5., needed 1.19444
Factored matrix follows:
Mat Object: 1 MPI processes
type: seqaij
rows=36, cols=36
package used to perform factorization: petsc
total: nonzeros=774, allocated nonzeros=774
using I-node routines: found 22 nodes, limit used is 5
linear system matrix = precond matrix:
Mat Object: (mg_coarse_sub_) 1 MPI processes
type: seqaij
rows=36, cols=36
total: nonzeros=648, allocated nonzeros=648
total number of mallocs used during MatSetValues calls=0
not using I-node routines
linear system matrix = precond matrix:
Mat Object: (mg_coarse_sub_) 1 MPI processes
type: seqaij
rows=36, cols=36
total: nonzeros=648, allocated nonzeros=648
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Down solver (pre-smoother) on level 1 -------------------------------
KSP Object: (mg_levels_1_) 1 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0997354, max = 1.09709
eigenvalues estimate via gmres min 0.00372245, max 0.997354
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_1_esteig_) 1 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_1_) 1 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=902, cols=902
total: nonzeros=66660, allocated nonzeros=66660
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 2 -------------------------------
KSP Object: (mg_levels_2_) 1 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0994525, max = 1.09398
eigenvalues estimate via gmres min 0.0303095, max 0.994525
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_2_esteig_) 1 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_2_) 1 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=12043, cols=12043
total: nonzeros=455611, allocated nonzeros=455611
total number of mallocs used during MatSetValues calls=0
not using I-node routines
Up solver (post-smoother) same as down solver (pre-smoother)
Down solver (pre-smoother) on level 3 -------------------------------
KSP Object: (mg_levels_3_) 1 MPI processes
type: chebyshev
eigenvalue estimates used: min = 0.0992144, max = 1.09136
eigenvalues estimate via gmres min 0.0222691, max 0.992144
eigenvalues estimated using gmres with translations [0. 0.1; 0. 1.1]
KSP Object: (mg_levels_3_esteig_) 1 MPI processes
type: gmres
restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
happy breakdown tolerance 1e-30
maximum iterations=10, initial guess is zero
tolerances: relative=1e-12, absolute=1e-50, divergence=10000.
left preconditioning
using PRECONDITIONED norm type for convergence test
estimating eigenvalues using noisy right hand side
maximum iterations=2, nonzero initial guess
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1 MPI processes
type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega = 1.
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=1600200, cols=1600200
total: nonzeros=124439742, allocated nonzeros=129616200
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 533400 nodes, limit used is 5
Up solver (post-smoother) same as down solver (pre-smoother)
linear system matrix = precond matrix:
Mat Object: 1 MPI processes
type: seqaij
rows=1600200, cols=1600200
total: nonzeros=124439742, allocated nonzeros=129616200
total number of mallocs used during MatSetValues calls=0
using I-node routines: found 533400 nodes, limit used is 5
Thanks,
Matt
On Fri, Sep 3, 2021 at 10:56 AM Mark Adams <[email protected]> wrote:
> That does not seem to be an ASCII file.
>
> On Fri, Sep 3, 2021 at 10:48 AM Viktor Nazdrachev <[email protected]>
> wrote:
>
>> Hello Mark and Matthew!
>>
>>
>>
>> I attached log files for the serial and parallel cases and the corresponding
>> information about the GAMG preconditioner (extracted using grep).
>>
>> I should note that the assembly of the global stiffness matrix in the code
>> was performed with the MatSetValues subroutine (not MatSetValuesBlocked).
>>
>> !nnds – number of nodes
>>
>> !dmn=3
>>
>> call MatCreate(Petsc_Comm_World,Mat_K,ierr)
>>
>> call MatSetFromOptions(Mat_K,ierr)
>>
>> call MatSetSizes(Mat_K,Petsc_Decide,Petsc_Decide,n,n,ierr_m)
>>
>> …
>>
>> call MatMPIAIJSetPreallocation(Mat_K,0,dbw,0,obw,ierr)
>>
>> …
>>
>> call MatSetOption(Mat_K,Mat_New_Nonzero_Allocation_Err,Petsc_False,ierr)
>>
>> …
>>
>> do i=1,nels
>>
>> call FormLocalK(i,k,indx,"Kp") ! find local stiffness matrix
>>
>> indx=indxmap(indx,2) !find global indices for DOFs
>>
>> call MatSetValues(Mat_K,ef_eldof,indx,ef_eldof,indx,k,Add_Values,ierr)
>>
>> end do
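>>
>> (A minimal sketch of an alternative, reusing the variable names above: since
>> the assembly is point-wise, the nodal block size could still be declared on
>> the matrix explicitly, before preallocation, so that GAMG sees dmn = 3 DOFs
>> per node:
>>
>> call MatSetBlockSize(Mat_K,dmn,ierr) ! sketch: declare the 3-DOF nodal block size
>>
>> or the element loop could use MatSetValuesBlocked with block (node) indices.)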
>>
>>
>>
>> But the nullspace vector was created using the VecSetBlockSize subroutine.
>>
>>
>>
>> call VecCreate(Petsc_Comm_World,Vec_NullSpace,ierr)
>>
>> call VecSetBlockSize(Vec_NullSpace,dmn,ierr)
>>
>> call VecSetSizes(Vec_NullSpace,nnds*dmn,Petsc_Decide,ierr)
>>
>> call VecSetUp(Vec_NullSpace,ierr)
>>
>> call VecGetArrayF90(Vec_NullSpace,null_space,ierr)
>>
>> …
>>
>> call VecRestoreArrayF90(Vec_NullSpace,null_space,ierr)
>>
>> call MatNullSpaceCreateRigidBody(Vec_NullSpace,matnull,ierr)
>>
>> call MatSetNearNullSpace(Mat_K,matnull,ierr)
>>
>>
>>
>> I suppose this could be one of the reasons for the slow GAMG convergence.
>>
>> So I attached log files for a parallel run with a "pure" GAMG preconditioner.
>>
>>
>>
>>
>>
>> Kind regards,
>>
>>
>>
>> Viktor Nazdrachev
>>
>>
>>
>> R&D senior researcher
>>
>>
>>
>> Geosteering Technologies LLC
>>
>> On Fri, Sep 3, 2021 at 15:11, Matthew Knepley <[email protected]> wrote:
>>
>>> On Fri, Sep 3, 2021 at 8:02 AM Mark Adams <[email protected]> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Sep 3, 2021 at 1:57 AM Viktor Nazdrachev <[email protected]>
>>>> wrote:
>>>>
>>>>> Hello, Lawrence!
>>>>> Thank you for your response!
>>>>>
>>>>> I attached the log files (txt files with the convergence behavior and, in
>>>>> separate txt files, the RAM usage log) and the resulting table with the
>>>>> convergence investigation data (xls). Data for the main non-regular grid
>>>>> with 500K cells and heterogeneous properties are in the 500K folder, whereas
>>>>> data for the simple uniform 125K-cell grid with constant properties are in
>>>>> the 125K folder.
>>>>>
>>>>>
>>>>> >> On 1 Sep 2021, at 09:42, Наздрачёв Виктор <numbersixvs at gmail.com> wrote:
>>>>>
>>>>> >>
>>>>>
>>>>> >> I have a 3D elasticity problem with heterogeneous properties.
>>>>>
>>>>> >
>>>>>
>>>>> >What does your coefficient variation look like? How large is the
>>>>> >contrast?
>>>>>
>>>>>
>>>>>
>>>>> Young's modulus varies from 1 to 10 GPa, the Poisson ratio varies from 0.3
>>>>> to 0.44, and the density from 1700 to 2600 kg/m^3.
>>>>>
>>>>
>>>> That is not too bad. Poorly shaped elements are the next thing to worry
>>>> about. Try to keep the aspect ratio below 10 if possible.
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >> There is unstructured grid with aspect ratio varied from 4 to 25.
>>>>> >> Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann
>>>>> >> (traction) BCs are imposed on side faces. Gravity load is also
>>>>> >> accounted for. The grid I use consists of 500k cells (which is
>>>>> >> approximately 1.6M of DOFs).
>>>>>
>>>>> >>
>>>>>
>>>>> >> The best performance and memory usage for single MPI process was
>>>>> >> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains
>>>>> >> as preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel
>>>>> >> computation with 4 MPI processes took 2 m 46 s when using 5.6 GB of
>>>>> >> RAM. This because of number of iterations required to achieve the same
>>>>> >> tolerance is significantly increased.
>>>>>
>>>>> >
>>>>>
>>>>> >How many iterations do you have in serial (and then in parallel)?
>>>>>
>>>>>
>>>>>
>>>>> The serial run required 112 iterations to reach convergence
>>>>> (log_hpddm(bfbcg)_bjacobian_icc_1_mpi.txt); the parallel run with 4 MPI
>>>>> processes required 680 iterations.
>>>>>
>>>>>
>>>>>
>>>>> I attached log files for all simulations (txt files with the convergence
>>>>> behavior and, in separate txt files, the RAM usage log) and the resulting
>>>>> table with the convergence/memory usage data (xls). Data for the main
>>>>> non-regular grid with 500K cells and heterogeneous properties are in the
>>>>> 500K folder, whereas data for the simple uniform 125K-cell grid with
>>>>> constant properties are in the 125K folder.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >> I've also tried PCGAMG (agg) preconditioner with ICC (1)
>>>>> >> sub-precondtioner. For single MPI process, the calculation took 10 min
>>>>> >> and 3.4 GB of RAM. To improve the convergence rate, the nullspace was
>>>>> >> attached using MatNullSpaceCreateRigidBody and MatSetNearNullSpace
>>>>> >> subroutines. This has reduced calculation time to 3 m 58 s when using
>>>>> >> 4.3 GB of RAM. Also, there is peak memory usage with 14.1 GB, which
>>>>> >> appears just before the start of the iterations. Parallel computation
>>>>> >> with 4 MPI processes took 2 m 53 s when using 8.4 GB of RAM. In that
>>>>> >> case the peak memory usage is about 22 GB.
>>>>>
>>>>> >
>>>>>
>>>>> >Does the number of iterates increase in parallel? Again, how many
>>>>> >iterations do you have?
>>>>>
>>>>>
>>>>>
>>>>> For the case with 4 MPI processes and the attached nullspace, 177
>>>>> iterations were required to reach convergence (see the detailed log in
>>>>> log_hpddm(bfbcg)_gamg_nearnullspace_4_mpi.txt). For comparison, 90
>>>>> iterations were required for the sequential run
>>>>> (log_hpddm(bfbcg)_gamg_nearnullspace_1_mpi.txt).
>>>>>
>>>>>
>>>> Again, do not use ICC. I am surprised to see such a large jump in
>>>> iteration count, but get ICC off the table.
>>>>
>>>> You will see variability in the iteration count with processor count with
>>>> GAMG, as much as +/- 10%, and maybe more (random) variability, but usually
>>>> less.
>>>>
>>>> You can decrease the memory a little, and the setup time a lot, by
>>>> aggressively coarsening, at the expense of higher iteration counts. It's a
>>>> balancing act.
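>>>>
>>>> For example (a sketch; the value is only a placeholder), squaring the
>>>> graph on more levels before aggregation gives more aggressive coarsening:
>>>>
>>>>   -pc_gamg_square_graph 4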
>>>>
>>>> You can run with the defaults, add '-info', grep on GAMG and send the
>>>> ~30 lines of output if you want advice on parameters.
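>>>>
>>>> For example (a sketch; the executable name and the other options are
>>>> placeholders):
>>>>
>>>>   ./your_app <your usual options> -info | grep GAMG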
>>>>
>>>
>>> Can you send the output of
>>>
>>> -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
>>>
>>> Thanks,
>>>
>>> Matt
>>>
>>>
>>>> Thanks,
>>>> Mark
>>>>
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >> Are there ways to avoid decreasing of the convergence rate for
>>>>> >> bjacobi precondtioner in parallel mode? Does it make sense to use
>>>>> >> hierarchical or nested krylov methods with a local gmres solver
>>>>> >> (sub_pc_type gmres) and some sub-precondtioner (for example,
>>>>> >> sub_pc_type bjacobi)?
>>>>>
>>>>> >
>>>>>
>>>>> >bjacobi is only a one-level method, so you would not expect
>>>>> >process-independent convergence rate for this kind of problem. If the
>>>>> >coefficient variation is not too extreme, then I would expect GAMG (or
>>>>> >some other smoothed aggregation package, perhaps -pc_type ml (you need
>>>>> >--download-ml)) would work well with some tuning.
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the idea, but unfortunately ML cannot be compiled with 64-bit
>>>>> integers (and it is essential for us to perform computations on meshes
>>>>> with more than 10M cells).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> >If you have extremely high contrast coefficients you might need
>>>>> >something with stronger coarse grids. If you can assemble so-called
>>>>> >Neumann matrices
>>>>> >(https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS) then
>>>>> >you could try the geneo scheme offered by PCHPDDM.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I found strange convergence behavior for the HPDDM preconditioner. For 1
>>>>> MPI process the BFBCG solver did not converge
>>>>> (log_hpddm(bfbcg)_pchpddm_1_mpi.txt), while for 4 MPI processes the
>>>>> computation was successful (1018 iterations to reach convergence,
>>>>> log_hpddm(bfbcg)_pchpddm_4_mpi.txt).
>>>>>
>>>>> But it should be mentioned that the stiffness matrix was created in AIJ
>>>>> format (our default matrix format in the program).
>>>>>
>>>>> Converting the matrix to MATIS format via the MatConvert subroutine
>>>>> resulted in a loss of convergence for both the serial and the parallel run.
>>>>>
>>>>>
>>>>> >> Is this peak memory usage expected for gamg preconditioner? is there
>>>>> >> any way to reduce it?
>>>>>
>>>>> >
>>>>>
>>>>> >I think that peak memory usage comes from building the coarse grids. Can
>>>>> >you run with `-info` and grep for GAMG, this will provide some output
>>>>> >that more expert GAMG users can interpret.
>>>>>
>>>>>
>>>>>
>>>>> Thanks, I'll try to use a strong threshold only for the coarse grids.
>>>>>
>>>>>
>>>>>
>>>>> Kind regards,
>>>>>
>>>>>
>>>>>
>>>>> Viktor Nazdrachev
>>>>>
>>>>>
>>>>>
>>>>> R&D senior researcher
>>>>>
>>>>>
>>>>>
>>>>> Geosteering Technologies LLC
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Sep 1, 2021 at 12:02, Lawrence Mitchell <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> > On 1 Sep 2021, at 09:42, Наздрачёв Виктор <[email protected]>
>>>>>> wrote:
>>>>>> >
>>>>>> > I have a 3D elasticity problem with heterogeneous properties.
>>>>>>
>>>>>> What does your coefficient variation look like? How large is the
>>>>>> contrast?
>>>>>>
>>>>>> > There is unstructured grid with aspect ratio varied from 4 to 25.
>>>>>> Zero Dirichlet BCs are imposed on bottom face of mesh. Also, Neumann
>>>>>> (traction) BCs are imposed on side faces. Gravity load is also accounted
>>>>>> for. The grid I use consists of 500k cells (which is approximately 1.6M
>>>>>> of
>>>>>> DOFs).
>>>>>> >
>>>>>> > The best performance and memory usage for single MPI process was
>>>>>> obtained with HPDDM(BFBCG) solver and bjacobian + ICC (1) in subdomains
>>>>>> as
>>>>>> preconditioner, it took 1 m 45 s and RAM 5.0 GB. Parallel computation
>>>>>> with
>>>>>> 4 MPI processes took 2 m 46 s when using 5.6 GB of RAM. This because of
>>>>>> number of iterations required to achieve the same tolerance is
>>>>>> significantly increased.
>>>>>>
>>>>>> How many iterations do you have in serial (and then in parallel)?
>>>>>>
>>>>>> > I've also tried PCGAMG (agg) preconditioner with ICC (1)
>>>>>> sub-precondtioner. For single MPI process, the calculation took 10 min
>>>>>> and
>>>>>> 3.4 GB of RAM. To improve the convergence rate, the nullspace was
>>>>>> attached
>>>>>> using MatNullSpaceCreateRigidBody and MatSetNearNullSpace subroutines.
>>>>>> This has reduced calculation time to 3 m 58 s when using 4.3 GB of RAM.
>>>>>> Also, there is peak memory usage with 14.1 GB, which appears just before
>>>>>> the start of the iterations. Parallel computation with 4 MPI processes
>>>>>> took
>>>>>> 2 m 53 s when using 8.4 GB of RAM. In that case the peak memory usage is
>>>>>> about 22 GB.
>>>>>>
>>>>>> Does the number of iterates increase in parallel? Again, how many
>>>>>> iterations do you have?
>>>>>>
>>>>>> > Are there ways to avoid decreasing of the convergence rate for
>>>>>> bjacobi precondtioner in parallel mode? Does it make sense to use
>>>>>> hierarchical or nested krylov methods with a local gmres solver
>>>>>> (sub_pc_type gmres) and some sub-precondtioner (for example, sub_pc_type
>>>>>> bjacobi)?
>>>>>>
>>>>>> bjacobi is only a one-level method, so you would not expect
>>>>>> process-independent convergence rate for this kind of problem. If the
>>>>>> coefficient variation is not too extreme, then I would expect GAMG (or
>>>>>> some
>>>>>> other smoothed aggregation package, perhaps -pc_type ml (you need
>>>>>> --download-ml)) would work well with some tuning.
>>>>>>
>>>>>> If you have extremely high contrast coefficients you might need
>>>>>> something with stronger coarse grids. If you can assemble so-called
>>>>>> Neumann
>>>>>> matrices (
>>>>>> https://petsc.org/release/docs/manualpages/Mat/MATIS.html#MATIS)
>>>>>> then you could try the geneo scheme offered by PCHPDDM.
>>>>>>
>>>>>> > Is this peak memory usage expected for gamg preconditioner? is
>>>>>> there any way to reduce it?
>>>>>>
>>>>>> I think that peak memory usage comes from building the coarse grids.
>>>>>> Can you run with `-info` and grep for GAMG, this will provide some output
>>>>>> that more expert GAMG users can interpret.
>>>>>>
>>>>>> Lawrence
>>>>>>
>>>>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>>
>>
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener
https://www.cse.buffalo.edu/~knepley/