Pierre,
I've attached the dumps of the matrix + RHS for something of about 3k x 1k.
Regarding the weird divergence behaviour, I tried again at home but I still get the same results. I am running a rolling release distribution on both machines, but that really shouldn't matter for the divergence behaviour, I would think.
Is there some kind of option in PETSc to get more information about the breakdown from my side?

Best regards,
Marco

----- Original Message -----
>> From: Pierre Jolivet <pie...@joliv.et>
>> To: Marco Seiz <ma...@kit.ac.jp>
>> Cc: petsc-users@mcs.anl.gov
>> Date: 2024-05-07 18:12:18
>> Subject: Re: [petsc-users] Reasons for breakdown in preconditioned LSQR
>>
>> > On 7 May 2024, at 9:10 AM, Marco Seiz <ma...@kit.ac.jp> wrote:
>> >
>> > Thanks for the quick response!
>> >
>> > On 07.05.24 14:24, Pierre Jolivet wrote:
>> >>
>> >>> On 7 May 2024, at 7:04 AM, Marco Seiz <ma...@kit.ac.jp> wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> Something a bit different from my last question, since that didn't
>> >>> progress so well: I have a related model which generally produces a
>> >>> rectangular matrix A, so I am using LSQR to solve the system.
>> >>> The matrix A has two nonzeros (1, -1) per row, with A^T A being similar
>> >>> to a finite difference Poisson matrix whose rows have been randomly
>> >>> permuted.
>> >>> The problem is singular in that the solution is only specified up to a
>> >>> constant; my target solution is the one with weighted zero average,
>> >>> which I can handle by adding a nullspace to my matrix.
>> >>> However, I'd also like to pin (potentially many) DOFs in the future, so
>> >>> I also tried pinning a single value and afterwards subtracting the
>> >>> average from the KSP solution.
>> >>> This leads to the KSP *sometimes* diverging when I use a preconditioner.
>> >>> The target size of the matrix will be something like ([1,20] N) x N,
>> >>> with N ~ [2, 1e6], so for the higher end I will require a preconditioner
>> >>> for reasonable execution time.
>> >>>
>> >>> For a smaller example system, I set up my application to dump the input
>> >>> to the KSP when it breaks down, and I've attached a simple Python script
>> >>> + data using petsc4py to demonstrate the divergence for those specific
>> >>> systems.
>> >>> With `python3 lsdiv.py -pc_type lu -ksp_converged_reason` that
>> >>> particular system shows breakdown, but if I remove the pinned DOF and
>> >>> add the nullspace (pass -usens) it converges. I did try different PCs,
>> >>> but they tend to break down at different steps, e.g. `python3 lsdiv.py
>> >>> -usenormal -qrdiv -pc_type qr -ksp_converged_reason` shows the breakdown
>> >>> for PCQR when I use MatCreateNormal for creating the PC mat, but
>> >>> interestingly it doesn't break down when I explicitly form A^T A (don't
>> >>> pass -usenormal).
>> >>
>> >> What version are you using? All those commands are returning
>> >>   Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>> >> so I cannot reproduce any breakdown, but there have been recent changes
>> >> to KSPLSQR.
>> >
>> > For those tests I've been using PETSc 3.20.5 (last git hash was
>> > 4b82c11ab5d).
>> > I pulled the latest version from GitLab (6b3135e3cbe) and compiled it,
>> > but I had to drop --download-suitesparse=1 from my earlier config due to
>> > errors.
>> > Should I write a separate mail about this?
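For concreteness, a minimal petsc4py sketch of the two approaches described above (nullspace vs. pinning plus averaging); `A`, `b`, and the weight vector `w` are assumed to already exist, and this is a sketch, not the attached lsdiv.py:

```python
from petsc4py import PETSc

# Sketch only: A (rectangular Mat), b (RHS Vec) and w (weight Vec with
# sum(w) = 1) are assumed to exist; this is not the attached lsdiv.py.

# Route 1 (the -usens path): attach the constant nullspace so LSQR
# selects the zero-average solution directly.
ns = PETSc.NullSpace().create(constant=True, comm=A.getComm())
A.setNullSpace(ns)

ksp = PETSc.KSP().create(A.getComm())
ksp.setType(PETSc.KSP.Type.LSQR)
ksp.setOperators(A)
ksp.setFromOptions()  # e.g. -pc_type lu -ksp_converged_reason

x = A.createVecRight()
ksp.solve(b, x)

# Route 2: if instead a single DOF was pinned via an extra (dof_i = val)
# row in A, shift the solution to weighted zero average afterwards.
x.shift(-x.dot(w))
```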
>> > The LU example still behaves the same for me (`python3 lsdiv.py -pc_type
>> > lu -ksp_converged_reason` gives DIVERGED_BREAKDOWN, `python3 lsdiv.py
>> > -usens -pc_type lu -ksp_converged_reason` gives CONVERGED_RTOL_NORMAL),
>> > but the QR example fails since I had to remove SuiteSparse.
>> > petsc4py.__version__ reports 3.21.1, and if I rebuild my application,
>> > then `ldd app` gives me `libpetsc.so.3.21 =>
>> > /opt/petsc/linux-c-opt/lib/libpetsc.so.3.21`, so it should be using the
>> > newly built one.
>> > The application then still eventually yields a DIVERGED_BREAKDOWN.
>> > I don't have a ~/.petscrc and PETSC_OPTIONS is unset, so if we are on
>> > the same version and there's still a discrepancy, it is quite weird.
>>
>> Quite weird indeed…
>> $ python3 lsdiv.py -pc_type lu -ksp_converged_reason
>> Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>> $ python3 lsdiv.py -usens -pc_type lu -ksp_converged_reason
>> Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>> $ python3 lsdiv.py -pc_type qr -ksp_converged_reason
>> Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>> $ python3 lsdiv.py -usens -pc_type qr -ksp_converged_reason
>> Linear solve converged due to CONVERGED_RTOL_NORMAL iterations 1
>>
>> >>> For the moment I can work by adding the nullspace, but eventually the
>> >>> need for pinning DOFs will resurface, so I'd like to ask where the
>> >>> breakdown is coming from. What causes the breakdowns? Is this a generic
>> >>> problem occurring when adding (dof_i = val) rows to least-squares
>> >>> systems which prevents these preconditioners from being robust? If so,
>> >>> which preconditioners could be robust?
>> >>> I did a minimal sweep of the available PCs by going over the possible
>> >>> inputs of -pc_type for my application while pinning one DOF. Excepting
>> >>> unavailable PCs (not compiled in, other setup missing, ...) and those
>> >>> that did break down, I am left with (hmg jacobi mat none pbjacobi sor
>> >>> svd).
>> >>
>> >> It's unlikely any of these preconditioners will scale (or even
>> >> converge) for problems with up to 1e6 unknowns.
>> >> I could help you set up
>> >> https://epubs.siam.org/doi/abs/10.1137/21M1434891 if you are willing
>> >> to share a larger example (the current Mat are extremely tiny).
>> >
>> > Yes, that would be great. About how large of a matrix do you need? I can
>> > probably quickly get something non-artificial up to O(N) ~ 1e3,
>>
>> That's big enough.
>> If you're in luck, AMG on the normal equations won't behave too badly,
>> but I'll try some more robust (in theory) methods nonetheless.
>>
>> Thanks,
>> Pierre
>>
>> > bigger matrices will take some time since I purposefully ignored MPI
>> > previously.
>> > The matrix basically describes the contacts between particles which are
>> > resolved on a uniform grid, so the main memory hog isn't the matrix but
>> > rather resolving the particles.
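For reference, a minimal petsc4py sketch of the two ways of building the preconditioning matrix from the normal equations that the -usenormal toggle above distinguishes; `A` is an assumed, already-assembled rectangular Mat, and this is a sketch, not lsdiv.py:

```python
from petsc4py import PETSc

# Sketch only: A is an assumed, already-assembled rectangular Mat.

# Implicit normal equations: N applies A^T (A x) without assembling A^T A.
N = PETSc.Mat().createNormal(A)

# Explicit alternative (the "don't pass -usenormal" path): assemble A^T A.
# N = A.transposeMatMult(A)

ksp = PETSc.KSP().create(A.getComm())
ksp.setType(PETSc.KSP.Type.LSQR)
ksp.setOperators(A, N)  # LSQR iterates with A; the PC is built from N
ksp.setFromOptions()    # e.g. -pc_type qr or -pc_type lu
```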
>> > I should mention that the matrix changes over the course of the
>> > simulation but stays constant for many solves, i.e. hundreds to
>> > thousands of solves with variable RHS between periods of contact
>> > formation/loss.
>> >
>> >> Thanks,
>> >> Pierre
>> >>>
>> >>> Best regards,
>> >>> Marco
>> >>>
>> >>> <lsdiv.zip>
>> >
>> > Best regards,
>> > Marco
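Since the operator stays fixed over many right-hand sides, a minimal petsc4py sketch of the corresponding reuse pattern; `A` and `rhs_sequence` are assumed names, and PETSc only rebuilds the preconditioner when the operators change:

```python
from petsc4py import PETSc

# Sketch only: A and rhs_sequence (an iterable of PETSc.Vec) are assumed.
ksp = PETSc.KSP().create(A.getComm())
ksp.setType(PETSc.KSP.Type.LSQR)
ksp.setOperators(A)      # PC setup cost is amortized over all solves
ksp.setFromOptions()

x = A.createVecRight()
for b in rhs_sequence:
    ksp.solve(b, x)      # operators unchanged, so no new PC setup

# After contact formation/loss changes the matrix:
# ksp.setOperators(A_new)  # triggers a fresh PC setup on the next solve
```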
<<attachment: system.zip>>