Great, thanks
> On Jan 6, 2026, at 11:39 AM, David Knezevic <[email protected]> wrote:
> 
> Hi Barry,
> 
> We've tested with your branch, and I confirm that it resolves the issue. We now get DIVERGED_FUNCTION_DOMAIN as the converged reason (instead of DIVERGED_LINE_SEARCH).
> 
> Thanks!
> David
> 
> On Wed, Dec 24, 2025 at 11:02 PM Barry Smith <[email protected]> wrote:
>> I have started a merge request to properly propagate failure reasons up from the line search to the SNESSolve in https://gitlab.com/petsc/petsc/-/merge_requests/8914
>> Could you give it a try when you get the chance?
>> 
>>> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]> wrote:
>>> 
>>> P.S. As a test I removed the "postcheck" callback, and I still get the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the "postcheck" is not related.
>>> 
>>> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:
>>>> The print out I get from -snes_view is shown below. I wonder if the issue is related to "using user-defined postcheck step"?
>>>> 
>>>> SNES Object: 1 MPI process
>>>>   type: newtonls
>>>>   maximum iterations=5, maximum function evaluations=10000
>>>>   tolerances: relative=0., absolute=0., solution=0.
>>>>   total number of linear solver iterations=3
>>>>   total number of function evaluations=4
>>>>   norm schedule ALWAYS
>>>>   SNESLineSearch Object: 1 MPI process
>>>>     type: basic
>>>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>>>     maximum iterations=40
>>>>     using user-defined postcheck step
>>>>   KSP Object: 1 MPI process
>>>>     type: preonly
>>>>     maximum iterations=10000, initial guess is zero
>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>     left preconditioning
>>>>     using NONE norm type for convergence test
>>>>   PC Object: 1 MPI process
>>>>     type: cholesky
>>>>       out-of-place factorization
>>>>       tolerance for zero pivot 2.22045e-14
>>>>       matrix ordering: external
>>>>       factor fill ratio given 0., needed 0.
>>>>       Factored matrix follows:
>>>>         Mat Object: 1 MPI process
>>>>           type: mumps
>>>>           rows=1152, cols=1152
>>>>           package used to perform factorization: mumps
>>>>           total: nonzeros=126936, allocated nonzeros=126936
>>>>           MUMPS run parameters:
>>>>             Use -ksp_view ::ascii_info_detail to display information for all processes
>>>>             RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>>>             RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>>>             RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>>>             (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>>             INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>>>             INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>>>             INFOG(5) (estimated maximum front size in the complete tree): 216
>>>>             INFOG(6) (number of nodes in the complete tree): 24
>>>>             INFOG(7) (ordering option effectively used after analysis): 2
>>>>             INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>>             INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>>>             INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>>>             INFOG(11) (order of largest frontal matrix after factorization): 216
>>>>             INFOG(12) (number of off-diagonal pivots): 1044
>>>>             INFOG(13) (number of delayed pivots after factorization): 0
>>>>             INFOG(14) (number of memory compress after factorization): 0
>>>>             INFOG(15) (number of steps of iterative refinement after solution): 0
>>>>             INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>>>             INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>>>             INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>>>             INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>>>             INFOG(20) (estimated number of entries in the factors): 126936
>>>>             INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>>>             INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>>>             INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>>             INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>>             INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>>             INFOG(28) (after factorization: number of null pivots encountered): 0
>>>>             INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>>>             INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>>>             INFOG(32) (after analysis: type of analysis done): 1
>>>>             INFOG(33) (value used for ICNTL(8)): 7
>>>>             INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>>             INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>>>             INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>>>             INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>>>             INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>>>             INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>>>     linear system matrix = precond matrix:
>>>>     Mat Object: 1 MPI process
>>>>       type: seqaij
>>>>       rows=1152, cols=1152
>>>>       total: nonzeros=60480, allocated nonzeros=60480
>>>>       total number of mallocs used during MatSetValues calls=0
>>>>         using I-node routines: found 384 nodes, limit used is 5
>>>> 
>>>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>>>> David,
>>>>> 
>>>>> This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was added long after the origins of SNES and, in places, the code was never fully updated to handle function domain problems. In particular, parts of the line search don't handle it correctly. Can you run with -snes_view? That will help us find the spot that needs to be updated.
>>>>> 
>>>>> Barry
>>>>> 
>>>>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>>>>> 
>>>>>> Hi, actually, I have a follow-up on this topic.
>>>>>> 
>>>>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the solve as expected, but it leads to a converged reason of "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged reason, so that's what I'm doing now. I was surprised by this behavior, though, since I expected that calling SNESSetFunctionDomainError would lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to check on what could be causing this.
>>>>>> 
>>>>>> FYI, I'm using PETSc 3.23.4.
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>>> 
>>>>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>>>>>> Thank you very much for this guidance. I switched to using SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> David
>>>>>>> 
>>>>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>>>>> 
>>>>>>>>> I'll give some more info here which may help to clarify further. Normally we do just get a negative "converged reason", as you described. But in this specific case where I'm having issues, the solve is a numerically sensitive creep solve, which has exponential terms in the residual and Jacobian callbacks that can "blow up" and give NaN values. In this case, the root cause is that we hit a NaN value during a callback, and then we throw an exception (in libMesh C++ code), which I gather leads to the SNES solve exiting with this error code.
>>>>>>>>> 
>>>>>>>>> Is there a way to tell the SNES to terminate with a negative "converged reason" because we've encountered some issue during the callback?
>>>>>>>> 
>>>>>>>> In your callback you should call SNESSetFunctionDomainError() and make sure the function value has an infinity or NaN in it (you can call VecFlag() for this purpose).
>>>>>>>> 
>>>>>>>> Now SNESConvergedReason will be a completely reasonable SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>>>> 
>>>>>>>> Barry
>>>>>>>> 
>>>>>>>> If you are using an ancient version of PETSc (I hope you are using the latest, since that always has more bug fixes and features) that does not have SNESSetFunctionDomainError, then just make sure the function vector result has an infinity or NaN in it; SNESConvergedReason will then be SNES_DIVERGED_FNORM_NAN.
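A minimal sketch of the callback pattern described above, for readers following along. It is not code from this thread: the function name, the domain_error flag, and the elided assembly are placeholders, it assumes a PETSc recent enough to provide VecFlag(), and the exact VecFlag() argument is assumed. The commented-out SNESSetConvergedReason() line is the interim workaround David mentions for 3.23.4, prior to MR 8914.

    #include <petscsnes.h>

    /* Residual callback: on a NaN/Inf in the assembled terms, flag a domain error
       instead of throwing a C++ exception, so SNESSolve() returns normally with a
       negative converged reason. */
    static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
    {
      PetscBool domain_error = PETSC_FALSE;

      PetscFunctionBeginUser;
      /* ... assemble the residual f from the (read-only) solution x;
             set domain_error = PETSC_TRUE if a NaN or Inf is produced ... */
      if (domain_error) {
        PetscCall(SNESSetFunctionDomainError(snes)); /* mark x as outside the function domain */
        PetscCall(VecFlag(f, 1));                    /* put Inf into f, per Barry's suggestion */
        /* Interim workaround David uses on 3.23.4, before MR 8914 propagates the
           failure reason up from the line search:
           PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN)); */
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Such a callback would be registered in the usual way with SNESSetFunction(snes, r, FormFunction, ctx).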
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is complicated by that, unfortunately.
>>>>>>>>>>> 
>>>>>>>>>>> The situation is that I am not modifying the solution vector in a callback. The SNES solve has terminated with PetscErrorCode 82, and I then want to update the solution vector (reset it to the "previously converged value") and then try to solve again with a smaller load increment. This is a typical "auto load stepping" strategy in FE.
>>>>>>>>>> 
>>>>>>>>>> Once a PetscError is generated you CANNOT continue the PETSc program; it is not designed to allow this, and trying to continue will lead to further problems.
>>>>>>>>>> 
>>>>>>>>>> So what you need to do is prevent PETSc from getting to the point where an actual PetscErrorCode of 82 is generated. Normally SNESSolve() returns without generating an error even if the nonlinear solver failed (for example, did not converge). One then uses SNESGetConvergedReason() to check whether it converged or not. Normally when SNESSolve() returns, regardless of whether the converged reason is negative or positive, there will be no locked vectors, and one can modify the SNES object and call SNESSolve() again.
>>>>>>>>>> 
>>>>>>>>>> So my guess is that an actual PETSc error is being generated because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by either your code or libMesh, or the option -snes_error_if_not_converged is being used. In your case, when you wish the code to keep working after a non-converged SNESSolve(), these options should never be set; instead you should check the result of SNESGetConvergedReason() to see whether SNESSolve() has failed. If SNESSetErrorIfNotConverged() is never being set, that may indicate you are using an old version of PETSc or have hit a bug inside PETSc's SNES that does not handle errors correctly, and we can help fix the problem if you can provide the full output from a debug build when the error occurs.
>>>>>>>>>> 
>>>>>>>>>> Barry
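A sketch of the driver-level check recommended above, assuming SNESSetErrorIfNotConverged() is left at its default and -snes_error_if_not_converged is not set; the function name, vector names, and the halving factor are illustrative of the auto load-stepping David describes, not code from the thread.

    #include <petscsnes.h>

    /* After SNESSolve() returns, branch on the converged reason rather than on a
       PETSc error: on failure, restore the last converged state and shrink the
       load increment; on success, accept the new state. */
    PetscErrorCode SolveLoadStep(SNES snes, Vec u, Vec u_last_converged, PetscReal *dload)
    {
      SNESConvergedReason reason;

      PetscFunctionBeginUser;
      PetscCall(SNESSolve(snes, NULL, u));
      PetscCall(SNESGetConvergedReason(snes, &reason));
      if (reason < 0) {                          /* e.g. SNES_DIVERGED_FUNCTION_DOMAIN */
        PetscCall(VecCopy(u_last_converged, u)); /* normally no locks remain once SNESSolve() returns */
        *dload *= 0.5;                           /* retry this step with a smaller increment */
      } else {
        PetscCall(VecCopy(u, u_last_converged)); /* remember the newly converged solution */
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }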
>>>>>>>>>>> I think the key piece of info I'd like to know is: at what point is the solution vector "unlocked" by the SNES object? Should it be unlocked as soon as the SNES solve has terminated with PetscErrorCode 82? It seems to me that it hasn't been unlocked yet (maybe just on a subset of the processes). Should I manually "unlock" the solution vector by calling VecLockWriteSet?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>>>>>>>> You are not allowed to call VecGetArray on the solution vector of an SNES object within a user callback, nor to modify its values in any other way.
>>>>>>>>>>>> Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>>>>>>> It would be great if you could provide an MWE to help us understand your problem.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have a question about this error:
>>>>>>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only access in unknown_function() at unknown file:0 (line numbers only accurate to function begin)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm encountering this error in an FE solve where an error occurs during the residual/Jacobian assembly, and what we normally do in that situation is shrink the load step and continue, starting from the "last converged solution". However, in this case I'm running on 32 processes, and 5 of the processes report the error above about a "locked vector".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the solution to the "last converged solution", and then we make a new SNES object subsequently. But it seems to me that somehow the solution vector is still marked as "locked" on 5 of the processes when we modify the solution vector, which leads to the error above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I was wondering if someone could advise on the best way to handle this? I thought one option could be to add an MPI barrier call prior to updating the solution vector to the "last converged solution", to make sure that the SNES object is destroyed on all procs (and hence the locks cleared) before editing the solution vector, but I'm unsure if that would make a difference. Any help would be most appreciated!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Stefano
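Finally, a small illustration of the read-only access Stefano describes (again a sketch, not from the thread, with an illustrative function name): inside a callback, VecGetArrayRead() respects the read lock on the solution vector, whereas VecGetArray() on the same vector triggers exactly the "locked for read-only access" error quoted in David's original message.

    #include <petscvec.h>

    /* Read-only inspection of the (locked) solution vector inside a callback. */
    static PetscErrorCode InspectSolution(Vec x)
    {
      const PetscScalar *xa;
      PetscInt           n;

      PetscFunctionBeginUser;
      PetscCall(VecGetLocalSize(x, &n));
      PetscCall(VecGetArrayRead(x, &xa));  /* allowed: does not require write access */
      /* ... inspect xa[0..n-1], e.g. to detect NaNs early; do not modify ... */
      PetscCall(VecRestoreArrayRead(x, &xa));
      PetscFunctionReturn(PETSC_SUCCESS);
    }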
