OK, thanks! I'll let you know once I get a chance to try it out.
On Wed, Dec 24, 2025 at 10:02 PM Barry Smith <[email protected]> wrote:

> I have started a merge request to properly propagate failure reasons up
> from the line search to the SNESSolve in
> https://gitlab.com/petsc/petsc/-/merge_requests/8914
> Could you give it a try when you get the chance?
>
>
> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]> wrote:
>
> P.S. As a test I removed the "postcheck" callback, and I still get
> the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I
> guess the "postcheck" is not related.
>
>
> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:
>
>> The printout I get from -snes_view is shown below. I wonder if the issue
>> is related to "using user-defined postcheck step"?
>>
>> SNES Object: 1 MPI process
>>   type: newtonls
>>   maximum iterations=5, maximum function evaluations=10000
>>   tolerances: relative=0., absolute=0., solution=0.
>>   total number of linear solver iterations=3
>>   total number of function evaluations=4
>>   norm schedule ALWAYS
>>   SNESLineSearch Object: 1 MPI process
>>     type: basic
>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>     maximum iterations=40
>>     using user-defined postcheck step
>>   KSP Object: 1 MPI process
>>     type: preonly
>>     maximum iterations=10000, initial guess is zero
>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>     left preconditioning
>>     using NONE norm type for convergence test
>>   PC Object: 1 MPI process
>>     type: cholesky
>>       out-of-place factorization
>>       tolerance for zero pivot 2.22045e-14
>>       matrix ordering: external
>>       factor fill ratio given 0., needed 0.
>>       Factored matrix follows:
>>         Mat Object: 1 MPI process
>>           type: mumps
>>           rows=1152, cols=1152
>>           package used to perform factorization: mumps
>>           total: nonzeros=126936, allocated nonzeros=126936
>>           MUMPS run parameters:
>>             Use -ksp_view ::ascii_info_detail to display information for all processes
>>             RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>             RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>             RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>             (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>             INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>             INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>             INFOG(5) (estimated maximum front size in the complete tree): 216
>>             INFOG(6) (number of nodes in the complete tree): 24
>>             INFOG(7) (ordering option effectively used after analysis): 2
>>             INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>             INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>             INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>             INFOG(11) (order of largest frontal matrix after factorization): 216
>>             INFOG(12) (number of off-diagonal pivots): 1044
>>             INFOG(13) (number of delayed pivots after factorization): 0
>>             INFOG(14) (number of memory compress after factorization): 0
>>             INFOG(15) (number of steps of iterative refinement after solution): 0
>>             INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>             INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>             INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>             INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>             INFOG(20) (estimated number of entries in the factors): 126936
>>             INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>             INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>             INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>             INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>             INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>             INFOG(28) (after factorization: number of null pivots encountered): 0
>>             INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>             INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>             INFOG(32) (after analysis: type of analysis done): 1
>>             INFOG(33) (value used for ICNTL(8)): 7
>>             INFOG(34) (exponent of the determinant if determinant is requested): 0
>>             INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>             INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>             INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>             INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>             INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>   linear system matrix = precond matrix:
>>   Mat Object: 1 MPI process
>>     type: seqaij
>>     rows=1152, cols=1152
>>     total: nonzeros=60480, allocated nonzeros=60480
>>     total number of mallocs used during MatSetValues calls=0
>>       using I-node routines: found 384 nodes, limit used is 5
>>
>>
>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>
>>> David,
>>>
>>> This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was
>>> added long after the origins of SNES and, in places, the code was never
>>> fully updated to handle function domain problems. In particular, parts of
>>> the line search don't handle it correctly. Can you run with -snes_view?
>>> That will help us find the spot that needs to be updated.
>>>
>>> Barry
>>>
>>>
>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>>
>>> Hi, actually, I have a follow-up on this topic.
>>>
>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the
>>> solve as expected, but it leads to a converged reason
>>> "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also
>>> set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the
>>> callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged
>>> reason, so that's what I'm doing now. I was surprised by this behavior,
>>> though, since I expected that calling SNESSetFunctionDomainError would
>>> lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to
>>> check on what could be causing this.
>>>
>>> FYI, I'm using PETSc 3.23.4.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>>
>>>> Thank you very much for this guidance. I switched to use
>>>> SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>
>>>> Thanks!
>>>> David
>>>>
>>>>
>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>>
>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>
>>>>> I'll give some more info here which may help to clarify further.
>>>>> Normally we do just get a negative "converged reason", as you described.
>>>>> But in this specific case where I'm having issues, the solve is a
>>>>> numerically sensitive creep solve, which has exponential terms in the
>>>>> residual and Jacobian callback that can "blow up" and give NaN values. In
>>>>> this case, the root cause is that we hit a NaN value during a callback, and
>>>>> then we throw an exception (in libMesh C++ code), which I gather leads to
>>>>> the SNES solve exiting with this error code.
>>>>>
>>>>> Is there a way to tell the SNES to terminate with a negative
>>>>> "converged reason" because we've encountered some issue during the
>>>>> callback?
>>>>>
>>>>>
>>>>> In your callback you should call SNESSetFunctionDomainError() and
>>>>> make sure the function value has an infinity or NaN in it (you can call
>>>>> VecFlag() for this purpose).
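As a rough illustration of this advice (not code from the thread: the callback name FormResidual, the NaN check via VecNorm(), and flagging the residual with VecSet() and PETSC_INFINITY rather than VecFlag() are all assumptions), a residual callback could look like:

  #include <petscsnes.h>

  /* Illustrative residual callback: if assembly produced a NaN/Inf, mark the
     function-domain error instead of throwing a C++ exception. */
  static PetscErrorCode FormResidual(SNES snes, Vec x, Vec f, void *ctx)
  {
    PetscReal fnorm;

    PetscFunctionBeginUser;
    /* x is read-only inside the callback; use VecGetArrayRead() if its
       entries are needed (see Stefano's note further down the thread). */
    /* ... assemble the residual f(x) here (application-specific) ... */
    PetscCall(VecNorm(f, NORM_2, &fnorm));
    if (PetscIsInfOrNanReal(fnorm)) {
      PetscCall(SNESSetFunctionDomainError(snes)); /* flag the domain error       */
      PetscCall(VecSet(f, PETSC_INFINITY));        /* ensure f carries an Inf/NaN */
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }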
>>>>> Now SNESConvergedReason will be a completely reasonable
>>>>> SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>
>>>>> Barry
>>>>>
>>>>> If you are using an ancient version of PETSc (I hope you are using the
>>>>> latest, since that always has more bug fixes and features) that does not
>>>>> have SNESSetFunctionDomainError, then just make sure the function vector
>>>>> result has an infinity or NaN in it, and then SNESConvergedReason will be
>>>>> SNES_DIVERGED_FNORM_NAN.
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>
>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is
>>>>>> complicated by that, unfortunately.
>>>>>>
>>>>>> The situation is that I am not modifying the solution vector in a
>>>>>> callback. The SNES solve has terminated with PetscErrorCode 82, and I then
>>>>>> want to update the solution vector (reset it to the "previously converged
>>>>>> value") and then try to solve again with a smaller load increment. This is
>>>>>> a typical "auto load stepping" strategy in FE.
>>>>>>
>>>>>>
>>>>>> Once a PetscError is generated you CANNOT continue the PETSc
>>>>>> program; it is not designed to allow this, and trying to continue will lead
>>>>>> to further problems.
>>>>>>
>>>>>> So what you need to do is prevent PETSc from getting to the point
>>>>>> where an actual PetscErrorCode of 82 is generated. Normally SNESSolve()
>>>>>> returns without generating an error even if the nonlinear solver failed
>>>>>> (for example, did not converge). One then uses SNESGetConvergedReason() to
>>>>>> check whether it converged or not. Normally when SNESSolve() returns,
>>>>>> regardless of whether the converged reason is negative or positive, there
>>>>>> will be no locked vectors and one can modify the SNES object and call
>>>>>> SNESSolve() again.
>>>>>>
>>>>>> So my guess is that an actual PETSc error is being generated
>>>>>> because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by
>>>>>> either your code or libMesh, or the option -snes_error_if_not_converged is
>>>>>> being used. In your case, when you wish the code to keep working after a
>>>>>> non-converged SNESSolve(), these options should never be set; instead you
>>>>>> should check the result of SNESGetConvergedReason() to see whether
>>>>>> SNESSolve() has failed. If SNESSetErrorIfNotConverged() is never being set,
>>>>>> that may indicate you are using an old version of PETSc, or have hit a bug
>>>>>> inside PETSc's SNES that does not handle errors correctly, and we can help
>>>>>> fix the problem if you can provide full debug output from when the error
>>>>>> occurs.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>
>>>>>> I think the key piece of info I'd like to know is, at what point is
>>>>>> the solution vector "unlocked" by the SNES object? Should it be unlocked as
>>>>>> soon as the SNES solve has terminated with PetscErrorCode 82? Since it
>>>>>> seems to me that it hasn't been unlocked yet (maybe just on a subset of the
>>>>>> processes). Should I manually "unlock" the solution vector by
>>>>>> calling VecLockWriteSet?
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>>
>>>>>>> You are not allowed to call VecGetArray on the solution vector of an
>>>>>>> SNES object within a user callback, nor to modify its values in any other
>>>>>>> way. Put in C++ terms, the solution vector is a "const" argument.
>>>>>>> It would be great if you could provide an MWE to help us understand
>>>>>>> your problem.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via
>>>>>>> petsc-users <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a question about this error:
>>>>>>>>
>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only
>>>>>>>>> access in unknown_function() at unknown file:0 (line numbers only accurate
>>>>>>>>> to function begin)
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm encountering this error in an FE solve where an error is
>>>>>>>> encountered during the residual/Jacobian assembly, and what we normally do
>>>>>>>> in that situation is shrink the load step and continue, starting from the
>>>>>>>> "last converged solution". However, in this case I'm running on 32
>>>>>>>> processes, and 5 of the processes report the error above about a "locked
>>>>>>>> vector".
>>>>>>>>
>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the
>>>>>>>> solution to the "last converged solution", and then we make a new SNES
>>>>>>>> object subsequently. But it seems to me that somehow the solution vector is
>>>>>>>> still marked as "locked" on 5 of the processes when we modify the solution
>>>>>>>> vector, which leads to the error above.
>>>>>>>>
>>>>>>>> I was wondering if someone could advise on what the best way to
>>>>>>>> handle this would be? I thought one option could be to add an MPI barrier
>>>>>>>> call prior to updating the solution vector to the "last converged solution",
>>>>>>>> to make sure that the SNES object is destroyed on all procs (and hence the
>>>>>>>> locks cleared) before editing the solution vector, but I'm unsure if that
>>>>>>>> would make a difference. Any help would be most appreciated!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Stefano
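Putting Barry's advice above together with the auto load-stepping strategy David describes, a minimal driver sketch could look as follows. The loop structure and the helpers ApplyLoad() and RestoreLastConvergedSolution() are hypothetical application code, not PETSc or libMesh API; the essential points are to leave SNESSetErrorIfNotConverged()/-snes_error_if_not_converged unset and to branch on SNESGetConvergedReason() rather than letting a PETSc error abort the run.

  #include <petscsnes.h>

  /* Hypothetical application helpers, assumed to exist for illustration only. */
  extern PetscErrorCode ApplyLoad(PetscReal load);
  extern PetscErrorCode RestoreLastConvergedSolution(Vec x);

  /* Sketch: advance to the full load, halving the increment when a solve fails. */
  static PetscErrorCode SolveWithLoadStepping(SNES snes, Vec x)
  {
    PetscReal           load = 0.0, dload = 0.1;
    SNESConvergedReason reason;

    PetscFunctionBeginUser;
    while (load < 1.0) {
      PetscCall(ApplyLoad(load + dload));           /* hypothetical: set loads/BCs for this step   */
      PetscCall(SNESSolve(snes, NULL, x));          /* returns normally even if the solve diverges */
      PetscCall(SNESGetConvergedReason(snes, &reason));
      if (reason > 0) {
        load += dload;                              /* converged: accept the step */
      } else {
        /* e.g. SNES_DIVERGED_FUNCTION_DOMAIN reported from the residual callback */
        PetscCall(RestoreLastConvergedSolution(x)); /* hypothetical: reset x to the last good state */
        dload *= 0.5;                               /* shrink the increment and retry */
        if (dload < 1e-8) break;                    /* give up rather than loop forever */
      }
    }
    PetscFunctionReturn(PETSC_SUCCESS);
  }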
