Great, thanks
> On Jan 6, 2026, at 11:39 AM, David Knezevic <[email protected]> wrote:
> 
> Hi Barry,
> 
> We've tested with your branch, and I confirm that it resolves the issue. We now get DIVERGED_FUNCTION_DOMAIN as the converged reason (instead of DIVERGED_LINE_SEARCH).
> 
> Thanks!
> David
> 
> On Wed, Dec 24, 2025 at 11:02 PM Barry Smith <[email protected]> wrote:
>> I have started a merge request to properly propagate failure reasons up from the line search to the SNESSolve in https://gitlab.com/petsc/petsc/-/merge_requests/8914
>> Could you give it a try when you get the chance?
>> 
>>> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]> wrote:
>>> 
>>> P.S. As a test I removed the "postcheck" callback, and I still get the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I guess the "postcheck" is not related.
>>> 
>>> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]> wrote:
>>>> The print out I get from -snes_view is shown below. I wonder if the issue is related to "using user-defined postcheck step"?
>>>> 
>>>> SNES Object: 1 MPI process
>>>>   type: newtonls
>>>>   maximum iterations=5, maximum function evaluations=10000
>>>>   tolerances: relative=0., absolute=0., solution=0.
>>>>   total number of linear solver iterations=3
>>>>   total number of function evaluations=4
>>>>   norm schedule ALWAYS
>>>>   SNESLineSearch Object: 1 MPI process
>>>>     type: basic
>>>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
>>>>     maximum iterations=40
>>>>     using user-defined postcheck step
>>>>   KSP Object: 1 MPI process
>>>>     type: preonly
>>>>     maximum iterations=10000, initial guess is zero
>>>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>>>     left preconditioning
>>>>     using NONE norm type for convergence test
>>>>   PC Object: 1 MPI process
>>>>     type: cholesky
>>>>       out-of-place factorization
>>>>       tolerance for zero pivot 2.22045e-14
>>>>       matrix ordering: external
>>>>       factor fill ratio given 0., needed 0.
>>>>       Factored matrix follows:
>>>>         Mat Object: 1 MPI process
>>>>           type: mumps
>>>>           rows=1152, cols=1152
>>>>           package used to perform factorization: mumps
>>>>           total: nonzeros=126936, allocated nonzeros=126936
>>>>           MUMPS run parameters:
>>>>             Use -ksp_view ::ascii_info_detail to display information for all processes
>>>>             RINFOG(1) (global estimated flops for the elimination after analysis): 1.63461e+07
>>>>             RINFOG(2) (global estimated flops for the assembly after factorization): 74826.
>>>>             RINFOG(3) (global estimated flops for the elimination after factorization): 1.63461e+07
>>>>             (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
>>>>             INFOG(3) (estimated real workspace for factors on all processors after analysis): 150505
>>>>             INFOG(4) (estimated integer workspace for factors on all processors after analysis): 6276
>>>>             INFOG(5) (estimated maximum front size in the complete tree): 216
>>>>             INFOG(6) (number of nodes in the complete tree): 24
>>>>             INFOG(7) (ordering option effectively used after analysis): 2
>>>>             INFOG(8) (structural symmetry in percent of the permuted matrix after analysis): 100
>>>>             INFOG(9) (total real/complex workspace to store the matrix factors after factorization): 150505
>>>>             INFOG(10) (total integer space store the matrix factors after factorization): 6276
>>>>             INFOG(11) (order of largest frontal matrix after factorization): 216
>>>>             INFOG(12) (number of off-diagonal pivots): 1044
>>>>             INFOG(13) (number of delayed pivots after factorization): 0
>>>>             INFOG(14) (number of memory compress after factorization): 0
>>>>             INFOG(15) (number of steps of iterative refinement after solution): 0
>>>>             INFOG(16) (estimated size (in MB) of all MUMPS internal data for factorization after analysis: value on the most memory consuming processor): 2
>>>>             INFOG(17) (estimated size of all MUMPS internal data for factorization after analysis: sum over all processors): 2
>>>>             INFOG(18) (size of all MUMPS internal data allocated during factorization: value on the most memory consuming processor): 2
>>>>             INFOG(19) (size of all MUMPS internal data allocated during factorization: sum over all processors): 2
>>>>             INFOG(20) (estimated number of entries in the factors): 126936
>>>>             INFOG(21) (size in MB of memory effectively used during factorization - value on the most memory consuming processor): 2
>>>>             INFOG(22) (size in MB of memory effectively used during factorization - sum over all processors): 2
>>>>             INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
>>>>             INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
>>>>             INFOG(25) (after factorization: number of pivots modified by static pivoting): 0
>>>>             INFOG(28) (after factorization: number of null pivots encountered): 0
>>>>             INFOG(29) (after factorization: effective number of entries in the factors (sum over all processors)): 126936
>>>>             INFOG(30, 31) (after solution: size in Mbytes of memory used during solution phase): 2, 2
>>>>             INFOG(32) (after analysis: type of analysis done): 1
>>>>             INFOG(33) (value used for ICNTL(8)): 7
>>>>             INFOG(34) (exponent of the determinant if determinant is requested): 0
>>>>             INFOG(35) (after factorization: number of entries taking into account BLR factor compression - sum over all processors): 126936
>>>>             INFOG(36) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - value on the most memory consuming processor): 0
>>>>             INFOG(37) (after analysis: estimated size of all MUMPS internal data for running BLR in-core - sum over all processors): 0
>>>>             INFOG(38) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - value on the most memory consuming processor): 0
>>>>             INFOG(39) (after analysis: estimated size of all MUMPS internal data for running BLR out-of-core - sum over all processors): 0
>>>>     linear system matrix = precond matrix:
>>>>     Mat Object: 1 MPI process
>>>>       type: seqaij
>>>>       rows=1152, cols=1152
>>>>       total: nonzeros=60480, allocated nonzeros=60480
>>>>       total number of mallocs used during MatSetValues calls=0
>>>>         using I-node routines: found 384 nodes, limit used is 5
>>>> 
>>>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>>>> David,
>>>>> 
>>>>> This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was added long after the origins of SNES and, in places, the code was never fully updated to handle function domain problems. In particular, parts of the line search don't handle it correctly. Can you run with -snes_view? That will help us find the spot that needs to be updated.
>>>>> 
>>>>> Barry
>>>>> 
>>>>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]> wrote:
>>>>>> 
>>>>>> Hi, actually, I have a follow-up on this topic.
>>>>>> 
>>>>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the solve as expected, but it leads to a converged reason of "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged reason, so that's what I'm doing now. I was surprised by this behavior, though, since I expected that calling SNESSetFunctionDomainError would lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to check on what could be causing this.
>>>>>> 
>>>>>> FYI, I'm using PETSc 3.23.4.
>>>>>> 
>>>>>> Thanks,
>>>>>> David
>>>>>> 
>>>>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <[email protected]> wrote:
>>>>>>> Thank you very much for this guidance. I switched to using SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> David
>>>>>>> 
>>>>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>>>>> 
>>>>>>>>> I'll give some more info here which may help to clarify further. Normally we do just get a negative "converged reason", as you described. But in this specific case where I'm having issues, the solve is a numerically sensitive creep solve, which has exponential terms in the residual and Jacobian callbacks that can "blow up" and give NaN values. In this case, the root cause is that we hit a NaN value during a callback, and then we throw an exception (in libMesh C++ code), which I gather leads to the SNES solve exiting with this error code.
>>>>>>>>> 
>>>>>>>>> Is there a way to tell the SNES to terminate with a negative "converged reason" because we've encountered some issue during the callback?
>>>>>>>> 
>>>>>>>> In your callback you should call SNESSetFunctionDomainError() and make sure the function value has an infinity or NaN in it (you can call VecFlag() for this purpose).
>>>>>>>> 
>>>>>>>> Now SNESConvergedReason will be a completely reasonable SNES_DIVERGED_FUNCTION_DOMAIN.
>>>>>>>> 
>>>>>>>> Barry
>>>>>>>> 
>>>>>>>> If you are using an ancient version of PETSc (I hope you are using the latest, since that always has more bug fixes and features) that does not have SNESSetFunctionDomainError, then just make sure the function vector result has an infinity or NaN in it; SNESConvergedReason will then be SNES_DIVERGED_FNORM_NAN.
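A minimal sketch of the callback pattern described above, for readers following along. It is not code from this thread: the function name, the domain_error flag, and the elided assembly are placeholders, it assumes a PETSc recent enough to provide VecFlag(), and the exact VecFlag() argument is assumed. The commented-out SNESSetConvergedReason() line is the interim workaround David mentions for 3.23.4, prior to MR 8914.

    #include <petscsnes.h>

    /* Residual callback: on a NaN/Inf in the assembled terms, flag a domain error
       instead of throwing a C++ exception, so SNESSolve() returns normally with a
       negative converged reason. */
    static PetscErrorCode FormFunction(SNES snes, Vec x, Vec f, void *ctx)
    {
      PetscBool domain_error = PETSC_FALSE;

      PetscFunctionBeginUser;
      /* ... assemble the residual f from the (read-only) solution x;
             set domain_error = PETSC_TRUE if a NaN or Inf is produced ... */
      if (domain_error) {
        PetscCall(SNESSetFunctionDomainError(snes)); /* mark x as outside the function domain */
        PetscCall(VecFlag(f, 1));                    /* put Inf into f, per Barry's suggestion */
        /* Interim workaround David uses on 3.23.4, before MR 8914 propagates the
           failure reason up from the line search:
           PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN)); */
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }

Such a callback would be registered in the usual way with SNESSetFunction(snes, r, FormFunction, ctx).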
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi,
>>>>>>>>>>> 
>>>>>>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is complicated by that, unfortunately.
>>>>>>>>>>> 
>>>>>>>>>>> The situation is that I am not modifying the solution vector in a callback. The SNES solve has terminated with PetscErrorCode 82, and I then want to update the solution vector (reset it to the "previously converged value") and then try to solve again with a smaller load increment. This is a typical "auto load stepping" strategy in FE.
>>>>>>>>>> 
>>>>>>>>>> Once a PetscError is generated you CANNOT continue the PETSc program; it is not designed to allow this, and trying to continue will lead to further problems.
>>>>>>>>>> 
>>>>>>>>>> So what you need to do is prevent PETSc from getting to the point where an actual PetscErrorCode of 82 is generated. Normally SNESSolve() returns without generating an error even if the nonlinear solver failed (for example, did not converge). One then uses SNESGetConvergedReason() to check whether it converged or not. Normally when SNESSolve() returns, regardless of whether the converged reason is negative or positive, there will be no locked vectors, and one can modify the SNES object and call SNESSolve() again.
>>>>>>>>>> 
>>>>>>>>>> So my guess is that an actual PETSc error is being generated because SNESSetErrorIfNotConverged(snes, PETSC_TRUE) is being called by either your code or libMesh, or the option -snes_error_if_not_converged is being used. In your case, when you wish the code to keep working after a non-converged SNESSolve(), these options should never be set; instead you should check the result of SNESGetConvergedReason() to see whether SNESSolve() has failed. If SNESSetErrorIfNotConverged() is never being set, that may indicate you are using an old version of PETSc or have hit a bug inside PETSc's SNES that does not handle errors correctly, and we can help fix the problem if you can provide the full output from a debug build when the error occurs.
>>>>>>>>>> 
>>>>>>>>>> Barry
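A sketch of the driver-level check recommended above, assuming SNESSetErrorIfNotConverged() is left at its default and -snes_error_if_not_converged is not set; the function name, vector names, and the halving factor are illustrative of the auto load-stepping David describes, not code from the thread.

    #include <petscsnes.h>

    /* After SNESSolve() returns, branch on the converged reason rather than on a
       PETSc error: on failure, restore the last converged state and shrink the
       load increment; on success, accept the new state. */
    PetscErrorCode SolveLoadStep(SNES snes, Vec u, Vec u_last_converged, PetscReal *dload)
    {
      SNESConvergedReason reason;

      PetscFunctionBeginUser;
      PetscCall(SNESSolve(snes, NULL, u));
      PetscCall(SNESGetConvergedReason(snes, &reason));
      if (reason < 0) {                          /* e.g. SNES_DIVERGED_FUNCTION_DOMAIN */
        PetscCall(VecCopy(u_last_converged, u)); /* normally no locks remain once SNESSolve() returns */
        *dload *= 0.5;                           /* retry this step with a smaller increment */
      } else {
        PetscCall(VecCopy(u, u_last_converged)); /* remember the newly converged solution */
      }
      PetscFunctionReturn(PETSC_SUCCESS);
    }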
>>>>>>>>>>> I think the key piece of info I'd like to know is: at what point is the solution vector "unlocked" by the SNES object? Should it be unlocked as soon as the SNES solve has terminated with PetscErrorCode 82? It seems to me that it hasn't been unlocked yet (maybe just on a subset of the processes). Should I manually "unlock" the solution vector by calling VecLockWriteSet?
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <[email protected]> wrote:
>>>>>>>>>>>> You are not allowed to call VecGetArray on the solution vector of an SNES object within a user callback, nor to modify its values in any other way.
>>>>>>>>>>>> Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>>>>>>> It would be great if you could provide an MWE to help us understand your problem.
>>>>>>>>>>>> 
>>>>>>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via petsc-users <[email protected]> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I have a question about this error:
>>>>>>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only access in unknown_function() at unknown file:0 (line numbers only accurate to function begin)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm encountering this error in an FE solve where an error occurs during the residual/Jacobian assembly, and what we normally do in that situation is shrink the load step and continue, starting from the "last converged solution". However, in this case I'm running on 32 processes, and 5 of the processes report the error above about a "locked vector".
>>>>>>>>>>>>> 
>>>>>>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the solution to the "last converged solution", and then we make a new SNES object subsequently. But it seems to me that somehow the solution vector is still marked as "locked" on 5 of the processes when we modify the solution vector, which leads to the error above.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I was wondering if someone could advise on the best way to handle this? I thought one option could be to add an MPI barrier call prior to updating the solution vector to the "last converged solution", to make sure that the SNES object is destroyed on all procs (and hence the locks cleared) before editing the solution vector, but I'm unsure if that would make a difference. Any help would be most appreciated!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> David
>>>>>>>>>>>> 
>>>>>>>>>>>> --
>>>>>>>>>>>> Stefano
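Finally, a small illustration of the read-only access Stefano describes (again a sketch, not from the thread, with an illustrative function name): inside a callback, VecGetArrayRead() respects the read lock on the solution vector, whereas VecGetArray() on the same vector triggers exactly the "locked for read-only access" error quoted in David's original message.

    #include <petscvec.h>

    /* Read-only inspection of the (locked) solution vector inside a callback. */
    static PetscErrorCode InspectSolution(Vec x)
    {
      const PetscScalar *xa;
      PetscInt           n;

      PetscFunctionBeginUser;
      PetscCall(VecGetLocalSize(x, &n));
      PetscCall(VecGetArrayRead(x, &xa));  /* allowed: does not require write access */
      /* ... inspect xa[0..n-1], e.g. to detect NaNs early; do not modify ... */
      PetscCall(VecRestoreArrayRead(x, &xa));
      PetscFunctionReturn(PETSC_SUCCESS);
    }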
