OK, thanks! I'll let you know once I get a chance to try it out.

On Wed, Dec 24, 2025 at 10:02 PM Barry Smith <[email protected]> wrote:

>    I have started a merge request to properly propagate failure reasons up
> from the line search to the SNESSolve in
> https://gitlab.com/petsc/petsc/-/merge_requests/8914
>   Could you give it a
> try when you get the chance?
>
>
> On Dec 22, 2025, at 3:03 PM, David Knezevic <[email protected]>
> wrote:
>
> P.S. As a test I removed the "postcheck" callback, and I still get
> the same behavior with the DIVERGED_LINE_SEARCH converged reason, so I
> guess the "postcheck" is not related.
>
>
> On Mon, Dec 22, 2025 at 1:58 PM David Knezevic <[email protected]>
> wrote:
>
>> The print out I get from -snes_view is shown below. I wonder if the issue
>> is related to "using user-defined postcheck step"?
>>
>>
>> SNES Object: 1 MPI process
>>   type: newtonls
>>   maximum iterations=5, maximum function evaluations=10000
>>   tolerances: relative=0., absolute=0., solution=0.
>>   total number of linear solver iterations=3
>>   total number of function evaluations=4
>>   norm schedule ALWAYS
>>   SNESLineSearch Object: 1 MPI process
>>     type: basic
>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>> lambda=1.000000e-08
>>     maximum iterations=40
>>     using user-defined postcheck step
>>   KSP Object: 1 MPI process
>>     type: preonly
>>     maximum iterations=10000, initial guess is zero
>>     tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
>>     left preconditioning
>>     using NONE norm type for convergence test
>>   PC Object: 1 MPI process
>>     type: cholesky
>>       out-of-place factorization
>>       tolerance for zero pivot 2.22045e-14
>>       matrix ordering: external
>>       factor fill ratio given 0., needed 0.
>>         Factored matrix follows:
>>           Mat Object: 1 MPI process
>>             type: mumps
>>             rows=1152, cols=1152
>>             package used to perform factorization: mumps
>>             total: nonzeros=126936, allocated nonzeros=126936
>>               MUMPS run parameters:
>>                 Use -ksp_view ::ascii_info_detail to display information
>> for all processes
>>                 RINFOG(1) (global estimated flops for the elimination
>> after analysis): 1.63461e+07
>>                 RINFOG(2) (global estimated flops for the assembly after
>> factorization): 74826.
>>                 RINFOG(3) (global estimated flops for the elimination
>> after factorization): 1.63461e+07
>>                 (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant):
>> (0.,0.)*(2^0)
>>                 INFOG(3) (estimated real workspace for factors on all
>> processors after analysis): 150505
>>                 INFOG(4) (estimated integer workspace for factors on all
>> processors after analysis): 6276
>>                 INFOG(5) (estimated maximum front size in the complete
>> tree): 216
>>                 INFOG(6) (number of nodes in the complete tree): 24
>>                 INFOG(7) (ordering option effectively used after
>> analysis): 2
>>                 INFOG(8) (structural symmetry in percent of the permuted
>> matrix after analysis): 100
>>                 INFOG(9) (total real/complex workspace to store the
>> matrix factors after factorization): 150505
>>                 INFOG(10) (total integer space store the matrix factors
>> after factorization): 6276
>>                 INFOG(11) (order of largest frontal matrix after
>> factorization): 216
>>                 INFOG(12) (number of off-diagonal pivots): 1044
>>                 INFOG(13) (number of delayed pivots after factorization):
>> 0
>>                 INFOG(14) (number of memory compress after
>> factorization): 0
>>                 INFOG(15) (number of steps of iterative refinement after
>> solution): 0
>>                 INFOG(16) (estimated size (in MB) of all MUMPS internal
>> data for factorization after analysis: value on the most memory consuming
>> processor): 2
>>                 INFOG(17) (estimated size of all MUMPS internal data for
>> factorization after analysis: sum over all processors): 2
>>                 INFOG(18) (size of all MUMPS internal data allocated
>> during factorization: value on the most memory consuming processor): 2
>>                 INFOG(19) (size of all MUMPS internal data allocated
>> during factorization: sum over all processors): 2
>>                 INFOG(20) (estimated number of entries in the factors):
>> 126936
>>                 INFOG(21) (size in MB of memory effectively used during
>> factorization - value on the most memory consuming processor): 2
>>                 INFOG(22) (size in MB of memory effectively used during
>> factorization - sum over all processors): 2
>>                 INFOG(23) (after analysis: value of ICNTL(6) effectively
>> used): 0
>>                 INFOG(24) (after analysis: value of ICNTL(12) effectively
>> used): 1
>>                 INFOG(25) (after factorization: number of pivots modified
>> by static pivoting): 0
>>                 INFOG(28) (after factorization: number of null pivots
>> encountered): 0
>>                 INFOG(29) (after factorization: effective number of
>> entries in the factors (sum over all processors)): 126936
>>                 INFOG(30, 31) (after solution: size in Mbytes of memory
>> used during solution phase): 2, 2
>>                 INFOG(32) (after analysis: type of analysis done): 1
>>                 INFOG(33) (value used for ICNTL(8)): 7
>>                 INFOG(34) (exponent of the determinant if determinant is
>> requested): 0
>>                 INFOG(35) (after factorization: number of entries taking
>> into account BLR factor compression - sum over all processors): 126936
>>                 INFOG(36) (after analysis: estimated size of all MUMPS
>> internal data for running BLR in-core - value on the most memory consuming
>> processor): 0
>>                 INFOG(37) (after analysis: estimated size of all MUMPS
>> internal data for running BLR in-core - sum over all processors): 0
>>                 INFOG(38) (after analysis: estimated size of all MUMPS
>> internal data for running BLR out-of-core - value on the most memory
>> consuming processor): 0
>>                 INFOG(39) (after analysis: estimated size of all MUMPS
>> internal data for running BLR out-of-core - sum over all processors): 0
>>     linear system matrix = precond matrix:
>>     Mat Object: 1 MPI process
>>       type: seqaij
>>       rows=1152, cols=1152
>>       total: nonzeros=60480, allocated nonzeros=60480
>>       total number of mallocs used during MatSetValues calls=0
>>         using I-node routines: found 384 nodes, limit used is 5
>>
>>
>>
>> On Mon, Dec 22, 2025 at 9:25 AM Barry Smith <[email protected]> wrote:
>>
>>>   David,
>>>
>>>     This is due to a software glitch. SNES_DIVERGED_FUNCTION_DOMAIN was
>>> added long after the origins of SNES and, in places, the code was never
>>> fully updated to handle function domain problems. In particular, parts of
>>> the line search don't handle it correctly. Can you run with -snes_view?
>>> That will help us find the spot that needs to be updated.
>>>
>>>    Barry
>>>
>>>
>>> On Dec 21, 2025, at 5:53 PM, David Knezevic <[email protected]>
>>> wrote:
>>>
>>> Hi, actually, I have a follow up on this topic.
>>>
>>> I noticed that when I call SNESSetFunctionDomainError(), it exits the
>>> solve as expected, but it leads to a converged reason
>>> "DIVERGED_LINE_SEARCH" instead of "DIVERGED_FUNCTION_DOMAIN". If I also
>>> set SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN) in the
>>> callback, then I get the expected SNES_DIVERGED_FUNCTION_DOMAIN converged
>>> reason, so that's what I'm doing now. I was surprised by this behavior,
>>> though, since I expected that calling SNESSetFunctionDomainError would
>>> lead to the DIVERGED_FUNCTION_DOMAIN converged reason, so I just wanted to
>>> check on what could be causing this.
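>>>
>>> For reference, the workaround amounts to something like this inside the
>>> residual callback ("found_nan" is just a placeholder for however our
>>> assembly detects the bad value):
>>>
>>>   if (found_nan) {
>>>     PetscCall(SNESSetFunctionDomainError(snes));
>>>     /* workaround: without this, the solve reports DIVERGED_LINE_SEARCH */
>>>     PetscCall(SNESSetConvergedReason(snes, SNES_DIVERGED_FUNCTION_DOMAIN));
>>>   }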
>>>
>>> FYI, I'm using PETSc 3.23.4
>>>
>>> Thanks,
>>> David
>>>
>>>
>>> On Thu, Dec 18, 2025 at 8:10 AM David Knezevic <
>>> [email protected]> wrote:
>>>
>>>> Thank you very much for this guidance. I switched to use
>>>> SNES_DIVERGED_FUNCTION_DOMAIN, and I don't get any errors now.
>>>>
>>>> Thanks!
>>>> David
>>>>
>>>>
>>>> On Wed, Dec 17, 2025 at 3:43 PM Barry Smith <[email protected]> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Dec 17, 2025, at 2:47 PM, David Knezevic <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Stefano and Barry: Thank you, this is very helpful.
>>>>>
>>>>> I'll give some more info here which may help to clarify further.
>>>>> Normally we do just get a negative "converged reason", as you described.
>>>>> But in this specific case where I'm having issues, the solve is a
>>>>> numerically sensitive creep solve, which has exponential terms in the
>>>>> residual and Jacobian callback that can "blow up" and give NaN values. In
>>>>> this case, the root cause is that we hit a NaN value during a callback and
>>>>> then throw an exception (in libMesh C++ code), which I gather leads to the
>>>>> SNES solve exiting with this error code.
>>>>>
>>>>> Is there a way to tell the SNES to terminate with a negative
>>>>> "converged reason" because we've encountered some issue during the 
>>>>> callback?
>>>>>
>>>>>
>>>>>    In your callback you should call SNESSetFunctionDomainError() and
>>>>> make sure the function value has an infinity or NaN in it (you can call
>>>>> VecFlag() for this purpose).
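>>>>>
>>>>>    A minimal sketch of that pattern in a residual callback (FormFunction
>>>>> and the NaN check are placeholders, not the actual libMesh code):
>>>>>
>>>>>   PetscErrorCode FormFunction(SNES snes, Vec X, Vec F, void *ctx)
>>>>>   {
>>>>>     PetscBool bad = PETSC_FALSE;
>>>>>
>>>>>     PetscFunctionBeginUser;
>>>>>     /* ... assemble the residual into F; set bad = PETSC_TRUE if any
>>>>>        exponential term overflows or produces a NaN ... */
>>>>>     /* mark F as invalid; VecFlag() fills it with Inf/NaN when the flag
>>>>>        is set, so the residual norm also reflects the failure */
>>>>>     PetscCall(VecFlag(F, bad));
>>>>>     /* tell SNES the function could not be evaluated at this X */
>>>>>     if (bad) PetscCall(SNESSetFunctionDomainError(snes));
>>>>>     PetscFunctionReturn(PETSC_SUCCESS);
>>>>>   }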
>>>>>
>>>>>    Now SNESConvergedReason will be a completely
>>>>> reasonable SNES_DIVERGED_FUNCTION_DOMAIN
>>>>>
>>>>>   Barry
>>>>>
>>>>> If you are using an ancient version of PETSc (I hope you are using the
>>>>> latest, since that always has more bug fixes and features) that does not
>>>>> have SNESSetFunctionDomainError, then just make sure the function vector
>>>>> result has an infinity or NaN in it, and SNESConvergedReason will be
>>>>> SNES_DIVERGED_FNORM_NAN.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>> On Wed, Dec 17, 2025 at 2:25 PM Barry Smith <[email protected]> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Dec 17, 2025, at 2:08 PM, David Knezevic via petsc-users <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using PETSc via the libMesh framework, so creating an MWE is
>>>>>> complicated by that, unfortunately.
>>>>>>
>>>>>> The situation is that I am not modifying the solution vector in a
>>>>>> callback. The SNES solve has terminated, with PetscErrorCode 82, and I
>>>>>> then want to update the solution vector (reset it to the "previously
>>>>>> converged value") and then try to solve again with a smaller load
>>>>>> increment. This is a typical "auto load stepping" strategy in FE.
>>>>>>
>>>>>>
>>>>>>    Once a PetscError is generated you CANNOT continue the PETSc
>>>>>> program; it is not designed to allow this, and trying to continue will
>>>>>> lead to further problems.
>>>>>>
>>>>>>    So what you need to do is prevent PETSc from getting to the point
>>>>>> where an actual PetscErrorCode of 82 is generated. Normally SNESSolve()
>>>>>> returns without generating an error even if the nonlinear solver failed
>>>>>> (for example did not converge). One then uses SNESGetConvergedReason to
>>>>>> check if it converged or not. Normally when SNESSolve() returns,
>>>>>> regardless of whether the converged reason is negative or positive,
>>>>>> there will be no locked vectors and one can modify the SNES object and
>>>>>> call SNESSolve again.
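>>>>>>
>>>>>>    As a sketch of that pattern (x, x_last_converged, and load_step are
>>>>>> placeholder names for your application's data, not PETSc or libMesh API):
>>>>>>
>>>>>>   SNESConvergedReason reason;
>>>>>>
>>>>>>   PetscCall(SNESSolve(snes, NULL, x));
>>>>>>   PetscCall(SNESGetConvergedReason(snes, &reason));
>>>>>>   if (reason < 0) {                           /* negative reason means diverged */
>>>>>>     PetscCall(VecCopy(x_last_converged, x));  /* reset to last converged state */
>>>>>>     load_step *= 0.5;                         /* shrink the load increment */
>>>>>>     PetscCall(SNESSolve(snes, NULL, x));      /* reuse the same SNES object */
>>>>>>   }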
>>>>>>
>>>>>>   So my guess is that an actual PETSc error is being generated
>>>>>> because SNESSetErrorIfNotConverged(snes,PETSC_TRUE) is being called by
>>>>>> either your code or libMesh, or the option -snes_error_if_not_converged
>>>>>> is being used. In your case, since you want the code to continue after a
>>>>>> non-converged SNESSolve(), these options should never be set; instead you
>>>>>> should check the result of SNESGetConvergedReason() to see if SNESSolve()
>>>>>> has failed. If SNESSetErrorIfNotConverged() is never being set, that may
>>>>>> indicate you are using an old version of PETSc or have hit a bug inside
>>>>>> PETSc's SNES that does not handle errors correctly, and we can help fix
>>>>>> the problem if you can provide full debug output from a run where the
>>>>>> error occurs.
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> I think the key piece of info I'd like to know is: at what point is
>>>>>> the solution vector "unlocked" by the SNES object? Should it be unlocked
>>>>>> as soon as the SNES solve has terminated with PetscErrorCode 82? It
>>>>>> seems to me that it hasn't been unlocked yet (maybe just on a subset of
>>>>>> the processes). Should I manually "unlock" the solution vector by
>>>>>> calling VecLockWriteSet?
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Dec 17, 2025 at 2:02 PM Stefano Zampini <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> You are not allowed to call VecGetArray on the solution vector of an
>>>>>>> SNES object within a user callback, nor to modify its values in any
>>>>>>> other way.
>>>>>>> Put in C++ lingo, the solution vector is a "const" argument.
>>>>>>> It would be great if you could provide an MWE to help us understand
>>>>>>> your problem.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Dec 17, 2025 at 8:51 PM David Knezevic via
>>>>>>> petsc-users <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I have a question about this error:
>>>>>>>>
>>>>>>>>> Vector 'Vec_0x84000005_0' (argument #2) was locked for read-only
>>>>>>>>> access in unknown_function() at unknown file:0 (line numbers only 
>>>>>>>>> accurate
>>>>>>>>> to function begin)
>>>>>>>>
>>>>>>>>
>>>>>>>> I'm encountering this error in an FE solve where an error occurs
>>>>>>>> during the residual/Jacobian assembly, and what we normally do in that
>>>>>>>> situation is shrink the load step and continue, starting from the
>>>>>>>> "last converged solution". However, in this case I'm running on 32
>>>>>>>> processes, and 5 of the processes report the error above about a
>>>>>>>> "locked vector".
>>>>>>>>
>>>>>>>> We clear the SNES object (via SNESDestroy) before we reset the
>>>>>>>> solution to the "last converged solution", and then we create a new
>>>>>>>> SNES object. But it seems to me that somehow the solution vector is
>>>>>>>> still marked as "locked" on 5 of the processes when we modify the
>>>>>>>> solution vector, which leads to the error above.
>>>>>>>>
>>>>>>>> I was wondering if someone could advise on the best way to handle
>>>>>>>> this. I thought one option could be to add an MPI barrier call prior
>>>>>>>> to updating the solution vector to the "last converged solution", to
>>>>>>>> make sure that the SNES object is destroyed on all procs (and hence
>>>>>>>> the locks cleared) before editing the solution vector, but I'm unsure
>>>>>>>> if that would make a difference. Any help would be most appreciated!
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Stefano
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>
