Re: [petsc-users] KSP_Solve crashes in debug mode

Barry Smith Fri, 24 Feb 2023 09:47:52 -0800

Hmm, here is the macro

    #define PetscCallAbort(comm, ...) \
    do { \
      PetscErrorCode ierr_petsc_call_abort_; \
      PetscStackUpdateLine; \
      ierr_petsc_call_abort_ = __VA_ARGS__; \
      if (PetscUnlikely(ierr_petsc_call_abort_ != PETSC_SUCCESS)) { \
        ierr_petsc_call_abort_ = PetscError(PETSC_COMM_SELF, __LINE__, PETSC_FUNCTION_NAME, __FILE__, ierr_petsc_call_abort_, PETSC_ERROR_REPEAT, " "); \
        (void)MPI_Abort(comm, (PetscMPIInt)ierr_petsc_call_abort_); \
      } \
    } while (0)


It does not seem to increment anything in the stack, so I think the call should 
be OK.

Perhaps your function has a PetscFunctionBegin but no PetscFunctionReturn(), or 
in some other way increases the stack size without decreasing it?
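
One way to picture that hypothesis is with a toy model of a fixed-capacity debug stack. This is a minimal sketch, not PETSc's implementation: ModelStack, model_push, model_pop, and the two routines are hypothetical names, and the capacity of 64 simply mirrors the PETSCSTACKSIZE value mentioned in the thread. A routine that pushes on entry but returns without popping leaves the depth one higher per call, while a balanced routine leaves it at zero.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_STACK_SIZE 64 /* hypothetical; mirrors the 64 quoted in the thread */

typedef struct {
  const char *function[MODEL_STACK_SIZE]; /* recorded frame names */
  int         currentsize;                /* logical depth; may exceed capacity */
} ModelStack;

/* Record a frame. Past capacity only the counter advances, so nothing
   is written out of bounds. */
void model_push(ModelStack *s, const char *name)
{
  if (s->currentsize < MODEL_STACK_SIZE) s->function[s->currentsize] = name;
  s->currentsize++;
}

/* Drop a frame. The slot is cleared only if it was actually stored. */
void model_pop(ModelStack *s)
{
  assert(s->currentsize > 0);
  s->currentsize--;
  if (s->currentsize < MODEL_STACK_SIZE) s->function[s->currentsize] = NULL;
}

/* Hypothetical routine that pushes on entry and pops on exit. */
int balanced_routine(ModelStack *s)
{
  model_push(s, "balanced_routine");
  model_pop(s);
  return 0;
}

/* Hypothetical routine whose return path skips the pop. */
int leaky_routine(ModelStack *s)
{
  model_push(s, "leaky_routine");
  return 0; /* missing model_pop(): depth grows by one per call */
}

/* Call the chosen routine n times and report the final depth. */
int depth_after_calls(int n, int leaky)
{
  ModelStack s = {{NULL}, 0};
  for (int i = 0; i < n; i++) {
    if (leaky) (void)leaky_routine(&s);
    else (void)balanced_routine(&s);
  }
  return s.currentsize;
}
```

In this model depth_after_calls(43, 1) returns 43 while depth_after_calls(43, 0) returns 0. The bounds check in model_push is what allows the counter to pass the capacity safely; a stack without that check would write past the array once the leak reached 64 frames.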


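The recursive failure Matt describes in the quoted message further down (an error reported inside PetscMallocValidate() goes through a print path that calls PetscMallocValidate() again) can be pictured with a small toy model. The sketch below uses hypothetical names (model_validate, model_checked_print, validate_calls) and only illustrates the idea of reporting from inside validation through an unchecked print, so that validation is not re-entered. It is not the code in the merge request.

```c
#include <assert.h>
#include <stdio.h>

int validate_calls = 0; /* counts entries into model_validate, for illustration */

int model_validate(int heap_is_corrupt);

/* Hypothetical checked print: validates before printing, as a
   debug-mode wrapper might. Calling this from inside model_validate
   would re-enter validation and recurse without bound. */
int model_checked_print(const char *msg, int heap_is_corrupt)
{
  assert(msg != NULL);
  int err = model_validate(heap_is_corrupt);
  if (err) return err;
  fputs(msg, stderr);
  return 0;
}

/* Validation reports through plain fputs, with no checking on the
   print path, so a failure is reported once and then returned. */
int model_validate(int heap_is_corrupt)
{
  validate_calls++;
  if (!heap_is_corrupt) return 0;
  fputs("model: memory corruption detected\n", stderr);
  return 1;
}
```

Here a failing model_validate(1) is entered exactly once per call, because its own report never goes back through model_checked_print.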


> On Feb 24, 2023, at 12:39 PM, Sajid Ali Syed <sas...@fnal.gov> wrote:
> 
> Hi Barry, 
> 
> The application calls PetscCallAbort in a loop, i.e.
> 
> for i in range:
>   void routine(PetscCallAbort(function_returning_petsc_error_code))
> 
> From the prior logs it looks like the stack grows every time PetscCallAbort 
> is called (in other words, the stack does not shrink upon successful exit 
> from PetscCallAbort). 
> 
> Is this usage pattern not recommended? Should I be manually checking for 
> success of the `function_returning_petsc_error_code` and throw instead of 
> relying on PetscCallAbort?
> 
> 
> 
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Data Science, Simulation, and Learning Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io <http://s-sajid-ali.github.io/>
> From: Barry Smith <bsm...@petsc.dev <mailto:bsm...@petsc.dev>>
> Sent: Wednesday, February 22, 2023 6:49 PM
> To: Sajid Ali Syed <sas...@fnal.gov <mailto:sas...@fnal.gov>>
> Cc: Matthew Knepley <knep...@gmail.com <mailto:knep...@gmail.com>>; 
> petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov> 
> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>>
> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>  
> 
>   Hmm, there could be a bug in our handling of the stack when it reaches the 
> maximum. It is supposed to just stop collecting additional levels at that 
> point, but likely this has not been tested since a lot of refactoring.
> 
>    What are you doing to have so many stack frames? 
> 
>> On Feb 22, 2023, at 6:32 PM, Sajid Ali Syed <sas...@fnal.gov 
>> <mailto:sas...@fnal.gov>> wrote:
>> 
>> Hi Matt, 
>> 
>> Adding `-checkstack` does not prevent the crash, either on my laptop or on 
>> the cluster. 
>> 
>> What does prevent the crash (on my laptop at least) is changing 
>> `PETSCSTACKSIZE` from 64 to 256 here: 
>> https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153 
>> 
>> 
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io 
>> From: Matthew Knepley <knep...@gmail.com <mailto:knep...@gmail.com>>
>> Sent: Wednesday, February 22, 2023 5:23 PM
>> To: Sajid Ali Syed <sas...@fnal.gov <mailto:sas...@fnal.gov>>
>> Cc: Barry Smith <bsm...@petsc.dev <mailto:bsm...@petsc.dev>>; 
>> petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov> 
>> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>>
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>  
>> On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users 
>> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>> wrote:
>> One thing to note in relation to the trace attached in the previous email is 
>> that there are no warnings until the 36th call to KSP_Solve. The first error 
>> (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve 
>> (part of what the application marks as turn 10 of the propagator). The crash 
>> finally occurs on the 43rd call to KSP_Solve.
>> 
>> Looking at the trace, it appears that stack handling is messed up and 
>> eventually it causes the crash. This can happen when
>> PetscFunctionBegin is not matched up with PetscFunctionReturn. Can you try 
>> running this with
>> 
>>   -checkstack
>> 
>>   Thanks,
>> 
>>      Matt
>>  
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io 
>> From: Sajid Ali Syed <sas...@fnal.gov <mailto:sas...@fnal.gov>>
>> Sent: Wednesday, February 22, 2023 5:11 PM
>> To: Barry Smith <bsm...@petsc.dev <mailto:bsm...@petsc.dev>>
>> Cc: petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov> 
>> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>>
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>  
>> Hi Barry, 
>> 
>> Thanks a lot for fixing this issue. I ran the same problem on a Linux 
>> machine and have the following trace for the same crash (with ASAN turned on 
>> for both PETSc, on the latest commit of the branch, and the application): 
>> https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940 
>> 
>> The trace seems to indicate a couple of buffer overflows, one of which 
>> causes the crash. I'm not sure as to what causes them. 
>> 
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io 
>> From: Barry Smith <bsm...@petsc.dev <mailto:bsm...@petsc.dev>>
>> Sent: Wednesday, February 15, 2023 2:01 PM
>> To: Sajid Ali Syed <sas...@fnal.gov <mailto:sas...@fnal.gov>>
>> Cc: petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov> 
>> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>>
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>  
>> 
>> https://gitlab.com/petsc/petsc/-/merge_requests/6075 
>>  should fix the possible recursive error condition Matt pointed out
>> 
>> 
>>> On Feb 9, 2023, at 6:24 PM, Matthew Knepley <knep...@gmail.com 
>>> <mailto:knep...@gmail.com>> wrote:
>>> 
>>> On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users 
>>> <petsc-users@mcs.anl.gov <mailto:petsc-users@mcs.anl.gov>> wrote:
>>> I added “-malloc_debug” in a .petscrc file and ran it again. The backtrace 
>>> from lldb is in the attached file. The crash now seems to be at:
>>> 
>>> Process 32660 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8)
>>>     frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
>>>    598               `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()`
>>>    599      @*/
>>>    600      PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...)
>>> -> 601      {
>>>    602       PetscMPIInt rank;
>>>    603      
>>>    604       PetscFunctionBegin;
>>> (lldb) frame info
>>> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
>>> (lldb)
>>> The trace seems to indicate some sort of infinite loop causing an overflow.
>>> 
>>> 
>>> Yes, I have also seen this. What happens is that we have a memory error. 
>>> The error is reported inside PetscMallocValidate()
>>> using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls 
>>> PetscMallocValidate again, which fails. We need to
>>> remove all error checking from the prints inside Validate.
>>> 
>>>   Thanks,
>>> 
>>>      Matt
>>>  
>>> PS: I'm using an arm64 Mac, so I don't have access to valgrind. 
>>> 
>>> Thank You,
>>> Sajid Ali (he/him) | Research Associate
>>> Scientific Computing Division
>>> Fermi National Accelerator Laboratory
>>> s-sajid-ali.github.io 
>>> 
>>> 
>>> -- 
>>> What most experimenters take for granted before they begin their 
>>> experiments is infinitely more interesting than any results to which their 
>>> experiments lead.
>>> -- Norbert Wiener
>>> 
>>> https://www.cse.buffalo.edu/~knepley/ 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>> 
>> https://www.cse.buffalo.edu/~knepley/ 
