  Hmm, here is the macro:

#define PetscCallAbort(comm, ...) \
  do { \
    PetscErrorCode ierr_petsc_call_abort_; \
    PetscStackUpdateLine; \
    ierr_petsc_call_abort_ = __VA_ARGS__; \
    if (PetscUnlikely(ierr_petsc_call_abort_ != PETSC_SUCCESS)) { \
      ierr_petsc_call_abort_ = PetscError(PETSC_COMM_SELF, __LINE__, PETSC_FUNCTION_NAME, __FILE__, ierr_petsc_call_abort_, PETSC_ERROR_REPEAT, " "); \
      (void)MPI_Abort(comm, (PetscMPIInt)ierr_petsc_call_abort_); \
    } \
  } while (0)

It does not seem to increment anything in the stack, so I think the call
itself should be OK. Perhaps your function has a PetscFunctionBegin but no
PetscFunctionReturn(), or in some other way increases the stack size without
decreasing it?

> On Feb 24, 2023, at 12:39 PM, Sajid Ali Syed <sas...@fnal.gov> wrote:
>
> Hi Barry,
>
> The application calls PetscCallAbort in a loop, i.e.
>
> for i in range:
>   void routine(PetscCallAbort(function_returning_petsc_error_code))
>
> From the prior logs it looks like the stack grows every time PetscCallAbort
> is called (in other words, the stack does not shrink upon successful exit
> from PetscCallAbort).
>
> Is this usage pattern not recommended? Should I be manually checking for
> success of the `function_returning_petsc_error_code` and throw instead of
> relying on PetscCallAbort?
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Data Science, Simulation, and Learning Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
> From: Barry Smith <bsm...@petsc.dev>
> Sent: Wednesday, February 22, 2023 6:49 PM
> To: Sajid Ali Syed <sas...@fnal.gov>
> Cc: Matthew Knepley <knep...@gmail.com>; petsc-users@mcs.anl.gov
> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>
> Hmm, there could be a bug in our handling of the stack when it reaches the
> maximum. It is supposed to just stop collecting additional levels at that
> point, but likely it has not been tested since a lot of refactoring.
>
> What are you doing to have so many stack frames?
>
>> On Feb 22, 2023, at 6:32 PM, Sajid Ali Syed <sas...@fnal.gov> wrote:
>>
>> Hi Matt,
>>
>> Adding `-checkstack` does not prevent the crash, both on my laptop and on
>> the cluster.
>>
>> What does prevent the crash (on my laptop at least) is changing
>> `PETSCSTACKSIZE` from 64 to 256 here:
>> https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io
>> From: Matthew Knepley <knep...@gmail.com>
>> Sent: Wednesday, February 22, 2023 5:23 PM
>> To: Sajid Ali Syed <sas...@fnal.gov>
>> Cc: Barry Smith <bsm...@petsc.dev>; petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>
>> On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users
>> <petsc-users@mcs.anl.gov> wrote:
>> One thing to note in relation to the trace attached in the previous email is
>> that there are no warnings until the 36th call to KSP_Solve. The first error
>> (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve
>> (part of what the application marks as turn 10 of the propagator). The crash
>> finally occurs on the 43rd call to KSP_Solve.
>>
>> Looking at the trace, it appears that stack handling is messed up and
>> eventually it causes the crash.
>> This can happen when PetscFunctionBegin is not matched up with
>> PetscFunctionReturn. Can you try running this with
>>
>>   -checkstack
>>
>>   Thanks,
>>
>>      Matt
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io
>> From: Sajid Ali Syed <sas...@fnal.gov>
>> Sent: Wednesday, February 22, 2023 5:11 PM
>> To: Barry Smith <bsm...@petsc.dev>
>> Cc: petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>
>> Hi Barry,
>>
>> Thanks a lot for fixing this issue. I ran the same problem on a Linux
>> machine and have the following trace for the same crash (with ASAN turned on
>> for both PETSc (on the latest commit of the branch) and the application):
>> https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940
>>
>> The trace seems to indicate a couple of buffer overflows, one of which
>> causes the crash. I'm not sure as to what causes them.
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Data Science, Simulation, and Learning Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io
>> From: Barry Smith <bsm...@petsc.dev>
>> Sent: Wednesday, February 15, 2023 2:01 PM
>> To: Sajid Ali Syed <sas...@fnal.gov>
>> Cc: petsc-users@mcs.anl.gov
>> Subject: Re: [petsc-users] KSP_Solve crashes in debug mode
>>
>> https://gitlab.com/petsc/petsc/-/merge_requests/6075
>> should fix the possible recursive error condition Matt pointed out.
>>
>>> On Feb 9, 2023, at 6:24 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>>
>>> On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users
>>> <petsc-users@mcs.anl.gov> wrote:
>>> I added “-malloc_debug” in a .petscrc file and ran it again. The backtrace
>>> from lldb is in the attached file.
>>> The crash now seems to be at:
>>>
>>> Process 32660 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8)
>>>     frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
>>>    598  `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()`
>>>    599  @*/
>>>    600  PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...)
>>> -> 601  {
>>>    602    PetscMPIInt rank;
>>>    603
>>>    604    PetscFunctionBegin;
>>> (lldb) frame info
>>> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
>>> (lldb)
>>> The trace seems to indicate some sort of infinite loop causing an overflow.
>>>
>>> Yes, I have also seen this. What happens is that we have a memory error.
>>> The error is reported inside PetscMallocValidate() using PetscErrorPrintf,
>>> which eventually calls PetscCallMPI, which calls PetscMallocValidate again,
>>> which fails. We need to remove all error checking from the prints inside
>>> Validate.
>>>
>>>   Thanks,
>>>
>>>      Matt
>>>
>>> PS: I'm using an arm64 Mac, so I don't have access to valgrind.
>>>
>>> Thank You,
>>> Sajid Ali (he/him) | Research Associate
>>> Scientific Computing Division
>>> Fermi National Accelerator Laboratory
>>> s-sajid-ali.github.io
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
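
For reference, here is a minimal sketch of the mismatch discussed at the top of this thread: a routine that pushes a frame on entry but skips the pop on some exit path, called in a loop. It is a toy model in plain C and not PETSc's actual stack code. ToyStack, toy_push, toy_pop, balanced_routine, leaky_routine, and depth_after_calls are all hypothetical names; only the limit of 64 comes from the PETSCSTACKSIZE value quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not PETSc's API.
   The limit of 64 mirrors the PETSCSTACKSIZE value quoted in the thread. */
#define TOY_STACK_SIZE 64

typedef struct {
  const char *frames[TOY_STACK_SIZE]; /* recorded function names */
  int         depth;                  /* logical depth; may exceed TOY_STACK_SIZE */
} ToyStack;

/* Record a frame on function entry. Once the array is full, stop recording
   names but keep counting, so that a later pop still balances. */
void toy_push(ToyStack *s, const char *name)
{
  assert(s != NULL);
  if (s->depth < TOY_STACK_SIZE) s->frames[s->depth] = name;
  s->depth++;
}

/* Drop the most recent frame on function exit. */
void toy_pop(ToyStack *s)
{
  assert(s != NULL);
  if (s->depth > 0) s->depth--;
}

/* Every entry is matched by an exit, so the depth returns to its old value. */
int balanced_routine(ToyStack *s)
{
  toy_push(s, "balanced_routine");
  /* ... work ... */
  toy_pop(s);
  return 0;
}

/* Bug being illustrated: the return path skips toy_pop(), so each call
   leaves one extra frame behind. */
int leaky_routine(ToyStack *s)
{
  toy_push(s, "leaky_routine");
  /* ... work ... */
  return 0;
}

/* Call `routine` n times on a fresh stack and report the final depth. */
int depth_after_calls(int (*routine)(ToyStack *), int n)
{
  ToyStack s = {{NULL}, 0};
  for (int i = 0; i < n; i++) (void)routine(&s);
  return s.depth;
}
```

In this model, depth_after_calls(balanced_routine, 100) stays at 0, while depth_after_calls(leaky_routine, 100) grows by one per call and passes 64. The bounds check in toy_push is the kind of guard that is supposed to stop collecting frames at the maximum; without it, the write past the end of the array would be a buffer overflow of the sort ASAN reports.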