On Wed, Feb 22, 2023 at 6:32 PM Sajid Ali Syed <sas...@fnal.gov> wrote:
> Hi Matt,
>
> Adding `-checkstack` does not prevent the crash, either on my laptop or on the cluster.
>

It will not prevent a crash. The output is intended to show us where the stack problem originates. Can you send the output?

  Thanks,

     Matt

> What does prevent the crash (on my laptop at least) is changing `PETSCSTACKSIZE` from 64 to 256 here:
> https://github.com/petsc/petsc/blob/main/include/petscerror.h#L1153
>
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Data Science, Simulation, and Learning Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
> ------------------------------
> *From:* Matthew Knepley <knep...@gmail.com>
> *Sent:* Wednesday, February 22, 2023 5:23 PM
> *To:* Sajid Ali Syed <sas...@fnal.gov>
> *Cc:* Barry Smith <bsm...@petsc.dev>; petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode
>
> On Wed, Feb 22, 2023 at 6:18 PM Sajid Ali Syed via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> One thing to note in relation to the trace attached in the previous email is that there are no warnings until the 36th call to KSP_Solve. The first error (as indicated by ASAN) occurs somewhere before the 40th call to KSP_Solve (part of what the application marks as turn 10 of the propagator). The crash finally occurs on the 43rd call to KSP_Solve.
>
>
> Looking at the trace, it appears that stack handling is messed up, which eventually causes the crash. This can happen when PetscFunctionBegin is not matched up with PetscFunctionReturn.
> Can you try running this with
>
>   -checkstack
>
>   Thanks,
>
>      Matt
>
>
> ------------------------------
> *From:* Sajid Ali Syed <sas...@fnal.gov>
> *Sent:* Wednesday, February 22, 2023 5:11 PM
> *To:* Barry Smith <bsm...@petsc.dev>
> *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode
>
> Hi Barry,
>
> Thanks a lot for fixing this issue. I ran the same problem on a Linux machine and have the following trace for the same crash, with ASAN enabled for both the application and PETSc (built from the latest commit of the branch): https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940
>
> The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure what causes them.
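Matt's diagnosis is that PETSc's debug-mode function stack gets out of sync when a PetscFunctionBegin is not matched by a PetscFunctionReturn; that is consistent with enlarging PETSCSTACKSIZE from 64 to 256 making the crash go away. The C sketch below is a hypothetical model of that failure mode, not PETSc's implementation: `model_stack`, `stack_begin`, `stack_return`, `leaky_helper`, `model_solve`, and `first_overflow` are illustrative names, and the `capacity` argument plays the role of PETSCSTACKSIZE. A helper that leaves through a plain `return` skips its pop, so every call leaves one extra frame behind until the fixed-size stack is full.

```c
#include <assert.h>

/* Hypothetical model of a debug-mode function stack; NOT PETSc's code.
   The active capacity plays the role of PETSCSTACKSIZE. */
#define MAX_CAPACITY 256

typedef struct {
  const char *names[MAX_CAPACITY];
  int depth;    /* number of frames currently pushed */
  int capacity; /* active capacity, at most MAX_CAPACITY */
} model_stack;

/* Analogue of PetscFunctionBegin: push a frame. Returns 1 instead of
   writing past the end of the array when the stack is full. */
static int stack_begin(model_stack *s, const char *name) {
  if (s->depth >= s->capacity) return 1;
  s->names[s->depth++] = name;
  return 0;
}

/* Analogue of PetscFunctionReturn: pop the most recent frame. */
static void stack_return(model_stack *s) {
  if (s->depth > 0) s->depth--;
}

/* Buggy callee: its early-exit path uses a plain return, so the frame
   pushed by stack_begin is never popped. */
static int leaky_helper(model_stack *s, int early_exit) {
  if (stack_begin(s, "leaky_helper")) return 1;
  if (early_exit) return 0; /* BUG: missing stack_return(s) */
  stack_return(s);
  return 0;
}

/* Caller whose own begin/return pair is matched on the normal path. Its
   stack_return pops the helper's leaked frame instead of its own, so each
   call leaves one extra frame on the stack. */
static int model_solve(model_stack *s) {
  if (stack_begin(s, "model_solve")) return 1;
  if (leaky_helper(s, 1)) return 1;
  stack_return(s);
  return 0;
}

/* Returns the 1-based call number at which a push first fails,
   or 0 if all max_calls calls succeed. */
static int first_overflow(int capacity, int max_calls) {
  assert(capacity > 0 && capacity <= MAX_CAPACITY);
  model_stack s = {.depth = 0, .capacity = capacity};
  for (int call = 1; call <= max_calls; ++call) {
    if (model_solve(&s)) return call;
  }
  return 0;
}
```

In this model, `first_overflow(64, 1000)` returns 64 and `first_overflow(256, 1000)` returns 256: raising the capacity only postpones the overflow, while the unmatched return is still there.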
>
> ------------------------------
> *From:* Barry Smith <bsm...@petsc.dev>
> *Sent:* Wednesday, February 15, 2023 2:01 PM
> *To:* Sajid Ali Syed <sas...@fnal.gov>
> *Cc:* petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
> *Subject:* Re: [petsc-users] KSP_Solve crashes in debug mode
>
>
> https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out.
>
>
> On Feb 9, 2023, at 6:24 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users <petsc-users@mcs.anl.gov> wrote:
>
> I added `-malloc_debug` to a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at:
>
> Process 32660 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8)
>     frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
>    598      `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()`
>    599  @*/
>    600  PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...)
> -> 601  {
>    602    PetscMPIInt rank;
>    603
>    604    PetscFunctionBegin;
> (lldb) frame info
> frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
> (lldb)
>
> The trace seems to indicate some sort of infinite loop causing an overflow.
>
>
> Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate.
>
>   Thanks,
>
>      Matt
>
>
> PS: I'm using an arm64 Mac, so I don't have access to valgrind.
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
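Matt's explanation of the Feb 9 crash is a mutual recursion: PetscMallocValidate() finds a memory error, reports it through PetscErrorPrintf, whose checked MPI call re-enters PetscMallocValidate, which fails again, and so on until the C stack is exhausted. The C sketch below is a hypothetical model of that loop, assuming the heap check always fails and the print path always re-enters validation. It contrasts a checked print path with an unchecked one, which is the direction Matt suggests (removing error checking from the prints inside Validate). All names (`model_validate`, `model_report_error`, `run_model`, `DEPTH_LIMIT`) are illustrative; this is not the code in merge request 6075.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recursion described in the thread; the names
   are illustrative and are NOT PETSc functions. The model assumes the heap
   check always finds corruption, as in the reported crash. */
#define DEPTH_LIMIT 1000 /* stands in for running out of C stack */

static int validate_depth = 0; /* current nesting of model_validate() */
static int max_depth_seen = 0; /* deepest nesting observed in this run */

static int model_validate(bool checked_printing);

/* Error printer. With checked_printing, the print path performs its own
   checks that call back into validation (analogous to the
   PetscErrorPrintf -> PetscCallMPI -> PetscMallocValidate chain). Without
   it, the printer only emits the message (omitted here) and returns. */
static int model_report_error(bool checked_printing) {
  if (checked_printing) return model_validate(checked_printing);
  return 0;
}

/* Returns 1 once the error has been reported, or -1 if the nesting limit
   was hit, i.e. the real code would have overflowed the C stack. */
static int model_validate(bool checked_printing) {
  if (validate_depth >= DEPTH_LIMIT) return -1;
  validate_depth++;
  if (validate_depth > max_depth_seen) max_depth_seen = validate_depth;
  int rc = model_report_error(checked_printing) < 0 ? -1 : 1;
  validate_depth--;
  return rc;
}

/* Runs one validation from a clean state; afterwards max_depth_seen holds
   the deepest nesting reached. */
static int run_model(bool checked_printing) {
  validate_depth = 0;
  max_depth_seen = 0;
  int rc = model_validate(checked_printing);
  assert(validate_depth == 0); /* every level unwound its counter */
  return rc;
}
```

With a checked print path, `run_model(true)` returns -1 after nesting 1000 levels deep (the limit stands in for the stack overflow that Sajid's lldb trace points to); with an unchecked print path, `run_model(false)` returns 1 after a single level.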