Hi Barry,

Thanks a lot for fixing this issue. I ran the same problem on a Linux machine and have the following trace for the same crash (with ASAN turned on for both PETSc, on the latest commit of the branch, and the application): https://gist.github.com/s-sajid-ali/85bdf689eb8452ef8702c214c4df6940

The trace seems to indicate a couple of buffer overflows, one of which causes the crash. I'm not sure what causes them.

Thank You,
Sajid Ali (he/him) | Research Associate
Data Science, Simulation, and Learning Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io

________________________________
From: Barry Smith <bsm...@petsc.dev>
Sent: Wednesday, February 15, 2023 2:01 PM
To: Sajid Ali Syed <sas...@fnal.gov>
Cc: petsc-users@mcs.anl.gov <petsc-users@mcs.anl.gov>
Subject: Re: [petsc-users] KSP_Solve crashes in debug mode

https://gitlab.com/petsc/petsc/-/merge_requests/6075 should fix the possible recursive error condition Matt pointed out.

On Feb 9, 2023, at 6:24 PM, Matthew Knepley <knep...@gmail.com> wrote:

On Thu, Feb 9, 2023 at 6:05 PM Sajid Ali Syed via petsc-users <petsc-users@mcs.anl.gov> wrote:

I added “-malloc_debug” in a .petscrc file and ran it again. The backtrace from lldb is in the attached file. The crash now seems to be at:

    Process 32660 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x16f603fb8)
        frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
       598        `PetscViewerASCIISynchronizedPrintf()`, `PetscSynchronizedFlush()`
       599      @*/
       600      PetscErrorCode PetscFPrintf(MPI_Comm comm, FILE *fd, const char format[], ...)
    -> 601      {
       602        PetscMPIInt rank;
       603
       604        PetscFunctionBegin;
    (lldb) frame info
    frame #0: 0x0000000112ecc8f8 libpetsc.3.018.dylib`PetscFPrintf(comm=0, fd=0x0000000000000000, format=0x0000000000000000) at mprint.c:601
    (lldb)

The trace seems to indicate some sort of infinite loop causing an overflow.

Yes, I have also seen this. What happens is that we have a memory error. The error is reported inside PetscMallocValidate() using PetscErrorPrintf, which eventually calls PetscCallMPI, which calls PetscMallocValidate again, which fails. We need to remove all error checking from the prints inside Validate.

  Thanks,

     Matt

PS: I'm using an arm64 Mac, so I don't have access to valgrind.

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io

--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
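
For readers following the thread, the recursion Matt describes (validation finds a memory error, reports it through a checked print, and the checked print re-enters validation) can be sketched in plain C. This is only an illustration under stated assumptions: validate_heap, checked_print, run_demo and DEMO_DEPTH_CAP are hypothetical names, not PETSc functions, and the sketch breaks the cycle with a re-entrancy flag rather than with the change Matt proposes (removing error checking from the prints inside Validate). It makes no claim about what the merge request actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; none of these are PETSc symbols. */
#define DEMO_DEPTH_CAP 50            /* demo-only cap so the unguarded run terminates */

static bool heap_corrupted = true;   /* pretend a guard word was overwritten */
static bool use_guard      = true;   /* toggle the re-entrancy guard */
static int  depth          = 0;      /* current nesting of validate_heap() */
static int  max_depth_seen = 0;      /* deepest nesting observed */

static int validate_heap(void);

/* A "checked" print: like a checked library call, it validates the heap first. */
static void checked_print(const char *msg)
{
  (void)validate_heap();             /* this is the re-entry point */
  fprintf(stderr, "%s\n", msg);
}

static int validate_heap(void)
{
  static bool in_validate = false;
  int         status      = 0;

  if (use_guard && in_validate) return 0;   /* already validating: do not recurse */
  if (depth >= DEMO_DEPTH_CAP) return 1;    /* stands in for the real stack overflow */

  in_validate = true;
  depth++;
  if (depth > max_depth_seen) max_depth_seen = depth;
  if (heap_corrupted) {
    checked_print("heap corruption detected");  /* reporting re-enters validation */
    status = 1;
  }
  depth--;
  in_validate = false;
  return status;
}

/* Run one validation pass and return the deepest nesting that occurred. */
int run_demo(bool guard)
{
  use_guard      = guard;
  depth          = 0;
  max_depth_seen = 0;
  (void)validate_heap();
  assert(depth == 0);                /* every nested call has unwound */
  return max_depth_seen;
}
```

With the guard on, run_demo(true) nests only once and the error is printed a single time. With the guard off, run_demo(false) keeps nesting until it reaches the demo cap; in a real program nothing caps the nesting, which is consistent with the stack overflow at function entry seen in the lldb trace above.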