Re: Crashes in recent builds from master
At 09:39 -0700 08 Sep 2021, "Kevin J. McCarthy" wrote:
> On Wed, Sep 08, 2021 at 09:09:03AM -0700, Kevin J. McCarthy wrote:
> > On Wed, Sep 08, 2021 at 03:56:56AM -0700, Kevin J. McCarthy wrote:
> > > To trigger the QRESYNC failure, delete some messages in the mailbox
> > > using mutt. Sync and exit the mailbox, wait till there are more new
> > > messages in that mailbox and reopen using mutt.

In the little bit of testing I was able to do after enabling ASAN, it had
seemed that it was actually crashing as I left the mailbox after deleting,
rather than needing to return to it. But I'd wanted to do a bit more to
isolate the problem before reporting that.

> > I was able to trigger the crash, and I've figured out the problem. I'll
> > push a commit to a branch for testing later on today.
>
> I've pushed several commits to branch 'kevin/stable-fixes'. As the branch
> says, it's based on 'stable' and so doesn't have the thread changes in
> master.
>
> However, I've also pushed up a branch
> 'kevin/master-stable-fixes-rebase-test' that has those commits merged
> into master.
>
> I need to clean things up and test more, but would appreciate it if you
> would test it too.

I've done a bit of testing with that and ASAN already, and haven't had any
crashes, even when doing things that seemed to reliably cause crashes with
ASAN before those fixes. I'll continue using that build for a while,
although I'll likely want to disable ASAN at some point.

> Thank you.

Thank you for the prompt fix.
Re: Crashes in recent builds from master
On Wed, Sep 08, 2021 at 06:24:51PM +0200, Rene Kita wrote:
> On Wed, Sep 08, 2021 at 03:56:56AM -0700, Kevin J. McCarthy wrote:
> > However, I haven't been able to figure out the memory error leading to a
> > crash yet. It would be helpful if you could run with ASAN enabled until you
> > get the crash(es). With ASAN you'll need to arrange it so the tmux window
> > doesn't close when mutt crashes so you can read the ASAN report.
>
> JFTR, you should be able to catch the output from ASAN with tee:
> % mutt 2> >(tee -a stderr.log >&2)

Or, before you start mutt, add ASAN_OPTIONS=log_path=asan:print_legend=0
to your environment, and the ASAN report will be written to a file named
asan.<pid> in the current working directory instead of to stderr, so it
is still there after mutt exits.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
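If it is awkward to get the variable into the environment that tmux uses to start mutt, the same defaults can instead be compiled into a local ASAN test build. The AddressSanitizer runtime calls a function named `__asan_default_options()` at startup if the program defines one, and uses the returned string as default flags; anything set in ASAN_OPTIONS at run time still overrides it. This is a sketch for a throwaway debugging build only: the extra source file and its name are assumptions, and nothing like it ships with mutt.

```c
/* asan_defaults.c: hypothetical extra source file for a local ASAN test
 * build; it is not part of mutt.  When linked with -fsanitize=address,
 * the ASan runtime calls __asan_default_options() at startup and treats
 * the returned string like ASAN_OPTIONS (the environment still wins).
 * In a build without ASan this is simply an unused function. */
const char *__asan_default_options(void)
{
  /* Write reports to ./asan.<pid> instead of stderr, and skip the
   * shadow-byte legend at the end of each report. */
  return "log_path=asan:print_legend=0";
}
```

Compiling the defaults in means the report lands in a file even when the crash closes the tmux window, without having to remember to export anything first.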
Re: Crashes in recent builds from master
On Wed, Sep 08, 2021 at 09:09:03AM -0700, Kevin J. McCarthy wrote:
> On Wed, Sep 08, 2021 at 03:56:56AM -0700, Kevin J. McCarthy wrote:
> > However, I haven't been able to figure out the memory error leading to a
> > crash yet. It would be helpful if you could run with ASAN enabled until you
> > get the crash(es). With ASAN you'll need to arrange it so the tmux window
> > doesn't close when mutt crashes so you can read the ASAN report.
> >
> > To trigger the QRESYNC failure, delete some messages in the mailbox
> > using mutt. Sync and exit the mailbox, wait till there are more new
> > messages in that mailbox and reopen using mutt.
>
> I was able to trigger the crash, and I've figured out the problem. I'll
> push a commit to a branch for testing later on today.

I've pushed several commits to branch 'kevin/stable-fixes'. As the branch
says, it's based on 'stable' and so doesn't have the thread changes in
master.

However, I've also pushed up a branch
'kevin/master-stable-fixes-rebase-test' that has those commits merged
into master.

I need to clean things up and test more, but would appreciate it if you
would test it too.

Thank you.

-- 
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Re: Crashes in recent builds from master
On Wed, Sep 08, 2021 at 03:56:56AM -0700, Kevin J. McCarthy wrote:
> However, I haven't been able to figure out the memory error leading to a
> crash yet. It would be helpful if you could run with ASAN enabled until you
> get the crash(es). With ASAN you'll need to arrange it so the tmux window
> doesn't close when mutt crashes so you can read the ASAN report.

JFTR, you should be able to catch the output from ASAN with tee:
% mutt 2> >(tee -a stderr.log >&2)
Re: Crashes in recent builds from master
On Wed, Sep 08, 2021 at 03:56:56AM -0700, Kevin J. McCarthy wrote:
> However, I haven't been able to figure out the memory error leading to a
> crash yet. It would be helpful if you could run with ASAN enabled until you
> get the crash(es). With ASAN you'll need to arrange it so the tmux window
> doesn't close when mutt crashes so you can read the ASAN report.
>
> To trigger the QRESYNC failure, delete some messages in the mailbox using
> mutt. Sync and exit the mailbox, wait till there are more new messages in
> that mailbox and reopen using mutt.

I was able to trigger the crash, and I've figured out the problem. I'll
push a commit to a branch for testing later on today.

-- 
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Re: Crashes in recent builds from master
On Tue, Sep 07, 2021 at 09:50:05PM -0700, Kevin J. McCarthy wrote:
> On Tue, Sep 07, 2021 at 11:45:03PM -0400, Aaron Schrab wrote:
> > At 18:00 -0700 07 Sep 2021, "Kevin J. McCarthy" wrote:
> > > Are you using $imap_qresync or $imap_condstore?
> >
> > Yes, I have both of those enabled (using dovecot 2.3.16 from Debian
> > unstable as the IMAP server). In at least some of the crashes I believe
> > I've seen messages about QRESYNC failing immediately before; but I
> > generally have mutt running in a tmux window that's set to close when
> > mutt exits so the message is generally only visible very briefly.
>
> That gives me some ideas. I'll take a closer look, but I think my fix in
> commit 74ce032f may have caused some other issues with $imap_qresync.

Yes, it looks like commit 74ce032f was incorrect. I'll need to fix that
and make another stable release soon.

However, I haven't been able to figure out the memory error leading to a
crash yet. It would be helpful if you could run with ASAN enabled until you
get the crash(es). With ASAN you'll need to arrange it so the tmux window
doesn't close when mutt crashes so you can read the ASAN report.

To trigger the QRESYNC failure, delete some messages in the mailbox using
mutt. Sync and exit the mailbox, wait till there are more new messages in
that mailbox and reopen using mutt.

-- 
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Re: Crashes in recent builds from master
On Tue, Sep 07, 2021 at 11:45:03PM -0400, Aaron Schrab wrote:
> At 18:00 -0700 07 Sep 2021, "Kevin J. McCarthy" wrote:
> > Are you using $imap_qresync or $imap_condstore?
>
> Yes, I have both of those enabled (using dovecot 2.3.16 from Debian
> unstable as the IMAP server). In at least some of the crashes I believe
> I've seen messages about QRESYNC failing immediately before; but I
> generally have mutt running in a tmux window that's set to close when
> mutt exits so the message is generally only visible very briefly.

That gives me some ideas. I'll take a closer look, but I think my fix in
commit 74ce032f may have caused some other issues with $imap_qresync.

I don't see how it's causing the crash, but it may be that I didn't
properly reset something when verifying the QRESYNC data failed, leaving
a stray pointer behind. I'll try to take a closer look over the next
couple of days.

-- 
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
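The suspected failure mode (state not fully reset when QRESYNC verification fails, so the fallback path inherits a stray pointer) is a general C hazard. Below is a minimal sketch of the defensive pattern, assuming a hypothetical, simplified header table: `struct hdr_state`, `hdr_state_reset()` and `open_with_fallback()` are invented for illustration and are not mutt's CONTEXT or IMAP code, and the two callbacks are placeholders for whatever the real resync and full-download steps are.

```c
#include <stdlib.h>

/* Hypothetical, simplified per-mailbox header state used only for this
 * sketch; mutt's real structures look different. */
struct hdr_state {
  char **hdrs;    /* one heap-allocated entry per message */
  size_t count;   /* entries currently in use */
  size_t max;     /* allocated capacity of hdrs */
};

/* Free everything and return to the "nothing downloaded yet" state.
 * Freeing without also clearing hdrs/count/max would leave a dangling
 * pointer and stale counts for the next code path to trip over. */
void hdr_state_reset(struct hdr_state *st)
{
  for (size_t i = 0; i < st->count; i++)
    free(st->hdrs[i]);
  free(st->hdrs);
  st->hdrs = NULL;
  st->count = 0;
  st->max = 0;
}

/* Try the fast path (e.g. reusing cached headers via QRESYNC) and, if it
 * reports failure, discard whatever it left behind before falling back
 * to a full download.  Both callbacks are hypothetical placeholders that
 * return 0 on success. */
int open_with_fallback(struct hdr_state *st,
                       int (*try_resync)(struct hdr_state *),
                       int (*full_download)(struct hdr_state *))
{
  if (try_resync(st) == 0)
    return 0;
  hdr_state_reset(st);
  return full_download(st);
}
```

Because the reset always leaves a consistent empty state, calling it twice is harmless, which is what makes it safe to use on every error path rather than only the ones someone remembered.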
Re: Crashes in recent builds from master
At 18:00 -0700 07 Sep 2021, "Kevin J. McCarthy" wrote:
> On Tue, Sep 07, 2021 at 07:15:16PM -0400, Aaron Schrab wrote:
> > Since updating from a build based on bcdb61560 (Add %T status format for
> > $sort_thread_groups., 2021-08-05) to one based on 27e61da56 (Merge branch
> > 'stable', 2021-08-24) I've been experiencing some sporadic crashes.
>
> Unfortunately both IMAP and thread code *have* been touched recently.

I'd thought that most of the thread changes were in the clear, at least
until I looked more closely at my builds while writing the original
message.

> So it looks like I goofed something up. :-(

Trying to catch those types of issues is one of the reasons I try to
follow master (or in some cases branches that aren't even that ready)
fairly closely.

> However, bcdb61560 isn't on master. Were you running off of my
> development branch before, or were you perhaps referring to 5aa75ed2?

Yes, I had been using your branch with the early support for
$sort_thread_groups. I *thought* that I'd updated to the version of that
which got into master, but apparently I hadn't.

At least that early version of the new threading mode had seemed very
stable to me. Of course, with the sporadic nature of the crashes it's
possible that I just hadn't hit the problem then.

> You may also want to try enabling ASAN via something like
>   export CFLAGS='-g3 -fno-omit-frame-pointer -fsanitize=address'
> and re-configure/recompile, to see if it can give an earlier warning
> about memory corruption.

I've added that to my configure wrapper script, and I'll restart using
the copy built with it as soon as I send this message.

> > For the more troublesome one I get the following backtrace. Once this
> > comes up it will keep crashing when I attempt to change to the same
> > folder, at least in the short term. However, if I open this folder once
> > with the old build and then switch back to the new build, the problem
> > will go away for a while.
>
> Are you using $imap_qresync or $imap_condstore?

Yes, I have both of those enabled (using dovecot 2.3.16 from Debian
unstable as the IMAP server). In at least some of the crashes I believe
I've seen messages about QRESYNC failing immediately before; but I
generally have mutt running in a tmux window that's set to close when
mutt exits so the message is generally only visible very briefly.

> The stack is in a pretty benign section, so it seems like it's a wild
> pointer or something corrupting memory.
>
> > The other problem seems to occur on line 431 of thread.c:
> >
> >             !tmp->fake_thread &&          /* don't match pseudo threads */
>
> I usually test with $strict_threads enabled. I'll turn that off and see
> if I can trigger the problem.

If I run into a case where the problem seems to be at least briefly
reproducible, I'll try turning that off to see if that avoids the problem.
Re: Crashes in recent builds from master
On Tue, Sep 07, 2021 at 07:15:16PM -0400, Aaron Schrab wrote:
> Since updating from a build based on bcdb61560 (Add %T status format for
> $sort_thread_groups., 2021-08-05) to one based on 27e61da56 (Merge branch
> 'stable', 2021-08-24) I've been experiencing some sporadic crashes.

Unfortunately both IMAP and thread code *have* been touched recently. So
it looks like I goofed something up. :-(

However, bcdb61560 isn't on master. Were you running off of my
development branch before, or were you perhaps referring to 5aa75ed2?

You may also want to try enabling ASAN via something like
  export CFLAGS='-g3 -fno-omit-frame-pointer -fsanitize=address'
and re-configure/recompile, to see if it can give an earlier warning about
memory corruption.

> For the more troublesome one I get the following backtrace. Once this
> comes up it will keep crashing when I attempt to change to the same
> folder, at least in the short term. However, if I open this folder once
> with the old build and then switch back to the new build, the problem
> will go away for a while.

Are you using $imap_qresync or $imap_condstore?

The stack is in a pretty benign section, so it seems like it's a wild
pointer or something corrupting memory.

> The other problem seems to occur on line 431 of thread.c:
>
>             !tmp->fake_thread &&          /* don't match pseudo threads */

I usually test with $strict_threads enabled. I'll turn that off and see if
I can trigger the problem.

-- 
Kevin J. McCarthy
GPG Fingerprint: 8975 A9B3 3AA3 7910 385C 5308 ADEF 7684 8031 6BDA
Crashes in recent builds from master
Since updating from a build based on bcdb61560 (Add %T status format for
$sort_thread_groups., 2021-08-05) to one based on 27e61da56 (Merge branch
'stable', 2021-08-24) I've been experiencing some sporadic crashes.

After enabling core dumps, it seems that there are a couple of different
issues. I've mainly observed both of these when changing folders. I'm
using IMAP folders exclusively, and for the purposes here they are all on
a single, local IMAP server.

For the more troublesome one I get the following backtrace. Once this
comes up it will keep crashing when I attempt to change to the same
folder, at least in the short term. However, if I open this folder once
with the old build and then switch back to the new build, the problem
will go away for a while.

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x7fb66d64a536 in __GI_abort () at abort.c:79
#2  0x7fb66d6a22b8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fb66d7b03a4 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x7fb66d6a9d0a in malloc_printerr (str=str@entry=0x7fb66d7ae6a2 "realloc(): invalid next size") at malloc.c:5389
#4  0x7fb66d6adf8c in _int_realloc (av=av@entry=0x7fb66d7e2ba0 , oldp=oldp@entry=0x55e25f67af30, oldsize=oldsize@entry=569616, nb=569824) at malloc.c:4601
#5  0x7fb66d6af0e6 in __GI___libc_realloc (oldmem=0x55e25f67af40, bytes=569808) at malloc.c:3246
#6  0x55e25c20265f in safe_realloc (ptr=0x55e25e53b408, siz=569808) at lib.c:176
#7  0x55e25c1c2eb6 in mx_alloc_memory (ctx=0x55e25e53b3b0) at mx.c:1461
#8  0x55e25c251854 in imap_read_headers (idata=0x55e25e5f2800, msn_begin=71193, msn_end=71201, initial_download=1) at message.c:409
#9  0x55e25c24cae8 in imap_open_mailbox (ctx=0x55e25e53b3b0) at imap.c:997
#10 0x55e25c1c069b in mx_open_mailbox (path=0x55e25e43f310 "imaps://a...@pug.qqx.org/L/dev/git", flags=0, pctx=0x0) at mx.c:656
#11 0x55e25c18926a in mutt_index_menu () at curs_main.c:1433
#12 0x55e25c1b1dd9 in main (argc=1, argv=0x7fffbf9e51b8, environ=0x7fffbf9e51d0) at main.c:1380

The other problem seems to occur on line 431 of thread.c:

            !tmp->fake_thread &&          /* don't match pseudo threads */

With this one, starting mutt again and immediately opening the same
folder appears to work, although the problem seems likely to reappear
soon after. I don't currently have a core file for this problem: since I
wasn't expecting to be looking at two different problems, I was just
using the standard name `core`, so the core file from the first issue
overwrote the ones from this issue.

For all of this I have:

set sort="threads"
set sort_aux="date-received"
set sort_thread_groups="last-date-received"

I plan to continue looking for a fix myself, but if anyone else has ideas
I'd be glad to hear them.
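A note on reading the first backtrace: glibc aborts with "realloc(): invalid next size" when, while resizing a block, it finds that the size field of the heap chunk immediately after that block is implausible. That field is normally damaged by an earlier out-of-bounds write, so frame #7 (mx_alloc_memory) is where the corruption was detected, not necessarily where it happened, which fits the "wild pointer" reading of the stack. The sketch below illustrates the invariant that must hold when per-message data is stored by IMAP message sequence number (MSN): grow capacity before indexing. It is hypothetical: `struct msn_table`, `msn_table_reserve()`, `msn_table_set()` and the growth step are made up for illustration and are not mutt's mx_alloc_memory() or its data structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table of per-message entries indexed by 1-based IMAP
 * message sequence number (MSN); not mutt's real data structure. */
struct msn_table {
  void **slots;   /* slots[msn - 1] holds the entry for that MSN */
  size_t max;     /* number of allocated slots */
};

#define MSN_GROW_STEP 25  /* arbitrary growth increment for the sketch */

/* Ensure slots[0 .. need-1] exist.  Grows in MSN_GROW_STEP chunks and
 * NULLs the new tail.  Returns 0 on success, -1 on failure, in which
 * case the table is left unchanged. */
int msn_table_reserve(struct msn_table *t, size_t need)
{
  if (need <= t->max)
    return 0;
  /* Refuse sizes whose byte count could overflow size_t. */
  if (need > SIZE_MAX / sizeof *t->slots - MSN_GROW_STEP)
    return -1;

  size_t newmax = t->max;
  while (newmax < need)
    newmax += MSN_GROW_STEP;

  void **p = realloc(t->slots, newmax * sizeof *p);
  if (p == NULL)
    return -1;
  for (size_t i = t->max; i < newmax; i++)
    p[i] = NULL;
  t->slots = p;
  t->max = newmax;
  return 0;
}

/* Store an entry for one MSN, reserving room first so that an MSN past
 * the current capacity cannot write beyond the end of the allocation. */
int msn_table_set(struct msn_table *t, size_t msn, void *entry)
{
  if (msn == 0)  /* MSNs start at 1 */
    return -1;
  if (msn_table_reserve(t, msn) != 0)
    return -1;
  t->slots[msn - 1] = entry;
  return 0;
}
```

With this shape, a bug elsewhere that computes the wrong MSN can store into the wrong slot or trigger a larger allocation, but it cannot write past the end of the array and damage the neighbouring chunk that a later realloc() trips over.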