[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dimitri at ouroboros dot rocks

--- Comment #19 from Andrew Pinski ---
*** Bug 109198 has been marked as a duplicate of this bug. ***
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #18 from Stas Sergeev ---
(In reply to Stas Sergeev from comment #5)
> And it's running on a stack previously
> poisoned before pthread_cancel().

The reason is that the glibc in use was not built with -fsanitize=address. When it calls its __do_cancel(), which has the "noreturn" attribute, __asan_handle_no_return() is not called. Therefore the canceled thread is left with the poison below SP. I believe a glibc rebuilt with asan would not exhibit the crash.

Note: all the URLs above where I pointed to the code are now either dead links or point to the wrong lines. It's quite a shame that such a bug remains unfixed after a complete explanation was provided, but now that explanation has rotted...
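
To illustrate the mechanism described in this comment: code compiled with -fsanitize=address calls __asan_handle_no_return() before a call to a noreturn function, so that stale stack poison is cleared before the frames are abandoned. A library built without the sanitizer skips that step. The sketch below makes the call by hand through a weak reference, so it also links without the ASan runtime. The helper name notify_asan_before_noreturn is hypothetical, and a toolchain that supports weak undefined symbols (GCC or Clang on ELF) is assumed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Weak reference: resolves to the ASan runtime's function when the program
   is linked with -fsanitize=address, and to a null address otherwise. */
extern void __asan_handle_no_return(void) __attribute__((weak));

/* Hypothetical helper: do by hand what instrumented code does before a
   noreturn call. Returns true if the ASan runtime was present and called. */
static bool notify_asan_before_noreturn(void) {
    if (__asan_handle_no_return != NULL) {
        __asan_handle_no_return();
        return true;
    }
    return false;
}
```

Note that comment #11 reports that calling __asan_handle_no_return() manually from a pthread cleanup handler ran into the same sigaltstack() ordering problem, so this shows the mechanism, not a verified workaround.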
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #17 from Stas Sergeev ---
I sent a small patch set here:
https://lore.kernel.org/lkml/20220126191441.3380389-1-st...@yandex.ru/
but so far it has been ignored by the kernel developers. Someone from this bugzilla should give me an Ack or a Review, or this won't float.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #16 from Stas Sergeev ---
I think I'll propose applying something like this to the Linux kernel:

diff --git a/kernel/signal.c b/kernel/signal.c
index 6f3476dc7873..0549212a8dd6 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -4153,6 +4153,7 @@ do_sigaltstack (const stack_t *ss, stack_t *oss, unsigned long sp,
 	if (ss_mode == SS_DISABLE) {
 		ss_size = 0;
 		ss_sp = NULL;
+		ss_flags = SS_DISABLE;
 	} else {
 		if (unlikely(ss_size < min_ss_size))
 			ret = -ENOMEM;
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #15 from Stas Sergeev ---
(In reply to Martin Liška from comment #14)
> Please report to upstream as well.

I'd like some guidance on how this should be addressed, because that determines which upstream to report it to. I am not entirely sure that Linux is doing the right thing, and I am not sure the man page even makes sense when it says:
---
The old_ss.ss_flags may return either of the following values:
SS_ONSTACK
SS_DISABLE
SS_AUTODISARM
---
... because what I see returned is "SS_DISABLE|SS_AUTODISARM", which is what I write to the flags for probing. This is kludgy. Does anyone know what the fix should be?
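
The probing described in this comment can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code: the helper name probe_autodisarm is hypothetical, and SS_AUTODISARM is defined locally on the assumption that the libc headers may not provide it (the value mirrors the Linux UAPI definition). Note that the probe disables any alternate stack already installed for the calling thread.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Assumption: not every libc exposes SS_AUTODISARM; this mirrors the
   Linux UAPI value. */
#ifndef SS_AUTODISARM
#define SS_AUTODISARM (1U << 31)
#endif

/* Hypothetical helper: try to set SS_DISABLE|SS_AUTODISARM, then read the
   settings back. Returns 0 and stores the reported flags on success, or -1
   if either sigaltstack() call fails (for example on a kernel that rejects
   the flag). */
static int probe_autodisarm(int *reported_flags) {
    stack_t probe, current;

    memset(&probe, 0, sizeof probe);
    probe.ss_flags = (int)(SS_DISABLE | SS_AUTODISARM);
    if (sigaltstack(&probe, NULL) != 0)
        return -1;
    if (sigaltstack(NULL, &current) != 0)
        return -1;
    *reported_flags = current.ss_flags;
    return 0;
}
```

On the kernels discussed in this thread, the value read back is reported to be SS_DISABLE combined with SS_AUTODISARM rather than SS_DISABLE alone, which is what trips up an equality comparison against SS_DISABLE.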
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #14 from Martin Liška ---
(In reply to Stas Sergeev from comment #13)
> Found another problem.
> https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.cpp#L53
> The comment above that line talks about
> SS_AUTODISARM, but the line itself does
> not account for any flags. Meanwhile,
> Linux returns SS_DISABLE in combination
> with flags such as SS_AUTODISARM, so the
> "!=" check should not be used.
>
> My app probes for SS_AUTODISARM by trying
> to set it, and after that, asan breaks.
> This is quite kludgy, though.
> Should the check be changed to
> if (!(signal_stack.ss_flags & SS_DISABLE))
> or maybe Linux should not return any flags
> together with SS_DISABLE?
> The man page says "strange things" on that subject.

Please report to upstream as well.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error and pthread_cancel
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #13 from Stas Sergeev ---
Found another problem.
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_posix.cpp#L53
The comment above that line talks about SS_AUTODISARM, but the line itself does not account for any flags. Meanwhile, Linux returns SS_DISABLE in combination with flags such as SS_AUTODISARM, so the "!=" check should not be used.

My app probes for SS_AUTODISARM by trying to set it, and after that, asan breaks. This is quite kludgy, though. Should the check be changed to
if (!(signal_stack.ss_flags & SS_DISABLE))
or maybe Linux should not return any flags together with SS_DISABLE? The man page says "strange things" on that subject.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |contino at epigenesys dot com

--- Comment #12 from Andrew Pinski ---
*** Bug 103978 has been marked as a duplicate of this bug. ***
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #11 from Stas Sergeev ---
The third bug here seems to be that __asan_handle_no_return:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L602
also calls sigaltstack() before unpoisoning stacks. I believe this makes the problem much more reproducible; for example, a test case with longjmp() is likely possible too.

I found that instance by trying to call __asan_handle_no_return() manually as a pthread cleanup handler, in the hope of working around the destructor bug. But it appears __asan_handle_no_return() does the same thing. So the fix should be to move this line:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/asan/asan_rtl.cpp#L607
above the PlatformUnpoisonStacks() call.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #10 from Martin Liška ---
(In reply to Stas Sergeev from comment #9)
> (In reply to Martin Liška from comment #8)
> > Please report the problem to upstream libsanitizer project:
> > https://github.com/llvm/llvm-project/issues
>
> I already did:
> https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
> But the URL is different; should I also report
> that to llvm-project?

That location is fine; however, they have a duplicated bugzilla.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #9 from Stas Sergeev ---
(In reply to Martin Liška from comment #8)
> Please report the problem to upstream libsanitizer project:
> https://github.com/llvm/llvm-project/issues

I already did:
https://github.com/google/sanitizers/issues/1171#issuecomment-1015913891
But the URL is different; should I also report that to llvm-project?
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #8 from Martin Liška ---
Please report the problem to upstream libsanitizer project:
https://github.com/llvm/llvm-project/issues
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #7 from Stas Sergeev ---
Created attachment 52221
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52221&action=edit
test case

This is a reproducer for both problems.
$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O1
to see the canary overwrite problem.
$ cc -Wall -o bug -ggdb3 -fsanitize=address bug.c -O0
to see the poisoned stack after pthread_cancel() problem.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #6 from Stas Sergeev ---
I think the fix (for at least one of the problems here) would be to move this line:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#109
upwards, before this:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#103
It will then unpoison the stack before playing its sigaltstack games. But I don't know how to test that idea.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #5 from Stas Sergeev ---
Another problem here seems to be that pthread_cancel() doesn't unpoison the cancelled thread's stack. This causes dtors to run on a randomly poisoned stack, depending on where the cancellation happened. That explains the "random" nature of the crash, and the fact that pthread_cancel() is in the test case attached to this ticket, and in my program as well.

So the best diagnosis I can come up with is that after pthread_cancel() we have this:
---
#0 __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
#1 0x77672f0d in __asan::AsanThread::Destroy (this=0x7358e000) at ../../../../libsanitizer/asan/asan_thread.cpp:104
#2 0x769d2c61 in __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:74
#3 __GI___nptl_deallocate_tsd () at nptl_deallocate_tsd.c:23
#4 0x769d5948 in start_thread (arg=) at pthread_create.c:446
#5 0x76a5a640 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
---

And it's running on a stack previously poisoned before pthread_cancel(). Then it detects the access to the poisoned area and tries to produce a stack trace. But that fails too, because the redzone canary has been overwritten. So all we get is a crash.
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #4 from Stas Sergeev ---
Thread 3 "X ev" hit Breakpoint 4, __sanitizer::UnsetAlternateSignalStack () at ../../../../libsanitizer/sanitizer_common/sanitizer_posix_libcdep.cpp:190
190 void UnsetAlternateSignalStack() {
(gdb) n
194 altstack.ss_size = GetAltStackSize(); // Some sane value required on Darwin.
(gdb) p /x $rsp
$128 = 0x7fffee0a0ce0
(gdb) p &oldstack
$129 = (stack_t *) 0x7fffee0a0d00
(gdb) p /x *(int *)0x7fffee0a0cc0 <== canary address
$130 = 0x41b58ab3
(gdb) p 0x7fffee0a0ce0-0x7fffee0a0cc0
$132 = 32

Here we can see that before the call to GetAltStackSize(), rsp is 32 bytes above the lowest canary value. After the call, the canary is gone, because those 32 bytes are quickly overwritten by the call to sysconf().
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

--- Comment #3 from Stas Sergeev ---
Why does it check for a redzone in a non-leaf function? GetAltStackSize() calls into glibc's sysconf(), and that overwrites the canary. Maybe it shouldn't use/check the redzone in a non-leaf function?
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

Stas Sergeev changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stsp at users dot sourceforge.net

--- Comment #2 from Stas Sergeev ---
I have the very same crash with a multi-threaded app. The test case from this ticket doesn't reproduce it for me either, but my app crashes nevertheless. So I debugged it a bit myself, with gcc-11.2.1.

The crash happens here:
https://github.com/gcc-mirror/gcc/blob/master/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc#L10168
Here asan checks that sigaltstack() didn't corrupt anything while writing the "old setting" to the "oss" pointer. Next, a later check fails here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_thread.cc.html#340
Asan failed to find the canary value kCurrentStackFrameMagic. The search works as follows: it walks the shadow stack down and looks for kAsanStackLeftRedzoneMagic to find the bottom of the redzone. Then, at the bottom of the redzone, it looks for the canary value.

I checked that the lowest canary value is overwritten by the call to GetAltStackSize(). It uses the SIGSTKSZ macro:
https://code.woboq.org/llvm/compiler-rt/lib/sanitizer_common/sanitizer_posix_libcdep.cpp.html#170
which expands into a sysconf() call, so it eats up quite a lot of stack.

Now I am not entirely sure what conclusion can be drawn from that. I think that the culprit is probably here:
https://code.woboq.org/gcc/libsanitizer/asan/asan_interceptors_memintrinsics.h.html#26
They say that they expect 16 bytes of redzone, but it seems to be completely exhausted, with all canaries overwritten.

Does any of the above make sense? This is the first time I am looking into the asan code.
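
The canary search described in this comment can be modelled with a toy stack. This is a simplified sketch, not the libsanitizer implementation: the array-based layout, the function name find_frame_canary, and the 0xf1 shadow value for the left redzone are assumptions made for illustration. The frame magic 0x41B58AB3 matches the value printed in the gdb session of comment #4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed shadow marker for the left redzone of a stack frame. */
enum { LEFT_REDZONE_SHADOW = 0xf1 };
/* Canary expected at the bottom of the left redzone. */
#define FRAME_MAGIC UINT64_C(0x41B58AB3)

/* Toy model: shadow[i] describes 8-byte granule i, and mem[i] holds that
   granule's first word. Starting at granule `start`, walk down to the left
   redzone, then to its bottom, and check for the canary there. */
static bool find_frame_canary(const uint64_t *mem, const uint8_t *shadow,
                              size_t start, size_t *canary_index) {
    ptrdiff_t i = (ptrdiff_t)start;

    /* Walk down until a left-redzone granule is found. */
    while (i >= 0 && shadow[i] != LEFT_REDZONE_SHADOW)
        i--;
    if (i < 0)
        return false; /* no redzone below the starting address */

    /* Walk down through the redzone to its lowest granule. */
    while (i >= 0 && shadow[i] == LEFT_REDZONE_SHADOW)
        i--;
    *canary_index = (size_t)(i + 1);
    return mem[*canary_index] == FRAME_MAGIC;
}

/* Reproduces the failure mode: the shadow still says "redzone", but the
   memory holding the canary has been reused by a callee. */
static void check_canary_model(void) {
    /* Granules 2..5 are the left redzone in this toy layout. */
    uint64_t mem[8] = {0};
    uint8_t shadow[8] = {0, 0, 0xf1, 0xf1, 0xf1, 0xf1, 0, 0};
    size_t at = 0;

    mem[2] = FRAME_MAGIC;
    assert(find_frame_canary(mem, shadow, 7, &at) && at == 2);

    mem[2] = 0; /* a callee running on this memory clobbers the canary */
    assert(!find_frame_canary(mem, shadow, 7, &at));
}
```

The second assertion corresponds to the situation reported here: the lookup reaches the bottom of the redzone but no longer finds the magic value, so the check fails instead of producing a report.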
[Bug sanitizer/101476] AddressSanitizer check failed, points out a (potentially) non-existing stack error
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101476

Martin Liška changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |WAITING
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2021-07-22

--- Comment #1 from Martin Liška ---
Cannot reproduce that with
gcc version 10.3.1 20210707 [revision 048117e16c77f82598fca9af585500572d46ad73] (SUSE Linux)
and
gcc version 11.1.1 20210625 [revision 62bbb113ae68a7e724255e17143520735bcb9ec9] (SUSE Linux)