Re: Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
On 4/14/24 22:03, Taylor R Campbell wrote:
> Thanks for tracking this down -- can you file a PR with this
> information, plus `readelf -d libxul.so' output, so we have a place to
> track possible fixes and pullups?

Filed as lib/58154. I put everything you requested in the PR.
Re: Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
> Date: Sun, 14 Apr 2024 14:07:26 +0900
> From: PHO
>
> As I mentioned in
> http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html,
> Firefox tab processes crash very frequently on NetBSD/aarch64 10.0.
> Building it with PKG_OPTIONS.firefox+=debug-info revealed that when it
> crashes it segfaults at one of these two places non-deterministically:

Thanks for tracking this down -- can you file a PR with this
information, plus `readelf -d libxul.so' output, so we have a place to
track possible fixes and pullups?

Can you also track down exactly what the definitions of `thread_local'
and `THREAD_LOCAL' are in this code (maybe use cc -E to get the
preprocessor output), just so we don't have to guess?

Can you also grant me access to your binary package set, so I can
examine the binaries and libraries in it?

Can you also find out what firefox is dlopening, if anything?

Can you also try building an ld.elf_so with -DDEBUG -DRTLD_DEBUG, and
run firefox with LD_DEBUG=1 in the environment using this ld.elf_so,
and skim through the output or share it with me?
Re: Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
> Date: Sun, 14 Apr 2024 05:15:31 +
> From: Emmanuel Dreyfus
>
> On Sun, Apr 14, 2024 at 02:07:26PM +0900, PHO wrote:
> > As I mentioned in
> > http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html, Firefox
> > tab processes crash very frequently on NetBSD/aarch64 10.0. Building it with
> > PKG_OPTIONS.firefox+=debug-info revealed that when it crashes it segfaults
> > at one of these two places non-deterministically:
>
> Not sure it is related, but it could be: I get random SIGSEGV when
> bulk-building NetBSD-10.0/i386 packages on NetBSD-10.0/amd64 XEN3_DOMU.

This is almost certainly unrelated to the issue on firefox/aarch64,
which pho@ narrowed down to a thread-local storage matter; can you
start a separate thread or PR for it, with diagnostic information?
Re: Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
On Sun, Apr 14, 2024 at 05:15:31AM +, Emmanuel Dreyfus wrote:
> On Sun, Apr 14, 2024 at 02:07:26PM +0900, PHO wrote:
> > As I mentioned in
> > http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html, Firefox
> > tab processes crash very frequently on NetBSD/aarch64 10.0. Building it with
> > PKG_OPTIONS.firefox+=debug-info revealed that when it crashes it segfaults
> > at one of these two places non-deterministically:
>
> Hello
>
> Not sure it is related, but it could be: I get random SIGSEGV when
> bulk-building NetBSD-10.0/i386 packages on NetBSD-10.0/amd64 XEN3_DOMU.

I also get these SIGSEGVs on bare-metal amd64 (it's an i9 with 20
cores) when running builds with high parallelism (packages or
build.sh). And I also get these firefox crashes.

As I haven't seen the SIGSEGV on Xen yet, I suspected a hardware issue,
but if others are seeing it too, a software bug becomes more likely.

--
Manuel Bouyer
     NetBSD: 26 ans d'experience feront toujours la difference
--
Re: Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
On Sun, Apr 14, 2024 at 02:07:26PM +0900, PHO wrote:
> As I mentioned in
> http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html, Firefox
> tab processes crash very frequently on NetBSD/aarch64 10.0. Building it with
> PKG_OPTIONS.firefox+=debug-info revealed that when it crashes it segfaults
> at one of these two places non-deterministically:

Hello

Not sure it is related, but it could be: I get random SIGSEGV when
bulk-building NetBSD-10.0/i386 packages on NetBSD-10.0/amd64 XEN3_DOMU.

--
Emmanuel Dreyfus
m...@netbsd.org
Thread-local storage issues arose again? Firefox frequently crashes on 10.0 aarch64
Hi,

As I mentioned in
http://mail-index.netbsd.org/netbsd-users/2024/04/12/msg030915.html,
Firefox tab processes crash very frequently on NetBSD/aarch64 10.0.
Building it with PKG_OPTIONS.firefox+=debug-info revealed that when it
crashes it segfaults at one of these two places non-deterministically:

third_party/rlbox/include/rlbox_noop_sandbox.hpp:

    rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data();
    #  define RLBOX_NOOP_SANDBOX_STATIC_VARIABLES()                          \
      thread_local rlbox::rlbox_noop_sandbox_thread_data                    \
        rlbox_noop_sandbox_thread_info{ 0, 0 };                             \
      namespace rlbox {                                                     \
        rlbox_noop_sandbox_thread_data* get_rlbox_noop_sandbox_thread_data() \
        {                                                                   \
          return &rlbox_noop_sandbox_thread_info;                           \
        }                                                                   \
      }                                                                     \
      static_assert(true, "Enforce semi-colon")
    ...
    template
    auto impl_invoke_with_func_ptr(T_Converted* func_ptr, T_Args&&... params)
    {
    #ifdef RLBOX_EMBEDDER_PROVIDES_TLS_STATIC_VARIABLES
      auto& thread_data = *get_rlbox_noop_sandbox_thread_data();
    #endif
      auto old_sandbox = thread_data.sandbox; // <-- CRASHES HERE!
      thread_data.sandbox = this;
      auto on_exit =
        detail::make_scope_exit([&] { thread_data.sandbox = old_sandbox; });
      return (*func_ptr)(params...);
    }

media/libjpeg/simd/arm/aarch64/jsimd.c:

    static THREAD_LOCAL unsigned int simd_support = ~0;
    JSIMD_FASTST3 | JSIMD_FASTTBL;
    ...
    LOCAL(void)
    init_simd(void)
    {
    #ifndef NO_GETENV
      char env[2] = { 0 };
    #endif
    #if defined(__linux__) || defined(ANDROID) || defined(__ANDROID__)
      int bufsize = 1024; /* an initial guess for the line buffer size limit */
    #endif

      if (simd_support != ~0U) // <-- CRASHES HERE!
        return;

      simd_support = 0;

So both of these cases involve TLS, that is, tab processes segfault
while attempting to access thread-local variables. At run-time these
functions reside in libxul.so, which is dlopen'ed by the main process.
I recall there were a few issues in TLS handling in the past but
riastradh@ fixed them before we branched 10.0, right?
"readelf -r libxul.so" shows no R_AARCH64_TLS_TPR in its relocation table but only shows R_AARCH64_TLSDESC, so I believe these variables use local-dynamic model. I tried to create a minimal reproducer but it didn't crash: https://gist.github.com/depressed-pho/b6894fdaef94a1b9aa5459b1a2f65590 So I speculated that there were some kind of limit in the size of TLS blocks that dlopen(3) could sanely handle, and libxul.so exceeded it. As I mentioned in the previous mail, I modified /usr/pkg/bin/firefox based on this speculation: #!/bin/sh LD_PRELOAD=/usr/pkg/lib/firefox/libxul.so /usr/pkg/lib/firefox/firefox "$@" To my surprise this actually worked! Firefox hasn't crashed even once since this modification! Help, riastradh@, TLS is convoluted and I have nearly zero knowledge about this monstrosity!