[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #19 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-21 08:40:52 UTC ---
(In reply to comment #18)
> For actual ThreadSanitizer runtime -fPIC -ftls-model=initial-exec causes degradation of generated code. Linker emits the same tls access code in all cases, but the compiler generates worse code.

-fPIC -ftls-model=initial-exec is by definition almost equivalent to -fPIE, the only exceptions are:
1) -fPIE code is allowed to assume globally visible symbols aren't interposed
2) if TLS vars are defined locally (or hidden visibility), then local-exec model can be used instead of initial-exec (one less dereference)

As for 2), I've explained already that by linking -fPIC code into the executable if the TLS var is defined in the executable, linker TLS transition transform all other TLS models (even global and local dynamic) into local-exec, just might result in some nops or for IE-LE setting of a register to an immediate and using that register as opposed to just using the immediate in the %fs: prefixed insn.

And for 1), for the fast path, for any symbols on the fast path that shouldn't be interposeable and that are defined in libtsan, you should be able to just use visibility attributes and get the same effect.

-fPIE flag simply isn't usable for a library that is to be used also by shared libraries.

How do you link -fsanitize=thread shared libraries anyway? Just don't link libtsan in for -static-libtsan, and rely on the executable being linked against it? Such libraries will fail to link with -Wl,-z,defs ...

Of course, having multiple tsan TLS roots in the same process isn't a good idea either (which is why I think we can't default to -static-libtsan).
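The visibility suggestion in point 1) can be illustrated with a small C sketch. This is not libtsan code: the names shadow_slot and hot_path_read8 and the shift/mask arithmetic are hypothetical, chosen only to show an exported entry point calling a hidden helper. With hidden visibility the helper cannot be interposed, so a -fPIC build should be free to bind the call locally, which is the effect -fPIE would otherwise give.

```c
/* Hypothetical fast-path helper. Hidden visibility means the symbol is not
   exported from the shared object, so it cannot be interposed and the
   compiler may call or inline it directly even under -fPIC. */
__attribute__((visibility("hidden")))
unsigned long shadow_slot(unsigned long addr)
{
    /* Illustrative mapping only: drop the low 3 bits, keep 8 bits. */
    return (addr >> 3) & 0xff;
}

/* Hypothetical exported entry point, standing in for a hook that
   instrumented code would call. Default visibility keeps it exported. */
__attribute__((visibility("default")))
unsigned long hot_path_read8(unsigned long addr)
{
    return shadow_slot(addr);
}
```

Whether this removes all of the code generation difference discussed in this thread is not something the sketch can show; that would need the same disassembly comparison the participants ran.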
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #20 from Dmitry Vyukov dvyukov at google dot com 2012-11-21 09:04:07 UTC ---
(In reply to comment #19)
> (In reply to comment #18)
> > For actual ThreadSanitizer runtime -fPIC -ftls-model=initial-exec causes degradation of generated code. Linker emits the same tls access code in all cases, but the compiler generates worse code.
>
> -fPIC -ftls-model=initial-exec is by definition almost equivalent to -fPIE, the only exceptions are:
> 1) -fPIE code is allowed to assume globally visible symbols aren't interposed
> 2) if TLS vars are defined locally (or hidden visibility), then local-exec model can be used instead of initial-exec (one less dereference)

What I see is that it also affects code generation (register allocation). Do we need to file a bug on that?

> As for 2), I've explained already that by linking -fPIC code into the executable if the TLS var is defined in the executable, linker TLS transition transform all other TLS models (even global and local dynamic) into local-exec, just might result in some nops or for IE-LE setting of a register to an immediate and using that register as opposed to just using the immediate in the %fs: prefixed insn.
>
> And for 1), for the fast path, for any symbols on the fast path that shouldn't be interposeable and that are defined in libtsan, you should be able to just use visibility attributes and get the same effect.
>
> -fPIE flag simply isn't usable for a library that is to be used also by shared libraries.
>
> How do you link -fsanitize=thread shared libraries anyway? Just don't link libtsan in for -static-libtsan, and rely on the executable being linked against it?

Yes, we rely on the library being linked into the executable, because we want the runtime to be linked statically. For dynamic libraries that are loaded into a non-instrumented executable (e.g. a swig .so preloaded into a python process), we statically link the tsan runtime into the .so.

> Such libraries will fail to link with -Wl,-z,defs ...
>
> Of course, having multiple tsan TLS roots in the same process isn't a good idea either (which is why I think we can't default to -static-libtsan).
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #21 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-21 09:23:56 UTC ---
(In reply to comment #20)
> What I see is that it also affects code generation (register allocation). Do we need to file a bug on that?

If you see a code generation difference even with -ftls-model=local-exec -fPIC vs. -fPIE, then it must mean you don't have visibility attributes on the symbols used in the fast path. For initial-exec, the RA effects should be minimal: the TLS offset load from the GOT is usually very close to the actual TLS memory load (or lea), and thus it will just pick up some short-lived scratch register.

Generally in GCC, -fPIE sets flag_pic and not flag_shlib, while -fPIC sets flag_pic and flag_shlib. flag_pic is about whether position independent code needs to be generated, flag_shlib is about whether locally defined symbols can be interposed (plus it affects the default TLS model choice).

> For dynamic libraries that are loaded into a non-instrumented executable (e.g. swig so preloaded into python process), we statically link the tsan runtime into the so.

And you don't get linker errors from that? That must be by pure luck.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #18 from Dmitry Vyukov dvyukov at google dot com 2012-11-21 07:45:20 UTC ---
(In reply to comment #17)
> > When building libtsan as a shared library (for which I had to hack our assembly blobs a bit) we get two sources of slowdown:
> > 1. __tsan_read8 and friends are called through PLT
> > 2. __tsan_read8 and friends use one extra load to get to TLS
>
> > I bet 9.5% or more of that is due to the PLT call.
>
> That's not the overhead you are looking for, Luke.
> We currently compile with -fPIC and link statically, linker inserts only 1 memory dereference in this case. However, -fPIC affects code generation in compiler, it has to reserve more registers for tls access code and has to allocate stack frame because of the potential call. Only that causes *20%* slowdown on a real application (not a synthetic benchmark).
> Kostya, to evaluate initial-exec you need to insure that code characteristics of __tsan_read/write are not affected, i.e. 0 stack spills and analyze script passes. Everything else we have w/o initial-exec.

For the actual ThreadSanitizer runtime, -fPIC -ftls-model=initial-exec causes degradation of generated code. The linker emits the same TLS access code in all cases, but the compiler generates worse code.

The table below shows various stats about the generated code for the hot functions. We are mostly interested in stack access instructions (rsp/push/pop):

gcc -fPIE
function    tot  size  rsp  push  pop  call  load  store  sh  mov  lea  cmp
write1      325  1316    1     0    0     2    16      5  52   68    6   46
write2      326  1340    1     0    0     2    16      5  52   68    6   46
write4      325  1316    1     0    0     2    16      5  52   68    6   46
write8      325  1316    1     0    0     2    16      5  52   68    6   46
read1       355  1476    1     0    0     2    17      5  52   71    6   52
read2       344  1428    1     0    0     2    16      5  52   71    6   51
read4       344  1436    1     0    0     2    16      5  52   71    6   51
read8       344  1436    1     0    0     2    16      5  52   71    6   51
func_entry   28   116    0     0    0     1     3      1   2    8    0    1
func_exit    25   100    0     0    0     1     1      0   2    5    0    1

gcc -fPIC -ftls-model=initial-exec
function    tot  size  rsp  push  pop  call  load  store  sh  mov  lea  cmp
write1      323  1268    1     1    5     2    17      7  52   64    6   46
write2      321  1275    1     1    5     2    17      7  52   64    6   46
write4      323  1268    1     1    5     2    17      7  52   64    6   46
write8      323  1268    1     1    5     2    17      7  52   64    6   46
read1       342  1380    1     1    4     2    18      7  52   67    6   52
read2       331  1331    1     1    3     2    17      7  52   67    6   51
read4       334  1356    1     1    3     2    17      7  52   67    6   51
read8       334  1356    1     1    3     2    17      7  52   67    6   51
func_entry    7    24    0     0    0     0     0      1   0    2    0    0
func_exit     6    21    0     0    0     0     0      1   0    1    0    0

gcc -fPIC
function    tot  size  rsp  push  pop  call  load  store  sh  mov  lea  cmp
write1      379  1571   23     0    0     2    25     20  52  100    6   42
write2      383  1603   23     0    0     2    25     20  52  100    6   42
write4      379  1571   23     0    0     2    25     20  52  100    6   42
write8      379  1571   23     0    0     2    25     20  52  100    6   42
read1       402  1715   23     0    0     2    26     20  52  103    6   48
read2       393  1659   23     0    0     2    25     20  52  103    6   47
read4       391  1659   23     0    0     2    25     20  52  103    6   47
read8       391  1659   23     0    0     2    25     20  52  103    6   47
func_entry    9    32    0     1    1     0     0      0   0    4    0    0
func_exit     7    32    0     0    0     0     0      0
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #14 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-19 08:54:47 UTC ---
I bet 9.5% or more of that is due to the PLT call.

The thing is, even when you have initial-exec TLS model code, if you link it into an executable and the referenced TLS variable is in the executable, the linker TLS transitions optimization changes the IE model into LE model, so instead of something like:

    mov 0x2009a9(%rip),%rax
    mov %fs:(%rax),%eax

you'll end up with

    mov $-4,%rax
    mov %fs:(%rax),%eax

or so (compared to

    mov %fs:-4,%eax

if it was local-exec model from the beginning). Given the amount of code in __tsan_read8, I seriously doubt it is noticeable.

So, please compare libtsan built with -fPIC -ftls-model=initial-exec with libtsan built with -fPIE (-ftls-model=local-exec). The former will not require any special hacks and will work just fine even with shared libraries, the latter won't.
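To look at the IE-to-LE relaxation on a given toolchain, a test file along the following lines may help. It is only a sketch under stated assumptions: tls_ie, tls_le, read_ie and read_le are hypothetical names, and the per-variable tls_model attribute is used so both models can sit in one translation unit. The local-exec variable means the object can only be linked into an executable (or PIE), not into a shared library.

```c
/* Hypothetical thread-local variable forced to the initial-exec model.
   When this object is linked into an executable that also defines the
   variable, the linker may relax the GOT load into an immediate offset. */
__thread int tls_ie __attribute__((tls_model("initial-exec")));

/* Hypothetical thread-local variable forced to the local-exec model,
   i.e. the access sequence -fPIE would pick for a locally defined var. */
__thread int tls_le __attribute__((tls_model("local-exec")));

/* Accessors kept trivial so the TLS sequence is easy to find in a
   disassembly of the final executable. */
int read_ie(void)
{
    return tls_ie;
}

int read_le(void)
{
    return tls_le;
}
```

Comparing the disassembly of read_ie and read_le in the linked executable (for example with objdump -d, as used elsewhere in this thread) should show whether the two sequences differ by more than the register move described above. The exact instructions depend on the target and linker version.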
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354 --- Comment #15 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-19 09:03:35 UTC --- You are right that -fPIC -ftls-model=initial-exec does not affect performance if we link libtsan statically (I checked). As you say, the linker nukes one of the loads. But if we link libtsan.so dynamically, we still get both sources of overhead.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #16 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-19 09:06:26 UTC ---
So, using -fPIC -ftls-model=initial-exec is a great idea: it will allow us to build the files once and get both a static and a dynamic library. But we need to agree that the dynamic library is noticeably slower.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #17 from Dmitry Vyukov dvyukov at google dot com 2012-11-19 10:53:04 UTC ---
> When building libtsan as a shared library (for which I had to hack our assembly blobs a bit) we get two sources of slowdown:
> 1. __tsan_read8 and friends are called through PLT
> 2. __tsan_read8 and friends use one extra load to get to TLS

> I bet 9.5% or more of that is due to the PLT call.

That's not the overhead you are looking for, Luke.

We currently compile with -fPIC and link statically; the linker inserts only 1 memory dereference in this case. However, -fPIC affects code generation in the compiler: it has to reserve more registers for the TLS access code and has to allocate a stack frame because of the potential call. That alone causes a *20%* slowdown on a real application (not a synthetic benchmark).

Kostya, to evaluate initial-exec you need to ensure that the code characteristics of __tsan_read/write are not affected, i.e. 0 stack spills and the analyze script passes. Everything else we have w/o initial-exec.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #9 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-18 19:35:43 UTC ---
As dvyukov@ pointed out, -ftls-model=initial-exec improves the situation, but does not fully help.

Experiment:

% cat x.c
__thread int a;
int foo() { return a; }

HORRIBLE: -fPIC -shared
% gcc x.c -O2 -fPIC -shared -o x.so ; objdump -d x.so | grep foo.: -A 5
06e0 foo:
 6e0: 66 48 8d 3d f0 08 20    lea    0x2008f0(%rip),%rdi    # 200fd8 _DYNAMIC+0x1b8
 6e7: 00
 6e8: 66 66 48 e8 10 ff ff    callq  600 __tls_get_addr@plt
 6ef: ff
 6f0: 8b 00                   mov    (%rax),%eax

NOT-SO-BAD: -fPIC -shared -ftls-model=initial-exec
% gcc x.c -O2 -fPIC -shared -o x.so -ftls-model=initial-exec ; objdump -d x.so | grep foo.: -A 5
0630 foo:
 630: 48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax    # 200fe0 _DYNAMIC+0x1b8
 637: 64 8b 00                mov    %fs:(%rax),%eax
 63a: c3                      retq

GOOD: -fPIE
% gcc -c x.c -O2 -fPIE -o x.o ; objdump -d x.o | grep foo.: -A 5
foo:
 0: 64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
 7: 00
 8: c3                      retq

So, while -ftls-model=initial-exec improves the TLS performance, it is still 2x slower than -fPIE. For tsan, which does this for *every* memory access in the original program, this will cost a 5%-10% slowdown. For our users this is a big deal, so they will link the static library whenever possible. Which default is used in gcc -- I don't care that much.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #10 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-18 19:54:37 UTC ---
(In reply to comment #9)
> NOT-SO-BAD: -fPIC -shared -ftls-model=initial-exec
> % gcc x.c -O2 -fPIC -shared -o x.so -ftls-model=initial-exec ; objdump -d x.so | grep foo.: -A 5
> 0630 foo:
>  630: 48 8b 05 a9 09 20 00    mov    0x2009a9(%rip),%rax    # 200fe0 _DYNAMIC+0x1b8
>  637: 64 8b 00                mov    %fs:(%rax),%eax
>  63a: c3                      retq
>
> GOOD: -fPIE
> % gcc -c x.c -O2 -fPIE -o x.o ; objdump -d x.o | grep foo.: -A 5
> foo:
>  0: 64 8b 04 25 00 00 00    mov    %fs:0x0,%eax
>  7: 00
>  8: c3                      retq
>
> So, while -ftls-model=initial-exec improves the TLS performance, it is still 2x slower than -fPIE.

Except obviously you can't use the last code sequence if you want to link it into a shared library. The extra indirection is the standard cost of relocatable code; especially if there are just a few TLS vars in libtsan and they are accessed a lot, that memory (the .got section entry) is most likely in caches, so the indirection can cost just a cycle or at most a few of them.

No idea how you would plan to compile libtsan with the -fPIE flag: for libtsan.so.0 you obviously can't, it would fail to link or load, and for libtsan.a it would make the static library only usable in executables or PIEs, not from shared libraries.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #11 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-18 19:59:42 UTC ---
The above comment is correct. -fPIE is only applicable if we build libtsan.a and link it statically to the PIE executable. This mode, however, works pretty well, and most of our users pay this price for performance.

On every memory access tsan touches a few (two or three) extra cache lines. Not using -fPIE makes it touch one extra cache line. Even if that line is in L1, it still has a non-zero cost. We will try to provide benchmark numbers next week.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354 --- Comment #12 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-18 20:09:39 UTC --- That would effectively require building libtsan as libtsan.so.0, libtsan.a (both -fPIC built) and libtsan_pie.a (-fPIE built), where the gcc driver would do: %{static-libtsan:!shared:-ltsan_pie;:-ltsan} or so. Or there could be even libtsan_nonpic.a for !shared:!pie, of course everything would need to be done only given appropriate benchmarks of real-world programs.
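One way to read the driver spec proposed above is as a two-input decision. The C sketch below spells that reading out; it is an interpretation, not GCC driver code, and the function name pick_tsan_lib is hypothetical. The library names -ltsan_pie and -ltsan are the ones proposed in the comment.

```c
#include <stdbool.h>

/* Interpretation of "%{static-libtsan:!shared:-ltsan_pie;:-ltsan}":
   pick the -fPIE-built archive only when the user asked for a static
   libtsan and is not building a shared library; otherwise fall back to
   the -fPIC-built library. */
const char *pick_tsan_lib(bool static_libtsan, bool shared)
{
    if (static_libtsan && !shared)
        return "-ltsan_pie";
    return "-ltsan";
}
```

Under this reading, a shared library built with -static-libtsan would still get the -fPIC archive, which matches the constraint that -fPIE code cannot be linked into a shared object.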
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #13 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-19 04:13:23 UTC ---
> of course everything would need to be done only given appropriate benchmarks of real-world programs.

We have a synthetic benchmark which perfectly reflects the only major hot spot in tsan: the set of functions __tsan_{read,write}{1,2,4,8} that are called on every memory access.

When building libtsan as a shared library (for which I had to hack our assembly blobs a bit) we get two sources of slowdown:
1. __tsan_read8 and friends are called through PLT
2. __tsan_read8 and friends use one extra load to get to TLS

The result is 10% slowdown.
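The benchmark mentioned above is not included in the thread. As a rough idea of its shape only, the sketch below drives a stand-in hook once per array element, the way instrumented code calls a read hook on every load. Everything here is hypothetical: fake_read8_hook, events_seen and run_synthetic_loop are made-up names, and the real hooks do far more work than bumping a counter.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state, standing in for the thread-local data
   the real runtime consults on every access. */
static __thread uint64_t events_seen;

/* Stand-in for a __tsan_read8-style hook: touches TLS once per call. */
void fake_read8_hook(const void *addr)
{
    (void)addr;          /* the address is unused in this sketch */
    events_seen++;
}

/* Call the hook once per element, as instrumented loads would, and
   return the per-thread event count accumulated so far. */
uint64_t run_synthetic_loop(const uint64_t *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fake_read8_hook(&data[i]);
    return events_seen;
}
```

Building the hook into a shared library versus a static archive, and timing the loop in each case, would be one way to separate the PLT cost from the TLS cost listed above.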
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #7 from H.J. Lu hjl.tools at gmail dot com 2012-11-17 20:35:57 UTC ---
(In reply to comment #6)
> Answering my own question: we can get static linking with
> -Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread

The -static-libasan option was added by

http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00536.html
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Markus Trippelsdorf markus at trippelsdorf dot de changed:

    What |Removed |Added
    CC   |        |markus at trippelsdorf dot de

--- Comment #8 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-11-17 21:08:21 UTC ---
(In reply to comment #7)
> (In reply to comment #6)
> > Answering my own question: we can get static linking with
> > -Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread
>
> The -static-libasan option was added by
>
> http://gcc.gnu.org/ml/gcc-cvs/2012-11/msg00536.html

Can you please add -ldl -lpthread to it? Otherwise the user has to set this by hand:

markus@x4 ~ % g++ -faddress-sanitizer -static-libasan -g test.cpp
/home/markus/gcc/libsanitizer/sanitizer_common/sanitizer_linux.cc:135: error: undefined reference to 'pthread_getattr_np'
/home/markus/gcc/libsanitizer/sanitizer_common/sanitizer_linux.cc:138: error: undefined reference to 'pthread_attr_getstack'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:103: error: undefined reference to 'pthread_key_create'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:108: error: undefined reference to 'pthread_getspecific'
/home/markus/gcc/libsanitizer/asan/asan_posix.cc:113: error: undefined reference to 'pthread_setspecific'
/home/markus/gcc/libsanitizer/interception/interception_linux.cc:22: error: undefined reference to 'dlsym'
collect2: error: ld returned 1 exit status
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Andrew Pinski pinskia at gcc dot gnu.org changed:

    What    |Removed |Added
    Version |unknown |4.8.0

--- Comment #1 from Andrew Pinski pinskia at gcc dot gnu.org 2012-11-16 17:15:31 UTC ---
> - it matches clang behavior

But it is inconsistent with the rest of the target libraries of GCC.

> - causes less confusion for users (where is my libasan.so???)

So this is a well documented issue already and easily fixed.

> - better for tsan performance (we'll need to link tsan statically too)

Not much better performance.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #2 from Dmitry Vyukov dvyukov at google dot com 2012-11-16 17:20:43 UTC ---
> Not much better performance.

-fPIE vs -fPIC alone gives us a 20% speedup on real programs. The indirect call will add another 10%.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

Jakub Jelinek jakub at gcc dot gnu.org changed:

    What       |Removed     |Added
    Status     |UNCONFIRMED |RESOLVED
    CC         |            |jakub at gcc dot gnu.org
    Resolution |            |INVALID

--- Comment #3 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-16 17:22:03 UTC ---
This has been discussed already. And Diego agreed Google can have a different default in google branches if desirable for that kind of usage scenarios, but otherwise it is undesirable.
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #4 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-16 20:28:34 UTC ---
You have been warned (especially about tsan performance: the tsan run-time heavily depends on TLS, and TLS is much slower with -fPIC than with -fPIE).

Do we have a flag today which will cause libasan.a to be linked statically, while not forcing anything else to link statically?
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #5 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-16 20:46:35 UTC ---
For TLS, you can just use -ftls-model=initial-exec or __attribute__((tls_model ("initial-exec"))).

libasan from what I can see doesn't use TLS at all, and you really can't just have a -fno-pic libasan.a only anyway: any time you want to instrument a shared library, if you linked a non-pic libasan.a into it, it wouldn't link on many architectures and on others would fail SELinux restrictions.

If IE TLS mode is used, there is no problem if an app is linked against it or if libraries it depends on are linked against it. There could be a problem if its TLS usage is too large and the app isn't linked against the library, and you only dlopen some -fsanitize=address compiled/linked shared library. Then dlopen could fail (unless you e.g. LD_PRELOAD=libasan.so.0 or otherwise make sure the app is linked against it). Other GCC shared libraries (e.g. libgomp.so.1) are also using the IE model.
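The per-variable form of the suggestion above looks roughly like this in C. The sketch is illustrative only: access_count and record_access are hypothetical names, not libasan or libtsan symbols. The attribute pins one variable to the initial-exec model without changing -ftls-model for the whole translation unit.

```c
/* Hypothetical thread-local counter pinned to the initial-exec TLS model.
   With this attribute a -fPIC build should read the variable through a
   GOT-loaded offset instead of calling __tls_get_addr. */
__thread unsigned long access_count
    __attribute__((tls_model("initial-exec")));

/* Bump the calling thread's counter and return the new value. */
unsigned long record_access(void)
{
    return ++access_count;
}
```

The dlopen caveat from the comment still applies: a shared library using initial-exec TLS needs static TLS space to be available when it is loaded.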
[Bug other/55354] [asan] by default, the asan run-time should be linked statically, not dynamically
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55354

--- Comment #6 from Konstantin Serebryany konstantin.s.serebryany at gmail dot com 2012-11-16 20:54:40 UTC ---
Answering my own question: we can get static linking with
-Wl,-Bstatic -lasan -Wl,-Bdynamic -ldl -lpthread

> For TLS, you can just use -ftls-model=initial-exec

I did not know, thanks. Sounds like this could be a good solution for tsan. Will check.