https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436
--- Comment #11 from H.J. Lu <hjl.tools at gmail dot com> ---
(In reply to Kevin Puetz from comment #9)
> Created attachment 64546 [details]
> g++ -g -shared -fPIC -O1 foo.cpp -o libfoo.so (demonstrate corruption in gcc
> 16.1.0)
>
> Ok, that was the easy way but I was afraid you'd say the asm-reg stuff was
> sketchy. I did not have hard registers assigned like that in the original
> case, I just had a lot of inlining and enough register complexity that it
> eventually ended up using esi for something. But I lost that pretty early in
> the attempts to minimize it, because esi is a long way down the list of
> preferred registers. I assume that's because it's callee-preserved, so the
> allocator would rather use something volatile it can just clobber. But after
> enough tinkering I came up with a structure to synthetically create a lot of
> simultaneously-live registers without it actually being complicated, and
> where it mostly doesn't matter which variable(s) get clobbered.
>
> This one works with the same main.c, but its output is the opposite. I load
> many copies of magic_number, xor-ing them all together so the result depends
> on all the variables (making them all live). Since they are loaded from a
> volatile source, the optimizer can't actually assume they are all the same,
> and must store each one, creating enough pressure that it eventually
> allocates one of them (`h`) into esi.
>
> But we know they are all the same, so the correct answer is "0". An even
> number of copies of the same value, xor'ed together, should cancel out. But
> if/when esi gets clobbered with nullptr (the initial value of dtv in the
> first call to __tls_get_addr_slow), that incorrectly zeros `h`, leaving an
> odd number of intact registers, so the "incorrect" result is printing
> `ms_abi = 123456768`, which shows through incomplete cancellation due to
> clobbering `h`.
>
> I also added a pair of `double` values, which end up in xmm6 and 7, and
> cancel those out using subtraction. I don't actually see those get
> corrupted, but AFAIK there's no reason __tls_get_addr wouldn't be allowed
> to; xmm6/7 are volatile in sysv_abi (but nonvolatile in ms_abi).
>
> The generated asm shows that neither ms_tls_access nor ms_foo have done
> anything to preserve them before calling the sysv_abi function
> __tls_get_addr, so the risk is still there. Unless there's a special promise
> made by glibc somewhere that __tls_get_addr will never use xmm* (as it would
> if, e.g., it used a malloc that uses an SSE memset).
>
> If anything in __tls_get_addr were to touch xmm6-15, that would make
> ms_tls_access break its claimed calling convention by trashing registers
> that are supposed to be callee-preserved.

Please try users/hjl/pr124798/master branch at

https://gitlab.com/x86-gcc/gcc/-/commits/users/hjl/pr124798/master

commit 0ab019696bcb8503e290172c95ed8049e90a25e7
Author: H.J. Lu <[email protected]>
Date:   Tue Apr 14 18:37:20 2026 +0800

    x86: Implement TARGET_FNTYPE_ABI

fixes your test.
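
For readers who do not have attachment 64546 open, the liveness trick described in the quoted comment can be sketched roughly as below. This is not the attachment itself: every name here (SKETCH_MS_ABI, magic_double, tls_counter, SketchResult, run_sketch) and both constants are hypothetical, and the sketch only shows the pattern of keeping many volatile-loaded values live across a TLS access inside an ms_abi function. Whether any value lands in esi or xmm6/7, and whether the TLS access becomes a call to __tls_get_addr at all, depends on the compiler, the optimization level, and building as a -fPIC shared library.

```cpp
#include <cstdint>

// Hypothetical helper macro: apply ms_abi only where the attribute exists.
#if defined(__GNUC__) && defined(__x86_64__)
#define SKETCH_MS_ABI __attribute__((ms_abi))
#else
#define SKETCH_MS_ABI
#endif

// Assumed values; the real reproducer's constants are not shown in the thread.
volatile std::uint32_t magic_number = 0x5A5A5A5Au;
volatile double magic_double = 1.5;

// A thread-local object. In a -fPIC shared library, accessing it may be
// compiled into a call to the sysv_abi function __tls_get_addr.
thread_local int tls_counter = 0;

struct SketchResult {
    std::uint32_t xor_result;  // expected 0 when no register is clobbered
    double double_result;      // expected 0.0 when no register is clobbered
};

SKETCH_MS_ABI SketchResult run_sketch() {
    // Eight separate volatile loads: the compiler cannot assume they are
    // equal, so it has to keep eight distinct values around.
    std::uint32_t a = magic_number, b = magic_number, c = magic_number,
                  d = magic_number, e = magic_number, f = magic_number,
                  g = magic_number, h = magic_number;
    double x = magic_double, y = magic_double;

    // TLS access while all of the values above are still live.
    ++tls_counter;

    // An even number of identical values xor to zero; identical doubles
    // subtract to zero. A clobbered register would break the cancellation.
    return SketchResult{a ^ b ^ c ^ d ^ e ^ f ^ g ^ h, x - y};
}
```

On a correct compiler both fields are zero; a nonzero xor_result would indicate that one of the live values was lost across the TLS access.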
