https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125436

--- Comment #7 from Kevin Puetz <puetzk at puetzk dot org> ---
Created attachment 64543
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=64543&action=edit
g++ -g -shared -fPIC -O0 foo.cpp -o libfoo.so (reproduces with gcc 16.1.0)

The same test case works if you just force it to use si instead of di. So I
can either randomly fiddle with making ms_foo complicated enough that si ends
up being the register allocation (like it was in my original code, but that
involved wine and lots of cruft you don't really want), or I could just
brute-force it by changing line foo.cpp:20 to use a local register variable
(https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html). I hope
that's OK.

I.e. change line 20 of foo.cpp from
-int ret = -1;
+register int ret asm("si") = -1;

It then has to be compiled at -O0 instead of -O1, so that it actually uses
%esi all the way through, without optimization passes changing that initial
allocation. Also, at higher optimization levels, gcc 16 seems to like moving
the call to __tls_get_addr earlier (above the volatile load), which of course
masks the problem if the store to esi comes after the call.

But with the asm("si") and `g++ -g -shared -fPIC -O0 foo.cpp -o libfoo.so`, I
can reproduce corruption with gcc:16.1.0 from
https://hub.docker.com/_/gcc#supported-tags-and-respective-dockerfile-links

> ms_abi = 00000000
> sysv = 12345678
> ms_abi = 12345678
> ms_abi = 12345678

Using esi gets a different (and less random) result than the corruption seen
with edi, since now what we're getting is the dtv as seen in update_get_addr.
So it's nullptr the first time, and the second time we don't see corruption
since update_get_addr doesn't have to run again. It would of course get more
complicated with more dlopen/dlclose activity.

But it does show *a* runtime malfunction, and the mechanism is the same
(__tls_get_addr clobbering registers that ms_abi would consider callee-save).

I don't have trunk handy except through godbolt, and godbolt doesn't give me a
way (that I know of) to get dlopen involved so that things have to pass through
__tls_get_addr_slow. __tls_get_addr itself doesn't stomp on much if it doesn't
fall off to the slow path... so that can show the wrong-code, but executing
there won't readily show a run-time symptom.

I also admit I don't specifically know how to produce run-time corruption
involving xmm*, I just can't find any promise on glibc's part not to let the
slow path (which needs to malloc and such) get into arbitrary user-defined
malloc replacements, or who knows what all. And sysv_abi in general considers
those volatile, so stomping on them breaks no rules, and I think it's the job
of the ms_abi -> sysv_abi boundary to consider them clobbered (like it does for
ordinary function calls).
