https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82803
Yann Droneaud <yann at droneaud dot fr> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |yann at droneaud dot fr

--- Comment #8 from Yann Droneaud <yann at droneaud dot fr> ---
Created attachment 46903
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=46903&action=edit
An artificial test case for gcc to emit 17 calls to __tls_get_addr()

Using Thread Local Storage (TLS) is a pain: the issue reported here still
applies to the latest GCC.

I have code such as:

  struct state { int counter; };  /* hypothetical field; real definition not shown */

  static struct state *state(void) __attribute__((pure));

  static struct state *state(void)
  {
          static __thread struct state s;
          return &s;
  }

  int do_work(void)  /* hypothetical name: "do" is a C keyword */
  {
          struct state * const s = state();
          int res = 0;

          /* do something with s */

          return res;
  }

Once compiled, the code for my real function contains 6 calls to
__tls_get_addr(), which is far more than expected and far more than
necessary: within a given thread the address of the thread-local state
never changes, so a single call would do. Clang compiles the same code and
emits a single call to __tls_get_addr(). Both compilers were run on Linux
amd64 with -O3 -fPIC.

The attached test case is designed to trigger 17 calls to __tls_get_addr().
As you will see, there is about one per conditional + function call pair.
Once again, clang is able to emit code with a single call to
__tls_get_addr().

You can check for yourself: https://godbolt.org/z/QVGjka
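The attachment itself is not reproduced here. As a rough, hypothetical stand-in (the names log_event, process and the counter field are illustrative and not taken from the attachment), the file below shows the shape of code the report describes: a pure accessor returning the address of a static __thread object, followed by several conditional + function call pairs. Since the object's address cannot change within a thread, one __tls_get_addr() call per invocation of process() would be enough.

```c
/* Hypothetical reproducer in the spirit of the report, not the attachment. */
#include <stdio.h>

struct state {
        int counter;            /* hypothetical field */
};

static struct state *state(void) __attribute__((pure));

static struct state *state(void)
{
        static __thread struct state s;  /* one zero-initialized instance per thread */
        return &s;
}

/* Stand-in for an external function; noinline keeps a real call
 * in the generated code between uses of the TLS pointer. */
__attribute__((noinline)) void log_event(int id)
{
        printf("event %d\n", id);
}

/* Three "conditional + function call" pairs, all touching the same
 * thread-local object through one pointer obtained up front. */
int process(int a, int b, int c)
{
        struct state *const s = state();

        if (a) { log_event(1); s->counter++; }
        if (b) { log_event(2); s->counter++; }
        if (c) { log_event(3); s->counter++; }

        return s->counter;
}
```

To compare compilers, one could build this with -O3 -fPIC -S using both gcc and clang on Linux amd64 and count the calls to __tls_get_addr in each assembly output; no particular counts are claimed for this sketch.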