https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54429
--- Comment #6 from Oleg Endo <olegendo at gcc dot gnu.org> ---
A test case for this problem is
gcc/testsuite/g++.dg/tls/thread_local-order1.C, which is compiled without
optimizations and contains the following sequence:

        stc     gbr,r1
        mov.l   .L20,r2
        add     r2,r1
        lds     r1,fpul
        fsts    fpul,fr1
        flds    fr1,fpul
        sts     fpul,r0
        mov     r14,r15
        lds.l   @r15+,pr
        mov.l   @r15+,r14
        rts
        nop

What the code is actually doing:

        stc     gbr,r1
        mov.l   .L20,r2
        add     r2,r1
        mov     r1,r0
        mov     r14,r15
        lds.l   @r15+,pr
        mov.l   @r15+,r14
        rts
        nop