https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80881
--- Comment #81 from Uroš Bizjak <ubizjak at gmail dot com> ---
(In reply to Julian Waters from comment #75)
> Any feedback on the new patch?
I would propose that you legitimize the TLS address using get_thread_pointer (as
Eric's patch does). The generic optimizers are then able to optimize the access
to the symbol, and the address is later rewritten into the TLS named address
space.
Please consider this testcase (very relevant on Linux; I don't know about
Windows):
--cut here--
extern __thread int i[8];
int foo (void)
{
  return i[2] + i[4];
}
--cut here--
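To build the testcase on its own, `i` needs a definition; the sketch below adds one (the initializer values are arbitrary and only there to make the result checkable). Defining `i` in the same file as `foo` may let GCC choose a cheaper TLS model (for example local-exec in a non-PIC build) than the initial-exec `i@gottpoff` sequence discussed here, so to reproduce the dumps that follow, keep the `extern` declaration in the file being compiled and put the definition in another translation unit.

```c
/* A definition for the testcase's "extern __thread int i[8]".
   The initializer values are arbitrary, chosen only for illustration.  */
__thread int i[8] = {0, 10, 20, 30, 40, 50, 60, 70};

int foo (void)
{
  return i[2] + i[4];
}
```

With this definition, `foo ()` returns 20 + 40 = 60 in any thread that has not modified its own copy of `i`.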
Using get_thread_pointer, the above is expanded into:
(insn 5 2 6 2 (set (reg:DI 102)
(mem/u/c:DI (const:DI (unspec:DI [
(symbol_ref:DI ("i") [flags 0x60] <var_decl
0x7fbe95c10c60 i>)
] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95
{*movdi_internal}
(nil))
(insn 6 5 7 2 (set (reg:DI 103)
(mem/u/c:DI (const:DI (unspec:DI [
(symbol_ref:DI ("i") [flags 0x60] <var_decl
0x7fbe95c10c60 i>)
] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:18 95
{*movdi_internal}
(nil))
(insn 7 6 8 2 (set (reg:SI 104)
(mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 102))
(const_int 8 [0x8])) [1 i[2]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
(nil))
(insn 8 7 9 2 (set (reg:SI 105)
(mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 103))
(const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
(nil))
(insn 9 8 10 2 (parallel [
(set (reg:SI 101 [ _4 ])
(plus:SI (reg:SI 104)
(reg:SI 105)))
(clobber (reg:CC 17 flags))
]) "tls.c":5:15 283 {*addsi_1}
(expr_list:REG_EQUAL (plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 102))
(const_int 8 [0x8])) [1 i[2]+0 S4 A32])
(mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 103))
(const_int 16 [0x10])) [1 i[4]+0 S4 A32]))
(nil)))
Please note how UNSPEC_TP forms a legitimate address in (insn 9). The generic
optimizers then optimize the above into the following RTX sequence:
(insn 5 2 7 2 (set (reg:DI 102)
(mem/u/c:DI (const:DI (unspec:DI [
(symbol_ref:DI ("i") [flags 0x60] <var_decl
0x7fbe95c10c60 i>)
] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95
{*movdi_internal}
(nil))
(note 7 5 8 2 NOTE_INSN_DELETED)
(insn 8 7 9 2 (set (reg:SI 105 [ i[4] ])
(mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 102))
(const_int 16 [0x10])) [1 i[4]+0 S4 A32])) "tls.c":5:15 96
{*movsi_internal}
(nil))
(insn 9 8 14 2 (parallel [
(set (reg:SI 101 [ _4 ])
(plus:SI (mem/c:SI (plus:DI (plus:DI (unspec:DI [
(const_int 0 [0])
] UNSPEC_TP)
(reg:DI 102))
(const_int 8 [0x8])) [1 i[2]+0 S4 A32])
(reg:SI 105 [ i[4] ])))
(clobber (reg:CC 17 flags))
]) "tls.c":5:15 283 {*addsi_1}
(expr_list:REG_DEAD (reg:DI 102)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_DEAD (reg:SI 105 [ i[4] ])
(nil)))))
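As a rough, portable model of what the generic optimizers achieved here (only one GOTNTPOFF load, reg 102, survives, and both elements are addressed as thread pointer + reg 102 + a constant), the C sketch below simulates one thread's static TLS area with an ordinary array. Everything in it is hypothetical and not GCC code: `tls_block` stands in for the TLS area, `model_tp ()` for UNSPEC_TP and `TPOFF_I` for the value loaded from `i@gottpoff`; placing `i` below the thread pointer only mimics the x86-64 (TLS variant II) layout.

```c
#include <stddef.h>

/* Hypothetical model of one thread's static TLS area.  On x86-64
   (TLS variant II) variables sit below the thread pointer, so "i"
   occupies elements 48..55, i.e. just below the end of the block.  */
static int tls_block[64] = { [48] = 0, 10, 20, 30, 40, 50, 60, 70 };

/* Hypothetical stand-in for i@gottpoff: byte offset of "i" from TP.  */
#define TPOFF_I ((ptrdiff_t) (-16 * (ptrdiff_t) sizeof (int)))

/* Stand-in for UNSPEC_TP: the modelled thread pointer (end of block).  */
char *model_tp (void)
{
  return (char *) (tls_block + 64);
}

/* Address TP + OFF + DISP, the shape of (plus (plus TP reg) const).  */
const int *tls_elem (const char *tp, ptrdiff_t off, ptrdiff_t disp)
{
  return (const int *) (tp + off + disp);
}

/* Shape of the optimized RTL: one offset load shared by both accesses,
   which differ only in their constant displacement.  */
int model_foo (void)
{
  const char *tp = model_tp ();
  ptrdiff_t off = TPOFF_I;                            /* insn 5: reg 102 */
  int i4 = *tls_elem (tp, off, 4 * sizeof (int));     /* insn 8: +16 */
  return *tls_elem (tp, off, 2 * sizeof (int)) + i4;  /* insn 9: +8 */
}
```

Because the offset is loaded once and the two accesses differ only in the constant displacement (8 and 16 bytes for a 4-byte `int`), a single register can serve both, which is the property the later rewrite to `%fs:8(%rdx)` and `%fs:16(%rdx)` relies on.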
The above sequence is later rewritten to use the TLS named address space
(please note AS1 in the MEM attributes):
(insn 5 2 7 2 (set (reg:DI 102)
(mem/u/c:DI (const:DI (unspec:DI [
(symbol_ref:DI ("i") [flags 0x60] <var_decl
0x7fbe95c10c60 i>)
] UNSPEC_GOTNTPOFF)) [2 S8 A8])) "tls.c":5:11 95
{*movdi_internal}
(nil))
(note 7 5 18 2 NOTE_INSN_DELETED)
(insn 18 7 19 2 (set (reg:SI 105 [ i[4] ])
(mem/c:SI (plus:DI (reg:DI 102)
(const_int 16 [0x10])) [1 i[4]+0 S4 A32 AS1])) "tls.c":5:15 -1
(nil))
(insn 19 18 14 2 (parallel [
(set (reg:SI 101 [ _4 ])
(plus:SI (mem/c:SI (plus:DI (reg:DI 102)
(const_int 8 [0x8])) [1 i[2]+0 S4 A32 AS1])
(reg:SI 105 [ i[4] ])))
(clobber (reg:CC 17 flags))
]) "tls.c":5:15 -1
(nil))
This results in the optimal assembly:
foo:
	movq	i@gottpoff(%rip), %rdx
	movl	%fs:16(%rdx), %eax
	addl	%fs:8(%rdx), %eax
	ret
BTW, adding -mno-tls-direct-seg-refs to the compile flags (which forbids
accessing TLS variables as offsets from the %fs segment register, so the thread
pointer must be loaded and added explicitly) results in:
foo:
	movq	%fs:0, %rcx
	movq	i@gottpoff(%rip), %rdx
	movl	16(%rcx,%rdx), %eax
	addl	8(%rcx,%rdx), %eax
	ret