On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
> (parallel [
>   (set (reg:DI 0 ax)
>        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                 (const_int 0 [0])))
>   (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>               (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
>   (clobber (reg:DI 5 di))])
>
> 2. UNSPEC_TLS_LD_BASE:
>
> (parallel [
>   (set (reg:DI 0 ax)
>        (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                 (const_int 0 [0])))
>   (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
>
> 3. UNSPEC_TLSDESC:
>
> (parallel [
>   (set (reg/f:DI 104)
>        (plus:DI (unspec:DI [
>                     (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
>                     (reg:DI 114)
>                     (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
>                 (const:DI (unspec:DI [
>                     (symbol_ref:DI ("e") [flags 0x1a])
>                   ] UNSPEC_DTPOFF))))
>   (clobber (reg:CC 17 flags))])
>
> (parallel [
>   (set (reg:DI 101)
>        (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>                    (reg:DI 112)
>                    (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
>   (clobber (reg:CC 17 flags))])
>
> they return the same value for the same input value.
> But multiple calls
> with the same input value may be generated for simple programs like:
>
> void a(long *);
> int b(void);
> void c(void);
> static __thread long e;
> long
> d(void)
> {
>   a(&e);
>   if (b())
>     c();
>   return e;
> }
>
> When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following codes are
> generated:
>
>         .type   d, @function
> d:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         leaq    e@TLSDESC(%rip), %rbx
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         addq    %fs:0, %rax
>         movq    %rax, %rdi
>         call    a@PLT
>         call    b@PLT
>         testl   %eax, %eax
>         jne     .L8
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         popq    %rbx
>         .cfi_remember_state
>         .cfi_def_cfa_offset 8
>         movq    %fs:(%rax), %rax
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L8:
>         .cfi_restore_state
>         call    c@PLT
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         movq    %fs:(%rax), %rax
>         ret
>         .cfi_endproc
>
> There are 3 "call *e@TLSCALL(%rax)". They all return the same value.
> Rename the remove_redundant_vector pass to the x86_cse pass, for 64bit,
> extend it to also remove redundant TLS calls to generate:
>
> d:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         leaq    e@TLSDESC(%rip), %rax
>         movq    %fs:0, %rdi
>         call    *e@TLSCALL(%rax)
>         addq    %rax, %rdi
>         movq    %rax, %rbx
>         call    a@PLT
>         call    b@PLT
>         testl   %eax, %eax
>         jne     .L8
>         movq    %fs:(%rbx), %rax
>         popq    %rbx
>         .cfi_remember_state
>         .cfi_def_cfa_offset 8
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L8:
>         .cfi_restore_state
>         call    c@PLT
>         movq    %fs:(%rbx), %rax
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         ret
>         .cfi_endproc
>
> with only one "call *e@TLSCALL(%rax)". This reduces the number of
> __tls_get_addr calls in libgcc.a by 72%:
>
> __tls_get_addr calls     before     after
> libgcc.a                    868       243
>
> gcc/
>
>         PR target/81501
>         * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
>         X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
>         (redundant_load): Renamed to ...
>         (redundant_pattern): This.
>         (replace_tls_call): New.
>         (ix86_place_single_tls_call): Likewise.
>         (pass_remove_redundant_vector_load): Renamed to ...
>         (pass_x86_cse): This. Add val, def_insn, mode, scalar_mode,
>         kind, candidate_kind, x86_cse, candidate_gnu_tls_p,
>         candidate_gnu2_tls_p and candidate_vector_p.
>         (pass_x86_cse::candidate_gnu_tls_p): New.
>         (pass_x86_cse::candidate_gnu2_tls_p): Likewise.

Can we define an insn attribute for those TLS patterns, so that we can
just check the attribute instead of going through each rtx to check for
the pattern?

>         (pass_x86_cse::candidate_vector_p): Likewise.
>         (remove_redundant_vector_load): Renamed to ...
>         (pass_x86_cse::x86_cse): This. Extend to remove redundant TLS
>         calls.
>         (make_pass_remove_redundant_vector_load): Renamed to ...
>         (make_pass_x86_cse): This.
>         (config/i386/i386-passes.def): Replace
>         pass_remove_redundant_vector_load with pass_x86_cse.
>         config/i386/i386-protos.h (ix86_tls_get_addr): New.
>         (make_pass_remove_redundant_vector_load): Renamed to ...
>         (make_pass_x86_cse): This.
>         * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
>         * config/i386/i386.h (machine_function): Add
>         tls_descriptor_call_multiple_p.
>         * config/i386/i386.md (@tls_global_dynamic_64_<mode>): Set
>         tls_descriptor_call_multiple_p.
>         (@tls_local_dynamic_base_64_<mode>): Likewise.
>         (@tls_dynamic_gnu2_64_<mode>): Likewise.
>         (*tls_dynamic_gnu2_lea_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_lea_64_<mode>): This.
>         (*tls_dynamic_gnu2_call_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_call_64_<mode>): This.
>         (*tls_dynamic_gnu2_combine_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_combine_64_<mode>): This.
>
--
BR,
Hongtao