On Sun, Jul 20, 2025 at 7:41 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > For TLS calls:
> >
> > 1. UNSPEC_TLS_GD:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> >     (clobber (reg:DI 5 di))])
> >
> > 2. UNSPEC_TLS_LD_BASE:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> >
> > 3. UNSPEC_TLSDESC:
> >
> >   (parallel [
> >     (set (reg/f:DI 104)
> >          (plus:DI (unspec:DI [
> >                      (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> >                      (reg:DI 114)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> >                   (const:DI (unspec:DI [
> >                                (symbol_ref:DI ("e") [flags 0x1a])
> >                              ] UNSPEC_DTPOFF))))
> >     (clobber (reg:CC 17 flags))])
> >
> >   (parallel [
> >     (set (reg:DI 101)
> >          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                      (reg:DI 112)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> >     (clobber (reg:CC 17 flags))])
> >
> > they return the same value for the same input value.
> > But multiple calls with the same input value may be generated for simple
> > programs like:
> >
> > void a(long *);
> > int b(void);
> > void c(void);
> > static __thread long e;
> > long
> > d(void)
> > {
> >   a(&e);
> >   if (b())
> >     c();
> >   return e;
> > }
> >
> > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > generated:
> >
> >         .type   d, @function
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rbx
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         addq    %fs:0, %rax
> >         movq    %rax, %rdi
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .cfi_endproc
> >
> > There are 3 "call *e@TLSCALL(%rax)".  They all return the same value.
> > Rename the remove_redundant_vector pass to the x86_cse pass and, for
> > 64-bit, extend it to also remove redundant TLS calls to generate:
> >
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rax
> >         movq    %fs:0, %rdi
> >         call    *e@TLSCALL(%rax)
> >         addq    %rax, %rdi
> >         movq    %rax, %rbx
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .cfi_endproc
> >
> > with only one "call *e@TLSCALL(%rax)".
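
For readers less familiar with the generated assembly, the redundancy quoted above can be modeled in plain C. This is only a sketch under stated assumptions: resolve_tls_offset is a hypothetical stand-in for the TLS descriptor call, call_count exists solely to make the number of calls observable, and the function names are made up for illustration. None of it is code from the patch.

```c
/* Counts calls so the effect of the transformation is observable.  */
static int call_count;

/* Hypothetical stand-in for "call *e@TLSCALL(%rax)": the result depends
   only on the descriptor, so every call with the same argument returns
   the same value.  The arithmetic is arbitrary.  */
static long
resolve_tls_offset (long descriptor)
{
  call_count++;
  return descriptor * 8 + 16;
}

/* Shape of the code before the pass: the offset is resolved again on
   each path that needs it (three call sites, two executed per run).  */
static long
use_without_cse (long descriptor, int take_branch)
{
  long for_first_use = resolve_tls_offset (descriptor);
  long for_return;

  if (take_branch)
    for_return = resolve_tls_offset (descriptor);
  else
    for_return = resolve_tls_offset (descriptor);
  return for_first_use + for_return;
}

/* Shape of the code after the pass: resolve once, keep the result
   (the role %rbx plays in the second listing), and reuse it.  */
static long
use_with_cse (long descriptor, int take_branch)
{
  long offset = resolve_tls_offset (descriptor);

  (void) take_branch;   /* Both paths reuse the saved offset.  */
  return offset + offset;
}
```

For any given descriptor, the two functions return the same value; only the number of calls to resolve_tls_offset differs (two per run versus one).
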
> > This reduces the number of __tls_get_addr calls in libgcc.a by 72%:
> >
> > __tls_get_addr calls     before       after
> > libgcc.a                 868          243
> >
> > gcc/
> >
> >         PR target/81501
> >         * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
> >         X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
> >         (redundant_load): Renamed to ...
> >         (redundant_pattern): This.
> >         (replace_tls_call): New.
> >         (ix86_place_single_tls_call): Likewise.
> >         (pass_remove_redundant_vector_load): Renamed to ...
> >         (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode,
> >         kind, candidate_kind, x86_cse, candidate_gnu_tls_p,
> >         candidate_gnu2_tls_p and candidate_vector_p.
> >         (pass_x86_cse::candidate_gnu_tls_p): New.
> >         (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
> Can we define an insn attribute for those TLS patterns, so that we can
> just check the attribute instead of walking each rtx to check for the
> pattern?

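To make the trade-off in this suggestion concrete, here is a toy model in C. It is not GCC's RTL API and not the code in the v2 patch: every type, enumerator and function name below is hypothetical and chosen only for illustration. The first classifier searches an expression tree for a TLS marker; the second reads a tag recorded once when the instruction is created, which is the role an insn attribute would play.

```c
#include <stddef.h>

/* Hypothetical TLS kinds, mirroring the three call shapes in this thread.  */
enum toy_tls_kind { TOY_TLS_NONE, TOY_TLS_GD, TOY_TLS_LD_BASE, TOY_TLS_DESC };

/* Hypothetical expression codes for the toy tree.  */
enum toy_code
{
  TOY_CODE_SET, TOY_CODE_CALL, TOY_CODE_PLUS,
  TOY_CODE_UNSPEC_GD, TOY_CODE_UNSPEC_LD_BASE, TOY_CODE_UNSPEC_DESC
};

/* Toy expression node: a code plus up to two operands.  */
struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op[2];
};

/* Toy instruction: a pattern plus a tag set once at creation time.  */
struct toy_insn
{
  const struct toy_expr *pattern;
  enum toy_tls_kind tls_attr;
};

/* Approach 1: recursively search the pattern for a TLS marker.  */
static enum toy_tls_kind
classify_by_walk (const struct toy_expr *x)
{
  if (x == NULL)
    return TOY_TLS_NONE;

  switch (x->code)
    {
    case TOY_CODE_UNSPEC_GD:
      return TOY_TLS_GD;
    case TOY_CODE_UNSPEC_LD_BASE:
      return TOY_TLS_LD_BASE;
    case TOY_CODE_UNSPEC_DESC:
      return TOY_TLS_DESC;
    default:
      break;
    }

  for (int i = 0; i < 2; i++)
    {
      enum toy_tls_kind kind = classify_by_walk (x->op[i]);
      if (kind != TOY_TLS_NONE)
        return kind;
    }
  return TOY_TLS_NONE;
}

/* Approach 2: read the precomputed tag; no tree walk is needed.  */
static enum toy_tls_kind
classify_by_attr (const struct toy_insn *insn)
{
  return insn->tls_attr;
}
```

Both classifiers agree as long as the tag is kept in sync with the pattern. The tag lookup does not depend on the exact shape of the pattern, so it keeps working if the pattern layout changes.
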
Fixed in the v2 patch.  Thanks.

> >         (pass_x86_cse::candidate_vector_p): Likewise.
> >         (remove_redundant_vector_load): Renamed to ...
> >         (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
> >         calls.
> >         (make_pass_remove_redundant_vector_load): Renamed to ...
> >         (make_pass_x86_cse): This.
> >         * config/i386/i386-passes.def: Replace
> >         pass_remove_redundant_vector_load with pass_x86_cse.
> >         * config/i386/i386-protos.h (ix86_tls_get_addr): New.
> >         (make_pass_remove_redundant_vector_load): Renamed to ...
> >         (make_pass_x86_cse): This.
> >         * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
> >         * config/i386/i386.h (machine_function): Add
> >         tls_descriptor_call_multiple_p.
> >         * config/i386/i386.md (@tls_global_dynamic_64_<mode>): Set
> >         tls_descriptor_call_multiple_p.
> >         (@tls_local_dynamic_base_64_<mode>): Likewise.
> >         (@tls_dynamic_gnu2_64_<mode>): Likewise.
> >         (*tls_dynamic_gnu2_lea_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_lea_64_<mode>): This.
> >         (*tls_dynamic_gnu2_call_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_call_64_<mode>): This.
> >         (*tls_dynamic_gnu2_combine_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_combine_64_<mode>): This.
> >
> >
>
> --
> BR,
> Hongtao

--
H.J.