On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu <hjl.to...@gmail.com> wrote:
>
> For TLS calls:
>
> 1. UNSPEC_TLS_GD:
>
>   (parallel [
>     (set (reg:DI 0 ax)
>          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                   (const_int 0 [0])))
>     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
>     (clobber (reg:DI 5 di))])
>
> 2. UNSPEC_TLS_LD_BASE:
>
>   (parallel [
>     (set (reg:DI 0 ax)
>          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
>                   (const_int 0 [0])))
>     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
>
> 3. UNSPEC_TLSDESC:
>
>   (parallel [
>      (set (reg/f:DI 104)
>            (plus:DI (unspec:DI [
>                        (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
>                        (reg:DI 114)
>                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
>                     (const:DI (unspec:DI [
>                                  (symbol_ref:DI ("e") [flags 0x1a])
>                               ] UNSPEC_DTPOFF))))
>      (clobber (reg:CC 17 flags))])
>
>   (parallel [
>     (set (reg:DI 101)
>          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
>                      (reg:DI 112)
>                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
>     (clobber (reg:CC 17 flags))])
>
> they return the same value for the same input value.  But multiple calls
> with the same input value may be generated for simple programs like:
>
> void a(long *);
> int b(void);
> void c(void);
> static __thread long e;
> long
> d(void)
> {
>   a(&e);
>   if (b())
>     c();
>   return e;
> }
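At the source level, the improvement roughly amounts to computing the thread-local location of e once and reusing it across the calls to a(), b() and c(). In the improved assembly, this corresponds to keeping the single TLSDESC call result in %rbx. The following hand-written analogue is only an illustration and is not part of the patch. The names d_cse, a_calls, c_calls and b_result, and the stub bodies of a/b/c, are all hypothetical. The point of the pass is that the compiler should reach the single-call form from the original d() with no source change.

```c
#include <assert.h>

/* Illustrative only: a hand-CSE'd variant of d() from the testcase.
   The address of the thread-local e is taken once and reused.  */
static __thread long e;

/* Hypothetical instrumentation for the stubs below.  */
static int a_calls, c_calls;
static int b_result;            /* hypothetical knob controlling b() */

/* Stub definitions standing in for the external functions.  */
void a (long *p) { a_calls++; *p = 42; }
int b (void) { return b_result; }
void c (void) { c_calls++; }

long
d_cse (void)
{
  long *p = &e;   /* TLS location materialized once ...  */
  a (p);
  if (b ())
    c ();
  return *p;      /* ... and reused here instead of recomputing &e.  */
}
```

On both paths (b() returning zero or non-zero), d_cse() returns whatever a() stored through the pointer.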
>
> When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> generated:
>
>         .type   d, @function
> d:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         leaq    e@TLSDESC(%rip), %rbx
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         addq    %fs:0, %rax
>         movq    %rax, %rdi
>         call    a@PLT
>         call    b@PLT
>         testl   %eax, %eax
>         jne     .L8
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         popq    %rbx
>         .cfi_remember_state
>         .cfi_def_cfa_offset 8
>         movq    %fs:(%rax), %rax
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L8:
>         .cfi_restore_state
>         call    c@PLT
>         movq    %rbx, %rax
>         call    *e@TLSCALL(%rax)
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         movq    %fs:(%rax), %rax
>         ret
>         .cfi_endproc
>
> There are 3 "call *e@TLSCALL(%rax)" instructions and they all return the
> same value.  Rename the remove_redundant_vector_load pass to x86_cse and,
> for 64-bit, extend it to also remove redundant TLS calls, generating:
>
> d:
> .LFB0:
>         .cfi_startproc
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         leaq    e@TLSDESC(%rip), %rax
>         movq    %fs:0, %rdi
>         call    *e@TLSCALL(%rax)
>         addq    %rax, %rdi
>         movq    %rax, %rbx
>         call    a@PLT
>         call    b@PLT
>         testl   %eax, %eax
>         jne     .L8
>         movq    %fs:(%rbx), %rax
>         popq    %rbx
>         .cfi_remember_state
>         .cfi_def_cfa_offset 8
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L8:
>         .cfi_restore_state
>         call    c@PLT
>         movq    %fs:(%rbx), %rax
>         popq    %rbx
>         .cfi_def_cfa_offset 8
>         ret
>         .cfi_endproc
>
> with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> __tls_get_addr calls in libgcc.a by 72%:
>
> __tls_get_addr calls     before         after
> libgcc.a                 868            243
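As a quick check of the quoted figure, (868 - 243) / 868 is about 0.720, which agrees with the stated 72% reduction. A trivial helper reproduces the calculation; the name percent_reduction is hypothetical:

```c
#include <assert.h>

/* Percentage reduction between two counts, e.g. the libgcc.a
   __tls_get_addr call counts quoted above (868 before, 243 after).  */
static double
percent_reduction (int before, int after)
{
  return 100.0 * (before - after) / before;
}
```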
>
> gcc/
>
>         PR target/81501
>         * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
>         X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
>         (redundant_load): Renamed to ...
>         (redundant_pattern): This.
>         (replace_tls_call): New.
>         (ix86_place_single_tls_call): Likewise.
>         (pass_remove_redundant_vector_load): Renamed to ...
>         (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode,
>         kind, candidate_kind, x86_cse, candidate_gnu_tls_p,
>         candidate_gnu2_tls_p and candidate_vector_p.
>         (pass_x86_cse::candidate_gnu_tls_p): New.
>         (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
Can we define an insn attribute for these TLS patterns, so that we can
just check the attribute instead of walking each rtx to match the
pattern?
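The following standalone C model sketches what such a check could look like. It is not real GCC code. For each define_attr, GCC's attribute generators emit an enum attr_<name> and a get_attr_<name> accessor. The attribute name tls64, its values, the mock insn struct, get_attr_tls64_mock and X86_CSE_NONE are all hypothetical. Only the X86_CSE_TLS_GD, X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC kinds come from the ChangeLog.

```c
#include <assert.h>

/* Hypothetical attribute values, modeled on the enum that would be
   generated for something like
     (define_attr "tls64" "none,gd,ld_base,desc" (const_string "none"))  */
enum attr_tls64 { TLS64_NONE, TLS64_GD, TLS64_LD_BASE, TLS64_DESC };

/* TLS kinds named in the ChangeLog, plus a hypothetical X86_CSE_NONE
   meaning "not a TLS candidate".  */
enum x86_cse_kind
{
  X86_CSE_NONE,
  X86_CSE_TLS_GD,
  X86_CSE_TLS_LD_BASE,
  X86_CSE_TLSDESC
};

/* Mock insn: in GCC the attribute value is derived from the insn code;
   here it is simply stored in the struct.  */
struct mock_insn { enum attr_tls64 tls64; };

static enum attr_tls64
get_attr_tls64_mock (const struct mock_insn *insn)
{
  return insn->tls64;
}

/* Classify an insn by attribute lookup instead of matching the
   PARALLEL/UNSPEC shape of its pattern.  */
static enum x86_cse_kind
classify_tls_insn (const struct mock_insn *insn)
{
  switch (get_attr_tls64_mock (insn))
    {
    case TLS64_GD:
      return X86_CSE_TLS_GD;
    case TLS64_LD_BASE:
      return X86_CSE_TLS_LD_BASE;
    case TLS64_DESC:
      return X86_CSE_TLSDESC;
    default:
      return X86_CSE_NONE;
    }
}
```

The appeal of this design is that the pattern shape is described once, in the .md file. The pass then stays correct if the RTL of those patterns changes later.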
>         (pass_x86_cse::candidate_vector_p): Likewise.
>         (remove_redundant_vector_load): Renamed to ...
>         (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
>         calls.
>         (make_pass_remove_redundant_vector_load): Renamed to ...
>         (make_pass_x86_cse): This.
>         * config/i386/i386-passes.def: Replace
>         pass_remove_redundant_vector_load with pass_x86_cse.
>         * config/i386/i386-protos.h (ix86_tls_get_addr): New.
>         (make_pass_remove_redundant_vector_load): Renamed to ...
>         (make_pass_x86_cse): This.
>         * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
>         * config/i386/i386.h (machine_function): Add
>         tls_descriptor_call_multiple_p.
>         * config/i386/i386.md (@tls_global_dynamic_64_<mode>): Set
>         tls_descriptor_call_multiple_p.
>         (@tls_local_dynamic_base_64_<mode>): Likewise.
>         (@tls_dynamic_gnu2_64_<mode>): Likewise.
>         (*tls_dynamic_gnu2_lea_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_lea_64_<mode>): This.
>         (*tls_dynamic_gnu2_call_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_call_64_<mode>): This.
>         (*tls_dynamic_gnu2_combine_64_<mode>): Renamed to ...
>         (tls_dynamic_gnu2_combine_64_<mode>): This.
>



-- 
BR,
Hongtao
