On Sun, Jul 20, 2025 at 7:41 PM Hongtao Liu <crazy...@gmail.com> wrote:
>
> On Thu, Jul 17, 2025 at 11:22 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> >
> > For TLS calls:
> >
> > 1. UNSPEC_TLS_GD:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                 (reg/f:DI 7 sp)] UNSPEC_TLS_GD)
> >     (clobber (reg:DI 5 di))])
> >
> > 2. UNSPEC_TLS_LD_BASE:
> >
> >   (parallel [
> >     (set (reg:DI 0 ax)
> >          (call:DI (mem:QI (symbol_ref:DI ("__tls_get_addr")))
> >                   (const_int 0 [0])))
> >     (unspec:DI [(reg/f:DI 7 sp)] UNSPEC_TLS_LD_BASE)])
> >
> > 3. UNSPEC_TLSDESC:
> >
> >   (parallel [
> >      (set (reg/f:DI 104)
> >            (plus:DI (unspec:DI [
> >                        (symbol_ref:DI ("_TLS_MODULE_BASE_") [flags 0x10])
> >                        (reg:DI 114)
> >                        (reg/f:DI 7 sp)] UNSPEC_TLSDESC)
> >                     (const:DI (unspec:DI [
> >                                  (symbol_ref:DI ("e") [flags 0x1a])
> >                               ] UNSPEC_DTPOFF))))
> >      (clobber (reg:CC 17 flags))])
> >
> >   (parallel [
> >     (set (reg:DI 101)
> >          (unspec:DI [(symbol_ref:DI ("e") [flags 0x50])
> >                      (reg:DI 112)
> >                      (reg/f:DI 7 sp)] UNSPEC_TLSDESC))
> >     (clobber (reg:CC 17 flags))])
> >
> > they return the same value for the same input.  However, multiple calls
> > with the same input may be generated even for simple programs like:
> >
> > void a(long *);
> > int b(void);
> > void c(void);
> > static __thread long e;
> > long
> > d(void)
> > {
> >   a(&e);
> >   if (b())
> >     c();
> >   return e;
> > }
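A minimal C sketch of why the repeated calls are redundant, assuming a single thread: every evaluation of &e in one thread yields the same address, so a TLS call made before a() already holds the value needed when e is read later. The helpers tls_addr_of_e and d_like, and this body for a(), are hypothetical stand-ins; whether each access actually goes through "call *e@TLSCALL" depends on the compiler flags (such as -O2 -fPIC -mtls-dialect=gnu2) and inlining.

```c
/* Same declaration as in the test case above.  */
static __thread long e;

/* Hypothetical stand-in for a(): writes through the pointer it gets.  */
static void a(long *p) { *p = 42; }

/* Returns the address of e for the calling thread.  In PIC code with
   -mtls-dialect=gnu2 this address may be obtained via a TLS
   descriptor call; within one thread it never changes.  */
static long *tls_addr_of_e(void) { return &e; }

/* Mirrors d(): pass &e to a(), then read e again.  Both accesses
   resolve to the same thread-local object, so one TLS call would do.  */
static long d_like(void)
{
  long *p = tls_addr_of_e();
  a(p);
  return *tls_addr_of_e();
}
```

Two calls to tls_addr_of_e() in the same thread compare equal, and d_like() returns the value a() stored, which is the property the patch exploits.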
> >
> > When compiled with -O2 -fPIC -mtls-dialect=gnu2, the following code is
> > generated:
> >
> >         .type   d, @function
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rbx
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         addq    %fs:0, %rax
> >         movq    %rax, %rdi
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %rbx, %rax
> >         call    *e@TLSCALL(%rax)
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         movq    %fs:(%rax), %rax
> >         ret
> >         .cfi_endproc
> >
> > There are 3 "call *e@TLSCALL(%rax)" instructions, and they all return the
> > same value.  Rename the remove_redundant_vector_load pass to the x86_cse
> > pass and, for 64-bit, extend it to also remove redundant TLS calls, which
> > generates:
> >
> > d:
> > .LFB0:
> >         .cfi_startproc
> >         pushq   %rbx
> >         .cfi_def_cfa_offset 16
> >         .cfi_offset 3, -16
> >         leaq    e@TLSDESC(%rip), %rax
> >         movq    %fs:0, %rdi
> >         call    *e@TLSCALL(%rax)
> >         addq    %rax, %rdi
> >         movq    %rax, %rbx
> >         call    a@PLT
> >         call    b@PLT
> >         testl   %eax, %eax
> >         jne     .L8
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_remember_state
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L8:
> >         .cfi_restore_state
> >         call    c@PLT
> >         movq    %fs:(%rbx), %rax
> >         popq    %rbx
> >         .cfi_def_cfa_offset 8
> >         ret
> >         .cfi_endproc
> >
> > with only one "call *e@TLSCALL(%rax)".  This reduces the number of
> > __tls_get_addr calls in libgcc.a by 72%:
> >
> > __tls_get_addr calls     before         after
> > libgcc.a                 868            243
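The core idea of removing redundant TLS calls can be sketched as a toy in C, under stated assumptions: each call is reduced to a hypothetical record holding its kind (GD, LD_BASE or TLSDESC), the symbol it resolves and the pseudo register it defines, and the code is straight-line, so the first call for a given input dominates the later ones. Every later call with the same input is turned into a register copy of the first result. This is not the patch's implementation; per the ChangeLog, the real pass places a single call with ix86_place_single_tls_call, which has to take control flow into account.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a TLS call; not GCC's data
   structures.  */
enum tls_call_kind { KIND_TLS_GD, KIND_TLS_LD_BASE, KIND_TLSDESC };

struct tls_call {
  enum tls_call_kind kind;
  const char *symbol;  /* NULL for LD_BASE, which has no per-symbol input */
  int dest_reg;        /* pseudo register defined by this call */
  int copy_from;       /* -1: real call; otherwise copy of that register */
};

/* Two calls are interchangeable if kind and input symbol match.  */
static int same_input(const struct tls_call *x, const struct tls_call *y)
{
  if (x->kind != y->kind)
    return 0;
  if (x->symbol == NULL || y->symbol == NULL)
    return x->symbol == y->symbol;
  return strcmp(x->symbol, y->symbol) == 0;
}

/* Rewrite each call that matches an earlier real call into a copy of
   that call's result.  Assumes straight-line code.  Returns the number
   of real calls left.  */
static size_t cse_tls_calls(struct tls_call *calls, size_t n)
{
  size_t remaining = 0;
  for (size_t i = 0; i < n; i++) {
    calls[i].copy_from = -1;
    for (size_t j = 0; j < i; j++) {
      if (calls[j].copy_from == -1 && same_input(&calls[j], &calls[i])) {
        calls[i].copy_from = calls[j].dest_reg;
        break;
      }
    }
    if (calls[i].copy_from == -1)
      remaining++;
  }
  return remaining;
}
```

For the test case above, three TLSDESC records for "e" reduce to one real call, matching the single "call *e@TLSCALL(%rax)" left after the change. For the libgcc.a numbers, (868 - 243) / 868 = 625 / 868 ≈ 0.72, which is the quoted 72% reduction.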
> >
> > gcc/
> >
> >         PR target/81501
> >         * config/i386/i386-features.cc (x86_cse_kind): Add X86_CSE_TLS_GD,
> >         X86_CSE_TLS_LD_BASE and X86_CSE_TLSDESC.
> >         (redundant_load): Renamed to ...
> >         (redundant_pattern): This.
> >         (replace_tls_call): New.
> >         (ix86_place_single_tls_call): Likewise.
> >         (pass_remove_redundant_vector_load): Renamed to ...
> >         (pass_x86_cse): This.  Add val, def_insn, mode, scalar_mode,
> >         kind, candidate_kind, x86_cse, candidate_gnu_tls_p,
> >         candidate_gnu2_tls_p and candidate_vector_p.
> >         (pass_x86_cse::candidate_gnu_tls_p): New.
> >         (pass_x86_cse::candidate_gnu2_tls_p): Likewise.
> Can we define an insn attribute for those TLS patterns, so that we can
> just check the attribute instead of walking each rtx to check
> for the pattern?

Fixed in the v2 patch.

Thanks.
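In GCC, an attribute declared with define_attr in a machine description is exposed to C++ code through a generated get_attr_<name> accessor, so a pass can query an insn's kind directly instead of matching its RTL shape. The toy C model below contrasts the two approaches; the enum, the struct and the string standing in for the RTL pattern are all hypothetical, and this is not the code in the v2 patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical TLS kinds, loosely following the three patterns above.  */
enum tls_attr { TLS_ATTR_NONE, TLS_ATTR_GD, TLS_ATTR_LD_BASE, TLS_ATTR_DESC };

struct toy_insn {
  const char *unspec_name;  /* stands in for the RTL pattern */
  enum tls_attr attr;       /* recorded once, where the pattern is defined */
};

/* Without an attribute: the pass re-derives the kind by inspecting the
   pattern, and must stay in sync with every pattern shape.  */
static enum tls_attr classify_by_pattern(const struct toy_insn *insn)
{
  static const struct { const char *name; enum tls_attr kind; } table[] = {
    { "UNSPEC_TLS_GD", TLS_ATTR_GD },
    { "UNSPEC_TLS_LD_BASE", TLS_ATTR_LD_BASE },
    { "UNSPEC_TLSDESC", TLS_ATTR_DESC },
  };
  if (insn->unspec_name == NULL)
    return TLS_ATTR_NONE;
  for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
    if (strcmp(insn->unspec_name, table[i].name) == 0)
      return table[i].kind;
  return TLS_ATTR_NONE;
}

/* With an attribute: a single lookup.  */
static enum tls_attr classify_by_attr(const struct toy_insn *insn)
{
  return insn->attr;
}
```

The attribute approach keeps the classification next to each pattern definition, so adding or renaming a TLS pattern does not require updating a separate matcher in the pass.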

> >         (pass_x86_cse::candidate_vector_p): Likewise.
> >         (remove_redundant_vector_load): Renamed to ...
> >         (pass_x86_cse::x86_cse): This.  Extend to remove redundant TLS
> >         calls.
> >         (make_pass_remove_redundant_vector_load): Renamed to ...
> >         (make_pass_x86_cse): This.
> >         * config/i386/i386-passes.def: Replace
> >         pass_remove_redundant_vector_load with pass_x86_cse.
> >         * config/i386/i386-protos.h (ix86_tls_get_addr): New.
> >         (make_pass_remove_redundant_vector_load): Renamed to ...
> >         (make_pass_x86_cse): This.
> >         * config/i386/i386.cc (ix86_tls_get_addr): Remove static.
> >         * config/i386/i386.h (machine_function): Add
> >         tls_descriptor_call_multiple_p.
> >         * config/i386/i386.md (@tls_global_dynamic_64_<mode>): Set
> >         tls_descriptor_call_multiple_p.
> >         (@tls_local_dynamic_base_64_<mode>): Likewise.
> >         (@tls_dynamic_gnu2_64_<mode>): Likewise.
> >         (*tls_dynamic_gnu2_lea_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_lea_64_<mode>): This.
> >         (*tls_dynamic_gnu2_call_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_call_64_<mode>): This.
> >         (*tls_dynamic_gnu2_combine_64_<mode>): Renamed to ...
> >         (tls_dynamic_gnu2_combine_64_<mode>): This.
> >
>
>
>
> --
> BR,
> Hongtao



-- 
H.J.
