Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 11:43 AM, Uros Bizjak wrote:
> On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu wrote:
>
>>>> I think using the OS-provided instruction to load TP into a DImode
>>>> register could simplify the code.
>>>
>>> Which OS provided instruction?
>>>
>>> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>>>
>>>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>>
>>> This says that TP is in SImode on X32.
>>
>> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
>> and provided by OS.  It is a CONST_INT, but its value is opaque
>> to GCC.  MODE here has no impact on its value provided by OS.
>> X32 OS provides instructions to load TP into SImode and
>> DImode registers.
>
> You must be looking at some other GCC sources than me.
>
> (define_insn "*load_tp_x32"
>   [(set (match_operand:SI 0 "register_operand" "=r")
>         (unspec:SI [(const_int 0)] UNSPEC_TP))]
>   "TARGET_X32"
>   "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
>   [(set_attr "type" "imov")
>    (set_attr "modrm" "0")
>    (set_attr "length" "7")
>    (set_attr "memory" "load")
>    (set_attr "imm_disp" "false")])
>
> (define_insn "*load_tp_x32_zext"
>   [(set (match_operand:DI 0 "register_operand" "=r")
>         (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
>   "TARGET_X32"
>   "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>   [(set_attr "type" "imov")
>    (set_attr "modrm" "0")
>    (set_attr "length" "7")
>    (set_attr "memory" "load")
>    (set_attr "imm_disp" "false")])
>

The thread pointer (TP) points to the thread control block (TCB).  The X32 TCB is

typedef struct
{
  void *tcb;    /* Pointer to the TCB.  Not necessarily the
                   thread descriptor used by libpthread.  */
  ...
}

It is a 32bit address set up by the OS.  That is where the 0 in "%fs:0" comes from: tcb is the first field of the struct that %fs points to.  X32 OS provides

  mov %fs:0, %eax

to load the address of TCB into EAX and

  mov %fs:0, %eax

to load the address of TCB into RAX since OS guarantees that the upper 32bits of the address of TCB are all 0s.

We added "*load_tp_x32_zext" because we zero-extend the SImode TP to a DImode TP.  Alternatively, we can use

  mov %fs:0, %eax

to load the value of the tcb field directly into RAX and remove "*load_tp_x32_zext".  That would simplify the code.

--
H.J.
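To make the "%fs:0" argument concrete, here is a minimal sketch in plain C of a control block whose first field points back at the block itself. It is an illustration only: the type tcb_sketch_t, its "reserved" field and both helper functions are hypothetical names, and nothing here touches a real segment register or the real glibc layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down TCB.  Only the first field comes from the
   message above; "reserved" is a placeholder for the elided fields.  */
typedef struct
{
  void *tcb;       /* Pointer to the TCB itself; lives at offset 0.  */
  void *reserved;  /* Placeholder, not part of any real layout.  */
} tcb_sketch_t;

/* Mimic the thread setup: make the first field point at the block.  */
void
tcb_sketch_init (tcb_sketch_t *t)
{
  t->tcb = t;
  t->reserved = NULL;
}

/* Model of "mov %fs:0, reg": FS_BASE plays the role of the %fs segment
   base, and the load reads the pointer-sized word at offset 0 from it.  */
void *
load_tp_sketch (const tcb_sketch_t *fs_base)
{
  /* The sketch relies on the self-pointer having been set up.  */
  assert (fs_base->tcb == (const void *) fs_base);
  return fs_base->tcb;
}
```

Because the self-pointer sits at offset 0, reading the first word through the segment base yields the address of the block, which is the value both load patterns in this message are after.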
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu wrote:

>>> I think using the OS-provided instruction to load TP into a DImode
>>> register could simplify the code.
>>
>> Which OS provided instruction?
>>
>> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>>
>>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>>
>> This says that TP is in SImode on X32.
>
> TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
> and provided by OS.  It is a CONST_INT, but its value is opaque
> to GCC.  MODE here has no impact on its value provided by OS.
> X32 OS provides instructions to load TP into SImode and
> DImode registers.

You must be looking at some other GCC sources than me.

(define_insn "*load_tp_x32"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

(define_insn "*load_tp_x32_zext"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 10:54 AM, Uros Bizjak wrote:
> On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu wrote:
>
>>>> Yeah, my bootstrap just failed the same.  Will test:
>>>>
>>>> 2012-03-20 Jakub Jelinek
>>>>
>>>> * config/i386/i386.c (ix86_decompose_address) :
>>>> If operand isn't UNSPEC, return 0.
>>>
>>> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
>>> and i686-linux.
>>>
>>>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
>>>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
>>>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>>>
>>>>      case ZERO_EXTEND:
>>>>        op = XEXP (op, 0);
>>>> +      if (GET_CODE (op) != UNSPEC)
>>>> +        return 0;
>>>>        /* FALLTHRU */
>>>>
>>>>      case UNSPEC:
>>>
>>
>> Uros,
>>
>> I think using the OS-provided instruction to load TP into a DImode
>> register could simplify the code.
>
> Which OS provided instruction?
>
> Please see how TP is defined in get_thread_pointer, it is in ptr_mode:
>
>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>
> This says that TP is in SImode on X32.
>
> Uros.

TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
and provided by OS.  It is a CONST_INT, but its value is opaque
to GCC.  MODE here has no impact on its value provided by OS.
X32 OS provides instructions to load TP into SImode and
DImode registers.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu wrote: >>> Yeah, my bootstrap just failed the same. Will test: >>> >>> 2012-03-20 Jakub Jelinek >>> >>> * config/i386/i386.c (ix86_decompose_address) : >>> If operand isn't UNSPEC, return 0. >> >> Committed as obvious now that bootstrap/regtest finished on x86_64-linux >> and i686-linux. >> >>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100 >>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100 >>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct >>> >>> case ZERO_EXTEND: >>> op = XEXP (op, 0); >>> + if (GET_CODE (op) != UNSPEC) >>> + return 0; >>> /* FALLTHRU */ >>> >>> case UNSPEC: >> > > Uros, > > I think use the OS provided instruction to load TP into DImode register > could simplify the code. Which OS provided instruction? Please see how TP is defined in get_thread_pointer, it is in ptr_mode: rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); This says that TP is in SImode on X32. Uros.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 4:19 AM, Jakub Jelinek wrote:
> On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
>> Yeah, my bootstrap just failed the same.  Will test:
>>
>> 2012-03-20 Jakub Jelinek
>>
>> * config/i386/i386.c (ix86_decompose_address) :
>> If operand isn't UNSPEC, return 0.
>
> Committed as obvious now that bootstrap/regtest finished on x86_64-linux
> and i686-linux.
>
>> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
>> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
>> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>>
>>      case ZERO_EXTEND:
>>        op = XEXP (op, 0);
>> +      if (GET_CODE (op) != UNSPEC)
>> +        return 0;
>>        /* FALLTHRU */
>>
>>      case UNSPEC:
>

Uros,

I think using the OS-provided instruction to load TP into a DImode
register could simplify the code.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
> Yeah, my bootstrap just failed the same.  Will test:
>
> 2012-03-20 Jakub Jelinek
>
> * config/i386/i386.c (ix86_decompose_address) :
> If operand isn't UNSPEC, return 0.

Committed as obvious now that bootstrap/regtest finished on x86_64-linux
and i686-linux.

> --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
> +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100
> @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
>
>      case ZERO_EXTEND:
>        op = XEXP (op, 0);
> +      if (GET_CODE (op) != UNSPEC)
> +        return 0;
>        /* FALLTHRU */
>
>      case UNSPEC:

	Jakub
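The reason for the extra check may be easier to see in a stand-alone toy. The sketch below is not GCC code: toy_rtx, the TOY_* names and toy_is_tp_reference are all made up for illustration. It only mirrors the shape of the switch, where the ZERO_EXTEND case strips one level and then falls through into code that reads a field which exists only on UNSPEC nodes.

```c
#include <assert.h>
#include <stddef.h>

/* Toy expression nodes; every name here is hypothetical.  */
enum toy_code { TOY_MEM, TOY_UNSPEC, TOY_ZERO_EXTEND };

#define TOY_UNSPEC_TP 1   /* Made-up number standing in for UNSPEC_TP.  */

struct toy_rtx
{
  enum toy_code code;
  union
  {
    int unspec_num;                 /* Valid only for TOY_UNSPEC.  */
    const struct toy_rtx *operand;  /* Valid for TOY_MEM, TOY_ZERO_EXTEND.  */
  } u;
};

/* Return 1 if OP is a (possibly zero-extended) thread-pointer UNSPEC,
   and 0 for anything else.  */
int
toy_is_tp_reference (const struct toy_rtx *op)
{
  switch (op->code)
    {
    case TOY_ZERO_EXTEND:
      op = op->u.operand;
      /* Same idea as the committed guard: only an UNSPEC operand may
         fall through to the field access below.  */
      if (op->code != TOY_UNSPEC)
        return 0;
      /* FALLTHRU */

    case TOY_UNSPEC:
      /* Stand-in for what an RTL-checking build verifies on access.  */
      assert (op->code == TOY_UNSPEC);
      return op->u.unspec_num == TOY_UNSPEC_TP;

    default:
      return 0;
    }
}
```

Without the guard, a zero-extended memory operand would reach the unspec_num read with the wrong node kind, which is the toy analogue of the "RTL check" failure reported in this thread.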
Re: PATCH: Properly generate X32 IE sequence
Il 19/03/2012 20:13, Uros Bizjak ha scritto: > 2012-03-19 Uros Bizjak > > * config/i386/i386.c (get_thread_pointer): Add tp_mode argument. > Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode. > (legitimize_tls_address) : Always generate > DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT. > (ix86_decompose_address): Allow zero extended UNSPEC_TP references. > > Revert: > 2012-03-13 Uros Bizjak > > * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. > * config/i386/i386.c (ix86_decompose_address): Use > TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. > (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load > thread pointer to a register. > > Revert: > 2012-03-10 H.J. Lu > > * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) > if Pmode != word_mode. > (legitimize_tls_address): Call gen_tls_initial_exec_x32 if > Pmode == SImode for TARGET_X32. > > * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. > (tls_initial_exec_x32): Likewise. > > Tested on x86_64-pc-linux-gnu {,-m32}. No testcases? Paolo
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 20, 2012 at 09:51:07AM +0100, Eric Botcazou wrote: > > The patch is bootstrapping now on x86_64-pc-linux-gnu. > > It very likely breaks bootstrap with RTL checking enabled: > > /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc > -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ > -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ > -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem > /usr/gnat/i686-pc-linux-gnu/sys-include-g -O2 -O2 -g -O2 -DIN_GCC -W > -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes > -Wold-style-definition -isystem ./include -fpic -g -DIN_LIBGCC2 > -fbuilding-libgcc -fno-stack-protector -fpic -I. -I. -I../.././gcc > -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc > -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid > -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o > _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF > _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c > -fvisibility=hidden -DHIDE_EXPORTS > ../../../src/libgcc/libgcc2.c: In function '__popcountsi2': > ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: > expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, > at config/i386/i386.c:11522 > Please submit a full bug report, > with preprocessed source if appropriate. > See mailto:rep...@adacore.com> for instructions. > make[3]: *** [_popcountsi2.o] Error 1 Yeah, my bootstrap just failed the same. Will test: 2012-03-20 Jakub Jelinek * config/i386/i386.c (ix86_decompose_address) : If operand isn't UNSPEC, return 0. --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100 +++ gcc/config/i386/i386.c 2012-03-20 09:56:35.038835835 +0100 @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct case ZERO_EXTEND: op = XEXP (op, 0); + if (GET_CODE (op) != UNSPEC) + return 0; /* FALLTHRU */ case UNSPEC: Jakub
Re: PATCH: Properly generate X32 IE sequence
> The patch is bootstrapping now on x86_64-pc-linux-gnu. It very likely breaks bootstrap with RTL checking enabled: /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem /usr/gnat/i686-pc-linux-gnu/sys-include-g -O2 -O2 -g -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -fpic -g -DIN_LIBGCC2 -fbuilding-libgcc -fno-stack-protector -fpic -I. -I. -I../.././gcc -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS -DUSE_TLS -o _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c -fvisibility=hidden -DHIDE_EXPORTS ../../../src/libgcc/libgcc2.c: In function '__popcountsi2': ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, at config/i386/i386.c:11522 Please submit a full bug report, with preprocessed source if appropriate. See mailto:rep...@adacore.com> for instructions. make[3]: *** [_popcountsi2.o] Error 1 -- Eric Botcazou
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 6:50 PM, H.J. Lu wrote: >>> Please test my proposed patch. If it works OK, I will commit it to SVN. >> >> The onyl acceptable way is to generate ZERO_EXTEND in place, so: >> >> --cut here-- >> static rtx >> get_thread_pointer (enum machine_mode tp_mode, bool to_reg) >> { >> rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); >> >> if (GET_MODE (tp) != tp_mode) >> { >> gcc_assert (GET_MODE (tp) == SImode); >> gcc_assert (tp_mode == DImode); >> >> tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); >> } >> >> if (to_reg) >> tp = copy_to_mode_reg (tp_mode, tp); >> >> return tp; >> } >> --cut here-- > > This version works fine. Attached patch was committed to mainline SVN with following ChangeLog: 2012-03-19 Uros Bizjak * config/i386/i386.c (get_thread_pointer): Add tp_mode argument. Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode. (legitimize_tls_address) : Always generate DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT. (ix86_decompose_address): Allow zero extended UNSPEC_TP references. Revert: 2012-03-13 Uros Bizjak * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. * config/i386/i386.c (ix86_decompose_address): Use TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load thread pointer to a register. Revert: 2012-03-10 H.J. Lu * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) if Pmode != word_mode. (legitimize_tls_address): Call gen_tls_initial_exec_x32 if Pmode == SImode for TARGET_X32. * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. (tls_initial_exec_x32): Likewise. Tested on x86_64-pc-linux-gnu {,-m32}. Thanks, Uros. 
Index: i386.md === --- i386.md (revision 185524) +++ i386.md (working copy) @@ -96,7 +96,6 @@ UNSPEC_TLS_LD_BASE UNSPEC_TLSDESC UNSPEC_TLS_IE_SUN - UNSPEC_TLS_IE_X32 ;; Other random patterns UNSPEC_SCAS @@ -12836,28 +12835,6 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix. This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. -(define_insn "tls_initial_exec_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI -[(match_operand 1 "tls_symbolic_operand")] -UNSPEC_TLS_IE_X32)) - (clobber (reg:CC FLAGS_REG))] - "TARGET_X32" -{ - output_asm_insn -("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; -} - [(set_attr "type" "multi")]) - ;; GNU2 TLS patterns can be split. (define_expand "tls_dynamic_gnu2_32" Index: i386.c === --- i386.c (revision 185524) +++ i386.c (working copy) @@ -11514,6 +11514,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr scale = 1 << scale; break; + case ZERO_EXTEND: + op = XEXP (op, 0); + /* FALLTHRU */ + case UNSPEC: if (XINT (op, 1) == UNSPEC_TP && TARGET_TLS_DIRECT_SEG_REFS @@ -12483,15 +12487,20 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. If TO_REG is true, force it into a register. 
*/ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - if (GET_MODE (tp) != Pmode) -tp = convert_to_mode (Pmode, tp, 1); + if (GET_MODE (tp) != tp_mode) +{ + gcc_assert (GET_MODE (tp) == SImode); + gcc_assert (tp_mode == DImode); + tp = gen_rtx_ZERO_EXTEND (tp_mode, tp); +} + if (to_reg) -tp = copy_addr_to_reg (tp); +tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12543,6 +12552,7 @@ legitimize_tls_address (rtx x, enum tls_model mode { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12568,7 +12578,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 10:29 AM, Uros Bizjak wrote:
> On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak wrote:
>
>>>>> For x32, thread pointer is an unsigned 32bit value.
>>>>>
>>>>> movl %fs:0, %eax
>>>>>
>>>>> is the correct instruction to load thread pointer into EAX and RAX.
>>>>
>>>> So, where is ZERO_EXTEND RTX then?
>>>
>>> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
>>> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
>>> when there is a single instruction to load TP into a DImode register.
>>
>> I don't agree with this explanation.  The mode can't be SImode and
>> DImode.  TP is either SImode or ZERO_EXTENDed to DImode, this is the
>> reason we went for all that TARGET_X32 stuff in TP load RTX.

FWIW, TP maintained by OS is opaque to GCC, and GCC mode doesn't
apply to the TP value maintained by OS.  The instruction pattern to
load TP into a register is provided by OS and is also opaque to GCC.
X32 OS provides single instructions to load TP into SImode and DImode
registers.  We can load x32 TP into an SImode register and ZERO_EXTEND
it to DImode.  Or we can use the OS-provided instruction to load TP
into a DImode register directly.

>> Please test my proposed patch.  If it works OK, I will commit it to SVN.
>
> The only acceptable way is to generate ZERO_EXTEND in place, so:
>
> --cut here--
> static rtx
> get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
> {
>   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
>
>   if (GET_MODE (tp) != tp_mode)
>     {
>       gcc_assert (GET_MODE (tp) == SImode);
>       gcc_assert (tp_mode == DImode);
>
>       tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
>     }
>
>   if (to_reg)
>     tp = copy_to_mode_reg (tp_mode, tp);
>
>   return tp;
> }
> --cut here--

This version works fine.

Thanks.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak wrote:

>>>> For x32, thread pointer is an unsigned 32bit value.
>>>>
>>>> movl %fs:0, %eax
>>>>
>>>> is the correct instruction to load thread pointer into EAX and RAX.
>>>
>>> So, where is ZERO_EXTEND RTX then?
>>>
>>
>> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
>> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
>> when there is a single instruction to load TP into a DImode register.
>
> I don't agree with this explanation.  The mode can't be SImode and
> DImode.  TP is either SImode or ZERO_EXTENDed to DImode, this is the
> reason we went for all that TARGET_X32 stuff in TP load RTX.
>
> Please test my proposed patch.  If it works OK, I will commit it to SVN.

The only acceptable way is to generate ZERO_EXTEND in place, so:

--cut here--
static rtx
get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
{
  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

  if (GET_MODE (tp) != tp_mode)
    {
      gcc_assert (GET_MODE (tp) == SImode);
      gcc_assert (tp_mode == DImode);

      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
    }

  if (to_reg)
    tp = copy_to_mode_reg (tp_mode, tp);

  return tp;
}
--cut here--

This will generate:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret

Uros.
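For readers without a GCC tree at hand, the control flow of the function above can be replayed on a toy expression type. This is a hedged sketch only: toy_rtx, the TOY_* enumerators and toy_get_thread_pointer are invented names, TOY_REG merely stands in for "copied into a register of that mode", and ptr_mode is passed as a parameter here although it is a global in GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for machine modes and RTX codes (hypothetical names).  */
enum toy_mode { TOY_SImode, TOY_DImode };
enum toy_code { TOY_UNSPEC_TP, TOY_ZERO_EXTEND, TOY_REG };

struct toy_rtx
{
  enum toy_code code;
  enum toy_mode mode;
  const struct toy_rtx *operand;  /* NULL for the bare thread pointer.  */
};

/* Build the thread-pointer expression in NODES (room for three nodes)
   and return the outermost node.  */
const struct toy_rtx *
toy_get_thread_pointer (struct toy_rtx nodes[3], enum toy_mode ptr_mode,
                        enum toy_mode tp_mode, bool to_reg)
{
  struct toy_rtx *tp = &nodes[0];

  /* The thread pointer itself always lives in ptr_mode.  */
  tp->code = TOY_UNSPEC_TP;
  tp->mode = ptr_mode;
  tp->operand = NULL;

  if (tp->mode != tp_mode)
    {
      /* Same preconditions as the gcc_assert calls in the original.  */
      assert (tp->mode == TOY_SImode);
      assert (tp_mode == TOY_DImode);

      /* Wrap in place, so the extension stays visible in the expression.  */
      nodes[1].code = TOY_ZERO_EXTEND;
      nodes[1].mode = tp_mode;
      nodes[1].operand = tp;
      tp = &nodes[1];
    }

  if (to_reg)
    {
      nodes[2].code = TOY_REG;
      nodes[2].mode = tp_mode;
      nodes[2].operand = tp;
      tp = &nodes[2];
    }

  return tp;
}
```

With ptr_mode SImode and tp_mode DImode the result is a ZERO_EXTEND around the SImode thread pointer; when both modes agree, the bare thread pointer comes back unchanged.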
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 5:55 PM, H.J. Lu wrote:

>>> For x32, thread pointer is an unsigned 32bit value.
>>>
>>> movl %fs:0, %eax
>>>
>>> is the correct instruction to load thread pointer into EAX and RAX.
>>
>> So, where is ZERO_EXTEND RTX then?
>>
>
> Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
> TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
> when there is a single instruction to load TP into a DImode register.

I don't agree with this explanation.  The mode can't be SImode and
DImode.  TP is either SImode or ZERO_EXTENDed to DImode, this is the
reason we went for all that TARGET_X32 stuff in TP load RTX.

Please test my proposed patch.  If it works OK, I will commit it to SVN.

Thanks,
Uros.
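As a side note on what "ZERO_EXTENDed to DImode" means for the value itself, the following plain C sketch contrasts zero extension with sign extension of a 32-bit quantity. The function names and the sample thread-pointer value are made up; the only claim is the arithmetic, which matches a 32-bit register write clearing the upper half of the 64-bit register on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Counterpart of (zero_extend:DI (x:SI)): the upper 32 bits become 0.  */
uint64_t
zero_extend_si_to_di (uint32_t si)
{
  return (uint64_t) si;
}

/* Counterpart of (sign_extend:DI (x:SI)), shown only for contrast.
   The uint32_t -> int32_t conversion is implementation-defined in ISO C
   and assumed to wrap, as it does with GCC and Clang.  */
uint64_t
sign_extend_si_to_di (uint32_t si)
{
  return (uint64_t) (int64_t) (int32_t) si;
}

/* Returns 0 if both extensions behave as described above.  */
int
extension_demo (void)
{
  uint32_t tp = 0xf7000000u;  /* Made-up 32-bit thread pointer value.  */

  assert (zero_extend_si_to_di (tp) == UINT64_C (0x00000000f7000000));
  assert (sign_extend_si_to_di (tp) == UINT64_C (0xfffffffff7000000));
  return 0;
}
```

The two results differ exactly when bit 31 of the 32-bit value is set, which is why the RTL has to say which extension is meant instead of treating the SImode value as if it were already DImode.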
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 9:49 AM, Uros Bizjak wrote:
> On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu wrote:
>
>> For x32, thread pointer is an unsigned 32bit value.
>>
>> movl %fs:0, %eax
>>
>> is the correct instruction to load thread pointer into EAX and RAX.
>
> So, where is ZERO_EXTEND RTX then?
>

Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
when there is a single instruction to load TP into a DImode register.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu wrote:

> For x32, thread pointer is an unsigned 32bit value.
>
> movl %fs:0, %eax
>
> is the correct instruction to load thread pointer into EAX and RAX.

So, where is ZERO_EXTEND RTX then?

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 5:19 PM, H.J. Lu wrote:

>         movl    %fs:0, %eax
>         movq    c@gottpoff(%rip), %rdx
>         movzbl  (%rax,%rdx), %edx
>         movb    %dl, y(%rip)
>         movq    w@gottpoff(%rip), %rdx
>         movzwl  (%rax,%rdx), %eax
>         movw    %ax, i(%rip)
>         ret
>
> It can be
>
>         movq    c@gottpoff(%rip), %rax
>         movzbl  %fs:(%rax), %eax
>         movb    %al, y(%rip)
>         movq    w@gottpoff(%rip), %rax
>         movzwl  %fs:(%rax), %eax
>         movw    %ax, i(%rip)
>         ret

This is just CSE in action.  It CSEd movl %fs:0, %eax, since it has to
be zero-extended before going into the address.

Uros.
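Both sequences quoted above address the same byte: one first loads the thread pointer into a register and adds the offset, the other lets the %fs override add the segment base. The toy below models that with ordinary pointers. It is an assumption-laden sketch: the function names, the buffer and the offset of -4 are invented, the segment base is just a C pointer, and the model leans on the point made elsewhere in this thread that %fs:0 holds the address of the TCB itself.

```c
#include <assert.h>
#include <stdint.h>

/* FS_BASE stands for the %fs segment base and OFFSET for the value read
   from the GOT slot of c@gottpoff.  Nothing here touches real TLS.  */

/* movl %fs:0, %eax; movq c@gottpoff(%rip), %rdx; movzbl (%rax,%rdx), %edx  */
unsigned char
toy_load_via_tp_register (const unsigned char *fs_base, int64_t offset)
{
  /* In this model the word at %fs:0 is the TCB address, i.e. fs_base.  */
  const unsigned char *tp = fs_base;
  return tp[offset];
}

/* movq c@gottpoff(%rip), %rax; movzbl %fs:(%rax), %eax  */
unsigned char
toy_load_via_segment_override (const unsigned char *fs_base, int64_t offset)
{
  return *(fs_base + offset);
}

/* Returns 0 if both forms read the same byte.  */
int
toy_ie_demo (void)
{
  unsigned char block[32] = { 0 };
  const unsigned char *fs_base = block + 16;  /* Pretend the TCB starts here.  */
  int64_t offset = -4;                        /* Made-up offset below the TCB.  */

  block[12] = 'c';
  assert (toy_load_via_tp_register (fs_base, offset) == 'c');
  assert (toy_load_via_segment_override (fs_base, offset) == 'c');
  return 0;
}
```

The difference between the two assembly listings is therefore about instruction count and register pressure, not about which memory is accessed.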
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 9:37 AM, Uros Bizjak wrote: > On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu wrote: > Combine failed: (set (reg:QI 63 [ c ]) (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP)) (mem/u/c:DI (const:DI (unspec:DI [ (symbol_ref:SI ("c") [flags 0x60] ) ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) >>> >>> Wrong testcase. IT should be >>> >>> -- >>> extern __thread char c; >>> extern __thread short w; >>> extern char y; >>> extern short i; >>> void >>> ie (void) >>> { >>> y = c; >>> i = w; >>> } >>> --- >>> >>> I got >>> >>> movl %fs:0, %eax >>> movq c@gottpoff(%rip), %rdx >>> movzbl (%rax,%rdx), %edx >>> movb %dl, y(%rip) >>> movq w@gottpoff(%rip), %rdx >>> movzwl (%rax,%rdx), %eax >>> movw %ax, i(%rip) >>> ret >>> >>> It can be >>> >>> movq c@gottpoff(%rip), %rax >>> movzbl %fs:(%rax), %eax >>> movb %al, y(%rip) >>> movq w@gottpoff(%rip), %rax >>> movzwl %fs:(%rax), %eax >>> movw %ax, i(%rip) >>> ret >>> >>> >> >> How about this patch? I changed 32 TP load to >> >> (define_insn "*load_tp_x32_" >> [(set (match_operand:SWI48x 0 "register_operand" "=r") >> (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] >> "TARGET_X32" >> "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" >> [(set_attr "type" "imov") >> (set_attr "modrm" "0") >> (set_attr "length" "7") >> (set_attr "memory" "load") >> (set_attr "imm_disp" "false")]) >> >> and removed *load_tp_x32_zext. > > No, your whole approach with splitters is wrong. > > @@ -12747,11 +12747,11 @@ > (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) > > ;; Load and add the thread base pointer from %:0. 
> -(define_insn "*load_tp_x32" > - [(set (match_operand:SI 0 "register_operand" "=r") > - (unspec:SI [(const_int 0)] UNSPEC_TP))] > +(define_insn "*load_tp_x32_" > + [(set (match_operand:SWI48x 0 "register_operand" "=r") > + (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] > "TARGET_X32" > - "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}" > + "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" > > The result is zero_extended SImode register, not fake SImode register in > DImore. > > But as said, you should generate correct sequence from the beginning. > For x32, thread pointer is an unsigned 32bit value. movl %fs:0, %eax is the correct instruction to load thread pointer into EAX and RAX. -- H.J.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu wrote:

>>> Combine failed:
>>>
>>> (set (reg:QI 63 [ c ])
>>>     (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>>>                         (const_int 0 [0])
>>>                     ] UNSPEC_TP))
>>>             (mem/u/c:DI (const:DI (unspec:DI [
>>>                             (symbol_ref:SI ("c") [flags 0x60] )
>>>                         ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>>>
>>
>> Wrong testcase.  It should be
>>
>> --
>> extern __thread char c;
>> extern __thread short w;
>> extern char y;
>> extern short i;
>> void
>> ie (void)
>> {
>>   y = c;
>>   i = w;
>> }
>> ---
>>
>> I got
>>
>>         movl    %fs:0, %eax
>>         movq    c@gottpoff(%rip), %rdx
>>         movzbl  (%rax,%rdx), %edx
>>         movb    %dl, y(%rip)
>>         movq    w@gottpoff(%rip), %rdx
>>         movzwl  (%rax,%rdx), %eax
>>         movw    %ax, i(%rip)
>>         ret
>>
>> It can be
>>
>>         movq    c@gottpoff(%rip), %rax
>>         movzbl  %fs:(%rax), %eax
>>         movb    %al, y(%rip)
>>         movq    w@gottpoff(%rip), %rax
>>         movzwl  %fs:(%rax), %eax
>>         movw    %ax, i(%rip)
>>         ret
>>
>
> How about this patch?  I changed 32 TP load to
>
> (define_insn "*load_tp_x32_"
>   [(set (match_operand:SWI48x 0 "register_operand" "=r")
>         (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
>   "TARGET_X32"
>   "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
>   [(set_attr "type" "imov")
>    (set_attr "modrm" "0")
>    (set_attr "length" "7")
>    (set_attr "memory" "load")
>    (set_attr "imm_disp" "false")])
>
> and removed *load_tp_x32_zext.

No, your whole approach with splitters is wrong.

@@ -12747,11 +12747,11 @@
 (define_mode_attr tp_seg [(SI "gs") (DI "fs")])

 ;; Load and add the thread base pointer from %:0.
-(define_insn "*load_tp_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-        (unspec:SI [(const_int 0)] UNSPEC_TP))]
+(define_insn "*load_tp_x32_"
+  [(set (match_operand:SWI48x 0 "register_operand" "=r")
+        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
   "TARGET_X32"
-  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
+  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"

The result is a zero_extended SImode register, not a fake SImode register
in DImode.

But as said, you should generate the correct sequence from the beginning.

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 9:19 AM, H.J. Lu wrote: > On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu wrote: >> On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu wrote: >>> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak wrote: On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak wrote: >> I am testing this patch. OK for trunk if it passes all tests? > > No, force_reg will generate a pseudo, so this conversion is valid only > for !can_create_pseudo (). > > At least for *tls_initial_exec_x32_store, you will need a temporary to > split the pattern after reload. >>> >>> Here is the updated patch to add can_create_pseudo. I also changed >>> tls_initial_exec_x32 to take an input register operand as thread pointer. >>> Please try attached patch. It simply throws away all recent complications w.r.t. to thread pointer and always handles TP in DImode. The testcase: --cut here-- __thread int foo __attribute__ ((tls_model ("initial-exec"))); void bar (int x) { foo = x; } int baz (void) { return foo; } --cut here-- Now compiles to: bar: movq foo@gottpoff(%rip), %rax movl %edi, %fs:(%rax) ret baz: movq foo@gottpoff(%rip), %rax movl %fs:(%rax), %eax ret In effect, this always generates %fs(%rDI) and emits REX prefix before mov/add to satisfy brain-dead linkers. The patch is bootstrapping now on x86_64-pc-linux-gnu. >>> >>> For >>> >>> -- >>> extern __thread char c; >>> extern char y; >>> void >>> ie (void) >>> { >>> y = c; >>> } >>> -- >>> >>> Your patch generates: >>> >>> movl %fs:0, %eax >>> movq c@gottpoff(%rip), %rdx >>> movzbl (%rax,%rdx), %edx >>> movb %dl, y(%rip) >>> ret >>> >>> It can be optimized to: >>> >>> movq c@gottpoff(%rip), %rax >>> movzbl %fs:(%rax), %eax >>> movb %al, y(%rip) >>> ret >>> >> >> Combine failed: >> >> (set (reg:QI 63 [ c ]) >> (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ >> (const_int 0 [0]) >> ] UNSPEC_TP)) >> (mem/u/c:DI (const:DI (unspec:DI [ >> (symbol_ref:SI ("c") [flags 0x60] >> ) >> ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) >> >> > > Wrong testcase. 
IT should be > > -- > extern __thread char c; > extern __thread short w; > extern char y; > extern short i; > void > ie (void) > { > y = c; > i = w; > } > --- > > I got > > movl %fs:0, %eax > movq c@gottpoff(%rip), %rdx > movzbl (%rax,%rdx), %edx > movb %dl, y(%rip) > movq w@gottpoff(%rip), %rdx > movzwl (%rax,%rdx), %eax > movw %ax, i(%rip) > ret > > It can be > > movq c@gottpoff(%rip), %rax > movzbl %fs:(%rax), %eax > movb %al, y(%rip) > movq w@gottpoff(%rip), %rax > movzwl %fs:(%rax), %eax > movw %ax, i(%rip) > ret > > How about this patch? I changed 32 TP load to (define_insn "*load_tp_x32_" [(set (match_operand:SWI48x 0 "register_operand" "=r") (unspec:SWI48x [(const_int 0)] UNSPEC_TP))] "TARGET_X32" "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}" [(set_attr "type" "imov") (set_attr "modrm" "0") (set_attr "length" "7") (set_attr "memory" "load") (set_attr "imm_disp" "false")]) and removed *load_tp_x32_zext. -- H.J. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 9aa5ee7..66221e4 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12483,15 +12483,12 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. If TO_REG is true, force it into a register. 
*/ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { - rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - - if (GET_MODE (tp) != Pmode) -tp = convert_to_mode (Pmode, tp, 1); + rtx tp = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); if (to_reg) -tp = copy_addr_to_reg (tp); +tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12543,6 +12540,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12568,7 +12566,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @@ -12618,7 +12616,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) el
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu wrote: > On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu wrote: >> On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak wrote: >>> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak wrote: >>> > I am testing this patch. OK for trunk if it passes all tests? No, force_reg will generate a pseudo, so this conversion is valid only for !can_create_pseudo (). At least for *tls_initial_exec_x32_store, you will need a temporary to split the pattern after reload. >> >> Here is the updated patch to add can_create_pseudo. I also changed >> tls_initial_exec_x32 to take an input register operand as thread pointer. >> >>> Please try attached patch. It simply throws away all recent >>> complications w.r.t. to thread pointer and always handles TP in >>> DImode. >>> >>> The testcase: >>> >>> --cut here-- >>> __thread int foo __attribute__ ((tls_model ("initial-exec"))); >>> >>> void bar (int x) >>> { >>> foo = x; >>> } >>> >>> int baz (void) >>> { >>> return foo; >>> } >>> --cut here-- >>> >>> Now compiles to: >>> >>> bar: >>> movq foo@gottpoff(%rip), %rax >>> movl %edi, %fs:(%rax) >>> ret >>> >>> baz: >>> movq foo@gottpoff(%rip), %rax >>> movl %fs:(%rax), %eax >>> ret >>> >>> In effect, this always generates %fs(%rDI) and emits REX prefix before >>> mov/add to satisfy brain-dead linkers. >>> >>> The patch is bootstrapping now on x86_64-pc-linux-gnu. 
>>>
>>
>> For
>>
>> --
>> extern __thread char c;
>> extern char y;
>> void
>> ie (void)
>> {
>>   y = c;
>> }
>> --
>>
>> Your patch generates:
>>
>> movl %fs:0, %eax
>> movq c@gottpoff(%rip), %rdx
>> movzbl (%rax,%rdx), %edx
>> movb %dl, y(%rip)
>> ret
>>
>> It can be optimized to:
>>
>> movq c@gottpoff(%rip), %rax
>> movzbl %fs:(%rax), %eax
>> movb %al, y(%rip)
>> ret
>>
>
> Combine failed:
>
> (set (reg:QI 63 [ c ])
>     (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
>                         (const_int 0 [0])
>                     ] UNSPEC_TP))
>             (mem/u/c:DI (const:DI (unspec:DI [
>                             (symbol_ref:SI ("c") [flags 0x60] )
>                         ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))
>

Wrong testcase. It should be

--
extern __thread char c;
extern __thread short w;
extern char y;
extern short i;
void
ie (void)
{
  y = c;
  i = w;
}
---

I got

    movl   %fs:0, %eax
    movq   c@gottpoff(%rip), %rdx
    movzbl (%rax,%rdx), %edx
    movb   %dl, y(%rip)
    movq   w@gottpoff(%rip), %rdx
    movzwl (%rax,%rdx), %eax
    movw   %ax, i(%rip)
    ret

It can be

    movq   c@gottpoff(%rip), %rax
    movzbl %fs:(%rax), %eax
    movb   %al, y(%rip)
    movq   w@gottpoff(%rip), %rax
    movzwl %fs:(%rax), %eax
    movw   %ax, i(%rip)
    ret

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu wrote: > On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak wrote: >> On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak wrote: >> I am testing this patch. OK for trunk if it passes all tests? >>> >>> No, force_reg will generate a pseudo, so this conversion is valid only >>> for !can_create_pseudo (). >>> >>> At least for *tls_initial_exec_x32_store, you will need a temporary to >>> split the pattern after reload. > > Here is the updated patch to add can_create_pseudo. I also changed > tls_initial_exec_x32 to take an input register operand as thread pointer. > >> Please try attached patch. It simply throws away all recent >> complications w.r.t. to thread pointer and always handles TP in >> DImode. >> >> The testcase: >> >> --cut here-- >> __thread int foo __attribute__ ((tls_model ("initial-exec"))); >> >> void bar (int x) >> { >> foo = x; >> } >> >> int baz (void) >> { >> return foo; >> } >> --cut here-- >> >> Now compiles to: >> >> bar: >> movq foo@gottpoff(%rip), %rax >> movl %edi, %fs:(%rax) >> ret >> >> baz: >> movq foo@gottpoff(%rip), %rax >> movl %fs:(%rax), %eax >> ret >> >> In effect, this always generates %fs(%rDI) and emits REX prefix before >> mov/add to satisfy brain-dead linkers. >> >> The patch is bootstrapping now on x86_64-pc-linux-gnu. >> > > For > > -- > extern __thread char c; > extern char y; > void > ie (void) > { > y = c; > } > -- > > Your patch generates: > > movl %fs:0, %eax > movq c@gottpoff(%rip), %rdx > movzbl (%rax,%rdx), %edx > movb %dl, y(%rip) > ret > > It can be optimized to: > > movq c@gottpoff(%rip), %rax > movzbl %fs:(%rax), %eax > movb %al, y(%rip) > ret > Combine failed: (set (reg:QI 63 [ c ]) (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [ (const_int 0 [0]) ] UNSPEC_TP)) (mem/u/c:DI (const:DI (unspec:DI [ (symbol_ref:SI ("c") [flags 0x60] ) ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8])) -- H.J.
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak wrote: > On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak wrote: > >>> I am testing this patch. OK for trunk if it passes all tests? >> >> No, force_reg will generate a pseudo, so this conversion is valid only >> for !can_create_pseudo (). >> >> At least for *tls_initial_exec_x32_store, you will need a temporary to >> split the pattern after reload. Here is the updated patch to add can_create_pseudo. I also changed tls_initial_exec_x32 to take an input register operand as thread pointer. > Please try attached patch. It simply throws away all recent > complications w.r.t. to thread pointer and always handles TP in > DImode. > > The testcase: > > --cut here-- > __thread int foo __attribute__ ((tls_model ("initial-exec"))); > > void bar (int x) > { > foo = x; > } > > int baz (void) > { > return foo; > } > --cut here-- > > Now compiles to: > > bar: > movq foo@gottpoff(%rip), %rax > movl %edi, %fs:(%rax) > ret > > baz: > movq foo@gottpoff(%rip), %rax > movl %fs:(%rax), %eax > ret > > In effect, this always generates %fs(%rDI) and emits REX prefix before > mov/add to satisfy brain-dead linkers. > > The patch is bootstrapping now on x86_64-pc-linux-gnu. > For -- extern __thread char c; extern char y; void ie (void) { y = c; } -- Your patch generates: movl%fs:0, %eax movqc@gottpoff(%rip), %rdx movzbl (%rax,%rdx), %edx movb%dl, y(%rip) ret It can be optimized to: movqc@gottpoff(%rip), %rax movzbl %fs:(%rax), %eax movb%al, y(%rip) ret H.J. 2012-03-19 H.J. Lu * config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New. * config/i386/i386.c (legitimize_tls_address): Also pass thread pointer to gen_tls_initial_exec_x32. (ix86_split_tls_initial_exec_x32): New. * config/i386/i386.md (*load_tp_x32): Renamed to ... (*load_tp_x32_): This. Replace SI with SWI48x. (tls_initial_exec_x32): Add an input register operand as thread pointer. Generate a REX prefix if needed. (*tls_initial_exec_x32_load): New. 
(*tls_initial_exec_x32_store): Likewise. diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 630112f..528eeaa 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -142,6 +142,7 @@ extern void ix86_split_lshr (rtx *, rtx, enum machine_mode); extern rtx ix86_find_base_term (rtx); extern bool ix86_check_movabs (rtx, int); extern void ix86_split_idivmod (enum machine_mode, rtx[], bool); +extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool); extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot); extern int ix86_attr_length_immediate_default (rtx, bool); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 78a366e..fb802ee 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12671,13 +12671,14 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) } else if (Pmode == SImode) { - /* Always generate - movl %fs:0, %reg32 + /* Always generate a REX prefix for addl xgottpoff(%rip), %reg32 - to support linker IE->LE optimization and avoid - fs:(%reg32) as memory operand. */ + to support linker IE->LE optimization. */ dest = gen_reg_rtx (Pmode); - emit_insn (gen_tls_initial_exec_x32 (dest, x)); + base = get_thread_pointer (for_mov + || !(TARGET_TLS_DIRECT_SEG_REFS + && TARGET_TLS_INDIRECT_SEG_REFS)); + emit_insn (gen_tls_initial_exec_x32 (dest, base, x)); return dest; } @@ -12754,6 +12755,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) return dest; } +/* Split x32 TLS IE access in MODE. Split load if LOAD is TRUE, + otherwise split store. */ + +void +ix86_split_tls_initial_exec_x32 (rtx operands[], + enum machine_mode mode, bool load) +{ + rtx base, mem; + rtx off = load ? 
operands[1] : operands[0]; + off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF); + off = gen_rtx_CONST (DImode, off); + off = gen_const_mem (DImode, off); + set_mem_alias_set (off, ix86_GOT_alias_set ()); + base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP); + off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off)); + mem = gen_rtx_MEM (mode, off); + if (load) +emit_move_insn (operands[0], mem); + else +emit_move_insn (mem, operands[1]); +} + /* Create or return the unique __imp_DECL dllimport symbol corresponding to symbol DECL. */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index eae26ae..1643792 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12747,11 +12747,11 @@ (define_mode_attr tp_seg [(SI "gs") (DI "fs")]) ;; Load and add the thread base pointer from %:0. -(define_insn "*load_tp_x32" - [(set (matc
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak wrote: >> I am testing this patch. OK for trunk if it passes all tests? > > No, force_reg will generate a pseudo, so this conversion is valid only > for !can_create_pseudo (). > > At least for *tls_initial_exec_x32_store, you will need a temporary to > split the pattern after reload. Please try attached patch. It simply throws away all recent complications w.r.t. to thread pointer and always handles TP in DImode. The testcase: --cut here-- __thread int foo __attribute__ ((tls_model ("initial-exec"))); void bar (int x) { foo = x; } int baz (void) { return foo; } --cut here-- Now compiles to: bar: movqfoo@gottpoff(%rip), %rax movl%edi, %fs:(%rax) ret baz: movqfoo@gottpoff(%rip), %rax movl%fs:(%rax), %eax ret In effect, this always generates %fs(%rDI) and emits REX prefix before mov/add to satisfy brain-dead linkers. The patch is bootstrapping now on x86_64-pc-linux-gnu. Uros. Index: i386.md === --- i386.md (revision 185505) +++ i386.md (working copy) @@ -12836,28 +12836,6 @@ } [(set_attr "type" "multi")]) -;; When Pmode == SImode, there may be no REX prefix for ADD. Avoid -;; any instructions between MOV and ADD, which may interfere linker -;; IE->LE optimization, since the last byte of the previous instruction -;; before ADD may look like a REX prefix. This also avoids -;; movl x@gottpoff(%rip), %reg32 -;; movl $fs:(%reg32), %reg32 -;; Since address override works only on the (reg32) part in fs:(reg32), -;; we can't use it as memory operand. -(define_insn "tls_initial_exec_x32" - [(set (match_operand:SI 0 "register_operand" "=r") - (unspec:SI -[(match_operand 1 "tls_symbolic_operand")] -UNSPEC_TLS_IE_X32)) - (clobber (reg:CC FLAGS_REG))] - "TARGET_X32" -{ - output_asm_insn -("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands); - return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}"; -} - [(set_attr "type" "multi")]) - ;; GNU2 TLS patterns can be split. 
(define_expand "tls_dynamic_gnu2_32" Index: i386.c === --- i386.c (revision 185504) +++ i386.c (working copy) @@ -11509,6 +11509,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr scale = 1 << scale; break; + case ZERO_EXTEND: + op = XEXP (op, 0); + /* FALLTHRU */ + case UNSPEC: if (XINT (op, 1) == UNSPEC_TP && TARGET_TLS_DIRECT_SEG_REFS @@ -12478,15 +12482,15 @@ legitimize_pic_address (rtx orig, rtx reg) /* Load the thread pointer. If TO_REG is true, force it into a register. */ static rtx -get_thread_pointer (bool to_reg) +get_thread_pointer (enum machine_mode tp_mode, bool to_reg) { rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP); - if (GET_MODE (tp) != Pmode) -tp = convert_to_mode (Pmode, tp, 1); + if (GET_MODE (tp) != tp_mode) +tp = convert_to_mode (tp_mode, tp, 1); if (to_reg) -tp = copy_addr_to_reg (tp); +tp = copy_to_mode_reg (tp_mode, tp); return tp; } @@ -12538,6 +12542,7 @@ legitimize_tls_address (rtx x, enum tls_model mode { rtx dest, base, off; rtx pic = NULL_RTX, tp = NULL_RTX; + enum machine_mode tp_mode = Pmode; int type; switch (model) @@ -12563,7 +12568,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest)); set_unique_reg_note (get_last_insn (), REG_EQUAL, x); @@ -12613,7 +12618,7 @@ legitimize_tls_address (rtx x, enum tls_model mode else emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic)); - tp = get_thread_pointer (true); + tp = get_thread_pointer (Pmode, true); set_unique_reg_note (get_last_insn (), REG_EQUAL, gen_rtx_MINUS (Pmode, tmp, tp)); } @@ -12659,27 +12664,18 @@ legitimize_tls_address (rtx x, enum tls_model mode case TLS_MODEL_INITIAL_EXEC: if (TARGET_64BIT) { + tp_mode = DImode; + if (TARGET_SUN_TLS) { /* The Sun linker took the AMD64 TLS spec literally and can only handle %rax as destination of the initial 
executable code sequence. */ - dest = gen_reg_rtx (Pmode); + dest = gen_reg_rtx (tp_mode); emit_insn (gen_tls_initial_exec_64_sun (dest, x)); return dest; } - else if (Pmode == SImode) - { - /* Always generate - movl %fs:0, %reg32 -
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 17, 2012 at 10:50 PM, H.J. Lu wrote:

>>>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>>>
>>>>> movq x@gottpoff(%rip),%reg64;
>>>>> mov %fs:(%reg64),%reg
>>>>>
>>>>> this patch optimizes x32 TLS IE load and store by wrapping
>>>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>>>> trunk?
>>>>
>>>> Can you implement this with define_insn_and_split, like i.e.
>>>> *tls_dynamic_gnu2_combine_32 ?
>>>
>>> I will give it a try again. Last time when I tried it, GCC didn't
>>> like memory operand in DImode when Pmode == SImode.
>>
>> You should remove mode for tls_symbolic_operand predicate.
>>
>
> I am testing this patch. OK for trunk if it passes all tests?

No, force_reg will generate a pseudo, so this conversion is valid only
for !can_create_pseudo ().

At least for *tls_initial_exec_x32_store, you will need a temporary to
split the pattern after reload.

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 17, 2012 at 11:20 AM, Uros Bizjak wrote: > On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu wrote: > Since we must use reg64 in %fs:(%reg) memory operand like movq x@gottpoff(%rip),%reg64; mov %fs:(%reg64),%reg this patch optimizes x32 TLS IE load and store by wrapping %reg64 inside of UNSPEC when Pmode == SImode. OK for trunk? Thanks. -- H.J. --- 2012-03-11 H.J. Lu * config/i386/i386.md (*tls_initial_exec_x32_load): New. (*tls_initial_exec_x32_store): Likewise. >>> >>> Can you implement this with define_insn_and_split, like i.e. >>> *tls_dynamic_gnu2_combine_32 ? >>> >> >> I will give it a try again. Last time when I tried it, GCC didn't >> like memory operand in DImode when Pmode == SImode. > > You should remove mode for tls_symbolic_operand predicate. > I am testing this patch. OK for trunk if it passes all tests? Thanks. -- H.J. 2012-03-17 H.J. Lu * config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New. * config/i386/i386.c (ix86_split_tls_initial_exec_x32): Likewise. * config/i386/i386.md (*tls_initial_exec_x32_load): New. (*tls_initial_exec_x32_store): Likewise. diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h index 630112f..2c4f1ed 100644 --- a/gcc/config/i386/i386-protos.h +++ b/gcc/config/i386/i386-protos.h @@ -213,6 +213,7 @@ extern unsigned int ix86_get_callcvt (const_tree); #endif extern rtx ix86_tls_module_base (void); +extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool); extern void ix86_expand_vector_init (bool, rtx, rtx); extern void ix86_expand_vector_set (bool, rtx, rtx, int); diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 78a366e..5a9c673 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12754,6 +12754,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov) return dest; } +/* Split x32 TLS IE access in MODE. Split load if LOAD is TRUE, + otherwise split store. 
*/ + +void +ix86_split_tls_initial_exec_x32 (rtx operands[], +enum machine_mode mode, bool load) +{ + rtx base, mem; + rtx off = load ? operands[1] : operands[0]; + off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF); + off = gen_rtx_CONST (DImode, off); + off = gen_const_mem (DImode, off); + set_mem_alias_set (off, ix86_GOT_alias_set ()); + base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP); + off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off)); + mem = gen_rtx_MEM (mode, off); + if (load) +emit_move_insn (operands[0], mem); + else +emit_move_insn (mem, operands[1]); +} + /* Create or return the unique __imp_DECL dllimport symbol corresponding to symbol DECL. */ diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index eae26ae..78faeec 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -12858,6 +12858,32 @@ } [(set_attr "type" "multi")]) +(define_insn_and_split "*tls_initial_exec_x32_load" + [(set (match_operand:SWI1248x 0 "register_operand" "=r") +(mem:SWI1248x + (unspec:SI + [(match_operand 1 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32))) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, mode, TRUE); DONE;") + +(define_insn_and_split "*tls_initial_exec_x32_store" + [(set (mem:SWI1248x + (unspec:SI + [(match_operand 0 "tls_symbolic_operand" "")] + UNSPEC_TLS_IE_X32)) + (match_operand:SWI1248x 1 "register_operand" "r")) + (clobber (reg:CC FLAGS_REG))] + "TARGET_X32" + "#" + "" + [(const_int 0)] + "ix86_split_tls_initial_exec_x32 (operands, mode, FALSE); DONE;") + ;; GNU2 TLS patterns can be split. (define_expand "tls_dynamic_gnu2_32"
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu wrote:

>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>
>>> movq x@gottpoff(%rip),%reg64;
>>> mov %fs:(%reg64),%reg
>>>
>>> this patch optimizes x32 TLS IE load and store by wrapping
>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>> trunk?
>>>
>>> Thanks.
>>>
>>> --
>>> H.J.
>>> ---
>>> 2012-03-11 H.J. Lu
>>>
>>> * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>>> (*tls_initial_exec_x32_store): Likewise.
>>
>> Can you implement this with define_insn_and_split, like i.e.
>> *tls_dynamic_gnu2_combine_32 ?
>>
>
> I will give it a try again. Last time when I tried it, GCC didn't
> like memory operand in DImode when Pmode == SImode.

You should remove mode for tls_symbolic_operand predicate.

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 17, 2012 at 11:10 AM, Uros Bizjak wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu wrote:
>
>> Since we must use reg64 in %fs:(%reg) memory operand like
>>
>> movq x@gottpoff(%rip),%reg64;
>> mov %fs:(%reg64),%reg
>>
>> this patch optimizes x32 TLS IE load and store by wrapping
>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>> trunk?
>>
>> Thanks.
>>
>> --
>> H.J.
>> ---
>> 2012-03-11 H.J. Lu
>>
>> * config/i386/i386.md (*tls_initial_exec_x32_load): New.
>> (*tls_initial_exec_x32_store): Likewise.
>
> Can you implement this with define_insn_and_split, like i.e.
> *tls_dynamic_gnu2_combine_32 ?
>

I will give it a try again. Last time when I tried it, GCC didn't
like memory operand in DImode when Pmode == SImode.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu wrote:

> Since we must use reg64 in %fs:(%reg) memory operand like
>
> movq x@gottpoff(%rip),%reg64;
> mov %fs:(%reg64),%reg
>
> this patch optimizes x32 TLS IE load and store by wrapping
> %reg64 inside of UNSPEC when Pmode == SImode. OK for
> trunk?
>
> Thanks.
>
> --
> H.J.
> ---
> 2012-03-11 H.J. Lu
>
> * config/i386/i386.md (*tls_initial_exec_x32_load): New.
> (*tls_initial_exec_x32_store): Likewise.

Can you implement this with define_insn_and_split, as is done for
*tls_dynamic_gnu2_combine_32?

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak wrote: > On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak wrote: > > Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS > to block only indirect seg references. >>> >>> There is no regression. >> >> Thanks, committed to mainline SVN with following ChangeLog: >> >> 2012-03-13 Uros Bizjak >> >> * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. >> * config/i386/i386.c (ix86_decompose_address): Use >> TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. >> (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load >> thread pointer to a register. >> >> Tested on x86_64-pc-linux-gnu {,-m32}. >> >>> BTW, this x32 TLS IE optimization: >> >> > movq %rax, %fs:(%rdx) >> >> This is just looking for troubles. If we said these addresses are >> invalid, then we shouldn't generate them. > > OTOH, we can improve rejection test a bit to reject only non-word > mode registers. > > 2012-03-13 Uros Bizjak > > * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg) > addresses only when %reg is not in word mode. > > Tested on x86_64-pc-linux-gnu {,-m32}, committed. > > Uros. > > Index: i386.c > === > --- i386.c (revision 185278) > +++ i386.c (working copy) > @@ -11563,8 +11563,10 @@ > return 0; > } > > - if (seg != SEG_DEFAULT && (base || index) > - && !TARGET_TLS_INDIRECT_SEG_REFS) > +/* Address override works only on the (%reg) part of %fs:(%reg). */ > + if (seg != SEG_DEFAULT > + && ((base && GET_MODE (base) != word_mode) > + || (index && GET_MODE (index) != word_mode))) > return 0; > > /* Extract the integral value of scale. */ Is my x32 TLS IE optimization: http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html OK for trunk? Thanks. -- H.J.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak wrote: > On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak wrote: > > Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS > to block only indirect seg references. >>> >>> There is no regression. >> >> Thanks, committed to mainline SVN with following ChangeLog: >> >> 2012-03-13 Uros Bizjak >> >> * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. >> * config/i386/i386.c (ix86_decompose_address): Use >> TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. >> (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load >> thread pointer to a register. >> >> Tested on x86_64-pc-linux-gnu {,-m32}. >> >>> BTW, this x32 TLS IE optimization: >> >> > movq %rax, %fs:(%rdx) >> >> This is just looking for troubles. If we said these addresses are >> invalid, then we shouldn't generate them. > > OTOH, we can improve rejection test a bit to reject only non-word > mode registers. > > 2012-03-13 Uros Bizjak > > * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg) > addresses only when %reg is not in word mode. > > Tested on x86_64-pc-linux-gnu {,-m32}, committed. > > Uros. > > Index: i386.c > === > --- i386.c (revision 185278) > +++ i386.c (working copy) > @@ -11563,8 +11563,10 @@ > return 0; > } > > - if (seg != SEG_DEFAULT && (base || index) > - && !TARGET_TLS_INDIRECT_SEG_REFS) > +/* Address override works only on the (%reg) part of %fs:(%reg). */ > + if (seg != SEG_DEFAULT > + && ((base && GET_MODE (base) != word_mode) > + || (index && GET_MODE (index) != word_mode))) > return 0; > > /* Extract the integral value of scale. */ This works. -- H.J.
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak wrote:

>>> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
>>> to block only indirect seg references.
>>
>> There is no regression.
>
> Thanks, committed to mainline SVN with following ChangeLog:
>
> 2012-03-13 Uros Bizjak
>
> * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
> * config/i386/i386.c (ix86_decompose_address): Use
> TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
> (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
> thread pointer to a register.
>
> Tested on x86_64-pc-linux-gnu {,-m32}.
>
>> BTW, this x32 TLS IE optimization:
>
> > movq %rax, %fs:(%rdx)
>
> This is just looking for troubles. If we said these addresses are
> invalid, then we shouldn't generate them.

OTOH, we can improve rejection test a bit to reject only non-word
mode registers.

2012-03-13 Uros Bizjak

 * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
 addresses only when %reg is not in word mode.

Tested on x86_64-pc-linux-gnu {,-m32}, committed.

Uros.

Index: i386.c
===
--- i386.c (revision 185278)
+++ i386.c (working copy)
@@ -11563,8 +11563,10 @@
 return 0;
 }

- if (seg != SEG_DEFAULT && (base || index)
- && !TARGET_TLS_INDIRECT_SEG_REFS)
+/* Address override works only on the (%reg) part of %fs:(%reg). */
+ if (seg != SEG_DEFAULT
+ && ((base && GET_MODE (base) != word_mode)
+ || (index && GET_MODE (index) != word_mode)))
 return 0;

 /* Extract the integral value of scale. */
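
For readers following the rejection test in the diff above, here is a minimal standalone sketch of the same predicate. It is not GCC code: `enum mode`, `enum seg`, `struct reg` and `seg_address_ok` are hypothetical stand-ins for `enum machine_mode`, the `SEG_*` values, an RTX register and the check inside ix86_decompose_address, introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for GCC's machine modes and segment enum.  */
enum mode { MODE_SI, MODE_DI };
enum seg { SEG_DEFAULT, SEG_FS, SEG_GS };

/* Hypothetical register descriptor; only its mode matters here.  */
struct reg { enum mode mode; };

/* Return true when the decomposed address is acceptable.  With a
   segment override, every base or index register that is present
   must be in word mode, because the address-size override applies
   only to the (%reg) part of %fs:(%reg).  */
bool
seg_address_ok (enum seg seg, const struct reg *base,
                const struct reg *index, enum mode word_mode)
{
  if (seg != SEG_DEFAULT
      && ((base && base->mode != word_mode)
          || (index && index->mode != word_mode)))
    return false;
  return true;
}
```

Under this model, with word_mode taken as DImode (the x32 case), %fs:(%reg64) passes and %fs:(%reg32) is rejected, while addresses without a segment override are unaffected.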
Re: PATCH: Properly generate X32 IE sequence
On Tue, Mar 13, 2012 at 2:20 AM, H.J. Lu wrote: Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS when Pmode != word_mode. We need to keep else if (Pmode == SImode) { /* Always generate movl %fs:0, %reg32 addl xgottpoff(%rip), %reg32 to support linker IE->LE optimization and avoid fs:(%reg32) as memory operand. */ dest = gen_reg_rtx (Pmode); emit_insn (gen_tls_initial_exec_x32 (dest, x)); return dest; } to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only affects TLS LE access and fs:(%reg) is only generated by combine. So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable fs:immediate memory operand for TLS LE access, which doesn't have any problems to begin with. I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only fs:(%reg), which is generated by combine. >>> >>> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS >>> to block only indirect seg references. > > There is no regression. Thanks, committed to mainline SVN with following ChangeLog: 2012-03-13 Uros Bizjak * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New. * config/i386/i386.c (ix86_decompose_address): Use TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses. (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load thread pointer to a register. Tested on x86_64-pc-linux-gnu {,-m32}. > BTW, this x32 TLS IE optimization: >movq%rax, %fs:(%rdx) This is just looking for troubles. If we said these addresses are invalid, then we shouldn't generate them. Uros.
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 12, 2012 at 3:35 PM, H.J. Lu wrote: > On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak wrote: >> On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu wrote: >> >>> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS >>> when Pmode != word_mode. We need to keep >>> >>> else if (Pmode == SImode) >>> { >>> /* Always generate >>> movl %fs:0, %reg32 >>> addl xgottpoff(%rip), %reg32 >>> to support linker IE->LE optimization and avoid >>> fs:(%reg32) as memory operand. */ >>> dest = gen_reg_rtx (Pmode); >>> emit_insn (gen_tls_initial_exec_x32 (dest, x)); >>> return dest; >>> } >>> >>> to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only >>> affects >>> TLS LE access and fs:(%reg) is only generated by combine. >>> >>> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable >>> fs:immediate memory operand for TLS LE access, which doesn't have any >>> problems >>> to begin with. >>> >>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only >>> fs:(%reg), which is generated by combine. >> >> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS >> to block only indirect seg references. >> >> Uros. > > I am testing it. > There is no regression. BTW, this x32 TLS IE optimization: http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html is still useful. 
For

[hjl@gnu-6 tls]$ cat ie2.i
extern __thread long long int x;
extern long long int y;

void
ie2 (void)
{
  x = y;
}
[hjl@gnu-6 tls]$

my patch turns

ie2:
.LFB0:
    .cfi_startproc
    movq y(%rip), %rdx          # 6   *movdi_internal_rex64/2 [length = 7]
    movl %fs:0, %eax            # 5   tls_initial_exec_x32 [length = 16]
    addl x@gottpoff(%rip), %eax
    movq %rdx, (%eax)           # 7   *movdi_internal_rex64/4 [length = 3]
    ret                         # 14  simple_return_internal [length = 1]
    .cfi_endproc

into

ie2:
.LFB0:
    .cfi_startproc
    movq y(%rip), %rax          # 6   *movdi_internal_rex64/2 [length = 7]
    movq x@gottpoff(%rip), %rdx # 7   *tls_initial_exec_x32_store [length = 16]
    movq %rax, %fs:(%rdx)
    ret                         # 14  simple_return_internal [length = 1]
    .cfi_endproc

--
H.J.
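
The ie2.i testcase above only compiles to assembly, since x and y are declared extern. A linkable variant might look like the sketch below. The definitions of x and y, the tls_model attribute on x, and the helper read_x are assumptions added for illustration and are not part of the posted testcase.

```c
/* Thread-local variable forced to the initial-exec TLS model, so the
   access goes through x@gottpoff even when x is defined locally.  */
__thread long long int x __attribute__ ((tls_model ("initial-exec")));

/* Ordinary global used as the source of the store.  */
long long int y;

/* Same body as the posted testcase: a TLS IE store.  */
void
ie2 (void)
{
  x = y;
}

/* Hypothetical helper, not in the original: a TLS IE load.  */
long long int
read_x (void)
{
  return x;
}
```

Compiling this with something like `gcc -O2 -mx32 -S` on an x32-capable toolchain should make it possible to compare the load and store sequences against the ones quoted in this thread; the exact output depends on the GCC revision.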
Re: PATCH: Properly generate X32 IE sequence
On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak wrote:
> On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu wrote:
>
>> Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
>> when Pmode != word_mode. We need to keep
>>
>> else if (Pmode == SImode)
>>   {
>>     /* Always generate
>>          movl %fs:0, %reg32
>>          addl xgottpoff(%rip), %reg32
>>        to support linker IE->LE optimization and avoid
>>        fs:(%reg32) as memory operand. */
>>     dest = gen_reg_rtx (Pmode);
>>     emit_insn (gen_tls_initial_exec_x32 (dest, x));
>>     return dest;
>>   }
>>
>> to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only
>> affects TLS LE access and fs:(%reg) is only generated by combine.
>>
>> So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
>> fs:immediate memory operand for TLS LE access, which doesn't have any
>> problems to begin with.
>>
>> I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
>> fs:(%reg), which is generated by combine.
>
> Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS
> to block only indirect seg references.
>
> Uros.

I am testing it.

--
H.J.
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu wrote: > Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS > when Pmode != word_mode. We need to keep > > else if (Pmode == SImode) > { > /* Always generate > movl %fs:0, %reg32 > addl xgottpoff(%rip), %reg32 > to support linker IE->LE optimization and avoid > fs:(%reg32) as memory operand. */ > dest = gen_reg_rtx (Pmode); > emit_insn (gen_tls_initial_exec_x32 (dest, x)); > return dest; > } > > to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only > affects > TLS LE access and fs:(%reg) is only generated by combine. > > So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable > fs:immediate memory operand for TLS LE access, which doesn't have any problems > to begin with. > > I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only > fs:(%reg), which is generated by combine. Please try attached patch. It introduces TARGET_TLS_INDIRECT_SEG_REFS to block only indirect seg references. Uros. Index: i386.c === --- i386.c (revision 185250) +++ i386.c (working copy) @@ -11552,11 +11552,6 @@ ix86_decompose_address (rtx addr, struct ix86_addr else disp = addr; /* displacement */ - /* Since address override works only on the (reg32) part in fs:(reg32), - we can't use it as memory operand. */ - if (Pmode != word_mode && seg == SEG_FS && (base || index)) -return 0; - if (index) { if (REG_P (index)) @@ -11568,6 +11563,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr return 0; } + if (seg != SEG_DEFAULT && (base || index) + && !TARGET_TLS_INDIRECT_SEG_REFS) +return 0; + /* Extract the integral value of scale. 
*/ if (scale_rtx) { @@ -12696,7 +12695,9 @@ legitimize_tls_address (rtx x, enum tls_model mode if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS); + base = get_thread_pointer (for_mov +|| !(TARGET_TLS_DIRECT_SEG_REFS + && TARGET_TLS_INDIRECT_SEG_REFS)); off = force_reg (Pmode, off); return gen_rtx_PLUS (Pmode, base, off); } @@ -12716,7 +12717,9 @@ legitimize_tls_address (rtx x, enum tls_model mode if (TARGET_64BIT || TARGET_ANY_GNU_TLS) { - base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS); + base = get_thread_pointer (for_mov +|| !(TARGET_TLS_DIRECT_SEG_REFS + && TARGET_TLS_INDIRECT_SEG_REFS)); return gen_rtx_PLUS (Pmode, base, off); } else @@ -13249,7 +13252,8 @@ ix86_delegitimize_tls_address (rtx orig_x) rtx x = orig_x, unspec; struct ix86_address addr; - if (!TARGET_TLS_DIRECT_SEG_REFS) + if (!(TARGET_TLS_DIRECT_SEG_REFS + && TARGET_TLS_INDIRECT_SEG_REFS)) return orig_x; if (MEM_P (x)) x = XEXP (x, 0); Index: i386.h === --- i386.h (revision 185250) +++ i386.h (working copy) @@ -467,6 +467,9 @@ extern int x86_prefetch_sse; #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0 #endif +/* Address override works only on the (%reg) part in %fs:(%reg). */ +#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode) + /* Fence to use after loop using storent. */ extern tree x86_mfence;
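
The to_reg argument that this patch passes to get_thread_pointer can be modelled as a small boolean function. The sketch below is standalone C, not GCC code: `tp_needs_register` and its parameters are hypothetical names, with `direct_seg_refs` standing for TARGET_TLS_DIRECT_SEG_REFS and `pmode_is_word_mode` for the condition `Pmode == word_mode` that defines TARGET_TLS_INDIRECT_SEG_REFS.

```c
#include <stdbool.h>

/* Return true when the thread pointer must be forced into a register,
   i.e. the value of
     for_mov || !(TARGET_TLS_DIRECT_SEG_REFS
                  && TARGET_TLS_INDIRECT_SEG_REFS)
   from the patch.  All parameter names are assumptions.  */
bool
tp_needs_register (bool for_mov, bool direct_seg_refs,
                   bool pmode_is_word_mode)
{
  /* TARGET_TLS_INDIRECT_SEG_REFS is defined as (Pmode == word_mode).  */
  bool indirect_seg_refs = pmode_is_word_mode;

  return for_mov || !(direct_seg_refs && indirect_seg_refs);
}
```

Read this way, the thread pointer stays symbolic (and can fold into a %fs:(%reg) address) only when both direct and indirect segment references are allowed and the access is not for a move. On x32, where Pmode is SImode and word_mode is DImode, it is always loaded into a register.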
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 11:21 AM, Uros Bizjak wrote: > On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu wrote: > >>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) >>> if Pmode != word_mode. >>> (legitimize_tls_address): Call gen_tls_initial_exec_x32 if >>> Pmode == SImode for x32. >>> >>> * config/i386/i386.md (UNSPEC_TLS_IE_X32): New. >>> (tls_initial_exec_x32): Likewise. >> >> Nice solution! >> >> OK for mainline. > > Done. > >> BTW: Did you investigate the issue with memory aliasing? >> > > It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32 > which loads address of the TLS symbol. > > Thanks. > Since we must use reg64 in %fs:(%reg) memory operand like movq x@gottpoff(%rip),%reg64; mov %fs:(%reg64),%reg this patch optimizes x32 TLS IE load and store by wrapping %reg64 inside of UNSPEC when Pmode == SImode. OK for trunk? >>> >>> I think we should just scrap all these complications and go with the >>> idea of clearing MASK_TLS_DIRECT_SEG_REFS. >>> >> >> I will give it a try. > > You can also revert: > >>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg) >>> if Pmode != word_mode. > > then, since this part is handled later in the function. > Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS when Pmode != word_mode. We need to keep else if (Pmode == SImode) { /* Always generate movl %fs:0, %reg32 addl xgottpoff(%rip), %reg32 to support linker IE->LE optimization and avoid fs:(%reg32) as memory operand. */ dest = gen_reg_rtx (Pmode); emit_insn (gen_tls_initial_exec_x32 (dest, x)); return dest; } to support linker IE->LE optimization. TARGET_TLS_DIRECT_SEG_REFS only affects TLS LE access and fs:(%reg) is only generated by combine. So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable fs:immediate memory operand for TLS LE access, which doesn't have any problems to begin with. I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only fs:(%reg), which is generated by combine. -- H.J. 
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b101922..1ffcc85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11478,6 +11478,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
             case UNSPEC:
               if (XINT (op, 1) == UNSPEC_TP
+                  && Pmode == word_mode
                   && TARGET_TLS_DIRECT_SEG_REFS
                   && seg == SEG_DEFAULT)
                 seg = TARGET_64BIT ? SEG_FS : SEG_GS;
@@ -11534,11 +11535,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;                        /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -12706,7 +12702,9 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || Pmode != word_mode
+                                     || !TARGET_TLS_DIRECT_SEG_REFS);
           return gen_rtx_PLUS (Pmode, base, off);
         }
       else
@@ -13239,7 +13237,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (Pmode != word_mode || !TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);
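For readers following the thread, the two conditions the patch changes can be restated as a small standalone sketch. This is not GCC code: `enum mode`, `struct target` and both function names are hypothetical stand-ins for Pmode, word_mode and TARGET_TLS_DIRECT_SEG_REFS, and the `seg == SEG_DEFAULT` test from ix86_decompose_address is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for GCC's machine modes; not GCC's real types.  */
enum mode { MODE_SI = 32, MODE_DI = 64 };

/* Hypothetical summary of the target state the patch looks at.  */
struct target
{
  enum mode pmode;           /* stands in for Pmode */
  enum mode word_mode;       /* stands in for word_mode */
  bool tls_direct_seg_refs;  /* stands in for TARGET_TLS_DIRECT_SEG_REFS */
};

/* Condition from the first hunk: UNSPEC_TP is turned into a segment
   override only when Pmode == word_mode and direct segment references
   are enabled.  */
static bool
tp_as_segment_override (const struct target *t)
{
  return t->pmode == t->word_mode && t->tls_direct_seg_refs;
}

/* Argument passed to get_thread_pointer in the third hunk: the thread
   pointer is loaded into a register when the access is for a move, when
   Pmode != word_mode, or when direct segment references are disabled.  */
static bool
tp_to_register (const struct target *t, bool for_mov)
{
  return for_mov || t->pmode != t->word_mode || !t->tls_direct_seg_refs;
}

/* Example targets: x32 with Pmode == SImode, and plain x86-64.  */
static const struct target x32_simode = { MODE_SI, MODE_DI, true };
static const struct target x86_64 = { MODE_DI, MODE_DI, true };
```

Under these assumptions, x32 with Pmode == SImode never folds the thread pointer into an fs: override and always loads it into a register, while x86-64 keeps its existing behaviour.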
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu wrote:

>>>>>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>>> if Pmode != word_mode.
>>>>>> (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>>> Pmode == SImode for x32.
>>>>>>
>>>>>> * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>>> (tls_initial_exec_x32): Likewise.
>>>>>
>>>>> Nice solution!
>>>>>
>>>>> OK for mainline.
>>>>
>>>> Done.
>>>>
>>>>> BTW: Did you investigate the issue with memory aliasing?
>>>>>
>>>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>>>> which loads address of the TLS symbol.
>>>>
>>>> Thanks.
>>>
>>> Since we must use reg64 in %fs:(%reg) memory operand like
>>>
>>>     movq x@gottpoff(%rip),%reg64;
>>>     mov %fs:(%reg64),%reg
>>>
>>> this patch optimizes x32 TLS IE load and store by wrapping
>>> %reg64 inside of UNSPEC when Pmode == SImode. OK for
>>> trunk?
>>
>> I think we should just scrap all these complications and go with the
>> idea of clearing MASK_TLS_DIRECT_SEG_REFS.
>>
>
> I will give it a try.

You can also revert:

>> * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>> if Pmode != word_mode.

then, since this part is handled later in the function.

Uros.
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 10:55 AM, Uros Bizjak wrote:
> On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu wrote:
>
>>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>>> by checking
>>>>>>>>>
>>>>>>>>>     movq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>     addq foo@gottpoff(%rip), %reg
>>>>>>>>>
>>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>>> may look like a REX prefix. IE->LE optimization will generate
>>>>>>>>> corrupted binary. This patch makes sure we always output a REX
>>>>>>>>> prefix for UNSPEC_GOTNTPOFF. OK for trunk?
>>>>>>>>
>>>>>>>> Actually, linker has:
>>>>>>>>
>>>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>>>       /* Check transition from IE access model:
>>>>>>>>            mov foo@gottpoff(%rip), %reg
>>>>>>>>            add foo@gottpoff(%rip), %reg
>>>>>>>>        */
>>>>>>>>
>>>>>>>>       /* Check REX prefix first. */
>>>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>>         {
>>>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>>>             {
>>>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>>>>>               if (ABI_64_P (abfd))
>>>>>>>>                 return FALSE;
>>>>>>>>             }
>>>>>>>>         }
>>>>>>>>       else
>>>>>>>>         {
>>>>>>>>           /* X32 may not have any REX prefix. */
>>>>>>>>           if (ABI_64_P (abfd))
>>>>>>>>             return FALSE;
>>>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>>             return FALSE;
>>>>>>>>         }
>>>>>>>>
>>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>>> this is a bug in binutils.
>>>>>>>
>>>>>>> The last byte of the displacement in the previous instruction
>>>>>>> may happen to look like a REX byte. In that case, linker
>>>>>>> will overwrite the last byte of the previous instruction and
>>>>>>> generate the wrong instruction sequence.
>>>>>>>
>>>>>>> I need to update linker to enforce the REX byte check.
>>>>>>
>>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>>> need new TLS specification for X32.
>>>>>
>>>>> Here is a patch to properly generate X32 IE sequence.
>>>>>
>>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>>
>>>>>   x86-64                                        x32
>>>>> GD
>>>>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>>   .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt
>>>>>
>>>>> GD->IE optimization
>>>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>>
>>>>> GD->LE optimization
>>>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>>
>>>>> LD
>>>>>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>>>>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>>>>>
>>>>> LD->LE optimization
>>>>>   .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax
>>>>>
>>>>> IE
>>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>> or                                              Not supported if Pmode == SImode
>>>>>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>>>>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>>
>>>>> IE->LE optimization
>>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>> to
>>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>>>>
>>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>>>> or
>>>>>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>> to
>>>>>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>>>>   movl %fs:(%reg64),%reg32                      movl
Re: PATCH: Properly generate X32 IE sequence
On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu wrote:

>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>> by checking
>>>>>>>>
>>>>>>>>     movq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>>     addq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>>>>>> binary. This patch makes sure we always output a REX prefix for
>>>>>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>>>>>
>>>>>>> Actually, linker has:
>>>>>>>
>>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>>       /* Check transition from IE access model:
>>>>>>>            mov foo@gottpoff(%rip), %reg
>>>>>>>            add foo@gottpoff(%rip), %reg
>>>>>>>        */
>>>>>>>
>>>>>>>       /* Check REX prefix first. */
>>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>         {
>>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>>             {
>>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>>>>               if (ABI_64_P (abfd))
>>>>>>>                 return FALSE;
>>>>>>>             }
>>>>>>>         }
>>>>>>>       else
>>>>>>>         {
>>>>>>>           /* X32 may not have any REX prefix. */
>>>>>>>           if (ABI_64_P (abfd))
>>>>>>>             return FALSE;
>>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>             return FALSE;
>>>>>>>         }
>>>>>>>
>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>> this is a bug in binutils.
>>>>>>>
>>>>>>
>>>>>> The last byte of the displacement in the previous instruction
>>>>>> may happen to look like a REX byte. In that case, linker
>>>>>> will overwrite the last byte of the previous instruction and
>>>>>> generate the wrong instruction sequence.
>>>>>>
>>>>>> I need to update linker to enforce the REX byte check.
>>>>>
>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>> need new TLS specification for X32.
>>>>
>>>> Here is a patch to properly generate X32 IE sequence.
>>>>
>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>
>>>>   x86-64                                        x32
>>>> GD
>>>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>   .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt
>>>>
>>>> GD->IE optimization
>>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>
>>>> GD->LE optimization
>>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>
>>>> LD
>>>>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>>>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>>>>
>>>> LD->LE optimization
>>>>   .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax
>>>>
>>>> IE
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>> or                                              Not supported if Pmode == SImode
>>>>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>>>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>
>>>> IE->LE optimization
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>> to
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>>> or
>>>>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>> to
>>>>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>
>>>> LE
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
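The REX ambiguity discussed in this thread can be made concrete with a small sketch. It only imitates the "byte at offset - 3" test from the quoted R_X86_64_GOTTPOFF code; `rex_w_before_disp` and the three byte arrays are hypothetical names made up for illustration, not binutils code. The look-alike case uses an immediate operand rather than a displacement, but the effect on the check is the same.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the binutils function: report whether the byte
   three positions before a 32-bit displacement at OFFSET is a REX.W
   prefix (0x48 or 0x4c), mirroring the quoted linker check.  */
static bool
rex_w_before_disp (const unsigned char *contents, size_t size, size_t offset)
{
  if (offset < 3 || offset + 4 > size)
    return false;
  unsigned char val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

/* movq foo@gottpoff(%rip), %rax -> 48 8b 05 <disp32>; disp32 at offset 3.  */
static const unsigned char with_rex[] =
  { 0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00 };

/* nop; movl foo@gottpoff(%rip), %eax -> 90 8b 05 <disp32>; disp32 at
   offset 3.  There is no REX prefix and 0x90 does not look like one.  */
static const unsigned char no_rex[] =
  { 0x90, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00 };

/* movl $0x48000000, %eax -> b8 00 00 00 48, followed by
   movl foo@gottpoff(%rip), %eax -> 8b 05 <disp32>; disp32 at offset 7.
   The byte at offset 4 is immediate data, yet it equals 0x48.  */
static const unsigned char lookalike[] =
  { 0xb8, 0x00, 0x00, 0x00, 0x48, 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00 };
```

In the last array the check reports a REX.W prefix even though the 0x48 byte belongs to the previous instruction, which is the situation where an IE->LE rewrite would clobber that instruction.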
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 10, 2012 at 10:49 AM, H.J. Lu wrote:
> On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak wrote:
>> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu wrote:
>>> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak wrote:
>>>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu wrote:
>>>>
>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>> by checking
>>>>>>>
>>>>>>>     movq foo@gottpoff(%rip), %reg
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>     addq foo@gottpoff(%rip), %reg
>>>>>>>
>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>>>>> binary. This patch makes sure we always output a REX prefix for
>>>>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>>>>
>>>>>> Actually, linker has:
>>>>>>
>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>       /* Check transition from IE access model:
>>>>>>            mov foo@gottpoff(%rip), %reg
>>>>>>            add foo@gottpoff(%rip), %reg
>>>>>>        */
>>>>>>
>>>>>>       /* Check REX prefix first. */
>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>         {
>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>             {
>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>>>               if (ABI_64_P (abfd))
>>>>>>                 return FALSE;
>>>>>>             }
>>>>>>         }
>>>>>>       else
>>>>>>         {
>>>>>>           /* X32 may not have any REX prefix. */
>>>>>>           if (ABI_64_P (abfd))
>>>>>>             return FALSE;
>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>             return FALSE;
>>>>>>         }
>>>>>>
>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>> this is a bug in binutils.
>>>>>>
>>>>>
>>>>> The last byte of the displacement in the previous instruction
>>>>> may happen to look like a REX byte. In that case, linker
>>>>> will overwrite the last byte of the previous instruction and
>>>>> generate the wrong instruction sequence.
>>>>>
>>>>> I need to update linker to enforce the REX byte check.
>>>>
>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>> strictly, we have to use existing DImode patterns only. This also
>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>> they explicitly state movq and addq. If this is not the case, then we
>>>> need new TLS specification for X32.
>>>
>>> Here is a patch to properly generate X32 IE sequence.
>>>
>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>
>>>   x86-64                                        x32
>>> GD
>>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>   .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt
>>>
>>> GD->IE optimization
>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>
>>> GD->LE optimization
>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>
>>> LD
>>>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>>>
>>> LD->LE optimization
>>>   .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax
>>>
>>> IE
>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>> or                                              Not supported if Pmode == SImode
>>>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>
>>> IE->LE optimization
>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>> to
>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>>
>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>> or
>>>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>> to
>>>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>
>>> LE
>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>
Re: PATCH: Properly generate X32 IE sequence
On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak wrote:
> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu wrote:
>> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak wrote:
>>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu wrote:
>>>
>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>> by checking
>>>>>>
>>>>>>     movq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> and
>>>>>>
>>>>>>     addq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>>>> binary. This patch makes sure we always output a REX prefix for
>>>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>>>
>>>>> Actually, linker has:
>>>>>
>>>>>     case R_X86_64_GOTTPOFF:
>>>>>       /* Check transition from IE access model:
>>>>>            mov foo@gottpoff(%rip), %reg
>>>>>            add foo@gottpoff(%rip), %reg
>>>>>        */
>>>>>
>>>>>       /* Check REX prefix first. */
>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>         {
>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>             {
>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>>               if (ABI_64_P (abfd))
>>>>>                 return FALSE;
>>>>>             }
>>>>>         }
>>>>>       else
>>>>>         {
>>>>>           /* X32 may not have any REX prefix. */
>>>>>           if (ABI_64_P (abfd))
>>>>>             return FALSE;
>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>             return FALSE;
>>>>>         }
>>>>>
>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>> this is a bug in binutils.
>>>>>
>>>> The last byte of the displacement in the previous instruction
>>>> may happen to look like a REX byte. In that case, linker
>>>> will overwrite the last byte of the previous instruction and
>>>> generate the wrong instruction sequence.
>>>>
>>>> I need to update linker to enforce the REX byte check.
>>>
>>> One important observation: if we want to follow the x86_64 TLS spec
>>> strictly, we have to use existing DImode patterns only. This also
>>> means that we should NOT convert other TLS patterns to Pmode, since
>>> they explicitly state movq and addq. If this is not the case, then we
>>> need new TLS specification for X32.
>>
>> Here is a patch to properly generate X32 IE sequence.
>>
>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>
>>   x86-64                                        x32
>> GD
>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>   .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt
>>
>> GD->IE optimization
>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>
>> GD->LE optimization
>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>
>> LD
>>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>>
>> LD->LE optimization
>>   .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax
>>
>> IE
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>> or                                              Not supported if Pmode == SImode
>>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>
>> IE->LE optimization
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>> to
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>> or
>>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>> to
>>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>
>> LE
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
>> or
>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>   addq $x@tpoff,%reg64
Re: PATCH: Properly generate X32 IE sequence
On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu wrote:
> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak wrote:
>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu wrote:
>>
>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>> by checking
>>>>>
>>>>>     movq foo@gottpoff(%rip), %reg
>>>>>
>>>>> and
>>>>>
>>>>>     addq foo@gottpoff(%rip), %reg
>>>>>
>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>> the last byte of the previous instruction may be an offset, which
>>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>>> binary. This patch makes sure we always output a REX prefix for
>>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>>
>>>> Actually, linker has:
>>>>
>>>>     case R_X86_64_GOTTPOFF:
>>>>       /* Check transition from IE access model:
>>>>            mov foo@gottpoff(%rip), %reg
>>>>            add foo@gottpoff(%rip), %reg
>>>>        */
>>>>
>>>>       /* Check REX prefix first. */
>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>         {
>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>           if (val != 0x48 && val != 0x4c)
>>>>             {
>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>               if (ABI_64_P (abfd))
>>>>                 return FALSE;
>>>>             }
>>>>         }
>>>>       else
>>>>         {
>>>>           /* X32 may not have any REX prefix. */
>>>>           if (ABI_64_P (abfd))
>>>>             return FALSE;
>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>             return FALSE;
>>>>         }
>>>>
>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>> this is a bug in binutils.
>>>
>>> The last byte of the displacement in the previous instruction
>>> may happen to look like a REX byte. In that case, linker
>>> will overwrite the last byte of the previous instruction and
>>> generate the wrong instruction sequence.
>>>
>>> I need to update linker to enforce the REX byte check.
>>
>> One important observation: if we want to follow the x86_64 TLS spec
>> strictly, we have to use existing DImode patterns only. This also
>> means that we should NOT convert other TLS patterns to Pmode, since
>> they explicitly state movq and addq. If this is not the case, then we
>> need new TLS specification for X32.
>
> Here is a patch to properly generate X32 IE sequence.
>
> This is the summary of differences between x86-64 TLS and x32 TLS:
>
>   x86-64                                        x32
> GD
>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>   .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt
>
> GD->IE optimization
>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>
> GD->LE optimization
>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>
> LD
>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>
> LD->LE optimization
>   .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax
>
> IE
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
> or                                              Not supported if Pmode == SImode
>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>
> IE->LE optimization
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
> to
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
> or
>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
> to
>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>
> LE
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
> or
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   addq $x@tpoff,%reg64                          addl $x@tpoff,%reg32
> or
>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>   movl x@tpoff(%reg64),%reg32                   movl x@tpoff(%reg32)
PATCH: Properly generate X32 IE sequence
On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak wrote:
> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu wrote:
>
>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>> by checking
>>>>
>>>>     movq foo@gottpoff(%rip), %reg
>>>>
>>>> and
>>>>
>>>>     addq foo@gottpoff(%rip), %reg
>>>>
>>>> It uses the REX prefix to avoid the last byte of the previous
>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>> the last byte of the previous instruction may be an offset, which
>>>> may look like a REX prefix. IE->LE optimization will generate corrupted
>>>> binary. This patch makes sure we always output a REX prefix for
>>>> UNSPEC_GOTNTPOFF. OK for trunk?
>>>
>>> Actually, linker has:
>>>
>>>     case R_X86_64_GOTTPOFF:
>>>       /* Check transition from IE access model:
>>>            mov foo@gottpoff(%rip), %reg
>>>            add foo@gottpoff(%rip), %reg
>>>        */
>>>
>>>       /* Check REX prefix first. */
>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>         {
>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>           if (val != 0x48 && val != 0x4c)
>>>             {
>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>               if (ABI_64_P (abfd))
>>>                 return FALSE;
>>>             }
>>>         }
>>>       else
>>>         {
>>>           /* X32 may not have any REX prefix. */
>>>           if (ABI_64_P (abfd))
>>>             return FALSE;
>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>             return FALSE;
>>>         }
>>>
>>> So, it should handle the case without REX just OK. If it doesn't, then
>>> this is a bug in binutils.
>>>
>>
>> The last byte of the displacement in the previous instruction
>> may happen to look like a REX byte. In that case, linker
>> will overwrite the last byte of the previous instruction and
>> generate the wrong instruction sequence.
>>
>> I need to update linker to enforce the REX byte check.
>
> One important observation: if we want to follow the x86_64 TLS spec
> strictly, we have to use existing DImode patterns only. This also
> means that we should NOT convert other TLS patterns to Pmode, since
> they explicitly state movq and addq. If this is not the case, then we
> need new TLS specification for X32.

Here is a patch to properly generate X32 IE sequence.

This is the summary of differences between x86-64 TLS and x32 TLS:

  x86-64                                        x32
GD
  .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
  .word 0x; rex64; call __tls_get_addr@plt      .word 0x; rex64; call __tls_get_addr@plt

GD->IE optimization
  movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD->LE optimization
  movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                       call __tls_get_addr@plt

LD->LE optimization
  .word 0x; .byte 0x66; movq %fs:0, %rax        nopl 0x0(%rax); movl %fs:0, %eax

IE
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
or                                              Not supported if Pmode == SImode
  movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
  movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32

IE->LE optimization
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
to
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32

  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
or
  movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
  movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
to
  movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
  movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32

LE
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
or
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  addq $x@tpoff,%reg64                          addl $x@tpoff,%reg32
or
  movq %fs:0,%reg64;                            movl %fs:0,%reg32;
  movl x@tpoff(%reg64),%reg32                   movl x@tpoff(%reg32),%reg32
or
  movl %fs:x@tpoff,%reg32                       movl %fs:x@tpoff,%reg32

X32 TLS implementation is straightforward, except for IE:

1. Since address override works only on the (reg32)