Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Eric Botcazou
 The patch is bootstrapping now on x86_64-pc-linux-gnu.

It very likely breaks bootstrap with RTL checking enabled:

/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc 
-B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ 
-B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ -isystem 
/usr/gnat/i686-pc-linux-gnu/include -isystem 
/usr/gnat/i686-pc-linux-gnu/sys-include-g -O2 -O2  -g -O2 -DIN_GCC   -W 
-Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes 
-Wold-style-definition  -isystem ./include   -fpic -g -DIN_LIBGCC2 
-fbuilding-libgcc -fno-stack-protector   -fpic -I. -I. -I../.././gcc 
-I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc 
-I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid 
-DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o 
_popcountsi2.o -MT _popcountsi2.o -MD -MP -MF 
_popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c 
-fvisibility=hidden -DHIDE_EXPORTS
../../../src/libgcc/libgcc2.c: In function '__popcountsi2':
../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: 
expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, 
at config/i386/i386.c:11522
Please submit a full bug report,
with preprocessed source if appropriate.
See URL:mailto:rep...@adacore.com for instructions.
make[3]: *** [_popcountsi2.o] Error 1

-- 
Eric Botcazou


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Jakub Jelinek
On Tue, Mar 20, 2012 at 09:51:07AM +0100, Eric Botcazou wrote:
  The patch is bootstrapping now on x86_64-pc-linux-gnu.
 
 It very likely breaks bootstrap with RTL checking enabled:
 
 /sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/xgcc 
 -B/sil.a/gnatmail/gnatmail-x/build-sil/x86-linux/gnat/obj/./gcc/ 
 -B/usr/gnat/i686-pc-linux-gnu/bin/ -B/usr/gnat/i686-pc-linux-gnu/lib/ 
 -isystem /usr/gnat/i686-pc-linux-gnu/include -isystem 
 /usr/gnat/i686-pc-linux-gnu/sys-include-g -O2 -O2  -g -O2 -DIN_GCC   -W 
 -Wall -Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes 
 -Wold-style-definition  -isystem ./include   -fpic -g -DIN_LIBGCC2 
 -fbuilding-libgcc -fno-stack-protector   -fpic -I. -I. -I../.././gcc 
 -I../../../src/libgcc -I../../../src/libgcc/. -I../../../src/libgcc/../gcc 
 -I../../../src/libgcc/../include -I../../../src/libgcc/config/libbid 
 -DENABLE_DECIMAL_BID_FORMAT -DHAVE_CC_TLS  -DUSE_TLS -o 
 _popcountsi2.o -MT _popcountsi2.o -MD -MP -MF 
 _popcountsi2.dep -DL_popcountsi2 -c ../../../src/libgcc/libgcc2.c 
 -fvisibility=hidden -DHIDE_EXPORTS
 ../../../src/libgcc/libgcc2.c: In function '__popcountsi2':
 ../../../src/libgcc/libgcc2.c:835:1: internal compiler error: RTL check: 
 expected elt 1 type 'i' or 'n', have '0' (rtx mem) in ix86_decompose_address, 
 at config/i386/i386.c:11522
 Please submit a full bug report,
 with preprocessed source if appropriate.
 See URL:mailto:rep...@adacore.com for instructions.
 make[3]: *** [_popcountsi2.o] Error 1

Yeah, my bootstrap just failed the same.  Will test:

2012-03-20  Jakub Jelinek  ja...@redhat.com

* config/i386/i386.c (ix86_decompose_address) case ZERO_EXTEND:
If operand isn't UNSPEC, return 0.

--- gcc/config/i386/i386.c.jj   2012-03-20 09:35:06.0 +0100
+++ gcc/config/i386/i386.c  2012-03-20 09:56:35.038835835 +0100
@@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
 
           case ZERO_EXTEND:
             op = XEXP (op, 0);
+            if (GET_CODE (op) != UNSPEC)
+              return 0;
             /* FALLTHRU */
 
           case UNSPEC:

Jakub
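
For readers unfamiliar with RTL checking: the ICE above presumably comes from reading the integer field of an UNSPEC on an rtx that is really a MEM. The following is only a toy model of that guard, not GCC code; the struct layout and every toy_* name are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a few RTL codes; this is not the real rtx layout.  */
enum toy_code { TOY_REG, TOY_MEM, TOY_UNSPEC, TOY_ZERO_EXTEND };

struct toy_rtx
{
  enum toy_code code;
  struct toy_rtx *op0;	/* Operand 0, used by TOY_ZERO_EXTEND.  */
  int unspec_num;	/* Meaningful only when code == TOY_UNSPEC.  */
};

#define TOY_UNSPEC_TP 1

/* Return 1 if OP is a thread-pointer unspec, optionally wrapped in a
   zero extension, and 0 otherwise.  The code check has to come before
   any read of unspec_num, which is what the added "return 0" ensures.  */
int
toy_is_tp_reference (const struct toy_rtx *op)
{
  assert (op != NULL);

  if (op->code == TOY_ZERO_EXTEND)
    {
      op = op->op0;
      if (op == NULL || op->code != TOY_UNSPEC)
	return 0;
    }

  if (op->code != TOY_UNSPEC)
    return 0;

  return op->unspec_num == TOY_UNSPEC_TP;
}
```

A zero-extended MEM is rejected before its (nonexistent) unspec number is looked at, while a zero-extended thread-pointer unspec is still accepted.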


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Paolo Bonzini
Il 19/03/2012 20:13, Uros Bizjak ha scritto:
 2012-03-19  Uros Bizjak  ubiz...@gmail.com
 
   * config/i386/i386.c (get_thread_pointer): Add tp_mode argument.
   Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode.
   (legitimize_tls_address) TLS_MODEL_INITIAL_EXEC: Always generate
   DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT.
   (ix86_decompose_address): Allow zero extended UNSPEC_TP references.
 
   Revert:
   2012-03-13  Uros Bizjak  ubiz...@gmail.com
 
   * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
   * config/i386/i386.c (ix86_decompose_address): Use
   TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
   (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
   thread pointer to a register.
 
   Revert:
   2012-03-10  H.J. Lu  hongjiu...@intel.com
 
   * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
   if Pmode != word_mode.
   (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
   Pmode == SImode for TARGET_X32.
 
   * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
   (tls_initial_exec_x32): Likewise.
 
 Tested on x86_64-pc-linux-gnu {,-m32}.

No testcases?

Paolo



Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Jakub Jelinek
On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
 Yeah, my bootstrap just failed the same.  Will test:
 
 2012-03-20  Jakub Jelinek  ja...@redhat.com
 
   * config/i386/i386.c (ix86_decompose_address) case ZERO_EXTEND:
   If operand isn't UNSPEC, return 0.

Committed as obvious now that bootstrap/regtest finished on x86_64-linux
and i686-linux.

 --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
 +++ gcc/config/i386/i386.c2012-03-20 09:56:35.038835835 +0100
 @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct
  
   case ZERO_EXTEND:
 op = XEXP (op, 0);
 +   if (GET_CODE (op) != UNSPEC)
 + return 0;
 /* FALLTHRU */
  
   case UNSPEC:

Jakub


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread H.J. Lu
On Tue, Mar 20, 2012 at 4:19 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Tue, Mar 20, 2012 at 09:58:29AM +0100, Jakub Jelinek wrote:
 Yeah, my bootstrap just failed the same.  Will test:

 2012-03-20  Jakub Jelinek  ja...@redhat.com

       * config/i386/i386.c (ix86_decompose_address) case ZERO_EXTEND:
       If operand isn't UNSPEC, return 0.

 Committed as obvious now that bootstrap/regtest finished on x86_64-linux
 and i686-linux.

 --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
 +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
 @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct

           case ZERO_EXTEND:
             op = XEXP (op, 0);
 +           if (GET_CODE (op) != UNSPEC)
 +             return 0;
             /* FALLTHRU */

           case UNSPEC:


Uros,

I think using the OS-provided instruction to load TP into a DImode
register could simplify the code.


-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Uros Bizjak
On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Yeah, my bootstrap just failed the same.  Will test:

 2012-03-20  Jakub Jelinek  ja...@redhat.com

       * config/i386/i386.c (ix86_decompose_address) case ZERO_EXTEND:
       If operand isn't UNSPEC, return 0.

 Committed as obvious now that bootstrap/regtest finished on x86_64-linux
 and i686-linux.

 --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
 +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
 @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct

           case ZERO_EXTEND:
             op = XEXP (op, 0);
 +           if (GET_CODE (op) != UNSPEC)
 +             return 0;
             /* FALLTHRU */

           case UNSPEC:


 Uros,

 I think use the OS provided instruction to load TP into DImode register
 could simplify the code.

Which OS provided instruction?

Please see how TP is defined in get_thread_pointer, it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

This says that TP is in SImode on X32.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread H.J. Lu
On Tue, Mar 20, 2012 at 10:54 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Mar 20, 2012 at 4:52 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Yeah, my bootstrap just failed the same.  Will test:

 2012-03-20  Jakub Jelinek  ja...@redhat.com

       * config/i386/i386.c (ix86_decompose_address) case ZERO_EXTEND:
       If operand isn't UNSPEC, return 0.

 Committed as obvious now that bootstrap/regtest finished on x86_64-linux
 and i686-linux.

 --- gcc/config/i386/i386.c.jj 2012-03-20 09:35:06.0 +0100
 +++ gcc/config/i386/i386.c    2012-03-20 09:56:35.038835835 +0100
 @@ -11516,6 +11516,8 @@ ix86_decompose_address (rtx addr, struct

           case ZERO_EXTEND:
             op = XEXP (op, 0);
 +           if (GET_CODE (op) != UNSPEC)
 +             return 0;
             /* FALLTHRU */

           case UNSPEC:


 Uros,

 I think use the OS provided instruction to load TP into DImode register
 could simplify the code.

 Which OS provided instruction?

 Please see how TP is defined in get_thread_pointer, it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

 This says that TP is in SImode on X32.

 Uros.

TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
and provided by the OS.  It is a CONST_INT, but its value is opaque
to GCC.  MODE here has no impact on the value provided by the OS.
The X32 OS provides instructions to load TP into SImode and
DImode registers.


-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread Uros Bizjak
On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu hjl.to...@gmail.com wrote:

 I think use the OS provided instruction to load TP into DImode register
 could simplify the code.

 Which OS provided instruction?

 Please see how TP is defined in get_thread_pointer, it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

 This says that TP is in SImode on X32.

 TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
 and provided by the OS.  It is a CONST_INT, but its value is opaque
 to GCC.  MODE here has no impact on the value provided by the OS.
 The X32 OS provides instructions to load TP into SImode and
 DImode registers.

You must be looking at different GCC sources than I am.

(define_insn "*load_tp_x32"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (unspec:SI [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

(define_insn "*load_tp_x32_zext"
  [(set (match_operand:DI 0 "register_operand" "=r")
        (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-20 Thread H.J. Lu
On Tue, Mar 20, 2012 at 11:43 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Mar 20, 2012 at 7:27 PM, H.J. Lu hjl.to...@gmail.com wrote:

 I think use the OS provided instruction to load TP into DImode register
 could simplify the code.

 Which OS provided instruction?

 Please see how TP is defined in get_thread_pointer, it is in ptr_mode:

  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

 This says that TP is in SImode on X32.

 TP is defined as (unspec:DI [(const_int 0)] UNSPEC_TP)
 and provided by the OS.  It is a CONST_INT, but its value is opaque
 to GCC.  MODE here has no impact on the value provided by the OS.
 The X32 OS provides instructions to load TP into SImode and
 DImode registers.

 You must be looking to some other GCC sources than me.

 (define_insn *load_tp_x32
  [(set (match_operand:SI 0 register_operand =r)
        (unspec:SI [(const_int 0)] UNSPEC_TP))]
  TARGET_X32
  mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}
  [(set_attr type imov)
   (set_attr modrm 0)
   (set_attr length 7)
   (set_attr memory load)
   (set_attr imm_disp false)])

 (define_insn *load_tp_x32_zext
  [(set (match_operand:DI 0 register_operand =r)
        (zero_extend:DI (unspec:SI [(const_int 0)] UNSPEC_TP)))]
  TARGET_X32
  mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}
  [(set_attr type imov)
   (set_attr modrm 0)
   (set_attr length 7)
   (set_attr memory load)
   (set_attr imm_disp false)])


Thread pointer (TP) points to thread control block (TCB).  X32 TCB is

typedef struct
{
  void *tcb;    /* Pointer to the TCB.  Not necessarily the
                   thread descriptor used by libpthread.  */
  ...
}

It is a 32bit address set up by OS.  That is where 0 in %fs:0 comes
from since it is the first field of the struct %fs points to.  X32 OS provides

mov %fs:0, %eax

to load the address of TCB into EAX and

mov %fs:0, %eax

to load the address of TCB into RAX since OS guarantees that the upper
32bits of the address of TCB are all 0s. We added *load_tp_x32_zext
since we zero-extend SI TP to DI TP.   Or we can use

mov %fs:0, %eax

to directly load the value of the tcb field into RAX and remove
*load_tp_x32_zext.  It will simplify the code.


-- 
H.J.
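
The argument above rests on an x86-64 property: a 32-bit load into EAX clears the upper 32 bits of RAX, so the 64-bit register holds the zero-extended value. Here is a small C model of that behaviour, as a sketch only; the function names are invented for illustration and nothing here touches %fs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of a 32-bit load whose result is viewed as a 64-bit register:
   the upper half is zero, as with "movl mem, %eax" on x86-64.  */
uint64_t
toy_load32_zext (const uint32_t *slot)
{
  assert (slot != NULL);
  return (uint64_t) *slot;
}

/* For contrast, a sign-extending load of the same 32 bits, which is
   NOT what a plain 32-bit mov does.  */
int64_t
toy_load32_sext (const uint32_t *slot)
{
  assert (slot != NULL);
  return (int64_t) (int32_t) *slot;
}
```

For a slot holding 0xfffff000, the zero-extending model yields 0x00000000fffff000, whereas the sign-extending one yields a negative 64-bit value; only the former matches the "upper 32 bits are all 0s" guarantee mentioned above.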


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread H.J. Lu
On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I am testing this patch.  OK for trunk if it passes all tests?

 No, force_reg will generate a pseudo, so this conversion is valid only
 for !can_create_pseudo ().

 At least for *tls_initial_exec_x32_store, you will need a temporary to
 split the pattern after reload.

Here is the updated patch to add can_create_pseudo.  I also changed
tls_initial_exec_x32 to take an input register operand as thread pointer.

 Please try attached patch. It simply throws away all recent
 complications w.r.t. to thread pointer and always handles TP in
 DImode.

 The testcase:

 --cut here--
 __thread int foo __attribute__ ((tls_model (initial-exec)));

 void bar (int x)
 {
  foo = x;
 }

 int baz (void)
 {
  return foo;
 }
 --cut here--

 Now compiles to:

 bar:
        movq    foo@gottpoff(%rip), %rax
        movl    %edi, %fs:(%rax)
        ret

 baz:
        movq    foo@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax
        ret

 In effect, this always generates %fs(%rDI) and emits REX prefix before
 mov/add to satisfy brain-dead linkers.

 The patch is bootstrapping now on x86_64-pc-linux-gnu.


For

--
extern __thread char c;
extern char y;
void
ie (void)
{
  y = c;
}
--

Your patch generates:

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        ret

It can be optimized to:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        ret

H.J.
2012-03-19  H.J. Lu  hongjiu...@intel.com

	* config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New.

	* config/i386/i386.c (legitimize_tls_address): Also pass thread
	pointer to gen_tls_initial_exec_x32.
	(ix86_split_tls_initial_exec_x32): New.

	* config/i386/i386.md (*load_tp_x32): Renamed to ...
	(*load_tp_x32_mode): This. Replace SI with SWI48x.
	(tls_initial_exec_x32): Add an input register operand as thread
	pointer.  Generate a REX prefix if needed.
	(*tls_initial_exec_x32_load): New.
	(*tls_initial_exec_x32_store): Likewise.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 630112f..528eeaa 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -142,6 +142,7 @@ extern void ix86_split_lshr (rtx *, rtx, enum machine_mode);
 extern rtx ix86_find_base_term (rtx);
 extern bool ix86_check_movabs (rtx, int);
 extern void ix86_split_idivmod (enum machine_mode, rtx[], bool);
+extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool);
 
 extern rtx assign_386_stack_local (enum machine_mode, enum ix86_stack_slot);
 extern int ix86_attr_length_immediate_default (rtx, bool);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 78a366e..fb802ee 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12671,13 +12671,14 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	}
 	  else if (Pmode == SImode)
 	{
-	  /* Always generate
-			movl %fs:0, %reg32
+	  /* Always generate a REX prefix for
 			addl xgottpoff(%rip), %reg32
-		 to support linker IE-LE optimization and avoid
-		 fs:(%reg32) as memory operand.  */
+		 to support linker IE-LE optimization.  */
 	  dest = gen_reg_rtx (Pmode);
-	  emit_insn (gen_tls_initial_exec_x32 (dest, x));
+	  base = get_thread_pointer (for_mov
+	 || !(TARGET_TLS_DIRECT_SEG_REFS
+	   && TARGET_TLS_INDIRECT_SEG_REFS));
+	  emit_insn (gen_tls_initial_exec_x32 (dest, base, x));
 	  return dest;
 	}
 
@@ -12754,6 +12755,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
   return dest;
 }
 
+/* Split x32 TLS IE access in MODE.  Split load if LOAD is TRUE,
+   otherwise split store.  */
+
+void
+ix86_split_tls_initial_exec_x32 (rtx operands[],
+ enum machine_mode mode, bool load)
+{
+  rtx base, mem;
+  rtx off = load ? operands[1] : operands[0];
+  off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF);
+  off = gen_rtx_CONST (DImode, off);
+  off = gen_const_mem (DImode, off);
+  set_mem_alias_set (off, ix86_GOT_alias_set ());
+  base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off));
+  mem = gen_rtx_MEM (mode, off);
+  if (load)
+emit_move_insn (operands[0], mem);
+  else
+emit_move_insn (mem, operands[1]);
+}
+
 /* Create or return the unique __imp_DECL dllimport symbol corresponding
to symbol DECL.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eae26ae..1643792 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12747,11 +12747,11 @@
 (define_mode_attr tp_seg [(SI gs) (DI fs)])
 
 ;; Load and add the thread base pointer from %tp_seg:0.
-(define_insn *load_tp_x32
-  [(set 

Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread H.J. Lu
On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I am testing this patch.  OK for trunk if it passes all tests?

 No, force_reg will generate a pseudo, so this conversion is valid only
 for !can_create_pseudo ().

 At least for *tls_initial_exec_x32_store, you will need a temporary to
 split the pattern after reload.

 Here is the updated patch to add can_create_pseudo.  I also changed
 tls_initial_exec_x32 to take an input register operand as thread pointer.

 Please try attached patch. It simply throws away all recent
 complications w.r.t. to thread pointer and always handles TP in
 DImode.

 The testcase:

 --cut here--
 __thread int foo __attribute__ ((tls_model (initial-exec)));

 void bar (int x)
 {
  foo = x;
 }

 int baz (void)
 {
  return foo;
 }
 --cut here--

 Now compiles to:

 bar:
        movq    foo@gottpoff(%rip), %rax
        movl    %edi, %fs:(%rax)
        ret

 baz:
        movq    foo@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax
        ret

 In effect, this always generates %fs(%rDI) and emits REX prefix before
 mov/add to satisfy brain-dead linkers.

 The patch is bootstrapping now on x86_64-pc-linux-gnu.


 For

 --
 extern __thread char c;
 extern char y;
 void
 ie (void)
 {
  y = c;
 }
 --

 Your patch generates:

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        ret

 It can be optimized to:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        ret


 Combine failed:

 (set (reg:QI 63 [ c ])
    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_TP))
            (mem/u/c:DI (const:DI (unspec:DI [
                            (symbol_ref:SI (c) [flags 0x60]
 var_decl 0x719b8140 c)
                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))



Wrong testcase.  It should be

--
extern __thread char c;
extern __thread short w;
extern char y;
extern short i;
void
ie (void)
{
  y = c;
  i = w;
}
---

I got

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        movq    w@gottpoff(%rip), %rdx
        movzwl  (%rax,%rdx), %eax
        movw    %ax, i(%rip)
        ret

It can be

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret



-- 
H.J.
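
For anyone who wants to build the two-variable testcase on its own, here is a self-contained variant. It is a sketch under assumptions: the definitions, the initial values and the explicit tls_model attribute are added here so that the file links by itself and uses the initial-exec model; the testcase above only declares the variables extern. The __thread keyword and the attribute are GCC extensions.

```c
#include <assert.h>

/* Definitions added for illustration; the values are arbitrary.  */
__thread char c __attribute__ ((tls_model ("initial-exec"))) = 'x';
__thread short w __attribute__ ((tls_model ("initial-exec"))) = 1234;
char y;
short i;

/* Same body as the testcase: two TLS loads in one function, which is
   what lets the thread pointer load be shared between them.  */
void
ie (void)
{
  y = c;
  i = w;
}

/* Hypothetical helper: run the copy and check both results.  */
void
check_ie (void)
{
  ie ();
  assert (y == 'x');
  assert (i == 1234);
}
```

Compiling this with -O2 -S for the x32 ABI should show whether the generated code loads %fs:0 into a register or uses %fs: directly as a segment override.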


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread H.J. Lu
On Mon, Mar 19, 2012 at 9:19 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 8:54 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 8:51 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 1:55 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I am testing this patch.  OK for trunk if it passes all tests?

 No, force_reg will generate a pseudo, so this conversion is valid only
 for !can_create_pseudo ().

 At least for *tls_initial_exec_x32_store, you will need a temporary to
 split the pattern after reload.

 Here is the updated patch to add can_create_pseudo.  I also changed
 tls_initial_exec_x32 to take an input register operand as thread pointer.

 Please try attached patch. It simply throws away all recent
 complications w.r.t. to thread pointer and always handles TP in
 DImode.

 The testcase:

 --cut here--
 __thread int foo __attribute__ ((tls_model (initial-exec)));

 void bar (int x)
 {
  foo = x;
 }

 int baz (void)
 {
  return foo;
 }
 --cut here--

 Now compiles to:

 bar:
        movq    foo@gottpoff(%rip), %rax
        movl    %edi, %fs:(%rax)
        ret

 baz:
        movq    foo@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax
        ret

 In effect, this always generates %fs(%rDI) and emits REX prefix before
 mov/add to satisfy brain-dead linkers.

 The patch is bootstrapping now on x86_64-pc-linux-gnu.


 For

 --
 extern __thread char c;
 extern char y;
 void
 ie (void)
 {
  y = c;
 }
 --

 Your patch generates:

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        ret

 It can be optimized to:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        ret


 Combine failed:

 (set (reg:QI 63 [ c ])
    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_TP))
            (mem/u/c:DI (const:DI (unspec:DI [
                            (symbol_ref:SI (c) [flags 0x60]
 var_decl 0x719b8140 c)
                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))



 Wrong testcase.  IT should be

 --
 extern __thread char c;
 extern __thread short w;
 extern char y;
 extern short i;
 void
 ie (void)
 {
  y = c;
  i = w;
 }
 ---

 I got

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        movq    w@gottpoff(%rip), %rdx
        movzwl  (%rax,%rdx), %eax
        movw    %ax, i(%rip)
        ret

 It can be

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret



How about this patch?  I changed the x32 TP load to

(define_insn "*load_tp_x32_<mode>"
  [(set (match_operand:SWI48x 0 "register_operand" "=r")
        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
  "TARGET_X32"
  "mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}"
  [(set_attr "type" "imov")
   (set_attr "modrm" "0")
   (set_attr "length" "7")
   (set_attr "memory" "load")
   (set_attr "imm_disp" "false")])

and removed *load_tp_x32_zext.


-- 
H.J.
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9aa5ee7..66221e4 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12483,15 +12483,12 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
-  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
-
-  if (GET_MODE (tp) != Pmode)
-tp = convert_to_mode (Pmode, tp, 1);
+  rtx tp = gen_rtx_UNSPEC (tp_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
   if (to_reg)
-tp = copy_addr_to_reg (tp);
+tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12543,6 +12540,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12568,7 +12566,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  else
 	emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL, x);
@@ -12618,7 +12616,7 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 	  else
 	emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic));
 
-	  tp = get_thread_pointer (true);
+	  tp = get_thread_pointer (Pmode, true);
 	  set_unique_reg_note (get_last_insn (), REG_EQUAL,
 			   gen_rtx_MINUS (Pmode, tmp, tp));
 	}
@@ -12664,27 +12662,18 @@ 

Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 5:34 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Combine failed:

 (set (reg:QI 63 [ c ])
    (mem/c:QI (plus:DI (zero_extend:DI (unspec:SI [
                        (const_int 0 [0])
                    ] UNSPEC_TP))
            (mem/u/c:DI (const:DI (unspec:DI [
                            (symbol_ref:SI (c) [flags 0x60]
 var_decl 0x719b8140 c)
                        ] UNSPEC_GOTNTPOFF)) [2 S8 A8])) [0 c+0 S1 A8]))



 Wrong testcase.  IT should be

 --
 extern __thread char c;
 extern __thread short w;
 extern char y;
 extern short i;
 void
 ie (void)
 {
  y = c;
  i = w;
 }
 ---

 I got

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        movq    w@gottpoff(%rip), %rdx
        movzwl  (%rax,%rdx), %eax
        movw    %ax, i(%rip)
        ret

 It can be

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret



 How about this patch?  I changed 32 TP load to

 (define_insn *load_tp_x32_mode
  [(set (match_operand:SWI48x 0 register_operand =r)
        (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
  TARGET_X32
  mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}
  [(set_attr type imov)
   (set_attr modrm 0)
   (set_attr length 7)
   (set_attr memory load)
   (set_attr imm_disp false)])

 and removed *load_tp_x32_zext.

No, your whole approach with splitters is wrong.

@@ -12747,11 +12747,11 @@
 (define_mode_attr tp_seg [(SI gs) (DI fs)])

 ;; Load and add the thread base pointer from %tp_seg:0.
-(define_insn *load_tp_x32
-  [(set (match_operand:SI 0 register_operand =r)
-   (unspec:SI [(const_int 0)] UNSPEC_TP))]
+(define_insn *load_tp_x32_mode
+  [(set (match_operand:SWI48x 0 register_operand =r)
+   (unspec:SWI48x [(const_int 0)] UNSPEC_TP))]
   TARGET_X32
-  mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}
+  mov{l}\t{%%fs:0, %k0|%k0, DWORD PTR fs:0}

The result is a zero-extended SImode register, not a fake SImode register in DImode.

But as I said, you should generate the correct sequence from the beginning.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 5:19 PM, H.J. Lu hjl.to...@gmail.com wrote:

        movl    %fs:0, %eax
        movq    c@gottpoff(%rip), %rdx
        movzbl  (%rax,%rdx), %edx
        movb    %dl, y(%rip)
        movq    w@gottpoff(%rip), %rdx
        movzwl  (%rax,%rdx), %eax
        movw    %ax, i(%rip)
        ret

 It can be

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret

This is just CSE in action. It CSEd movl %fs:0, %eax, since the value has
to be zero-extended before it goes into the address.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu hjl.to...@gmail.com wrote:

 For x32,  thread pointer is an unsigned 32bit value.

 movl %fs:0, %eax

 is the correct instruction to load thread pointer into EAX and RAX.

So, where is ZERO_EXTEND RTX then?

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread H.J. Lu
On Mon, Mar 19, 2012 at 9:49 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 5:47 PM, H.J. Lu hjl.to...@gmail.com wrote:

 For x32,  thread pointer is an unsigned 32bit value.

 movl %fs:0, %eax

 is the correct instruction to load thread pointer into EAX and RAX.

 So, where is ZERO_EXTEND RTX then?


Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
when there is a single instruction to load TP into a DImode register.

-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 5:55 PM, H.J. Lu hjl.to...@gmail.com wrote:

 For x32,  thread pointer is an unsigned 32bit value.

 movl %fs:0, %eax

 is the correct instruction to load thread pointer into EAX and RAX.

 So, where is ZERO_EXTEND RTX then?


 Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
 TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
 when there is a single instruction to load TP into a DImode register.

I don't agree with this explanation. The mode can't be both SImode and
DImode. TP is either SImode or ZERO_EXTENDed to DImode; this is the
reason we went for all that TARGET_X32 stuff in the TP load RTX.

Please test my proposed patch. If it works OK, I will commit it to SVN.

Thanks,
Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak ubiz...@gmail.com wrote:
 For x32,  thread pointer is an unsigned 32bit value.

 movl %fs:0, %eax

 is the correct instruction to load thread pointer into EAX and RAX.

 So, where is ZERO_EXTEND RTX then?


 Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
 TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
 when there is a single instruction to load TP into a DImode register.

 I don't agree with this explanation. The mode can't be SImode and
 DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the
 reason we went for all that TARGET_X32 stuff in TP load RTX.

 Please test my proposed patch. If it works OK, I will commit it to SVN.

The only acceptable way is to generate ZERO_EXTEND in place, so:

--cut here--
static rtx
get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
{
  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

  if (GET_MODE (tp) != tp_mode)
    {
      gcc_assert (GET_MODE (tp) == SImode);
      gcc_assert (tp_mode == DImode);

      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
    }

  if (to_reg)
    tp = copy_to_mode_reg (tp_mode, tp);

  return tp;
}
--cut here--

This will generate:

        movq    c@gottpoff(%rip), %rax
        movzbl  %fs:(%rax), %eax
        movb    %al, y(%rip)
        movq    w@gottpoff(%rip), %rax
        movzwl  %fs:(%rax), %eax
        movw    %ax, i(%rip)
        ret

Uros.
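
The shape of the function above can be tried outside GCC with a toy expression tree. This is only a sketch: the toy_* types and functions are invented stand-ins, the ptr_mode global becomes a parameter, and the to_reg step is left out because it has no meaningful analogue here.

```c
#include <assert.h>
#include <stdlib.h>

enum toy_mode { TOY_SImode, TOY_DImode };
enum toy_kind { TOY_TP, TOY_ZERO_EXTEND };

struct toy_expr
{
  enum toy_kind kind;
  enum toy_mode mode;
  struct toy_expr *inner;	/* Operand of TOY_ZERO_EXTEND, else NULL.  */
};

static struct toy_expr *
toy_new (enum toy_kind kind, enum toy_mode mode, struct toy_expr *inner)
{
  struct toy_expr *e = malloc (sizeof *e);
  assert (e != NULL);
  e->kind = kind;
  e->mode = mode;
  e->inner = inner;
  return e;
}

/* Build the thread pointer in PTR_MODE and, if the caller wants a wider
   TP_MODE, wrap it in an explicit zero extension instead of pretending
   the narrow value already has the wide mode.  */
struct toy_expr *
toy_get_thread_pointer (enum toy_mode ptr_mode, enum toy_mode tp_mode)
{
  struct toy_expr *tp = toy_new (TOY_TP, ptr_mode, NULL);

  if (tp->mode != tp_mode)
    {
      assert (tp->mode == TOY_SImode);
      assert (tp_mode == TOY_DImode);
      tp = toy_new (TOY_ZERO_EXTEND, tp_mode, tp);
    }

  return tp;
}

/* Free an expression and everything it wraps.  */
void
toy_free (struct toy_expr *e)
{
  while (e != NULL)
    {
      struct toy_expr *next = e->inner;
      free (e);
      e = next;
    }
}
```

With ptr_mode SImode and tp_mode DImode the result is a DImode zero-extension node around an SImode thread pointer; with matching modes no wrapper is created.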


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread H.J. Lu
On Mon, Mar 19, 2012 at 10:29 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 19, 2012 at 6:01 PM, Uros Bizjak ubiz...@gmail.com wrote:
 For x32,  thread pointer is an unsigned 32bit value.

 movl %fs:0, %eax

 is the correct instruction to load thread pointer into EAX and RAX.

 So, where is ZERO_EXTEND RTX then?


 Thread pointer (TP) is an opaque value to GCC.  GCC needs to load
 TP into a SImode or DImode register.  ZERO_EXTEND isn't needed
 when there is a single instruction to load TP into a DImode register.

 I don't agree with this explanation. The mode can't be SImode and
 DImode. TP is either SImode or ZERO_EXTENDed to DImode, this is the
 reason we went for all that TARGET_X32 stuff in TP load RTX.

FWIW, the TP maintained by the OS is opaque to GCC, and GCC modes don't
apply to the TP value maintained by the OS.  The instruction pattern to
load TP into a register is provided by the OS and is also opaque to GCC.
The x32 OS provides single instructions to load TP into SImode and DImode
registers.  We can load the x32 TP into an SImode register and ZERO_EXTEND
it to DImode, or we can use the OS-provided instruction to load TP into a
DImode register directly.
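
To illustrate the point about widening a 32-bit thread pointer, here is a
minimal C sketch. It is not code from any patch in this thread; the names
tp_mode_t, widen_tp_zero and widen_tp_sign are hypothetical stand-ins used
only for illustration. It shows that zero extension keeps a high x32 TP
value intact, while sign extension would not.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two machine modes discussed here.  */
typedef enum { TP_SIMODE, TP_DIMODE } tp_mode_t;

/* Zero-extend a 32-bit TP to the requested mode.  For TP_SIMODE the
   value is returned unchanged (upper bits are simply zero).  */
static uint64_t
widen_tp_zero (uint32_t tp, tp_mode_t mode)
{
  assert (mode == TP_SIMODE || mode == TP_DIMODE);
  return (uint64_t) tp;
}

/* Sign-extend instead, only to show what would go wrong for a TP
   value with bit 31 set.  */
static uint64_t
widen_tp_sign (uint32_t tp)
{
  return (uint64_t) (int64_t) (int32_t) tp;
}
```

With a TP of 0xF0000000, widen_tp_zero yields 0x00000000F0000000, whereas
widen_tp_sign yields 0xFFFFFFFFF0000000, which is not a valid x32 address.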

 Please test my proposed patch. If it works OK, I will commit it to SVN.

 The only acceptable way is to generate ZERO_EXTEND in place, so:

 --cut here--
 static rtx
 get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

  if (GET_MODE (tp) != tp_mode)
    {
      gcc_assert (GET_MODE (tp) == SImode);
      gcc_assert (tp_mode == DImode);

      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
    }

  if (to_reg)
    tp = copy_to_mode_reg (tp_mode, tp);

  return tp;
 }
 --cut here--

This version works fine.

Thanks.


-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-19 Thread Uros Bizjak
On Mon, Mar 19, 2012 at 6:50 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Please test my proposed patch. If it works OK, I will commit it to SVN.

 The only acceptable way is to generate ZERO_EXTEND in place, so:

 --cut here--
 static rtx
 get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
  rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);

  if (GET_MODE (tp) != tp_mode)
    {
      gcc_assert (GET_MODE (tp) == SImode);
      gcc_assert (tp_mode == DImode);

      tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
    }

  if (to_reg)
    tp = copy_to_mode_reg (tp_mode, tp);

  return tp;
 }
 --cut here--

 This version works fine.

The attached patch was committed to mainline SVN with the following ChangeLog:

2012-03-19  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (get_thread_pointer): Add tp_mode argument.
Generate ZERO_EXTEND in place if GET_MODE (tp) != tp_mode.
(legitimize_tls_address) TLS_MODEL_INITIAL_EXEC: Always generate
DImode UNSPEC_GOTNTPOFF references on TARGET_64BIT.
(ix86_decompose_address): Allow zero extended UNSPEC_TP references.

Revert:
2012-03-13  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
* config/i386/i386.c (ix86_decompose_address): Use
TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
thread pointer to a register.

Revert:
2012-03-10  H.J. Lu  hongjiu...@intel.com

* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
if Pmode != word_mode.
(legitimize_tls_address): Call gen_tls_initial_exec_x32 if
Pmode == SImode for TARGET_X32.

* config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
(tls_initial_exec_x32): Likewise.

Tested on x86_64-pc-linux-gnu {,-m32}.

Thanks,
Uros.
Index: i386.md
===
--- i386.md (revision 185524)
+++ i386.md (working copy)
@@ -96,7 +96,6 @@
   UNSPEC_TLS_LD_BASE
   UNSPEC_TLSDESC
   UNSPEC_TLS_IE_SUN
-  UNSPEC_TLS_IE_X32
 
   ;; Other random patterns
   UNSPEC_SCAS
@@ -12836,28 +12835,6 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE-LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;; movl x@gottpoff(%rip), %reg32
-;; movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
-(define_insn "tls_initial_exec_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-       (unspec:SI
-        [(match_operand 1 "tls_symbolic_operand")]
-        UNSPEC_TLS_IE_X32))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_X32"
-{
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand tls_dynamic_gnu2_32
Index: i386.c
===
--- i386.c  (revision 185524)
+++ i386.c  (working copy)
@@ -11514,6 +11514,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
  scale = 1 << scale;
  break;
 
+   case ZERO_EXTEND:
+ op = XEXP (op, 0);
+ /* FALLTHRU */
+
case UNSPEC:
  if (XINT (op, 1) == UNSPEC_TP
      && TARGET_TLS_DIRECT_SEG_REFS
@@ -12483,15 +12487,20 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
-  if (GET_MODE (tp) != Pmode)
-tp = convert_to_mode (Pmode, tp, 1);
+  if (GET_MODE (tp) != tp_mode)
+{
+  gcc_assert (GET_MODE (tp) == SImode);
+  gcc_assert (tp_mode == DImode);
 
+  tp = gen_rtx_ZERO_EXTEND (tp_mode, tp);
+}
+
   if (to_reg)
-tp = copy_addr_to_reg (tp);
+tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12543,6 +12552,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12568,7 +12578,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
  else
emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
- tp = get_thread_pointer (true);
+ tp = get_thread_pointer (Pmode, true);
  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
  set_unique_reg_note (get_last_insn (), 

Re: PATCH: Properly generate X32 IE sequence

2012-03-18 Thread Uros Bizjak
On Sat, Mar 17, 2012 at 10:50 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 Can you implement this with define_insn_and_split, like i.e.
 *tls_dynamic_gnu2_combine_32 ?


 I will give it a try again.  Last time when I tried it, GCC didn't
 like memory operand in DImode when Pmode == SImode.

 You should remove mode for tls_symbolic_operand predicate.


 I am testing this patch.  OK for trunk if it passes all tests?

No, force_reg will generate a pseudo, so this conversion is valid only
for !can_create_pseudo ().

At least for *tls_initial_exec_x32_store, you will need a temporary to
split the pattern after reload.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-18 Thread Uros Bizjak
On Sun, Mar 18, 2012 at 5:01 PM, Uros Bizjak ubiz...@gmail.com wrote:

 I am testing this patch.  OK for trunk if it passes all tests?

 No, force_reg will generate a pseudo, so this conversion is valid only
 for !can_create_pseudo ().

 At least for *tls_initial_exec_x32_store, you will need a temporary to
 split the pattern after reload.

Please try the attached patch. It simply throws away all recent
complications w.r.t. the thread pointer and always handles TP in
DImode.

The testcase:

--cut here--
__thread int foo __attribute__ ((tls_model ("initial-exec")));

void bar (int x)
{
  foo = x;
}

int baz (void)
{
  return foo;
}
--cut here--

Now compiles to:

bar:
        movq    foo@gottpoff(%rip), %rax
        movl    %edi, %fs:(%rax)
        ret

baz:
        movq    foo@gottpoff(%rip), %rax
        movl    %fs:(%rax), %eax
        ret

In effect, this always generates %fs:(%rDI) and emits a REX prefix before
mov/add to satisfy brain-dead linkers.

The patch is bootstrapping now on x86_64-pc-linux-gnu.

Uros.
Index: i386.md
===
--- i386.md (revision 185505)
+++ i386.md (working copy)
@@ -12836,28 +12836,6 @@
 }
   [(set_attr "type" "multi")])
 
-;; When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
-;; any instructions between MOV and ADD, which may interfere linker
-;; IE-LE optimization, since the last byte of the previous instruction
-;; before ADD may look like a REX prefix.  This also avoids
-;; movl x@gottpoff(%rip), %reg32
-;; movl $fs:(%reg32), %reg32
-;; Since address override works only on the (reg32) part in fs:(reg32),
-;; we can't use it as memory operand.
-(define_insn "tls_initial_exec_x32"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-       (unspec:SI
-        [(match_operand 1 "tls_symbolic_operand")]
-        UNSPEC_TLS_IE_X32))
-   (clobber (reg:CC FLAGS_REG))]
-  "TARGET_X32"
-{
-  output_asm_insn
-    ("mov{l}\t{%%fs:0, %0|%0, DWORD PTR fs:0}", operands);
-  return "add{l}\t{%a1@gottpoff(%%rip), %0|%0, %a1@gottpoff[rip]}";
-}
-  [(set_attr "type" "multi")])
-
 ;; GNU2 TLS patterns can be split.
 
 (define_expand tls_dynamic_gnu2_32
Index: i386.c
===
--- i386.c  (revision 185504)
+++ i386.c  (working copy)
@@ -11509,6 +11509,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
  scale = 1 << scale;
  break;
 
+   case ZERO_EXTEND:
+ op = XEXP (op, 0);
+ /* FALLTHRU */
+
case UNSPEC:
  if (XINT (op, 1) == UNSPEC_TP
      && TARGET_TLS_DIRECT_SEG_REFS
@@ -12478,15 +12482,15 @@ legitimize_pic_address (rtx orig, rtx reg)
 /* Load the thread pointer.  If TO_REG is true, force it into a register.  */
 
 static rtx
-get_thread_pointer (bool to_reg)
+get_thread_pointer (enum machine_mode tp_mode, bool to_reg)
 {
   rtx tp = gen_rtx_UNSPEC (ptr_mode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
 
-  if (GET_MODE (tp) != Pmode)
-tp = convert_to_mode (Pmode, tp, 1);
+  if (GET_MODE (tp) != tp_mode)
+tp = convert_to_mode (tp_mode, tp, 1);
 
   if (to_reg)
-tp = copy_addr_to_reg (tp);
+tp = copy_to_mode_reg (tp_mode, tp);
 
   return tp;
 }
@@ -12538,6 +12542,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
 {
   rtx dest, base, off;
   rtx pic = NULL_RTX, tp = NULL_RTX;
+  enum machine_mode tp_mode = Pmode;
   int type;
 
   switch (model)
@@ -12563,7 +12568,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
  else
emit_insn (gen_tls_dynamic_gnu2_32 (dest, x, pic));
 
- tp = get_thread_pointer (true);
+ tp = get_thread_pointer (Pmode, true);
  dest = force_reg (Pmode, gen_rtx_PLUS (Pmode, tp, dest));
 
  set_unique_reg_note (get_last_insn (), REG_EQUAL, x);
@@ -12613,7 +12618,7 @@ legitimize_tls_address (rtx x, enum tls_model mode
  else
emit_insn (gen_tls_dynamic_gnu2_32 (base, tmp, pic));
 
- tp = get_thread_pointer (true);
+ tp = get_thread_pointer (Pmode, true);
  set_unique_reg_note (get_last_insn (), REG_EQUAL,
   gen_rtx_MINUS (Pmode, tmp, tp));
}
@@ -12659,27 +12664,18 @@ legitimize_tls_address (rtx x, enum tls_model mode
 case TLS_MODEL_INITIAL_EXEC:
   if (TARGET_64BIT)
{
+ tp_mode = DImode;
+
  if (TARGET_SUN_TLS)
{
  /* The Sun linker took the AMD64 TLS spec literally
 and can only handle %rax as destination of the
 initial executable code sequence.  */
 
- dest = gen_reg_rtx (Pmode);
+ dest = gen_reg_rtx (tp_mode);
  emit_insn (gen_tls_initial_exec_64_sun (dest, x));
  return dest;
}
- else if (Pmode == SImode)
-   {
- /* Always generate
-   movl %fs:0, %reg32
-   addl 

Re: PATCH: Properly generate X32 IE sequence

2012-03-17 Thread H.J. Lu
On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
 to block only indirect seg references.

 There is no regression.

 Thanks, committed to mainline SVN with following ChangeLog:

 2012-03-13  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
        * config/i386/i386.c (ix86_decompose_address): Use
        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
        thread pointer to a register.

 Tested on x86_64-pc-linux-gnu {,-m32}.

 BTW, this x32 TLS IE optimization:

      movq    %rax, %fs:(%rdx)

 This is just looking for troubles. If we said these addresses are
 invalid, then we shouldn't generate them.

 OTOH,  we can improve rejection test a bit to reject only non-word
 mode registers.

 2012-03-13  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
        addresses only when %reg is not in word mode.

 Tested on x86_64-pc-linux-gnu {,-m32}, committed.

 Uros.

 Index: i386.c
 ===
 --- i386.c      (revision 185278)
 +++ i386.c      (working copy)
 @@ -11563,8 +11563,10 @@
        return 0;
     }

 -  if (seg != SEG_DEFAULT && (base || index)
 -      && !TARGET_TLS_INDIRECT_SEG_REFS)
 +/* Address override works only on the (%reg) part of %fs:(%reg).  */
 +  if (seg != SEG_DEFAULT
 +      && ((base && GET_MODE (base) != word_mode)
 +          || (index && GET_MODE (index) != word_mode)))
     return 0;

   /* Extract the integral value of scale.  */

Is my x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

OK for trunk?

Thanks.

-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-17 Thread Uros Bizjak
On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 Thanks.

 --
 H.J.
 ---
 2012-03-11  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
        (*tls_initial_exec_x32_store): Likewise.

Can you implement this with define_insn_and_split, as is done for
*tls_dynamic_gnu2_combine_32?

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-17 Thread H.J. Lu
On Sat, Mar 17, 2012 at 11:10 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 Thanks.

 --
 H.J.
 ---
 2012-03-11  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
        (*tls_initial_exec_x32_store): Likewise.

 Can you implement this with define_insn_and_split, like i.e.
 *tls_dynamic_gnu2_combine_32 ?


I will give it another try.  The last time I tried it, GCC didn't
like a memory operand in DImode when Pmode == SImode.


-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-17 Thread Uros Bizjak
On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 Thanks.

 --
 H.J.
 ---
 2012-03-11  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
        (*tls_initial_exec_x32_store): Likewise.

 Can you implement this with define_insn_and_split, like i.e.
 *tls_dynamic_gnu2_combine_32 ?


 I will give it a try again.  Last time when I tried it, GCC didn't
 like memory operand in DImode when Pmode == SImode.

You should remove mode for tls_symbolic_operand predicate.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-17 Thread H.J. Lu
On Sat, Mar 17, 2012 at 11:20 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sat, Mar 17, 2012 at 7:18 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 Thanks.

 --
 H.J.
 ---
 2012-03-11  H.J. Lu  hongjiu...@intel.com

        * config/i386/i386.md (*tls_initial_exec_x32_load): New.
        (*tls_initial_exec_x32_store): Likewise.

 Can you implement this with define_insn_and_split, like i.e.
 *tls_dynamic_gnu2_combine_32 ?


 I will give it a try again.  Last time when I tried it, GCC didn't
 like memory operand in DImode when Pmode == SImode.

 You should remove mode for tls_symbolic_operand predicate.


I am testing this patch.  OK for trunk if it passes all tests?

Thanks.

-- 
H.J.
2012-03-17  H.J. Lu  hongjiu...@intel.com

* config/i386/i386-protos.h (ix86_split_tls_initial_exec_x32): New.
* config/i386/i386.c (ix86_split_tls_initial_exec_x32): Likewise.

* config/i386/i386.md (*tls_initial_exec_x32_load): New.
(*tls_initial_exec_x32_store): Likewise.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 630112f..2c4f1ed 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -213,6 +213,7 @@ extern unsigned int ix86_get_callcvt (const_tree);
 #endif
 
 extern rtx ix86_tls_module_base (void);
+extern void ix86_split_tls_initial_exec_x32 (rtx [], enum machine_mode, bool);
 
 extern void ix86_expand_vector_init (bool, rtx, rtx);
 extern void ix86_expand_vector_set (bool, rtx, rtx, int);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 78a366e..5a9c673 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12754,6 +12754,28 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
   return dest;
 }
 
+/* Split x32 TLS IE access in MODE.  Split load if LOAD is TRUE,
+   otherwise split store.  */
+
+void
+ix86_split_tls_initial_exec_x32 (rtx operands[],
+enum machine_mode mode, bool load)
+{
+  rtx base, mem;
+  rtx off = load ? operands[1] : operands[0];
+  off = gen_rtx_UNSPEC (DImode, gen_rtvec (1, off), UNSPEC_GOTNTPOFF);
+  off = gen_rtx_CONST (DImode, off);
+  off = gen_const_mem (DImode, off);
+  set_mem_alias_set (off, ix86_GOT_alias_set ());
+  base = gen_rtx_UNSPEC (DImode, gen_rtvec (1, const0_rtx), UNSPEC_TP);
+  off = gen_rtx_PLUS (DImode, base, force_reg (DImode, off));
+  mem = gen_rtx_MEM (mode, off);
+  if (load)
+emit_move_insn (operands[0], mem);
+  else
+emit_move_insn (mem, operands[1]);
+}
+
 /* Create or return the unique __imp_DECL dllimport symbol corresponding
to symbol DECL.  */
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index eae26ae..78faeec 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12858,6 +12858,32 @@
 }
   [(set_attr "type" "multi")])
 
+(define_insn_and_split "*tls_initial_exec_x32_load"
+  [(set (match_operand:SWI1248x 0 "register_operand" "=r")
+        (mem:SWI1248x
+         (unspec:SI
+          [(match_operand 1 "tls_symbolic_operand" "")]
+          UNSPEC_TLS_IE_X32)))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, TRUE); DONE;")
+
+(define_insn_and_split "*tls_initial_exec_x32_store"
+  [(set (mem:SWI1248x
+         (unspec:SI
+          [(match_operand 0 "tls_symbolic_operand" "")]
+          UNSPEC_TLS_IE_X32))
+        (match_operand:SWI1248x 1 "register_operand" "r"))
+   (clobber (reg:CC FLAGS_REG))]
+  "TARGET_X32"
+  "#"
+  ""
+  [(const_int 0)]
+  "ix86_split_tls_initial_exec_x32 (operands, <MODE>mode, FALSE); DONE;")
+
 ;; GNU2 TLS patterns can be split.
 
 (define_expand tls_dynamic_gnu2_32


Re: PATCH: Properly generate X32 IE sequence

2012-03-13 Thread Uros Bizjak
On Tue, Mar 13, 2012 at 2:20 AM, H.J. Lu hjl.to...@gmail.com wrote:

 Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
 when Pmode != word_mode.  We need to keep

          else if (Pmode == SImode)
            {
              /* Always generate
                        movl %fs:0, %reg32
                        addl xgottpoff(%rip), %reg32
                 to support linker IE-LE optimization and avoid
                 fs:(%reg32) as memory operand.  */
              dest = gen_reg_rtx (Pmode);
              emit_insn (gen_tls_initial_exec_x32 (dest, x));
              return dest;
            }

 to support linker IE-LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only 
 affects
 TLS LE access and fs:(%reg) is only generated by combine.

 So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
 fs:immediate memory operand for TLS LE access, which doesn't have any 
 problems
 to begin with.

 I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
 fs:(%reg), which is generated by combine.

 Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
 to block only indirect seg references.

 There is no regression.

Thanks, committed to mainline SVN with the following ChangeLog:

2012-03-13  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
* config/i386/i386.c (ix86_decompose_address): Use
TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
(legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
thread pointer to a register.

Tested on x86_64-pc-linux-gnu {,-m32}.

 BTW, this x32 TLS IE optimization:

       movq    %rax, %fs:(%rdx)

This is just asking for trouble. If we said these addresses are
invalid, then we shouldn't generate them.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-13 Thread Uros Bizjak
On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
 to block only indirect seg references.

 There is no regression.

 Thanks, committed to mainline SVN with following ChangeLog:

 2012-03-13  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
        * config/i386/i386.c (ix86_decompose_address): Use
        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
        thread pointer to a register.

 Tested on x86_64-pc-linux-gnu {,-m32}.

 BTW, this x32 TLS IE optimization:

      movq    %rax, %fs:(%rdx)

 This is just looking for troubles. If we said these addresses are
 invalid, then we shouldn't generate them.

OTOH, we can improve the rejection test a bit so that it rejects only
registers that are not in word mode.

2012-03-13  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
addresses only when %reg is not in word mode.

Tested on x86_64-pc-linux-gnu {,-m32}, committed.

Uros.

Index: i386.c
===
--- i386.c  (revision 185278)
+++ i386.c  (working copy)
@@ -11563,8 +11563,10 @@
return 0;
 }

-  if (seg != SEG_DEFAULT && (base || index)
-      && !TARGET_TLS_INDIRECT_SEG_REFS)
+/* Address override works only on the (%reg) part of %fs:(%reg).  */
+  if (seg != SEG_DEFAULT
+      && ((base && GET_MODE (base) != word_mode)
+          || (index && GET_MODE (index) != word_mode)))
 return 0;

   /* Extract the integral value of scale.  */
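
For readers who want to see the refined rejection test in isolation, here
is a minimal C sketch of the predicate. It is only a model: the enums and
the function seg_address_ok are hypothetical and do not exist in GCC, and
a register's mode is passed in directly instead of being read from an rtx.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; MODE_NONE means "no such register".  */
enum reg_mode { MODE_NONE, MODE_SI, MODE_DI };
enum seg_kind { SEG_DEFAULT, SEG_FS, SEG_GS };

/* Return false when a segment-prefixed address uses a base or index
   register that is not in word mode, mirroring the check above.  */
static bool
seg_address_ok (enum seg_kind seg, enum reg_mode base,
                enum reg_mode index, enum reg_mode word_mode)
{
  assert (word_mode != MODE_NONE);

  if (seg != SEG_DEFAULT
      && ((base != MODE_NONE && base != word_mode)
          || (index != MODE_NONE && index != word_mode)))
    return false;

  return true;
}
```

Under this model, with word_mode being DImode, %fs:(%reg64) is accepted,
%fs:(%reg32) is rejected, and %fs:0 (no base or index) stays valid.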


Re: PATCH: Properly generate X32 IE sequence

2012-03-13 Thread H.J. Lu
On Tue, Mar 13, 2012 at 3:37 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Mar 13, 2012 at 8:11 AM, Uros Bizjak ubiz...@gmail.com wrote:

 Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
 to block only indirect seg references.

 There is no regression.

 Thanks, committed to mainline SVN with following ChangeLog:

 2012-03-13  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.h (TARGET_TLS_INDIRECT_SEG_REFS): New.
        * config/i386/i386.c (ix86_decompose_address): Use
        TARGET_TLS_INDIRECT_SEG_REFS to prevent %fs:(%reg) addresses.
        (legitimize_tls_address): Use TARGET_TLS_INDIRECT_SEG_REFS to load
        thread pointer to a register.

 Tested on x86_64-pc-linux-gnu {,-m32}.

 BTW, this x32 TLS IE optimization:

      movq    %rax, %fs:(%rdx)

 This is just looking for troubles. If we said these addresses are
 invalid, then we shouldn't generate them.

 OTOH,  we can improve rejection test a bit to reject only non-word
 mode registers.

 2012-03-13  Uros Bizjak  ubiz...@gmail.com

        * config/i386/i386.c (ix86_decompose_address): Prevent %fs:(%reg)
        addresses only when %reg is not in word mode.

 Tested on x86_64-pc-linux-gnu {,-m32}, committed.

 Uros.

 Index: i386.c
 ===
 --- i386.c      (revision 185278)
 +++ i386.c      (working copy)
 @@ -11563,8 +11563,10 @@
        return 0;
     }

 -  if (seg != SEG_DEFAULT && (base || index)
 -      && !TARGET_TLS_INDIRECT_SEG_REFS)
 +/* Address override works only on the (%reg) part of %fs:(%reg).  */
 +  if (seg != SEG_DEFAULT
 +      && ((base && GET_MODE (base) != word_mode)
 +          || (index && GET_MODE (index) != word_mode)))
     return 0;

   /* Extract the integral value of scale.  */

This works.

-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-12 Thread Uros Bizjak
On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
 when Pmode != word_mode.  We need to keep

          else if (Pmode == SImode)
            {
              /* Always generate
                        movl %fs:0, %reg32
                        addl xgottpoff(%rip), %reg32
                 to support linker IE-LE optimization and avoid
                 fs:(%reg32) as memory operand.  */
              dest = gen_reg_rtx (Pmode);
              emit_insn (gen_tls_initial_exec_x32 (dest, x));
              return dest;
            }

 to support linker IE-LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only 
 affects
 TLS LE access and fs:(%reg) is only generated by combine.

 So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
 fs:immediate memory operand for TLS LE access, which doesn't have any problems
 to begin with.

 I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
 fs:(%reg), which is generated by combine.

Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
to block only indirect seg references.

Uros.
Index: i386.c
===
--- i386.c  (revision 185250)
+++ i386.c  (working copy)
@@ -11552,11 +11552,6 @@ ix86_decompose_address (rtx addr, struct ix86_addr
   else
 disp = addr;   /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
- we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-return 0;
-
   if (index)
 {
   if (REG_P (index))
@@ -11568,6 +11563,10 @@ ix86_decompose_address (rtx addr, struct ix86_addr
return 0;
 }
 
+  if (seg != SEG_DEFAULT && (base || index)
+      && !TARGET_TLS_INDIRECT_SEG_REFS)
+return 0;
+
   /* Extract the integral value of scale.  */
   if (scale_rtx)
 {
@@ -12696,7 +12695,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
   if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
{
-  base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+  base = get_thread_pointer (for_mov
+|| !(TARGET_TLS_DIRECT_SEG_REFS
+      && TARGET_TLS_INDIRECT_SEG_REFS));
  off = force_reg (Pmode, off);
  return gen_rtx_PLUS (Pmode, base, off);
}
@@ -12716,7 +12717,9 @@ legitimize_tls_address (rtx x, enum tls_model mode
 
   if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
{
- base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+ base = get_thread_pointer (for_mov
+|| !(TARGET_TLS_DIRECT_SEG_REFS
+      && TARGET_TLS_INDIRECT_SEG_REFS));
  return gen_rtx_PLUS (Pmode, base, off);
}
   else
@@ -13249,7 +13252,8 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (!(TARGET_TLS_DIRECT_SEG_REFS
+       && TARGET_TLS_INDIRECT_SEG_REFS))
 return orig_x;
   if (MEM_P (x))
 x = XEXP (x, 0);
Index: i386.h
===
--- i386.h  (revision 185250)
+++ i386.h  (working copy)
@@ -467,6 +467,9 @@ extern int x86_prefetch_sse;
 #define TARGET_TLS_DIRECT_SEG_REFS_DEFAULT 0
 #endif
 
+/* Address override works only on the (%reg) part in %fs:(%reg).  */
+#define TARGET_TLS_INDIRECT_SEG_REFS (Pmode == word_mode)
+
 /* Fence to use after loop using storent.  */
 
 extern tree x86_mfence;


Re: PATCH: Properly generate X32 IE sequence

2012-03-12 Thread H.J. Lu
On Mon, Mar 12, 2012 at 3:35 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 12, 2012 at 12:39 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 10:24 PM, H.J. Lu hjl.to...@gmail.com wrote:

 Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
 when Pmode != word_mode.  We need to keep

          else if (Pmode == SImode)
            {
              /* Always generate
                        movl %fs:0, %reg32
                        addl xgottpoff(%rip), %reg32
                 to support linker IE-LE optimization and avoid
                 fs:(%reg32) as memory operand.  */
              dest = gen_reg_rtx (Pmode);
              emit_insn (gen_tls_initial_exec_x32 (dest, x));
              return dest;
            }

 to support linker IE-LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only 
 affects
 TLS LE access and fs:(%reg) is only generated by combine.

 So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
 fs:immediate memory operand for TLS LE access, which doesn't have any 
 problems
 to begin with.

 I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
 fs:(%reg), which is generated by combine.

 Please try attached patch.  It introduces TARGET_TLS_INDIRECT_SEG_REFS
 to block only indirect seg references.

 Uros.

 I am testing it.


There is no regression.

BTW, this x32 TLS IE optimization:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg00714.html

is still useful.  For

[hjl@gnu-6 tls]$ cat ie2.i
extern __thread long long int x;

extern long long int y;

void
ie2 (void)
{
  x = y;
}
[hjl@gnu-6 tls]$

my patch turns

ie2:
.LFB0:
        .cfi_startproc
        movq    y(%rip), %rdx           # 6 *movdi_internal_rex64/2 [length = 7]
        movl    %fs:0, %eax             # 5 tls_initial_exec_x32 [length = 16]
        addl    x@gottpoff(%rip), %eax
        movq    %rdx, (%eax)            # 7 *movdi_internal_rex64/4 [length = 3]
        ret                             # 14 simple_return_internal [length = 1]
        .cfi_endproc

into

ie2:
.LFB0:
        .cfi_startproc
        movq    y(%rip), %rax           # 6 *movdi_internal_rex64/2 [length = 7]
        movq    x@gottpoff(%rip), %rdx  # 7 *tls_initial_exec_x32_store [length = 16]
        movq    %rax, %fs:(%rdx)
        ret                             # 14 simple_return_internal [length = 1]
        .cfi_endproc



-- 
H.J.


Re: PATCH: Properly generate X32 IE sequence

2012-03-11 Thread H.J. Lu
On Sat, Mar 10, 2012 at 10:49 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
       if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
           if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

 Here is a patch to properly generate X32 IE sequence.

 This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
 GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

 GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax;
 addq x@gottpoff(%rip),%rax

 GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax;
 leaq x@tpoff(%rax),%rax

 LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

 LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

 IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   or
                                                  Not supported if
 Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq 
 x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), 
 %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq 
 x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


 X32 TLS implementation is 

Re: PATCH: Properly generate X32 IE sequence

2012-03-11 Thread Uros Bizjak
On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate 
 corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

 Here is a patch to properly generate X32 IE sequence.

 This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
 GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

 GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax;
 addq x@gottpoff(%rip),%rax

 GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax;
 leaq x@tpoff(%rax),%rax

 LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

 LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

 IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   or
                                                  Not supported if
 Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq 
 x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), 
 %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq 
 x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal 
 x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl 
 x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


 X32 TLS implementation is straight forward, except for IE:

 1. Since address override works only on the (reg32) part in fs:(reg32),
 we can't use it as memory operand.  This patch changes 
 ix86_decompose_address
 to disallow  fs:(reg) if Pmode != word_mode.
 2. When Pmode == SImode, there may be no REX 

Re: PATCH: Properly generate X32 IE sequence

2012-03-11 Thread H.J. Lu
On Sun, Mar 11, 2012 at 10:55 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate 
 corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

 Here is a patch to properly generate X32 IE sequence.

 This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
 GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

 GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax;
 addq x@gottpoff(%rip),%rax

 GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax;
 leaq x@tpoff(%rax),%rax

 LD
  leaq foo@tlsld(%rip),%rdi;                      leaq 
 foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

 LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

 IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   or
                                                  Not supported if
 Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq 
 x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl 
 x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), 
 %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq 
 x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal 
 x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl 
 x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


 X32 TLS implementation is straight forward, except for IE:

 1. Since address override works only on the (reg32) part in fs:(reg32),
 we can't use it as memory operand.  This patch changes 
 ix86_decompose_address
 to disallow  

Re: PATCH: Properly generate X32 IE sequence

2012-03-11 Thread Uros Bizjak
On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu hjl.to...@gmail.com wrote:

        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
        if Pmode != word_mode.
        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
        Pmode == SImode for x32.

        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
        (tls_initial_exec_x32): Likewise.

 Nice solution!

 OK for mainline.

 Done.

 BTW: Did you investigate the issue with memory aliasing?


 It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
 which loads address of the TLS symbol.

 Thanks.


 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 I think we should just scrap all these complications and go with the
 idea of clearing MASK_TLS_DIRECT_SEG_REFS.


 I will give it a try.

You can also revert:

* config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
if Pmode != word_mode.

then, since this part is handled later in the function.

Uros.


Re: PATCH: Properly generate X32 IE sequence

2012-03-11 Thread H.J. Lu
On Sun, Mar 11, 2012 at 11:21 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Sun, Mar 11, 2012 at 7:16 PM, H.J. Lu hjl.to...@gmail.com wrote:

        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
        if Pmode != word_mode.
        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
        Pmode == SImode for x32.

        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
        (tls_initial_exec_x32): Likewise.

 Nice solution!

 OK for mainline.

 Done.

 BTW: Did you investigate the issue with memory aliasing?


 It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
 which loads address of the TLS symbol.

 Thanks.


 Since we must use reg64 in %fs:(%reg) memory operand like

 movq x@gottpoff(%rip),%reg64;
 mov %fs:(%reg64),%reg

 this patch optimizes x32 TLS IE load and store by wrapping
 %reg64 inside of UNSPEC when Pmode == SImode.  OK for
 trunk?

 I think we should just scrap all these complications and go with the
 idea of clearing MASK_TLS_DIRECT_SEG_REFS.


 I will give it a try.

 You can also revert:

        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
        if Pmode != word_mode.

 then, since this part is handled later in the function.


Here is the patch which is equivalent to clearing MASK_TLS_DIRECT_SEG_REFS
when Pmode != word_mode.  We need to keep

      else if (Pmode == SImode)
        {
          /* Always generate
                movl %fs:0, %reg32
                addl x@gottpoff(%rip), %reg32
             to support linker IE-LE optimization and avoid
             fs:(%reg32) as memory operand.  */
          dest = gen_reg_rtx (Pmode);
          emit_insn (gen_tls_initial_exec_x32 (dest, x));
          return dest;
        }

to support linker IE-LE optimization.  TARGET_TLS_DIRECT_SEG_REFS only affects
TLS LE access and fs:(%reg) is only generated by combine.

So the main impact of disabling TARGET_TLS_DIRECT_SEG_REFS is to disable
fs:immediate memory operand for TLS LE access, which doesn't have any problems
to begin with.

I would prefer to keep TARGET_TLS_DIRECT_SEG_REFS and disable only
fs:(%reg), which is generated by combine.
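
To illustrate why the two-instruction "movl %fs:0, %reg32; addl
x@gottpoff(%rip), %reg32" form is safe while %fs:(%reg32) is not, here is
a small C model of the address arithmetic.  It assumes that with the
address-size override the 32-bit register value is zero-extended before
the fs base is added, and that the thread-pointer offset of the variable
is negative.  The names thread_pointer, tp_offset and both helper
functions are made up for this sketch; none of this is GCC code.

```c
#include <stdint.h>

/* Model of "movl %fs:0, %reg32; addl x@gottpoff(%rip), %reg32":
   the addition wraps modulo 2^32, so a negative offset still yields
   the intended 32-bit address.  */
static uint32_t
ie_address_addl (uint32_t thread_pointer, int32_t tp_offset)
{
  return thread_pointer + (uint32_t) tp_offset;
}

/* Model of "%fs:(%reg32)" under the stated assumption: the 32-bit
   offset is zero-extended first, then the fs base is added in 64 bits,
   so a negative offset lands far above the thread pointer.  */
static uint64_t
ie_address_fs_reg32 (uint32_t thread_pointer, int32_t tp_offset)
{
  return (uint64_t) thread_pointer + (uint64_t) (uint32_t) tp_offset;
}
```

With a thread pointer of 0x10000000 and an offset of -16, the first model
gives 0x0ffffff0, while the second gives 0x10ffffff0, which no longer fits
in 32 bits.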

-- 
H.J.
--
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b101922..1ffcc85 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -11478,6 +11478,7 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
 
         case UNSPEC:
           if (XINT (op, 1) == UNSPEC_TP
+              && Pmode == word_mode
               && TARGET_TLS_DIRECT_SEG_REFS
               && seg == SEG_DEFAULT)
             seg = TARGET_64BIT ? SEG_FS : SEG_GS;
@@ -11534,11 +11535,6 @@ ix86_decompose_address (rtx addr, struct ix86_address *out)
   else
     disp = addr;                       /* displacement */
 
-  /* Since address override works only on the (reg32) part in fs:(reg32),
-     we can't use it as memory operand.  */
-  if (Pmode != word_mode && seg == SEG_FS && (base || index))
-    return 0;
-
   if (index)
     {
       if (REG_P (index))
@@ -12706,7 +12702,9 @@ legitimize_tls_address (rtx x, enum tls_model model, bool for_mov)
 
       if (TARGET_64BIT || TARGET_ANY_GNU_TLS)
         {
-          base = get_thread_pointer (for_mov || !TARGET_TLS_DIRECT_SEG_REFS);
+          base = get_thread_pointer (for_mov
+                                     || Pmode != word_mode
+                                     || !TARGET_TLS_DIRECT_SEG_REFS);
           return gen_rtx_PLUS (Pmode, base, off);
         }
       else
@@ -13239,7 +13237,7 @@ ix86_delegitimize_tls_address (rtx orig_x)
   rtx x = orig_x, unspec;
   struct ix86_address addr;
 
-  if (!TARGET_TLS_DIRECT_SEG_REFS)
+  if (Pmode != word_mode || !TARGET_TLS_DIRECT_SEG_REFS)
     return orig_x;
   if (MEM_P (x))
     x = XEXP (x, 0);


Re: PATCH: Properly generate X32 IE sequence

2012-03-10 Thread Uros Bizjak
On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

 Here is a patch to properly generate X32 IE sequence.

 This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
 GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

 GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax;
 addq x@gottpoff(%rip),%rax

 GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax;
 leaq x@tpoff(%rax),%rax

 LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

 LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

 IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   or
                                                  Not supported if
 Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), 
 %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


 X32 TLS implementation is straight forward, except for IE:

 1. Since address override works only on the (reg32) part in fs:(reg32),
 we can't use it as memory operand.  This patch 

Re: PATCH: Properly generate X32 IE sequence

2012-03-10 Thread H.J. Lu
On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

 Here is a patch to properly generate X32 IE sequence.

 This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
 GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

 GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax;
 addq x@gottpoff(%rip),%rax

 GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax;
 leaq x@tpoff(%rax),%rax

 LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

 LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

 IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   or
                                                  Not supported if
 Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq 
 x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), 
 %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq 
 x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

 LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


 X32 TLS implementation is straight forward, except for IE:

 1. Since address override works only on the 

PATCH: Properly generate X32 IE sequence

2012-03-09 Thread H.J. Lu
On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu hjl.to...@gmail.com wrote:

 X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
 by checking

        movq foo@gottpoff(%rip), %reg

 and

        addq foo@gottpoff(%rip), %reg

 It uses the REX prefix to avoid the last byte of the previous
 instruction.  With 32bit Pmode, we may not have the REX prefix and
 the last byte of the previous instruction may be an offset, which
 may look like a REX prefix.  IE-LE optimization will generate corrupted
 binary.  This patch makes sure we always output a REX prefix for
 UNSPEC_GOTNTPOFF.  OK for trunk?

 Actually, linker has:

    case R_X86_64_GOTTPOFF:
      /* Check transition from IE access model:
                mov foo@gottpoff(%rip), %reg
                add foo@gottpoff(%rip), %reg
       */

      /* Check REX prefix first.  */
      if (offset >= 3 && (offset + 4) <= sec->size)
        {
          val = bfd_get_8 (abfd, contents + offset - 3);
          if (val != 0x48 && val != 0x4c)
            {
              /* X32 may have 0x44 REX prefix or no REX prefix.  */
              if (ABI_64_P (abfd))
                return FALSE;
            }
        }
      else
        {
          /* X32 may not have any REX prefix.  */
          if (ABI_64_P (abfd))
            return FALSE;
          if (offset < 2 || (offset + 3) > sec->size)
            return FALSE;
        }

 So, it should handle the case without REX just OK. If it doesn't, then
 this is a bug in binutils.


 The last byte of the displacement in the previous instruction
 may happen to look like a REX byte. In that case, linker
 will overwrite the last byte of the previous instruction and
 generate the wrong instruction sequence.

 I need to update linker to enforce the REX byte check.
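
The aliasing described above can be shown with a concrete byte sequence.
The sketch below is not the binutils code: byte_looks_like_rex_w,
no_rex_sequence and reloc_offset are hypothetical names, and the example
instructions were picked by hand so that the last displacement byte of the
previous instruction is 0x48.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if the byte three positions before the
   relocated field is 0x48 or 0x4c, the two REX values the quoted
   linker check accepts.  */
static bool
byte_looks_like_rex_w (const uint8_t *contents, size_t offset)
{
  assert (offset >= 3);
  uint8_t val = contents[offset - 3];
  return val == 0x48 || val == 0x4c;
}

/* movl 0x48000010(%rbx), %ecx  ->  8b 8b 10 00 00 48
   addl x@gottpoff(%rip), %eax  ->  03 05 <4-byte field>  (no REX prefix)  */
static const uint8_t no_rex_sequence[] = {
  0x8b, 0x8b, 0x10, 0x00, 0x00, 0x48,
  0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* Offset of the 4-byte relocated field inside no_rex_sequence.  */
static const size_t reloc_offset = 8;
```

Here byte_looks_like_rex_w (no_rex_sequence, reloc_offset) is true even
though the addl carries no REX prefix: the 0x48 it finds is the last
displacement byte of the preceding movl.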

 One important observation: if we want to follow the x86_64 TLS spec
 strictly, we have to use existing DImode patterns only. This also
 means that we should NOT convert other TLS patterns to Pmode, since
 they explicitly state movq and addq. If this is not the case, then we
 need new TLS specification for X32.

Here is a patch to properly generate X32 IE sequence.

This is the summary of differences between x86-64 TLS and x32 TLS:

                     x86-64                               x32
GD
    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt

GD-IE optimization
   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax

GD-LE optimization
   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax

LD
  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
  call __tls_get_addr@plt                         call __tls_get_addr@plt

LD-LE optimization
  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax

IE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   or
                                                  Not supported if Pmode == SImode
   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
   movq %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

IE-LE optimization

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32

   to

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq foo@tpoff, %reg64                         addl foo@tpoff, %reg32

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32

   or

   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

   to

   movq foo@tpoff, %reg64                         movq foo@tpoff, %reg64
   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32

LE
   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   leaq x@tpoff(%reg64),%reg32                    leal x@tpoff(%reg32),%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32

   or

   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32

   or

   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32


X32 TLS implementation is straight forward, except for IE:

1. Since address override works only on the (reg32) part in fs:(reg32),
we can't use it as memory operand.  This patch changes ix86_decompose_address
to disallow  fs:(reg) if Pmode != word_mode.
2. When Pmode