RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-31 Thread Zamyatin, Igor
 
 Those macros use =b etc. in asm constraints, so IMHO you'll get the same
 error as for say:
 
 int
 foo (void)
 {
   bar ();
   int i = 0;
   asm volatile ( : +b (i));
   bar ();
   return i;
 }
 
 when compiled by gcc 4.9 and earlier with -O2 -m32 -fpic:
 error: inconsistent operand constraints in an ‘asm’

I see, thanks!

Fixed patch is here - https://sourceware.org/ml/libc-alpha/2014-10/msg00746.html

Igor

 
   Jakub


RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-30 Thread Zamyatin, Igor
Posted a patch in libc-alpha:

https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Jeff Law
 Sent: Saturday, October 25, 2014 3:42 AM
 To: Evgeny Stupachenko; Andrew Pinski
 Cc: Uros Bizjak; Vladimir Makarov; GCC Patches
 Subject: Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
 
 On 10/24/14 17:37, Evgeny Stupachenko wrote:
  What if we remove the check?
  glibc build pass?
 That would be my inclination...   But it's not my decision to make.
 
 The first check is to verify glibc builds and passes its testsuite with the 
 new
 compiler and that check removed.
 
 jeff



Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-30 Thread Jakub Jelinek
On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote:
 Posted a patch in libc-alpha:
 
 https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html

That looks wrong.  The non-PIC patterns that are enabled unconditionally
with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode.

Jakub


RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-30 Thread Zamyatin, Igor
 
 On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote:
  Posted a patch in libc-alpha:
 
  https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html
 
 That looks wrong.  The non-PIC patterns that are enabled unconditionally
 with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode.

Could you please specify why do you think it is wrong for PIC mode for pre-GCC 
5?

Thanks,
Igor

 
   Jakub


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-30 Thread Jakub Jelinek
On Thu, Oct 30, 2014 at 12:34:45PM +, Zamyatin, Igor wrote:
  
  On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote:
   Posted a patch in libc-alpha:
  
   https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html
  
  That looks wrong.  The non-PIC patterns that are enabled unconditionally
  with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode.
 
 Could you please specify why do you think it is wrong for PIC mode for 
 pre-GCC 5?

Those macros use =b etc. in asm constraints, so IMHO you'll get the same
error as for say:

int
foo (void)
{
  bar ();
  int i = 0;
  asm volatile ( : +b (i));
  bar ();
  return i;
}

when compiled by gcc 4.9 and earlier with -O2 -m32 -fpic:
error: inconsistent operand constraints in an ‘asm’

Jakub


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-29 Thread Evgeny Stupachenko
The test passes now. So let's remove xfail.

2014-10-29  Evgeny Stupachenko  evstu...@gmail.com

gcc/testsuite
* gcc.target/i386/pr23098.c: Remove xfail.

diff --git a/gcc/testsuite/gcc.target/i386/pr23098.c
b/gcc/testsuite/gcc.target/i386/pr23098.c
index 7f118dc..7f118bb 100644
--- a/gcc/testsuite/gcc.target/i386/pr23098.c
+++ b/gcc/testsuite/gcc.target/i386/pr23098.c
@@ -1,7 +1,7 @@
 /* PR rtl-optimization/23098 */
 /* { dg-do compile } */
 /* { dg-options -O2 -fPIC } */
-/* { dg-final { scan-assembler-not \.LC\[0-9\] { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not \.LC\[0-9\] } } */
 /* { dg-require-effective-target ia32 } */
 /* { dg-require-effective-target fpic } */

On Thu, Oct 23, 2014 at 4:19 PM, Rainer Orth
r...@cebitec.uni-bielefeld.de wrote:
 Evgeny Stupachenko evstu...@gmail.com writes:

 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 The unconditional XFAIL is wrong: the test now XPASSes on
 i386-pc-solaris2.11 and x86_64-unknown-linux-gnu, i686-unknown-linux-gnu
 for 32-bit.  Please fix.

 Rainer

 --
 -
 Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-29 Thread Uros Bizjak
On Wed, Oct 29, 2014 at 1:28 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 The test passes now. So let's remove xfail.

 2014-10-29  Evgeny Stupachenko  evstu...@gmail.com

 gcc/testsuite
 * gcc.target/i386/pr23098.c: Remove xfail.

OK.

Thanks,
Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Andrew Pinski
On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.
 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.


This patch breaks glibc's ld.so on i686.
glibc has a check to make sure the PIC register is setup correctly:
/* Consistency check for position-independent code.  */
#ifdef __PIC__
# define check_consistency()  \
  ({ int __res;  \
 __asm__ __volatile__  \
   (LOAD_PIC_REG_STR (cx) ;  \
subl %%ebx, %%ecx;  \
je 1f;  \
ud2;  \
1:\n  \
: =c (__res));  \
 __res; })
#endif

This depends on ebx being the PIC register.  Now we don't have this so
we get ud2 in some cases.



Thanks,
Andrew Pinski


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Evgeny Stupachenko
What if we remove the check?
glibc build pass?

On Sat, Oct 25, 2014 at 3:09 AM, Andrew Pinski pins...@gmail.com wrote:
 On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.
 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.


 This patch breaks glibc's ld.so on i686.
 glibc has a check to make sure the PIC register is setup correctly:
 /* Consistency check for position-independent code.  */
 #ifdef __PIC__
 # define check_consistency()  \
   ({ int __res;  \
  __asm__ __volatile__  \
(LOAD_PIC_REG_STR (cx) ;  \
 subl %%ebx, %%ecx;  \
 je 1f;  \
 ud2;  \
 1:\n  \
 : =c (__res));  \
  __res; })
 #endif

 This depends on ebx being the PIC register.  Now we don't have this so
 we get ud2 in some cases.



 Thanks,
 Andrew Pinski


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-24 Thread Jeff Law

On 10/24/14 17:37, Evgeny Stupachenko wrote:

What if we remove the check?
glibc build pass?

That would be my inclination...   But it's not my decision to make.

The first check is to verify glibc builds and passes its testsuite with 
the new compiler and that check removed.


jeff



Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-23 Thread Rainer Orth
Evgeny Stupachenko evstu...@gmail.com writes:

 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

The unconditional XFAIL is wrong: the test now XPASSes on
i386-pc-solaris2.11 and x86_64-unknown-linux-gnu, i686-unknown-linux-gnu
for 32-bit.  Please fix.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 10:03:38AM -0600, Jeff Law wrote:
 Can you add a PR markers to your changelog
 
   PR target/8340
   PR middle-end/47602
   PR rtl-optimization/55458
 
 Actually I think there is an additional test in 47602.  Can you please add
 it to the suite?  You'll also want to change the state of 47602 to
 RESOLVED/FIXED.

Unfortunately this broke bootstrap on x86_64/i686-linux,
see http://gcc.gnu.org/PR63534
- pretty much everything with -m32 -fsplit-stack -fpic ICEs, -m32 -fpic -p
results in wrong-code, and I see significant code quality regressions even
on simple testcases.

For the first two, I think (and said it before already) that the current
model of emitting set_got from a target hook during RA can't work, as there
can be calls in the prologue, and the prologue is inserted before the
set_got in that case.  I really think the RA should in that case just tell
the backend whether and in which register it wants to have the PIC register
loaded upon start of the function, and it should be emit prologue pass
that should arrange for that.

As for the code quality, either some RA improvements are needed, or
postreload must be able to fix it up, or hardreg propagation (though,
cprop_hardreg is forward propagation rather than backwards, right?).
Better before prologue is emitted though, because that will save/restore
the badly chosen hard reg too.

Jakub


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread H.J. Lu
On Mon, Oct 13, 2014 at 11:49 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

 Uros.

 This caused a regression:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527

Another bootstrap failure:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63536

-- 
H.J.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread Jeff Law

On 10/14/14 07:00, Jakub Jelinek wrote:


For the first two, I think (and said it before already) that the current
model of emitting set_got from a target hook during RA can't work, as there
can be calls in the prologue, and the prologue is inserted before the
set_got in that case.  I really think the RA should in that case just tell
the backend whether and in which register it wants to have the PIC register
loaded upon start of the function, and it should be emit prologue pass
that should arrange for that.
That works for me -- I've been encouraging Intel to push emitting the 
PIC setup further and further back in the RTL pipeline.  Their early 
patches had it very early in the RTL pipeline and naturally there was 
fallout/bleedout in various places in the optimizers.


I don't see much value in emitting the PIC setup prior to allocation, 
all I see is problems.





As for the code quality, either some RA improvements are needed, or
postreload must be able to fix it up, or hardreg propagation (though,
cprop_hardreg is forward propagation rather than backwards, right?).
Better before prologue is emitted though, because that will save/restore
the badly chosen hard reg too.
RA improvements are the way to go -- however, my understanding is that 
overall the code is better now than it was before Intel's changes, so I 
don't consider the performance side as a blocker for this code.


The biggest performance issue identified so far is rematerialization. 
The initial patch Intel sent to me was totally unacceptable as it just 
hacked off optimizers rather than digging into the guts of why IRA/LRA 
was unable to sanely rematerialize the PIC register value.


jeff


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-14 Thread H.J. Lu
On Tue, Oct 14, 2014 at 9:43 AM, Jeff Law l...@redhat.com wrote:

 RA improvements are the way to go -- however, my understanding is that
 overall the code is better now than it was before Intel's changes, so I
 don't consider the performance side as a blocker for this code.


The new approach improves PIC code quality in functions where there
no frequent GOT access and extra register helps.

For ld.so and libc.so from glibc build, we use 2 registers to access GOT
instead of one register which may lead to lower performance in shared
libraries.


-- 
H.J.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Evgeny Stupachenko
On Fri, Oct 10, 2014 at 8:03 PM, Jeff Law l...@redhat.com wrote:
 On 10/10/14 01:42, Evgeny Stupachenko wrote:

 Hi,

 The patch enables EBX in RA for x86 32bits PIC mode.
 It was discussed here:
 https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html
 Now there is working version with good performance and stability level
 - it could be a solid first step of EBX enabling.

 Bootstrap and make check passed.
 There are several changes in -m32 make check.

 New pass:
 gcc.target/i386/pr57003.c - before patch there was not enough registers to
 PASS

 ?!?  That doesn't make a lot of sense.  More likely it was Uros's fix from
 yesterday to regcprop which causes this to pass again.
Correct. I've marked it by mistake. The test is flaky and the patch
does not change anything for the test.

 Is it possible you updated your sources between testing runs and as a result
 picked up Uros's fix?


 New fails:

 gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no
 errors as we can do asm insertions with EBX

 I think you should remove the dg-error directive.  That turns this test into
 a simple confirmation that we can use %ebx in an asm even when generating
 PIC code.

 Can you add a PR markers to your changelog

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458

 Actually I think there is an additional test in 47602.  Can you please add
 it to the suite?  You'll also want to change the state of 47602 to
 RESOLVED/FIXED.







 gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential
 performance opportunity using constant immediate

 If you're not going to fix it, then you should xfail it.

 gcc.target/i386/pr55458.c (test for errors, line 10) - now there are
 no errors as there enough registers

 Right.  Remove the dg-error and turn this into a test that effective
 verifies that %ebx is no longer fixed when generating PIC code on i686.

 With those changes this is OK for the trunk.

 jeff



ChangeLog for testsuite:

2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

PR target/8340
PR middle-end/47602
PR rtl-optimization/55458
* gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
* gcc.target/i386/pr55458.c: Likewise.
* gcc.target/i386/pr47602.c: New.
* gcc.target/i386/pr23098.c: Move to XFAIL.


enabling_ebx_tests.patch
Description: Binary data


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Evgeny Stupachenko
-#define PIC_OFFSET_TABLE_REGNUM \
-  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
- || TARGET_PECOFF)) \
-   || !flag_pic ? INVALID_REGNUM \
-   : reload_completed ? REGNO (pic_offset_table_rtx) \
+#define PIC_OFFSET_TABLE_REGNUM \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
+ || TARGET_PECOFF)) \
+   || !flag_pic ? INVALID_REGNUM \
+   : pic_offset_table_rtx ? INVALID_REGNUM \
: REAL_PIC_OFFSET_TABLE_REGNUM)

No negative conditions, please. Also, please follow established
multi-level condition format, please see e.g. HARD_REGNO_NREGS
definition in i386.h.
I don't see how we can avoid negative condition here. If we remove
not from !flag_pic we'll need to add not to TARGET_64BIT and
TARGET_PECOFF.

I've done it this way:
+#define PIC_OFFSET_TABLE_REGNUM
 \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
+ || TARGET_PECOFF))
 \
+   || !flag_pic
 \
+   ? INVALID_REGNUM\
+   : pic_offset_table_rtx  \
+ ? INVALID_REGNUM  \
+ : REAL_PIC_OFFSET_TABLE_REGNUM)

Is it ok?

On Fri, Oct 10, 2014 at 6:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Updated ChangeLog:
 2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com

 * config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove PIC register initialization
 now performed in ix86_init_pic_reg.
 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 * config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return INVALID_REGNUM
 if pic_offset_table_rtx exists.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+   (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

 I've added the following comment to the patch:
 -in which case we return (%ecx - %ebx) + foo.  */
 +in which case we return (%ecx - %ebx) + foo.
 +
 +Note that when pseudo_pic_reg is used we can generate it only
 +before reload_completed.  */

 On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote:
 Uros Bizjak ubiz...@gmail.com writes:

  On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com 
  wrote:
  i386 specific part of the patch:
 
  2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
  Vladimir Makarov  vmaka...@redhat.com
  * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.

 Evgeny: here and in your other submissions: drop the gcc prefix from the
 pathnames.  They are all relative to the directory the ChangeLog lives
 in.

 And add a blank line after after the e-mail lines.

 Jakub


Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Evgeny Stupachenko
Patch updated with the comment:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2a64d2d..5fd6a82 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode,
addr_space_t, bool)
  || REGNO (parts.index) = FIRST_PSEUDO_REGISTER)))
 cost++;

+  /* When address base or index is pic_offset_table_rtx we don't increase
+ address cost.  When a memop with pic_offset_table_rtx is not invariant
+ itself it most likely means that base or index is not invariant.
+ Therefore only pic_offset_table_rtx could be hoisted out, which is not
+ profitable for x86.  */
   if (parts.base
+   (!pic_offset_table_rtx
+ || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
(!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
parts.index
+   (!pic_offset_table_rtx
+ || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
(!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER)
parts.base != parts.index)
 cost++;

On Fri, Oct 10, 2014 at 3:04 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 the patch improves performance when previous are applied.
 It makes RTL loop invariant behavior for GOT loads same as it was
 before the 2 previous patches.

 The patch fixes x86 address cost so that cost for addresses with GOT
 register becomes less, how it was before enabling EBX.

 In x86_address_cost the result of “REGNO (parts.base) =
 FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes
 condition result
 the same when parts.base is GOT register (the same for parts.index).

 2014-10-08  Evgeny Stupachenko  evstu...@gmail.com
 * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
 when address contains GOT register.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index b43e870..9d8cfd1 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode,
 addr_space_t, bool)
  cost++;

 Please add a short comment here, explaining the reason for new condition.

if (parts.base
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
 (!REG_P (parts.base) || REGNO (parts.base) = 
 FIRST_PSEUDO_REGISTER)
 parts.index
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
 (!REG_P (parts.index) || REGNO (parts.index) = 
 FIRST_PSEUDO_REGISTER)
 parts.base != parts.index)
  cost++;

 Otherwise LGTM, but please repost the patch with a comment.

 Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 5:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
-#define PIC_OFFSET_TABLE_REGNUM \
-  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
- || TARGET_PECOFF)) \
-   || !flag_pic ? INVALID_REGNUM \
-   : reload_completed ? REGNO (pic_offset_table_rtx) \
+#define PIC_OFFSET_TABLE_REGNUM \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
+ || TARGET_PECOFF)) \
+   || !flag_pic ? INVALID_REGNUM \
+   : pic_offset_table_rtx ? INVALID_REGNUM \
: REAL_PIC_OFFSET_TABLE_REGNUM)

No negative conditions, please. Also, please follow established
multi-level condition format, please see e.g. HARD_REGNO_NREGS
definition in i386.h.
 I don't see how we can avoid negative condition here. If we remove
 not from !flag_pic we'll need to add not to TARGET_64BIT and
 TARGET_PECOFF.

 I've done it this way:
 +#define PIC_OFFSET_TABLE_REGNUM
  \
 +  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
 + || TARGET_PECOFF))
  \
 +   || !flag_pic
  \
 +   ? INVALID_REGNUM\
 +   : pic_offset_table_rtx  \
 + ? INVALID_REGNUM  \
 + : REAL_PIC_OFFSET_TABLE_REGNUM)

 Is it ok?

Oh, indeed. I missed the logical or. Maybe put the first condition
into parenthesis, to avoid confusion.

OK in any case.

Thanks,
Uros.


Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 5:17 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Patch updated with the comment:

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index 2a64d2d..5fd6a82 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode,
 addr_space_t, bool)
   || REGNO (parts.index) = FIRST_PSEUDO_REGISTER)))
  cost++;

 +  /* When address base or index is pic_offset_table_rtx we don't increase
 + address cost.  When a memop with pic_offset_table_rtx is not invariant
 + itself it most likely means that base or index is not invariant.
 + Therefore only pic_offset_table_rtx could be hoisted out, which is not
 + profitable for x86.  */
if (parts.base
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
 (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
 parts.index
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
 (!REG_P (parts.index) || REGNO (parts.index) = 
 FIRST_PSEUDO_REGISTER)
 parts.base != parts.index)
  cost++;

LGTM.

OK.

Thanks,
Uros.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

Reversed patch was attached. Please repost.

Uros.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Jeff Law

On 10/13/14 08:53, Evgeny Stupachenko wrote:

ChangeLog for testsuite:

2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.


Looks like you goof'd the patch slightly (reversed).

It's trivial enough that I can see what the correctly ordered patch is 
doing.


OK for the trunk at the same time the rest of the bits go in.

jeff



Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Evgeny Stupachenko
Reattached.

On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

 Uros.


enabling_ebx_tests.patch
Description: Binary data


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread Uros Bizjak
On Mon, Oct 13, 2014 at 6:32 PM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

OK.

Thanks,
Uros.


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-13 Thread H.J. Lu
On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 Reattached.

 On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote:
 On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:

 ChangeLog for testsuite:

 2014-10-13  Evgeny Stupachenko  evstu...@gmail.com

 PR target/8340
 PR middle-end/47602
 PR rtl-optimization/55458
 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now.
 * gcc.target/i386/pr55458.c: Likewise.
 * gcc.target/i386/pr47602.c: New.
 * gcc.target/i386/pr23098.c: Move to XFAIL.

 Reversed patch was attached. Please repost.

 Uros.

This caused a regression:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527

-- 
H.J.


[PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Evgeny Stupachenko
Hi,

The patch enables EBX in RA for x86 32bits PIC mode.
It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html
Now there is working version with good performance and stability level
- it could be a solid first step of EBX enabling.

Bootstrap and make check passed.
There are several changes in -m32 make check.

New pass:
gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS

New fails:

gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no
errors as we can do asm insertions with EBX
gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential
performance opportunity using constant immediate
gcc.target/i386/pr55458.c (test for errors, line 10) - now there are
no errors as there enough registers

Performance on Silvermont (for the group of 3 patches):

at -Ofast -flto -funroll-loops -fPIC
EEMBC benchs:
 +6% CoreMark
 +3% Denmark
Individual test gained up to +25%

SPEC2000int is almost unchanged, SPEC2000fp +1,7%:

at -O2 -fPIC
EEMBC benchs:
 +2% CoreMark
 +5% Denmark
Individual test gained up to +25%

SPEC2000int and SPEC2000fp are almost unchanged:

ChangeLog:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
Vladimir Makarov  vmaka...@redhat.com
* doc/tm.texi.in (TARGET_USE_PSEUDO_PIC_REG, TARGET_INIT_PIC_REG):
Document.
* doc/tm.texi: Regenerate.
* gcc/function.c (assign_parms): Generate pseudo register for PIC.
* gcc/init-regs.c (initialize_uninitialized_regs): Ignor pseudo PIC
register.
* gcc/ira-color.c (color_pass): Add check on pseudo register.
* gcc/ira-emit.c (change_loop): Don't create copies for PIC pseudo
register.
* gcc/ira.c (split_live_ranges_for_shrink_wrap): Add check on pseudo
register.
(ira): Add target specific PIC register initialization.
(do_reload): Keep PIC pseudo register.
* gcc/lra-assigns.c (spill_for): Add checks on pseudo register.
* gcc/lra-constraints.c (contains_symbol_ref_p): New.
(lra_constraints): Enable lra risky transformations when PIC is pseudo
register.
* gcc/shrink-wrap.c (try_shrink_wrapping): Add check on pseudo register.
* gcc/target.def (use_pseudo_pic_reg): New.
(init_pic_reg): New.


enabling_ebx_general.patch
Description: Binary data


[PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Evgeny Stupachenko
i386 specific part of the patch:

2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
Vladimir Makarov  vmaka...@redhat.com
* gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
(ix86_init_pic_reg): New.
(ix86_select_alt_pic_regnum): Add check on pseudo register.
(ix86_save_reg): Likewise.
(ix86_expand_prologue): Remove irrelevant code.
(ix86_output_function_epilogue): Add check on pseudo register.
(set_pic_reg_ever_alive): New.
(legitimize_pic_address): Replace df_set_regs_ever_live with new
set_pic_reg_ever_alive.
(legitimize_tls_address): Likewise.
(ix86_pic_register_p): New check.
(ix86_delegitimize_address): Add check on pseudo register.
(ix86_expand_call): Insert move from pseudo PIC register to ABI
defined REAL_PIC_OFFSET_TABLE_REGNUM.
(TARGET_INIT_PIC_REG): New.
(TARGET_USE_PSEUDO_PIC_REG): New.
(PIC_OFFSET_TABLE_REGNUM): New check.


enabling_ebx_i386.patch
Description: Binary data


[PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Evgeny Stupachenko
the patch improves performance when previous are applied.
It makes RTL loop invariant behavior for GOT loads same as it was
before the 2 previous patches.

It improves 164.gzip (+9%), 253.perlbmk (+2%) giving ~0.5% to SPEC2000int
(compiled with “-m32 -Ofast -flto -funroll-loops -fPIC”

For example in 164.gzip.

Before enabling EBX:
loop begin:

(I) 1. SI162 = prev (global address)
2. SI163 = SI142  0xfff (SI 142 modified in the loop)
3. SI164 = EBX + SI162
4. HI107 = SI163*2 + SI164
5. SI142 = HI107

Only INSN 1. treated as invariant and later combine propagates 2,3,4 into 5.

After enabling EBX:
loop begin:

(I) 1. SI163 = prev (global address)
2. SI164 = SI142  0xfff (SI 142 modified in the loop)
(I) 3. SI165 = SI143 + SI163
4. HI107 = SI164*2 + SI165
5. SI142 = HI107

INSNs 1. and 3. are treated as invariants (143 is GOT register) and
hoisted outside the loop

After that combine pass was unable to combine INSNs inside and outside
the loop, which lead to higher register pressure and therefore new
spills/fills.

The patch fixes x86 address cost so that cost for addresses with GOT
register becomes less, how it was before enabling EBX.

In x86_address_cost the result of “REGNO (parts.base) =
FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes
condition result
the same when parts.base is GOT register (the same for parts.index).

2014-10-08  Evgeny Stupachenko  evstu...@gmail.com
* gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
when address contains GOT register.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b43e870..9d8cfd1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode,
addr_space_t, bool)
 cost++;

   if (parts.base
+   (!pic_offset_table_rtx
+ || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
(!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
parts.index
+   (!pic_offset_table_rtx
+ || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
(!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER)
parts.base != parts.index)
 cost++;


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.
 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.

Couple of nits below.

+/* Set regs_ever_live for PIC base address register
+   to true if required.  */
+static void
+set_pic_reg_ever_alive ()

Please rename this function to set_pic_reg_ever_live.

-#define PIC_OFFSET_TABLE_REGNUM \
-  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
- || TARGET_PECOFF)) \
-   || !flag_pic ? INVALID_REGNUM \
-   : reload_completed ? REGNO (pic_offset_table_rtx) \
+#define PIC_OFFSET_TABLE_REGNUM \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC \
+ || TARGET_PECOFF)) \
+   || !flag_pic ? INVALID_REGNUM \
+   : pic_offset_table_rtx ? INVALID_REGNUM \
: REAL_PIC_OFFSET_TABLE_REGNUM)

No negative conditions, please. Also, please follow established
multi-level condition format, please see e.g. HARD_REGNO_NREGS
definition in i386.h.

OK for mainline after infrastructure patch is approved.

Thanks,
Uros.


Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 the patch improves performance when previous are applied.
 It makes RTL loop invariant behavior for GOT loads same as it was
 before the 2 previous patches.

 The patch fixes x86 address cost so that cost for addresses with GOT
 register becomes less, how it was before enabling EBX.

 In x86_address_cost the result of “REGNO (parts.base) =
 FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes
 condition result
 the same when parts.base is GOT register (the same for parts.index).

 2014-10-08  Evgeny Stupachenko  evstu...@gmail.com
 * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for
 when address contains GOT register.

 diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
 index b43e870..9d8cfd1 100644
 --- a/gcc/config/i386/i386.c
 +++ b/gcc/config/i386/i386.c
 @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode,
 addr_space_t, bool)
  cost++;

Please add a short comment here, explaining the reason for new condition.

if (parts.base
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.base))
 (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER)
 parts.index
 +   (!pic_offset_table_rtx
 + || REGNO (pic_offset_table_rtx) != REGNO(parts.index))
 (!REG_P (parts.index) || REGNO (parts.index) = 
 FIRST_PSEUDO_REGISTER)
 parts.base != parts.index)
  cost++;

Otherwise LGTM, but please repost the patch with a comment.

Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Uros Bizjak
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 (ix86_init_pic_reg): New.
 (ix86_select_alt_pic_regnum): Add check on pseudo register.
 (ix86_save_reg): Likewise.
 (ix86_expand_prologue): Remove irrelevant code.

Please mention *which* code you removed here.

 (ix86_output_function_epilogue): Add check on pseudo register.
 (set_pic_reg_ever_alive): New.
 (legitimize_pic_address): Replace df_set_regs_ever_live with new
 set_pic_reg_ever_alive.
 (legitimize_tls_address): Likewise.
 (ix86_pic_register_p): New check.
 (ix86_delegitimize_address): Add check on pseudo register.
 (ix86_expand_call): Insert move from pseudo PIC register to ABI
 defined REAL_PIC_OFFSET_TABLE_REGNUM.
 (TARGET_INIT_PIC_REG): New.
 (TARGET_USE_PSEUDO_PIC_REG): New.
 (PIC_OFFSET_TABLE_REGNUM): New check.

This is not New check, but changed one. Please mention *what* changed.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+   (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

Uros.


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Rainer Orth
Uros Bizjak ubiz...@gmail.com writes:

 On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com 
 wrote:
 i386 specific part of the patch:

 2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
 Vladimir Makarov  vmaka...@redhat.com
 * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.

Evgeny: here and in your other submissions: drop the gcc prefix from the
pathnames.  They are all relative to the directory the ChangeLog lives
in.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Jakub Jelinek
On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote:
 Uros Bizjak ubiz...@gmail.com writes:
 
  On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com 
  wrote:
  i386 specific part of the patch:
 
  2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
  Vladimir Makarov  vmaka...@redhat.com
  * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
 
 Evgeny: here and in your other submissions: drop the gcc prefix from the
 pathnames.  They are all relative to the directory the ChangeLog lives
 in.

And add a blank line after after the e-mail lines.

Jakub


Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Evgeny Stupachenko
Updated ChangeLog:
2014-10-10  Ilya Enkovich  ilya.enkov...@intel.com
Vladimir Makarov  vmaka...@redhat.com

* config/i386/i386.c (ix86_use_pseudo_pic_reg): New.
(ix86_init_pic_reg): New.
(ix86_select_alt_pic_regnum): Add check on pseudo register.
(ix86_save_reg): Likewise.
(ix86_expand_prologue): Remove PIC register initialization
now performed in ix86_init_pic_reg.
(ix86_output_function_epilogue): Add check on pseudo register.
(set_pic_reg_ever_alive): New.
(legitimize_pic_address): Replace df_set_regs_ever_live with new
set_pic_reg_ever_alive.
(legitimize_tls_address): Likewise.
(ix86_pic_register_p): New check.
(ix86_delegitimize_address): Add check on pseudo register.
(ix86_expand_call): Insert move from pseudo PIC register to ABI
defined REAL_PIC_OFFSET_TABLE_REGNUM.
(TARGET_INIT_PIC_REG): New.
(TARGET_USE_PSEUDO_PIC_REG): New.
* config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return INVALID_REGNUM
if pic_offset_table_rtx exists.

-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+   (!reload_completed || !ix86_use_pseudo_pic_reg ()))

Hm, can you please add a comment for this change?

I've added the following comment to the patch:
-in which case we return (%ecx - %ebx) + foo.  */
+in which case we return (%ecx - %ebx) + foo.
+
+Note that when pseudo_pic_reg is used we can generate it only
+before reload_completed.  */

On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote:
 Uros Bizjak ubiz...@gmail.com writes:

  On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com 
  wrote:
  i386 specific part of the patch:
 
  2014-10-08  Ilya Enkovich  ilya.enkov...@intel.com
  Vladimir Makarov  vmaka...@redhat.com
  * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New.

 Evgeny: here and in your other submissions: drop the gcc prefix from the
 pathnames.  They are all relative to the directory the ChangeLog lives
 in.

 And add a blank line after after the e-mail lines.

 Jakub


Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Vladimir Makarov

On 2014-10-10 3:42 AM, Evgeny Stupachenko wrote:

Hi,

The patch enables EBX in RA for x86 32bits PIC mode.
It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html
Now there is working version with good performance and stability level
- it could be a solid first step of EBX enabling.

Bootstrap and make check passed.
There are several changes in -m32 make check.

New pass:
gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS

New fails:

gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no
errors as we can do asm insertions with EBX
gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential
performance opportunity using constant immediate
gcc.target/i386/pr55458.c (test for errors, line 10) - now there are
no errors as there enough registers

Performance on Silvermont (for the group of 3 patches):

at -Ofast -flto -funroll-loops -fPIC
EEMBC benchs:
  +6% CoreMark
  +3% Denmark
Individual test gained up to +25%

SPEC2000int is almost unchanged, SPEC2000fp +1,7%:

at -O2 -fPIC
EEMBC benchs:
  +2% CoreMark
  +5% Denmark
Individual test gained up to +25%

SPEC2000int and SPEC2000fp are almost unchanged:



Great results!  The changes to RA (all ira* lra* file changes) is ok for 
me.  I'll probably add some comments to these changes on next week.


Thanks.



Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code

2014-10-10 Thread Jeff Law

On 10/10/14 01:42, Evgeny Stupachenko wrote:

Hi,

The patch enables EBX in RA for x86 32bits PIC mode.
It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html
Now there is working version with good performance and stability level
- it could be a solid first step of EBX enabling.

Bootstrap and make check passed.
There are several changes in -m32 make check.

New pass:
gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS
?!?  That doesn't make a lot of sense.  More likely it was Uros's fix 
from yesterday to regcprop which causes this to pass again.


Is it possible you updated your sources between testing runs and as a 
result picked up Uros's fix?




New fails:

gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no
errors as we can do asm insertions with EBX
I think you should remove the dg-error directive.  That turns this test 
into a simple confirmation that we can use %ebx in an asm even when 
generating PIC code.


Can you add a PR markers to your changelog

PR target/8340
PR middle-end/47602
PR rtl-optimization/55458

Actually I think there is an additional test in 47602.  Can you please 
add it to the suite?  You'll also want to change the state of 47602 to 
RESOLVED/FIXED.









gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential
performance opportunity using constant immediate

If you're not going to fix it, then you should xfail it.


gcc.target/i386/pr55458.c (test for errors, line 10) - now there are
no errors as there enough registers
Right.  Remove the dg-error and turn this into a test that effective 
verifies that %ebx is no longer fixed when generating PIC code on i686.


With those changes this is OK for the trunk.

jeff




Re: Enable EBX for x86 in 32bits PIC code

2014-09-29 Thread Jakub Jelinek
On Wed, Sep 24, 2014 at 03:20:44PM -0600, Jeff Law wrote:
 On 09/24/14 14:32, Ilya Enkovich wrote:
 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/24/14 00:56, Ilya Enkovich wrote:
 
 
 After register allocation we have no idea where GOT address is and
 therefore delegitimize_address target hook becomes less efficient and
 cannot remove UNSPECs. That's what I see now when build GCC with patch
 applied:
 
 In theory this shouldn't be too hard to fix.
 
 I haven't looked at the code, but it might be something looking explicitly
 for ebx by register #, or something similar.  Which case within
 delegitimize_address isn't firing as it should after your changes?
 
 It is the case I had to fix:
 
 @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
   ...
   movl foo@GOTOFF(%ecx), %edx
   in which case we return (%ecx - %ebx) + foo.  */
 -  if (pic_offset_table_rtx)
 +  if (pic_offset_table_rtx
 +  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
   result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx 
  (addend),
   pic_offset_table_rtx),
 result);
 
 Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
 just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
 we should use PIC register instead of EBX but it is unclear what to
 use after register allocation.
 What's the RTL before  after allocation?  Feel free to just pass along the
 dump files for sum_r4 that you referenced in a prior message.

I wonder if during/after reload we just couldn't look at
ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
the other case, where you don't have any PIC register replacement around,
and want to subtract something?  Perhaps in that case we could just
subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
around.

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-29 Thread Jakub Jelinek
On Wed, Sep 24, 2014 at 03:20:44PM -0600, Jeff Law wrote:
 On 09/24/14 14:32, Ilya Enkovich wrote:
 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/24/14 00:56, Ilya Enkovich wrote:
 
 
 After register allocation we have no idea where GOT address is and
 therefore delegitimize_address target hook becomes less efficient and
 cannot remove UNSPECs. That's what I see now when build GCC with patch
 applied:
 
 In theory this shouldn't be too hard to fix.
 
 I haven't looked at the code, but it might be something looking explicitly
 for ebx by register #, or something similar.  Which case within
 delegitimize_address isn't firing as it should after your changes?
 
 It is the case I had to fix:
 
 @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
   ...
   movl foo@GOTOFF(%ecx), %edx
   in which case we return (%ecx - %ebx) + foo.  */
 -  if (pic_offset_table_rtx)
 +  if (pic_offset_table_rtx
 +  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
   result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx 
  (addend),
   pic_offset_table_rtx),
 result);
 
 Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
 just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
 we should use PIC register instead of EBX but it is unclear what to
 use after register allocation.
 What's the RTL before  after allocation?  Feel free to just pass along the
 dump files for sum_r4 that you referenced in a prior message.

I wonder if during/after reload we just couldn't look at
ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg.  Or is that
the other case, where you don't have any PIC register replacement around,
and want to subtract something?  Perhaps in that case we could just
subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better
around.

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Ilya Enkovich
2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/23/14 10:03, Jakub Jelinek wrote:

 On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:

 On 09/23/14 08:34, Jakub Jelinek wrote:

 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c
 and


 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very
 large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if
 needed).

 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant
 way.


 What matters is if the delegitimize_address target hook is as efficient in
 delegitimization as before.  E.g. if it previously matched only when
 seeing
 %ebx + gotoff or similar, and wouldn't match anything now, some vars could
 have debug locations including UNSPEC and be dropped on the floor.

 Ah, yea, that makes sense.

 jeff


After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:

../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4':
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r4 (gfc_array_r4 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8':
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r8 (gfc_array_r8 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location


Ilya


Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Jeff Law

On 09/24/14 00:56, Ilya Enkovich wrote:

2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:

On 09/23/14 10:03, Jakub Jelinek wrote:


On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:


On 09/23/14 08:34, Jakub Jelinek wrote:


On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:


use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c
and



For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very
large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if
needed).


Can't hurt, but I really don't see how changing from a fixed to an
allocatable register is going to muck up debug info in any significant
way.



What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when
seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.


Ah, yea, that makes sense.

jeff



After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:

In theory this shouldn't be too hard to fix.

I haven't looked at the code, but it might be something looking 
explicitly for ebx by register #, or something similar.  Which case 
within delegitimize_address isn't firing as it should after your changes?


jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Ilya Enkovich
2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/24/14 00:56, Ilya Enkovich wrote:

 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:

 On 09/23/14 10:03, Jakub Jelinek wrote:


 On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:


 On 09/23/14 08:34, Jakub Jelinek wrote:


 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:


 use fixed EBX at least until we make sure pseudo PIC doesn't harm
 debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c
 and



 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very
 large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if
 needed).


 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant
 way.



 What matters is if the delegitimize_address target hook is as efficient
 in
 delegitimization as before.  E.g. if it previously matched only when
 seeing
 %ebx + gotoff or similar, and wouldn't match anything now, some vars
 could
 have debug locations including UNSPEC and be dropped on the floor.


 Ah, yea, that makes sense.

 jeff



 After register allocation we have no idea where GOT address is and
 therefore delegitimize_address target hook becomes less efficient and
 cannot remove UNSPECs. That's what I see now when build GCC with patch
 applied:

 In theory this shouldn't be too hard to fix.

 I haven't looked at the code, but it might be something looking explicitly
 for ebx by register #, or something similar.  Which case within
 delegitimize_address isn't firing as it should after your changes?

It is the case I had to fix:

@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
 ...
 movl foo@GOTOFF(%ecx), %edx
 in which case we return (%ecx - %ebx) + foo.  */
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
 result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 pic_offset_table_rtx),
   result);

Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
we should use PIC register instead of EBX but it is unclear what to
use after register allocation.

Ilya


 jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Jeff Law

On 09/24/14 14:32, Ilya Enkovich wrote:

2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:

On 09/24/14 00:56, Ilya Enkovich wrote:




After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:


In theory this shouldn't be too hard to fix.

I haven't looked at the code, but it might be something looking explicitly
for ebx by register #, or something similar.  Which case within
delegitimize_address isn't firing as it should after your changes?


It is the case I had to fix:

@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
  ...
  movl foo@GOTOFF(%ecx), %edx
  in which case we return (%ecx - %ebx) + foo.  */
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
  result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
  pic_offset_table_rtx),
result);

Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
we should use PIC register instead of EBX but it is unclear what to
use after register allocation.
What's the RTL before  after allocation?  Feel free to just pass along 
the dump files for sum_r4 that you referenced in a prior message.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Ilya Enkovich
2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/23/14 10:03, Jakub Jelinek wrote:

 On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:

 On 09/23/14 08:34, Jakub Jelinek wrote:

 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c
 and


 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very
 large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if
 needed).

 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant
 way.


 What matters is if the delegitimize_address target hook is as efficient in
 delegitimization as before.  E.g. if it previously matched only when
 seeing
 %ebx + gotoff or similar, and wouldn't match anything now, some vars could
 have debug locations including UNSPEC and be dropped on the floor.

 Ah, yea, that makes sense.

 jeff


After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:

../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4':
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r4 (gfc_array_r4 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8':
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
 msum_r8 (gfc_array_r8 * const restrict retarray,
 ^
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location
../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note:
non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location


Ilya


Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Jeff Law

On 09/24/14 00:56, Ilya Enkovich wrote:

2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:

On 09/23/14 10:03, Jakub Jelinek wrote:


On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:


On 09/23/14 08:34, Jakub Jelinek wrote:


On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:


use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c
and



For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very
large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if
needed).


Can't hurt, but I really don't see how changing from a fixed to an
allocatable register is going to muck up debug info in any significant
way.



What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when
seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.


Ah, yea, that makes sense.

jeff



After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:

In theory this shouldn't be too hard to fix.

I haven't looked at the code, but it might be something looking 
explicitly for ebx by register #, or something similar.  Which case 
within delegitimize_address isn't firing as it should after your changes?


jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Ilya Enkovich
2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:
 On 09/24/14 00:56, Ilya Enkovich wrote:

 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com:

 On 09/23/14 10:03, Jakub Jelinek wrote:


 On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:


 On 09/23/14 08:34, Jakub Jelinek wrote:


 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:


 use fixed EBX at least until we make sure pseudo PIC doesn't harm
 debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c
 and



 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very
 large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if
 needed).


 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant
 way.



 What matters is if the delegitimize_address target hook is as efficient
 in
 delegitimization as before.  E.g. if it previously matched only when
 seeing
 %ebx + gotoff or similar, and wouldn't match anything now, some vars
 could
 have debug locations including UNSPEC and be dropped on the floor.


 Ah, yea, that makes sense.

 jeff



 After register allocation we have no idea where GOT address is and
 therefore delegitimize_address target hook becomes less efficient and
 cannot remove UNSPECs. That's what I see now when build GCC with patch
 applied:

 In theory this shouldn't be too hard to fix.

 I haven't looked at the code, but it might be something looking explicitly
 for ebx by register #, or something similar.  Which case within
 delegitimize_address isn't firing as it should after your changes?

It is the case I had to fix:

@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
 ...
 movl foo@GOTOFF(%ecx), %edx
 in which case we return (%ecx - %ebx) + foo.  */
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
 result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
 pic_offset_table_rtx),
   result);

Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
we should use PIC register instead of EBX but it is unclear what to
use after register allocation.

Ilya


 jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-09-24 Thread Jeff Law

On 09/24/14 14:32, Ilya Enkovich wrote:

2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com:

On 09/24/14 00:56, Ilya Enkovich wrote:




After register allocation we have no idea where GOT address is and
therefore delegitimize_address target hook becomes less efficient and
cannot remove UNSPECs. That's what I see now when build GCC with patch
applied:


In theory this shouldn't be too hard to fix.

I haven't looked at the code, but it might be something looking explicitly
for ebx by register #, or something similar.  Which case within
delegitimize_address isn't firing as it should after your changes?


It is the case I had to fix:

@@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x)
  ...
  movl foo@GOTOFF(%ecx), %edx
  in which case we return (%ecx - %ebx) + foo.  */
-  if (pic_offset_table_rtx)
+  if (pic_offset_table_rtx
+  (!reload_completed || !ix86_use_pseudo_pic_reg ()))
  result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend),
  pic_offset_table_rtx),
result);

Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we
just remove this UNSPEC and substract EBX value.  With pseudo PIC reg
we should use PIC register instead of EBX but it is unclear what to
use after register allocation.
What's the RTL before  after allocation?  Feel free to just pass along 
the dump files for sum_r4 that you referenced in a prior message.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Ilya Enkovich
On 03 Sep 16:19, Vladimir Makarov wrote:
 On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
 Seems your patch doesn't cover all cases.  Attached is a modified
 patch (with your changes included) and a test where double constant is
 wrongly rematerialized.  I also see in ira dump that there is still a
 copy of PIC reg created:
 
 Initialization of original PIC reg:
 (insn 23 22 24 2 (set (reg:SI 127)
  (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 3 bx)
  (nil)))
 ...
 Copy is created:
 (insn 135 37 25 3 (set (reg:SI 138 [127])
  (reg:SI 127)) 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 127)
  (nil)))
 ...
 Copy is used:
 (insn 119 25 122 3 (set (reg:DF 134)
  (mem/u/c:DF (plus:SI (reg:SI 138 [127])
  (const:SI (unspec:SI [
  (symbol_ref/u:SI (*.LC0) [flags 0x2])
  ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 
  {*movdf_internal}
   (expr_list:REG_EQUIV (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11])
  (nil)))
 
 
 The copy is created by a newer IRA optimization for function prologues.
 
 The patch in the attachment should solve the problem.  I also added
 the code to prevent spilling the pic pseudo in LRA which could
 happen before theoretically.
 
 
 After reload we have new usage of r127 which is allocated to ecx which
 actually does not have any definition in this function at all.
 
 (insn 151 42 44 4 (set (reg:SI 0 ax [147])
  (plus:SI (reg:SI 2 cx [127])
  (const:SI (unspec:SI [
  (symbol_ref/u:SI (*.LC0) [flags 0x2])
  ] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
   (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
  (nil)))
 (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
  (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
  (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
 790 {*fop_df_comm_sse}
   (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
  (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11]))
  (nil)))
 
 Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
 

 Index: ira.c
 ===
 --- ira.c (revision 214576)
 +++ ira.c (working copy)
 @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
FOR_BB_INSNS (first, insn)
  {
rtx dest = interesting_dest_for_shprep (insn, call_dom);
 -  if (!dest)
 +  if (!dest || dest == pic_offset_table_rtx)
   continue;
  
rtx newreg = NULL_RTX;
 Index: lra-assigns.c
 ===
 --- lra-assigns.c (revision 214576)
 +++ lra-assigns.c (working copy)
 @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
   }
/* Spill pseudos.   */
EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi)
 - if ((int) spill_regno = lra_constraint_new_regno_start
 -  ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
 -  ! bitmap_bit_p (lra_split_regs, spill_regno)
 -  ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
 -  ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))
 + if ((pic_offset_table_rtx != NULL
 +   spill_regno == REGNO (pic_offset_table_rtx))
 + || ((int) spill_regno = lra_constraint_new_regno_start
 +  ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
 +  ! bitmap_bit_p (lra_split_regs, spill_regno)
 +  ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
 +  ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)))
 goto fail;
insn_pseudos_num = 0;
if (lra_dump_file != NULL)
 @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
return;
  }
for (n = 0, i = FIRST_PSEUDO_REGISTER; i  max_regno; i++)
 -if (reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
 +if ((pic_offset_table_rtx == NULL_RTX
 +  || i != (int) REGNO (pic_offset_table_rtx))
 +  reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
sorted_pseudos[n++] = i;
qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
for (i = n - 1; i = 0; i--)
 @@ -1360,6 +1364,8 @@ assign_by_spills (void)
   }
EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, 
 conflict_regno)
   {
 +   gcc_assert (pic_offset_table_rtx == NULL
 +   || conflict_regno != REGNO (pic_offset_table_rtx));
 if ((int) conflict_regno = lra_constraint_new_regno_start)
   sorted_pseudos[nfails++] = conflict_regno;
 if (lra_dump_file != NULL)

Hi,

Here is a patch which combines results of my and Vladimir's work on EBX 

Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Uros Bizjak
On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 Here is a patch which combines results of my and Vladimir's work on EBX 
 enabling.

 It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO.  It passes bootstrap 
 but there are few new failures in make check.

 gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 
 32bit PIC mode
 gcc.target/i386/pr55458.c fails due to the same reason
 gcc.target/i386/pr23098.c fails because compiler fails to use float constant 
 as an immediate and loads it from GOT instead

 Do we have the final decision about having a sompiler flag to control 
 enabling of pseudo PIC register?  I think we should have a possibility to use 
 fixed EBX at least until we make sure pseudo PIC doesn't harm debug info 
 generation. If we have such option then gcc.target/i386/pic-1.c and 
 gcc.target/i386/pr55458.c should be modified, otherwise these tests should be 
 removed.

I think having this flag would be dangerous. In effect, this flag
would be a hidden -ffixed-bx, with unwanted consequences on asm code
that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
we handle ebx in a special way when __PIC__ is defined. With your
patch, we will have to handle it in a special way when new flag is in
effect, which is impossible, unless another compiler-generated define
is emitted.

So, I vote to change PIC reg to a pseudo unconditionally and adjust
testsuite for all (expected) fall-out.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jakub Jelinek
On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c and

For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 08:23, Uros Bizjak wrote:

On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote:


Here is a patch which combines results of my and Vladimir's work on EBX 
enabling.

It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO.  It passes bootstrap but 
there are few new failures in make check.

gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit 
PIC mode
gcc.target/i386/pr55458.c fails due to the same reason
gcc.target/i386/pr23098.c fails because compiler fails to use float constant as 
an immediate and loads it from GOT instead

Do we have the final decision about having a sompiler flag to control enabling 
of pseudo PIC register?  I think we should have a possibility to use fixed EBX 
at least until we make sure pseudo PIC doesn't harm debug info generation. If 
we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c 
should be modified, otherwise these tests should be removed.


I think having this flag would be dangerous. In effect, this flag
would be a hidden -ffixed-bx, with unwanted consequences on asm code
that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
we handle ebx in a special way when __PIC__ is defined. With your
patch, we will have to handle it in a special way when new flag is in
effect, which is impossible, unless another compiler-generated define
is emitted.

So, I vote to change PIC reg to a pseudo unconditionally and adjust
testsuite for all (expected) fall-out.
Agreed.  Continuing to support both modes just seems like a maintenance 
nightmare and asking for problems at some point.  If there's performance 
regressions, we just tackle them :-)


I suspect any performance regressions we find are going to point us at 
issues in IRA/LRA that we would want to look at anyway.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Petr Machata
Jakub Jelinek ja...@redhat.com writes:

 look at the sizes of .debug_info/.debug_loc sections with/without the
 patch, or use the locstat utility from elfutils

Not actually part of elfutils, but available either here:
https://github.com/pmachata/dwlocstat

... or packaged in Fedora.

Thanks,
PM


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jakub Jelinek
On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
 On 09/23/14 08:34, Jakub Jelinek wrote:
 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c and
 
 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if needed).
 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant way.

What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 10:03, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:

On 09/23/14 08:34, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c and


For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).

Can't hurt, but I really don't see how changing from a fixed to an
allocatable register is going to muck up debug info in any significant way.


What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.

Ah, yea, that makes sense.

jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 08:34, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c and


For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).
Can't hurt, but I really don't see how changing from a fixed to an 
allocatable register is going to muck up debug info in any significant 
way.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Ilya Enkovich
On 03 Sep 16:19, Vladimir Makarov wrote:
 On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
 Seems your patch doesn't cover all cases.  Attached is a modified
 patch (with your changes included) and a test where double constant is
 wrongly rematerialized.  I also see in ira dump that there is still a
 copy of PIC reg created:
 
 Initialization of original PIC reg:
 (insn 23 22 24 2 (set (reg:SI 127)
  (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 3 bx)
  (nil)))
 ...
 Copy is created:
 (insn 135 37 25 3 (set (reg:SI 138 [127])
  (reg:SI 127)) 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 127)
  (nil)))
 ...
 Copy is used:
 (insn 119 25 122 3 (set (reg:DF 134)
  (mem/u/c:DF (plus:SI (reg:SI 138 [127])
  (const:SI (unspec:SI [
  (symbol_ref/u:SI (*.LC0) [flags 0x2])
  ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 
  {*movdf_internal}
   (expr_list:REG_EQUIV (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11])
  (nil)))
 
 
 The copy is created by a newer IRA optimization for function prologues.
 
 The patch in the attachment should solve the problem.  I also added
 the code to prevent spilling the pic pseudo in LRA which could
 happen before theoretically.
 
 
 After reload we have new usage of r127 which is allocated to ecx which
 actually does not have any definition in this function at all.
 
 (insn 151 42 44 4 (set (reg:SI 0 ax [147])
  (plus:SI (reg:SI 2 cx [127])
  (const:SI (unspec:SI [
  (symbol_ref/u:SI (*.LC0) [flags 0x2])
  ] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
   (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
  (nil)))
 (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
  (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
  (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
 790 {*fop_df_comm_sse}
   (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
  (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11]))
  (nil)))
 
 Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc
 

 Index: ira.c
 ===
 --- ira.c (revision 214576)
 +++ ira.c (working copy)
 @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
FOR_BB_INSNS (first, insn)
  {
rtx dest = interesting_dest_for_shprep (insn, call_dom);
 -  if (!dest)
 +  if (!dest || dest == pic_offset_table_rtx)
   continue;
  
rtx newreg = NULL_RTX;
 Index: lra-assigns.c
 ===
 --- lra-assigns.c (revision 214576)
 +++ lra-assigns.c (working copy)
 @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
   }
/* Spill pseudos.   */
EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi)
 - if ((int) spill_regno = lra_constraint_new_regno_start
 -  ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
 -  ! bitmap_bit_p (lra_split_regs, spill_regno)
 -  ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
 -  ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))
 + if ((pic_offset_table_rtx != NULL
 +   spill_regno == REGNO (pic_offset_table_rtx))
 + || ((int) spill_regno = lra_constraint_new_regno_start
 +  ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
 +  ! bitmap_bit_p (lra_split_regs, spill_regno)
 +  ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
 +  ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)))
 goto fail;
insn_pseudos_num = 0;
if (lra_dump_file != NULL)
 @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
return;
  }
for (n = 0, i = FIRST_PSEUDO_REGISTER; i  max_regno; i++)
 -if (reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
 +if ((pic_offset_table_rtx == NULL_RTX
 +  || i != (int) REGNO (pic_offset_table_rtx))
 +  reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
sorted_pseudos[n++] = i;
qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
for (i = n - 1; i = 0; i--)
 @@ -1360,6 +1364,8 @@ assign_by_spills (void)
   }
EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, 
 conflict_regno)
   {
 +   gcc_assert (pic_offset_table_rtx == NULL
 +   || conflict_regno != REGNO (pic_offset_table_rtx));
 if ((int) conflict_regno = lra_constraint_new_regno_start)
   sorted_pseudos[nfails++] = conflict_regno;
 if (lra_dump_file != NULL)

Hi,

Here is a patch which combines results of my and Vladimir's work on EBX 

Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Uros Bizjak
On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 Here is a patch which combines results of my and Vladimir's work on EBX 
 enabling.

 It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO.  It passes bootstrap 
 but there are few new failures in make check.

 gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 
 32bit PIC mode
 gcc.target/i386/pr55458.c fails due to the same reason
 gcc.target/i386/pr23098.c fails because compiler fails to use float constant 
 as an immediate and loads it from GOT instead

 Do we have the final decision about having a sompiler flag to control 
 enabling of pseudo PIC register?  I think we should have a possibility to use 
 fixed EBX at least until we make sure pseudo PIC doesn't harm debug info 
 generation. If we have such option then gcc.target/i386/pic-1.c and 
 gcc.target/i386/pr55458.c should be modified, otherwise these tests should be 
 removed.

I think having this flag would be dangerous. In effect, this flag
would be a hidden -ffixed-bx, with unwanted consequences on asm code
that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
we handle ebx in a special way when __PIC__ is defined. With your
patch, we will have to handle it in a special way when new flag is in
effect, which is impossible, unless another compiler-generated define
is emitted.

So, I vote to change PIC reg to a pseudo unconditionally and adjust
testsuite for all (expected) fall-out.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jakub Jelinek
On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c and

For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 08:23, Uros Bizjak wrote:

On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote:


Here is a patch which combines results of my and Vladimir's work on EBX 
enabling.

It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO.  It passes bootstrap but 
there are few new failures in make check.

gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit 
PIC mode
gcc.target/i386/pr55458.c fails due to the same reason
gcc.target/i386/pr23098.c fails because compiler fails to use float constant as 
an immediate and loads it from GOT instead

Do we have the final decision about having a sompiler flag to control enabling 
of pseudo PIC register?  I think we should have a possibility to use fixed EBX 
at least until we make sure pseudo PIC doesn't harm debug info generation. If 
we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c 
should be modified, otherwise these tests should be removed.


I think having this flag would be dangerous. In effect, this flag
would be a hidden -ffixed-bx, with unwanted consequences on asm code
that handles ebx. As an example, please see config/i386/cpuid.h - ATM,
we handle ebx in a special way when __PIC__ is defined. With your
patch, we will have to handle it in a special way when new flag is in
effect, which is impossible, unless another compiler-generated define
is emitted.

So, I vote to change PIC reg to a pseudo unconditionally and adjust
testsuite for all (expected) fall-out.
Agreed.  Continuing to support both modes just seems like a maintenance 
nightmare and asking for problems at some point.  If there's performance 
regressions, we just tackle them :-)


I suspect any performance regressions we find are going to point us at 
issues in IRA/LRA that we would want to look at anyway.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Petr Machata
Jakub Jelinek ja...@redhat.com writes:

 look at the sizes of .debug_info/.debug_loc sections with/without the
 patch, or use the locstat utility from elfutils

Not actually part of elfutils, but available either here:
https://github.com/pmachata/dwlocstat

... or packaged in Fedora.

Thanks,
PM


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 08:34, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c and


For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).
Can't hurt, but I really don't see how changing from a fixed to an 
allocatable register is going to muck up debug info in any significant 
way.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jakub Jelinek
On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:
 On 09/23/14 08:34, Jakub Jelinek wrote:
 On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:
 use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
 info generation.  If we have such option then gcc.target/i386/pic-1.c and
 
 For debug info, it seems you are already handling this in
 delegitimize_address target hook, I'd suggest just building some very large
 shared library at -O2 -g -fpic on i?86 and either look at the
 sizes of .debug_info/.debug_loc sections with/without the patch,
 or use the locstat utility from elfutils (talk to Petr Machata if needed).
 Can't hurt, but I really don't see how changing from a fixed to an
 allocatable register is going to muck up debug info in any significant way.

What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.

Jakub


Re: Enable EBX for x86 in 32bits PIC code

2014-09-23 Thread Jeff Law

On 09/23/14 10:03, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote:

On 09/23/14 08:34, Jakub Jelinek wrote:

On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote:

use fixed EBX at least until we make sure pseudo PIC doesn't harm debug
info generation.  If we have such option then gcc.target/i386/pic-1.c and


For debug info, it seems you are already handling this in
delegitimize_address target hook, I'd suggest just building some very large
shared library at -O2 -g -fpic on i?86 and either look at the
sizes of .debug_info/.debug_loc sections with/without the patch,
or use the locstat utility from elfutils (talk to Petr Machata if needed).

Can't hurt, but I really don't see how changing from a fixed to an
allocatable register is going to muck up debug info in any significant way.


What matters is if the delegitimize_address target hook is as efficient in
delegitimization as before.  E.g. if it previously matched only when seeing
%ebx + gotoff or similar, and wouldn't match anything now, some vars could
have debug locations including UNSPEC and be dropped on the floor.

Ah, yea, that makes sense.

jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-11 Thread Jeff Law

On 09/09/14 10:43, Vladimir Makarov wrote:


I've investigated the wrong code generation.  I did a mistake in my last
patch excluding pic pseudo from live-range analysis when risky
transformations are on.

Here is the right version of all IRA/LRA changes relative to trunk.  I
managed to compile and run successfully all 32-bit PIC SPECInt2000
programs with these changes.

Thanks Vlad.  I'll leave final testing and committing these patches to you.

I did play around with a simpler patch to un-fix the PIC register for 
32bit x86 -- without turning it into a pseudo.  The problem with that 
intermediate step is that when we spill values with symbolic 
equivalences, we fail to handle the new conflicts that will generate. 
ie, in the case of PIC reloading those values should cause the unfixed 
hard pic register to conflict with any live pseudos at the reload point. 
 But it doesn't :(  As a result we end up using the unfixed hard pic 
reg to satsify a reload and, well, you know what happens then.


jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-09-09 Thread Vladimir Makarov
On 09/04/2014 10:30 AM, Zamyatin, Igor wrote:

 -Original Message-
 From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-
 ow...@gcc.gnu.org] On Behalf Of Vladimir Makarov
 Sent: Thursday, September 04, 2014 12:19 AM
 To: Ilya Enkovich
 Cc: g...@gnu.org; gcc-patches; Evgeny Stupachenko; Richard Biener; Uros
 Bizjak; Jeff Law
 Subject: Re: Enable EBX for x86 in 32bits PIC code

 On 2014-08-29 2:47 AM, Ilya Enkovich wrote:
 Seems your patch doesn't cover all cases.  Attached is a modified
 patch (with your changes included) and a test where double constant is
 wrongly rematerialized.  I also see in ira dump that there is still a
 copy of PIC reg created:

 Initialization of original PIC reg:
 (insn 23 22 24 2 (set (reg:SI 127)
  (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 3 bx)
  (nil)))
 ...
 Copy is created:
 (insn 135 37 25 3 (set (reg:SI 138 [127])
  (reg:SI 127)) 90 {*movsi_internal}
   (expr_list:REG_DEAD (reg:SI 127)
  (nil)))
 ...
 Copy is used:
 (insn 119 25 122 3 (set (reg:DF 134)
  (mem/u/c:DF (plus:SI (reg:SI 138 [127])
  (const:SI (unspec:SI [
  (symbol_ref/u:SI (*.LC0) [flags 0x2])
  ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 
 {*movdf_internal}
   (expr_list:REG_EQUIV (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11])
  (nil)))

 The copy is created by a newer IRA optimization for function prologues.

 The patch in the attachment should solve the problem.  I also added the code
 to prevent spilling the pic pseudo in LRA which could happen before
 theoretically.
 Hi, Vladimir!

 I applied patch as an addition to your previous patch (was I right?) and 
 unfortunately got all spec2000 tests failed at the runtime (segfault)

 Looking at 164.gzip I saw following code in spec_init


 4bc0 spec_init:
 4bc0:   55  push   %ebp
 4bc1:   57  push   %edi
 4bc2:   56  push   %esi
 4bc3:   e8 58 c6 ff ff  call   1220 __x86.get_pc_thunk.si
 4bc8:   81 c6 38 84 00 00   add$0x8438,%esi
 4bce:   53  push   %ebx
 4bcf:   8d 64 24 e4 lea-0x1c(%esp),%esp
 4bd3:   83 be a0 03 00 00 03cmpl   $0x3,0x3a0(%esi)
 4bda:   7f 67   jg 4c43 spec_init+0x83
 4bdc:   8d ae 40 12 05 00   lea0x51240(%esi),%ebp
 4be2:   8d 45 30lea0x30(%ebp),%eax
 4be5:   89 c6   mov%eax,%esi  
 incorrect move, GOT value is now lost (here was mov
 %eax,0x1c(%esp) before this additional patch)
 4be7:   8b 7d 00mov0x0(%ebp),%edi
 4bea:   89 f3   mov%esi,%ebx  
    now ebx contains incorrect value so call to malloc will be 
 executed wrongly
 4bec:   c7 45 04 00 00 00 00movl   $0x0,0x4(%ebp)
 4bf3:   c7 45 08 00 00 00 00movl   $0x0,0x8(%ebp)
 4bfa:   c7 45 0c 00 00 00 00movl   $0x0,0xc(%ebp)
 4c01:   8d 87 00 90 01 00   lea0x19000(%edi),%eax
 4c07:   89 04 24mov%eax,(%esp)
 4c0a:   e8 a1 be ff ff  call   ab0 malloc@plt


I've investigated the wrong code generation.  I did a mistake in my last
patch excluding pic pseudo from live-range analysis when risky
transformations are on.

Here is the right version of all IRA/LRA changes relative to trunk.  I
managed to compile and run successfully all 32-bit PIC SPECInt2000
programs with these changes.

Index: ira-color.c
===
--- ira-color.c (revision 214576)
+++ ira-color.c (working copy)
@@ -3239,9 +3239,11 @@
  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
  ira_assert (bitmap_bit_p (subloop_node-all_allocnos,
ALLOCNO_NUM (subloop_allocno)));
- if ((flag_ira_region == IRA_REGION_MIXED)
-  (loop_tree_node-reg_pressure[pclass]
- = ira_class_hard_regs_num[pclass]))
+ if ((flag_ira_region == IRA_REGION_MIXED
+   (loop_tree_node-reg_pressure[pclass]
+  = ira_class_hard_regs_num[pclass]))
+ || (pic_offset_table_rtx != NULL
+  regno == (int) REGNO (pic_offset_table_rtx)))
{
  if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
{
Index: ira-emit.c
===
--- ira-emit.c  (revision 214576)
+++ ira-emit.c  (working copy)
@@ -620,7 +620,10 @@
  /* don't create copies because reload can spill an
 allocno set by copy although the allocno

Re: Enable EBX for x86 in 32bits PIC code

2014-09-03 Thread Vladimir Makarov

On 2014-08-29 2:47 AM, Ilya Enkovich wrote:

Seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where double constant is
wrongly rematerialized.  I also see in ira dump that there is still a
copy of PIC reg created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
 (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 3 bx)
 (nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
 (reg:SI 127)) 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 127)
 (nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
 (mem/u/c:DF (plus:SI (reg:SI 138 [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
  (expr_list:REG_EQUIV (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11])
 (nil)))



The copy is created by a newer IRA optimization for function prologues.

The patch in the attachment should solve the problem.  I also added the 
code to prevent spilling the pic pseudo in LRA which could happen before 
theoretically.




After reload we have new usage of r127 which is allocated to ecx which
actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
 (plus:SI (reg:SI 2 cx [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
  (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
 (nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
 (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
790 {*fop_df_comm_sse}
  (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11]))
 (nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc


Index: ira.c
===
--- ira.c   (revision 214576)
+++ ira.c   (working copy)
@@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
   FOR_BB_INSNS (first, insn)
 {
   rtx dest = interesting_dest_for_shprep (insn, call_dom);
-  if (!dest)
+  if (!dest || dest == pic_offset_table_rtx)
continue;
 
   rtx newreg = NULL_RTX;
Index: lra-assigns.c
===
--- lra-assigns.c   (revision 214576)
+++ lra-assigns.c   (working copy)
@@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
}
   /* Spill pseudos. */
   EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi)
-   if ((int) spill_regno = lra_constraint_new_regno_start
-! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
-! bitmap_bit_p (lra_split_regs, spill_regno)
-! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
-! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))
+   if ((pic_offset_table_rtx != NULL
+ spill_regno == REGNO (pic_offset_table_rtx))
+   || ((int) spill_regno = lra_constraint_new_regno_start
+! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
+! bitmap_bit_p (lra_split_regs, spill_regno)
+! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
+! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)))
  goto fail;
   insn_pseudos_num = 0;
   if (lra_dump_file != NULL)
@@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
   return;
 }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i  max_regno; i++)
-if (reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
+if ((pic_offset_table_rtx == NULL_RTX
+|| i != (int) REGNO (pic_offset_table_rtx))
+reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
   sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
   for (i = n - 1; i = 0; i--)
@@ -1360,6 +1364,8 @@ assign_by_spills (void)
}
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
{
+ gcc_assert (pic_offset_table_rtx == NULL
+ || conflict_regno != REGNO (pic_offset_table_rtx));
  if ((int) conflict_regno = lra_constraint_new_regno_start)
sorted_pseudos[nfails++] = conflict_regno;
  if (lra_dump_file != NULL)


Re: Enable EBX for x86 in 32bits PIC code

2014-09-03 Thread Vladimir Makarov

On 2014-08-29 2:47 AM, Ilya Enkovich wrote:

Seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where double constant is
wrongly rematerialized.  I also see in ira dump that there is still a
copy of PIC reg created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
 (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 3 bx)
 (nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
 (reg:SI 127)) 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 127)
 (nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
 (mem/u/c:DF (plus:SI (reg:SI 138 [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
  (expr_list:REG_EQUIV (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11])
 (nil)))



The copy is created by a newer IRA optimization for function prologues.

The patch in the attachment should solve the problem.  I also added the 
code to prevent spilling the pic pseudo in LRA which could happen before 
theoretically.




After reload we have new usage of r127 which is allocated to ecx which
actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
 (plus:SI (reg:SI 2 cx [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
  (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
 (nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
 (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
790 {*fop_df_comm_sse}
  (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11]))
 (nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc


Index: ira.c
===
--- ira.c   (revision 214576)
+++ ira.c   (working copy)
@@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void)
   FOR_BB_INSNS (first, insn)
 {
   rtx dest = interesting_dest_for_shprep (insn, call_dom);
-  if (!dest)
+  if (!dest || dest == pic_offset_table_rtx)
continue;
 
   rtx newreg = NULL_RTX;
Index: lra-assigns.c
===
--- lra-assigns.c   (revision 214576)
+++ lra-assigns.c   (working copy)
@@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse
}
   /* Spill pseudos. */
   EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi)
-   if ((int) spill_regno = lra_constraint_new_regno_start
-! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
-! bitmap_bit_p (lra_split_regs, spill_regno)
-! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
-! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))
+   if ((pic_offset_table_rtx != NULL
+ spill_regno == REGNO (pic_offset_table_rtx))
+   || ((int) spill_regno = lra_constraint_new_regno_start
+! bitmap_bit_p (lra_inheritance_pseudos, spill_regno)
+! bitmap_bit_p (lra_split_regs, spill_regno)
+! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno)
+! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)))
  goto fail;
   insn_pseudos_num = 0;
   if (lra_dump_file != NULL)
@@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky
   return;
 }
   for (n = 0, i = FIRST_PSEUDO_REGISTER; i  max_regno; i++)
-if (reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
+if ((pic_offset_table_rtx == NULL_RTX
+|| i != (int) REGNO (pic_offset_table_rtx))
+reg_renumber[i] = 0  lra_reg_info[i].nrefs  0)
   sorted_pseudos[n++] = i;
   qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func);
   for (i = n - 1; i = 0; i--)
@@ -1360,6 +1364,8 @@ assign_by_spills (void)
}
   EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno)
{
+ gcc_assert (pic_offset_table_rtx == NULL
+ || conflict_regno != REGNO (pic_offset_table_rtx));
  if ((int) conflict_regno = lra_constraint_new_regno_start)
sorted_pseudos[nfails++] = conflict_regno;
  if (lra_dump_file != NULL)


Re: Enable EBX for x86 in 32bits PIC code

2014-09-02 Thread Vladimir Makarov
On 08/29/2014 02:47 AM, Ilya Enkovich wrote:
 Seems your patch doesn't cover all cases.  Attached is a modified
 patch (with your changes included) and a test where double constant is
 wrongly rematerialized.  I also see in ira dump that there is still a
 copy of PIC reg created:

 Initialization of original PIC reg:
 (insn 23 22 24 2 (set (reg:SI 127)
 (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 3 bx)
 (nil)))
 ...
 Copy is created:
 (insn 135 37 25 3 (set (reg:SI 138 [127])
 (reg:SI 127)) 90 {*movsi_internal}
  (expr_list:REG_DEAD (reg:SI 127)
 (nil)))
 ...
 Copy is used:
 (insn 119 25 122 3 (set (reg:DF 134)
 (mem/u/c:DF (plus:SI (reg:SI 138 [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
  (expr_list:REG_EQUIV (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11])
 (nil)))

 After reload we have new usage of r127 which is allocated to ecx which
 actually does not have any definition in this function at all.

 (insn 151 42 44 4 (set (reg:SI 0 ax [147])
 (plus:SI (reg:SI 2 cx [127])
 (const:SI (unspec:SI [
 (symbol_ref/u:SI (*.LC0) [flags 0x2])
 ] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
  (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
 (nil)))
 (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
 (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
 790 {*fop_df_comm_sse}
  (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
 (const_double:DF
 2.9997371893933895137251965934410691261292e-4
 [0x0.9d495182a99308p-11]))
 (nil)))

 Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc


Ok, Ilya.  I'll look at the problem this week.



Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Ilya Enkovich
2014-08-28 12:28 GMT+04:00 Ilya Enkovich enkovich@gmail.com:
 2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com:
 On 2014-08-26 5:42 PM, Ilya Enkovich wrote:

 Hi,

 Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
 I do not have a small reproducer but the problem can be easily reproduced on
 SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
 value is compared with float constant 1.0.  It is inlined into read_arch
 function and can be easily found in RTL dump of function read_arch as a
 float comparison with 1.0 after the first call to strtod function.

 Here is a compilation string I use:

 gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
 -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
 -DSPEC_CPU2000read_arch.c

 In my final assembler comparison with 1.0 looks like:

 comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length =
 7]

 and %ebp here doesn't have a proper value.

 I'll try to make a smaller reproducer if these instructions don't help.


 I've managed to reproduce it.  Although it would be better to send the patch
 as an attachment.

 The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
 Then in a region when a *new* pseudo used as PIC we rematerialize a constant
 which transformed in memory addressed through *original* PIC pseudo.

 To solve the problem we should prevent such splitting and guarantee that PIC
 pseudo allocnos in different region gets the same hard reg.

 The following patch should solve the problem.


 Thanks for the patch! I'll try it and be back with results.

Seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where double constant is
wrongly rematerialized.  I also see in ira dump that there is still a
copy of PIC reg created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
(reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 3 bx)
(nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
(reg:SI 127)) 90 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 127)
(nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
(mem/u/c:DF (plus:SI (reg:SI 138 [127])
(const:SI (unspec:SI [
(symbol_ref/u:SI (*.LC0) [flags 0x2])
] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
 (expr_list:REG_EQUIV (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11])
(nil)))

After reload we have new usage of r127 which is allocated to ecx which
actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
(plus:SI (reg:SI 2 cx [127])
(const:SI (unspec:SI [
(symbol_ref/u:SI (*.LC0) [flags 0x2])
] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
 (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
(nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
(mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
(mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
790 {*fop_df_comm_sse}
 (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
(const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11]))
(nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc

Thanks,
Ilya


 Ilya



pie-2014-08-28.patch
Description: Binary data


test.cc
Description: Binary data


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Ilya Enkovich
2014-08-28 22:58 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to 
 share my results.

  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
 -   (match_operand:TF 1 general_no_elim_operand x,*roF))]
 +   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]

 Can you please explain the reason for this change (and a couple of
 similar changes to push patterns) ?

This is a workaround for stability problem with reload.  Immediate
operands cause new usages of pseudo PIC register in reload which leads
to wrong registers allocation.  These changes wouldn't be required
after reload issue if resolved.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 12:58, Uros Bizjak wrote:

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:


On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
32bit PIC mode.  It was decided that the best approach would be to not fix ebx 
register, use speudo register for GOT base address and let allocator do the 
rest.  This should be similar to how clang and icc work with GOT base address.  
I've been working for some time on such patch and now want to share my results.



  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
-   (match_operand:TF 1 general_no_elim_operand x,*roF))]
+   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]


Can you please explain the reason for this change (and a couple of
similar changes to push patterns) ?

I'd recommend dropping them from the WIP postings.

jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 07:01, Uros Bizjak wrote:

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
32bit PIC mode.  It was decided that the best approach would be to not fix ebx 
register, use speudo register for GOT base address and let allocator do the 
rest.  This should be similar to how clang and icc work with GOT base address.  
I've been working for some time on such patch and now want to share my results.


+#define PIC_OFFSET_TABLE_REGNUM
  \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
+ || TARGET_PECOFF))
  \
+   || !flag_pic ? INVALID_REGNUM   \
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
+  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
+   : reload_completed ? REGNO (pic_offset_table_rtx)   \
 : REAL_PIC_OFFSET_TABLE_REGNUM)

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.
I think for the initial WIP patch it was fine.  However I think we all 
agree that we want EBX as an allocatable register without any special 
conditions.  So I'd recommend pulling this out of the WIP patches as well.

Jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 02:37, Ilya Enkovich wrote:

2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:

On 08/26/14 15:42, Ilya Enkovich wrote:


diff --git a/gcc/calls.c b/gcc/calls.c
index 4285ec1..85dae6b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

+if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+  {
+   gcc_assert (pic_offset_table_rtx);
+   args[j].tree_value = make_tree (ptr_type_node,
+   pic_offset_table_rtx);
+   j--;
+  }
+
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;


So why do you need this?  Can't this be handled in the call/call_value
expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
inside ix86_expand_call?  Basically I'm not seeing the need for another
target hook here.  I think that would significantly simply the patch as
well.


GOT base address become an additional implicit arg with EBX relaxed
and I handled it as all other args. I can move EBX initialization into
ix86_expand_call. Would still need some hint from target to init
pic_offset_table_rtx with proper value in the beginning of function
expand.
It doesn't really need to be an argument in the traditional sense and 
adding it just complicates things with a target implementation detail as 
far as I can see.


I think you'll find that if you have the call pattern emit a copy from 
pic_offset_table_rtx into EBX and attach of use of EBX to the call then 
most of the code you've written to add the implicit argument just 
disappears.


jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Ilya Enkovich
2014-08-28 12:28 GMT+04:00 Ilya Enkovich enkovich@gmail.com:
 2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com:
 On 2014-08-26 5:42 PM, Ilya Enkovich wrote:

 Hi,

 Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
 I do not have a small reproducer but the problem can be easily reproduced on
 SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
 value is compared with float constant 1.0.  It is inlined into read_arch
 function and can be easily found in RTL dump of function read_arch as a
 float comparison with 1.0 after the first call to strtod function.

 Here is a compilation string I use:

 gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
 -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
 -DSPEC_CPU2000read_arch.c

 In my final assembler comparison with 1.0 looks like:

 comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length =
 7]

 and %ebp here doesn't have a proper value.

 I'll try to make a smaller reproducer if these instructions don't help.


 I've managed to reproduce it.  Although it would be better to send the patch
 as an attachment.

 The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
 Then in a region when a *new* pseudo used as PIC we rematerialize a constant
 which transformed in memory addressed through *original* PIC pseudo.

 To solve the problem we should prevent such splitting and guarantee that PIC
 pseudo allocnos in different region gets the same hard reg.

 The following patch should solve the problem.


 Thanks for the patch! I'll try it and be back with results.

Seems your patch doesn't cover all cases.  Attached is a modified
patch (with your changes included) and a test where double constant is
wrongly rematerialized.  I also see in ira dump that there is still a
copy of PIC reg created:

Initialization of original PIC reg:
(insn 23 22 24 2 (set (reg:SI 127)
(reg:SI 3 bx)) test.cc:42 90 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 3 bx)
(nil)))
...
Copy is created:
(insn 135 37 25 3 (set (reg:SI 138 [127])
(reg:SI 127)) 90 {*movsi_internal}
 (expr_list:REG_DEAD (reg:SI 127)
(nil)))
...
Copy is used:
(insn 119 25 122 3 (set (reg:DF 134)
(mem/u/c:DF (plus:SI (reg:SI 138 [127])
(const:SI (unspec:SI [
(symbol_ref/u:SI (*.LC0) [flags 0x2])
] UNSPEC_GOTOFF))) [5  S8 A64])) 128 {*movdf_internal}
 (expr_list:REG_EQUIV (const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11])
(nil)))

After reload we have new usage of r127 which is allocated to ecx which
actually does not have any definition in this function at all.

(insn 151 42 44 4 (set (reg:SI 0 ax [147])
(plus:SI (reg:SI 2 cx [127])
(const:SI (unspec:SI [
(symbol_ref/u:SI (*.LC0) [flags 0x2])
] UNSPEC_GOTOFF test.cc:44 213 {*leasi}
 (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2])
(nil)))
(insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129])
(mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
(mem/u/c:DF (reg:SI 0 ax [147]) [5  S8 A64]))) test.cc:44
790 {*fop_df_comm_sse}
 (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128])
(const_double:DF
2.9997371893933895137251965934410691261292e-4
[0x0.9d495182a99308p-11]))
(nil)))

Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc

Thanks,
Ilya


 Ilya



pie-2014-08-28.patch
Description: Binary data


test.cc
Description: Binary data


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Ilya Enkovich
2014-08-28 22:58 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to 
 share my results.

  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
 -   (match_operand:TF 1 general_no_elim_operand x,*roF))]
 +   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]

 Can you please explain the reason for this change (and a couple of
 similar changes to push patterns) ?

This is a workaround for stability problem with reload.  Immediate
operands cause new usages of pseudo PIC register in reload which leads
to wrong registers allocation.  These changes wouldn't be required
after reload issue if resolved.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 12:58, Uros Bizjak wrote:

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:


On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
32bit PIC mode.  It was decided that the best approach would be to not fix ebx 
register, use speudo register for GOT base address and let allocator do the 
rest.  This should be similar to how clang and icc work with GOT base address.  
I've been working for some time on such patch and now want to share my results.



  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
-   (match_operand:TF 1 general_no_elim_operand x,*roF))]
+   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]


Can you please explain the reason for this change (and a couple of
similar changes to push patterns) ?

I'd recommend dropping them from the WIP postings.

jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 07:01, Uros Bizjak wrote:

On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

Hi,

On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
32bit PIC mode.  It was decided that the best approach would be to not fix ebx 
register, use speudo register for GOT base address and let allocator do the 
rest.  This should be similar to how clang and icc work with GOT base address.  
I've been working for some time on such patch and now want to share my results.


+#define PIC_OFFSET_TABLE_REGNUM
  \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
+ || TARGET_PECOFF))
  \
+   || !flag_pic ? INVALID_REGNUM   \
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
+  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
+   : reload_completed ? REGNO (pic_offset_table_rtx)   \
 : REAL_PIC_OFFSET_TABLE_REGNUM)

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.
I think for the initial WIP patch it was fine.  However I think we all 
agree that we want EBX as an allocatable register without any special 
conditions.  So I'd recommend pulling this out of the WIP patches as well.

Jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-29 Thread Jeff Law

On 08/28/14 02:37, Ilya Enkovich wrote:

2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:

On 08/26/14 15:42, Ilya Enkovich wrote:


diff --git a/gcc/calls.c b/gcc/calls.c
index 4285ec1..85dae6b 100644
--- a/gcc/calls.c
+++ b/gcc/calls.c
@@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

+if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
+  {
+   gcc_assert (pic_offset_table_rtx);
+   args[j].tree_value = make_tree (ptr_type_node,
+   pic_offset_table_rtx);
+   j--;
+  }
+
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;


So why do you need this?  Can't this be handled in the call/call_value
expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
inside ix86_expand_call?  Basically I'm not seeing the need for another
target hook here.  I think that would significantly simply the patch as
well.


GOT base address become an additional implicit arg with EBX relaxed
and I handled it as all other args. I can move EBX initialization into
ix86_expand_call. Would still need some hint from target to init
pic_offset_table_rtx with proper value in the beginning of function
expand.
It doesn't really need to be an argument in the traditional sense and 
adding it just complicates things with a target implementation detail as 
far as I can see.


I think you'll find that if you have the call pattern emit a copy from 
pic_offset_table_rtx into EBX and attach of use of EBX to the call then 
most of the code you've written to add the implicit argument just 
disappears.


jeff



Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com:
 On 2014-08-26 5:42 PM, Ilya Enkovich wrote:

 Hi,

 Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
 I do not have a small reproducer but the problem can be easily reproduced on
 SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
 value is compared with float constant 1.0.  It is inlined into read_arch
 function and can be easily found in RTL dump of function read_arch as a
 float comparison with 1.0 after the first call to strtod function.

 Here is a compilation string I use:

 gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
 -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
 -DSPEC_CPU2000read_arch.c

 In my final assembler comparison with 1.0 looks like:

 comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length =
 7]

 and %ebp here doesn't have a proper value.

 I'll try to make a smaller reproducer if these instructions don't help.


 I've managed to reproduce it.  Although it would be better to send the patch
 as an attachment.

 The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
 Then in a region when a *new* pseudo used as PIC we rematerialize a constant
 which transformed in memory addressed through *original* PIC pseudo.

 To solve the problem we should prevent such splitting and guarantee that PIC
 pseudo allocnos in different region gets the same hard reg.

 The following patch should solve the problem.


Thanks for the patch! I'll try it and be back with results.

Ilya



Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

GOT base address become an additional implicit arg with EBX relaxed
and I handled it as all other args. I can move EBX initialization into
ix86_expand_call. Would still need some hint from target to init
pic_offset_table_rtx with proper value in the beginning of function
expand.

Thanks,
Ilya



 Jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

Maybe you can you use get_hard_reg_initial_val for this?

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 16:42 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

Actually there is no input hard reg holding GOT address.  Currently I
use initialization with ebx with following ebx initialization in
prolog_epilog pass.  But this is a temporary workaround.  It is
inefficient because always uses callee save reg to get GOT address.  I
suppose we should generate pseudo reg for pic_offset_table_rtx and
also set_got with this register as a destination in expand pass.
After register allocation set_got may be transformed into get_pc_thunk
call with proper hard reg.  But some target hook has to be used for
this.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to share 
 my results.

+#define PIC_OFFSET_TABLE_REGNUM
 \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
+ || TARGET_PECOFF))
 \
+   || !flag_pic ? INVALID_REGNUM   \
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
+  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
+   : reload_completed ? REGNO (pic_offset_table_rtx)   \
: REAL_PIC_OFFSET_TABLE_REGNUM)

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
friends will automatically emit intialization of a pseudo from
pic_offset_table_rtx hard reg. After reload, real initialization of
pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
don't know if this works with current implementation of dynamic
pic_offset_table_rtx selection, though.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 17:01 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to 
 share my results.

 +#define PIC_OFFSET_TABLE_REGNUM
  \
 +  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
 + || TARGET_PECOFF))
  \
 +   || !flag_pic ? INVALID_REGNUM   \
 +   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
 +  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
 +   : reload_completed ? REGNO (pic_offset_table_rtx)   \
 : REAL_PIC_OFFSET_TABLE_REGNUM)

 I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
 allocatable register. This way, we can avoid all mess with implicit
 xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
 allocatable EBX would allow us to introduce __builtin_cpuid builtin
 and cleanup cpiud.h.

We should show nice performance to have this feature enabled by
default.  Currently patch causes a set of performance losses. I have a
version of this patch where EBX is relaxed by a compiler flag, not
tune flag.

Ilya


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 17:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

 Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
 friends will automatically emit intialization of a pseudo from
 pic_offset_table_rtx hard reg. After reload, real initialization of
 pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
 don't know if this works with current implementation of dynamic
 pic_offset_table_rtx selection, though.

That means you should choose some hard reg early before register
allocation to be used for PIC reg initialization.  I do not like we
have to do this and want to just generate set_got with pseudo reg and
do not involve any additional hard reg. That would look like

(insn/f 168 167 169 2 (parallel [
(set (reg:SI 127)
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) test.cc:42 -1
 (expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)))

after expand pass.  r127 is pic_offset_table_rtx here. And after
reload it would become:

(insn/f 168 167 169 2 (parallel [
(set (reg:SI 3 bx)
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) test.cc:42 -1
 (expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)))

And no additional actions are required on pro_and_epilogue.  Also it
simplifies analysis whether we should generate set_got at all.
Current we check hard reg is ever live which is wrong with not fixed
ebx because any usage of hard reg used to init GOT doesn't mean GOT
usage.  And with my proposed scheme unused GOT would mean DCE just
removes useless set_got.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 3:29 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE 
 from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

 Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
 friends will automatically emit intialization of a pseudo from
 pic_offset_table_rtx hard reg. After reload, real initialization of
 pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
 don't know if this works with current implementation of dynamic
 pic_offset_table_rtx selection, though.

 That means you should choose some hard reg early before register
 allocation to be used for PIC reg initialization.  I do not like we
 have to do this and want to just generate set_got with pseudo reg and
 do not involve any additional hard reg. That would look like

 (insn/f 168 167 169 2 (parallel [
 (set (reg:SI 127)
 (unspec:SI [
 (const_int 0 [0])
 ] UNSPEC_SET_GOT))
 (clobber (reg:CC 17 flags))
 ]) test.cc:42 -1
  (expr_list:REG_CFA_FLUSH_QUEUE (nil)
 (nil)))

 after expand pass.  r127 is pic_offset_table_rtx here. And after
 reload it would become:

 (insn/f 168 167 169 2 (parallel [
 (set (reg:SI 3 bx)
 (unspec:SI [
 (const_int 0 [0])
 ] UNSPEC_SET_GOT))
 (clobber (reg:CC 17 flags))
 ]) test.cc:42 -1
  (expr_list:REG_CFA_FLUSH_QUEUE (nil)
 (nil)))

 And no additional actions are required on pro_and_epilogue.  Also it
 simplifies analysis whether we should generate set_got at all.
 Current we check hard reg is ever live which is wrong with not fixed
 ebx because any usage of hard reg used to init GOT doesn't mean GOT
 usage.  And with my proposed scheme unused GOT would mean DCE just
 removes useless set_got.

Yes this is better. I was under impression you want to retain current
initialization insertion in expand_prologue.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Florian Weimer

On 08/28/2014 03:01 PM, Uros Bizjak wrote:

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.


It also makes writing solid inline assembly which has to use %ebx for 
some reason much easier.  We just fixed a glibc bug related to that.


--
Florian Weimer / Red Hat Product Security


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to share 
 my results.

  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
 -   (match_operand:TF 1 general_no_elim_operand x,*roF))]
 +   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]

Can you please explain the reason for this change (and a couple of
similar changes to push patterns) ?

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com:
 On 2014-08-26 5:42 PM, Ilya Enkovich wrote:

 Hi,

 Here is a patch I tried.  I apply it over revision 214215.  Unfortunately
 I do not have a small reproducer but the problem can be easily reproduced on
 SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float
 value is compared with float constant 1.0.  It is inlined into read_arch
 function and can be easily found in RTL dump of function read_arch as a
 float comparison with 1.0 after the first call to strtod function.

 Here is a compilation string I use:

 gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math
 -mfpmath=sse -m32  -march=slm -fPIE -pie -c -o read_arch.o
 -DSPEC_CPU2000read_arch.c

 In my final assembler comparison with 1.0 looks like:

 comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length =
 7]

 and %ebp here doesn't have a proper value.

 I'll try to make a smaller reproducer if these instructions don't help.


 I've managed to reproduce it.  Although it would be better to send the patch
 as an attachment.

 The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC.
 Then in a region when a *new* pseudo used as PIC we rematerialize a constant
 which transformed in memory addressed through *original* PIC pseudo.

 To solve the problem we should prevent such splitting and guarantee that PIC
 pseudo allocnos in different region gets the same hard reg.

 The following patch should solve the problem.


Thanks for the patch! I'll try it and be back with results.

Ilya



Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

GOT base address become an additional implicit arg with EBX relaxed
and I handled it as all other args. I can move EBX initialization into
ix86_expand_call. Would still need some hint from target to init
pic_offset_table_rtx with proper value in the beginning of function
expand.

Thanks,
Ilya



 Jeff


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote:
 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

Maybe you can you use get_hard_reg_initial_val for this?

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 16:42 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com 
 wrote:
 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com:
 On 08/26/14 15:42, Ilya Enkovich wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

Actually there is no input hard reg holding GOT address.  Currently I
use initialization with ebx with following ebx initialization in
prolog_epilog pass.  But this is a temporary workaround.  It is
inefficient because always uses callee save reg to get GOT address.  I
suppose we should generate pseudo reg for pic_offset_table_rtx and
also set_got with this register as a destination in expand pass.
After register allocation set_got may be transformed into get_pc_thunk
call with proper hard reg.  But some target hook has to be used for
this.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to share 
 my results.

+#define PIC_OFFSET_TABLE_REGNUM
 \
+  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
+ || TARGET_PECOFF))
 \
+   || !flag_pic ? INVALID_REGNUM   \
+   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
+  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
+   : reload_completed ? REGNO (pic_offset_table_rtx)   \
: REAL_PIC_OFFSET_TABLE_REGNUM)

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
friends will automatically emit intialization of a pseudo from
pic_offset_table_rtx hard reg. After reload, real initialization of
pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
don't know if this works with current implementation of dynamic
pic_offset_table_rtx selection, though.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 17:01 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:
 Hi,

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to 
 share my results.

 +#define PIC_OFFSET_TABLE_REGNUM
  \
 +  ((TARGET_64BIT  (ix86_cmodel == CM_SMALL_PIC   \
 + || TARGET_PECOFF))
  \
 +   || !flag_pic ? INVALID_REGNUM   \
 +   : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM   \
 +  : REAL_PIC_OFFSET_TABLE_REGNUM)  \
 +   : reload_completed ? REGNO (pic_offset_table_rtx)   \
 : REAL_PIC_OFFSET_TABLE_REGNUM)

 I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
 allocatable register. This way, we can avoid all mess with implicit
 xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
 allocatable EBX would allow us to introduce __builtin_cpuid builtin
 and cleanup cpiud.h.

We should show nice performance to have this feature enabled by
default.  Currently patch causes a set of performance losses. I have a
version of this patch where EBX is relaxed by a compiler flag, not
tune flag.

Ilya


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Ilya Enkovich
2014-08-28 17:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com:
 On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

 Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
 friends will automatically emit intialization of a pseudo from
 pic_offset_table_rtx hard reg. After reload, real initialization of
 pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
 don't know if this works with current implementation of dynamic
 pic_offset_table_rtx selection, though.

That means you should choose some hard reg early before register
allocation to be used for PIC reg initialization.  I do not like we
have to do this and want to just generate set_got with pseudo reg and
do not involve any additional hard reg. That would look like

(insn/f 168 167 169 2 (parallel [
(set (reg:SI 127)
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) test.cc:42 -1
 (expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)))

after expand pass.  r127 is pic_offset_table_rtx here. And after
reload it would become:

(insn/f 168 167 169 2 (parallel [
(set (reg:SI 3 bx)
(unspec:SI [
(const_int 0 [0])
] UNSPEC_SET_GOT))
(clobber (reg:CC 17 flags))
]) test.cc:42 -1
 (expr_list:REG_CFA_FLUSH_QUEUE (nil)
(nil)))

And no additional actions are required on pro_and_epilogue.  Also it
simplifies analysis whether we should generate set_got at all.
Current we check hard reg is ever live which is wrong with not fixed
ebx because any usage of hard reg used to init GOT doesn't mean GOT
usage.  And with my proposed scheme unused GOT would mean DCE just
removes useless set_got.

Ilya


 Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Thu, Aug 28, 2014 at 3:29 PM, Ilya Enkovich enkovich@gmail.com wrote:

 diff --git a/gcc/calls.c b/gcc/calls.c
 index 4285ec1..85dae6b 100644
 --- a/gcc/calls.c
 +++ b/gcc/calls.c
 @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals
 ATTRIBUTE_UNUSED,
   call_expr_arg_iterator iter;
   tree arg;

 +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype))
 +  {
 +   gcc_assert (pic_offset_table_rtx);
 +   args[j].tree_value = make_tree (ptr_type_node,
 +   pic_offset_table_rtx);
 +   j--;
 +  }
 +
   if (struct_value_addr_value)
 {
 args[j].tree_value = struct_value_addr_value;

 So why do you need this?  Can't this be handled in the call/call_value
 expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE 
 from
 inside ix86_expand_call?  Basically I'm not seeing the need for another
 target hook here.  I think that would significantly simply the patch as
 well.

 GOT base address become an additional implicit arg with EBX relaxed
 and I handled it as all other args. I can move EBX initialization into
 ix86_expand_call. Would still need some hint from target to init
 pic_offset_table_rtx with proper value in the beginning of function
 expand.

 Maybe you can you use get_hard_reg_initial_val for this?

 Actually there is no input hard reg holding GOT address.  Currently I
 use initialization with ebx with following ebx initialization in
 prolog_epilog pass.  But this is a temporary workaround.  It is
 inefficient because always uses callee save reg to get GOT address.  I
 suppose we should generate pseudo reg for pic_offset_table_rtx and
 also set_got with this register as a destination in expand pass.
 After register allocation set_got may be transformed into get_pc_thunk
 call with proper hard reg.  But some target hook has to be used for
 this.

 Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and
 friends will automatically emit intialization of a pseudo from
 pic_offset_table_rtx hard reg. After reload, real initialization of
 pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I
 don't know if this works with current implementation of dynamic
 pic_offset_table_rtx selection, though.

 That means you should choose some hard reg early before register
 allocation to be used for PIC reg initialization.  I do not like we
 have to do this and want to just generate set_got with pseudo reg and
 do not involve any additional hard reg. That would look like

 (insn/f 168 167 169 2 (parallel [
 (set (reg:SI 127)
 (unspec:SI [
 (const_int 0 [0])
 ] UNSPEC_SET_GOT))
 (clobber (reg:CC 17 flags))
 ]) test.cc:42 -1
  (expr_list:REG_CFA_FLUSH_QUEUE (nil)
 (nil)))

 after expand pass.  r127 is pic_offset_table_rtx here. And after
 reload it would become:

 (insn/f 168 167 169 2 (parallel [
 (set (reg:SI 3 bx)
 (unspec:SI [
 (const_int 0 [0])
 ] UNSPEC_SET_GOT))
 (clobber (reg:CC 17 flags))
 ]) test.cc:42 -1
  (expr_list:REG_CFA_FLUSH_QUEUE (nil)
 (nil)))

 And no additional actions are required on pro_and_epilogue.  Also it
 simplifies analysis whether we should generate set_got at all.
 Current we check hard reg is ever live which is wrong with not fixed
 ebx because any usage of hard reg used to init GOT doesn't mean GOT
 usage.  And with my proposed scheme unused GOT would mean DCE just
 removes useless set_got.

Yes this is better. I was under impression you want to retain current
initialization insertion in expand_prologue.

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Florian Weimer

On 08/28/2014 03:01 PM, Uros Bizjak wrote:

I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an
allocatable register. This way, we can avoid all mess with implicit
xchgs in atomic_compare_and_swapdwi_doubleword. Also, having
allocatable EBX would allow us to introduce __builtin_cpuid builtin
and cleanup cpiud.h.


It also makes writing solid inline assembly which has to use %ebx for 
some reason much easier.  We just fixed a glibc bug related to that.


--
Florian Weimer / Red Hat Product Security


Re: Enable EBX for x86 in 32bits PIC code

2014-08-28 Thread Uros Bizjak
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote:

 On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 
 32bit PIC mode.  It was decided that the best approach would be to not fix 
 ebx register, use speudo register for GOT base address and let allocator do 
 the rest.  This should be similar to how clang and icc work with GOT base 
 address.  I've been working for some time on such patch and now want to share 
 my results.

  (define_insn *pushtf
[(set (match_operand:TF 0 push_operand =,)
 -   (match_operand:TF 1 general_no_elim_operand x,*roF))]
 +   (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))]

Can you please explain the reason for this change (and a couple of
similar changes to push patterns) ?

Uros.


Re: Enable EBX for x86 in 32bits PIC code

2014-08-27 Thread Vladimir Makarov

On 2014-08-26 5:42 PM, Ilya Enkovich wrote:

Hi,

Here is a patch I tried.  I apply it over revision 214215.  Unfortunately I do 
not have a small reproducer but the problem can be easily reproduced on 
SPEC2000 benchmark 175.vpr.  The problem is in read_arch.c:701 where float 
value is compared with float constant 1.0.  It is inlined into read_arch 
function and can be easily found in RTL dump of function read_arch as a float 
comparison with 1.0 after the first call to strtod function.

Here is a compilation string I use:

gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse 
-m32  -march=slm -fPIE -pie -c -o read_arch.o   -DSPEC_CPU2000
read_arch.c

In my final assembler comparison with 1.0 looks like:

comiss  .LC11@GOTOFF(%ebp), %xmm0   # 1101  *cmpisf_sse [length = 7]

and %ebp here doesn't have a proper value.

I'll try to make a smaller reproducer if these instructions don't help.


I've managed to reproduce it.  Although it would be better to send the 
patch as an attachment.


The problem is actually in IRA not LRA.  IRA splits pseudo used for PIC. 
 Then in a region when a *new* pseudo used as PIC we rematerialize a 
constant which transformed in memory addressed through *original* PIC 
pseudo.


To solve the problem we should prevent such splitting and guarantee that 
PIC pseudo allocnos in different region gets the same hard reg.


The following patch should solve the problem.


Index: ira-color.c
===
--- ira-color.c (revision 214576)
+++ ira-color.c (working copy)
@@ -3239,9 +3239,10 @@
  ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass);
  ira_assert (bitmap_bit_p (subloop_node-all_allocnos,
ALLOCNO_NUM (subloop_allocno)));
- if ((flag_ira_region == IRA_REGION_MIXED)
-  (loop_tree_node-reg_pressure[pclass]
- = ira_class_hard_regs_num[pclass]))
+ if ((flag_ira_region == IRA_REGION_MIXED
+   (loop_tree_node-reg_pressure[pclass]
+  = ira_class_hard_regs_num[pclass]))
+ || regno == (int) REGNO (pic_offset_table_rtx))
{
  if (! ALLOCNO_ASSIGNED_P (subloop_allocno))
{
Index: ira-emit.c
===
--- ira-emit.c  (revision 214576)
+++ ira-emit.c  (working copy)
@@ -620,7 +620,8 @@
  /* don't create copies because reload can spill an
 allocno set by copy although the allocno will not
 get memory slot.  */
- || ira_equiv_no_lvalue_p (regno)))
+ || ira_equiv_no_lvalue_p (regno)
+ || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx)))
continue;
  original_reg = allocno_emit_reg (allocno);
  if (parent_allocno == NULL


  1   2   >