RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Those macros use =b etc. in asm constraints, so IMHO you'll get the same error as for say: int foo (void) { bar (); int i = 0; asm volatile ( : +b (i)); bar (); return i; } when compiled by gcc 4.9 and earlier with -O2 -m32 -fpic: error: inconsistent operand constraints in an ‘asm’ I see, thanks! Fixed patch is here - https://sourceware.org/ml/libc-alpha/2014-10/msg00746.html Igor Jakub
RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Posted a patch in libc-alpha: https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Jeff Law Sent: Saturday, October 25, 2014 3:42 AM To: Evgeny Stupachenko; Andrew Pinski Cc: Uros Bizjak; Vladimir Makarov; GCC Patches Subject: Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code On 10/24/14 17:37, Evgeny Stupachenko wrote: What if we remove the check? glibc build pass? That would be my inclination... But it's not my decision to make. The first check is to verify glibc builds and passes its testsuite with the new compiler and that check removed. jeff
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote: Posted a patch in libc-alpha: https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html That looks wrong. The non-PIC patterns that are enabled unconditionally with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode. Jakub
RE: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote: Posted a patch in libc-alpha: https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html That looks wrong. The non-PIC patterns that are enabled unconditionally with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode. Could you please specify why do you think it is wrong for PIC mode for pre-GCC 5? Thanks, Igor Jakub
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Thu, Oct 30, 2014 at 12:34:45PM +, Zamyatin, Igor wrote: On Thu, Oct 30, 2014 at 08:48:57AM +, Zamyatin, Igor wrote: Posted a patch in libc-alpha: https://sourceware.org/ml/libc-alpha/2014-10/msg00701.html That looks wrong. The non-PIC patterns that are enabled unconditionally with the patch set/use ebx, which will not work with pre-GCC 5 in PIC mode. Could you please specify why do you think it is wrong for PIC mode for pre-GCC 5? Those macros use =b etc. in asm constraints, so IMHO you'll get the same error as for say: int foo (void) { bar (); int i = 0; asm volatile ( : +b (i)); bar (); return i; } when compiled by gcc 4.9 and earlier with -O2 -m32 -fpic: error: inconsistent operand constraints in an ‘asm’ Jakub
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
The test passes now. So let's remove xfail. 2014-10-29 Evgeny Stupachenko evstu...@gmail.com gcc/testsuite * gcc.target/i386/pr23098.c: Remove xfail. diff --git a/gcc/testsuite/gcc.target/i386/pr23098.c b/gcc/testsuite/gcc.target/i386/pr23098.c index 7f118dc..7f118bb 100644 --- a/gcc/testsuite/gcc.target/i386/pr23098.c +++ b/gcc/testsuite/gcc.target/i386/pr23098.c @@ -1,7 +1,7 @@ /* PR rtl-optimization/23098 */ /* { dg-do compile } */ /* { dg-options -O2 -fPIC } */ -/* { dg-final { scan-assembler-not \.LC\[0-9\] { xfail *-*-* } } } */ +/* { dg-final { scan-assembler-not \.LC\[0-9\] } } */ /* { dg-require-effective-target ia32 } */ /* { dg-require-effective-target fpic } */ On Thu, Oct 23, 2014 at 4:19 PM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote: Evgeny Stupachenko evstu...@gmail.com writes: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. The unconditional XFAIL is wrong: the test now XPASSes on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu, i686-unknown-linux-gnu for 32-bit. Please fix. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Wed, Oct 29, 2014 at 1:28 PM, Evgeny Stupachenko evstu...@gmail.com wrote: The test passes now. So let's remove xfail. 2014-10-29 Evgeny Stupachenko evstu...@gmail.com gcc/testsuite * gcc.target/i386/pr23098.c: Remove xfail. OK. Thanks, Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This patch breaks glibc's ld.so on i686. glibc has a check to make sure the PIC register is setup correctly: /* Consistency check for position-independent code. */ #ifdef __PIC__ # define check_consistency() \ ({ int __res; \ __asm__ __volatile__ \ (LOAD_PIC_REG_STR (cx) ; \ subl %%ebx, %%ecx; \ je 1f; \ ud2; \ 1:\n \ : =c (__res)); \ __res; }) #endif This depends on ebx being the PIC register. Now we don't have this so we get ud2 in some cases. Thanks, Andrew Pinski
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
What if we remove the check? glibc build pass? On Sat, Oct 25, 2014 at 3:09 AM, Andrew Pinski pins...@gmail.com wrote: On Fri, Oct 10, 2014 at 12:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This patch breaks glibc's ld.so on i686. glibc has a check to make sure the PIC register is setup correctly: /* Consistency check for position-independent code. */ #ifdef __PIC__ # define check_consistency() \ ({ int __res; \ __asm__ __volatile__ \ (LOAD_PIC_REG_STR (cx) ; \ subl %%ebx, %%ecx; \ je 1f; \ ud2; \ 1:\n \ : =c (__res)); \ __res; }) #endif This depends on ebx being the PIC register. Now we don't have this so we get ud2 in some cases. Thanks, Andrew Pinski
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/24/14 17:37, Evgeny Stupachenko wrote: What if we remove the check? glibc build pass? That would be my inclination... But it's not my decision to make. The first check is to verify glibc builds and passes its testsuite with the new compiler and that check removed. jeff
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Evgeny Stupachenko evstu...@gmail.com writes: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. The unconditional XFAIL is wrong: the test now XPASSes on i386-pc-solaris2.11 and x86_64-unknown-linux-gnu, i686-unknown-linux-gnu for 32-bit. Please fix. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 10:03:38AM -0600, Jeff Law wrote: Can you add a PR markers to your changelog PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. Unfortunately this broke bootstrap on x86_64/i686-linux, see http://gcc.gnu.org/PR63534 - pretty much everything with -m32 -fsplit-stack -fpic ICEs, -m32 -fpic -p results in wrong-code, and I see significant code quality regressions even on simple testcases. For the first two, I think (and said it before already) that the current model of emitting set_got from a target hook during RA can't work, as there can be calls in the prologue, and the prologue is inserted before the set_got in that case. I really think the RA should in that case just tell the backend whether and in which register it wants to have the PIC register loaded upon start of the function, and it should be emit prologue pass that should arrange for that. As for the code quality, either some RA improvements are needed, or postreload must be able to fix it up, or hardreg propagation (though, cprop_hardreg is forward propagation rather than backwards, right?). Better before prologue is emitted though, because that will save/restore the badly chosen hard reg too. Jakub
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 11:49 AM, H.J. Lu hjl.to...@gmail.com wrote: On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros. This caused a regression: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 Another bootstrap failure: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63536 -- H.J.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/14/14 07:00, Jakub Jelinek wrote: For the first two, I think (and said it before already) that the current model of emitting set_got from a target hook during RA can't work, as there can be calls in the prologue, and the prologue is inserted before the set_got in that case. I really think the RA should in that case just tell the backend whether and in which register it wants to have the PIC register loaded upon start of the function, and it should be emit prologue pass that should arrange for that. That works for me -- I've been encouraging Intel to push emitting the PIC setup further and further back in the RTL pipeline. Their early patches had it very early in the RTL pipeline and naturally there was fallout/bleedout in various places in the optimizers. I don't see much value in emitting the PIC setup prior to allocation, all I see is problems. As for the code quality, either some RA improvements are needed, or postreload must be able to fix it up, or hardreg propagation (though, cprop_hardreg is forward propagation rather than backwards, right?). Better before prologue is emitted though, because that will save/restore the badly chosen hard reg too. RA improvements are the way to go -- however, my understanding is that overall the code is better now than it was before Intel's changes, so I don't consider the performance side as a blocker for this code. The biggest performance issue identified so far is rematerialization. The initial patch Intel sent to me was totally unacceptable as it just hacked off optimizers rather than digging into the guts of why IRA/LRA was unable to sanely rematerialize the PIC register value. jeff
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Tue, Oct 14, 2014 at 9:43 AM, Jeff Law l...@redhat.com wrote: RA improvements are the way to go -- however, my understanding is that overall the code is better now than it was before Intel's changes, so I don't consider the performance side as a blocker for this code. The new approach improves PIC code quality in functions where there no frequent GOT access and extra register helps. For ld.so and libc.so from glibc build, we use 2 registers to access GOT instead of one register which may lead to lower performance in shared libraries. -- H.J.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 8:03 PM, Jeff Law l...@redhat.com wrote: On 10/10/14 01:42, Evgeny Stupachenko wrote: Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS ?!? That doesn't make a lot of sense. More likely it was Uros's fix from yesterday to regcprop which causes this to pass again. Correct. I've marked it by mistake. The test is flaky and the patch does not change anything for the test. Is it possible you updated your sources between testing runs and as a result picked up Uros's fix? New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX I think you should remove the dg-error directive. That turns this test into a simple confirmation that we can use %ebx in an asm even when generating PIC code. Can you add a PR markers to your changelog PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate If you're not going to fix it, then you should xfail it. gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Right. Remove the dg-error and turn this into a test that effective verifies that %ebx is no longer fixed when generating PIC code on i686. With those changes this is OK for the trunk. jeff ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. enabling_ebx_tests.patch Description: Binary data
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
-#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. I don't see how we can avoid negative condition here. If we remove not from !flag_pic we'll need to add not to TARGET_64BIT and TARGET_PECOFF. I've done it this way: +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic \ + ? INVALID_REGNUM\ + : pic_offset_table_rtx \ + ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) Is it ok? On Fri, Oct 10, 2014 at 6:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Updated ChangeLog: 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove PIC register initialization now performed in ix86_init_pic_reg. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. * config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return INVALID_REGNUM if pic_offset_table_rtx exists. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? I've added the following comment to the patch: -in which case we return (%ecx - %ebx) + foo. */ +in which case we return (%ecx - %ebx) + foo. + +Note that when pseudo_pic_reg is used we can generate it only +before reload_completed. */ On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after after the e-mail lines. Jakub
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Patch updated with the comment: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2a64d2d..5fd6a82 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER))) cost++; + /* When address base or index is pic_offset_table_rtx we don't increase + address cost. When a memop with pic_offset_table_rtx is not invariant + itself it most likely means that base or index is not invariant. + Therefore only pic_offset_table_rtx could be hoisted out, which is not + profitable for x86. */ if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; On Fri, Oct 10, 2014 at 3:04 PM, Uros Bizjak ubiz...@gmail.com wrote: On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote: the patch improves performance when previous are applied. It makes RTL loop invariant behavior for GOT loads same as it was before the 2 previous patches. The patch fixes x86 address cost so that cost for addresses with GOT register becomes less, how it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) = FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes condition result the same when parts.base is GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; Please add a short comment here, explaining the reason for new condition. if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; Otherwise LGTM, but please repost the patch with a comment. Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 5:01 PM, Evgeny Stupachenko evstu...@gmail.com wrote: -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. I don't see how we can avoid negative condition here. If we remove not from !flag_pic we'll need to add not to TARGET_64BIT and TARGET_PECOFF. I've done it this way: +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic \ + ? INVALID_REGNUM\ + : pic_offset_table_rtx \ + ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) Is it ok? Oh, indeed. I missed the logical or. Maybe put the first condition into parenthesis, to avoid confusion. OK in any case. Thanks, Uros.
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 5:17 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Patch updated with the comment: diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index 2a64d2d..5fd6a82 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12455,9 +12455,18 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER))) cost++; + /* When address base or index is pic_offset_table_rtx we don't increase + address cost. When a memop with pic_offset_table_rtx is not invariant + itself it most likely means that base or index is not invariant. + Therefore only pic_offset_table_rtx could be hoisted out, which is not + profitable for x86. */ if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; LGTM. OK. Thanks, Uros.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/13/14 08:53, Evgeny Stupachenko wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Looks like you goof'd the patch slightly (reversed). It's trivial enough that I can see what the correctly ordered patch is doing. OK for the trunk at the same time the rest of the bits go in. jeff
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros. enabling_ebx_tests.patch Description: Binary data
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 6:32 PM, Evgeny Stupachenko evstu...@gmail.com wrote: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. OK. Thanks, Uros.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Mon, Oct 13, 2014 at 9:32 AM, Evgeny Stupachenko evstu...@gmail.com wrote: Reattached. On Mon, Oct 13, 2014 at 8:22 PM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Oct 13, 2014 at 4:53 PM, Evgeny Stupachenko evstu...@gmail.com wrote: ChangeLog for testsuite: 2014-10-13 Evgeny Stupachenko evstu...@gmail.com PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 * gcc.target/i386/pic-1.c: Remove dg-error as test should pass now. * gcc.target/i386/pr55458.c: Likewise. * gcc.target/i386/pr47602.c: New. * gcc.target/i386/pr23098.c: Move to XFAIL. Reversed patch was attached. Please repost. Uros. This caused a regression: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63527 -- H.J.
[PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Performance on Silvermont (for the group of 3 patches): at -Ofast -flto -funroll-loops -fPIC EEMBC benchs: +6% CoreMark +3% Denmark Individual test gained up to +25% SPEC2000int is almost unchanged, SPEC2000fp +1,7%: at -O2 -fPIC EEMBC benchs: +2% CoreMark +5% Denmark Individual test gained up to +25% SPEC2000int and SPEC2000fp are almost unchanged: ChangeLog: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * doc/tm.texi.in (TARGET_USE_PSEUDO_PIC_REG, TARGET_INIT_PIC_REG): Document. * doc/tm.texi: Regenerate. * gcc/function.c (assign_parms): Generate pseudo register for PIC. * gcc/init-regs.c (initialize_uninitialized_regs): Ignor pseudo PIC register. * gcc/ira-color.c (color_pass): Add check on pseudo register. * gcc/ira-emit.c (change_loop): Don't create copies for PIC pseudo register. * gcc/ira.c (split_live_ranges_for_shrink_wrap): Add check on pseudo register. (ira): Add target specific PIC register initialization. (do_reload): Keep PIC pseudo register. * gcc/lra-assigns.c (spill_for): Add checks on pseudo register. * gcc/lra-constraints.c (contains_symbol_ref_p): New. (lra_constraints): Enable lra risky transformations when PIC is pseudo register. * gcc/shrink-wrap.c (try_shrink_wrapping): Add check on pseudo register. * gcc/target.def (use_pseudo_pic_reg): New. (init_pic_reg): New. enabling_ebx_general.patch Description: Binary data
[PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. enabling_ebx_i386.patch Description: Binary data
[PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
the patch improves performance when previous are applied. It makes RTL loop invariant behavior for GOT loads same as it was before the 2 previous patches. It improves 164.gzip (+9%), 253.perlbmk (+2%) giving ~0.5% to SPEC2000int (compiled with “-m32 -Ofast -flto -funroll-loops -fPIC” For example in 164.gzip. Before enabling EBX: loop begin: (I) 1. SI162 = prev (global address) 2. SI163 = SI142 0xfff (SI 142 modified in the loop) 3. SI164 = EBX + SI162 4. HI107 = SI163*2 + SI164 5. SI142 = HI107 Only INSN 1. treated as invariant and later combine propagates 2,3,4 into 5. After enabling EBX: loop begin: (I) 1. SI163 = prev (global address) 2. SI164 = SI142 0xfff (SI 142 modified in the loop) (I) 3. SI165 = SI143 + SI163 4. HI107 = SI164*2 + SI165 5. SI142 = HI107 INSNs 1. and 3. are treated as invariants (143 is GOT register) and hoisted outside the loop After that combine pass was unable to combine INSNs inside and outside the loop, which lead to higher register pressure and therefore new spills/fills. The patch fixes x86 address cost so that cost for addresses with GOT register becomes less, how it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) = FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes condition result the same when parts.base is GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++;
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. Couple of nits below. +/* Set regs_ever_live for PIC base address register + to true if required. */ +static void +set_pic_reg_ever_alive () Please rename this function to set_pic_reg_ever_live. -#define PIC_OFFSET_TABLE_REGNUM \ - ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ - || TARGET_PECOFF)) \ - || !flag_pic ? INVALID_REGNUM \ - : reload_completed ? REGNO (pic_offset_table_rtx) \ +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : pic_offset_table_rtx ? INVALID_REGNUM \ : REAL_PIC_OFFSET_TABLE_REGNUM) No negative conditions, please. Also, please follow established multi-level condition format, please see e.g. HARD_REGNO_NREGS definition in i386.h. OK for mainline after infrastructure patch is approved. Thanks, Uros.
Re: [PATCH 3/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:58 AM, Evgeny Stupachenko evstu...@gmail.com wrote: the patch improves performance when previous are applied. It makes RTL loop invariant behavior for GOT loads same as it was before the 2 previous patches. The patch fixes x86 address cost so that cost for addresses with GOT register becomes less, how it was before enabling EBX. In x86_address_cost the result of “REGNO (parts.base) = FIRST_PSEUDO_REGISTER” for hard ebx was always false. The patch makes condition result the same when parts.base is GOT register (the same for parts.index). 2014-10-08 Evgeny Stupachenko evstu...@gmail.com * gcc/config/i386/i386.c (ix86_address_cost): Lower cost for when address contains GOT register. diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b43e870..9d8cfd1 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -12497,8 +12497,12 @@ ix86_address_cost (rtx x, enum machine_mode, addr_space_t, bool) cost++; Please add a short comment here, explaining the reason for new condition. if (parts.base + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.base)) (!REG_P (parts.base) || REGNO (parts.base) = FIRST_PSEUDO_REGISTER) parts.index + (!pic_offset_table_rtx + || REGNO (pic_offset_table_rtx) != REGNO(parts.index)) (!REG_P (parts.index) || REGNO (parts.index) = FIRST_PSEUDO_REGISTER) parts.base != parts.index) cost++; Otherwise LGTM, but please repost the patch with a comment. Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove irrelevant code. Please mention *which* code you removed here. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. (PIC_OFFSET_TABLE_REGNUM): New check. This is not New check, but changed one. Please mention *what* changed. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? Uros.
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after after the e-mail lines. Jakub
Re: [PATCH 2/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
Updated ChangeLog: 2014-10-10 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * config/i386/i386.c (ix86_use_pseudo_pic_reg): New. (ix86_init_pic_reg): New. (ix86_select_alt_pic_regnum): Add check on pseudo register. (ix86_save_reg): Likewise. (ix86_expand_prologue): Remove PIC register initialization now performed in ix86_init_pic_reg. (ix86_output_function_epilogue): Add check on pseudo register. (set_pic_reg_ever_alive): New. (legitimize_pic_address): Replace df_set_regs_ever_live with new set_pic_reg_ever_alive. (legitimize_tls_address): Likewise. (ix86_pic_register_p): New check. (ix86_delegitimize_address): Add check on pseudo register. (ix86_expand_call): Insert move from pseudo PIC register to ABI defined REAL_PIC_OFFSET_TABLE_REGNUM. (TARGET_INIT_PIC_REG): New. (TARGET_USE_PSEUDO_PIC_REG): New. * config/i386/i386.h (PIC_OFFSET_TABLE_REGNUM): Return INVALID_REGNUM if pic_offset_table_rtx exists. - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) Hm, can you please add a comment for this change? I've added the following comment to the patch: -in which case we return (%ecx - %ebx) + foo. */ +in which case we return (%ecx - %ebx) + foo. + +Note that when pseudo_pic_reg is used we can generate it only +before reload_completed. */ On Fri, Oct 10, 2014 at 4:36 PM, Jakub Jelinek ja...@redhat.com wrote: On Fri, Oct 10, 2014 at 02:34:07PM +0200, Rainer Orth wrote: Uros Bizjak ubiz...@gmail.com writes: On Fri, Oct 10, 2014 at 9:43 AM, Evgeny Stupachenko evstu...@gmail.com wrote: i386 specific part of the patch: 2014-10-08 Ilya Enkovich ilya.enkov...@intel.com Vladimir Makarov vmaka...@redhat.com * gcc/config/i386/i386.c (ix86_use_pseudo_pic_reg): New. Evgeny: here and in your other submissions: drop the gcc prefix from the pathnames. They are all relative to the directory the ChangeLog lives in. And add a blank line after after the e-mail lines. Jakub
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 2014-10-10 3:42 AM, Evgeny Stupachenko wrote: Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Performance on Silvermont (for the group of 3 patches): at -Ofast -flto -funroll-loops -fPIC EEMBC benchs: +6% CoreMark +3% Denmark Individual test gained up to +25% SPEC2000int is almost unchanged, SPEC2000fp +1,7%: at -O2 -fPIC EEMBC benchs: +2% CoreMark +5% Denmark Individual test gained up to +25% SPEC2000int and SPEC2000fp are almost unchanged: Great results! The changes to RA (all ira* lra* file changes) is ok for me. I'll probably add some comments to these changes on next week. Thanks.
Re: [PATCH 1/X, i386, PR54232] Enable EBX for x86 in 32bits PIC code
On 10/10/14 01:42, Evgeny Stupachenko wrote: Hi, The patch enables EBX in RA for x86 32bits PIC mode. It was discussed here: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02513.html Now there is working version with good performance and stability level - it could be a solid first step of EBX enabling. Bootstrap and make check passed. There are several changes in -m32 make check. New pass: gcc.target/i386/pr57003.c - before patch there was not enough registers to PASS ?!? That doesn't make a lot of sense. More likely it was Uros's fix from yesterday to regcprop which causes this to pass again. Is it possible you updated your sources between testing runs and as a result picked up Uros's fix? New fails: gcc.target/i386/pic-1.c (test for errors, line 12) - now there are no errors as we can do asm insertions with EBX I think you should remove the dg-error directive. That turns this test into a simple confirmation that we can use %ebx in an asm even when generating PIC code. Can you add a PR markers to your changelog PR target/8340 PR middle-end/47602 PR rtl-optimization/55458 Actually I think there is an additional test in 47602. Can you please add it to the suite? You'll also want to change the state of 47602 to RESOLVED/FIXED. gcc.target/i386/pr23098.c scan-assembler-not .LC[0-9] - potential performance opportunity using constant immediate If you're not going to fix it, then you should xfail it. gcc.target/i386/pr55458.c (test for errors, line 10) - now there are no errors as there enough registers Right. Remove the dg-error and turn this into a test that effective verifies that %ebx is no longer fixed when generating PIC code on i686. With those changes this is OK for the trunk. jeff
Re: Enable EBX for x86 in 32bits PIC code
On Wed, Sep 24, 2014 at 03:20:44PM -0600, Jeff Law wrote: On 09/24/14 14:32, Ilya Enkovich wrote: 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. What's the RTL before after allocation? Feel free to just pass along the dump files for sum_r4 that you referenced in a prior message. I wonder if during/after reload we just couldn't look at ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg. Or is that the other case, where you don't have any PIC register replacement around, and want to subtract something? Perhaps in that case we could just subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better around. Jakub
Re: Enable EBX for x86 in 32bits PIC code
On Wed, Sep 24, 2014 at 03:20:44PM -0600, Jeff Law wrote: On 09/24/14 14:32, Ilya Enkovich wrote: 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. What's the RTL before after allocation? Feel free to just pass along the dump files for sum_r4 that you referenced in a prior message. I wonder if during/after reload we just couldn't look at ORIGINAL_REGNO of hard regs if ix86_use_pseudo_pic_reg. Or is that the other case, where you don't have any PIC register replacement around, and want to subtract something? Perhaps in that case we could just subtract the value of _GLOBAL_OFFSET_TABLE_ symbol if we have nothing better around. Jakub
Re: Enable EBX for x86 in 32bits PIC code
2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: ../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4': ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location msum_r4 (gfc_array_r4 * const restrict retarray, ^ ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8': ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location msum_r8 (gfc_array_r8 * const restrict retarray, ^ ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location Ilya
Re: Enable EBX for x86 in 32bits PIC code
On 09/24/14 00:56, Ilya Enkovich wrote: 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? jeff
Re: Enable EBX for x86 in 32bits PIC code
2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. Ilya jeff
Re: Enable EBX for x86 in 32bits PIC code
On 09/24/14 14:32, Ilya Enkovich wrote: 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. What's the RTL before after allocation? Feel free to just pass along the dump files for sum_r4 that you referenced in a prior message. jeff
Re: Enable EBX for x86 in 32bits PIC code
2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: ../../../../gcc/libgfortran/generated/sum_r4.c: In function 'msum_r4': ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location msum_r4 (gfc_array_r4 * const restrict retarray, ^ ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r4.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c: In function 'msum_r8': ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location msum_r8 (gfc_array_r8 * const restrict retarray, ^ ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location ../../../../gcc/libgfortran/generated/sum_r8.c:195:1: note: non-delegitimized UNSPEC UNSPEC_GOTOFF (1) found in variable location Ilya
Re: Enable EBX for x86 in 32bits PIC code
On 09/24/14 00:56, Ilya Enkovich wrote: 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? jeff
Re: Enable EBX for x86 in 32bits PIC code
2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: 2014-09-23 20:10 GMT+04:00 Jeff Law l...@redhat.com: On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. Ilya jeff
Re: Enable EBX for x86 in 32bits PIC code
On 09/24/14 14:32, Ilya Enkovich wrote: 2014-09-24 19:27 GMT+04:00 Jeff Law l...@redhat.com: On 09/24/14 00:56, Ilya Enkovich wrote: After register allocation we have no idea where GOT address is and therefore delegitimize_address target hook becomes less efficient and cannot remove UNSPECs. That's what I see now when build GCC with patch applied: In theory this shouldn't be too hard to fix. I haven't looked at the code, but it might be something looking explicitly for ebx by register #, or something similar. Which case within delegitimize_address isn't firing as it should after your changes? It is the case I had to fix: @@ -14415,7 +14433,8 @@ ix86_delegitimize_address (rtx x) ... movl foo@GOTOFF(%ecx), %edx in which case we return (%ecx - %ebx) + foo. */ - if (pic_offset_table_rtx) + if (pic_offset_table_rtx + (!reload_completed || !ix86_use_pseudo_pic_reg ())) result = gen_rtx_PLUS (Pmode, gen_rtx_MINUS (Pmode, copy_rtx (addend), pic_offset_table_rtx), result); Originally if there is a UNSPEC_GOTOFFSET but no EBX usage then we just remove this UNSPEC and substract EBX value. With pseudo PIC reg we should use PIC register instead of EBX but it is unclear what to use after register allocation. What's the RTL before after allocation? Feel free to just pass along the dump files for sum_r4 that you referenced in a prior message. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 03 Sep 16:19, Vladimir Makarov wrote: On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Index: ira.c === --- ira.c (revision 214576) +++ ira.c (working copy) @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void) FOR_BB_INSNS (first, insn) { rtx dest = interesting_dest_for_shprep (insn, call_dom); - if (!dest) + if (!dest || dest == pic_offset_table_rtx) continue; rtx newreg = NULL_RTX; Index: lra-assigns.c === --- lra-assigns.c (revision 214576) +++ lra-assigns.c (working copy) @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse } /* Spill pseudos. */ EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi) - if ((int) spill_regno = lra_constraint_new_regno_start - ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) - ! bitmap_bit_p (lra_split_regs, spill_regno) - ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) - ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)) + if ((pic_offset_table_rtx != NULL + spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno = lra_constraint_new_regno_start + ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) + ! bitmap_bit_p (lra_split_regs, spill_regno) + ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) + ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))) goto fail; insn_pseudos_num = 0; if (lra_dump_file != NULL) @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky return; } for (n = 0, i = FIRST_PSEUDO_REGISTER; i max_regno; i++) -if (reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) +if ((pic_offset_table_rtx == NULL_RTX + || i != (int) REGNO (pic_offset_table_rtx)) + reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) sorted_pseudos[n++] = i; qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func); for (i = n - 1; i = 0; i--) @@ -1360,6 +1364,8 @@ assign_by_spills (void) } EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) { + gcc_assert (pic_offset_table_rtx == NULL + || conflict_regno != REGNO (pic_offset_table_rtx)); if ((int) conflict_regno = lra_constraint_new_regno_start) sorted_pseudos[nfails++] = conflict_regno; if (lra_dump_file != NULL) Hi, Here is a patch which combines results of my and Vladimir's work on EBX
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Here is a patch which combines results of my and Vladimir's work on EBX enabling. It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO. It passes bootstrap but there are few new failures in make check. gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit PIC mode gcc.target/i386/pr55458.c fails due to the same reason gcc.target/i386/pr23098.c fails because compiler fails to use float constant as an immediate and loads it from GOT instead Do we have the final decision about having a sompiler flag to control enabling of pseudo PIC register? I think we should have a possibility to use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified, otherwise these tests should be removed. I think having this flag would be dangerous. In effect, this flag would be a hidden -ffixed-bx, with unwanted consequences on asm code that handles ebx. As an example, please see config/i386/cpuid.h - ATM, we handle ebx in a special way when __PIC__ is defined. With your patch, we will have to handle it in a special way when new flag is in effect, which is impossible, unless another compiler-generated define is emitted. So, I vote to change PIC reg to a pseudo unconditionally and adjust testsuite for all (expected) fall-out. Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Jakub
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 08:23, Uros Bizjak wrote: On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Here is a patch which combines results of my and Vladimir's work on EBX enabling. It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO. It passes bootstrap but there are few new failures in make check. gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit PIC mode gcc.target/i386/pr55458.c fails due to the same reason gcc.target/i386/pr23098.c fails because compiler fails to use float constant as an immediate and loads it from GOT instead Do we have the final decision about having a sompiler flag to control enabling of pseudo PIC register? I think we should have a possibility to use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified, otherwise these tests should be removed. I think having this flag would be dangerous. In effect, this flag would be a hidden -ffixed-bx, with unwanted consequences on asm code that handles ebx. As an example, please see config/i386/cpuid.h - ATM, we handle ebx in a special way when __PIC__ is defined. With your patch, we will have to handle it in a special way when new flag is in effect, which is impossible, unless another compiler-generated define is emitted. So, I vote to change PIC reg to a pseudo unconditionally and adjust testsuite for all (expected) fall-out. Agreed. Continuing to support both modes just seems like a maintenance nightmare and asking for problems at some point. If there's performance regressions, we just tackle them :-) I suspect any performance regressions we find are going to point us at issues in IRA/LRA that we would want to look at anyway. jeff
Re: Enable EBX for x86 in 32bits PIC code
Jakub Jelinek ja...@redhat.com writes: look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils Not actually part of elfutils, but available either here: https://github.com/pmachata/dwlocstat ... or packaged in Fedora. Thanks, PM
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Jakub
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 03 Sep 16:19, Vladimir Makarov wrote: On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Index: ira.c === --- ira.c (revision 214576) +++ ira.c (working copy) @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void) FOR_BB_INSNS (first, insn) { rtx dest = interesting_dest_for_shprep (insn, call_dom); - if (!dest) + if (!dest || dest == pic_offset_table_rtx) continue; rtx newreg = NULL_RTX; Index: lra-assigns.c === --- lra-assigns.c (revision 214576) +++ lra-assigns.c (working copy) @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse } /* Spill pseudos. */ EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi) - if ((int) spill_regno = lra_constraint_new_regno_start - ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) - ! bitmap_bit_p (lra_split_regs, spill_regno) - ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) - ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)) + if ((pic_offset_table_rtx != NULL + spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno = lra_constraint_new_regno_start + ! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) + ! bitmap_bit_p (lra_split_regs, spill_regno) + ! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) + ! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))) goto fail; insn_pseudos_num = 0; if (lra_dump_file != NULL) @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky return; } for (n = 0, i = FIRST_PSEUDO_REGISTER; i max_regno; i++) -if (reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) +if ((pic_offset_table_rtx == NULL_RTX + || i != (int) REGNO (pic_offset_table_rtx)) + reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) sorted_pseudos[n++] = i; qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func); for (i = n - 1; i = 0; i--) @@ -1360,6 +1364,8 @@ assign_by_spills (void) } EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) { + gcc_assert (pic_offset_table_rtx == NULL + || conflict_regno != REGNO (pic_offset_table_rtx)); if ((int) conflict_regno = lra_constraint_new_regno_start) sorted_pseudos[nfails++] = conflict_regno; if (lra_dump_file != NULL) Hi, Here is a patch which combines results of my and Vladimir's work on EBX
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Here is a patch which combines results of my and Vladimir's work on EBX enabling. It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO. It passes bootstrap but there are few new failures in make check. gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit PIC mode gcc.target/i386/pr55458.c fails due to the same reason gcc.target/i386/pr23098.c fails because compiler fails to use float constant as an immediate and loads it from GOT instead Do we have the final decision about having a sompiler flag to control enabling of pseudo PIC register? I think we should have a possibility to use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified, otherwise these tests should be removed. I think having this flag would be dangerous. In effect, this flag would be a hidden -ffixed-bx, with unwanted consequences on asm code that handles ebx. As an example, please see config/i386/cpuid.h - ATM, we handle ebx in a special way when __PIC__ is defined. With your patch, we will have to handle it in a special way when new flag is in effect, which is impossible, unless another compiler-generated define is emitted. So, I vote to change PIC reg to a pseudo unconditionally and adjust testsuite for all (expected) fall-out. Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Jakub
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 08:23, Uros Bizjak wrote: On Tue, Sep 23, 2014 at 3:54 PM, Ilya Enkovich enkovich@gmail.com wrote: Here is a patch which combines results of my and Vladimir's work on EBX enabling. It works OK for SPEC2000 and SPEC2006 on -Ofast + LTO. It passes bootstrap but there are few new failures in make check. gcc.target/i386/pic-1.c fails because it doesn't expect we can use EBX in 32bit PIC mode gcc.target/i386/pr55458.c fails due to the same reason gcc.target/i386/pr23098.c fails because compiler fails to use float constant as an immediate and loads it from GOT instead Do we have the final decision about having a sompiler flag to control enabling of pseudo PIC register? I think we should have a possibility to use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and gcc.target/i386/pr55458.c should be modified, otherwise these tests should be removed. I think having this flag would be dangerous. In effect, this flag would be a hidden -ffixed-bx, with unwanted consequences on asm code that handles ebx. As an example, please see config/i386/cpuid.h - ATM, we handle ebx in a special way when __PIC__ is defined. With your patch, we will have to handle it in a special way when new flag is in effect, which is impossible, unless another compiler-generated define is emitted. So, I vote to change PIC reg to a pseudo unconditionally and adjust testsuite for all (expected) fall-out. Agreed. Continuing to support both modes just seems like a maintenance nightmare and asking for problems at some point. If there's performance regressions, we just tackle them :-) I suspect any performance regressions we find are going to point us at issues in IRA/LRA that we would want to look at anyway. jeff
Re: Enable EBX for x86 in 32bits PIC code
Jakub Jelinek ja...@redhat.com writes: look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils Not actually part of elfutils, but available either here: https://github.com/pmachata/dwlocstat ... or packaged in Fedora. Thanks, PM
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. jeff
Re: Enable EBX for x86 in 32bits PIC code
On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Jakub
Re: Enable EBX for x86 in 32bits PIC code
On 09/23/14 10:03, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 10:00:00AM -0600, Jeff Law wrote: On 09/23/14 08:34, Jakub Jelinek wrote: On Tue, Sep 23, 2014 at 05:54:37PM +0400, Ilya Enkovich wrote: use fixed EBX at least until we make sure pseudo PIC doesn't harm debug info generation. If we have such option then gcc.target/i386/pic-1.c and For debug info, it seems you are already handling this in delegitimize_address target hook, I'd suggest just building some very large shared library at -O2 -g -fpic on i?86 and either look at the sizes of .debug_info/.debug_loc sections with/without the patch, or use the locstat utility from elfutils (talk to Petr Machata if needed). Can't hurt, but I really don't see how changing from a fixed to an allocatable register is going to muck up debug info in any significant way. What matters is if the delegitimize_address target hook is as efficient in delegitimization as before. E.g. if it previously matched only when seeing %ebx + gotoff or similar, and wouldn't match anything now, some vars could have debug locations including UNSPEC and be dropped on the floor. Ah, yea, that makes sense. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 09/09/14 10:43, Vladimir Makarov wrote: I've investigated the wrong code generation. I did a mistake in my last patch excluding pic pseudo from live-range analysis when risky transformations are on. Here is the right version of all IRA/LRA changes relative to trunk. I managed to compile and run successfully all 32-bit PIC SPECInt2000 programs with these changes. Thanks Vlad. I'll leave final testing and committing these patches to you. I did play around with a simpler patch to un-fix the PIC register for 32bit x86 -- without turning it into a pseudo. The problem with that intermediate step is that when we spill values with symbolic equivalences, we fail to handle the new conflicts that will generate. ie, in the case of PIC reloading those values should cause the unfixed hard pic register to conflict with any live pseudos at the reload point. But it doesn't :( As a result we end up using the unfixed hard pic reg to satsify a reload and, well, you know what happens then. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 09/04/2014 10:30 AM, Zamyatin, Igor wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Vladimir Makarov Sent: Thursday, September 04, 2014 12:19 AM To: Ilya Enkovich Cc: g...@gnu.org; gcc-patches; Evgeny Stupachenko; Richard Biener; Uros Bizjak; Jeff Law Subject: Re: Enable EBX for x86 in 32bits PIC code On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. Hi, Vladimir! I applied patch as an addition to your previous patch (was I right?) and unfortunately got all spec2000 tests failed at the runtime (segfault) Looking at 164.gzip I saw following code in spec_init 4bc0 spec_init: 4bc0: 55 push %ebp 4bc1: 57 push %edi 4bc2: 56 push %esi 4bc3: e8 58 c6 ff ff call 1220 __x86.get_pc_thunk.si 4bc8: 81 c6 38 84 00 00 add$0x8438,%esi 4bce: 53 push %ebx 4bcf: 8d 64 24 e4 lea-0x1c(%esp),%esp 4bd3: 83 be a0 03 00 00 03cmpl $0x3,0x3a0(%esi) 4bda: 7f 67 jg 4c43 spec_init+0x83 4bdc: 8d ae 40 12 05 00 lea0x51240(%esi),%ebp 4be2: 8d 45 30lea0x30(%ebp),%eax 4be5: 89 c6 mov%eax,%esi incorrect move, GOT value is now lost (here was mov %eax,0x1c(%esp) before this additional patch) 4be7: 8b 7d 00mov0x0(%ebp),%edi 4bea: 89 f3 mov%esi,%ebx now ebx contains incorrect value so call to malloc will be executed wrongly 4bec: c7 45 04 00 00 00 00movl $0x0,0x4(%ebp) 4bf3: c7 45 08 00 00 00 00movl $0x0,0x8(%ebp) 4bfa: c7 45 0c 00 00 00 00movl $0x0,0xc(%ebp) 4c01: 8d 87 00 90 01 00 lea0x19000(%edi),%eax 4c07: 89 04 24mov%eax,(%esp) 4c0a: e8 a1 be ff ff call ab0 malloc@plt I've investigated the wrong code generation. I did a mistake in my last patch excluding pic pseudo from live-range analysis when risky transformations are on. Here is the right version of all IRA/LRA changes relative to trunk. I managed to compile and run successfully all 32-bit PIC SPECInt2000 programs with these changes. Index: ira-color.c === --- ira-color.c (revision 214576) +++ ira-color.c (working copy) @@ -3239,9 +3239,11 @@ ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass); ira_assert (bitmap_bit_p (subloop_node-all_allocnos, ALLOCNO_NUM (subloop_allocno))); - if ((flag_ira_region == IRA_REGION_MIXED) - (loop_tree_node-reg_pressure[pclass] - = ira_class_hard_regs_num[pclass])) + if ((flag_ira_region == IRA_REGION_MIXED + (loop_tree_node-reg_pressure[pclass] + = ira_class_hard_regs_num[pclass])) + || (pic_offset_table_rtx != NULL + regno == (int) REGNO (pic_offset_table_rtx))) { if (! ALLOCNO_ASSIGNED_P (subloop_allocno)) { Index: ira-emit.c === --- ira-emit.c (revision 214576) +++ ira-emit.c (working copy) @@ -620,7 +620,10 @@ /* don't create copies because reload can spill an allocno set by copy although the allocno
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Index: ira.c === --- ira.c (revision 214576) +++ ira.c (working copy) @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void) FOR_BB_INSNS (first, insn) { rtx dest = interesting_dest_for_shprep (insn, call_dom); - if (!dest) + if (!dest || dest == pic_offset_table_rtx) continue; rtx newreg = NULL_RTX; Index: lra-assigns.c === --- lra-assigns.c (revision 214576) +++ lra-assigns.c (working copy) @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse } /* Spill pseudos. */ EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi) - if ((int) spill_regno = lra_constraint_new_regno_start -! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) -! bitmap_bit_p (lra_split_regs, spill_regno) -! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) -! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)) + if ((pic_offset_table_rtx != NULL + spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno = lra_constraint_new_regno_start +! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) +! bitmap_bit_p (lra_split_regs, spill_regno) +! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) +! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))) goto fail; insn_pseudos_num = 0; if (lra_dump_file != NULL) @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky return; } for (n = 0, i = FIRST_PSEUDO_REGISTER; i max_regno; i++) -if (reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) +if ((pic_offset_table_rtx == NULL_RTX +|| i != (int) REGNO (pic_offset_table_rtx)) +reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) sorted_pseudos[n++] = i; qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func); for (i = n - 1; i = 0; i--) @@ -1360,6 +1364,8 @@ assign_by_spills (void) } EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) { + gcc_assert (pic_offset_table_rtx == NULL + || conflict_regno != REGNO (pic_offset_table_rtx)); if ((int) conflict_regno = lra_constraint_new_regno_start) sorted_pseudos[nfails++] = conflict_regno; if (lra_dump_file != NULL)
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-29 2:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) The copy is created by a newer IRA optimization for function prologues. The patch in the attachment should solve the problem. I also added the code to prevent spilling the pic pseudo in LRA which could happen before theoretically. After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Index: ira.c === --- ira.c (revision 214576) +++ ira.c (working copy) @@ -4887,7 +4887,7 @@ split_live_ranges_for_shrink_wrap (void) FOR_BB_INSNS (first, insn) { rtx dest = interesting_dest_for_shprep (insn, call_dom); - if (!dest) + if (!dest || dest == pic_offset_table_rtx) continue; rtx newreg = NULL_RTX; Index: lra-assigns.c === --- lra-assigns.c (revision 214576) +++ lra-assigns.c (working copy) @@ -879,11 +879,13 @@ spill_for (int regno, bitmap spilled_pse } /* Spill pseudos. */ EXECUTE_IF_SET_IN_BITMAP (spill_pseudos_bitmap, 0, spill_regno, bi) - if ((int) spill_regno = lra_constraint_new_regno_start -! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) -! bitmap_bit_p (lra_split_regs, spill_regno) -! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) -! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno)) + if ((pic_offset_table_rtx != NULL + spill_regno == REGNO (pic_offset_table_rtx)) + || ((int) spill_regno = lra_constraint_new_regno_start +! bitmap_bit_p (lra_inheritance_pseudos, spill_regno) +! bitmap_bit_p (lra_split_regs, spill_regno) +! bitmap_bit_p (lra_subreg_reload_pseudos, spill_regno) +! bitmap_bit_p (lra_optional_reload_pseudos, spill_regno))) goto fail; insn_pseudos_num = 0; if (lra_dump_file != NULL) @@ -1053,7 +1055,9 @@ setup_live_pseudos_and_spill_after_risky return; } for (n = 0, i = FIRST_PSEUDO_REGISTER; i max_regno; i++) -if (reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) +if ((pic_offset_table_rtx == NULL_RTX +|| i != (int) REGNO (pic_offset_table_rtx)) +reg_renumber[i] = 0 lra_reg_info[i].nrefs 0) sorted_pseudos[n++] = i; qsort (sorted_pseudos, n, sizeof (int), pseudo_compare_func); for (i = n - 1; i = 0; i--) @@ -1360,6 +1364,8 @@ assign_by_spills (void) } EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos, conflict_regno) { + gcc_assert (pic_offset_table_rtx == NULL + || conflict_regno != REGNO (pic_offset_table_rtx)); if ((int) conflict_regno = lra_constraint_new_regno_start) sorted_pseudos[nfails++] = conflict_regno; if (lra_dump_file != NULL)
Re: Enable EBX for x86 in 32bits PIC code
On 08/29/2014 02:47 AM, Ilya Enkovich wrote: Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Ok, Ilya. I'll look at the problem this week.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 12:28 GMT+04:00 Ilya Enkovich enkovich@gmail.com: 2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com: On 2014-08-26 5:42 PM, Ilya Enkovich wrote: Hi, Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function. Here is a compilation string I use: gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000read_arch.c In my final assembler comparison with 1.0 looks like: comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7] and %ebp here doesn't have a proper value. I'll try to make a smaller reproducer if these instructions don't help. I've managed to reproduce it. Although it would be better to send the patch as an attachment. The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. Then in a region when a *new* pseudo used as PIC we rematerialize a constant which transformed in memory addressed through *original* PIC pseudo. To solve the problem we should prevent such splitting and guarantee that PIC pseudo allocnos in different region gets the same hard reg. The following patch should solve the problem. Thanks for the patch! I'll try it and be back with results. Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Thanks, Ilya Ilya pie-2014-08-28.patch Description: Binary data test.cc Description: Binary data
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 22:58 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? This is a workaround for stability problem with reload. Immediate operands cause new usages of pseudo PIC register in reload which leads to wrong registers allocation. These changes wouldn't be required after reload issue if resolved. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 12:58, Uros Bizjak wrote: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? I'd recommend dropping them from the WIP postings. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 07:01, Uros Bizjak wrote: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. I think for the initial WIP patch it was fine. However I think we all agree that we want EBX as an allocatable register without any special conditions. So I'd recommend pulling this out of the WIP patches as well. Jeff
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 02:37, Ilya Enkovich wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. It doesn't really need to be an argument in the traditional sense and adding it just complicates things with a target implementation detail as far as I can see. I think you'll find that if you have the call pattern emit a copy from pic_offset_table_rtx into EBX and attach of use of EBX to the call then most of the code you've written to add the implicit argument just disappears. jeff
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 12:28 GMT+04:00 Ilya Enkovich enkovich@gmail.com: 2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com: On 2014-08-26 5:42 PM, Ilya Enkovich wrote: Hi, Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function. Here is a compilation string I use: gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000read_arch.c In my final assembler comparison with 1.0 looks like: comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7] and %ebp here doesn't have a proper value. I'll try to make a smaller reproducer if these instructions don't help. I've managed to reproduce it. Although it would be better to send the patch as an attachment. The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. Then in a region when a *new* pseudo used as PIC we rematerialize a constant which transformed in memory addressed through *original* PIC pseudo. To solve the problem we should prevent such splitting and guarantee that PIC pseudo allocnos in different region gets the same hard reg. The following patch should solve the problem. Thanks for the patch! I'll try it and be back with results. Seems your patch doesn't cover all cases. Attached is a modified patch (with your changes included) and a test where double constant is wrongly rematerialized. I also see in ira dump that there is still a copy of PIC reg created: Initialization of original PIC reg: (insn 23 22 24 2 (set (reg:SI 127) (reg:SI 3 bx)) test.cc:42 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 3 bx) (nil))) ... Copy is created: (insn 135 37 25 3 (set (reg:SI 138 [127]) (reg:SI 127)) 90 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 127) (nil))) ... Copy is used: (insn 119 25 122 3 (set (reg:DF 134) (mem/u/c:DF (plus:SI (reg:SI 138 [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF))) [5 S8 A64])) 128 {*movdf_internal} (expr_list:REG_EQUIV (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11]) (nil))) After reload we have new usage of r127 which is allocated to ecx which actually does not have any definition in this function at all. (insn 151 42 44 4 (set (reg:SI 0 ax [147]) (plus:SI (reg:SI 2 cx [127]) (const:SI (unspec:SI [ (symbol_ref/u:SI (*.LC0) [flags 0x2]) ] UNSPEC_GOTOFF test.cc:44 213 {*leasi} (expr_list:REG_EQUAL (symbol_ref/u:SI (*.LC0) [flags 0x2]) (nil))) (insn 44 151 45 4 (set (reg:DF 21 xmm0 [orig:129 D.2450 ] [129]) (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (mem/u/c:DF (reg:SI 0 ax [147]) [5 S8 A64]))) test.cc:44 790 {*fop_df_comm_sse} (expr_list:REG_EQUAL (mult:DF (reg:DF 21 xmm0 [orig:128 D.2450 ] [128]) (const_double:DF 2.9997371893933895137251965934410691261292e-4 [0x0.9d495182a99308p-11])) (nil))) Compilation string: g++ -m32 -O2 -mfpmath=sse -fPIE -S test.cc Thanks, Ilya Ilya pie-2014-08-28.patch Description: Binary data test.cc Description: Binary data
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 22:58 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? This is a workaround for stability problem with reload. Immediate operands cause new usages of pseudo PIC register in reload which leads to wrong registers allocation. These changes wouldn't be required after reload issue if resolved. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 12:58, Uros Bizjak wrote: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? I'd recommend dropping them from the WIP postings. jeff
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 07:01, Uros Bizjak wrote: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. I think for the initial WIP patch it was fine. However I think we all agree that we want EBX as an allocatable register without any special conditions. So I'd recommend pulling this out of the WIP patches as well. Jeff
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/14 02:37, Ilya Enkovich wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. It doesn't really need to be an argument in the traditional sense and adding it just complicates things with a target implementation detail as far as I can see. I think you'll find that if you have the call pattern emit a copy from pic_offset_table_rtx into EBX and attach of use of EBX to the call then most of the code you've written to add the implicit argument just disappears. jeff
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com: On 2014-08-26 5:42 PM, Ilya Enkovich wrote: Hi, Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function. Here is a compilation string I use: gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000read_arch.c In my final assembler comparison with 1.0 looks like: comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7] and %ebp here doesn't have a proper value. I'll try to make a smaller reproducer if these instructions don't help. I've managed to reproduce it. Although it would be better to send the patch as an attachment. The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. Then in a region when a *new* pseudo used as PIC we rematerialize a constant which transformed in memory addressed through *original* PIC pseudo. To solve the problem we should prevent such splitting and guarantee that PIC pseudo allocnos in different region gets the same hard reg. The following patch should solve the problem. Thanks for the patch! I'll try it and be back with results. Ilya
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Thanks, Ilya Jeff
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Uros.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 16:42 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h.
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. Uros.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 17:01 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. We should show nice performance to have this feature enabled by default. Currently patch causes a set of performance losses. I have a version of this patch where EBX is relaxed by a compiler flag, not tune flag. Ilya
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 17:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. That means you should choose some hard reg early before register allocation to be used for PIC reg initialization. I do not like we have to do this and want to just generate set_got with pseudo reg and do not involve any additional hard reg. That would look like (insn/f 168 167 169 2 (parallel [ (set (reg:SI 127) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) after expand pass. r127 is pic_offset_table_rtx here. And after reload it would become: (insn/f 168 167 169 2 (parallel [ (set (reg:SI 3 bx) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) And no additional actions are required on pro_and_epilogue. Also it simplifies analysis whether we should generate set_got at all. Current we check hard reg is ever live which is wrong with not fixed ebx because any usage of hard reg used to init GOT doesn't mean GOT usage. And with my proposed scheme unused GOT would mean DCE just removes useless set_got. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 3:29 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. That means you should choose some hard reg early before register allocation to be used for PIC reg initialization. I do not like we have to do this and want to just generate set_got with pseudo reg and do not involve any additional hard reg. That would look like (insn/f 168 167 169 2 (parallel [ (set (reg:SI 127) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) after expand pass. r127 is pic_offset_table_rtx here. And after reload it would become: (insn/f 168 167 169 2 (parallel [ (set (reg:SI 3 bx) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) And no additional actions are required on pro_and_epilogue. Also it simplifies analysis whether we should generate set_got at all. Current we check hard reg is ever live which is wrong with not fixed ebx because any usage of hard reg used to init GOT doesn't mean GOT usage. And with my proposed scheme unused GOT would mean DCE just removes useless set_got. Yes this is better. I was under impression you want to retain current initialization insertion in expand_prologue. Uros.
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/2014 03:01 PM, Uros Bizjak wrote: I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. It also makes writing solid inline assembly which has to use %ebx for some reason much easier. We just fixed a glibc bug related to that. -- Florian Weimer / Red Hat Product Security
Re: Enable EBX for x86 in 32bits PIC code
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? Uros.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 0:19 GMT+04:00 Vladimir Makarov vmaka...@redhat.com: On 2014-08-26 5:42 PM, Ilya Enkovich wrote: Hi, Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function. Here is a compilation string I use: gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000read_arch.c In my final assembler comparison with 1.0 looks like: comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7] and %ebp here doesn't have a proper value. I'll try to make a smaller reproducer if these instructions don't help. I've managed to reproduce it. Although it would be better to send the patch as an attachment. The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. Then in a region when a *new* pseudo used as PIC we rematerialize a constant which transformed in memory addressed through *original* PIC pseudo. To solve the problem we should prevent such splitting and guarantee that PIC pseudo allocnos in different region gets the same hard reg. The following patch should solve the problem. Thanks for the patch! I'll try it and be back with results. Ilya
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Thanks, Ilya Jeff
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Uros.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 16:42 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Thu, Aug 28, 2014 at 10:37 AM, Ilya Enkovich enkovich@gmail.com wrote: 2014-08-28 1:39 GMT+04:00 Jeff Law l...@redhat.com: On 08/26/14 15:42, Ilya Enkovich wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h.
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. Uros.
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 17:01 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: Hi, On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. +#define PIC_OFFSET_TABLE_REGNUM \ + ((TARGET_64BIT (ix86_cmodel == CM_SMALL_PIC \ + || TARGET_PECOFF)) \ + || !flag_pic ? INVALID_REGNUM \ + : X86_TUNE_RELAX_PIC_REG ? (pic_offset_table_rtx ? INVALID_REGNUM \ + : REAL_PIC_OFFSET_TABLE_REGNUM) \ + : reload_completed ? REGNO (pic_offset_table_rtx) \ : REAL_PIC_OFFSET_TABLE_REGNUM) I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. We should show nice performance to have this feature enabled by default. Currently patch causes a set of performance losses. I have a version of this patch where EBX is relaxed by a compiler flag, not tune flag. Ilya
Re: Enable EBX for x86 in 32bits PIC code
2014-08-28 17:08 GMT+04:00 Uros Bizjak ubiz...@gmail.com: On Thu, Aug 28, 2014 at 2:54 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. That means you should choose some hard reg early before register allocation to be used for PIC reg initialization. I do not like we have to do this and want to just generate set_got with pseudo reg and do not involve any additional hard reg. That would look like (insn/f 168 167 169 2 (parallel [ (set (reg:SI 127) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) after expand pass. r127 is pic_offset_table_rtx here. And after reload it would become: (insn/f 168 167 169 2 (parallel [ (set (reg:SI 3 bx) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) And no additional actions are required on pro_and_epilogue. Also it simplifies analysis whether we should generate set_got at all. Current we check hard reg is ever live which is wrong with not fixed ebx because any usage of hard reg used to init GOT doesn't mean GOT usage. And with my proposed scheme unused GOT would mean DCE just removes useless set_got. Ilya Uros.
Re: Enable EBX for x86 in 32bits PIC code
On Thu, Aug 28, 2014 at 3:29 PM, Ilya Enkovich enkovich@gmail.com wrote: diff --git a/gcc/calls.c b/gcc/calls.c index 4285ec1..85dae6b 100644 --- a/gcc/calls.c +++ b/gcc/calls.c @@ -1122,6 +1122,14 @@ initialize_argument_information (int num_actuals ATTRIBUTE_UNUSED, call_expr_arg_iterator iter; tree arg; +if (targetm.calls.implicit_pic_arg (fndecl ? fndecl : fntype)) + { + gcc_assert (pic_offset_table_rtx); + args[j].tree_value = make_tree (ptr_type_node, + pic_offset_table_rtx); + j--; + } + if (struct_value_addr_value) { args[j].tree_value = struct_value_addr_value; So why do you need this? Can't this be handled in the call/call_value expanders or what about attaching the use to CALL_INSN_FUNCTION_USAGE from inside ix86_expand_call? Basically I'm not seeing the need for another target hook here. I think that would significantly simply the patch as well. GOT base address become an additional implicit arg with EBX relaxed and I handled it as all other args. I can move EBX initialization into ix86_expand_call. Would still need some hint from target to init pic_offset_table_rtx with proper value in the beginning of function expand. Maybe you can you use get_hard_reg_initial_val for this? Actually there is no input hard reg holding GOT address. Currently I use initialization with ebx with following ebx initialization in prolog_epilog pass. But this is a temporary workaround. It is inefficient because always uses callee save reg to get GOT address. I suppose we should generate pseudo reg for pic_offset_table_rtx and also set_got with this register as a destination in expand pass. After register allocation set_got may be transformed into get_pc_thunk call with proper hard reg. But some target hook has to be used for this. Let me expand my idea a bit. IIRC, get_hard_reg_initial_val and friends will automatically emit intialization of a pseudo from pic_offset_table_rtx hard reg. After reload, real initialization of pic_offset_table_rtx hard reg is emitted in pro_and_epilogue pass. I don't know if this works with current implementation of dynamic pic_offset_table_rtx selection, though. That means you should choose some hard reg early before register allocation to be used for PIC reg initialization. I do not like we have to do this and want to just generate set_got with pseudo reg and do not involve any additional hard reg. That would look like (insn/f 168 167 169 2 (parallel [ (set (reg:SI 127) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) after expand pass. r127 is pic_offset_table_rtx here. And after reload it would become: (insn/f 168 167 169 2 (parallel [ (set (reg:SI 3 bx) (unspec:SI [ (const_int 0 [0]) ] UNSPEC_SET_GOT)) (clobber (reg:CC 17 flags)) ]) test.cc:42 -1 (expr_list:REG_CFA_FLUSH_QUEUE (nil) (nil))) And no additional actions are required on pro_and_epilogue. Also it simplifies analysis whether we should generate set_got at all. Current we check hard reg is ever live which is wrong with not fixed ebx because any usage of hard reg used to init GOT doesn't mean GOT usage. And with my proposed scheme unused GOT would mean DCE just removes useless set_got. Yes this is better. I was under impression you want to retain current initialization insertion in expand_prologue. Uros.
Re: Enable EBX for x86 in 32bits PIC code
On 08/28/2014 03:01 PM, Uros Bizjak wrote: I'd like to avoid X86_TUNE_RELAX_PIC_REG and always treat EBX as an allocatable register. This way, we can avoid all mess with implicit xchgs in atomic_compare_and_swapdwi_doubleword. Also, having allocatable EBX would allow us to introduce __builtin_cpuid builtin and cleanup cpiud.h. It also makes writing solid inline assembly which has to use %ebx for some reason much easier. We just fixed a glibc bug related to that. -- Florian Weimer / Red Hat Product Security
Re: Enable EBX for x86 in 32bits PIC code
On Fri, Aug 22, 2014 at 2:21 PM, Ilya Enkovich enkovich@gmail.com wrote: On Cauldron 2014 we had a couple of talks about relaxation of ebx usage in 32bit PIC mode. It was decided that the best approach would be to not fix ebx register, use speudo register for GOT base address and let allocator do the rest. This should be similar to how clang and icc work with GOT base address. I've been working for some time on such patch and now want to share my results. (define_insn *pushtf [(set (match_operand:TF 0 push_operand =,) - (match_operand:TF 1 general_no_elim_operand x,*roF))] + (match_operand:TF 1 nonimmediate_no_elim_operand x,*roF))] Can you please explain the reason for this change (and a couple of similar changes to push patterns) ? Uros.
Re: Enable EBX for x86 in 32bits PIC code
On 2014-08-26 5:42 PM, Ilya Enkovich wrote: Hi, Here is a patch I tried. I apply it over revision 214215. Unfortunately I do not have a small reproducer but the problem can be easily reproduced on SPEC2000 benchmark 175.vpr. The problem is in read_arch.c:701 where float value is compared with float constant 1.0. It is inlined into read_arch function and can be easily found in RTL dump of function read_arch as a float comparison with 1.0 after the first call to strtod function. Here is a compilation string I use: gcc -m32 -mno-movbe -g3 -fdump-rtl-all-details -O2 -ffast-math -mfpmath=sse -m32 -march=slm -fPIE -pie -c -o read_arch.o -DSPEC_CPU2000 read_arch.c In my final assembler comparison with 1.0 looks like: comiss .LC11@GOTOFF(%ebp), %xmm0 # 1101 *cmpisf_sse [length = 7] and %ebp here doesn't have a proper value. I'll try to make a smaller reproducer if these instructions don't help. I've managed to reproduce it. Although it would be better to send the patch as an attachment. The problem is actually in IRA not LRA. IRA splits pseudo used for PIC. Then in a region when a *new* pseudo used as PIC we rematerialize a constant which transformed in memory addressed through *original* PIC pseudo. To solve the problem we should prevent such splitting and guarantee that PIC pseudo allocnos in different region gets the same hard reg. The following patch should solve the problem. Index: ira-color.c === --- ira-color.c (revision 214576) +++ ira-color.c (working copy) @@ -3239,9 +3239,10 @@ ira_assert (ALLOCNO_CLASS (subloop_allocno) == rclass); ira_assert (bitmap_bit_p (subloop_node-all_allocnos, ALLOCNO_NUM (subloop_allocno))); - if ((flag_ira_region == IRA_REGION_MIXED) - (loop_tree_node-reg_pressure[pclass] - = ira_class_hard_regs_num[pclass])) + if ((flag_ira_region == IRA_REGION_MIXED + (loop_tree_node-reg_pressure[pclass] + = ira_class_hard_regs_num[pclass])) + || regno == (int) REGNO (pic_offset_table_rtx)) { if (! ALLOCNO_ASSIGNED_P (subloop_allocno)) { Index: ira-emit.c === --- ira-emit.c (revision 214576) +++ ira-emit.c (working copy) @@ -620,7 +620,8 @@ /* don't create copies because reload can spill an allocno set by copy although the allocno will not get memory slot. */ - || ira_equiv_no_lvalue_p (regno))) + || ira_equiv_no_lvalue_p (regno) + || ALLOCNO_REGNO (allocno) == REGNO (pic_offset_table_rtx))) continue; original_reg = allocno_emit_reg (allocno); if (parent_allocno == NULL