[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-07 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #17 from amker at gcc dot gnu.org ---
Author: amker
Date: Mon Mar  7 16:39:27 2016
New Revision: 234034

URL: https://gcc.gnu.org/viewcvs?rev=234034=gcc=rev
Log:

PR rtl-optimization/69052
* rtlanal.c (commutative_operand_precedence): Set higher precedence
to CONST_WIDE_INT.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/rtlanal.c

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #16 from amker at gcc dot gnu.org ---
(In reply to amker from comment #14)
> Author: amker
> Date: Wed Mar  2 14:10:56 2016
> New Revision: 233907
> 
> URL: https://gcc.gnu.org/viewcvs?rev=233907=gcc=rev
> Log:
> 
>   PR tree-optimization/69052
>   * loop-invariant.c (canonicalize_address): New function.
>   (inv_can_prop_to_addr_use): Check validity of address expression
>   which is canonicalized by above function.
> 
>   gcc/testsuite/ChangeLog
>   PR tree-optimization/69052
>   * gcc.target/i386/pr69052.c: New test.
> 
> 
> Added:
> trunk/gcc/testsuite/gcc.target/i386/pr69052.c
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/loop-invariant.c
> trunk/gcc/testsuite/ChangeLog

Just realized that CONST_INT and CONST_WIDE_INT both have the same precedence. 
The patch I applied could be broken in cases which have both CONST_INT and
CONST_WIDE_INT address parts and the CONST_WIDE_INT one happens to be sorted
after CONST_INT, as below code:

  /* Simplify all constant int summary if possible.  */
  for (i = 0; i < addr_parts.length (); i++)
if (CONST_INT_P (addr_parts[i]))
  break;

  for (j = i + 1; j < addr_parts.length (); j++)
{
  gcc_assert (CONST_INT_P (addr_parts[j]));  <-assertion failure
  addr_parts[i] = simplify_gen_binary (PLUS, mode,
   addr_parts[i],
   addr_parts[j]);
}

Although I don't think we can really run into such complicated address
expression, I will work out a patch to compare_address_parts to force CONST_INT
after CONST_WIDE_INT during sorting.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-02 Thread law at redhat dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||law at redhat dot com
 Resolution|--- |FIXED

--- Comment #15 from Jeffrey A. Law  ---
Fixed by Bin's patch on the trunk.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #14 from amker at gcc dot gnu.org ---
Author: amker
Date: Wed Mar  2 14:10:56 2016
New Revision: 233907

URL: https://gcc.gnu.org/viewcvs?rev=233907=gcc=rev
Log:

PR tree-optimization/69052
* loop-invariant.c (canonicalize_address): New function.
(inv_can_prop_to_addr_use): Check validity of address expression
which is canonicalized by above function.

gcc/testsuite/ChangeLog
PR tree-optimization/69052
* gcc.target/i386/pr69052.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr69052.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/loop-invariant.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2016-03-01
 Ever confirmed|0   |1

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-03-01 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Richard Biener  changed:

   What|Removed |Added

   Priority|P3  |P1

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-02-09 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #12 from amker at gcc dot gnu.org ---
Patch sent for review at
https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00612.html
It works for the reduced test case, could you please help me to check if it
works for you original case?
Thanks,
bin

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-02-09 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #13 from Yuri Rumyantsev  ---
I checked that performance is back for the whole benchmark. Thanks a lot.

Yuri.

2016-02-09 14:17 GMT+03:00 amker at gcc dot gnu.org :
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052
>
> --- Comment #12 from amker at gcc dot gnu.org ---
> Patch sent for review at
> https://gcc.gnu.org/ml/gcc-patches/2016-02/msg00612.html
> It works for the reduced test case, could you please help me to check if it
> works for you original case?
> Thanks,
> bin
>
> --
> You are receiving this mail because:
> You reported the bug.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-27 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #9 from amker at gcc dot gnu.org ---
(In reply to Ilya Enkovich from comment #8)
> (In reply to amker from comment #7)
> > According to discussion at https://gcc.gnu.org/ml/gcc/2016-01/msg00190.html,
> > hook is probably not wanted in this case.
> > Bernd gave another proposal by moving combine before loop transforms is also
> > interesting, but it can be for GCC6.
> > So a backend fix would be nice.
> 
> Unfortunately my patch causes significant regressions in some SPEC
> benchmarks.  Looks like address operands order matters in some other places.

I know little about x86, is it because of generation of non-canonical rtl
expression after this change?

Another question for this case is: Is it because operand ordering that
inv_can_prop_to_addr_use failed to verify the result insn as a valid one?  If
so, is it because of non-canonical rtl expressions?

I will check how combine merges these two instructions.

Thanks.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-27 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #8 from Ilya Enkovich  ---
(In reply to amker from comment #7)
> According to discussion at https://gcc.gnu.org/ml/gcc/2016-01/msg00190.html,
> hook is probably not wanted in this case.
> Bernd gave another proposal by moving combine before loop transforms is also
> interesting, but it can be for GCC6.
> So a backend fix would be nice.

Unfortunately my patch causes significant regressions in some SPEC benchmarks. 
Looks like address operands order matters in some other places.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-27 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #10 from Ilya Enkovich  ---
(In reply to amker from comment #9)
> I know little about x86, is it because of generation of non-canonical rtl
> expression after this change?
> 
> Another question for this case is: Is it because operand ordering that
> inv_can_prop_to_addr_use failed to verify the result insn as a valid one? 
> If so, is it because of non-canonical rtl expressions?

In i386 address expression only the first operand of PLUS may be another PLUS. 
Thus (r1 + r2) + CONST is a valid address but r1 + (r2 + CONST) is not.  Thus
when you propagate a register you may need to reorder operands.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-27 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #11 from amker at gcc dot gnu.org ---
(In reply to Ilya Enkovich from comment #10)
> (In reply to amker from comment #9)
> > I know little about x86, is it because of generation of non-canonical rtl
> > expression after this change?
> > 
> > Another question for this case is: Is it because operand ordering that
> > inv_can_prop_to_addr_use failed to verify the result insn as a valid one? 
> > If so, is it because of non-canonical rtl expressions?
> 
> In i386 address expression only the first operand of PLUS may be another
> PLUS.  Thus (r1 + r2) + CONST is a valid address but r1 + (r2 + CONST) is
> not.  Thus when you propagate a register you may need to reorder operands.

I see, I will check what combine does here.  Maybe it's just an overlook in
inv_can_prop_to_addr_use, which reminds me if other targets like AArch64 are
also affected.

Thanks.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-26 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #7 from amker at gcc dot gnu.org ---
(In reply to Ilya Enkovich from comment #6)
> (In reply to amker from comment #5)
> > Not sure if stage4 is a good time for a new hook either.
> > 
> > Any ideas?
> 
> We can try to improve i386 address recognition to ignore operands order. 
> With this patch:
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index 34b57a4..b13d3f6 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -14111,8 +14111,16 @@ ix86_decompose_address (rtx addr, struct
> ix86_address *out)
> {
>   if (n >= 4)
> return 0;
> - addends[n++] = XEXP (op, 1);
> - op = XEXP (op, 0);
> + if (GET_CODE (XEXP (op, 1)) == PLUS)
> +   {
> + addends[n++] = XEXP (op, 0);
> + op = XEXP (op, 1);
> +   }
> + else
> +   {
> + addends[n++] = XEXP (op, 1);
> + op = XEXP (op, 0);
> +   }
> }
>while (GET_CODE (op) == PLUS);
>if (n >= 4)
> 
> I get following ASM:
> 
> .L5:
> movlind@GOTOFF(%ebx,%esi,4), %eax
> movl12(%esp), %edi
> movl%ebp, (%edi,%eax,4)

According to discussion at https://gcc.gnu.org/ml/gcc/2016-01/msg00190.html,
hook is probably not wanted in this case.
Bernd gave another proposal by moving combine before loop transforms is also
interesting, but it can be for GCC6.
So a backend fix would be nice.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-26 Thread ienkovich at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Ilya Enkovich  changed:

   What|Removed |Added

 CC||ienkovich at gcc dot gnu.org

--- Comment #6 from Ilya Enkovich  ---
(In reply to amker from comment #5)
> Not sure if stage4 is a good time for a new hook either.
> 
> Any ideas?

We can try to improve i386 address recognition to ignore operands order.  With
this patch:

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 34b57a4..b13d3f6 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -14111,8 +14111,16 @@ ix86_decompose_address (rtx addr, struct ix86_address
*out)
{
  if (n >= 4)
return 0;
- addends[n++] = XEXP (op, 1);
- op = XEXP (op, 0);
+ if (GET_CODE (XEXP (op, 1)) == PLUS)
+   {
+ addends[n++] = XEXP (op, 0);
+ op = XEXP (op, 1);
+   }
+ else
+   {
+ addends[n++] = XEXP (op, 1);
+ op = XEXP (op, 0);
+   }
}
   while (GET_CODE (op) == PLUS);
   if (n >= 4)

I get following ASM:

.L5:
movlind@GOTOFF(%ebx,%esi,4), %eax
movl12(%esp), %edi
movl%ebp, (%edi,%eax,4)

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-25 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #5 from amker at gcc dot gnu.org ---
(In reply to Yuri Rumyantsev from comment #0)
> In loop_invariant phase additional function inv_can_prop_to_addr_use which
> tried to determine if forward propagation for cheap address is possible
> through call of verify_changes which is very poor in comparison with combine
> phase.
> For example, for attached test-case it tries
> (gdb) call debug_rtx(def_insn)
> (insn 69 67 70 9 (set (reg/f:SI 149)
> (plus:SI (reg:SI 87)
> (const:SI (unspec:SI [
> (symbol_ref:SI ("ind") [flags 0x2]  0x77ffbe10 ind>)
> ] UNSPEC_GOTOFF t1.c:40 212 {*leasi}
>  (expr_list:REG_DEAD (reg:SI 87)
> (nil)))
> (gdb) call debug_rtx(use_insn)
> (insn 70 69 71 9 (set (reg:SI 150)
> (mem/u:SI (plus:SI (mult:SI (reg/v:SI 90 [ k ])
> (const_int 4 [0x4]))
> (reg/f:SI 149)) [1 ind S4 A32])) t1.c:40 86 {*movsi_internal}
>  (expr_list:REG_DEAD (reg/f:SI 149)
> (nil)))
> and determines that propagation is not possible:
> (gdb) p ok
> $1 = false
> but combine can do such substitution.
> 
> This leads to undesired code motion and performance lost:
> for stmt out[ind[k]] = result
> before r229402
>   movlind@GOTOFF(%ebx,%esi,4), %eax
>   movl12(%esp), %edi
>   movl%ebp, (%edi,%eax,4)
> after r229402
>   movl28(%esp), %eax
>   movl24(%esp), %ebx
>   movl(%eax,%esi,4), %eax
>   movl%edi, (%ebx,%eax,4)
> 
> redundant fill has been generated by LRA.
> 
> Since emulation combine phase is not so simple I assume that additional hook
> should be added to turn off such transformation for x86 in PIE mode.

I agree it's hard to mimic combine behavior here.  It would be nice if we can
abstract an interface out of combine so that we can test if instructions can be
combined at any passes, but this sounds even more difficult...
Also is it possible to give green light to got related address computation.  On
other targets, I fount it's better to keep these instructions inside one basic
block, otherwise redundant computation or CSE opportunities might be missed
among different basic blocks.

Not sure if stage4 is a good time for a new hook either.

Any ideas?

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-20 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #4 from amker at gcc dot gnu.org ---
(In reply to Igor Zamyatin from comment #3)
> (In reply to amker from comment #2)
> > It's my change, I will look into it.
> 
> Any plans on this?

Sorry for late response, I will try to get to this one in one or two weeks.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-18 Thread izamyatin at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Igor Zamyatin  changed:

   What|Removed |Added

 CC||izamyatin at gmail dot com

--- Comment #3 from Igor Zamyatin  ---
(In reply to amker from comment #2)
> It's my change, I will look into it.

Any plans on this?

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-11 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #2 from amker at gcc dot gnu.org ---
It's my change, I will look into it.

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2016-01-05 Thread pinskia at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

Andrew Pinski  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Target||x86_64-linux-gnu
   Target Milestone|--- |6.0

[Bug rtl-optimization/69052] [6 Regression] Performance regression after r229402.

2015-12-25 Thread ysrumyan at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69052

--- Comment #1 from Yuri Rumyantsev  ---
Created attachment 37133
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=37133=edit
test-case to reproduce

It should be compile with -O2 -m32 options to reproduce.