[Bug middle-end/44382] Slow integer multiply

2016-05-12 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

Bill Schmidt  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #13 from Bill Schmidt  ---
Fixed, then.

[Bug middle-end/44382] Slow integer multiply

2016-05-10 Thread wschmidt at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #12 from Bill Schmidt  ---
I'd propose that this bug can now be closed.  If nobody objects, I'll do that
later this week.

[Bug middle-end/44382] Slow integer multiply

2016-05-10 Thread acsawdey at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

acsawdey at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |acsawdey at gcc dot gnu.org

--- Comment #11 from acsawdey at gcc dot gnu.org ---
r236043 adds rs6000_reassociation_width() to enable parallel reassociation for
ppc64/ppc64le with -mcpu=power8. The width is picked to be a balance between
something wide enough to exploit the core resources but not so wide that we
generate a lot of excess spills. SPEC 2006 int is pretty neutral but fp sees
some gains, largely because lbm gets a big improvement from this.
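
The trade-off described above can be pictured with a small C sketch. This is not the rs6000 code: the enum, the function name example_reassociation_width, and the widths it returns are all hypothetical placeholders chosen for illustration. The only property taken from this thread is that the hook returns an int width and that a width of 1 (the default hook, hook_int_uint_mode_1) keeps the left-linear chain.

```c
/* Hypothetical operation classes; GCC's real hook receives an
   operation code and a machine mode instead.  */
enum op_class { OP_INT, OP_FLOAT, OP_VECTOR };

/* Placeholder widths, chosen only to illustrate the trade-off:
   a wider value exposes more parallelism but keeps more partial
   results live, which can lead to spills.  */
int
example_reassociation_width (enum op_class cls, int wide_core)
{
  if (!wide_core)
    return 1;           /* 1 keeps the left-linear chain */

  switch (cls)
    {
    case OP_VECTOR:
    case OP_FLOAT:
      return 4;         /* assumed value, not taken from rs6000 */
    case OP_INT:
      return 2;         /* assumed value, not taken from rs6000 */
    }
  return 1;
}
```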

[Bug middle-end/44382] Slow integer multiply

2011-10-21 Thread wschmidt at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #10 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2011-10-21 14:41:13 UTC ---
One more data point.  I repeated the experiment using -fsched-pressure. 
Although this reduced the degradations considerably, the overall results are
equivocal.  I see a few improvements and a few degradations in the 1-4% range,
with the geometric means essentially unchanged.  So even if -fsched-pressure
were the default, there wouldn't be an overwhelming case for enabling this
support on powerpc64.


[Bug middle-end/44382] Slow integer multiply

2011-10-13 Thread wschmidt at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #9 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2011-10-13 17:30:14 UTC ---
Just adding some status information well after the fact...

We experimented with adding powerpc64 hooks to use the parallel reassociation
support from comment #8.  We elected not to enable this support because the
results for SPEC were negative (quite negative in some cases), due to increased
register pressure in loops where spill was already an issue.  Our plans at this
point are to live with the left-linear association, at least until the spill
costs can be mitigated in some fashion.

If parallel reassociation had some heuristics that predicted for register
pressure (difficult in tree-ssa, I know), it might become practical for us.


[Bug middle-end/44382] Slow integer multiply

2011-09-06 Thread hjl at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

--- Comment #8 from hjl at gcc dot gnu.org <hjl at gcc dot gnu.org> 2011-09-06 16:42:56 UTC ---
Author: hjl
Date: Tue Sep  6 16:42:47 2011
New Revision: 178602

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=178602
Log:
PR middle-end/44382: Tree reassociation improvement

gcc/

2011-09-06  Enkovich Ilya  <ilya.enkov...@intel.com>

PR middle-end/44382
* target.def (reassociation_width): New hook.

* doc/tm.texi.in (reassociation_width): Likewise.

* doc/tm.texi (reassociation_width): Likewise.

* doc/invoke.texi (tree-reassoc-width): New param documented.

* hooks.h (hook_int_uint_mode_1): New default hook.

* hooks.c (hook_int_uint_mode_1): Likewise.

* config/i386/i386.h (ix86_tune_indices): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(TARGET_REASSOC_INT_TO_PARALLEL): New.
(TARGET_REASSOC_FP_TO_PARALLEL): Likewise.

* config/i386/i386.c (initial_ix86_tune_features): Add
X86_TUNE_REASSOC_INT_TO_PARALLEL and
X86_TUNE_REASSOC_FP_TO_PARALLEL.

(ix86_reassociation_width) implementation of
new hook for i386 target.

* params.def (PARAM_TREE_REASSOC_WIDTH): New param added.

* tree-ssa-reassoc.c (get_required_cycles): New function.
(get_reassociation_width): Likewise.
(swap_ops_for_binary_stmt): Likewise.
(rewrite_expr_tree_parallel): Likewise.

(rewrite_expr_tree): Refactored. Part of code moved into
swap_ops_for_binary_stmt.

(reassociate_bb): Now checks reassociation width to be used
and call rewrite_expr_tree_parallel instead of rewrite_expr_tree
if needed.

gcc/testsuite/

2011-09-06  Enkovich Ilya  <ilya.enkov...@intel.com>

* gcc.dg/tree-ssa/pr38533.c (dg-options): Added option
--param tree-reassoc-width=1.

* gcc.dg/tree-ssa/reassoc-24.c: New test.
* gcc.dg/tree-ssa/reassoc-25.c: Likewise.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/reassoc-24.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/reassoc-25.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/i386/i386.c
trunk/gcc/config/i386/i386.h
trunk/gcc/doc/invoke.texi
trunk/gcc/doc/tm.texi
trunk/gcc/doc/tm.texi.in
trunk/gcc/hooks.c
trunk/gcc/hooks.h
trunk/gcc/params.def
trunk/gcc/target.def
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr38533.c
trunk/gcc/tree-ssa-reassoc.c
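
To see why a reassociation width matters, a rough model helps. The sketch below is an illustration under stated assumptions and is not the get_required_cycles added by this commit: it assumes every operation takes one cycle and that at most `width` independent operations can issue per cycle. The name model_cycles is hypothetical.

```c
/* Rough model: number of cycles needed to combine `ops` operands
   with a binary operator when at most `width` independent
   operations can issue per cycle.  Assumes one cycle per operation.  */
int
model_cycles (int ops, int width)
{
  int cycles = 0;

  if (width < 1)
    width = 1;

  while (ops > 1)
    {
      int pairs = ops / 2;                          /* independent operations available */
      int issued = pairs < width ? pairs : width;   /* limited by the width */

      ops -= issued;    /* each operation merges two operands into one */
      cycles++;
    }
  return cycles;
}
```

For eight operands, as in the test case from comment #7, this model gives 7 cycles at width 1, 4 cycles at width 2, and 3 cycles at width 4.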


[Bug middle-end/44382] Slow integer multiply

2011-07-12 Thread wschmidt at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382

William J. Schmidt <wschmidt at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wschmidt at gcc dot gnu.org

--- Comment #7 from William J. Schmidt <wschmidt at gcc dot gnu.org> 2011-07-12 17:33:06 UTC ---
The test case from bug 45671 is as follows:

int myfunction (int a, int b, int c, int d, int e, int f, int g, int h) {
  int ret;

  ret = a + b + c + d + e + f + g + h;
  return ret;

}

Compiling with -O3 results in a series of dependent add instructions to
accumulate the sum.

add 4,3,4
add 4,4,5
add 4,4,6
add 4,4,7
add 4,4,8
add 4,4,9
add 4,4,10


If we regrouped this as (a+b)+(c+d)+..., we could do multiple adds in parallel
on different execution units.
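
The two association orders can be written out by hand in C. This is only an illustration of the dependency structure, not compiler output; the names sum_linear and sum_paired are made up for this example, and whether a given GCC actually emits parallel adds depends on the target and options.

```c
/* Left-linear form: every add depends on the previous one,
   giving a chain of 7 dependent adds.  */
int
sum_linear (int a, int b, int c, int d, int e, int f, int g, int h)
{
  return ((((((a + b) + c) + d) + e) + f) + g) + h;
}

/* Pairwise form: the four inner adds are independent of one another,
   so the longest dependency chain is 3 adds instead of 7.  */
int
sum_paired (int a, int b, int c, int d, int e, int f, int g, int h)
{
  int t0 = a + b;
  int t1 = c + d;
  int t2 = e + f;
  int t3 = g + h;

  return (t0 + t1) + (t2 + t3);
}
```

Both functions still perform 7 adds in total; only the length of the dependency chain differs, at the cost of more temporaries being live at once.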


[Bug middle-end/44382] Slow integer multiply

2010-09-14 Thread hjl dot tools at gmail dot com


--- Comment #6 from hjl dot tools at gmail dot com  2010-09-15 04:29 ---
*** Bug 45671 has been marked as a duplicate of this bug. ***


-- 

hjl dot tools at gmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu dot org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382



[Bug middle-end/44382] Slow integer multiply

2010-06-04 Thread hjl dot tools at gmail dot com


--- Comment #2 from hjl dot tools at gmail dot com  2010-06-04 13:08 ---
(In reply to comment #1)
> Because our tree reassoc doesn't re-associate them.

The tree reassoc pass makes it slower:

[...@gnu-6 44382]$ cat x.i
extern int a, b, c, d, e, f;
void
foo ()
{
  a = (b * c) * (d * e);
}
[...@gnu-6 44382]$ gcc -S -O2 x.i
[...@gnu-6 44382]$ cat x.s
        .file   "x.i"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    c(%rip), %eax
        imull   b(%rip), %eax
        imull   d(%rip), %eax
        imull   e(%rip), %eax
        movl    %eax, a(%rip)
        ret
[...@gnu-6 44382]$ gcc -S -O2 x.i -fno-tree-reassoc
[...@gnu-6 44382]$ cat x.s
        .file   "x.i"
        .text
        .p2align 4,,15
.globl foo
        .type   foo, @function
foo:
.LFB0:
        .cfi_startproc
        movl    b(%rip), %eax
        movl    d(%rip), %edx
        imull   c(%rip), %eax
        imull   e(%rip), %edx
        imull   %edx, %eax
        movl    %eax, a(%rip)
        ret
[...@gnu-6 44382]$ 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382



[Bug middle-end/44382] Slow integer multiply

2010-06-04 Thread rguenth at gcc dot gnu dot org


--- Comment #3 from rguenth at gcc dot gnu dot org  2010-06-04 13:21 ---
Yes, reassoc linearizes instead of building a tree (saves one (or was it two?)
registers at best).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382



[Bug middle-end/44382] Slow integer multiply

2010-06-04 Thread hjl dot tools at gmail dot com


--- Comment #4 from hjl dot tools at gmail dot com  2010-06-04 13:56 ---
(In reply to comment #3)
> Yes, reassoc linearizes instead of building a tree (saves one (or was it two?)
> registers at best).

Should we always build a tree? It may increase register pressure.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382



[Bug middle-end/44382] Slow integer multiply

2010-06-04 Thread hjl dot tools at gmail dot com


--- Comment #5 from hjl dot tools at gmail dot com  2010-06-04 14:40 ---
tree-ssa-reassoc.c has

2. Left linearization of the expression trees, so that (A+B)+(C+D)
becomes (((A+B)+C)+D), which is easier for us to rewrite later.
During linearization, we place the operands of the binary
expressions into a vector of operand_entry_t

I think this may always generate slower code. We may not want to
use many more registers, but we could limit ourselves to 2 temporaries.
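
One way to picture the "2 temporaries" limit is a reduction that keeps exactly two running partial results. The following is a minimal sketch with a hypothetical name (product_two_chains), using unsigned arithmetic so that reordering cannot change the result; it does not describe what tree-ssa-reassoc.c does internally.

```c
#include <stddef.h>

/* Multiply n values using two independent accumulators, so that two
   multiplies can be in flight at once while only two temporaries
   are live.  */
unsigned
product_two_chains (const unsigned *v, size_t n)
{
  unsigned acc0 = 1, acc1 = 1;
  size_t i;

  for (i = 0; i + 1 < n; i += 2)
    {
      acc0 *= v[i];         /* chain 0 */
      acc1 *= v[i + 1];     /* chain 1, independent of chain 0 */
    }
  if (i < n)                /* odd element count: fold in the last value */
    acc0 *= v[i];

  return acc0 * acc1;       /* one final multiply joins the two chains */
}
```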


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382



[Bug middle-end/44382] Slow integer multiply

2010-06-02 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2010-06-02 15:15 ---
Because our tree reassoc doesn't re-associate them.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|0000-00-00 00:00:00         |2010-06-02 15:15:06
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44382