[Bug tree-optimization/38401] TreeSSA-PRE load after store missed optimization

2008-12-20 Thread sergeid at il dot ibm dot com


--- Comment #16 from sergeid at il dot ibm dot com  2008-12-21 07:44 ---
(In reply to comment #15)
> Re. comment #14: Yes, I suppose so.  Why do you want to remove gcse-las from
> mainline.  Not that I'm against it -- ideally RTL gcse.c would not work on
> memory at all anymore -- but I wouldn't remove gcse-las until we catch in the
> GIMPLE optimizers as much as possible of the things we still need gcse-las 
> for.

For the time being this is the only case I've found that is missed by
tree-PRE and caught by GCSE-LAS. As you pointed out, GCSE-LAS doesn't seem to
help much.

> It seems to me, btw, that it might be easier to teach GIMPLE loop invariant
> code motion about this transformation.  Adding this in GIMPLE PRE might be a
> little too expensive...?

That may be; I was just noting that such redundancies should be caught
somewhere at the GIMPLE stage.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401



[Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization

2008-12-14 Thread sergeid at il dot ibm dot com


--- Comment #14 from sergeid at il dot ibm dot com  2008-12-15 07:17 ---
Ok, since this case is the only one where RTL PRE (gcse-las) improves
performance and it can be dealt with at the TreeSSA level, it should be ok to
remove gcse-las from mainline and keep this PR open? 





[Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization

2008-12-08 Thread sergeid at il dot ibm dot com


--- Comment #12 from sergeid at il dot ibm dot com  2008-12-08 11:53 ---
I have to mention that tree PRE still doesn't catch this LOAD with -O3,
though the patch Richard posted does the job.

(In reply to comment #1)
> It works with -O3 (with partial-partial PRE enabled).  At least 
> phi-translation
> figures out that *res is zero on the incoming edge.
> 
> Un-leashing partial-PRE like with





[Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization

2008-12-08 Thread sergeid at il dot ibm dot com


--- Comment #10 from sergeid at il dot ibm dot com  2008-12-08 10:08 ---
Subject: Re: TreeSSA-PRE load after store misoptimization

Sorry, I forgot to attach the patch. (See attached file:
gcse-las-counter.patch)


--- Comment #11 from sergeid at il dot ibm dot com  2008-12-08 10:08 ---
Created an attachment (id=16850)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=16850&action=view)





[Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization

2008-12-08 Thread sergeid at il dot ibm dot com


--- Comment #9 from sergeid at il dot ibm dot com  2008-12-08 10:03 ---
Subject: Re: TreeSSA-PRE load after store misoptimization

Can you post your gcc configuration options?
I've created and attached a small patch that adds some more information
to the dump file. Can you apply it and send me the new .gcse1 dump? Then I'll
compare it with mine and maybe we'll find the reason.

"steven at gcc dot gnu dot org" <[EMAIL PROTECTED]> wrote on
04/12/2008 20:16:05:

> I still don't see why this is caught on powerpc by RTL PRE, but not on ia64
> (note *ia64*, not x86).  I compile with -O3 -fgcse-las.  The compiler is
> yesterday's trunk on ia64-unknown-linux-gnu.  The .gcse1 dump is attached.
> Why is it optimized for you on powerpc but not for me on ia64?





[Bug tree-optimization/38401] TreeSSA-PRE load after store misoptimization

2008-12-04 Thread sergeid at il dot ibm dot com


--- Comment #7 from sergeid at il dot ibm dot com  2008-12-04 17:54 ---
Subject: Re: TreeSSA-PRE load after store misoptimization

You're right, it worked for me only on powerpc. This is the RTL snippet
_before_ elimination (load + xor + store):

...
(insn 20 19 21 5 ../loop.c:10 (set (reg:SI 128)
(mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])) 324
{*movsi_internal1} (nil))

(insn 21 20 22 5 ../loop.c:10 (set (reg:SI 129)
(xor:SI (reg:SI 128)
(const_int 234 [0xea]))) 139 {*boolsi3_internal1} (nil))

(insn 22 21 23 5 ../loop.c:10 (set (mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])
(reg:SI 129)) 324 {*movsi_internal1} (nil)):
...


And this is _after_ (xor + store only):
...
(insn 21 19 22 5 ../loop.c:10 (set (reg:SI 131)
(xor:SI (reg:SI 131)
(const_int 234 [0xea]))) 139 {*boolsi3_internal1}
(expr_list:REG_DEAD (reg:SI 131)
(nil)))

(insn 22 21 23 5 ../loop.c:10 (set (mem:SI (reg/v/f:SI 123 [ res ]) [2 S4 A32])
(reg:SI 131)) 324 {*movsi_internal1} (expr_list:REG_DEAD (reg:SI 129)
(nil)))
...

On x86 it produces complex set instructions:
...
(insn 18 17 19 5 ../loop.c:10 (parallel [
(set (mem:SI (reg/v/f:DI 62 [ res ]) [2 S4 A32])
(xor:SI (mem:SI (reg/v/f:DI 62 [ res ]) [2 S4 A32])
(const_int 234 [0xea])))
(clobber (reg:CC 17 flags))
]) 417 {*xorsi_1} (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
...
and that's why (probably) GCSE can't optimize it.


PS. BTW, I _do_ compile it with "-O3" and tree PRE doesn't catch it.


"steven at gcc dot gnu dot org" <[EMAIL PROTECTED]> wrote on
04/12/2008 19:08:57:

> I do not see RTL PRE catch this on ia64, with or without -fgcse-las.
>
> Can you show, please, the RTL dumps before and after GCSE?





[Bug tree-optimization/38401] New: TreeSSA-PRE load after store misoptimization

2008-12-04 Thread sergeid at il dot ibm dot com
There is an obviously redundant LOAD in the following code (the line marked (*)):

void f (int n, int *cond, int *res)
{
  int i;
  *res = 0;
  for (i = 0; i < n; i++)
    if (*cond)
      *res ^= 234; /* (*) */
}

GCSE LAS (load after store) catches it at the RTL stage, but it should be caught
by PRE at the TreeSSA stage.
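
To make the expected result concrete, here is a hand-written sketch of what
the function could look like once the redundant load is gone. It is an
illustration only, not compiler output; the names f_opt and tmp are made up
for this example. The load of *cond and the store to *res are both kept,
because cond and res are both int pointers and may point to the same object.

```c
/* Original function from the report. */
void f(int n, int *cond, int *res)
{
    int i;
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond)
            *res ^= 234;      /* load *res, xor, store *res */
}

/* Hand-written equivalent with the load of *res removed.
   f_opt and tmp are hypothetical names, not from the report. */
void f_opt(int n, int *cond, int *res)
{
    int i;
    int tmp = 0;              /* always holds the current value of *res */
    *res = 0;
    for (i = 0; i < n; i++)
        if (*cond) {          /* reloaded each iteration: cond may alias res */
            tmp ^= 234;       /* xor on the register copy */
            *res = tmp;       /* store kept, load gone */
        }
}
```

This mirrors the powerpc RTL quoted elsewhere in the thread, where only the
xor and the store remain after GCSE. Moving the store out of the loop as well
would need proof that cond and res do not alias.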


-- 
   Summary: TreeSSA-PRE load after store misoptimization
   Product: gcc
   Version: 4.4.0
Status: UNCONFIRMED
  Severity: minor
  Priority: P3
 Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: sergeid at il dot ibm dot com
 GCC build triplet: powerpc
  GCC host triplet: powerpc
GCC target triplet: powerpc


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38401