Re: [PATCH][RFC] final-value replacement from DCE

2019-05-31 Thread Richard Biener
On Wed, 29 May 2019, Jakub Jelinek wrote:

> On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> 
> I admit I haven't looked at the details here, but wonder if the optimization
> couldn't be done only in the DCE passes post IPA, otherwise we risk
> behavior changes for __builtin_object_size.

We can do that - the first CD-DCE pass is in the loop pipeline though,
_after_ final value replacement.  Looking at the testsuite fallout
it's also clear that doing loop-header copying before final-value
replacement results in better code for some testcases.
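
For readers unfamiliar with loop-header copying, a minimal hand-written sketch of the transformation follows. The function names are made up for illustration, and the second function only approximates what the pass produces on GIMPLE; it is not compiler output.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the exit test sits in the loop header.  */
size_t count_nonzero_while(const int *a, size_t n)
{
  assert(a != NULL || n == 0);
  size_t i = 0, cnt = 0;
  while (i < n) {
    if (a[i] != 0)
      cnt++;
    i++;
  }
  return cnt;
}

/* Roughly the shape after header copying: the test is duplicated in
   front of the loop and the loop itself becomes a do-while, so the
   body is known to run at least once when the loop is entered.  */
size_t count_nonzero_copied(const int *a, size_t n)
{
  assert(a != NULL || n == 0);
  size_t i = 0, cnt = 0;
  if (i < n) {
    do {
      if (a[i] != 0)
        cnt++;
      i++;
    } while (i < n);
  }
  return cnt;
}
```

Both functions compute the same result; only the placement of the exit test differs.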

So I'm trying to turn the first DCE after loop-header copying into
a CD-DCE run, and not doing final value replacement before IPA.
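
As a rough sketch of what control-dependent DCE can buy over plain DCE, consider a finite loop whose only effect is a value nobody reads. The function name is hypothetical and the comments describe the expected behavior in general terms, not verified dump output.

```c
#include <assert.h>
#include <stddef.h>

int first_plus_one(const int *a, size_t n)
{
  assert(a != NULL && n > 0);
  int unused_sum = 0;
  /* Plain DCE treats the loop exit test as necessary, so it can
     delete the addition but tends to leave an empty loop behind.
     CD-DCE only keeps control statements that some necessary
     statement is control-dependent on, so it can drop the whole
     loop, since the loop is known to terminate.  */
  for (size_t i = 0; i < n; i++)
    unused_sum += a[i];
  (void) unused_sum;   /* silences unused warnings, generates no code */
  return a[0] + 1;
}
```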

The following does that independently, bootstrapped & tested
on x86_64-unknown-linux-gnu.  It will leave

FAIL: gcc.dg/tree-ssa/pr68619-4.c scan-tree-dump optimized "PHI <.*, 39"

because the testcase is totally unclear on who is supposed to
propagate 39 and why.  With CD-DCE there's one PRE opportunity
less because, well, a value is no longer partially redundant.
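
To make "partially redundant" concrete, here is a small generic sketch. It is not the pr68619-4.c testcase, and both function names are invented. The product a * b is available on one incoming path only, which is the situation PRE looks for; if the computation on that path goes away, nothing is left to make the later one redundant.

```c
#include <assert.h>

/* a * b is computed on the 'c' path and again after the join, so the
   second computation is redundant only when c is nonzero.  */
int before_pre(int a, int b, int c)
{
  int r = 0;
  if (c)
    r = a * b;
  return r + a * b;
}

/* Hand-written equivalent of what PRE would do: insert the
   computation on the path that lacked it and reuse the value.  */
int after_pre(int a, int b, int c)
{
  int r = 0, t;
  if (c) {
    t = a * b;
    r = t;
  } else {
    t = a * b;
  }
  return r + t;
}
```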

I hope I caught all dce/cddce dump issues.  It also seemed to
me that unifying dce and cd-dce might be a useful cleanup,
so that we just have

  NEXT_PASS (pass_dce, true /* perform control-dependent DCE */)

but not for today...

Not going to apply this separately but only eventually together
with the rest.

Richard.

2019-05-31  Richard Biener  

PR tree-optimization/68619
* passes.def (pass_dce after CH): Turn into pass_cd_dce.

* g++.dg/tree-ssa/copyprop-1.C: Adjust dump scanned.
* gcc.dg/tree-ssa/20030709-2.c: Likewise.
* gcc.dg/tree-ssa/20030808-1.c: Likewise.
* gcc.dg/tree-ssa/20040729-1.c: Likewise.
* gcc.dg/tree-ssa/loop-36.c: Likewise.
* gcc.dg/tree-ssa/ssa-dce-1.c: Likewise.
* gcc.dg/tree-ssa/ssa-dce-2.c: Likewise.

Index: gcc/passes.def
===
--- gcc/passes.def  (revision 271802)
+++ gcc/passes.def  (working copy)
@@ -231,7 +231,7 @@ along with GCC; see the file COPYING3.
   NEXT_PASS (pass_isolate_erroneous_paths);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
-  NEXT_PASS (pass_dce);
+  NEXT_PASS (pass_cd_dce);
   NEXT_PASS (pass_forwprop);
   NEXT_PASS (pass_phiopt, false /* early_p */);
   NEXT_PASS (pass_ccp, true /* nonzero_p */);
Index: gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
===
--- gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C  (revision 271802)
+++ gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
 
 /* Verify that we can eliminate the useless conversions to/from
    const qualified pointer types
@@ -27,4 +27,4 @@ int foo(Object)
 
 /* Remaining should be two loads.  */
 
-/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "dce3" } } */
+/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "cddce2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c  (revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
   
 struct rtx_def;
 typedef struct rtx_def *rtx;
@@ -42,13 +42,13 @@ get_alias_set (t)
 
 /* There should be precisely one load of ->decl.rtl.  If there is
    more than, then the dominator optimizations failed.  */
-/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "cddce2"} } */
   
 /* There should be no loads of .rtmem since the complex return statement
    is just "return 0".  */
-/* { dg-final { scan-tree-dump-times ".rtmem" 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times ".rtmem" 0 "cddce2"} } */
   
 /* There should be one IF statement (the complex return statement should
    collapse down to a simple return 0 without any conditionals).  */
-/* { dg-final { scan-tree-dump-times "if " 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 1 "cddce2"} } */
 
Index: gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c  (revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c  (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-cddce3" } */
+/* { dg-options "-O1 -fdump-tree-cddce4" } */
   
 extern void abort (void);
 
@@ -33,8 +33,8 @@ delete_dead_jumptables ()
 /* There should be no loads of ->code.  If any 

Re: [PATCH][RFC] final-value replacement from DCE

2019-05-29 Thread Jakub Jelinek
On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort

I admit I haven't looked at the details here, but wonder if the optimization
couldn't be done only in the DCE passes post IPA, otherwise we risk
behavior changes for __builtin_object_size.
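
A sketch of the kind of code where this could matter, assuming a pointer that is advanced inside a loop. This is not taken from the failing builtin-object-size testcases, and the function name is made up. The point is only that the answer may depend on whether the object-size computation sees the loop or an already replaced final value.

```c
#include <assert.h>
#include <stddef.h>

size_t remaining_after_skip(unsigned n)
{
  static char buf[32];
  char *p = buf;
  assert(n <= sizeof buf);
  for (unsigned i = 0; i < n; i++)
    p++;                        /* pointer advanced inside a loop */
  /* Mode 0 gives an upper bound on the bytes remaining from p, or
     (size_t) -1 when the compiler cannot tell.  Which of these the
     caller sees can vary with pass ordering and optimization level.  */
  return __builtin_object_size(p, 0);
}
```

Whatever value comes back for n == 4, it should never be smaller than the 28 bytes that really remain.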

Jakub


Re: [PATCH][RFC] final-value replacement from DCE

2019-05-29 Thread Jeff Law
On 5/29/19 7:36 AM, Richard Biener wrote:
> 
> The following tries to address PR90648 by performing final
> value replacement from DCE when DCE knows the final value
> computation is not used during loop iteration.  This fits
> neatly enough into existing tricks performed by DCE like
> removing unused malloc/free pairs.
Do you have the right BZ #?  90648 is an ICE in tree checking and doesn't
have a loop :-)



> 
> There's a few complications, one is it fails to bootstrap
> because it exposes a few uninit warning false positives,
> another is that -fno-tree-sccp is no longer effective.
> As written this turns gcc.dg/pr34027-1.c into a division
> again (I did not copy the expression_expensive checking).
> It seems to also need -ftrapv adjustements (gcc.dg/pr81661.c).
> 
> The goal of this patch is to remove the SCCP pass, or rather
> us unconditionally replacing loop-closed PHIs with final
> value computations which we've got complaints in the past
> already that it duplicates computation that is readily
> available.  I've not yet figured testsuite fallout from that
> change.
> 
> For the -fno-tree-sccp I consider to simply honor that
> flag in the DCE path, for the gcc.dg/pr34027-1.c I'll
> re-install the expression_expensive checking.  I'll
> also fix the -ftrapv issue.
> 
> Does this otherwise look a sensible way forward?

> 
> Thanks,
> Richard.
> 
> FAIL: gcc.dg/builtin-object-size-1.c execution test
> FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/pr81661.c (internal compiler error)
> FAIL: gcc.dg/pr81661.c (test for excess errors)
> XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
> FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
> FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at 
> most 3 times"
> FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect 
> "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect 
> "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect 
> "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED" 1
> FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect 
> "OUTER LOOP VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP 
> VECTORIZED." 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect 
> "vectorized 1 loops" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect 
> "vect_recog_widen_sum_pattern: detected" 1
> FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect 
> "vectorized 1 loops" 1
> 
> Running target unix//-m32
> FAIL: gcc.dg/builtin-object-size-1.c execution test
> FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
> FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
> FAIL: gcc.dg/pr81661.c (internal compiler error)
> FAIL: gcc.dg/pr81661.c (test for excess errors)
> XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
> FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
> FAIL: gcc.dg/tree-ssa/loop-26.c 

[PATCH][RFC] final-value replacement from DCE

2019-05-29 Thread Richard Biener


The following tries to address PR90648 by performing final
value replacement from DCE when DCE knows the final value
computation is not used during loop iteration.  This fits
neatly enough into existing tricks performed by DCE like
removing unused malloc/free pairs.
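
For reference, a minimal sketch of what final value replacement means at the source level. The names are hypothetical and the second function is a hand-written closed form, not compiler output.

```c
#include <assert.h>

/* The value of x after the loop is only used outside of it.  */
unsigned final_by_loop(unsigned x0, unsigned n)
{
  unsigned x = x0;
  for (unsigned i = 0; i < n; i++)
    x += 3;
  return x;
}

/* Closed form of the same final value.  Once the use outside the
   loop is rewritten like this, nothing in the loop is needed any
   more and DCE can remove it.  */
unsigned final_by_formula(unsigned x0, unsigned n)
{
  return x0 + 3u * n;
}
```

Unsigned arithmetic is used so that both versions wrap identically.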

There are a few complications: one is that it fails to bootstrap
because it exposes a few uninit warning false positives,
another is that -fno-tree-sccp is no longer effective.
As written this turns gcc.dg/pr34027-1.c into a division
again (I did not copy the expression_expensive checking).
It seems to also need -ftrapv adjustments (gcc.dg/pr81661.c).
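
A sketch of why the expression_expensive check matters, using a loop of the general shape where the final value is a remainder. This is an illustration with made-up names, not a copy of the testcase.

```c
#include <assert.h>

/* Reduction by repeated subtraction: cheap per iteration and
   usually few iterations.  */
unsigned long reduce_by_loop(unsigned long ns)
{
  while (ns >= 10000UL)
    ns -= 10000UL;
  return ns;
}

/* The final value in closed form needs a division/modulo, which
   may well be more expensive than the loop it replaces.  */
unsigned long reduce_by_formula(unsigned long ns)
{
  return ns % 10000UL;
}
```

Both agree on the result, so replacing the loop is valid; the question is only whether it is profitable.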

The goal of this patch is to remove the SCCP pass, or rather
to stop unconditionally replacing loop-closed PHIs with final
value computations; we've had complaints in the past that this
duplicates computation that is readily available.  I've not
yet figured out the testsuite fallout from that change.

For -fno-tree-sccp I'm considering simply honoring that
flag in the DCE path; for gcc.dg/pr34027-1.c I'll
re-install the expression_expensive checking.  I'll
also fix the -ftrapv issue.

Does this otherwise look like a sensible way forward?

Thanks,
Richard.

FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1

Running target unix//-m32
FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: