Re: [PATCH][RFC] final-value replacement from DCE
On Wed, 29 May 2019, Jakub Jelinek wrote:

> On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
>
> I admit I haven't looked at the details here, but wonder if the optimization
> couldn't be done only in the DCE passes post IPA, otherwise we risk
> behavior changes for __builtin_object_size.

We can do that - the first CD-DCE pass is in the loop pipeline though,
_after_ final value replacement.  Looking at the testsuite fallout it's
also clear that doing loop-header copying before final-value replacement
results in better code for some testcases.

So I'm trying to turn the first DCE after loop-header copying into
a CD-DCE run, not doing final value replacement before IPA.

The following does that independently, bootstrapped & tested on
x86_64-unknown-linux-gnu.  It will leave

FAIL: gcc.dg/tree-ssa/pr68619-4.c scan-tree-dump optimized "PHI <.*, 39"

because the testcase is totally unclear on who is supposed to propagate
39 and why.  With CD-DCE there's one PRE opportunity less because,
well, a value is no longer partially redundant.

I hope I caught all dce/cddce dump issues.  It also seemed to me that
unifying dce and cd-dce may be a useful cleanup, so that we just have

  NEXT_PASS (pass_dce, true /* perform control-dependent DCE */)

but not for today...

Not going to apply this separately but only eventually together with
the rest.

Richard.

2019-05-31  Richard Biener

        PR tree-optimization/68619
        * passes.def (pass_dce after CH): Turn into pass_cd_dce.

        * g++.dg/tree-ssa/copyprop-1.C: Adjust dump scanned.
        * gcc.dg/tree-ssa/20030709-2.c: Likewise.
        * gcc.dg/tree-ssa/20030808-1.c: Likewise.
        * gcc.dg/tree-ssa/20040729-1.c: Likewise.
        * gcc.dg/tree-ssa/loop-36.c: Likewise.
        * gcc.dg/tree-ssa/ssa-dce-1.c: Likewise.
        * gcc.dg/tree-ssa/ssa-dce-2.c: Likewise.
Index: gcc/passes.def
===
--- gcc/passes.def (revision 271802)
+++ gcc/passes.def (working copy)
@@ -231,7 +231,7 @@ along with GCC; see the file COPYING3.
 NEXT_PASS (pass_isolate_erroneous_paths);
 NEXT_PASS (pass_dse);
 NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
-NEXT_PASS (pass_dce);
+NEXT_PASS (pass_cd_dce);
 NEXT_PASS (pass_forwprop);
 NEXT_PASS (pass_phiopt, false /* early_p */);
 NEXT_PASS (pass_ccp, true /* nonzero_p */);
Index: gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C
===
--- gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C (revision 271802)
+++ gcc/testsuite/g++.dg/tree-ssa/copyprop-1.C (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
 /* Verify that we can eliminate the useless conversions to/from const qualified pointer types
@@ -27,4 +27,4 @@ int foo(Object)
 /* Remaining should be two loads. */
-/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "dce3" } } */
+/* { dg-final { scan-tree-dump-times " = \[^\n\]*;" 2 "cddce2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c (revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030709-2.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -fdump-tree-dce3" } */
+/* { dg-options "-O -fdump-tree-cddce2" } */
 struct rtx_def;
 typedef struct rtx_def *rtx;
@@ -42,13 +42,13 @@ get_alias_set (t)
 /* There should be precisely one load of ->decl.rtl. If there is more than, then the dominator optimizations failed. */
-/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "->decl\\.rtl" 1 "cddce2"} } */
 /* There should be no loads of .rtmem since the complex return statement is just "return 0". */
-/* { dg-final { scan-tree-dump-times ".rtmem" 0 "dce3"} } */
+/* { dg-final { scan-tree-dump-times ".rtmem" 0 "cddce2"} } */
 /* There should be one IF statement (the complex return statement should collapse down to a simple return 0 without any conditionals). */
-/* { dg-final { scan-tree-dump-times "if " 1 "dce3"} } */
+/* { dg-final { scan-tree-dump-times "if " 1 "cddce2"} } */
Index: gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c (revision 271802)
+++ gcc/testsuite/gcc.dg/tree-ssa/20030808-1.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-cddce3" } */
+/* { dg-options "-O1 -fdump-tree-cddce4" } */
 extern void abort (void);
@@ -33,8 +33,8 @@ delete_dead_jumptables ()
 /* There should be no loads of ->code. If any
Re: [PATCH][RFC] final-value replacement from DCE
On Wed, May 29, 2019 at 09:57:50AM -0600, Jeff Law wrote:
> > FAIL: gcc.dg/builtin-object-size-1.c execution test
> > FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort

I admit I haven't looked at the details here, but wonder if the optimization
couldn't be done only in the DCE passes post IPA, otherwise we risk
behavior changes for __builtin_object_size.

        Jakub
Re: [PATCH][RFC] final-value replacement from DCE
On 5/29/19 7:36 AM, Richard Biener wrote:
>
> The following tries to address PR90648 by performing final
> value replacement from DCE when DCE knows the final value
> computation is not used during loop iteration.  This fits
> neatly enough into existing tricks performed by DCE like
> removing unused malloc/free pairs.

Do you have the right BZ #?  90648 is an ICE in tree checking and doesn't
have a loop :-)

>
> There's a few complications, one is it fails to bootstrap
> because it exposes a few uninit warning false positives,
> another is that -fno-tree-sccp is no longer effective.
> As written this turns gcc.dg/pr34027-1.c into a division
> again (I did not copy the expression_expensive checking).
> It seems to also need -ftrapv adjustments (gcc.dg/pr81661.c).
>
> The goal of this patch is to remove the SCCP pass, or rather
> us unconditionally replacing loop-closed PHIs with final
> value computations which we've got complaints in the past
> already that it duplicates computation that is readily
> available.  I've not yet figured testsuite fallout from that
> change.
>
> For the -fno-tree-sccp I consider to simply honor that
> flag in the DCE path, for the gcc.dg/pr34027-1.c I'll
> re-install the expression_expensive checking.  I'll
> also fix the -ftrapv issue.
>
> Does this otherwise look a sensible way forward?
>
> Thanks,
> Richard.
[PATCH][RFC] final-value replacement from DCE
The following tries to address PR90648 by performing final
value replacement from DCE when DCE knows the final value
computation is not used during loop iteration.  This fits
neatly enough into existing tricks performed by DCE like
removing unused malloc/free pairs.

There are a few complications: one is that it fails to bootstrap
because it exposes a few uninit warning false positives,
another is that -fno-tree-sccp is no longer effective.
As written this turns gcc.dg/pr34027-1.c into a division
again (I did not copy the expression_expensive checking).
It also seems to need -ftrapv adjustments (gcc.dg/pr81661.c).

The goal of this patch is to remove the SCCP pass, or rather
to stop us from unconditionally replacing loop-closed PHIs with
final value computations; we've had complaints in the past
that this duplicates computation that is readily available.
I've not yet figured out the testsuite fallout from that change.

For -fno-tree-sccp I'm considering simply honoring that
flag in the DCE path; for gcc.dg/pr34027-1.c I'll
re-install the expression_expensive checking.  I'll
also fix the -ftrapv issue.

Does this otherwise look like a sensible way forward?

Thanks,
Richard.
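For readers unfamiliar with the term, here is a minimal source-level sketch of what final value replacement means. It is not taken from the patch: GCC performs the transform on GIMPLE loop-closed PHIs, not on C source, and the function names below are made up for illustration. The first function computes a value in a loop that is only used after the loop exits; the second is the closed form that could replace that use, after which the loop itself is dead and DCE can remove it.

```c
/* Illustration only: hypothetical function names, not GCC code.  */

/* The loop's only purpose is to produce the value of x on exit.
   x is not used for anything else during the iteration.  */
unsigned final_value_loop (unsigned n)
{
  unsigned x = 5;
  for (unsigned i = 0; i < n; i++)
    x += 3;
  return x;   /* the use of the "final value" after the loop */
}

/* What the use after the loop can be replaced with: x starts at 5
   and grows by 3 per iteration, so after n iterations it is 5 + 3*n.
   Unsigned arithmetic wraps the same way in both versions.  */
unsigned final_value_closed_form (unsigned n)
{
  return 5u + 3u * n;
}
```

Whether the replacement is profitable depends on how expensive the closed form is compared to keeping the loop, which is roughly what the expression_expensive check mentioned above is guarding.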
FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: 2" 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-4.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-noreassoc-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-11.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-13.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-14.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-15.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-16.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-17.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-18.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-19.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-2.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED" 1
FAIL: gcc.dg/vect/no-scevccp-outer-20.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-21.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-3.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-5.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6-global.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-6.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-7.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-outer-8.c scan-tree-dump-times vect "OUTER LOOP VECTORIZED." 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-1.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vect_recog_widen_sum_pattern: detected" 1
FAIL: gcc.dg/vect/no-scevccp-vect-iv-3.c scan-tree-dump-times vect "vectorized 1 loops" 1

Running target unix//-m32
FAIL: gcc.dg/builtin-object-size-1.c execution test
FAIL: gcc.dg/builtin-object-size-5.c scan-assembler-not abort
FAIL: gcc.dg/pr34027-1.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/pr81661.c (internal compiler error)
FAIL: gcc.dg/pr81661.c (test for excess errors)
XPASS: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized " + " 0
FAIL: gcc.dg/tree-ssa/loop-15.c scan-tree-dump-times optimized "if " 1
FAIL: gcc.dg/tree-ssa/loop-26.c scan-tree-dump-times optimized "if" 2
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized " / " 0
FAIL: gcc.dg/tree-ssa/pr32044.c scan-tree-dump-times optimized "if" 6
FAIL: gcc.dg/tree-ssa/pr64183.c scan-tree-dump cunroll "Loop 2 iterates at most 3 times"
FAIL: gcc.dg/tree-ssa/ssa-pre-3.c scan-tree-dump-times pre "Eliminated: