On Mon, 30 Apr 2018, Kyrill Tkachov wrote: > Hi Richard, > > > On 27/04/18 09:24, Richard Biener wrote: > > On Thu, 26 Apr 2018, Richard Biener wrote: > > > > > > > > The following makes loop-header copying stop after the first exit test > > > it copied. The reports rightfully complain about too much peeling. > > > If some cases pop up which show we should peel up to a specific test > > > we need to improve this heuristic which simply errs on the easy side. > > > > > > Bootstrap & regtest running on x86_64-unknown-linux-gnu. > > > > Bootstrap went ok but it showed a few required testsuite adjustments. > > > > gcc.dg/tree-ssa/cunroll-13.c looks fragile (it was a test for a > > profile mismatch), so I rewrote it as a GIMPLE testcase. > > > > With the ivopt_mult_[12].c testcases IVOPTs no longer elminiates > > an IV due to the change in IL -- what loop header copying does now > > looks more sensible than before though, so I added GIMPLE testcase > > variants that verify IVOPTs can still pull off the trick but I > > had to XFAIL the C testcases (less IVs still look good and I fail > > to see why the trick shouldn't work with the new IL - sth to > > investigate). > > > > Re-bootstrapping and testing on x86_64-unknown-linux-gnu now. > > > > After this patch I'm seeing gfortran.dg/pr51434.f90 FAIL on aarch64 at -O1 > It is an execution failure. > > The current trunk removes almost all of main (final optimised GIMPLE dump): > > __attribute__((externally_visible)) > main (integer(kind=4) argc, character(kind=1) * * argv) > { > static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31}; > > <bb 2> [local count: 1073741826]: > _gfortran_set_args (argc_2(D), argv_3(D)); > _gfortran_set_options (7, &options.3[0]); > return 0; > > }
I see this at -O3. But the IL below following at -O1. Meanwhile PR85579 got filed for this where I noted the testcase is somwhat fishy. Richard. > whereas trunk on the 28th had much more control flow in it: > > __attribute__((externally_visible)) > main (integer(kind=4) argc, character(kind=1) * * argv) > { > unsigned long ivtmp.5; > static struct a c = {.m=5, .t={"a", "b", "c", "d", "e", " ", " ", " ", " ", > " ", " ", " ", " ", " ", " ", " ", " ", " "}}; > static integer(kind=4) options.3[7] = {68, 8191, 0, 1, 1, 0, 31}; > character(kind=1) _8; > character(kind=1)[1:1] * _9; > character(kind=1) _10; > > <bb 2> [local count: 1073741826]: > _gfortran_set_args (argc_2(D), argv_3(D)); > _gfortran_set_options (7, &options.3[0]); > > <bb 3> [local count: 4208275768]: > # ivtmp.5_12 = PHI <0(2), ivtmp.5_15(4)> > _8 = MEM[symbol: c, index: ivtmp.5_12, offset: 4B]; > _9 = MEM[symbol: A.1, index: ivtmp.5_12, step: 8, offset: 0B]; > _10 = *_9[1]{lb: 1 sz: 1}; > if (_8 != _10) > goto <bb 6>; [5.50%] > else > goto <bb 4>; [94.50%] > > <bb 4> [local count: 3976820602]: > ivtmp.5_15 = ivtmp.5_12 + 1; > if (ivtmp.5_15 == 5) > goto <bb 5>; [16.67%] > else > goto <bb 3>; [83.33%] > > <bb 5> [local count: 1072883003]: > return 0; > > <bb 6> [local count: 429327]: > _gfortran_stop_numeric (2, 0); > > } > > Cheers, > Kyrill > > > Richard. > > > > 2018-04-26 Richard Biener <rguent...@suse.de> > > > > PR tree-optimization/28364 > > PR tree-optimization/85275 > > * tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after > > copying first exit test. > > > > * gcc.dg/tree-ssa/copy-headers-5.c: New testcase. > > * gcc.dg/tree-ssa/predcom-8.c: Likewise. > > * gcc.dg/tree-ssa/cunroll-13.c: Rewrite to gimple testcase. > > * gcc.dg/tree-ssa/ivopt_mult_1.c: XFAIL. > > * gcc.dg/tree-ssa/ivopt_mult_1g.c: Add gimple variant that > > still passes. > > * gcc.dg/tree-ssa/ivopt_mult_2.c: XFAIL. > > * gcc.dg/tree-ssa/ivopt_mult_2g.c: Add gimple variant that > > still passes. > > * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust. > > * gcc.dg/tree-ssa/20030710-1.c: Likewise. > > * gcc.dg/tree-ssa/20030711-1.c: Likewise. > > > > Index: gcc/tree-ssa-loop-ch.c > > =================================================================== > > --- gcc/tree-ssa-loop-ch.c (revision 259695) > > +++ gcc/tree-ssa-loop-ch.c (working copy) > > @@ -340,6 +340,11 @@ ch_base::copy_headers (function *fun) > > bbs[n_bbs++] = header; > > gcc_assert (bbs_size > n_bbs); > > header = exit->dest; > > + /* Make sure to stop copying after we copied the first exit test. > > + Without further heuristics we do not want to rotate the loop > > + any further. */ > > + if (loop_exits_from_bb_p (loop, exit->src)) > > + break; > > } > > > > if (!exit) > > Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (nonexistent) > > +++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-5.c (working copy) > > @@ -0,0 +1,16 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-ch2-details" } */ > > + > > +int is_sorted(int *a, int n) > > +{ > > + for (int i = 0; i < n - 1; i++) > > + if (a[i] > a[i + 1]) > > + return 0; > > + return 1; > > +} > > + > > +/* Verify we apply loop header copying but only copy the IV test and > > + not the alternate exit test. */ > > + > > +/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */ > > +/* { dg-final { scan-tree-dump-times " if " 3 "ch2" } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (nonexistent) > > +++ gcc/testsuite/gcc.dg/tree-ssa/predcom-8.c (working copy) > > @@ -0,0 +1,12 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O3 -fdump-tree-pcom-details" } */ > > + > > +int is_sorted(int *a, int n) > > +{ > > + for (int i = 0; i < n - 1; i++) > > + if (a[i] > a[i + 1]) > > + return 0; > > + return 1; > > +} > > + > > +/* { dg-final { scan-tree-dump "Executing predictive commoning without > > unrolling" "pcom" } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/cunroll-13.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O3 -fdisable-tree-evrp -fdisable-tree-cunrolli > > -fdisable-tree-vrp1 -fdump-tree-cunroll-blocks-details" } */ > > +/* { dg-options "-O3 -fgimple -fdump-tree-cunroll-blocks-details" } */ > > > > #if __SIZEOF_INT__ < 4 > > __extension__ typedef __INT32_TYPE__ i32; > > @@ -7,15 +7,53 @@ __extension__ typedef __INT32_TYPE__ i32 > > typedef int i32; > > #endif > > > > -struct a {int a[8];int b;}; > > -void > > -t(struct a *a) > > +struct a {i32 a[8];i32 b;}; > > + > > +void __GIMPLE (startwith("fix_loops")) > > +t (struct a * a) > > { > > - for (i32 i=0;i<123456 && a->a[i];i++) > > - a->a[i]++; > > + i32 i; > > + i32 _1; > > + i32 _2; > > + i32 _9; > > + i32 _11; > > + > > +bb_2: > > + _11 = a_6(D)->a[0]; > > + if (_11 != _Literal (i32) 0) > > + goto bb_6; > > + else > > + goto bb_3; > > + > > +bb_3: > > + return; > > + > > +bb_4: > > + _1 = _2 + 1; > > + a_6(D)->a[i_19] = _1; > > + i_8 = i_19 + _Literal (i32) 1; > > + if (i_8 <= _Literal (i32) 123455) > > + goto bb_5; > > + else > > + goto bb_3; > > + > > +bb_5: > > + i_19 = __PHI (bb_6: _Literal (i32) 1, bb_4: i_8); > > + _2 = a_6(D)->a[i_19]; > > + if (_2 != _Literal (i32) 0) > > + goto bb_4; > > + else > > + goto bb_3; > > + > > +bb_6: > > + _9 = _11 + _Literal (i32) 1; > > + a_6(D)->a[0] = _9; > > + goto bb_5; > > } > > -/* This pass relies on the fact that we do not eliminate the redundant test > > for i early. > > - It is necessary to disable all passes that do so. At the moment it is > > evrp, vrp1 and cunrolli. */ > > + > > +/* This testcase relies on the fact that we do not eliminate the redundant > > test > > + for i early. It is necessary to disable all passes that do so, for the > > + moment starting with the loop pipeline is good enough. */ > > /* { dg-final { scan-tree-dump-times "Loop 1 iterates 123454 times" 1 > > "cunroll" } } */ > > /* { dg-final { scan-tree-dump-times "Last iteration exit edge was proved > > true" 1 "cunroll" } } */ > > /* { dg-final { scan-tree-dump-times "Exit condition of peeled iterations > > was eliminated" 1 "cunroll" } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c (working copy) > > @@ -20,4 +20,4 @@ long foo(long* p, long* p2, int N1, int > > return s; > > } > > > > -/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts"} } */ > > +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" { xfail *-*-* } > > } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c (working copy) > > @@ -21,4 +21,4 @@ long foo(long* p, long* p2, int N1, int > > return s; > > } > > > > -/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */ > > +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" { xfail *-*-* } > > } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (nonexistent) > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1g.c (working copy) > > @@ -0,0 +1,82 @@ > > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ > > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */ > > + > > +/* The test 'if (p2 > p_limit2)' can be replaced, so iv p2 can be > > + * eliminated. */ > > +long int __GIMPLE (startwith("fix_loops")) > > +foo (long int * p, long int * p2, int N1, int N2) > > +{ > > + long int s; > > + long int * p_limit2; > > + long int * p_limit; > > + long unsigned int _1; > > + long unsigned int _2; > > + long unsigned int _3; > > + long unsigned int _4; > > + long int _5; > > + > > + bb_2: > > + _1 = (long unsigned int) N1_10(D); > > + _2 = _1 * 8ul; > > + p_limit_12 = p_11(D) + _2; > > + _3 = (long unsigned int) N2_13(D); > > + _4 = _3 * 8ul; > > + p_limit2_15 = p2_14(D) + _4; > > + if (p_11(D) <= p_limit_12) > > + goto bb_3; > > + else > > + goto bb_13; > > + > > + bb_13: > > + > > + bb_9: > > + goto bb_6; > > + > > + bb_3: > > + p_20 = p_11(D) + 8ul; > > + p2_23 = p2_14(D) + 8ul; > > + if (p_limit2_15 < p2_23) > > + goto bb_14; > > + else > > + goto bb_7; > > + > > + bb_14: > > + goto bb_9; > > + > > + bb_7: > > + goto bb_5; > > + > > + bb_4: > > + p_16 = p_26 + 8ul; > > + p2_17 = p2_27 + 8ul; > > + if (p_limit2_15 < p2_17) > > + goto bb_11; > > + else > > + goto bb_8; > > + > > + bb_11: > > + goto bb_6; > > + > > + bb_8: > > + ; > > + > > + bb_5: > > + s_24 = __PHI (bb_7: 0l, bb_8: s_19); > > + p_26 = __PHI (bb_7: p_20, bb_8: p_16); > > + p2_27 = __PHI (bb_7: p2_23, bb_8: p2_17); > > + _5 = __MEM <long int> (p_26); > > + s_19 = _5 + s_24; > > + if (p_limit_12 >= p_26) > > + goto bb_4; > > + else > > + goto bb_12; > > + > > + bb_12: > > + ; > > + > > + bb_6: > > + s_25 = __PHI (bb_12: s_19, bb_11: s_19, bb_9: 0l); > > + return s_25; > > +} > > + > > +/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (nonexistent) > > +++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2g.c (working copy) > > @@ -0,0 +1,79 @@ > > +/* { dg-do compile { target {{ i?86-*-* x86_64-*-* } && lp64 } } } */ > > +/* { dg-options "-O2 -fgimple -m64 -fdump-tree-ivopts-details" } */ > > + > > +/* Exit tests 'i < N1' and 'p2 > p_limit2' can be replaced, so > > + * two ivs i and p2 can be eliminate. */ > > +long int __GIMPLE (startwith("fix_loops")) > > +foo (long int * p, long int * p2, int N1, int N2) > > +{ > > + long int s; > > + long int * p_limit2; > > + int i; > > + long unsigned int _1; > > + long unsigned int _2; > > + long int _3; > > + > > + bb_2: > > + _1 = (long unsigned int) N2_9(D); > > + _2 = _1 * 8ul; > > + p_limit2_11 = p2_10(D) + _2; > > + if (N1_13(D) > 0) > > + goto bb_3; > > + else > > + goto bb_13; > > + > > + bb_13: > > + > > + bb_9: > > + goto bb_6; > > + > > + bb_3: > > + p_22 = p_12(D) + 8ul; > > + p2_23 = p2_10(D) + 8ul; > > + if (p_limit2_11 < p2_23) > > + goto bb_14; > > + else > > + goto bb_7; > > + > > + bb_14: > > + goto bb_9; > > + > > + bb_7: > > + goto bb_5; > > + > > + bb_4: > > + p_14 = p_27 + 8ul; > > + p2_15 = p2_28 + 8ul; > > + i_16 = i_29 + 1; > > + if (p_limit2_11 < p2_15) > > + goto bb_11; > > + else > > + goto bb_8; > > + > > + bb_11: > > + goto bb_6; > > + > > + bb_8: > > + ; > > + > > + bb_5: > > + s_25 = __PHI (bb_7: 0l, bb_8: s_18); > > + p_27 = __PHI (bb_7: p_22, bb_8: p_14); > > + p2_28 = __PHI (bb_7: p2_23, bb_8: p2_15); > > + i_29 = __PHI (bb_7: 1, bb_8: i_16); > > + _3 = __MEM <long int> (p_27); > > + s_18 = _3 + s_25; > > + if (N1_13(D) > i_29) > > + goto bb_4; > > + else > > + goto bb_12; > > + > > + bb_12: > > + ; > > + > > + bb_6: > > + s_26 = __PHI (bb_12: s_18, bb_11: s_18, bb_9: 0l); > > + return s_26; > > +} > > + > > +/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts"} } */ > > Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c (working copy) > > @@ -3,7 +3,7 @@ > > /* { dg-final { scan-tree-dump "Jumps threaded: 16" "thread1" } } */ > > /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */ > > /* { dg-final { scan-tree-dump "Jumps threaded: 3" "thread3" } } */ > > -/* { dg-final { scan-tree-dump-not "Jumps threaded" "dom2" } } */ > > +/* { dg-final { scan-tree-dump "Jumps threaded: 1" "dom2" } } */ > > /* { dg-final { scan-tree-dump-not "Jumps threaded" "dom3" } } */ > > /* { dg-final { scan-tree-dump-not "Jumps threaded" "vrp2" } } */ > > > > Index: gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c (working copy) > > @@ -1,5 +1,5 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-O1 -fdump-tree-dom2" } */ > > +/* { dg-options "-O1 -fdump-tree-phicprop1" } */ > > > > extern void abort (void); > > extern void blah (void); > > @@ -42,14 +42,14 @@ record_component_aliases (type) > > /* The call to blah should have been eliminated. If the call is not > > eliminated, then dominator optimizations failed and it'll be > > impossible to delete other unnecessary code. */ > > -/* { dg-final { scan-tree-dump-not "blah \\(\\)" "dom2" } } */ > > +/* { dg-final { scan-tree-dump-not "blah \\(\\)" "phicprop1" } } */ > > > > /* There should be two IF conditionals. */ > > -/* { dg-final { scan-tree-dump-times "if " 2 "dom2"} } */ > > +/* { dg-final { scan-tree-dump-times "if " 2 "phicprop1"} } */ > > > > /* There should be a single load of type.binfo. */ > > -/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "dom2"} } */ > > +/* { dg-final { scan-tree-dump-times "type\\.binfo" 1 "phicprop1"} } */ > > > > /* There should be two loads of vec.length. */ > > -/* { dg-final { scan-tree-dump-times "vec.length" 2 "dom2"} } */ > > +/* { dg-final { scan-tree-dump-times "vec.length" 2 "phicprop1"} } */ > > > > Index: gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c > > =================================================================== > > --- gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (revision 259695) > > +++ gcc/testsuite/gcc.dg/tree-ssa/20030711-1.c (working copy) > > @@ -44,12 +44,12 @@ record_component_aliases (type) > > /* The call to blah can not be eliminated. */ > > /* { dg-final { scan-tree-dump-times "blah \\(\\)" 1 "dom2" } } */ > > > > -/* There should be four IF conditionals. */ > > -/* { dg-final { scan-tree-dump-times "if " 4 "dom2"} } */ > > +/* There should be three IF conditionals. */ > > +/* { dg-final { scan-tree-dump-times "if " 3 "dom2"} } */ > > > > /* There should be two loads of type.binfo. */ > > /* { dg-final { scan-tree-dump-times "type\\.binfo" 2 "dom2"} } */ > > > > -/* There should be four loads of vec.length. */ > > -/* { dg-final { scan-tree-dump-times "vec.length" 4 "dom2"} } */ > > +/* There should be three loads of vec.length. */ > > +/* { dg-final { scan-tree-dump-times "vec.length" 3 "dom2"} } */ > > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)