[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2014-12-10 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=5
Bug 5 depends on bug 57511, which changed state.

Bug 57511 Summary: [4.8 Regression] Missing SCEV final value replacement
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57511

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-18 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



Richard Biener  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED



--- Comment #13 from Richard Biener 2012-12-18 13:29:52 UTC ---

Fixed.


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-18 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



--- Comment #12 from Richard Biener 2012-12-18 13:12:39 UTC ---

Author: rguenth
Date: Tue Dec 18 13:12:34 2012
New Revision: 194578

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=194578
Log:
2012-12-18  Richard Biener

	PR tree-optimization/5
	* tree-ssa-loop-niter.c (idx_infer_loop_bounds): Properly
	analyze evolution of the index for the loop it is used in.
	* tree-scalar-evolution.c (instantiate_scev_name): Take
	inner loop we will be creating a chrec for.  Generalize
	fix for PR40281 and prune invalid SCEVs.
	(instantiate_scev_poly): Likewise - pass down inner loop
	we will be creating a chrec for.
	(instantiate_scev_binary): Take and pass through inner loop.
	(instantiate_array_ref): Likewise.
	(instantiate_scev_convert): Likewise.
	(instantiate_scev_not): Likewise.
	(instantiate_scev_3): Likewise.
	(instantiate_scev_2): Likewise.
	(instantiate_scev_1): Likewise.
	(instantiate_scev_r): Likewise.
	(resolve_mixers): Adjust.
	(instantiate_scev): Likewise.

	* gcc.dg/torture/pr5.c: New testcase.
	* gcc.dg/vect/vect-iv-11.c: Adjust.

Added:
    trunk/gcc/testsuite/gcc.dg/torture/pr5.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/testsuite/gcc.dg/vect/vect-iv-11.c
    trunk/gcc/tree-scalar-evolution.c
    trunk/gcc/tree-ssa-loop-niter.c


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-17 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



--- Comment #11 from Richard Biener 2012-12-17 12:14:18 UTC ---

Ok, I'm confused by the following:



  : (loop 1 header)

  # lxp_1 = PHI <0(2), lxp_24(12)>

  t_9 = pol_x[lxp_1];

  _10 = (long int) lxp_1;

  _11 = _10 * 4;

  l_12 = _11 + -1;

  goto ;





  :

  _15 = S_2 + l_12;

  _16 = coef_x[_15];

  _17 = S_2 + -1;

  _18 = s[_17];

  _19 = _18 * t_9;

  _20 = _16 + _19;

  coef_x[_15] = _20;

  S_22 = S_2 + 1;



  : (loop 2 header)

  # S_2 = PHI <1(3), S_22(4)>

  if (S_2 <= 4)

goto ;

  else

goto ;



  :

  lxp_24 = lxp_1 + 1;

  if (lxp_1 != 1)

goto ;

  else

goto ;



  :

  goto ;



and SCEV says _15 is {l_12 + 1, +, 1}_2 which looks correct.  But then it

does



(chrec_apply

  (varying_loop = 2)

  (chrec = {l_12 + 1, +, 1}_2)

  (x = 4)

  (res = l_12 + 5))



and magically, via l_12 being {-1, +, 4}_1 (also correct) arrives at



(instantiate_scev

  (instantiate_below = 2)

  (evolution_loop = 1)

  (chrec = {4, +, 4}_1)

  (res = {4, +, 4}_1))



huh?  So to it _15 is {4, +, 4}_1 (not sure what is considered "initial"

in terms of scalar evolution with a value varying in an inner loop).



This is what infer_loop_bounds_from_undefined derives the bogus bound

for loop 1 from.  To my eyes _15 should be {0, +, 4}_1!



Maybe it doesn't really make sense to ask for the evolution of something

defined in loop N with respect to an outer loop M?



If we change idx_infer_loop_bounds with



Index: tree-ssa-loop-niter.c
===================================================================
--- tree-ssa-loop-niter.c   (revision 194552)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -2671,7 +2671,12 @@ idx_infer_loop_bounds (tree base, tree *
   upper = false;
 }
 
-  ev = instantiate_parameters (loop, analyze_scalar_evolution (loop, *idx));
+  struct loop *dloop = loop_containing_stmt (data->stmt);
+  if (!dloop)
+    return true;
+
+  ev = analyze_scalar_evolution (dloop, *idx);
+  ev = instantiate_parameters (loop, ev);
   init = initial_condition (ev);
   step = evolution_part_in_loop_num (ev, loop->num);



then we obtain via {l_12 + 1, +, 1}_2, {{0, +, 4}_1, +, 1}_2 the correct

solution (init == 0, step == 4).



I am going to bootstrap and regtest that.


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-17 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



Richard Biener  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |



--- Comment #10 from Richard Biener 2012-12-17 10:14:37 UTC ---

I'll have a more detailed look.


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-14 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



--- Comment #9 from Richard Biener 2012-12-14 14:11:59 UTC ---

The unrolling puts __builtin_unreachable ()s into the inner duplicated loops:



First iteration, good:



  :

  # lxp_30 = PHI <0(2)>

  t_32 = pol_x[lxp_30];

  _33 = (long int) lxp_30;

  _34 = _33 * 4;

  l_35 = _34 + -1;



  :

  # S_36 = PHI <1(16), S_45(18)>

  if (S_36 <= 4)

goto ;

  else

goto ;



  :

  _38 = S_36 + l_35;

  _39 = coef_x[_38];

  _40 = S_36 + -1;

  _41 = s[_40];

  _42 = _41 * t_32;

  _43 = _39 + _42;

  coef_x[_38] = _43;

  S_45 = S_36 + 1;

  goto ;



  :

  lxp_46 = lxp_30 + 1;



  :



Peeled iteration, bad:



  :

  # lxp_1 = PHI 

  t_9 = pol_x[lxp_1];

  _10 = (long int) lxp_1;

  _11 = _10 * 4;

  l_12 = _11 + -1;

  goto ;



  :

  _13 = S_3 + l_12;

  __builtin_unreachable ();

  _14 = coef_x[_13];

  _15 = S_3 + -1;

  _16 = s[_15];

  _17 = _16 * t_9;

  _18 = _14 + _17;

  __builtin_unreachable ();

  coef_x[_13] = _18;

  S_20 = S_3 + 1;



  :

  # S_3 = PHI <1(3), S_20(4)>

  if (S_3 <= 4)

goto ;

  else

goto ;



  :

  lxp_21 = lxp_1 + 1;

  if (1 == 0)

goto ;

  else

goto ;



  :



  :

  __builtin_unreachable ();



  :

  __asm__ __volatile__("" :  : "r" &coef_x : "memory");

  goto ;



Note that the outer loop body looks ok - it is the inner body that gets

mangled in a bogus way.



That's because the loop bounds derived from the inner loop accesses

are bogus:



remove_exits_and_undefined_stmts (loop=0x767ec770, npeeled=1)

at /space/rguenther/src/svn/trunk/gcc/tree-ssa-loop-ivcanon.c:481

481   bool changed = false;

(gdb) n

483   for (elt = loop->bounds; elt; elt = elt->next)

(gdb) 

488   if (!elt->is_exit

(gdb) p elt 

$8 = (nb_iter_bound *) 0x76925e38

(gdb) p *elt

$9 = {stmt = 0x76905c80, bound = {low = 0, high = 0}, is_exit = false, 

  next = 0x76925dc0}

(gdb) p elt->stmt

$10 = (gimple) 0x76905c80

(gdb) call debug_gimple_stmt ($10)

# .MEM_19 = VDEF <.MEM_6>

coef_x[_13] = _18;



As far as I can see, even with lxp == 1 we have at most an index of 4 + 1*4 - 1 = 7.



Statement _14 = coef_x[_13];

 is executed at most 0 (bounded by 0) + 1 times in loop 1.



_that's_ bogus.


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-14 Thread rguenth at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



--- Comment #8 from Richard Biener 2012-12-14 13:49:02 UTC ---

(In reply to comment #7)

> I've tried to rewrite this as C, but managed to turn it into something that is

> miscompiled at a different spot.  The fortran testcase starts having

> __builtin_unreachable () in it in the *.cunroll pass, this one already in

> *.cunrolli pass.  Still, I believe it doesn't do any out of bounds access

> anywhere.  -O2 on x86_64-linux.

> 

> double s[4] = { 1.0, 2.0, 3.0, 4.0 }, pol_x[2] = { 5.0, 6.0 };
> 
> __attribute__((noinline)) int
> foo (void)
> {
>   double coef_x[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
>   int lxp = 0;
>   if (lxp <= 1)
>     do
>       {
>         double t = pol_x[lxp];
>         long S;
>         long l = lxp * 4L - 1;
>         for (S = 1; S <= 4; S++)
>           coef_x[S + l] = coef_x[S + l] + s[S - 1] * t;
>       }
>     while (lxp++ != 1);
>   asm volatile ("" : : "r" (coef_x) : "memory");
>   for (lxp = 0; lxp < 8; lxp++)
>     if (coef_x[lxp] != ((lxp & 3) + 1) * (5.0 + (lxp >= 4)))
>       __builtin_abort ();
>   return 1;
> }
> 
> int
> main ()
> {
>   asm volatile ("" : : : "memory");
>   if (!foo ())
>     __builtin_abort ();
>   return 0;
> }

> 

> Works with r193067, fails with r193100, haven't tried to bisect exactly, but

> would guess this is r193098 again.  For the outer loop it prints:

> 

> Analyzing # of iterations of loop 1

>   exit condition [0, + , 1](no_overflow) != 1

>   bounds on difference of bases: 1 ... 1

>   result:

> # of iterations 1, bounded by 1

> Loop 1 iterates 1 times.

> Loop 1 iterates at most 1 times.

> 

> but that is wrong, the outer loop iterates exactly 2 times.



That's latch block executions, so one latch block execution is correct.


[Bug middle-end/55555] [4.8 Regression] miscompilation at -O2 (number_of_iterations)

2012-12-07 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5



Jakub Jelinek  changed:



           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
            Summary|[4.8 Regression]            |[4.8 Regression]
                   |miscompilation at -O2       |miscompilation at -O2
                   |(tree-pre?)                 |(number_of_iterations)