[Bug tree-optimization/112282] [14 Regression] wrong code (generated code hangs) at -O3 on x86_64-linux-gnu since r14-4777-g88c27070c25309

2023-10-29 Thread sjames at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112282

Sam James  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[14 Regression] wrong code  |[14 Regression] wrong code
                   |(generated code hangs) at   |(generated code hangs) at
                   |-O3 on x86_64-linux-gnu     |-O3 on x86_64-linux-gnu
                   |                            |since
                   |                            |r14-4777-g88c27070c25309
                 CC|                            |tnfchris at gcc dot gnu.org

--- Comment #7 from Sam James  ---
88c27070c253094fb7e366583fbe09cec2371e8b is the first bad commit
commit 88c27070c253094fb7e366583fbe09cec2371e8b
Author: Tamar Christina 
Date:   Fri Oct 20 08:09:45 2023 +0100

ifcvt: Support bitfield lowering of multiple-exit loops

i.e. r14-4777-g88c27070c25309

2023-10-30 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from Tamar Christina  ---
Thanks for the report, that's very odd.

It looks like loop control is broken and `u` never gets incremented.  It's even
stranger since the structures getting lowered are both unused, so they should
not have had any effect at all.

Will take a look.

2023-10-30 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
            Version|unknown                     |14.0

2023-10-30 Thread avieira at gcc dot gnu.org via Gcc-bugs

avieira at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |avieira at gcc dot gnu.org

--- Comment #9 from avieira at gcc dot gnu.org ---
So I had a look at this and this is as far as I got.
It seems to get stuck in the 'for (u = -22; u < 2; ++u)' loop. It looks like
the loop IV never gets updated and it keeps looping.

Looking at the codegen it seems that cunroll decides to remove A LOT of code
and there is now:
bb 4:
..
# ivtmp_1055 = PHI 
..
bb 24:
...
ivtmp_1056 = ivtmp_1055 - 1;
  goto ; [100.00%]

I've not yet been able to figure out why this happens; the dumps weren't very
helpful. I tried -fdisable-tree-cunroll and it was still failing, so I looked
at the dumps to see what was turning this loop into an infinite loop, and vrp2
shows me:
Global Exported: _19 = [irange] int [-21, 0]
Folding predicate _19 != 2 to 1

and in the dump before vrp2 we see:
  [local count: 7354175]:
  # u.13_485 = PHI <_19(105), -22(3)>
  # u_lsm.72_510 = PHI <_19(105), _497(D)(3)>
  # u_lsm_flag.73_235 = PHI <1(105), 0(3)>
...
   [local count: 6634488]:
  al ={v} {CLOBBER(eol)};
  _19 = u.13_485 + 1;
  if (_19 != 2)
    goto ; [96.34%]
  else
    goto ; [3.66%]

   [local count: 6391666]:
  goto ; [100.00%]

One thing to point out here: u_lsm.72_510 seems odd. It is used to set the
global 'u', but it's initialized with _497(D), which is undefined. So that
itself seems wrong to me too. I'll try to find out what's causing that
codegen next. Maybe that can explain why the irange for _19 is so wrong here.
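
To see why the exported range looks wrong, the loop from the testcase can be replayed by hand. The following is a minimal sketch in plain C, not GCC code: `struct int_range`, `range_of_19` and `can_fold_ne_to_true` are hypothetical names introduced only for illustration. It records every value that `_19 = u + 1` takes at the exit test and checks whether `_19 != 2` could legitimately be folded to true.

```c
#include <assert.h>
#include <limits.h>

/* Inclusive range of observed values (hypothetical helper type). */
struct int_range { int min; int max; };

/* Replay `for (u = -22; u < 2; ++u)` and record every value that
   _19 = u + 1 takes at the exit test `if (_19 != 2)`. */
struct int_range range_of_19(void)
{
    struct int_range r = { INT_MAX, INT_MIN };
    for (int u = -22; u < 2; ++u) {
        int v19 = u + 1;
        if (v19 < r.min) r.min = v19;
        if (v19 > r.max) r.max = v19;
    }
    return r;
}

/* `x != k` may only be folded to true if k lies outside the range of x. */
int can_fold_ne_to_true(struct int_range r, int k)
{
    assert(r.min <= r.max);
    return k < r.min || k > r.max;
}
```

Under these assumptions `_19` spans [-21, 2], so the comparison against 2 cannot be folded. With the dumped range [-21, 0] the same check would allow the fold, which removes the only way out of the loop.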

2023-10-31 Thread avieira at gcc dot gnu.org via Gcc-bugs

--- Comment #10 from avieira at gcc dot gnu.org ---
So I had a look at that u_lsm.72_510 variable and it's only undefined if we
don't loop, but if we don't loop then u_lsm_flag is set to 0 and we don't use
u_lsm. So it's OK. I also checked and the early exits are covered by the same
mechanism.
So really the question is: why does irange think the range is [-21, 0]? Does
anyone have an idea of how to debug this?
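
The flag-guarded store described above can be pictured at the source level. This is a minimal sketch under stated assumptions, not the actual GIMPLE: `global_u` stands in for the global `u`, and `run_loop_with_store_motion` is a hypothetical function that mimics what store motion appears to produce, with `u_lsm` left uninitialized on purpose (like `_497(D)`) and only read when `u_lsm_flag` is set.

```c
#include <assert.h>

int global_u; /* stand-in for the global `u` of the testcase */

/* Run `iterations` loop steps starting from `start`.  The store to the
   global is sunk out of the loop and guarded by a flag.  Returns the flag. */
int run_loop_with_store_motion(int start, int iterations)
{
    int u_lsm;           /* deliberately uninitialized, like _497(D) */
    int u_lsm_flag = 0;  /* 0 on loop entry, 1 once the loop has stored */
    int cur = start;

    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i) {
        cur = cur + 1;   /* models _19 = u + 1 */
        u_lsm = cur;     /* the store, kept in a temporary */
        u_lsm_flag = 1;
    }

    /* The temporary is only read when the flag says it was written. */
    if (u_lsm_flag)
        global_u = u_lsm;
    return u_lsm_flag;
}
```

With zero iterations the flag stays 0 and `global_u` is untouched, so the undefined initial value is never observed. That matches the conclusion that this part of the IL is fine.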

2023-11-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #12 from Richard Biener  ---
+  _ifc__797 = al[0].D.2794;
+  _ifc__796 = BIT_INSERT_EXPR <_ifc__797, 7, 0 (20 bits)>;
+  _ifc__798 = BIT_INSERT_EXPR <_ifc__796, 9, 32 (20 bits)>;
+  _ifc__800 = BIT_INSERT_EXPR <_ifc__798, 8, 52 (10 bits)>;
+  al[0].D.2794 = _ifc__800;

from

  al[0].c = 2;
  al[0].d = 7;
  al[0].e = 9;

which looks OK.  We then vectorize the first part:

  vect__ifc__797.86_709 = MEM  [(struct b *)&al];
  vect__ifc__797.87_708 = MEM  [(struct b *)&al + 16B];
  vect__ifc__797.88_707 = MEM  [(struct b *)&al + 32B];
  vect_patt_751.89_706 = vect__ifc__797.86_709 & { 18446744073708503040, 18446744073708503040 };
...
  _ifc__830 = BIT_INSERT_EXPR <_ifc__828, 2, 52 (10 bits)>;
  MEM  [(struct b *)&al] = vect_patt_746.94_691;
  MEM  [(struct b *)&al + 16B] = vect_patt_746.94_690;
  MEM  [(struct b *)&al + 32B] = vect_patt_746.94_689;
  _ifc__833 = al[6].D.2794;
  _ifc__832 = BIT_INSERT_EXPR <_ifc__833, 7, 0 (20 bits)>;
  _ifc__834 = BIT_INSERT_EXPR <_ifc__832, 9, 32 (20 bits)>;
  _ifc__836 = BIT_INSERT_EXPR <_ifc__834, 0, 52 (10 bits)>;
  al[6].D.2794 = _ifc__836;

leaving the last element unvectorized.  But later this all appears to be
dead code?  The IL can be simplified with -fno-unswitch-loops.
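
For readers unfamiliar with the lowered form, the BIT_INSERT_EXPR chain at the top of this comment amounts to mask-and-shift updates of one 64-bit container word. The sketch below is plain C under assumptions: `field_mask`, `bit_insert` and `lower_three_stores` are hypothetical names, and the field positions (20 bits at 0, 20 bits at 32, 10 bits at 52) are simply read off the dump.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with `width` one-bits starting at bit `pos`. */
uint64_t field_mask(unsigned pos, unsigned width)
{
    assert(width > 0 && width < 64 && pos + width <= 64);
    return ((UINT64_C(1) << width) - 1) << pos;
}

/* Rough C equivalent of BIT_INSERT_EXPR <word, value, pos (width bits)>. */
uint64_t bit_insert(uint64_t word, uint64_t value, unsigned pos, unsigned width)
{
    uint64_t mask = field_mask(pos, width);
    return (word & ~mask) | ((value << pos) & mask);
}

/* Three bitfield stores folded into one load/modify/store of the word. */
uint64_t lower_three_stores(uint64_t word, uint64_t v0, uint64_t v32, uint64_t v52)
{
    word = bit_insert(word, v0, 0, 20);
    word = bit_insert(word, v32, 32, 20);
    word = bit_insert(word, v52, 52, 10);
    return word;
}
```

The vector constant 18446744073708503040 in the dump equals `~field_mask(0, 20)`, which is consistent with the vectorized code first clearing the low 20-bit field before inserting the new value.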

Btw, I see we lower bitfields in a loop we are not if-converting fully
in the end - was that desired?  Looks like so, this is a multi-exit loop
which we don't if-convert but lower bitfields in?  It has complicated
control flow that would make early-break vectorization not successful
at least.

Not sure what goes wrong.

2023-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
             Status|NEW                           |ASSIGNED

--- Comment #13 from Richard Biener  ---
So I can cut off bitfield lowering completely; the important part is that we
version the loop and thus try to BB vectorize the loop header (yeah, we don't
BB vectorize the whole body - or rather, we think the header _is_ the full
body).

But a key to the failure seems to be that we BB vectorize the unrolled

  for (; ac < 1; ac++)
    for (k = 0; k < 9; k++)
      am[k] = 0;

and that we do so not from SLP but from loop vectorization of the
if-conversion-versioned (but otherwise unchanged) loop.

It's also solely triggered by unrolling the 'z' loop.  Disabling all
following passes will still reproduce it.  The region VN triggered by
ifconversion/vectorization/unrolling isn't needed either (I disabled it).

Maybe PR111572 is related (but it doesn't change unrolling and disabling
ch_vect doesn't avoid the problem).

Unrolling does

Analyzing # of iterations of loop 2
  exit condition [23, + , 4294967295] != 0
  bounds on difference of bases: -23 ... -23
  result:
# of iterations 23, bounded by 23
Removed pointless exit: if (ivtmp_1055 != 0)

because we computed loop->nb_iterations_upper_bound to 21:

Statement (exit)if (ivtmp_1055 != 0)
 is executed at most 23 (bounded by 23) + 1 times in loop 2.
Induction variable (int) 21 + -1 * iteration does not wrap in statement _4 = ~u.13_485;
 in loop 2.
Statement _4 = ~u.13_485;
 is executed at most 21 (bounded by 21) + 1 times in loop 2.
Induction variable (int) -21 + 1 * iteration does not wrap in statement _19 = u.13_485 + 1;
 in loop 2.
Statement _19 = u.13_485 + 1;
 is executed at most 23 (bounded by 23) + 1 times in loop 2.
Reducing loop iteration estimate by 1; undefined statement must be executed at the last iteration.

We're SCEV analyzing _4 here, computing {21, +, -1}_2, and VRP1 somehow
computed [irange] int [0, +INF] for it.  u.13_485 has a global range of
[-2147483647, 1], so it must be inferring something else here, and
wrongly so?

That very same def also appears with plain -O3.

Global Exported: _4 = [irange] int [0, +INF]

Hmm.  We have

Folding statement: _64 = ~u.13_20;
Global Exported: _64 = [irange] int [-2, -1] MASK 0x1 VALUE 0xfffe

Folding statement: _4 = ~u.13_20;
Global Exported: _4 = [irange] int [0, +INF]

but the if-conversion pass hoists that before the .LOOP_VECTORIZED.
Properly resetting flow-sensitive info on the hoisted stmts fixes this.

Meh.

Premature duplicate transforms ...
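
The arithmetic behind the bogus bound can be checked by hand. This is a minimal sketch in plain C, not GCC code: the function names are hypothetical. It assumes the loop is `for (u = -22; u < 2; ++u)` as quoted earlier in the thread, and that the range [0, +INF] on `_4 = ~u` was only valid on the conditional path where the statement originally lived.

```c
#include <assert.h>

/* Total number of iterations of `for (u = -22; u < 2; ++u)`. */
int total_iterations(void)
{
    int n = 0;
    for (int u = -22; u < 2; ++u)
        ++n;
    return n;
}

/* Number of iterations in which _4 = ~u really is inside [0, +INF]. */
int iterations_with_nonneg_not_u(void)
{
    int n = 0;
    for (int u = -22; u < 2; ++u)
        if (~u >= 0)
            ++n;
    assert(n <= total_iterations());
    return n;
}

/* First value of the induction variable {21, +, -1} seen for _4. */
int first_not_u(void)
{
    return ~(-22);
}
```

The loop runs 24 times, but `~u` is non-negative in only 22 of them, which lines up with the "at most 21 (bounded by 21) + 1 times" line in the dump. Once the statement is hoisted so that it executes unconditionally while keeping that range, the derived bound is too small and the real exit test gets removed as pointless.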

2023-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #14 from Richard Biener  ---
Well, but then I still question the causing rev. - we are only performing
bitfield lowering but not any if-conversion.  IMHO the rev is a hack.

Not sure if I want to add another hack to fix this miscompile?

The change as-is behaves totally un-intuitively, we can't easily detect
half-if-converted cases and require loop vectorization like we can when
.MASK_LOAD/STORE appear.

Testing a patch.

2023-11-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #15 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:5cb8610d3a8f8849a4bb6a0f81a2934484d6a15a

commit r14-5493-g5cb8610d3a8f8849a4bb6a0f81a2934484d6a15a
Author: Richard Biener 
Date:   Wed Nov 15 12:24:46 2023 +0100

tree-optimization/112282 - wrong-code with ifcvt hoisting

The following avoids hoisting of invariants from conditionally
executed parts of an if-converted loop.  That now makes a difference
since we perform bitfield lowering even when we do not actually
if-convert the loop.  if-conversion deals with resetting flow-sensitive
info when necessary already.

PR tree-optimization/112282
* tree-if-conv.cc (ifcvt_hoist_invariants): Only hoist from
the loop header.

* gcc.dg/torture/pr112282.c: New testcase.

2023-11-15 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #16 from Richard Biener  ---
Fixed.

2023-11-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #17 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:31bf21c78029434b7515a94477ce3565bff0743f

commit r14-5517-g31bf21c78029434b7515a94477ce3565bff0743f
Author: Richard Biener 
Date:   Thu Nov 16 08:03:55 2023 +0100

tree-optimization/112282 - fix testcase

Avoid requiring a glibc specific symbol.

PR tree-optimization/112282
* gcc.dg/torture/pr112282.c: Do not use __assert_fail.