[Bug tree-optimization/114403] [14 regression] LLVM miscompiled with -O3 -march=znver2 -fno-vect-cost-model since r14-6822-g01f4251b8775c8

2024-04-16 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114403

--- Comment #30 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:f438acf7ce2e6cb862cf62f2543c36639e2af233

commit r14-9997-gf438acf7ce2e6cb862cf62f2543c36639e2af233
Author: Tamar Christina 
Date:   Tue Apr 16 20:56:26 2024 +0100

testsuite: Fix data check loop on vect-early-break_124-pr114403.c

The testcase had the wrong indices in the buffer check loop.

gcc/testsuite/ChangeLog:

PR tree-optimization/114403
* gcc.dg/vect/vect-early-break_124-pr114403.c: Fix check loop.

2024-04-15 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #29 from Tamar Christina  ---
Fixed, thanks for the report!

2024-04-15 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #28 from GCC Commits  ---
The master branch has been updated by Tamar Christina :

https://gcc.gnu.org/g:85002f8085c25bb3e74ab013581a74e7c7ae006b

commit r14-9969-g85002f8085c25bb3e74ab013581a74e7c7ae006b
Author: Tamar Christina 
Date:   Mon Apr 15 12:06:21 2024 +0100

middle-end: adjust loop upper bounds when peeling for gaps and early break
[PR114403].

This fixes a bug with the interaction between peeling for gaps and early
break.

Before I go further, I'll first explain how I understand this to work for loops
with a single exit.

When peeling for gaps we peel N < VF iterations to scalar.
This happens by removing N iterations from the calculation of niters such that
vect_iters * VF == niters is always false.

In other words, when we exit the vector loop we always fall to the scalar loop.
The loop bounds adjustment guarantees this.  Because of this we potentially
execute one vector loop iteration fewer.  That is, if you're at the boundary
condition where niters % VF == 0, then by peeling one or more scalar iterations
the vector loop executes one iteration fewer.

This is accounted for by the adjustments in vect_transform_loop.  This
adjustment happens differently based on whether the vector loop can be
partial or not:

Peeling for gaps sets the bias to 0 and then:

when not partial:  we take the floor of (scalar_upper_bound / VF) - 1 to get
                   the vector latch iteration count.

when loop is partial:  For a single exit this means the loop is masked, so we
                       take the ceil to account for the fact that the loop can
                       handle the final partial iteration using masking.

Note that there's no difference between ceil and floor on the boundary
condition.
There is a difference, however, when you're slightly above it, i.e. if scalar
iterates 14 times and VF = 4 and we peel 1 iteration for gaps.

The partial loop does ((13 + 0) / 4) - 1 == 2 vector iterations, and in effect
the partial iteration is ignored and it's done as scalar.

This is fine because the niters modification has capped the vector iteration at
2, so that when we reduce the induction values you end up entering the scalar
code with ind_var.2 = ind_var.1 + 2 * VF.
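
The floor and ceil variants of this calculation can be sketched in C.  This is
only a model of the arithmetic as described in this message, not the code in
vect_transform_loop; the function name and its parameters are hypothetical.

```c
/* Hypothetical model of the vector latch bound described above.
   scalar_bound: scalar iteration bound after gap peeling (13 in the example);
   bias: bias added before dividing (0 when peeling for gaps);
   vf: vectorization factor;
   partial: nonzero if the final vector iteration may be partial (ceil),
            zero otherwise (floor).
   Assumes scalar_bound + bias >= vf so that the result is not negative.  */
static long
vector_latch_bound (long scalar_bound, long bias, long vf, int partial)
{
  long n = scalar_bound + bias;
  long q = partial ? (n + vf - 1) / vf : n / vf;
  return q - 1;
}
```

With scalar_bound = 13, bias = 0 and vf = 4, the floor variant gives the 2
quoted above while the ceil variant gives 3.  At an exact multiple such as
scalar_bound = 12 both variants give 2.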

Now let's look at early breaks.  To make it easier I'll focus on the specific
testcase:

char buffer[64];

__attribute__ ((noipa))
buff_t *copy (buff_t *first, buff_t *last)
{
  char *buffer_ptr = buffer;
  char *const buffer_end = &buffer[SZ-1];
  int store_size = sizeof(first->Val);
  while (first != last && (buffer_ptr + store_size) <= buffer_end)
    {
      const char *value_data = (const char *)(&first->Val);
      __builtin_memcpy(buffer_ptr, value_data, store_size);
      buffer_ptr += store_size;
      ++first;
    }

  if (first == last)
    return 0;

  return first;
}

Here the first, early exit is on the condition:

  (buffer_ptr + store_size) <= buffer_end

and the main exit is on condition:

  first != last

This is important, as this bug only manifests itself when the first exit has a
known constant iteration count that's lower than the latch exit count.

Because buffer holds 64 bytes, and VF = 4, unroll = 2, we end up processing 16
bytes per iteration.  So the exit has a known bound of 8 + 1.

The vectorizer correctly analyzes this:

Statement (exit)if (ivtmp_21 != 0)
 is executed at most 8 (bounded by 8) + 1 times in loop 1.

and as a consequence the IV is bound by 9:

  # vect_vec_iv_.14_117 = PHI <_118(9), { 9, 8, 7, 6 }(20)>
  ...
  vect_ivtmp_21.16_124 = vect_vec_iv_.14_117 + { 18446744073709551615, 18446744073709551615, 18446744073709551615, 18446744073709551615 };
  mask_patt_22.17_126 = vect_ivtmp_21.16_124 != { 0, 0, 0, 0 };
  if (mask_patt_22.17_126 == { -1, -1, -1, -1 })
goto ; [88.89%]
  else
goto ; [11.11%]

The important bits are these:

In this example the value of last - first = 416.

The calculated vector iteration count is:

x = (((ptr2 - ptr1) - 16) / 16) + 1 = 27

The bounds generated, adjusting for gaps:

   x == (((x - 1) >> 2) << 2)

which means we'll always fall through to the scalar code, as intended.

Here are two key things to note:

1. In this loop, the early exit will always be the one taken.  When it's taken
   we enter the scalar loop with the correct induction value to apply the gap
   peeling.

2. If the main exit is taken, the induction value assumes you've finished all
   vector iterations, i.e. it assumes you have completed 24 iterations, as we
   treat the main exit the same for normal loop vect and early break when not
   PEELED.
   This means the induction value is adjusted to ind_var.2 = ind_var.1 + 24 * VF;

 

2024-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #27 from Richard Biener  ---
I think that adjusting an existing upper bound by -1 because of gap peeling
is wrong when that upper bound may not apply to the IV exit, because gap
peeling only affects the IV exit test and not the early exit test.

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #26 from Tamar Christina  ---
(In reply to Richard Biener from comment #25)
> That means, when the loop takes the early exit we _must_ take that during
> the vector iterations.  Peeling for gaps means if we would take the early
> exit during one of the gap peeled iterations this is a conflicting
> requirement.
> Now - the current analysis guarantees that the early exit conditions can
> be safely evaluated even for the gap iterations, but not the following
> code when the early exit is _not_ taken.
> 
> So peeling for gaps and early exit vect are not compatible?

I don't see why not.  As my email explains, for the early exits we always go
to the scalar loop, which already adheres to the condition of peeling for
gaps.

I just think that peeling for gaps should not force it to exit from the main
exit.

2024-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org

--- Comment #25 from Richard Biener  ---
That means, when the loop takes the early exit we _must_ take that during
the vector iterations.  Peeling for gaps means if we would take the early
exit during one of the gap peeled iterations this is a conflicting requirement.
Now - the current analysis guarantees that the early exit conditions can
be safely evaluated even for the gap iterations, but not the following
code when the early exit is _not_ taken.

So peeling for gaps and early exit vect are not compatible?

2024-04-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #24 from Tamar Christina  ---
(In reply to Richard Biener from comment #23)
> Maybe easier to understand testcase:
> 
> with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8.  Adding
> -fdisable-tree-cunroll avoids the issue.  The upper bound we set on the
> vector loop causes us to force taking the IV exit which continues
> with i == (niter - 1) / VF * VF, but 'niter' is 20 here.

Yes, indeed, that's what my patch was arguing last time, but I didn't explain it
well enough.

I'm about to send out v2 (waiting for regtest to finish) which hopefully
articulates this better.

2024-04-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #23 from Richard Biener  ---
Maybe easier to understand testcase:

long x[9];
long a[20];
struct { long x; long b[40]; } b;
int __attribute__((noipa))
foo (int n)
{
  int i = 0;
  int k = 0;
  do
    {
      if (x[k++])  // early exit, loop upper bound is 8 because of this
        break;
      a[i] = b.b[2*i]; // the misaligned 2*i access causes peeling for gaps
    }
  while (++i < n);
  return i;
}

int main()
{
  x[8] = 1;
  if (foo (20) != 8)
    __builtin_abort ();
  return 0;
}

with -O3 -msse4.1 -fno-vect-cost-model we return 20 instead of 8.  Adding
-fdisable-tree-cunroll avoids the issue.  The upper bound we set on the
vector loop causes us to force taking the IV exit which continues
with i == (niter - 1) / VF * VF, but 'niter' is 20 here.
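
The failure mode described here can be modelled in plain C.  The sketch below
is a hypothetical model, not what GCC generates: VF = 4 is assumed purely for
illustration, the flag array is padded so the block check stays in bounds, and
run_model and latch_bound are made-up names.  The "vector" loop checks VF flags
at a time, leaves through the IV exit once the claimed latch bound is reached,
and then resumes the scalar loop at (niter - 1) / VF * VF.

```c
#define VF 4      /* assumed vectorization factor, for illustration only */
#define NITER 20  /* the 'n' passed to foo in the testcase above */

/* Padded so the block check below never reads out of bounds.  */
static long x[NITER + VF];

/* Hypothetical model: run the "vector" loop in blocks of VF, leave through
   the IV exit once more than LATCH_BOUND latch executions would be needed,
   then finish in a scalar epilogue.  Returns the final value of i.  */
static int
run_model (int latch_bound)
{
  const int iv_exit_i = (NITER - 1) / VF * VF;  /* where the IV exit resumes */
  int i = 0;
  int latch = 0;

  for (;;)
    {
      int hit = 0;
      for (int lane = 0; lane < VF; lane++)
        hit |= (x[i + lane] != 0);
      if (hit)
        break;              /* early exit: redo this block as scalar */
      i += VF;
      if (i >= iv_exit_i || ++latch > latch_bound)
        {
          i = iv_exit_i;    /* IV exit assumes all vector iterations ran */
          break;
        }
    }

  /* Scalar epilogue, same shape as the original do/while loop.  */
  do
    {
      if (x[i])
        break;
    }
  while (++i < NITER);
  return i;
}
```

After setting x[8] = 1, run_model (2) takes the early exit in the third block
and returns 8, while run_model (1) is forced through the IV exit after two
blocks, resumes at i == 16 and returns 20, the same wrong value as reported
above.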

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #22 from Tamar Christina  ---
Note that, due to the secondary exit, the actual full vector iteration count is 8
scalar elements at VF=4, i.e. 2.

And it's this boundary condition where we fail, since ceil (8/4) == 2.  Any
other value would have done the partial vector iteration.

Basically final_iter_may_be_partial ends up being ignored.

2024-04-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #21 from Tamar Christina  ---
Created attachment 57932
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57932&action=edit
loop.c

Attached a reduced testcase that reproduces the issue and also checks the buffer
position and copied values.

As discussed on IRC, when peeling for gaps we need to either adjust the upper
bounds of the vector loop or force the vector loop to get to the scalar loop.
However, we already go to the scalar loop, just with the wrong induction value,
because we were never supposed to take the main exit.

Whether we go to the scalar loop depends on
x = (((ptr2 - ptr1) - 16) / 16) + 1
x == (((x - 1) >> 2) << 2)

In this case x == 26, so we do go to the scalar code already, but through the
main exit.
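
The two expressions can be checked with a small C sketch.  The helper names are
hypothetical, and the pointer difference of 416 is taken from the commit
message for this PR; it is assumed here to be in the same units as the
constant 16.

```c
/* Hypothetical helpers mirroring the two expressions quoted above.  */

/* x = (((ptr2 - ptr1) - 16) / 16) + 1  */
static unsigned long
vector_niters (unsigned long ptr_diff)
{
  return (ptr_diff - 16) / 16 + 1;
}

/* Nonzero when x == (((x - 1) >> 2) << 2), i.e. when the scalar loop
   would be skipped.  */
static int
skips_scalar_loop (unsigned long x)
{
  return x == (((x - 1) >> 2) << 2);
}
```

For a difference of 416 this gives x == 26, and (((26 - 1) >> 2) << 2) is 24,
so the comparison is false.  Because the right-hand side is x - 1 rounded down
to a multiple of 4, it is smaller than x for every x >= 1, which is why the
scalar loop is always entered.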

Exiting through the main exit assumes you've done all vector iterations, in
this case 6 iterations based on the main exit condition, which is first != last.

In this case the induction values will be set on niters_vector_mult,

so in this case first += 24.

But that's wrong, since the secondary exit has a known iteration count of 9, due
to (buffer_ptr + store_size) <= buffer_end.

Statement (exit)if (ivtmp_21 != 0)
 is executed at most 8 (bounded by 8) + 1 times in loop 1.

So we will always exit through it, as 9 < 24.

That means that when we calculate the upper bounds of the vector loop, we must
add a bias so that in this boundary condition we do an extra partial
vector iteration.

I think the discussion on IRC went off track for a bit, and hopefully this
testcase and the explanation above show that for all early break and all
epilogue peeling reasons, we must bias up for the upper bound to give the
secondary exits a chance to trigger.

So I really do think the correct patch is:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 4375ebdcb49..0973b952c70 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -12144,6 +12144,9 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
      -min_epilogue_iters to remove iterations that cannot be performed
        by the vector code.  */
   int bias_for_lowest = 1 - min_epilogue_iters;
+  if (LOOP_VINFO_EARLY_BREAKS (loop_vinfo))
+    bias_for_lowest = 1;
+
   int bias_for_assumed = bias_for_lowest;
   int alignment_npeels = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   if (alignment_npeels && LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo))

for the reasons described above.  There's no way for us to take the main exit,
which signifies that we've reached the end of all iterations we can possibly do
as vector, and still get the correct induction values in this case.
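
The effect of the bias on the boundary case can be sketched as follows.  This
is a hypothetical model of the ceil form of the bound, assuming a scalar bound
of 8 and VF = 4 as in the figures given in this report; it is not the actual
GCC code and the function name is made up.

```c
/* Hypothetical model of the upper bound for a loop whose final vector
   iteration may be partial: ceil ((scalar_bound + bias) / vf) - 1.
   Assumes scalar_bound + bias >= 1 and vf >= 1.  */
static long
partial_latch_bound (long scalar_bound, long bias, long vf)
{
  return (scalar_bound + bias + vf - 1) / vf - 1;
}
```

With a bias of 0 (peeling for gaps) the bound for 8 scalar iterations is
ceil (8 / 4) - 1 == 1.  With the bias forced to 1 it becomes
ceil (9 / 4) - 1 == 2, which leaves room for the extra vector iteration in
which the secondary exit can trigger.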

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

--- Comment #20 from Tamar Christina  ---
This is a bad interaction between early break and peeling for gaps.

When peeling for gaps we set bias_for_lowest to 0, which then negates the ceil
for the upper bound calculation when the div is exact.

On a loop that does:

Analyzing # of iterations of loop 1
  exit condition [8, + , 18446744073709551615] != 0
  bounds on difference of bases: -8 ... -8
  result:
# of iterations 8, bounded by 8

and a VF=4, we end up calculating:

Loop 1 iterates at most 1 times.
Loop 1 likely iterates at most 1 times.
Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < bnd.5505_39
  bounds on difference of bases: 0 ... 4611686018427387902
Matching expression match.pd:2011, generic-match-8.cc:27
Applying pattern match.pd:2067, generic-match-1.cc:4813
  result:
# of iterations bnd.5505_39 + 18446744073709551615, bounded by
4611686018427387902
Estimating sizes for loop 1
...
   Induction variable computation will be folded away.
  size:   2 if (ivtmp_312 < bnd.5505_39)
   Exit condition will be eliminated in last copy.
size: 24-3, last_iteration: 24-5
  Loop size: 24
  Estimated size after unrolling: 26
;; Guessed iterations of loop 1 is 0.858446. New upper bound 1.

The upper bound should be 2, not 1.  I have a working patch and am trying to
create a standalone testcase for it.

2024-04-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-04-02
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #19 from Tamar Christina  ---
Thanks! Back from holidays and looking into it now.

Mine.

2024-03-26 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #18 from Richard Biener  ---
Just as a hint: we've had wrong upper bounds on vectorized loops/epilogues which
would trigger wrong unrolling.  But then wrong unrolling also always hints at
possibly having wrong range-info.

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #17 from Sam James  ---
Created attachment 57780
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57780&action=edit
EarlyCSE.cpp.cpp.182t.cunroll-bad

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #16 from Sam James  ---
-fdisable-tree-cunroll seems to help.

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #15 from Richard Biener  ---
The valgrind output might be because we vectorize the loads a[i], a[i+8], ...
as full vector loads at a[i], a[i+8] but the last we access as scalar.  So
the uninit load might be harmless.

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #14 from Richard Biener  ---
There are a few vectorizations in the dumps but only one early-exit where
we vectorize

   [local count: 102053600]:
  first$I_39 = MEM[(struct value_op_iterator *)];
  last$I_40 = MEM[(struct value_op_iterator *)];
  seed_15 = llvm::hashing::detail::get_execution_seed ();
  if (first$I_39 != last$I_40)
goto ; [94.50%]

   [local count: 96440652]:

   [local count: 179229733]:
  # buffer_ptr_22 = PHI <_20(24), (22)>
  # first$I_24 = PHI <_29(24), first$I_39(22)>
  # ivtmp_226 = PHI 
  _20 = buffer_ptr_22 + 8;
  ivtmp_216 = ivtmp_226 - 1;
  if (ivtmp_216 == 0)
goto ; [51.12%]
  else
goto ; [48.88%]

   [local count: 87607493]:
  _30 = MEM[(const struct Use *)first$I_24].Val;
  _35 = (unsigned long) _30;
  MEM  [(char * {ref-all})buffer_ptr_22] = _35;
  _29 = first$I_24 + 32;
  if (_29 != last$I_40)
goto ; [94.50%]
  else
goto ; [5.50%]

   [local count: 82789081]:
  goto ; [100.00%]

   [local count: 96440652]:
  # buffer_ptr_248 = PHI <_20(4), buffer_ptr_22(3)>
  # first$I_175 = PHI 
  if (last$I_40 == first$I_175)
...

as far as I can see that's a non-peeled case and from what I see it looks
OK how we process that.

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #13 from Sam James  ---
Created attachment 5
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=5=edit
valgrind output when broken

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #12 from Sam James  ---
Created attachment 57776
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57776&action=edit
EarlyCSE.cpp.cpp.179t.vect-bad

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #11 from Sam James  ---
Created attachment 57775
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57775&action=edit
EarlyCSE.cpp.cpp.178t.ifcvt-bad

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #10 from Sam James  ---
Created attachment 57774
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57774&action=edit
EarlyCSE.cpp.cpp.177t.ch_vect-bad

optimize("O2") on `template <typename InputIteratorT>
hash_code hash_combine_range_impl(InputIteratorT first, InputIteratorT last)`
works, but O3 is broken.

Unfortunately, novector pragmas don't work on the while()s in there. I get an
"ignored" warning.

Attached those dumps w/ -fdbg-cnt=vect_loop:7 (so just the one bad loop). I can
tarball up the 6 vs 7 if useful.

Thanks. Will try disabling those passes next..

2024-03-22 Thread law at gcc dot gnu.org via Gcc-bugs

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P1
 CC||law at gcc dot gnu.org

2024-03-22 Thread rguenth at gcc dot gnu.org via Gcc-bugs

--- Comment #9 from Richard Biener  ---
Nothing obviously suspicious here ... I wonder if you can attach
177t.ch_vect, 178t.ifcvt and 179t.vect for the case with the single vectorized
bad loop?

Maybe we're running into a latent issue downstream?  What happens if you
disable most followup passes?

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #8 from Sam James  ---
Created attachment 57770
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57770&action=edit
EarlyCSE.cpp.cpp.179t.vect.diff

(In reply to Sam James from comment #7)
> I'll go back to trying to see which specific loop it is.

Tamar and richi both separately suggested debug counters.

lbound: 6
ubound: 7

Attached the diff for EarlyCSE.cpp.cpp.179t.vect.

Further suggestions?

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #7 from Sam James  ---
I'll go back to trying to see which specific loop it is.

2024-03-22 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #6 from Sam James  ---
Modifying llvm/include/llvm/ADT/iterator.h like so helps (!):
```
#pragma GCC push_options
#pragma GCC optimize ("O0")
  friend bool operator==(const iterator_adaptor_base &LHS,
                         const iterator_adaptor_base &RHS) {
    return LHS.I == RHS.I;
  }
#pragma GCC pop_options
```

2024-03-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #5 from Sam James  ---
I'm narrowing it down in there, currently several headers deep. I'll finish
that tomorrow.

2024-03-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

--- Comment #4 from Sam James  ---
Created attachment 57752
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57752&action=edit
EarlyCSE.cpp.ii.xz

The bad object seems to be EarlyCSE.cpp.o. Building it with -O0 makes things
work.

2024-03-20 Thread sjames at gcc dot gnu.org via Gcc-bugs

Sam James  changed:

   What|Removed |Added

Summary|[14 regression] LLVM|[14 regression] LLVM
   |miscompiled with -O3|miscompiled with -O3
   |-march=znver2   |-march=znver2
   |-fno-vect-cost-model|-fno-vect-cost-model since
   ||r14-6822-g01f4251b8775c8
 CC||tnfchris at gcc dot gnu.org

--- Comment #3 from Sam James  ---
r14-6822-g01f4251b8775c8

so far, isolating it is a pain because sometimes llvm-tblgen will segfault
during the build (it's built-and-then-run to generate machine descriptions).