[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #16 from CVS Commits  ---
The master branch has been updated by Richard Biener :

https://gcc.gnu.org/g:5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5

commit r13-3273-g5cbaf84c191b9a3e3cb26545c808d208bdbf2ab5
Author: Richard Biener 
Date:   Thu Oct 13 14:24:05 2022 +0200

tree-optimization/107160 - avoid reusing multiple accumulators

Epilogue vectorization is not set up to re-use a vectorized
accumulator consisting of more than one vector.  For non-SLP
we always reduce to a single but for SLP that isn't happening.
In such case we currenlty miscompile the epilog so avoid this.

PR tree-optimization/107160
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Do not register accumulator if we failed to reduce it
to a single vector.

* gcc.dg/vect/pr107160.c: New testcase.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

Richard Biener  changed:

   What|Removed |Added

   Assignee|linkw at gcc dot gnu.org   |rguenth at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #15 from Richard Biener  ---
I think that either vect_find_reusable_accumulator should punt on this or
vect_create_epilog_for_reduction shouldn't register this accumulator.
The caller of vect_find_reusable_accumulator checks for vec_num == 1
(in the epilog) but we register for vec_num == 2.  For SLP reductions
we "fail" to reduce to a single accumulator.

  int vec_num = vec_oprnds0.length ();
  gcc_assert (vec_num == 1 || slp_node);


The following fixes this:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 1996ecfee7a..b1442a93581 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6232,7 +6232,8 @@ vect_create_epilog_for_reduction (loop_vec_info
loop_vinfo,
 }

   /* Record this operation if it could be reused by the epilogue loop.  */
-  if (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION)
+  if (STMT_VINFO_REDUC_TYPE (reduc_info) == TREE_CODE_REDUCTION
+  && vec_num == 1)
 loop_vinfo->reusable_accumulators.put (scalar_results[0],
   { orig_reduc_input, reduc_info });

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #14 from Richard Biener  ---
Aha, so the issue is that we have a vectorized epilogue here and the epilogue
of _that_ ends up doing


   [local count: 94607391]:
  # sum0_48 = PHI 
  # sum1_47 = PHI 
  # sum2_46 = PHI 
  # sum3_45 = PHI 
  # vect_sum3_31.16_101 = PHI 
  # vect_sum3_31.16_102 = PHI 
  # vect_sum3_31.16_103 = PHI 
  # vect_sum3_31.16_104 = PHI 
  _105 = BIT_FIELD_REF ;
  _106 = BIT_FIELD_REF ;
...
this is from the main vect

   [local count: 81467476]:
  # sum0_135 = PHI 
  # sum1_136 = PHI 
  # sum2_137 = PHI 
  # sum3_138 = PHI 
  # vect_sum3_50.25_159 = PHI 
this is from the epilogue

   [local count: 105119324]:
  # sum0_15 = PHI 
  # sum1_77 = PHI 
  # sum2_75 = PHI 
  # sum3_13 = PHI 
  # _160 = PHI 
  _161 = BIT_FIELD_REF <_160, 32, 0>;
  _162 = BIT_FIELD_REF <_160, 32, 32>;
  _163 = BIT_FIELD_REF <_160, 32, 64>;
  _164 = BIT_FIELD_REF <_160, 32, 96>;
  _74 = _161 + _162;
  _76 = _74 + _163;
  _78 = _76 + _164;

so we fail to accumulate the main loops accumulators and just
use the first one.  On x86 the vectorized epilogue uses a smaller
vector size but the same number of accumulators.

It seems it's simply unexpected to have the unrolled SLP reduction
and a vectorized epilogue with the same vector mode (but not unrolled).

I can reproduce the failure when patching the x86 cost model to force
unrolling by 2 (maybe we want a --param to force that to aid debugging...).

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #13 from Martin Liška  ---
A bit reduced test-case that can be compiled with cross compiler:

$ cat pr107160.c
#define N 16
float fl[N];

__attribute__ ((noipa, optimize (0))) void
init ()
{
  for (int i = 0; i < N; i++)
fl[i] = 1 << i;
}

__attribute__ ((noipa)) float
foo (int n)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;

  for (int i = 0; i < n; i += 4)
{
  sum0 += __builtin_fabs (fl[i]);
  sum1 += __builtin_fabs (fl[i + 1]);
  sum2 += __builtin_fabs (fl[i + 2]);
  sum3 += __builtin_fabs (fl[i + 3]);
}

  return sum0 + sum1 + sum2 + sum3;
}

__attribute__ ((optimize (0))) int
main ()
{
  init ();
  float res = foo (N);
  __builtin_printf ("res:%f\n", res);
  return 0;
}

x86_64:
./a.out 
res:65535.00

ppc64le:
./a.out
res:15.00

so the result is wrong, it sums only first 4 elements (and not 16).

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #12 from rguenther at suse dot de  ---
On Thu, 13 Oct 2022, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160
> 
> --- Comment #11 from Kewen Lin  ---
> > > > Btw, I've fixed a SLP reduction issue two days ago in
> > > > r13-3226-gee467644c53ee2
> > > > though that looks unrelated?
> > > 
> > > Thanks for the information, I'll double check it.
> > > 
> 
> To rebase to r13-3226 or even the latest trunk doesn't help.

:/

> > 
> > I do - the epilog is even vectorized and it works fine at runtime.
> 
> You meant the code on x86 is all fine? and the case doesn't show the "res"
> difference? hmm, it's weird.

Yes.  I'll try to look at cross compiler IL.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #11 from Kewen Lin  ---
> > > Btw, I've fixed a SLP reduction issue two days ago in
> > > r13-3226-gee467644c53ee2
> > > though that looks unrelated?
> > 
> > Thanks for the information, I'll double check it.
> > 

To rebase to r13-3226 or even the latest trunk doesn't help.

> 
> I do - the epilog is even vectorized and it works fine at runtime.

You meant the code on x86 is all fine? and the case doesn't show the "res"
difference? hmm, it's weird.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread rguenther at suse dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #10 from rguenther at suse dot de  ---
On Thu, 13 Oct 2022, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160
> 
> --- Comment #9 from Kewen Lin  ---
> > 
> > The above doesn't look wrong (but may miss the rest of the IL).  On
> > x86_64 this looks like
> > 
> >[local count: 105119324]:
> >   # sum0_41 = PHI 
> >   # sum1_39 = PHI 
> >   # sum2_37 = PHI 
> >   # sum3_35 = PHI 
> >   # vect_sum3_31.11_59 = PHI 
> >   _58 = BIT_FIELD_REF ;
> >   _57 = BIT_FIELD_REF ;
> >   _56 = BIT_FIELD_REF ;
> >   _55 = BIT_FIELD_REF ;
> >   _74 = _58 + _57;
> >   _76 = _56 + _74;
> >   _78 = _55 + _76;
> > 
> >[local count: 118111600]:
> >   # prephitmp_79 = PHI <_78(4), 0.0(2)>
> >   return prephitmp_79;
> > 
> 
> Yeah, it looks expected without unrolling.
> 
> > when unrolling is applied, thus with a larger VF, you should ideally
> > see the vectors accumulated.
> > 
> > Btw, I've fixed a SLP reduction issue two days ago in
> > r13-3226-gee467644c53ee2
> > though that looks unrelated?
> 
> Thanks for the information, I'll double check it.
> 
> > 
> > When I force a larger VF on x86 by adding a int store in the loop I see
> > 
> >[local count: 94607391]:
> >   # sum0_48 = PHI 
> >   # sum1_36 = PHI 
> >   # sum2_35 = PHI 
> >   # sum3_24 = PHI 
> >   # vect_sum3_32.16_110 = PHI 
> >   # vect_sum3_32.16_111 = PHI 
> >   # vect_sum3_32.16_112 = PHI 
> >   # vect_sum3_32.16_113 = PHI 
> >   _114 = BIT_FIELD_REF ;
> >   _115 = BIT_FIELD_REF ;
> >   _116 = BIT_FIELD_REF ;
> >   _117 = BIT_FIELD_REF ;
> >   _118 = BIT_FIELD_REF ;
> >   _119 = BIT_FIELD_REF ;
> >   _120 = BIT_FIELD_REF ;
> >   _121 = BIT_FIELD_REF ;
> >   _122 = BIT_FIELD_REF ;
> >   _123 = BIT_FIELD_REF ;
> >   _124 = BIT_FIELD_REF ;
> >   _125 = BIT_FIELD_REF ;
> >   _126 = BIT_FIELD_REF ;
> >   _127 = BIT_FIELD_REF ;
> >   _128 = BIT_FIELD_REF ;
> >   _129 = BIT_FIELD_REF ;
> >   _130 = _114 + _118;
> >   _131 = _115 + _119;
> >   _132 = _116 + _120;
> >   _133 = _117 + _121;
> >   _134 = _130 + _122;
> >   _135 = _131 + _123;
> >   _136 = _132 + _124;
> >   _137 = _133 + _125;
> >   _138 = _134 + _126;
> > 
> > see how the lanes from the different vectors are accumulated?  (yeah,
> > we should simply add the vectors!)
> 
> Yes, it's the same as what I saw on ppc64le, but the closely following dce6
> removes the three vect_sum3_32 (in your dump, they are
> vect_sum3_32.16_10{7,8,9}) as the subsequent joints don't actually use the
> separated accumulated lane values (_138 -> sum0 ...) but only use
> vect_sum3_32.16_110.

I do - the epilog is even vectorized and it works fine at runtime.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #9 from Kewen Lin  ---
> 
> The above doesn't look wrong (but may miss the rest of the IL).  On
> x86_64 this looks like
> 
>[local count: 105119324]:
>   # sum0_41 = PHI 
>   # sum1_39 = PHI 
>   # sum2_37 = PHI 
>   # sum3_35 = PHI 
>   # vect_sum3_31.11_59 = PHI 
>   _58 = BIT_FIELD_REF ;
>   _57 = BIT_FIELD_REF ;
>   _56 = BIT_FIELD_REF ;
>   _55 = BIT_FIELD_REF ;
>   _74 = _58 + _57;
>   _76 = _56 + _74;
>   _78 = _55 + _76;
> 
>[local count: 118111600]:
>   # prephitmp_79 = PHI <_78(4), 0.0(2)>
>   return prephitmp_79;
> 

Yeah, it looks expected without unrolling.

> when unrolling is applied, thus with a larger VF, you should ideally
> see the vectors accumulated.
> 
> Btw, I've fixed a SLP reduction issue two days ago in
> r13-3226-gee467644c53ee2
> though that looks unrelated?

Thanks for the information, I'll double check it.

> 
> When I force a larger VF on x86 by adding a int store in the loop I see
> 
>[local count: 94607391]:
>   # sum0_48 = PHI 
>   # sum1_36 = PHI 
>   # sum2_35 = PHI 
>   # sum3_24 = PHI 
>   # vect_sum3_32.16_110 = PHI 
>   # vect_sum3_32.16_111 = PHI 
>   # vect_sum3_32.16_112 = PHI 
>   # vect_sum3_32.16_113 = PHI 
>   _114 = BIT_FIELD_REF ;
>   _115 = BIT_FIELD_REF ;
>   _116 = BIT_FIELD_REF ;
>   _117 = BIT_FIELD_REF ;
>   _118 = BIT_FIELD_REF ;
>   _119 = BIT_FIELD_REF ;
>   _120 = BIT_FIELD_REF ;
>   _121 = BIT_FIELD_REF ;
>   _122 = BIT_FIELD_REF ;
>   _123 = BIT_FIELD_REF ;
>   _124 = BIT_FIELD_REF ;
>   _125 = BIT_FIELD_REF ;
>   _126 = BIT_FIELD_REF ;
>   _127 = BIT_FIELD_REF ;
>   _128 = BIT_FIELD_REF ;
>   _129 = BIT_FIELD_REF ;
>   _130 = _114 + _118;
>   _131 = _115 + _119;
>   _132 = _116 + _120;
>   _133 = _117 + _121;
>   _134 = _130 + _122;
>   _135 = _131 + _123;
>   _136 = _132 + _124;
>   _137 = _133 + _125;
>   _138 = _134 + _126;
> 
> see how the lanes from the different vectors are accumulated?  (yeah,
> we should simply add the vectors!)

Yes, it's the same as what I saw on ppc64le, but the closely following dce6
removes the three vect_sum3_32 (in your dump, they are
vect_sum3_32.16_10{7,8,9}) as the subsequent joints don't actually use the
separated accumulated lane values (_138 -> sum0 ...) but only use
vect_sum3_32.16_110.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #8 from Richard Biener  ---
(In reply to Kewen Lin from comment #7)
> One reduced test case is:
> 
> 
> 
> #include 
> #include 
> 
> #define N 128
> float fl[N];
> 
> __attribute__ ((noipa, optimize (0))) void
> init ()
> {
>   for (int i = 0; i < N; i++)
> fl[i] = i;
> }
> 
> __attribute__ ((noipa)) float
> foo (int n1)
> {
>   float sum0, sum1, sum2, sum3;
>   sum0 = sum1 = sum2 = sum3 = 0.0f;
> 
>   int n = (n1 / 4) * 4;
>   for (int i = 0; i < n; i += 4)
> {
>   sum0 += fabs (fl[i]);
>   sum1 += fabs (fl[i + 1]);
>   sum2 += fabs (fl[i + 2]);
>   sum3 += fabs (fl[i + 3]);
> }
> 
>   return sum0 + sum1 + sum2 + sum3;
> }
> 
> __attribute__ ((optimize (0))) int
> main ()
> {
>   init ();
>   float res = foo (80);
>   __builtin_printf ("res:%f\n", res);
>   return 0;
> }
> 
>  
> incorrect result "res:670.00" vs expected result "res:3160.00"
> 
> It looks it exposes one bug in vectorization reduction support. The
> reduction epilogue handling looks wrong, it generates gimple code like:
> 
> # vect_sum3_31.16_101 = PHI 
> # vect_sum3_31.16_102 = PHI 
> # vect_sum3_31.16_103 = PHI 
> # vect_sum3_31.16_104 = PHI 
> _105 = BIT_FIELD_REF ;
> _106 = BIT_FIELD_REF ;
> _107 = BIT_FIELD_REF ;
> _108 = BIT_FIELD_REF ;
> _109 = BIT_FIELD_REF ;
> _110 = BIT_FIELD_REF ;
> _111 = BIT_FIELD_REF ;
> _112 = BIT_FIELD_REF ;
> ...
> 
> it doesn't consider the reduced results vect_sum3_31.16_10{1,2,3,4} from the
> loop can be reduced again in loop exit block as they are in the same slp
> group.

The above doesn't look wrong (but may miss the rest of the IL).  On
x86_64 this looks like

   [local count: 105119324]:
  # sum0_41 = PHI 
  # sum1_39 = PHI 
  # sum2_37 = PHI 
  # sum3_35 = PHI 
  # vect_sum3_31.11_59 = PHI 
  _58 = BIT_FIELD_REF ;
  _57 = BIT_FIELD_REF ;
  _56 = BIT_FIELD_REF ;
  _55 = BIT_FIELD_REF ;
  _74 = _58 + _57;
  _76 = _56 + _74;
  _78 = _55 + _76;

   [local count: 118111600]:
  # prephitmp_79 = PHI <_78(4), 0.0(2)>
  return prephitmp_79;

when unrolling is applied, thus with a larger VF, you should ideally
see the vectors accumulated.

Btw, I've fixed a SLP reduction issue two days ago in r13-3226-gee467644c53ee2
though that looks unrelated?

When I force a larger VF on x86 by adding a int store in the loop I see

   [local count: 94607391]:
  # sum0_48 = PHI 
  # sum1_36 = PHI 
  # sum2_35 = PHI 
  # sum3_24 = PHI 
  # vect_sum3_32.16_110 = PHI 
  # vect_sum3_32.16_111 = PHI 
  # vect_sum3_32.16_112 = PHI 
  # vect_sum3_32.16_113 = PHI 
  _114 = BIT_FIELD_REF ;
  _115 = BIT_FIELD_REF ;
  _116 = BIT_FIELD_REF ;
  _117 = BIT_FIELD_REF ;
  _118 = BIT_FIELD_REF ;
  _119 = BIT_FIELD_REF ;
  _120 = BIT_FIELD_REF ;
  _121 = BIT_FIELD_REF ;
  _122 = BIT_FIELD_REF ;
  _123 = BIT_FIELD_REF ;
  _124 = BIT_FIELD_REF ;
  _125 = BIT_FIELD_REF ;
  _126 = BIT_FIELD_REF ;
  _127 = BIT_FIELD_REF ;
  _128 = BIT_FIELD_REF ;
  _129 = BIT_FIELD_REF ;
  _130 = _114 + _118;
  _131 = _115 + _119;
  _132 = _116 + _120;
  _133 = _117 + _121;
  _134 = _130 + _122;
  _135 = _131 + _123;
  _136 = _132 + _124;
  _137 = _133 + _125;
  _138 = _134 + _126;

see how the lanes from the different vectors are accumulated?  (yeah,
we should simply add the vectors!)

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-13 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

Kewen Lin  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org,
   ||rsandifo at gcc dot gnu.org

--- Comment #7 from Kewen Lin  ---

One reduced test case is:



#include 
#include 

#define N 128
float fl[N];

__attribute__ ((noipa, optimize (0))) void
init ()
{
  for (int i = 0; i < N; i++)
fl[i] = i;
}

__attribute__ ((noipa)) float
foo (int n1)
{
  float sum0, sum1, sum2, sum3;
  sum0 = sum1 = sum2 = sum3 = 0.0f;

  int n = (n1 / 4) * 4;
  for (int i = 0; i < n; i += 4)
{
  sum0 += fabs (fl[i]);
  sum1 += fabs (fl[i + 1]);
  sum2 += fabs (fl[i + 2]);
  sum3 += fabs (fl[i + 3]);
}

  return sum0 + sum1 + sum2 + sum3;
}

__attribute__ ((optimize (0))) int
main ()
{
  init ();
  float res = foo (80);
  __builtin_printf ("res:%f\n", res);
  return 0;
}

 
incorrect result "res:670.00" vs expected result "res:3160.00"

It looks it exposes one bug in vectorization reduction support. The reduction
epilogue handling looks wrong, it generates gimple code like:

# vect_sum3_31.16_101 = PHI 
# vect_sum3_31.16_102 = PHI 
# vect_sum3_31.16_103 = PHI 
# vect_sum3_31.16_104 = PHI 
_105 = BIT_FIELD_REF ;
_106 = BIT_FIELD_REF ;
_107 = BIT_FIELD_REF ;
_108 = BIT_FIELD_REF ;
_109 = BIT_FIELD_REF ;
_110 = BIT_FIELD_REF ;
_111 = BIT_FIELD_REF ;
_112 = BIT_FIELD_REF ;
...

it doesn't consider the reduced results vect_sum3_31.16_10{1,2,3,4} from the
loop can be reduced again in loop exit block as they are in the same slp group.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-12 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

Kewen Lin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-10-12
 Ever confirmed|0   |1

--- Comment #6 from Kewen Lin  ---
Confirmed.

The option can be reduced to: -m64 -O3 -mcpu=power8 -ffast-math

The culprit loop is the one in function l1_norm of file vector.cc (actually
it's from the header file). The resulting gimple after loop vect pass looks
weird to me, it seems not to expose one issue related to math but one actual
vect issue. I'll check it further and post one reduced test case.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-10 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #5 from seurer at gcc dot gnu.org ---
I added that option and 554.roms_r now runs OK.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-09 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

Kewen Lin  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |linkw at gcc dot gnu.org

--- Comment #4 from Kewen Lin  ---
Just back from holiday, thanks for reporting! I'll take a look.


(In reply to seurer from comment #3)
> Note that 554.roms_r from spec2017 also fails after this commit.

I suspected what you saw on SPEC2017 554.roms_r is related to PR103320. The
culprit commit can make some loops unrolled (they are not previously), PR103320
showed that loops unrolling matters. So if you did add option
"-fno-unsafe-math-optimizations" for the run with "-funroll-loops", then you
need it since this commit too. Could you double check?

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-06 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #3 from seurer at gcc dot gnu.org ---
Note that 554.roms_r from spec2017 also fails after this commit.

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-06 Thread seurer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

--- Comment #2 from seurer at gcc dot gnu.org ---
One of the compilation commands:

/home/seurer/gcc/git/install/gcc-test/bin/g++ -c -o auto_derivative_function.o
-DSPEC_CPU -DNDEBUG  -Iinclude -DBOOST_DISABLE_THREADS -Ddeal_II_dimension=3
-m64 -O3 -mcpu=power8 -ffast-math -funroll-loops -fpeel-loops -fvect-cost-model
-mpopcntd -mrecip=rsqrt-DSPEC_CPU_LP64 -DSPEC_CPU_LINUX  -include cstddef
-std=gnu++98  auto_derivative_function.cc

[Bug target/107160] [13 regression] r13-2641-g0ee1548d96884d causes verification failure in spec2006

2022-10-06 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107160

Richard Biener  changed:

   What|Removed |Added

   Keywords||wrong-code
   Target Milestone|--- |13.0

--- Comment #1 from Richard Biener  ---
what other flags do you use?