[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

Tamar Christina  changed:

   What|Removed |Added

 CC||dcb314 at hotmail dot com

--- Comment #18 from Tamar Christina  ---
*** Bug 113561 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/113561] yet more verify_ssa fails

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113561

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 CC||tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Tamar Christina  ---
Is indeed fixed by the commit for #113364

*** This bug has been marked as a duplicate of bug 113364 ***

[Bug tree-optimization/113467] [14 regression] libgcrypt-1.10.3 is miscompiled

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467

--- Comment #20 from Tamar Christina  ---
(In reply to rguent...@suse.de from comment #19)
> > Am 23.01.2024 um 18:06 schrieb tnfchris at gcc dot gnu.org 
> > :
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467
> > 
> > --- Comment #18 from Tamar Christina  ---
> > (In reply to Richard Biener from comment #7)
> >> I do wonder whether LOOP_VINFO_EARLY_BREAKS_VECT_PEELED actually works 
> >> (since
> >> without early exits we cannot handle a non-empty latch because of 
> >> correctness
> >> issues).  I'd very much have preferred to deal with these by loop rotation
> >> (there's the loop_ch pass).  We're still doing this, even when
> >> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED:
> >> 
> >>  /* We assume that the loop exit condition is at the end of the loop. i.e,
> >>     that the loop is represented as a do-while (with a proper if-guard
> >>     before the loop if needed), where the loop header contains all the
> >>     executable statements, and the latch is empty.  */
> >>  if (!empty_block_p (loop->latch)
> >>      || !gimple_seq_empty_p (phi_nodes (loop->latch)))
> >>    return opt_result::failure_at (vect_location,
> >>                                   "not vectorized: latch block not empty.\n");
> >> 
> >> so that's a bit odd (but loop_ch tries to ensure the latch is empty 
> >> anyway).
> >> 
> >> Does the following fix the issue?
> > 
> > Not really sure I understand what the latch being empty has to do with
> > LOOP_VINFO_EARLY_BREAKS_VECT_PEELED as the latch is still empty even with 
> > it.
> 
> The latch is everything after the IV exit.

Wait, are you saying, that conceptually if we pick an earlier exit as the main
exit then for the vectorizer the "latch" is everything below the fall through
edge?

i.e. that the "latch" then contains the normal loop exit?

> 
> > I guess if it's just going to disabled it then wouldn't it better to just
> > always pick the latch exit rather than trying to do the whole analysis thing
> > and maybe pick another exit while the main exit would have worked.
> 
> The point was to quickly see whether a peeled early exit vectorization is
> the issue here.

I see, I should submit that dbgcnt patch. I wrote it but never sent it.

[Bug tree-optimization/113467] [14 regression] libgcrypt-1.10.3 is miscompiled

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113467

--- Comment #18 from Tamar Christina  ---
(In reply to Richard Biener from comment #7)
> I do wonder whether LOOP_VINFO_EARLY_BREAKS_VECT_PEELED actually works (since
> without early exits we cannot handle a non-empty latch because of correctness
> issues).  I'd very much have preferred to deal with these by loop rotation
> (there's the loop_ch pass).  We're still doing this, even when
> LOOP_VINFO_EARLY_BREAKS_VECT_PEELED:
> 
>   /* We assume that the loop exit condition is at the end of the loop. i.e,
>      that the loop is represented as a do-while (with a proper if-guard
>      before the loop if needed), where the loop header contains all the
>      executable statements, and the latch is empty.  */
>   if (!empty_block_p (loop->latch)
>       || !gimple_seq_empty_p (phi_nodes (loop->latch)))
>     return opt_result::failure_at (vect_location,
>                                    "not vectorized: latch block not empty.\n");
> 
> so that's a bit odd (but loop_ch tries to ensure the latch is empty anyway).
> 
> Does the following fix the issue?

Not really sure I understand what the latch being empty has to do with
LOOP_VINFO_EARLY_BREAKS_VECT_PEELED as the latch is still empty even with it.

I guess if it's just going to disable it then wouldn't it be better to just
always pick the latch exit rather than trying to do the whole analysis thing
and maybe pick another exit while the main exit would have worked?
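
To make the terminology concrete, here is a small sketch of a loop with one
early exit and one counted (IV) exit, written in the guarded do-while shape
that the quoted comment describes.  The function name find_first is an
illustrative assumption and is not taken from any testcase in this bug; it
only shows where the two exits sit relative to the end of the loop body.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first element equal to key, or n if there is
   none.  Written as a guarded do-while so that the counted exit test is
   the last statement of the loop body and nothing runs between it and
   the back edge (an "empty latch" in CFG terms).  */
size_t find_first (const int *a, size_t n, int key)
{
  assert (a != NULL || n == 0);
  size_t i = 0;
  if (n == 0)            /* if-guard in front of the do-while */
    return n;
  do
    {
      if (a[i] == key)   /* early exit, taken from the middle of the body */
        return i;
      i++;
    }
  while (i < n);         /* counted (IV) exit at the end of the loop */
  return n;
}
```

In this shape the early exit comes first and the counted exit is the last
thing in the body.  If the early exit were chosen as the main exit instead,
everything after it (including the counted exit test) would sit below the
main exit, which is the situation the question above is about.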

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

--- Comment #16 from Tamar Christina  ---
Ok, I've submitted the patch since the ICE and miscompare are unrelated.

I'll keep this ticket open in any case.  The miscompares didn't happen based on
commits from ~2 weeks ago, so this will give me a place to start.

Hopefully I'll send a patch for those tomorrow.

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

--- Comment #15 from Tamar Christina  ---
Ok, the patch fixes the ICE, but after rebasing to trunk I get a miscompile
during bootstrap which miscompiles the x86 backend.

This is likely related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113539
so tracking it down...

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #25 from Tamar Christina  ---
> >  void record_nonwrapping_chrec (tree chrec)
> >  {
> > -  CHREC_NOWRAP(chrec) = 1;
> > +  CHREC_NOWRAP(chrec) = 0;
> >  
> >if (dump_file && (dump_flags & TDF_SCEV))
> >  {
> 
> Hmmm. With experiments. The codegen looks slightly better but still didn't
> recover back to GCC-12.
> 
> 
> Btw, I compare ARM SVE codegen, even with cost model:
> 
> https://godbolt.org/z/cKc1PG3dv
> 
> I think GCC 13.2 codegen is better than GCC trunk with cost model.

If you have the cost model enabled you hit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441#c9 which is just a target
bug I need to look into separately.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #23 from Tamar Christina  ---
tamar:~/gcc-dsg/test$ extract-toolchain gcc 2efe3a7de01
A   1514 files
D   0 files
M   0 files
Extracted 'origin/manygcc-basepoints-gcc-14-6292-g2f512f6fcdd:2efe3a7de01'

> ./bin/gcc -S -o ../wlo-bad.s -march=armv8-a+sve -O3 -msve-vector-bits=512 
> -fno-vect-cost-model -g0 ../wlo.c -fdump-tree-vect-all

tamar:~/gcc-dsg/test$ extract-toolchain gcc 9f7ad5eff3b
A   1514 files
D   0 files
M   0 files
Extracted 'origin/manygcc-basepoints-gcc-14-6292-g2f512f6fcdd:9f7ad5eff3b'

> ./bin/gcc -S -o ../wlo-good.s -march=armv8-a+sve -O3 -msve-vector-bits=512 
> -fno-vect-cost-model -g0 ../wlo.c -fdump-tree-vect-all

> diff ../wlo-bad.s ../wlo-good.s  | wc -l
537

For the record, the bisect was scanning for "requires scalar epilogue loop",
and that's the first commit where it appears.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #22 from Tamar Christina  ---
For me, with `-fno-vect-cost-model` and without this commit, we generate
https://gist.github.com/Mistuke/d9252bfcb2aa766327c5f377e162f5b7 for the loop.
With the commit, well, it doesn't fit on the screen, but the codegen is
pretty horrible, with

smlal2  v24.4s, v13.8h, v5.8h
smull   v31.4s, v30.4h, v17.4h
add v20.4s, v20.4s, v11.4s
smlal2  v29.4s, v3.8h, v6.8h
smull2  v25.4s, v25.8h, v15.8h
add v22.4s, v28.4s, v22.4s
shrn    v21.4h, v21.4s, 15
add v20.4s, v20.4s, v26.4s
add v29.4s, v29.4s, v24.4s
smlal2  v25.4s, v16.8h, v7.8h
smlal   v31.4s, v18.4h, v8.4h
smull2  v27.4s, v27.8h, v17.8h
shrn2   v21.8h, v22.4s, 15
add v29.4s, v29.4s, v25.4s
add v31.4s, v31.4s, v20.4s
smlal2  v27.4s, v18.8h, v8.8h
str h21, [x5, x9]
add x9, x9, 32
add x9, x5, x9
shrn    v31.4h, v31.4s, 15
st1 {v21.h}[1], [x10]
add v27.4s, v27.4s, v29.4s
st1 {v21.h}[2], [x6]
add x6, x7, 20
add x10, x1, x21
st1 {v21.h}[3], [x2]
add x2, x7, 24
add x7, x7, 28
st1 {v21.h}[4], [x8]
shrn2   v31.8h, v27.4s, 15
st1 {v21.h}[5], [x6]
lsl x6, x10, 1
add x10, x5, x10, lsl 1
st1 {v21.h}[6], [x2]
add x2, x10, 4
st1 {v21.h}[7], [x7]
add x7, x10, 8
str h31, [x5, x6]
add x8, x10, 12
lsl x1, x1, 1
add x6, x6, 32
st1 {v31.h}[1], [x2]
add x2, x10, 16
st1 {v31.h}[2], [x7]
add x7, x10, 20
st1 {v31.h}[3], [x8]
add x8, x10, 24
add x10, x10, 28
st1 {v31.h}[4], [x2]
st1 {v31.h}[5], [x7]
add x11, x1, 32
st1 {v31.h}[6], [x8]
add x11, x0, x11
st1 {v31.h}[7], [x10]
add x10, x1, x25
ld1h    z31.s, p5/z, [x11]

going on for a while, i.e. single-element lane stores. So with the cost model
disabled, it definitely does get worse with that commit. With the cost model
on there's no difference.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #17 from Tamar Christina  ---
Ok, bisected to

g:2efe3a7de0107618397264017fb045f237764cc7 is the first bad commit
commit 2efe3a7de0107618397264017fb045f237764cc7
Author: Hao Liu 
Date:   Wed Dec 6 14:52:19 2023 +0800

tree-optimization/112774: extend the SCEV CHREC tree with a nonwrapping
flag

Before this commit we were unable to analyse the stride of the access.
After this niters seems to estimate the loop trip count at 4 and after that the
logs diverge enormously.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #9 from Tamar Christina  ---
(In reply to Richard Biener from comment #7)
> So - the target should reject this clone or not generate it in the first
> place.  And of course the cost thing should be fixed which will likely mask
> the issue in the target.

Yeah, looks like there's a bug in
aarch64_simd_clone_compute_vecsize_and_simdlen that's also present on the
branches.  I'll submit a patch.

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #16 from Tamar Christina  ---
(In reply to rguent...@suse.de from comment #13)
> > > You could check if we call this with sane values.
> > 
> > Do you mean it's RISC-V backend cost model issue ?
> 
> I responded to Tamar which means a aarch64 cost model issue - the
> specific issue that the PHIs appear to have no cost.  I didn't look
> at any of the rest.

Yeah, I'll be checking this separately and make a different issue if need be.

(In reply to JuzheZhong from comment #14)
> I just tried again both GCC-13.2 and GCC-14 with -fno-vect-cost-model.
> 
> https://godbolt.org/z/enEG3qf5K
> 
> GCC-14 requires scalar epilogue loop, whereas GCC-13.2 doesn't.
> 
> I believe it's not cost model issue.

Yes, my bisect originally stopped because of the costing change.  I've started
a new one with -fno-vect-cost-model but am having trouble with the condition to
check for.  Will be back in a bit.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

   What|Removed |Added

 Status|WAITING |NEW

--- Comment #5 from Tamar Christina  ---
__attribute__ ((__simd__ ("notinbranch"), const))
double cos (double);

void foo (float *a, double *b)
{
  for (int i = 0; i < 12; i += 3)
    {
      b[i] = cos (5.0 * a[i]);
      b[i+1] = cos (5.0 * a[i+1]);
      b[i+2] = cos (5.0 * a[i+2]);
    }
}

Simple C example that shows the problem.

This seems to happen when SLP succeeds and the group size is a non power of
two.
The vectorizer then unrolls to make it a power of two and during vectorization
it seems to destroy the vector, make the call and reconstruct it.

So this seems like an SLP vectorization bug.  I can't seem to trigger it
however on GCC < 14 since SLP consistently fails for all my examples because it
tries a mode that's larger than the vector size.

So it may be a GCC 14 only regression, but I think it's latent in the
vectorizer.
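
As a rough illustration of why a group of three statements forces unrolling,
the sketch below computes an unroll factor as lcm (lanes, group size) divided
by the group size.  This formula is an assumption about how the SLP unroll
factor is derived, used here only for arithmetic; the helper names
gcd_u and slp_unroll_factor are hypothetical and are not GCC functions.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm.  */
static unsigned gcd_u (unsigned a, unsigned b)
{
  while (b != 0)
    {
      unsigned t = a % b;
      a = b;
      b = t;
    }
  return a;
}

/* Number of copies of a statement group needed so that the total number
   of scalar statements is a whole multiple of the vector lane count.
   Assumed model: lcm (lanes, group_size) / group_size.  */
unsigned slp_unroll_factor (unsigned lanes, unsigned group_size)
{
  assert (lanes > 0 && group_size > 0);
  unsigned lcm = lanes / gcd_u (lanes, group_size) * group_size;
  return lcm / group_size;
}
```

Under this model, a group of 3 calls with 2 double lanes per vector gives an
unroll factor of 2, i.e. 6 scalar calls that would ideally become 3 two-lane
vector calls rather than 6 one-lane calls.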

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #4 from Tamar Christina  ---
(In reply to nsz from comment #2)
> is this fortran only?
> 

No it should be C as well, I was just reducing from a Fortran workload that
failed so I can see what the vectorizer was doing.

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

--- Comment #3 from Tamar Christina  ---
(In reply to Richard Biener from comment #1)
> Hum, the vectorizer looks at the simd specs and if it says 1-lane variants
> (simdlen == 1) are available it will happily create them.
>

My understanding is that the spec just says "All SIMD variants are available"
but technically V1DF is FP not SIMD. 

> Can you provide the testcase amended with the used SIMD "declarations"
> (as with the fortran syntax or with a C testcase)?

fair point:

!GCC$ builtin (cos) attributes simd (notinbranch)

  SUBROUTINE a(b)
  DIMENSION b(3,0)
  COMMON c
  DO 4 m=1,c
 DO 4 d=1,3
 b(d,m)=b(d,m)+COS(5.0D00*m)
   4  CONTINUE
  END
  DIMENSION e(53)
  DIMENSION f(6,91),g(6,91),h(6,91),
 *  i(6,91),j(6,91),k(6,86)
  DIMENSION l(107)
  END

where just

aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast -w cosmo.fppized3.f

is enough.

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

--- Comment #14 from Tamar Christina  ---
Yes I had to rerun my baseline after updating trunk. Will post patch once peak
finishes

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-23 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

--- Comment #13 from Tamar Christina  ---
Yes I had to rerun my baseline after updating trunk. Will post patch once peak
finishes

[Bug tree-optimization/113552] [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Tamar Christina  changed:

   What|Removed |Added

   Target Milestone|--- |14.0
   Priority|P3  |P1
  Component|middle-end  |tree-optimization

[Bug middle-end/113552] New: [11/12/13/14 Regression] vectorizer generates calls to vector math routines with 1 simd lane.

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113552

Bug ID: 113552
   Summary: [11/12/13/14 Regression] vectorizer generates calls to
vector math routines with 1 simd lane.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: link-failure
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64-*

In GCC 7 the Arm vector PCS was implemented to support libmvec but the libmvec
component never made it into glibc until now.

GLIBC 2.39 which will be paired with GCC 14 now implements the vector math
routines.

However consider this function:

> cat cosmo.fppized3.f
  SUBROUTINE a(b)
  DIMENSION b(3,0)
  COMMON c
  DO 4 m=1,c
 DO 4 d=1,3
 b(d,m)=b(d,m)+COS(5.0D00*m)
   4  CONTINUE
  END
  DIMENSION e(53)
  DIMENSION f(6,91),g(6,91),h(6,91),
 *  i(6,91),j(6,91),k(6,86)
  DIMENSION l(107)
  END

and compiled with headers from a glibc 2.39:

> aarch64-unknown-linux-gnu-gfortran -S -o - -Ofast 
> -L/data/repro/glibc/usr/lib64 -I/data/repro/glibc/include 
> --sysroot=/data/repro/glibc -w cosmo.fppized3.f

produces:

fmul    v13.2d, v13.2d, v19.2d
fmov    d0, d13
bl      _ZGVnN1v_cos
fmov    d12, d0
dup     d0, v13.d[1]
bl      _ZGVnN1v_cos
fmov    d31, d0
stp     d12, d31, [sp, 96]

which has deconstructed the vector into scalars and performs a vector call with
1 element.
This is not just inefficient: _ZGVnN1v_cos does not exist in glibc, so the
generated code cannot be linked.

It looks like the vectorizer starts with 4 floats and widens to 2x 2 double. 
But then during vectorizable simd this is again split into multiple vectors,
even though the operation already fits in a vector:

cosmo.fppized3.f:4:13: note:   -->vectorizing SLP node starting from: _49 =
__builtin_cos (_48);
cosmo.fppized3.f:4:13: note:   vect_is_simple_use: operand _47 * 5.0e+0, type
of def: internal
cosmo.fppized3.f:4:13: note:   transform call.
cosmo.fppized3.f:4:13: note:   add new stmt: _132 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _133 = cos.simdclone.0 (_132);
cosmo.fppized3.f:4:13: note:   add new stmt: _134 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _135 = cos.simdclone.0 (_134);
cosmo.fppized3.f:4:13: note:   add new stmt: vect__49.27_136 = {_133, _135};
cosmo.fppized3.f:4:13: note:   add new stmt: _137 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _138 = cos.simdclone.0 (_137);
cosmo.fppized3.f:4:13: note:   add new stmt: _139 = BIT_FIELD_REF
;
cosmo.fppized3.f:4:13: note:   add new stmt: _140 = cos.simdclone.0 (_139);
...

Because we happen to have a V1DF mode that is meant to only be used by some
intrinsics the operation succeeds.

So several issues here:

1. We should stop the new libmvec headers in glibc from applying to GCC
10, 9, 8 and 7, since we can't fix those anymore.  So we need a GCC version
check on them; however, glibc is now frozen for release.
2. The vectorizer should not decompose a simd call if the input and result
don't require it.
3. We shouldn't generate a call with simdlen 1.  That said in theory this could
still be beneficial because it would allow the rest of the code to vectorize
and the vector pcs is cheaper to call.
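
For readers unfamiliar with the symbol names, the sketch below decodes a name
such as _ZGVnN1v_cos under the assumption that it follows the vector function
ABI layout _ZGV<isa><mask><len><parameters>_<name>, where 'n' denotes Advanced
SIMD, 'N' denotes an unmasked variant, and <len> is the lane count (or 'x' for
a scalable length).  The struct and function names are hypothetical helpers
written for this illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

struct vec_variant
{
  char isa;          /* e.g. 'n' = Advanced SIMD, 's' = SVE */
  char mask;         /* 'N' = unmasked, 'M' = masked */
  unsigned simdlen;  /* lane count; 0 means scalable ('x') */
  const char *name;  /* scalar routine name, points into the input */
};

/* Returns 1 and fills *out if sym has the assumed layout, else 0.  */
int parse_vec_variant (const char *sym, struct vec_variant *out)
{
  assert (sym != NULL && out != NULL);
  if (strncmp (sym, "_ZGV", 4) != 0)
    return 0;
  const char *p = sym + 4;
  if (p[0] == '\0' || p[1] == '\0')
    return 0;
  out->isa = *p++;
  out->mask = *p++;
  out->simdlen = 0;
  if (*p == 'x')
    {
      p++;
    }
  else if (isdigit ((unsigned char) *p))
    {
      while (isdigit ((unsigned char) *p))
        out->simdlen = out->simdlen * 10 + (unsigned) (*p++ - '0');
    }
  else
    return 0;
  /* Skip the parameter descriptors up to the '_' separator.  */
  const char *sep = strchr (p, '_');
  if (sep == NULL || sep[1] == '\0')
    return 0;
  out->name = sep + 1;
  return 1;
}

/* True for fixed-length variants with exactly one lane.  */
int is_single_lane_variant (const char *sym)
{
  struct vec_variant v;
  return parse_vec_variant (sym, &v) && v.simdlen == 1;
}
```

With this reading, _ZGVnN1v_cos is the one-lane, unmasked Advanced SIMD
variant of cos, which is the variant the assembly above calls and which the
report says glibc does not provide.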

[Bug tree-optimization/113441] [14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #9 from Tamar Christina  ---
So on SVE the change is cost modelling.

Bisect landed on g:33c2b70dbabc02788caabcbc66b7baeafeb95bcf which changed the
compiler's defaults to use the new throughput-matched cost modelling used by
newer cores.

It looks like this changes which mode the compiler picks for when using a fixed
register size.

This is because the new cost model (correctly) models the costs for FMAs and
promotions.

Before:

array1[0][_1] 1 times scalar_load costs 1 in prologue
(int) _2 1 times scalar_stmt costs 1 in prologue

after:

array1[0][_1] 1 times scalar_load costs 1 in prologue 
(int) _2 1 times scalar_stmt costs 0 in prologue 

and the cost goes from:

Vector inside of loop cost: 125

to

Vector inside of loop cost: 83 

so far, nothing sticks out, and in fact the profitability for VNx4QI drops from

Calculated minimum iters for profitability: 5

to

Calculated minimum iters for profitability: 3

This causes a clash, as this is now exactly the same cost as VNx2QI which used
to be what it preferred before.

Which then leads it to pick the higher VF.

In the end smaller VF shows:

;; Guessed iterations of loop 4 is 0.500488. New upper bound 1.

and now we get:

Vectorization factor 16 seems too large for profile prevoiusly believed to be
consistent; reducing.  
;; Guessed iterations of loop 4 is 0.500488. New upper bound 0.
;; Scaling loop 4 with scale 66.6% (guessed) to reach upper bound 0

which I guess is the big difference.

There is a weird costing going on in the PHI nodes though:

m_108 = PHI  1 times vector_stmt costs 0 in body 
m_108 = PHI  2 times scalar_to_vec costs 0 in prologue

They have collapsed to 0, which can't be right.

[Bug middle-end/112600] Failed to optimize saturating addition using __builtin_add_overflow

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina  ---
Yeah, this is hurting us a lot on vectors as well:

https://godbolt.org/z/ecnGadxcG

The first one isn't vectorizable, and for the second one we generate overly
complicated code, as the pattern vec_cond is expanded to something quite
complicated.

It was too complicated for the intern we had at the time, but I think basically
we should still do the conclusion of this thread no?
https://www.mail-archive.com/gcc@gcc.gnu.org/msg95398.html

i.e. we should just make proper saturating IFN.

The only remaining question is, should we make them optab backed or can we do
something reasonably better for most targets with better fallback code.

This seems to indicate yes since the REALPART_EXPR seems to screw things up a
bit.
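
For context, the kind of source the bug title refers to looks roughly like the
sketch below: an unsigned saturating add written with __builtin_add_overflow,
plus a loop over arrays that one would like to see vectorized.  This is not
the code behind the godbolt link above (that testcase is not reproduced here);
the function names are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating addition: clamp to UINT32_MAX on overflow.  */
uint32_t sat_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum;
  if (__builtin_add_overflow (a, b, &sum))
    return UINT32_MAX;
  return sum;
}

/* Element-wise saturating add, the loop shape a vectorizer would see.  */
void sat_add_array (uint32_t *restrict dst, const uint32_t *restrict x,
                    const uint32_t *restrict y, size_t n)
{
  assert (n == 0 || (dst != NULL && x != NULL && y != NULL));
  for (size_t i = 0; i < n; i++)
    dst[i] = sat_add_u32 (x[i], y[i]);
}
```

A dedicated saturating internal function, as suggested above, would let the
compiler recognise this idiom directly instead of reasoning about the
value/overflow pair that the builtin produces.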

[Bug tree-optimization/113441] [13/14 Regression] Fail to fold the last element with multiple loop

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113441

--- Comment #6 from Tamar Christina  ---
Hello,

I can bisect it if you want. It should only take a few seconds.

[Bug tree-optimization/113539] [14 Regression] perlbench miscompiled on aarch64 since r14-8223-g1c1853a70f

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113539

Tamar Christina  changed:

   What|Removed |Added

   Priority|P3  |P1
 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-01-22
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #3 from Tamar Christina  ---
(In reply to Richard Biener from comment #2)
> It accepts all constant known may_be_zero - we can handle all of those later.
> 
> I suspect this just triggers a latent issue (vectorizing now simply using
> a different exit as canonical in one case).

Indeed, I'll take a look.

[Bug tree-optimization/113539] [14 Regression] perlbench miscompiled on aarch64 since r14-8223-g1c1853a70f

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113539

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #1 from Tamar Christina  ---
If that's the commit that's miscomparing then it's probably a bug in
early-break vect.

So I'll take a look.

+ if ((integer_zerop (may_be_zero)
+  || integer_nonzerop (may_be_zero)

is odd though, isn't that basically accepting all values of may_be_zero?

[Bug testsuite/113425] gcc.dg/fold-copysign-1.c fails on arm since g:7cbe41d35e6

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113425

--- Comment #4 from Tamar Christina  ---
(In reply to Christophe Lyon from comment #3)
> What I meant by arm-* is that we see the same issue on several of the
> configurations we test, as can be seen on
> https://linaro.atlassian.net/browse/GNU-1100
> 
> We have recently improved the extraction of the configure line, so now some
> of the xxx/details.txt on that page include it.
> 
> The two "simplest" configurations we test are tcwg_gcc_check/master-arm and
> tcwg_gnu_native_check_gcc, but both of them ran before the improvement
> mentioned above; in these cases, the information is present inside
> console.log.xz in the relevant CI step directory (03-build_abe-gcc for
> tcwg_gcc_check/master-arm and 
> 04-build_abe-gcc for tcwg_gnu_native_check_gcc/master-arm, the "-gcc" suffix
> meaning it's the step is which we build gcc)
> 
> Anyway, here is the GCC configure line for tcwg_gcc_check/master-arm:
> /configure SHELL=/bin/bash 
> --with-mpc=/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/
> armv8l-unknown-linux-gnueabihf
> --with-mpfr=/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/
> armv8l-unknown-linux-gnueabihf
> --with-gmp=/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/
> armv8l-unknown-linux-gnueabihf --with-gnu-as --with-gnu-ld
> --disable-libmudflap --enable-lto --enable-shared --without-included-gettext
> --enable-nls --with-system-zlib --disable-sjlj-exceptions
> --enable-gnu-unique-object --enable-linker-build-id --disable-libstdcxx-pch
> --enable-c99 --enable-clocale=gnu --enable-libstdcxx-debug
> --enable-long-long --with-cloog=no --with-ppl=no --with-isl=no
> --disable-multilib --with-float=hard --with-fpu=neon-fp-armv8
> --with-mode=thumb --with-arch=armv8-a --enable-threads=posix
> --enable-multiarch --enable-libstdcxx-time=yes
> --enable-gnu-indirect-function --enable-checking=yes --disable-bootstrap
> --enable-languages=default
> --prefix=/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/
> armv8l-unknown-linux-gnueabih

Yes, but the reason I need the configure flags is because it doesn't fail with
the arm-none-linux-gnueabihf target our build scripts make.

I'll check with those options.  Immediately one big difference is the forcing
of armv8 and thumb which is likely causing the difference.

[Bug testsuite/113425] gcc.dg/fold-copysign-1.c fails on arm since g:7cbe41d35e6

2024-01-22 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113425

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #2 from Tamar Christina  ---
It is updated for arm, but I need to know how the toolchain was configured. 
This is just a difference in default options.

So I need the configure flags to be able to do anything here.

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

--- Comment #9 from Tamar Christina  ---
vect_create_epilog_for_reduction needs to handle the case where the vectorizer
has picked a different exit than the main one.

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index eccf0953bba..6f761a4a78f 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -5965,7 +5965,8 @@ vect_create_epilog_for_reduction (loop_vec_info loop_vinfo,
      loop-closed PHI of the inner loop which we remember as
      def for the reduction PHI generation.  */
   bool double_reduc = false;
-  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit;
+  bool main_exit_p = LOOP_VINFO_IV_EXIT (loop_vinfo) == loop_exit
+                     && !LOOP_VINFO_EARLY_BREAKS_VECT_PEELED (loop_vinfo);
   stmt_vec_info rdef_info = stmt_info;
   if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
     {

fixes it. But would be good if I can reproduce the bootstrap issue. Will try
with provided options.

[Bug middle-end/113364] [14 regression] ICE verify_ssa: `definition in block N does not dominate use in block` with `-O3 -march=znver2`

2024-01-13 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113364

Tamar Christina  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #8 from Tamar Christina  ---
Hmm, curious. Does it work for you with --with-build-config='bootstrap-O3'?
That's how I tested it before.

[Bug target/109636] [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -march=armv8.4-a+sve

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109636

Tamar Christina  changed:

   What|Removed |Added

   Assignee|ktkachov at gcc dot gnu.org|tnfchris at gcc dot gnu.org

--- Comment #11 from Tamar Christina  ---
Have a patch for the division case and will finish the multiplication and
submit when I'm back. Sorry for the delay.

[Bug tree-optimization/113178] [14 Regression] ice in find_uses_to_rename_use

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113178

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #9 from Tamar Christina  ---
Fixed. Thanks for the report and let me know if there's something still broken.

[Bug tree-optimization/113237] [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #6 from Tamar Christina  ---
Fixed. Thanks for the report and let me know if there's something still broken.

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113237, which changed state.

Bug 113237 Summary: [14 Regression] ICE verify_ssa failed when building 
500.perlbench_r since r14-6822-g01f4251b8775c8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/113172] [14 Regression] ice in move_early_exit_stmts

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113172

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from Tamar Christina  ---
Fixed. Thanks for the report and let me know if there's something still broken.

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #17 from Tamar Christina  ---
Fixed. Thanks for the report and let me know if there's something still broken.

[Bug tree-optimization/113136] [14 regression] ICE when building Perl

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113136

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #11 from Tamar Christina  ---
Fixed. Thanks for the report and let me know if there's something still broken.

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2024-01-12 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #15 from Tamar Christina  ---
(In reply to David Binderman from comment #14)
> (In reply to Tamar Christina from comment #13)
> > Patch submitted
> 
> Two weeks have elapsed and the patch doesn't seem to appear in git.
> 
> Is it perhaps stuck somewhere ?

The maintainers were on holiday until this week.  Everything has been approved
now; I'm making the final changes the maintainers wanted and will regtest on
various architectures.

I expect to commit the patches sometime today.  Sorry for the delay.

[Bug testsuite/113319] Random LTO test failures

2024-01-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113319

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #5 from Tamar Christina  ---
Fixed, sorry for the breakage.

[Bug testsuite/113319] Random LTO test failures

2024-01-11 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113319

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
   Target Milestone|--- |14.0

[Bug fortran/90608] Inline non-scalar minloc/maxloc calls

2024-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90608

--- Comment #18 from Tamar Christina  ---
(In reply to Mikael Morin from comment #16)
> This missed the gcc stage 1 deadline, but I'm still working on it.

Thanks Mikael!  If I can help with anything do let me know :)

[Bug tree-optimization/112468] [14 Regression] Missed phi-opt after recent change (phi-opt-24.c)

2024-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112468

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #14 from Tamar Christina  ---
Fixed, thanks for the report.

[Bug tree-optimization/113287] wrong code with __builtin_mul_overflow_p() and _BitInt() with -O3 -msse4

2024-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113287

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from Tamar Christina  ---
Fixed, thanks for the report!

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #17 from Tamar Christina  ---
Fixed, thanks for the reports!

[Bug tree-optimization/113287] wrong code with __builtin_mul_overflow_p() and _BitInt() with -O3 -msse4

2024-01-10 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113287

--- Comment #5 from Tamar Christina  ---
Created attachment 57023
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57023&action=edit
branch_check.patch

Patch undergoing testing

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #15 from Tamar Christina  ---
(In reply to Martin Jambor from comment #13)
> The testcase below segfaults when compiled with master configured with
> release checking.  However, it is very likely affected by this bug (it
> fails with checking compiler like testcases for this issue do) and so
> I did not want to file a new bug for a testcase where we know we're
> currently having problems keeping dominance information.
> 
> Tamar, after you fix this issue, can you please check if the following
> segfaults when compiled with -std=gnu99 -fpermissive -fgnu89-inline
> -Ofast -march=znver2 -fprofile-generate -S ?
> 

yeah, with my current patches it works fine:

> ./install/bin/gcc segf.c -std=gnu99 -fpermissive -fgnu89-inline -Ofast 
> -march=znver2 -fprofile-generate -S -w; and echo $status
0

I'll commit the bulk tomorrow. Thanks for the testcase!

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #14 from Tamar Christina  ---
Yeah, I'll test. Richi approved the fix today and I'll commit after a final
regtest.

[Bug rtl-optimization/113287] wrong code with __builtin_mul_overflow_p() and _BitInt() with -O3 -msse4

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113287

--- Comment #4 from Tamar Christina  ---
Ok, definitely mine :)

I've misidentified the exit as one that doesn't leave the loop.
A quick hack fixes the issue. I'll work on a proper one tomorrow morning.

[Bug rtl-optimization/113287] wrong code with __builtin_mul_overflow_p() and _BitInt() with -O3 -msse4

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113287

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
   Last reconfirmed||2024-01-09
 Ever confirmed|0   |1

--- Comment #3 from Tamar Christina  ---
I'll take it. Thanks :)

The key difference between the loop that aborts and the one that doesn't is
this:

The one that doesn't abort generates:

  mask_patt_21.17_45 = vect__5.16_43 != vect_cst__44;
  if (mask_patt_21.17_45 != { 0, 0, 0, 0 })
goto ; [5.50%]

and the one that does generates:

  mask_patt_72.30_100 = vect_cst__101 != vect__48.29_102;
  if (mask_patt_72.30_100 == { -1, -1, -1, -1 })
goto ; [20.00%]

This happens when the CFG in the loop is flipped, so the vectorizer asks the
target to check that all values are true instead of any.

Looking at ix86_expand_branch, only one part of the branch does an XOR to
account for the difference.

So I'll start by checking whether the generated assembly is as expected.
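
To make the "any" versus "all" distinction concrete, here is a minimal scalar
sketch in C. It is only an illustration under assumptions: the four-lane
layout mirrors the GIMPLE above, but the helper names (mask_any, mask_all,
mask_self_check) are hypothetical and are not part of GCC or of the x86
backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LANES 4

/* Models "mask != { 0, 0, 0, 0 }": true when at least one lane is set.  */
bool mask_any (const int32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (mask[i] != 0)
      return true;
  return false;
}

/* Models "mask == { -1, -1, -1, -1 }": true only when every lane is set.  */
bool mask_all (const int32_t mask[LANES])
{
  for (int i = 0; i < LANES; i++)
    if (mask[i] != -1)
      return false;
  return true;
}

/* The two tests agree on empty and full masks and disagree on partially
   set ones, which is where confusing them would give wrong code.  */
void mask_self_check (void)
{
  const int32_t none[LANES] = { 0, 0, 0, 0 };
  const int32_t some[LANES] = { 0, -1, 0, 0 };
  const int32_t full[LANES] = { -1, -1, -1, -1 };

  assert (!mask_any (none) && !mask_all (none));
  assert (mask_any (some) && !mask_all (some));
  assert (mask_any (full) && mask_all (full));
}
```

A branch expansion that treats the second form like the first would take the
exit on a partially set mask, which is the kind of difference to look for in
the generated assembly.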

[Bug tree-optimization/113290] Optimize dominator updated for peeling with multiple exits

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113290

Tamar Christina  changed:

   What|Removed |Added

   Target Milestone|--- |15.0
   Keywords||compile-time-hog

--- Comment #2 from Tamar Christina  ---
Marked as compile-time-hog, but it should be within reason in GCC 14 due to the
limits on which loops we support.  As we generalize it, though, it'll become an
issue.

[Bug middle-end/113199] [14 Regression][GCN] ICE (segfault) due to invalid 'loop_mask_46 = VEC_PERM_EXPR' when compiling Newlib's wcsftime.c

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113199

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Tamar Christina  ---
Fixed, thanks for the report!

[Bug tree-optimization/113290] Optimize dominator updated for peeling with multiple exits

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113290

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-09

--- Comment #1 from Tamar Christina  ---
mine for GCC 15

[Bug tree-optimization/113290] New: Optimize dominator updated for peeling with multiple exits

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113290

Bug ID: 113290
   Summary: Optimize dominator updated for peeling with multiple
exits
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: tnfchris at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

When peeling with multiple exits we currently have a quadratic dominators
update.

We should see if we can't optimize this by delaying dominators update till
after vectorization as we don't require dominators during vect.

[Bug c/113267] pragma novector ICEs when no loop condition

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113267

Tamar Christina  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Tamar Christina  ---
Fixed.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2024-01-09 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #14 from Tamar Christina  ---
Fixed, thanks for the report.

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #12 from Tamar Christina  ---
(In reply to Andrew Pinski from comment #11)
> (In reply to Zdenek Sojka from comment #10)
> > Created attachment 57009 [details]
> > simpler testcase using _BitInt()
> > 
> > $ x86_64-pc-linux-gnu-gcc -O3 -mavx2 testcase.c 
> > testcase.c: In function 'foo':
> > testcase.c:5:1: error: dominator of 24 should be 57, not 12
> > 5 | foo (void)
> >   | ^~~
> > testcase.c:5:1: error: dominator of 25 should be 57, not 12
> > during GIMPLE pass: vect
> > testcase.c:5:1: internal compiler error: in verify_dominators, at
> > dominance.cc:1194
> > 0x742528 verify_dominators(cdi_direction)
> > ...
> 
> that might be a different issue ...

It's the same issue, and it is also fixed by the patch.
It's not simpler, however :) This expands into a series of loops, and one of the
loops has 3 exits, which is why it triggers.

The fixes that I am re-spinning fix this and correctly vectorize the loops.

[Bug c/113267] pragma novector ICEs when no loop condition

2024-01-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113267

Tamar Christina  changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug c/113267] pragma novector ICEs when no loop condition

2024-01-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113267

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2024-01-08
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Tamar Christina  ---
mine.

[Bug c/113267] New: pragma novector ICEs when no loop condition

2024-01-08 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113267

Bug ID: 113267
   Summary: pragma novector ICEs when no loop condition
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---

The following:

void f (char *a, int i)
{
#pragma GCC novector
  for (;;i++)
    a[i] *= 2;
}

segfaults:

pragma.c: In function 'f':
pragma.c:4:3: internal compiler error: Segmentation fault
4 |   for (;;i++)
  |   ^~~
0x17d238f crash_signal
/data/tamchr01/gnu-work-b1/src/gcc/gcc/toplev.cc:316
0x7fb59e26451f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0xd9b626 contains_struct_check(tree_node*, tree_node_structure_enum, char
const*, int, char const*)
/data/tamchr01/gnu-work-b1/src/gcc/gcc/tree.h:3757
0xe4600a c_parser_for_statement
/data/tamchr01/gnu-work-b1/src/gcc/gcc/c/c-parser.cc:8446
0xe59e57 c_parser_pragma
/data/tamchr01/gnu-work-b1/src/gcc/gcc/c/c-parser.cc:14676
0xe42daa c_parser_compound_statement_nostart
/data/tamchr01/gnu-work-b1/src/gcc/gcc/c/c-parser.cc:7201
0xe40b00 c_parser_compound_statement
/data/tamchr01/gnu-work-b1/src/gcc/gcc/c/c-parser.cc:6527
0xe37915 c_parser_declaration_or_fndef

because there's no loop condition to attach the pragma to.

[Bug tree-optimization/113237] [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-07 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

--- Comment #4 from Tamar Christina  ---
Created attachment 57003
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57003&action=edit
perlbench.patch

submitted patch

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #9 from Tamar Christina  ---
*** Bug 113145 has been marked as a duplicate of this bug. ***

[Bug middle-end/26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
Bug 26163 depends on bug 113145, which changed state.

Bug 113145 Summary: [14 regression] ICE in verify_dominators when building 
mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

[Bug tree-optimization/113145] [14 regression] ICE in verify_dominators when building mit-krb5-1.21.2 since r14-6822-g01f4251b8775c8

2024-01-05 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113145

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #3 from Tamar Christina  ---
Oh, I had missed this one. Both of those cases are already fixed by one of the
submitted patches, I think the one for 113144, so I'll mark it as a dup.

With the submitted patches both vectorize correctly.

*** This bug has been marked as a duplicate of bug 113144 ***

[Bug tree-optimization/113237] [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

Tamar Christina  changed:

   What|Removed |Added

   Priority|P3  |P1
   Last reconfirmed||2024-01-04
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED

--- Comment #3 from Tamar Christina  ---
Thanks,

Indeed the patch for PR 113137 won't fix this one as it looks like the peeling
code has gotten confused about which exit is which when adjusting
virtual_operands.

It looks like it's swapped them, and this happens because none of the loop exits
are counting ones, so it just picks a random one.

It looks like the one it picks is not the latch-connected one:

perl.c:10:8: note:   using as main loop exit: 11 -> 7 [AUX: (nil)]
perl.c:10:8: note:=== get_loop_niters ===
perl.c:10:8: note:Loop has 2 exits.
perl.c:10:8: note:Analyzing exit 0...
perl.c:10:8: note:Analyzing exit 1...

which then incorrectly peels:

 # iters_46 = PHI 

which should be:

 # iters_46 = PHI 

I started implementing a fix for this same situation earlier for PR 113178 but
didn't finish it because I didn't think we'd get this far with a legit loop.

I'll finish that part.  Thanks for the testcase!

[Bug tree-optimization/113237] [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

--- Comment #2 from Tamar Christina  ---
Ah wait, I see. Ok, taking a look.

[Bug tree-optimization/113237] [14 Regression] ICE verify_ssa failed when building 500.perlbench_r since r14-6822-g01f4251b8775c8

2024-01-04 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113237

--- Comment #1 from Tamar Christina  ---
> I have bisected the failure to r14-6822-g01f4251b8775c8 (middle-end: Support
> vectorization of loops with multiple exits).  I have tried if the patch
> attached to PR 113137 helps but unfortunately it does not.

Indeed this should be fixed by the patch in PR 113136 not 113137 :)

[Bug target/113116] [14 Regression] ~11-17% exec time regression of 436.cactusADM on aarch64

2024-01-03 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113116

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org
   Last reconfirmed||2024-01-03
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Tamar Christina  ---
(In reply to Filip Kastl from comment #0)
> As seen on the graphs here
> 
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=578.100.0
> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=586.100.0
> 
> between commits
> g:8e0568d8ac9dbfc8
> g:5641787abeea0fdc
> 
> there is a slowdown of 436.cactusADM SPEC2006 benchmark, 11% for Ofast
> native LTO PGO and 17% for Ofast native LTO.


Did you mean -Ofast native PGO? Both linked runs are PGO.

In any case, confirmed. Running bisect.

[Bug middle-end/113199] [14 Regression][GCN] ICE (segfault) due to invalid 'loop_mask_46 = VEC_PERM_EXPR' when compiling Newlib's wcsftime.c

2024-01-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113199

--- Comment #2 from Tamar Christina  ---
Created attachment 56980
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56980&action=edit
submitted-patch.patch

I have submitted this to the list. Thanks for the report!

[Bug middle-end/113199] [14 Regression][GCN] ICE (segfault) due to invalid 'loop_mask_46 = VEC_PERM_EXPR' when compiling Newlib's wcsftime.c

2024-01-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113199

Tamar Christina  changed:

   What|Removed |Added

   Last reconfirmed||2024-01-02
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED

--- Comment #1 from Tamar Christina  ---
Thanks, patch for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137 will
reject this function since it's not safe to vectorize.

For this one it looks like I had assumed the target could reverse a mask and
didn't check. I'll add a check.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2024-01-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

--- Comment #12 from Tamar Christina  ---
(In reply to Andrew Stubbs from comment #11)
> (In reply to Tamar Christina from comment #7)
> > This seems to happen because the vectorizer decides to use partial vectors
> > to vectorize the loop and the target picks a nonlinear induction step which
> > we can't support for early breaks.
> 
> In which hook is this selected?
> 
> I'm not aware of this being a deliberate choice we made...

I haven't looked into why on the target we get a nonlinear induction (mostly
because I don't know much about the target). I however wasn't able to reproduce
it on SVE and x86_64 even when forcing partial vectors.

So I guess it's more accurate to say "something about the target" is making the
vectorizer pick a nonlinear induction, most likely missing support for
something.

Note that the testcase does work if partial vectors are turned off, so it should
work fine if the vectorizer retries without partial vectors.

I'm waiting for Richi to come back from holiday to review the patch, after which
this should be fixed.

[Bug tree-optimization/113178] [14 Regression] ice in find_uses_to_rename_use

2024-01-02 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113178

Tamar Christina  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
 CC||tnfchris at gcc dot gnu.org

--- Comment #5 from Tamar Christina  ---
Thanks, fixed by submitted patch for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

[Bug middle-end/113172] [14 Regression] ice in move_early_exit_stmts

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113172

--- Comment #2 from Tamar Christina  ---
Patch submitted

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

--- Comment #10 from Tamar Christina  ---
Patch submitted

[Bug tree-optimization/113144] [14 regression] ICE when building dpkg-1.21.15 in verify_dominators (error: dominator of 9 should be 48, not 12)

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113144

--- Comment #8 from Tamar Christina  ---
Patch submitted

[Bug tree-optimization/113136] [14 regression] ICE when building Perl

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113136

--- Comment #9 from Tamar Christina  ---
Patch submitted

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #13 from Tamar Christina  ---
Patch submitted

[Bug middle-end/113172] [14 Regression] ice in move_early_exit_stmts

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113172

Tamar Christina  changed:

   What|Removed |Added

   Last reconfirmed||2023-12-29
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org
   See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #1 from Tamar Christina  ---
Thanks for the report. This is fixed by the patch for #113137 and the loop now
vectorizes correctly.

[Bug target/110625] [14 Regression][AArch64] Vect: SLP fails to vectorize a loop as the reduction_latency calculated by new costs is too large

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110625

Tamar Christina  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #25 from Tamar Christina  ---
(In reply to Hao Liu from comment #0)
> This problem causes a performance regression in SPEC2017 538.imagick.  For
> the following simple case (modified from pr96208):
> 
> typedef struct {
> unsigned short m1, m2, m3, m4;
> } the_struct_t;
> typedef struct {
> double m1, m2, m3, m4, m5;
> } the_struct2_t;
> 
> double bar1 (the_struct2_t*);
> 
> double foo (double* k, unsigned int n, the_struct_t* the_struct) {
> unsigned int u;
> the_struct2_t result;
> for (u=0; u < n; u++, k--) {
> result.m1 += (*k)*the_struct[u].m1;
> result.m2 += (*k)*the_struct[u].m2;
> result.m3 += (*k)*the_struct[u].m3;
> result.m4 += (*k)*the_struct[u].m4;
> }
> return bar1 ();
> }
> 

In the context of this report the regression should be fixed, however we still
don't vectorize this loop.  We ran this and other cases comparing scalar and
vector versions of this loop and it looks like specifically Neoverse N2 does
much better using the scalar version here.  So it looks like the cost model is
doing the right thing here for the current codegen of the function.

Note that the vector version:

ldr     q31, [x3], 16
ldr     q29, [x4], -16
rev64   v31.8h, v31.8h
uxtl    v30.4s, v31.4h
uxtl2   v31.4s, v31.8h
sxtl    v27.2d, v30.2s
sxtl    v28.2d, v31.2s
sxtl2   v30.2d, v30.4s
sxtl2   v31.2d, v31.4s
scvtf   v27.2d, v27.2d
scvtf   v28.2d, v28.2d
scvtf   v30.2d, v30.2d
scvtf   v31.2d, v31.2d
fmla    v26.2d, v27.2d, v29.d[1]
fmla    v24.2d, v30.2d, v29.d[1]
fmla    v23.2d, v28.2d, v29.d[0]
fmla    v25.2d, v31.2d, v29.d[0]

Is still pretty inefficient due to all the extends.  If we generate better code
here this may tip the scale back to vector.  But for now, the patch should fix
the regression.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

Tamar Christina  changed:

   What|Removed |Added

 CC||doko at gcc dot gnu.org

--- Comment #9 from Tamar Christina  ---
*** Bug 113169 has been marked as a duplicate of this bug. ***

[Bug target/113169] [14 Regression] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420 on amdgcn-amdhsa

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113169

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Tamar Christina  ---
Thanks, dup of #113163; the patch is already submitted and awaiting review.

*** This bug has been marked as a duplicate of bug 113163 ***

[Bug target/113171] Unneeded zero extend after widening load with SVE

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113171

Tamar Christina  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from Tamar Christina  ---
Whoops, missed the add in between
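
A source-level view of why the add matters, as a small C sketch. This is only
one reading of the comment above, under assumptions: the add is done in 32
bits and may wrap, so the value has to be zero-extended after the add and not
only at the load. The function names are hypothetical, and the sketch assumes
a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* What the testcase computes: the add happens in 32 bits and may wrap,
   and only then is the result widened for the conversion to double.  */
double add_then_extend (uint32_t s)
{
  return (double) (s + 9);
}

/* What skipping the extend after the add would amount to: widening
   first and adding in 64 bits, so no wrap-around occurs.  */
double extend_then_add (uint32_t s)
{
  return (double) ((uint64_t) s + 9);
}

/* The two agree for small inputs and differ once the 32-bit add wraps.  */
void extend_self_check (void)
{
  assert (add_then_extend (1) == extend_then_add (1));
  assert (add_then_extend (UINT32_MAX) == 8.0);
  assert (extend_then_add (UINT32_MAX) == 4294967304.0);
}
```

Since the two functions differ for inputs near UINT32_MAX, the extend after
the add cannot simply be folded into the widening load.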

[Bug target/113171] New: Unneeded zero extend after widening load with SVE

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113171

Bug ID: 113171
   Summary: Unneeded zero extend after widening load with SVE
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64*

The following testcase

#include <stdint.h>

void __attribute__ ((noinline, noclone))
unpack_double_int_plus9 (double *d, uint32_t *s, int size)
{
  for (int i = 0; i < size; i++)
    d[i] = (double) (s[i] + 9);
}

compiled with

-march=armv8-a+sve -O2 -ftree-vectorize

generates:

.L3:
ld1w    z31.d, p7/z, [x1, x3, lsl 2]
add     z31.s, z31.s, #9
uxtw    z31.d, p6/m, z31.d
scvtf   z31.d, p6/m, z31.d
st1d    z31.d, p7, [x0, x3, lsl 3]
incd    x3
whilelo p7.d, w3, w2
b.any   .L3

which looks like the zero extend is unneeded.

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #12 from Tamar Christina  ---
ok, x86_64 bootstrap and regtest with -O3 and --enable-checking=yes,rtl,extra
now passes.

aarch64 hit a small issue in libgcc that I'm not sure I should be allowing.
I'll investigate, then either fix or disable it and post patches.

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #11 from Tamar Christina  ---
Created attachment 56963
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56963&action=edit
maintain-lcssa-peeled.patch

patch undergoing testing for both this and PR113136

[Bug testsuite/113167] [14 Regression] gcc.dg/tree-ssa/gen-vect-26.c started failing many targets after recent change

2023-12-29 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113167

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #6 from Tamar Christina  ---
Thanks for the fix.

I suspect there'll be other target tests that fail in similar ways as well, as a
side effect of abort loops being vectorizable now.

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #10 from Tamar Christina  ---
Ok, so this bug is simply fixed by:

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index f51ae3e719e..e7a5917bc4c 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -976,7 +976,8 @@ vec_init_loop_exit_info (class loop *loop)
   if (number_of_iterations_exit_assumptions (loop, exit, &niter_desc, NULL)
  && !chrec_contains_undetermined (niter_desc.niter))
{
- if (!niter_desc.may_be_zero || !candidate)
+ tree may_be_zero = niter_desc.may_be_zero;
+ if ((may_be_zero && integer_zerop (may_be_zero)) || !candidate)
candidate = exit;
}
 }

because niter_desc.may_be_zero is not a boolean but instead a tree that encodes
a boolean.

Due to this we were forcing much more complicated loops than required.  However,
we *should* be able to handle these complicated loops since we don't know when
they'll occur, so I'll post a companion patch to fix those too.
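
The pointer-versus-value confusion described above can be sketched outside of
GCC as follows. This is a hedged illustration only: struct node and both
predicates are hypothetical stand-ins, not GCC's tree type or its
integer_zerop, and they model nothing beyond the distinction between "no node
at all" and "a node whose value is the constant zero".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an expression node; not GCC's tree type.  */
struct node
{
  bool is_constant;   /* true if the node is an integer constant */
  long value;         /* the constant's value when is_constant is set */
};

/* Pointer test: only asks whether a node exists at all.  */
bool node_absent_p (const struct node *n)
{
  return n == NULL;
}

/* Value test: the node exists and is the constant zero, i.e. the
   boolean it encodes is known to be false.  */
bool node_known_false_p (const struct node *n)
{
  return n != NULL && n->is_constant && n->value == 0;
}

/* A present constant-zero node fails the pointer test but passes the
   value test; non-constant or non-zero nodes pass neither.  */
void node_self_check (void)
{
  const struct node zero = { true, 0 };
  const struct node one = { true, 1 };
  const struct node unknown = { false, 0 };

  assert (!node_absent_p (&zero) && node_known_false_p (&zero));
  assert (!node_absent_p (&one) && !node_known_false_p (&one));
  assert (!node_absent_p (&unknown) && !node_known_false_p (&unknown));
  assert (node_absent_p (NULL) && !node_known_false_p (NULL));
}
```

Testing the pointer alone therefore misses an expression that is present but
evaluates to false, which is the case the value test is meant to catch.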

[Bug tree-optimization/113137] [14 regression] Failed bootstrap with -O3 -march=znver2

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137

--- Comment #9 from Tamar Christina  ---
Ok, have a working patch but it's a bit ugly, working on cleaning it up.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

--- Comment #8 from Tamar Christina  ---
Created attachment 56959
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56959&action=edit
nonlinear IV

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

Tamar Christina  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |ASSIGNED
   Priority|P3  |P1
   Last reconfirmed||2023-12-28
   Assignee|unassigned at gcc dot gnu.org  |tnfchris at gcc dot gnu.org

--- Comment #7 from Tamar Christina  ---
ok, managed to reproduce using a cross cc1.

This seems to happen because the vectorizer decides to use partial vectors to
vectorize the loop and the target picks a nonlinear induction step which we
can't support for early breaks.

The attached patch fixes it by rejecting these kinds of inductions when partial
vectors are forced.

I'll submit the patch for when the maintainers are back from holidays.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

--- Comment #4 from Tamar Christina  ---
Hmm, so I can't seem to reproduce it with x86_64 or aarch64.

Let me build a --target=amdgcn-amdhsa compiler.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

--- Comment #3 from Tamar Christina  ---
Thanks, taking a look.

[Bug middle-end/113163] [14 Regression][GCN] ICE in vect_peel_nonlinear_iv_init, at tree-vect-loop.cc:9420

2023-12-28 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113163

Tamar Christina  changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #1 from Tamar Christina  ---
Hi,

This does not contain enough information for me to try to reproduce the issue.

Can you give me the full target triple, the compile flags and how the compiler
was configured?

A reproducer would be helpful too.

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #16 from Tamar Christina  ---
> 
> I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index =
> 0.

Perhaps, I'll look into it, thanks. Though this is of course only applicable
when the mask comes from whilelo.

In the future when we get to loops such as:

for (int i = ..;;)
  {
    if (a)
      {

        if (b)
          return i;
      }
  }

the reduction would come from the first active element of the mask created by
the condition a and not the whilelo.
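
The difference between the two masks can be modelled in scalar C. This is only
a sketch under assumptions: 8 lanes, masks held as int arrays, and a
hypothetical condition `vals[l] > 0` standing in for `a`. None of these names
come from GCC.

```c
#include <assert.h>

#define LANES 8

/* Index of the first set lane, or -1 if no lane is set.  */
int
first_active (const int mask[LANES])
{
  for (int l = 0; l < LANES; l++)
    if (mask[l])
      return l;
  return -1;
}

/* Mask for the hypothetical condition `a`: a lane is active when the loop
   mask covers it and vals[l] > 0.  */
void
cond_mask (int out[LANES], const int loop_mask[LANES], const int vals[LANES])
{
  for (int l = 0; l < LANES; l++)
    out[l] = loop_mask[l] && vals[l] > 0;
}

/* Element of DATA in the first active lane of MASK.  */
int
extract_first_active (const int mask[LANES], const int data[LANES])
{
  int l = first_active (mask);
  assert (l >= 0);      /* caller must guarantee an active lane */
  return data[l];
}
```

With a whilelo-style loop mask the first active lane is lane 0, so a fixed
index works. With the condition mask the first active lane can be anywhere,
so the index has to be derived from the mask itself.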

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #15 from Tamar Christina  ---
(In reply to JuzheZhong from comment #14)
> > > > sure, but you can't use BIT_FIELD_REF on VLA vectors.
> > > 
> > > So, for length partial vector. We can use VEC_EXTRACT with index = 0 since
> > > VEC_EXTRACT optab allows VLA vectors now for length target.
> > 
> > Sounds good :)
> 
> I wonder whether ARM SVE can also use this approach VEC_EXTRACT with index =
> 0.
> 
> I guess the only issue is that when mask = all zero. That is, there is no
> active elements, What behavior should be here for early break ?

That shouldn't happen; in that case you wouldn't have entered the loop. To
prevent this there's always a compare of n > 0 at the start of the loop to
skip the vector body entirely.
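
A scalar model of that guard, assuming an 8-lane vector and a whilelo-style
mask in which lane l is active when base + l < n. The function names are
hypothetical and the model ignores everything except the mask.

```c
#include <assert.h>

#define LANES 8

/* Model of a whilelo-style loop mask: lane l is active iff base + l < n.  */
static void
whilelo_mask (int mask[LANES], unsigned base, unsigned n)
{
  for (unsigned l = 0; l < LANES; l++)
    mask[l] = (base + l < n);
}

/* Count the vector iterations executed for N elements.  The loop is written
   as a do-while, so without the guard it would run once with an all-zero
   mask when n == 0.  */
unsigned
vector_iterations (unsigned n)
{
  int mask[LANES];
  unsigned iters = 0;
  unsigned base = 0;

  if (n == 0)           /* the "n > 0" compare: skip the vector body */
    return 0;

  do
    {
      whilelo_mask (mask, base, n);
      assert (mask[0]); /* lane 0 is active in every iteration */
      iters++;
      base += LANES;
    }
  while (base < n);

  return iters;
}
```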

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #13 from Tamar Christina  ---
(In reply to JuzheZhong from comment #12)
> (In reply to Tamar Christina from comment #11)
> > (In reply to JuzheZhong from comment #10)
> > > (In reply to Tamar Christina from comment #9)
> > > > (In reply to JuzheZhong from comment #8)
> > > > > Suppose the loop mask is generated by whilelo instruction of ARM SVE.
> > > > > 
> > > > > Suppose we have 8 elements in a single whole vector.
> > > > > 
> > > > > mask = whilo (0, res) if res = 6, then mask = 1000.
> > > > > data = 12345678
> > > > > 
> > > > > Then if it is early break. You are reversing both data and mask as 
> > > > > follows:
> > > > > 
> > > > > new_mask = 0001
> > > > > new_data = 87654321
> > > > > 
> > > > > Then use the EXTRACT_LAST, we will get value = 1 for early break.
> > > > > 
> > > > > Am I right ?
> > > > 
> > > > Yeah, the idea being the scalar loop will then run from 1 to 6 to do any
> > > > side effects that we couldn't apply.
> > > > 
> > > > We went with this approach first because it works for non-masked
> > > > architectures too. In GCC-15 we'll try to implement staying entirely 
> > > > inside
> > > > a vector loop by splitting the mask in elements until first active and
> > > > element from first active so we can correctly mask the operations.
> > > 
> > > Ok. For the current approach. Isn't it the first element is always 
> > > element 0
> > > ?
> > > 
> > > Since for ARM SVE loop mask is generated by whilelo instructions, it 
> > > always
> > > set
> > > mask bit from 0 to the last active element - 1.
> > 
> > sure, but you can't use BIT_FIELD_REF on VLA vectors.
> 
> So, for length partial vector. We can use VEC_EXTRACT with index = 0 since
> VEC_EXTRACT optab allows VLA vectors now for length target.

Sounds good :)

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #11 from Tamar Christina  ---
(In reply to JuzheZhong from comment #10)
> (In reply to Tamar Christina from comment #9)
> > (In reply to JuzheZhong from comment #8)
> > > Suppose the loop mask is generated by whilelo instruction of ARM SVE.
> > > 
> > > Suppose we have 8 elements in a single whole vector.
> > > 
> > > mask = whilo (0, res) if res = 6, then mask = 1000.
> > > data = 12345678
> > > 
> > > Then if it is early break. You are reversing both data and mask as 
> > > follows:
> > > 
> > > new_mask = 0001
> > > new_data = 87654321
> > > 
> > > Then use the EXTRACT_LAST, we will get value = 1 for early break.
> > > 
> > > Am I right ?
> > 
> > Yeah, the idea being the scalar loop will then run from 1 to 6 to do any
> > side effects that we couldn't apply.
> > 
> > We went with this approach first because it works for non-masked
> > architectures too. In GCC-15 we'll try to implement staying entirely inside
> > a vector loop by splitting the mask in elements until first active and
> > element from first active so we can correctly mask the operations.
> 
> Ok. For the current approach. Isn't it the first element is always element 0
> ?
> 
> Since for ARM SVE loop mask is generated by whilelo instructions, it always
> set
> mask bit from 0 to the last active element - 1.

sure, but you can't use BIT_FIELD_REF on VLA vectors.

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #9 from Tamar Christina  ---
(In reply to JuzheZhong from comment #8)
> Suppose the loop mask is generated by whilelo instruction of ARM SVE.
> 
> Suppose we have 8 elements in a single whole vector.
> 
> mask = whilo (0, res) if res = 6, then mask = 1000.
> data = 12345678
> 
> Then if it is early break. You are reversing both data and mask as follows:
> 
> new_mask = 0001
> new_data = 87654321
> 
> Then use the EXTRACT_LAST, we will get value = 1 for early break.
> 
> Am I right ?

Yeah, the idea being the scalar loop will then run from 1 to 6 to do any side
effects that we couldn't apply.

We went with this approach first because it works for non-masked architectures
too. In GCC-15 we'll try to stay entirely inside the vector loop by splitting
the mask into the elements before the first active one and the elements from
the first active one onwards, so we can correctly mask the operations.
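
The current approach (leave the vector loop and let the scalar loop redo the
block in which the break was detected) can be modelled in plain C. This is a
minimal sketch, assuming a vectorization factor of 4 and a hypothetical
"copy until key" loop. The inner lane loops stand in for vector operations,
and none of the names come from GCC.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4  /* assumed vectorization factor */

/* Scalar reference: copy src to dst until the first element equal to KEY;
   return its index, or n if there is none.  */
size_t
copy_until_scalar (int *dst, const int *src, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    {
      if (src[i] == key)
        return i;
      dst[i] = src[i];
    }
  return n;
}

/* Block-wise model of the vectorized loop.  */
size_t
copy_until_blocked (int *dst, const int *src, size_t n, int key)
{
  size_t i = 0;

  for (; i + VF <= n; i += VF)
    {
      int any_break = 0;
      for (size_t l = 0; l < VF; l++)
        any_break |= (src[i + l] == key);
      if (any_break)
        break;          /* no side effects applied for this block yet */
      for (size_t l = 0; l < VF; l++)
        dst[i + l] = src[i + l];
    }

  assert (i <= n);

  /* Scalar loop: restarts at the start of the block that contained the
     break (or handles the tail) and applies the remaining side effects.  */
  for (; i < n; i++)
    {
      if (src[i] == key)
        return i;
      dst[i] = src[i];
    }
  return n;
}
```

The block containing the break is processed twice as far as the compare is
concerned, but its stores are performed only once, by the scalar loop.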

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #7 from Tamar Christina  ---
You may be able to use the same approach as

  else if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))

that is, reverse both the mask and the vector and use extract last.
It's not going to be performance critical so it's more important to be correct
rather than fast.
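
As a scalar illustration of the reverse-then-extract-last idea, assuming 8
lanes and masks held as int arrays (the helper names are hypothetical and only
model the semantics, not the GCC internals):

```c
#include <assert.h>

#define LANES 8

/* Model of EXTRACT_LAST: value of the last lane whose mask bit is set.
   Requires at least one active lane.  */
static int
extract_last (const int mask[LANES], const int data[LANES])
{
  for (int i = LANES - 1; i >= 0; i--)
    if (mask[i])
      return data[i];
  assert (!"mask has no active lane");
  return 0;
}

/* First active element, obtained by reversing both the mask and the data
   and then taking the last active element of the reversed vectors.  */
int
extract_first_via_reverse (const int mask[LANES], const int data[LANES])
{
  int rmask[LANES], rdata[LANES];

  for (int i = 0; i < LANES; i++)
    {
      rmask[i] = mask[LANES - 1 - i];
      rdata[i] = data[LANES - 1 - i];
    }
  return extract_last (rmask, rdata);
}
```

This only needs an extract-last primitive plus a reversal, which is why it
trades speed for reuse of existing operations.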

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #6 from Tamar Christina  ---
(In reply to JuzheZhong from comment #5)
> (In reply to Tamar Christina from comment #4)
> > (In reply to JuzheZhong from comment #3)
> > > I guess this code is just disabling partial vector for length for now.
> > > 
> > > And need me to test and port this part for length in the followup patches.
> > > 
> > > Am I right ?
> > 
> > Yeah, it needed to safely not allow it through for now. Once implemented 
> > you'll hit an assert in vectorizable_live_operations where you need to
> > provide a way to also get the first active element from a vector.
> 
> So for a length target, I enable cbranch optab but no vcond_mask_len optab.
> Will it behavior wrong ?
> 

You need both; if the operation requires a mask it'll reject it without
vcond_mask_len support.  Because I didn't know how to extract the first element
using vcond_mask_len, I had to disable it.

> Another question is could you give me more hints about
> vectorizable_live_operation?
> 
> I thought vectorizable_live_operation is doing extract last active element,
> I didn't see extract first active element.

Normally yes, but I added extract first active element for this patch.  This is
because when you hit and take an early exit we restart the vector iteration,
since there may be partial effects to perform between where the loop started
and where the element is found.

Specifically, look at vectorizable_live_operation_1; there's an assert under
  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))

with a comment saying what's needed.

[Bug c/113134] gcc does not version loops with early break conditions that don't have side-effects

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

--- Comment #4 from Tamar Christina  ---
(In reply to JuzheZhong from comment #3)
> I guess this code is just disabling partial vector for length for now.
> 
> And need me to test and port this part for length in the followup patches.
> 
> Am I right ?

Yeah, it needed to be safely rejected for now. Once implemented you'll hit an
assert in vectorizable_live_operation where you need to provide a way to also
get the first active element from a vector.

[Bug c/113134] gcc does not version loops with side-effect early breaks

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113134

Tamar Christina  changed:

   What|Removed |Added

   Last reconfirmed|2023-12-25 00:00:00 |2023-12-27
Summary|Middle end early break  |gcc does not version loops
   |vectorization: Fail to  |with side-effect early
   |vectorize a simple early|breaks
   |break code  |
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from Tamar Christina  ---
So GCC's approach is quite different from clang's.

I think this should be handled by ivcanon, as that makes the vectorizer code
much easier.  At the moment the vectorizer assumes that any exits it sees are
actually needed.  So even if I relax my patch to allow this we still produce a
pointless compare.

Looking at ivcanon, for a constant-sized array it reports:

Loop 1 iterates 1001 times.
Loop 1 iterates at most 999 times.
Loop 1 likely iterates at most 999 times.
Analyzing # of iterations of loop 1
  exit condition [0, + , 1](no_overflow) <= 1000
  bounds on difference of bases: 1000 ... 1000
  result:
# of iterations 1001, bounded by 1001
Removed pointless exit: if (i_13 > 1000)

but for the example attached:

Loop 1 iterates 1001 times.
Loop 1 iterates at most 1001 times.
Loop 1 likely iterates at most 1001 times.
Analyzing # of iterations of loop 1
  exit condition [1, + , 1](no_overflow) < N_13(D)
  bounds on difference of bases: 0 ... 2147483646

It has correctly determined that the loop bound is at most 1001, but since N
can be < 1001 it doesn't think the additional exit is useless.

However like clang we can just version the loop. Unlike clang however we can
probably do better.

If N >= 1000 then we can enter the vector code without the additional exit, but
if N < 1000 we can use my new pass.
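
A hand-versioned loop in C might look like the following. This is a sketch on
a hypothetical loop (the test case attached to the bug is not reproduced
here), so the names, the LIMIT constant and the exact version condition are
assumptions. In particular, in this simplified loop each version ends up with
a single exit, whereas the comment above expects one version to still go
through the early-break path.

```c
#include <assert.h>

#define LIMIT 1000

int buf[LIMIT + 1];

/* Original form: two exits, the i < n bound and the i > LIMIT break.  */
int
fill_original (int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      if (i > LIMIT)
        break;
      buf[i] = i;
    }
  return i;
}

/* Versioned form: a run-time check on n selects a loop in which one of the
   two exits is known to be redundant.  */
int
fill_versioned (int n)
{
  int i;
  if (n > LIMIT)
    {
      /* The i < n exit can never fire first: constant trip count.  */
      for (i = 0; i <= LIMIT; i++)
        buf[i] = i;
    }
  else
    {
      /* The i > LIMIT break can never fire: single countable exit.  */
      for (i = 0; i < n; i++)
        buf[i] = i;
    }
  assert (i <= LIMIT + 1);  /* never stored past the end of buf */
  return i;
}
```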

It's not hard to allow this through the pass, but I doubt this will be accepted
in stage3..

For best result the loop should be versioned like clang does.

Richi?

[Bug tree-optimization/113136] [14 regression] ICE when building Perl

2023-12-27 Thread tnfchris at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113136

--- Comment #8 from Tamar Christina  ---
Thanks, was able to reproduce with `--enable-checking=yes,rtl,extra`.

The issue seems to be that the value is unused, and we were relying on DSE
removing such statements. But with --enable-checking=yes,rtl,extra the extra
verification is done before we can remove the dead statements on these inverted
loops.

That's why it doesn't fail without the extra checking and produces the right
code.

So it looks like this is the same bug as
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113137 and the fix for that should
fix this.  It looks like for these inverted loops, even though none of the
values are used, I have to maintain the virtual phis.

I'll keep the two separate for now..
