[Bug target/116145] SVE constant pool loads not hoisted outside loops

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

Summary|Suboptimal SVE immediate|SVE constant pool loads not
   |synthesis   |hoisted outside loops

--- Comment #9 from ktkachov at gcc dot gnu.org ---
Renaming summary to better reflect the core problem.

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145

--- Comment #8 from ktkachov at gcc dot gnu.org ---
(In reply to Richard Sandiford from comment #7)
> I'll test a cleaned-up version of the change in comment 6.  Kyrill, is it OK
> to use your godbolt testcase in the testsuite?

Yes, that is fine, thanks for looking at it.

[Bug target/116145] Suboptimal SVE immediate synthesis

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Interesting, thanks for the background. The bigger issue I was seeing was with a
string-matching loop like https://godbolt.org/z/E7b13915E where the constant-pool
load is a reasonable codegen decision, but unfortunately every iteration of the
loop reloads the constant, which would hurt in a tight inner loop.
So perhaps my problem is that the constant-pool loads are not being considered
loop-invariant, or something is sinking them into the loop erroneously.
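
Something like the following hypothetical sketch is the shape of loop I mean (the
godbolt link above is the real testcase; this sketch assumes SVE2 for svmatch):

#include <arm_sve.h>

/* Hypothetical sketch: 'chars' is loop-invariant and should be materialised
   once before the loop, not reloaded from the constant pool per iteration.  */
int
first_match (const char *s, int n)
{
  /* 0x0a0d5c3f is the match characters '\n', '\r', '\\', '?'.  */
  svuint8_t chars = svreinterpret_u8 (svdup_u32 (0x0a0d5c3f));
  for (int i = 0; i < n; i += svcntb ())
    {
      svbool_t pg = svwhilelt_b8 (i, n);
      svuint8_t data = svld1_u8 (pg, (const uint8_t *) s + i);
      if (svptest_any (pg, svmatch_u8 (pg, data, chars)))
        return i;
    }
  return -1;
}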

[Bug target/113813] SVE Reduction of xor/and/ior of 16 bytes can be improved

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113813

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2024-07-31
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.

[Bug target/114603] aarch64: Invalid SVE cnot optimisation

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114603

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
GCC 12 and 13 still generate the erroneous CNOT. Is the fix suitable for a
backport?

[Bug target/114607] aarch64: Incorrect expansion of svsudot

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114607

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Does this still need backporting?

[Bug target/109498] SVE support for ctz

2024-07-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109498

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work|14.0|
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2024-07-31

--- Comment #2 from ktkachov at gcc dot gnu.org ---
This does get vectorised with SVE in GCC 14 but not optimally. It doesn't use
the recommended RBIT + CLZ but instead gives:
ctz:
cmp w2, 0
ble .L1
mov x3, 0
whilelo p7.s, wzr, w2
ptrue   p6.b, all
.L3:
ld1w    z31.s, p7/z, [x1, x3, lsl 2]
movprfx z30, z31
neg z30.s, p6/m, z31.s
and z30.d, z30.d, z31.d
clz z30.s, p6/m, z30.s
subr    z30.s, z30.s, #31
st1w    z30.s, p7, [x0, x3, lsl 2]
incw    x3
whilelo p7.s, w3, w2
b.any   .L3
.L1:
ret

with -Ofast -march=armv9-a --param aarch64-autovec-preference=2.
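
For reference, the RBIT + CLZ form in ACLE intrinsics looks like this per element
(a sketch; intrinsic names assumed from the SVE ACLE):

#include <arm_sve.h>

/* ctz(x) == clz(rbit(x)): bit-reverse then count leading zeros.  */
svuint32_t
ctz_u32 (svbool_t pg, svuint32_t x)
{
  return svclz_u32_x (pg, svrbit_u32_x (pg, x));
}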

[Bug target/116145] New: Suboptimal SVE immediate synthesis

2024-07-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116145

Bug ID: 116145
   Summary: Suboptimal SVE immediate synthesis
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: aarch64-sve, missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

While optimising some string-matching code I wanted to create a vector of
characters to match through an svdup and an svreinterpret, but am getting
suboptimal codegen through the constant pool. A minimised testcase:
#include <arm_sve.h>

svuint8_t
foo (void)
{
return svreinterpret_u8(svdup_u32(0x0a0d5c3f));
}

generates for -O2 -march=armv9-a:
foo:
ptrue   p3.b, all
adrp    x0, .LC0
add x0, x0, :lo12:.LC0
ld1rw   z0.s, p3/z, [x0]
ret
.LC0:
.word   168647743

but LLVM can do it with:
foo:
mov w8, #23615
movk    w8, #2573, lsl #16
mov z0.s, w8
ret

[Bug tree-optimization/116139] ICE with --param fully-pipelined-fma=1 since it was added in r14-6559-g8afdbcdd7abe1e

2024-07-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116139

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Confirmed. This is NOT a regression since it ICEs with GCC 14 when
> configured with --enable-checking=yes as this is a gcc_checking_assert
> assert (which is disabled for --enable-checking=release :) ).

err yeah, I did think it was suspicious that 14 didn't ICE but I only used
godbolt to try older versions. You're right, it's a problem there as well.

[Bug tree-optimization/116139] [15 Regression] ICE with --param fully-pipelined-fma=1

2024-07-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116139

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |15.0
  Known to fail||15.0
  Known to work||14.1.0

[Bug tree-optimization/116139] New: [15 Regression] ICE with --param fully-pipelined-fma=1

2024-07-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116139

Bug ID: 116139
   Summary: [15 Regression] ICE with --param fully-pipelined-fma=1
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The testcase gcc.dg/pr110279-2.c ICEs when compiled with -Ofast
-mcpu=neoverse-v2 --param fully-pipelined-fma=1

during GIMPLE pass: reassoc
ice.c: In function ‘foo’:
ice.c:10:1: internal compiler error: in get_reassociation_width, at
tree-ssa-reassoc.cc:5520
   10 | foo (data_e in)
  | ^~~
0x20eab57 internal_error(char const*, ...)
$SRC/gcc/diagnostic-global-context.cc:491
0x7b014f fancy_abort(char const*, int, char const*)
$SRC/gcc/diagnostic.cc:1725
0x118eefb get_reassociation_width
$SRC/gcc/tree-ssa-reassoc.cc:5520
0x11a047b reassociate_bb
$SRC/gcc/tree-ssa-reassoc.cc:7223
0x119fda7 reassociate_bb
$SRC/gcc/tree-ssa-reassoc.cc:7277
0x119fda7 reassociate_bb
$SRC/gcc/tree-ssa-reassoc.cc:7277
0x11a2b27 do_reassoc
$SRC/gcc/tree-ssa-reassoc.cc:7389
0x11a2b27 execute_reassoc
$SRC/gcc/tree-ssa-reassoc.cc:7479
0x11a2b27 execute
$SRC/gcc/tree-ssa-reassoc.cc:7520
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

It hits the assert at:
5512  Find out if we can get a smaller width considering FMA.  */
5513   if (width > 1 && mult_num && param_fully_pipelined_fma)
5514 {
5515   /* When param_fully_pipelined_fma is set, assume FMUL and FMA use the
5516  same units that can also do FADD.  For other scenarios, such as when
5517  FMUL and FADD are using separated units, the following code may not
5518  apply.  */
5519   int width_mult = targetm.sched.reassociation_width (MULT_EXPR, mode);
5520   gcc_checking_assert (width_mult <= width);

I think this shouldn't be an assert but rather a condition in the if statement
on line 5513?

[Bug target/116129] New: Use SVE INDEX instruction to create constant vectors

2024-07-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116129

Bug ID: 116129
   Summary: Use SVE INDEX instruction to create constant vectors
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: aarch64-sve, missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The INDEX (immediates) instruction can be used to create vectors based on a
base value and a step. We should make use of it even when generating
fixed-length vector immediates, as it allows base and step values in [-16, 15].
We can use it when TARGET_SVE is available.

For example:
#include <arm_sve.h>

typedef char v16qi __attribute__((vector_size (16)));

v16qi
foo (void)
{
  return (v16qi){0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15};
}

svint8_t
foo_sve (void)
{
  return svindex_s8 (0, 1);
}

These two functions should just use:
index   z0.b, #0, #1

Currently function foo will emit an ADR and an LDR from a constant pool

[Bug target/116084] New: Use dot-product instructions for byte->word PLUS reductions in vectorisation

2024-07-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116084

Bug ID: 116084
   Summary: Use dot-product instructions for byte->word PLUS
reductions in vectorisation
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

#define N 32000
unsigned char in[N];
#define u32 unsigned

u32
foo (void)
{
  u32 res = 0;
  for (int i = 0; i < N; i++)
res += in[i];
  return res;
}

-Ofast -march=armv9-a
Same as we do for the SAD expansions, we should use the TARGET_DOTPROD
instructions to do the byte-to-word plus reduction, as that instruction allows
us to do a two-step plus reduction as long as it gets a vector of 1s for its
second operand. Currently this generates:
.L3:
add w1, w1, w4
ld1b    z2.s, p7/z, [x0]
ld1b    z1.s, p7/z, [x0, #1, mul vl]
ld1b    z0.s, p7/z, [x0, #2, mul vl]
ld1b    z30.s, p7/z, [x0, #3, mul vl]
add z26.s, z26.s, z2.s
add x0, x0, x5
add z27.s, z27.s, z1.s
add z28.s, z28.s, z0.s
add z29.s, z29.s, z30.s
cmp w1, w2
bls .L3
add z26.s, z26.s, z27.s
add z28.s, z28.s, z29.s
add z31.s, z26.s, z28.s
cmp w1, w6
beq .L4
.L2:
mov x0, 0
cntw    x4
add x3, x3, w1, uxtw
mov w2, 32000
sub w1, w2, w1
whilelo p7.s, wzr, w1
.L5:
ld1b    z3.s, p7/z, [x3, x0]
add x0, x0, x4
add z31.s, p7/m, z31.s, z3.s
whilelo p7.s, w0, w1
b.any   .L5
.L4:
ptrue   p7.b, all
uaddv   d31, p7, z31.s
fmov    x0, d31
ret

which is not bad, but we can avoid the extending SVE loads and process the full
packed SVE vector of values in each iteration if we use UDOT.
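
Roughly, the UDOT strategy in intrinsics form (a sketch of the idea, not of the
vectoriser change itself; intrinsic names assumed from the SVE ACLE):

#include <arm_sve.h>

/* Dot-product against a vector of 1s sums each group of four bytes into a
   32-bit accumulator lane, so the loads can stay as packed ld1b.  */
unsigned
sum_bytes (const unsigned char *p, int n)
{
  svuint32_t acc = svdup_u32 (0);
  svuint8_t ones = svdup_u8 (1);
  for (int i = 0; i < n; i += svcntb ())
    {
      svbool_t pg = svwhilelt_b8 (i, n);
      /* Inactive elements load as zero, so they contribute nothing.  */
      acc = svdot_u32 (acc, svld1_u8 (pg, p + i), ones);
    }
  return svaddv_u32 (svptrue_b32 (), acc);
}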

[Bug target/116075] New: Inefficient SVE INSR codegen

2024-07-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116075

Bug ID: 116075
   Summary: Inefficient SVE INSR codegen
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: aarch64-sve, missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

I'm using the testcase:
#include <stdint.h>
#define N 32000
uint8_t in[N];
uint8_t in2[N];

uint32_t
foo (void)
{
  uint32_t res = 0;
  for (int i = 0; i < N; i++)
res += in[i];
  return res;
}

compiling with -Ofast -mcpu=neoverse-v2
Ignoring the vector loop for now, in the preamble I see generated code:
mov z31.b, #0
movprfx z30, z31
insr    z30.s, wzr

which seems inefficient as it just zeroes out z31 and z30.

[Bug target/115950] Missed SVE fold to INCP

2024-07-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115950

--- Comment #3 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> Hmm actually there are patterns there but they are not matching. Something
> seems to be going wrong with define_insn_and_rewrite ...

The MD pattern requires a (const_int SVE_KNOWN_PTRUE) in one of its operands
but the attempted match has (const_int 0) i.e. SVE_MAYBE_NOT_PTRUE which blocks
matching.

[Bug target/115950] New: Missed SVE fold to INCP

2024-07-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115950

Bug ID: 115950
   Summary: Missed SVE fold to INCP
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

#include <arm_sve.h>

using u64 = uint64_t;

u64 foo2(u64 x, svbool_t pg)
{
  return x+svcntp_b8(pg, pg);
}

compiled with -O3 -march=armv9-a generates:
foo2(unsigned long, __SVBool_t, __SVBool_t):
cntp    x1, p0, p0.b
add x0, x1, x0
ret

but that should be folded to:
foo2(unsigned long, __SVBool_t, __SVBool_t):    // @foo2(unsigned long, __SVBool_t, __SVBool_t)
incp    x0, p0.b
ret

Like LLVM does.

[Bug target/115475] AArch64 should define __ARM_FEATURE_SVE_BF16 when appropriate

2024-07-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115475

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|15.0|11.5

--- Comment #8 from ktkachov at gcc dot gnu.org ---
Fixed on all active branches.

[Bug target/115457] AArch64 should define __ARM_FEATURE_BF16

2024-07-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115457

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |11.5

--- Comment #8 from ktkachov at gcc dot gnu.org ---
Fixed on all active branches.

[Bug ipa/102061] .constprop gets exposed in warning message

2024-07-03 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102061

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |peter0x44 at disroot 
dot org
 Status|NEW |ASSIGNED

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Assigning to Peter as per his request.

[Bug target/115475] AArch64 should define __ARM_FEATURE_SVE_BF16 when appropriate

2024-06-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115475

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Mine.

[Bug target/115457] AArch64 should define __ARM_FEATURE_BF16

2024-06-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115457

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Mine.

[Bug rtl-optimization/115667] Improve expansion for popcountti2

2024-06-26 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115667

--- Comment #1 from ktkachov at gcc dot gnu.org ---
In fact I'm sure it could even use the proposed new udot approach

[Bug rtl-optimization/115667] New: Improve expansion for popcountti2

2024-06-26 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115667

Bug ID: 115667
   Summary: Improve expansion for popcountti2
   Product: gcc
   Version: 13.3.1
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Maybe this is aarch64-specific but for the testcase:
int
cnt (unsigned __int128 a)
{
  return __builtin_popcountg (a);
}

GCC for aarch64 will generate:
cnt:
fmov    d30, x0
fmov    d31, x1
cnt v30.8b, v30.8b
cnt v31.8b, v31.8b
addv    b30, v30.8b
addv    b31, v31.8b
fmov    x1, d30
fmov    x0, d31
add w0, w1, w0
ret

Effectively doing two DImode popcount expansions and adding the results.
Clang does the more effective:
cnt:    // @cnt
fmov    d0, x0
mov v0.d[1], x1
cnt v0.16b, v0.16b
uaddlv  h0, v0.16b
fmov    w0, s0
ret
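
The combined form can be written directly with Neon intrinsics, e.g. (a sketch of
what the expansion could aim for):

#include <arm_neon.h>

/* Treat the __int128 as a single 128-bit vector: one CNT, one UADDLV.  */
int
cnt_combined (unsigned __int128 a)
{
  uint64x2_t v = vcombine_u64 (vcreate_u64 ((uint64_t) a),
                               vcreate_u64 ((uint64_t) (a >> 64)));
  return vaddlvq_u8 (vcntq_u8 (vreinterpretq_u8_u64 (v)));
}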

[Bug target/115618] [11/12/13 only] should define __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Testing showed no regressions and the issue in this PR is fixed by the
backport. I think we should backport this patch

[Bug target/115618] [11/12/13 only] should define __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> r14-6612-g8d30107455f230

Yeah I had suspected that commit. It does indeed fix this and applies cleanly
to GCC 13. I'll run more testing.

[Bug target/115618] New: GCC 13.3 should define __ARM_FEATURE_CRYPTO with +aes+sha2

2024-06-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115618

Bug ID: 115618
   Summary: GCC 13.3 should define __ARM_FEATURE_CRYPTO with
+aes+sha2
   Product: gcc
   Version: 13.3.1
Status: UNCONFIRMED
  Keywords: rejects-valid
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
CC: tnfchris at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

I think this has been fixed in GCC 14 onwards but we're seeing
__ARM_FEATURE_CRYPTO missing from some -mcpu=native cases that should be
including it (the prerequisite "aes pmull sha1 sha2" info exists in cpuinfo)
with GCC 13-based compilers.

I think this can be reproduced with:
#ifndef __ARM_FEATURE_CRYPTO
#error "__ARM_FEATURE_CRYPTO should be defined!"
#endif
void
foo (void)
{
}

compiled with GCC 13.3 with -march=armv9-a+aes+sha2 gives an error, but it works
with GCC 14.1. I remember there was much rework in this area; could something
be backported to the branch?

[Bug tree-optimization/114814] Reduction sum of comparison should be better

2024-06-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114814

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-20

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed.
The SVE2 codegen isn't much better.
-O2 -mcpu=neoverse-v2 --param aarch64-autovec-preference=2

.L3:
ld1b    z3.b, p5/z, [x0, x3]
mov p5.b, p13.b
cmpeq   p14.b, p5/z, z30.b, z3.b
mov z26.b, p14/z, #1
uunpklo z1.h, z26.b
whilelo p5.b, x3, x4
uunpklo z2.s, z1.h
uunpkhi z26.h, z26.b
uunpkhi z1.s, z1.h
uunpklo z0.s, z26.h
uunpklo z27.d, z2.s
uunpklo z24.d, z1.s
add z28.d, p7/m, z28.d, z27.d
uunpkhi z26.s, z26.h
mov p7.b, p15.b
uunpklo z25.d, z0.s
uunpklo z29.d, z26.s
whilelo p15.d, x3, x11
uunpkhi z2.d, z2.s
uunpkhi z1.d, z1.s
add z28.d, p6/m, z28.d, z2.d
uunpkhi z0.d, z0.s
add z28.d, p4/m, z28.d, z24.d
whilelo p6.d, x3, x5
add z28.d, p3/m, z28.d, z1.d
whilelo p4.d, x3, x6
add z28.d, p2/m, z28.d, z25.d
whilelo p3.d, x3, x7
add z28.d, p1/m, z28.d, z0.d
whilelo p2.d, x3, x8
add z28.d, p0/m, z28.d, z29.d
whilelo p1.d, x3, x9
whilelo p0.d, x3, x10
mov x1, x3
uunpkhi z26.d, z26.s
add x3, x3, x12
add z28.d, p7/m, z28.d, z26.d
whilelo p7.d, x1, x4
b.any   .L3
mov p7.b, p13.b
uaddv   d31, p7, z28.d
fmov    x0, d31
ret
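
For reference, the scalar shape of the reduction involved is roughly (a
hypothetical reconstruction, not necessarily the exact testcase):

long
count_matches (const signed char *a, signed char c, int n)
{
  long res = 0;
  for (int i = 0; i < n; i++)
    res += a[i] == c;
  return res;
}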

[Bug target/115475] AArch64 should define __ARM_FEATURE_SVE_BF16 when appropriate

2024-06-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115475

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |15.0

[Bug target/115475] New: AArch64 should define __ARM_FEATURE_SVE_BF16 when appropriate

2024-06-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115475

Bug ID: 115475
   Summary: AArch64 should define __ARM_FEATURE_SVE_BF16 when
appropriate
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

According to the ACLE:
https://github.com/ARM-software/acle/blob/main/main/acle.md#brain-16-bit-floating-point-support

GCC should define __ARM_FEATURE_SVE_BF16 when SVE and the BF16 intrinsics are
available.
GCC implements the associated intrinsics, but doesn't define the macro. Clang
does.

[Bug target/115457] New: AArch64 should define __ARM_FEATURE_BF16

2024-06-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115457

Bug ID: 115457
   Summary: AArch64 should define __ARM_FEATURE_BF16
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

According to the ACLE:
https://github.com/ARM-software/acle/blob/main/main/acle.md#arm_bf16h
GCC should define __ARM_FEATURE_BF16 to allow users to check if the
<arm_bf16.h> header is available.
Since GCC provides that header it should define __ARM_FEATURE_BF16.
LLVM does implement this, so it's an inconsistency between the compilers

[Bug tree-optimization/115383] [15 Regression] ICE with TCVC_2 build

2024-06-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115383

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2024-06-07
 CC||ktkachov at gcc dot gnu.org
 Target||aarch64
   Target Milestone|--- |15.0
Summary|[15 regression] ICE with|[15 Regression] ICE with
   |TCVC_2 build since  |TCVC_2 build
   |r15-1053-g28edeb1409a7b8|
  Known to fail||15.0
 Ever confirmed|0   |1
  Known to work||14.1.0
   Keywords||ice-on-valid-code

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. -O3 -msve-vector-bits=128 -mcpu=neoverse-v2 is enough to trigger it:
ice.c:2:6: internal compiler error: Segmentation fault
2 | void s331() {
  |  ^~~~
0xf4b59b crash_signal
$TOP/gcc/gcc/toplev.cc:319
0xb7f518 phi_nodes_ptr(basic_block_def*)
$TOP/gcc/gcc/gimple.h:4701
0xb7f518 gsi_start_phis(basic_block_def*)
$TOP/gcc/gcc/gimple-iterator.cc:937
0xb7f518 gsi_for_stmt(gimple*)
$TOP/gcc/gcc/gimple-iterator.cc:621
0x1fa982f vectorizable_condition
$TOP/gcc/gcc/tree-vect-stmts.cc:12577
0x1fc4627 vect_transform_stmt(vec_info*, _stmt_vec_info*,
gimple_stmt_iterator*, _slp_tree*, _slp_instance*)
$TOP/gcc/gcc/tree-vect-stmts.cc:13467
0x1261733 vect_schedule_slp_node
$TOP/gcc/gcc/tree-vect-slp.cc:9729
0x1276837 vect_schedule_slp_node
$TOP/gcc/gcc/tree-vect-slp.cc:9522
0x1276837 vect_schedule_scc
$TOP/gcc/gcc/tree-vect-slp.cc:10017
0x12776df vect_schedule_slp(vec_info*, vec<_slp_instance*, va_heap, vl_ptr>
const&)
$TOP/gcc/gcc/tree-vect-slp.cc:10110
0x1244837 vect_transform_loop(_loop_vec_info*, gimple*)
$TOP/gcc/gcc/tree-vect-loop.cc:12114
0x1287d5f vect_transform_loops
$TOP/gcc/gcc/tree-vectorizer.cc:1007
0x12883e7 try_vectorize_loop_1
$TOP/gcc/gcc/tree-vectorizer.cc:1153
0x12883e7 try_vectorize_loop
$TOP/gcc/gcc/tree-vectorizer.cc:1183
0x128875b execute
$TOP/gcc/gcc/tree-vectorizer.cc:1299
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug target/109939] Invalid return type for __builtin_arm_ssat: Unsigned instead of signed

2024-06-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109939

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from ktkachov at gcc dot gnu.org ---
This has been fixed some time ago.

[Bug target/99195] Optimise away vec_concat of 64-bit AdvancedSIMD operations with zeroes in aarch64

2024-04-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99195

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

  Known to work||14.0
 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #21 from ktkachov at gcc dot gnu.org ---
I think all the straightforward cases are handled and the infrastructure for
doing this is added. Any future improvements in the area should be tracked
separately. Marking as fixed for GCC 14.1

[Bug rtl-optimization/113019] [NOT A BUG] Multi-architecture binaries for Linux

2023-12-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113019

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
GCC provides the Function Multiversioning feature that's supported on some
architectures:
https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html

That seems to do what you want?
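
For example, on x86 something like this (a minimal sketch of the feature):

/* The compiler emits one clone per listed target and dispatches between them
   at run time via an ifunc resolver.  */
__attribute__ ((target_clones ("avx2", "default")))
double
scale (double a, double b)
{
  return a * b;
}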

[Bug middle-end/111782] New: [11/12/13/14 Regression] Extra move in complex double multiplication

2023-10-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111782

Bug ID: 111782
   Summary: [11/12/13/14 Regression] Extra move in complex double
multiplication
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

The testcase:
__complex double
foo (__complex double a, __complex double b)
{
  return a * b;
}

With GCC trunk at -Ofast I see on aarch64:
foo(double _Complex, double _Complex):
fmov    d31, d1
fmul    d1, d1, d2
fmadd   d1, d0, d3, d1
fmul    d31, d31, d3
fnmsub  d0, d0, d2, d31
ret

with GCC 10 the codegen used to be tighter:
foo(double _Complex, double _Complex):
fmul    d4, d1, d3
fmul    d5, d1, d2
fmadd   d1, d0, d3, d5
fnmsub  d0, d0, d2, d4
ret

There's an extra fmov emitted on trunk.
I noticed this regressed with the GCC 11 series

[Bug target/111733] New: Emit inline SVE FSCALE instruction for ldexp

2023-10-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111733

Bug ID: 111733
   Summary: Emit inline SVE FSCALE instruction for ldexp
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

Having noticed https://github.com/llvm/llvm-project/pull/67552 in LLVM GCC
should be able to emit the SVE fscale instruction [1] to implement the ldexp
standard function.

There is already an ldexpm3 optab defined so it should be a relatively simple
matter of wiring up the expander for TARGET_SVE

[1]
https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/FSCALE--Floating-point-adjust-exponent-by-vector--predicated--?lang=en
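
A scalar testcase that could use it (sketch):

/* With an FSCALE-backed ldexpm3 expander this could avoid the libm call.  */
double
scale_exp (double x, int e)
{
  return __builtin_ldexp (x, e);
}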

[Bug tree-optimization/111478] [12/13/14 regression] aarch64 SVE ICE: in compute_live_loop_exits, at tree-ssa-loop-manip.cc:250

2023-09-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111478

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
   Target Milestone|14.0|12.4
   Priority|P3  |P1

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Marking as P1. We hit this with a Fortran reproducer:
  SUBROUTINE REPRODUCER( M, A, LDA )
  IMPLICIT NONE
  INTEGER LDA, M, I
  COMPLEX A( LDA, * )
  DO I = 2, M
A( I, 1 ) = A( I, 1 ) / A( 1, 1 )
  END DO
  RETURN
  END

on aarch64 with -march=armv8-a+sve -O3
The ICE triggers on 12.3 but compiles fine with 12.2.

[Bug tree-optimization/111476] [14 regression] ICE when building Ruby 3.1.4

2023-09-19 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111476

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-19
 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. Reduced testcase.

int a, b, c, d;
void
e() {
  int f, g, h;
  for (;;)
switch (c) {
case '-':
  if (!b) {
if (a) {
  g = 0;
  goto i;
}
goto j;
  }
  for (; a;)
  i:
g++;
  if (b)
continue;
  f = 1;
  for (; f < g; f++) {
b++;
if (b)
  h *= 10;
  }
}
j:
  d = h;
}

[Bug middle-end/111378] Missed optimization for comparing with exact_log2 constants

2023-09-12 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111378

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-09-12
 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. On aarch64 GCC generates:
test:
mov w2, 65535
cmp w1, w2
bhi .L2
b   do_something
.L2:
b   do_something_other

but LLVM generates the shorter:
test:   // @test
lsr w8, w1, #16
cbnz    w8, .LBB0_2
b   do_something
.LBB0_2:
b   do_something_other
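
The source shape involved is roughly (hypothetical sketch, not the exact
testcase):

void do_something (void);
void do_something_other (void);

/* Comparing against an all-ones mask like 0xffff can be done by testing the
   high bits directly (LSR + CBNZ) instead of materialising 65535.  */
void
test (int a, unsigned b)
{
  if (b <= 0xffff)
    do_something ();
  else
    do_something_other ();
}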

[Bug web/111120] Rrrrr

2023-08-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111120

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |INVALID
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from ktkachov at gcc dot gnu.org ---
.

[Bug target/110280] internal compiler error: in const_unop, at fold-const.cc:1884

2023-06-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110280

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2023-06-16
 Status|UNCONFIRMED |NEW
 Target|arm64   |aarch64

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed, reducing.

[Bug target/110235] [14 Regression] Wrong use of us_truncate in SSE and AVX RTL representation

2023-06-15 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235

--- Comment #4 from ktkachov at gcc dot gnu.org ---
(In reply to Hongtao.liu from comment #3)
> (In reply to Hongtao.liu from comment #2)
> > FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test
> > 
> > This one is about sign saturation which should match rtl SS_TRUNCATE.
> 
> I realize for 256-bit/512-bit vpackssdw, it's an 128-bit iterleave of src1
> and src2, and then ss_truncate to the dest, not just vec_concat src1 and
> src2. So the simplification exposed the bug.

Thanks for looking at it. I think it'd make sense for someone with x86/sse/avx
experience to rewrite the RTL representation of the patterns involved to match
the correct semantics for saturation and lane behaviour.
Alternatively, a quick solution would be to convert uses of
us_truncate/ss_truncate in the problematic patterns to an x86-specific UNSPEC,
which would make things work like they did before the simplification was added.
That would be just a stop-gap solution as it's better to use standard RTL
operations where possible.

[Bug target/110235] New: Wrong use of us_truncate in SSE and AVX RTL representation

2023-06-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110235

Bug ID: 110235
   Summary: Wrong use of us_truncate in SSE and AVX RTL
representation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
CC: uros at gcc dot gnu.org
  Target Milestone: ---
Target: x86

After g:921b841350c4fc298d09f6c5674663e0f4208610 added constant-folding for
SS_TRUNCATE and US_TRUNCATE some tests in i386.exp started failing:
FAIL: gcc.target/i386/avx-vpackuswb-1.c execution test
FAIL: gcc.target/i386/avx2-vpackssdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackusdw-2.c execution test
FAIL: gcc.target/i386/avx2-vpackuswb-2.c execution test
FAIL: gcc.target/i386/sse2-packuswb-1.c execution test

From what I can gather from the documentation for intrinsics like
_mm_packus_epi16, the operation they perform is not what we model as us_truncate
in RTL. That is, they don't perform a truncation while treating their input as
an unsigned value. Rather, they treat the input as a signed value and saturate
it to the unsigned min and max of the narrow mode before truncation. In that
regard they seem similar to the SQMOVUN instructions in aarch64.

I think it'd be best to change the representation of those instructions to a
truncating clamp operation, similar to
g:b747f54a2a930da55330c2861cd1e344f67a88d9 in aarch64.
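
A scalar model of one _mm_packus_epi16 lane, as I read the documentation
(sketch):

#include <stdint.h>

/* Signed 16-bit input saturated to the unsigned 8-bit range: a clamp followed
   by truncation, not an us_truncate of an unsigned value.  */
static uint8_t
packus_lane (int16_t x)
{
  if (x < 0)
    return 0;
  if (x > 255)
    return 255;
  return (uint8_t) x;
}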

[Bug target/110059] When SPEC is used to test the GCC (10.3.1), the test result of subitem 548 fluctuates abnormally.

2023-05-31 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110059

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
548.exchange2_r was improved in GCC 12 after PR98782 was fixed. I'd suggest you
try out a later version of GCC

[Bug target/110039] [14 Regression] FAIL: gcc.target/aarch64/rev16_2.c scan-assembler-times rev16\\tw[0-9]+ 2

2023-05-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110039

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |14.0

[Bug target/110039] New: FAIL: gcc.target/aarch64/rev16_2.c scan-assembler-times rev16\\tw[0-9]+ 2

2023-05-30 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110039

Bug ID: 110039
   Summary: FAIL: gcc.target/aarch64/rev16_2.c
scan-assembler-times rev16\\tw[0-9]+ 2
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

I think after g:d8545fb2c71683f407bfd96706103297d4d6e27b the test regresses on
aarch64.
We now generate:
__rev16_32_alt:
rev w0, w0
ror w0, w0, 16
ret

__rev16_32:
rev w0, w0
ror w0, w0, 16
ret

whereas before it was:
__rev16_32_alt:
rev16   w0, w0
ret

__rev16_32:
rev16   w0, w0
ret

I think the GIMPLE at expand time is better and the RTL that it tries to match
is simpler:
Failed to match this instruction:
(set (reg:SI 95)
(rotate:SI (bswap:SI (reg:SI 96))
(const_int 16 [0x10])))

So maybe it's simply a matter of adding that pattern to aarch64.md.

Anyway, filing this here to track the regression
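
For reference, the source shape that produces that RTL is along the lines of
(hypothetical sketch, not the exact testcase):

/* Swap the bytes within each 16-bit half: the rotate-of-bswap form above.  */
unsigned int
rev16_32 (unsigned int x)
{
  return ((x & 0xff00ff00u) >> 8) | ((x & 0x00ff00ffu) << 8);
}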

[Bug target/109939] Invalid return type for __builtin_arm_ssat: Unsigned instead of signed

2023-05-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109939

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 14. It should be a very low risk patch to backport to the
branches as it fixes an inconsistency with the spec. Will do so after some time
for testing on trunk.

[Bug target/109855] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4231 unable to generate reloads for {aarch64_mlav4hi_vec_concatz_le} at -O1

2023-05-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109855

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from ktkachov at gcc dot gnu.org ---
Fixed, thanks for the report.

[Bug target/109939] Invalid return type for __builtin_arm_ssat: Unsigned instead of signed

2023-05-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109939

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|WAITING |NEW
 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
I think you're right, the qualifier for the return value of
SAT_BINOP_UNSIGNED_IMM should be qualifier_none

[Bug c/109940] [14 Regression] ICE in decide_candidate_validity, bisected

2023-05-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109940

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Target Milestone|--- |14.0
  Known to fail||14.0
  Known to work||13.1.0
   Last reconfirmed||2023-05-23
Summary|ICE in  |[14 Regression] ICE in
   |decide_candidate_validity,  |decide_candidate_validity,
   |bisected|bisected
 CC||ktkachov at gcc dot gnu.org

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. A more cleaned up testcase:
int a;
int *b;
void
c (int *d) { *d = a; }

int
e(int d, int f) {
  if (d <= 1)
return 1;
  int g = d / 2;
  for (int h = 0; h < g; h++)
if (f == (long int)b > b[h])
  c(&b[h]);
  e(g, f);
  e(g, f);
}

[Bug target/109855] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4231 unable to generate reloads for {aarch64_mlav4hi_vec_concatz_le} at -O1

2023-05-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109855

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #7 from ktkachov at gcc dot gnu.org ---
I'll take it.

[Bug target/109855] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4231 unable to generate reloads for {aarch64_mlav4hi_vec_concatz_le} at -O1

2023-05-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109855

--- Comment #6 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #5)
> (In reply to rsand...@gcc.gnu.org from comment #4)
> > I guess the problem is that the define_subst output template has:
> > 
> >   (match_operand: 0)
> > 
> > which creates a new operand 0 with an empty predicate and constraint,
> > as opposed to a (match_dup 0), which would be substituted with the
> > original operand 0.  Unfortunately
> > 
> >   (match_dup: 0)
> > 
> > doesn't work as a way of inserting the original destination with
> > a different mode, since the : is ignored.  Perhaps we should
> > “fix” that.  Alternatively:
> > 
> >   (match_operand: 0 "register_operand" "=w")
> > 
> > should work, but probably locks us into using patterns that have one
> > alternative only.
> 
> I think this approach is the most promising and probably okay for the vast
> majority of cases we want to handle with these substs.

Interestingly, it does seem to do the right thing for multi-alternative
patterns too. For example:
(define_insn ("aarch64_cmltv4hf_vec_concatz_le")
 [
(set (match_operand:V8HI 0 ("register_operand") ("=w,w"))
(vec_concat:V8HI (neg:V4HI (lt:V4HI (match_operand:V4HF 1
("register_operand") ("w,w"))
(match_operand:V4HF 2 ("aarch64_simd_reg_or_zero")
("w,YDz"
(match_operand:V4HI 3 ("aarch64_simd_or_scalar_imm_zero")
(""
] ("(!BYTES_BIG_ENDIAN) && ((TARGET_SIMD) && (TARGET_SIMD_F16INST))") ("@
  fcmgt\t%0.4h, %2.4h, %1.4h
  fcmlt\t%0.4h, %1.4h, 0")
 [
(set_attr ("type") ("neon_fp_compare_s"))
(set_attr ("add_vec_concat_subst_le") ("no"))
])

[Bug target/109855] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4231 unable to generate reloads for {aarch64_mlav4hi_vec_concatz_le} at -O1

2023-05-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109855

--- Comment #5 from ktkachov at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #4)
> I guess the problem is that the define_subst output template has:
> 
>   (match_operand: 0)
> 
> which creates a new operand 0 with an empty predicate and constraint,
> as opposed to a (match_dup 0), which would be substituted with the
> original operand 0.  Unfortunately
> 
>   (match_dup: 0)
> 
> doesn't work as a way of inserting the original destination with
> a different mode, since the : is ignored.  Perhaps we should
> “fix” that.  Alternatively:
> 
>   (match_operand: 0 "register_operand" "=w")
> 
> should work, but probably locks us into using patterns that have one
> alternative only.

I think this approach is the most promising and probably okay for the vast
majority of cases we want to handle with these substs.

[Bug target/109855] [14 Regression] ICE: in curr_insn_transform, at lra-constraints.cc:4231 unable to generate reloads for {aarch64_mlav4hi_vec_concatz_le} at -O1

2023-05-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109855

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-05-22
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed.
The ICE in LRA happens very early on:
** Local #1: **

   Spilling non-eliminable hard regs: 31
alt=0: Bad operand -- refuse


The pattern matches:
 [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
   (plus:VDQ_BHSI (mult:VDQ_BHSI
(match_operand:VDQ_BHSI 2 "register_operand" "w")
(match_operand:VDQ_BHSI 3 "register_operand" "w"))
  (match_operand:VDQ_BHSI 1 "register_operand" "0")))]

I wonder whether the substitution breaks something on the constraint in operand
1, which is tied to 0. The define_subst rule adds another operand to the
pattern to match the zero vector, but I would have expected the substitution
machinery to handle it all transparently...

[Bug target/108140] ICE expanding __rbit

2023-05-09 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108140

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from ktkachov at gcc dot gnu.org ---
This should have been fixed for 12.3.

[Bug target/109636] [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -mcpu=a64fx

2023-04-28 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109636

--- Comment #7 from ktkachov at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #6)
> Ugh.  I guess we've got no option but to force the original
> subreg into a fresh register, but that's going to pessimise
> cases where arithmetic is done on tuple types.
> 
> Perhaps we should just expose the SVE operation as a native
> V2DI one.  Handling predicated ops would be a bit more challenging
> though.

I did try a copy_to_mode_reg to a fresh V2DI register for non-REG_P arguments
and that did progress, but (surprisingly?) still ICEd during fwprop:
during RTL pass: fwprop1
mulice.c: In function 'foom':
mulice.c:17:1: internal compiler error: in paradoxical_subreg_p, at rtl.h:3205
   17 | }
  | ^
0xe903b9 paradoxical_subreg_p(machine_mode, machine_mode)
$SRC/gcc/rtl.h:3205
0xe903b9 simplify_context::simplify_subreg(machine_mode, rtx_def*,
machine_mode, poly_int<2u, unsigned long>)
$SRC/gcc/simplify-rtx.cc:7533
0xe1b5f7 insn_propagation::apply_to_rvalue_1(rtx_def**)
$SRC/gcc/recog.cc:1176
0xe1b3d8 insn_propagation::apply_to_rvalue_1(rtx_def**)
$SRC/gcc/recog.cc:1118
0xe1b7b7 insn_propagation::apply_to_rvalue_1(rtx_def**)
$SRC/gcc/recog.cc:1254
0xe1babf insn_propagation::apply_to_pattern_1(rtx_def**)
$SRC/gcc/recog.cc:1361
0xe1bae4 insn_propagation::apply_to_pattern(rtx_def**)
$SRC/gcc/recog.cc:1383
0x1c22e5b try_fwprop_subst_pattern
$SRC/gcc/fwprop.cc:454
0x1c22e5b try_fwprop_subst
$SRC/gcc/fwprop.cc:627
0x1c239a9 forward_propagate_and_simplify
$SRC/gcc/fwprop.cc:823
0x1c239a9 forward_propagate_into
$SRC/gcc/fwprop.cc:886
0x1c23bc1 fwprop_insn
$SRC/gcc/fwprop.cc:943
0x1c23d98 fwprop
$SRC/gcc/fwprop.cc:995
0x1c240e1 execute
$SRC/gcc/fwprop.cc:1033
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

fwprop ended up creating:
(mult:VNx2DI (subreg:VNx2DI (reg/v:V2DI 95 [ v ]) 0)
(subreg:VNx2DI (subreg:V2DI (reg/v:OI 97 [ w ]) 16) 0))

and something blew up anyway, so it seems the RTL passes *really* don't like
these kinds of subregs ;)
I'll look into expressing these ops as native V2DI patterns. I guess for the
unpredicated SVE2 mul that's easy, but for the predicated forms perhaps we can
have them consume a predicate register, generated at expand time, similar to
the  aarch64-sve.md expanders. Not super-pretty but maybe it'll be enough

[Bug target/109636] [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -mcpu=a64fx

2023-04-28 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109636

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P1
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #5 from ktkachov at gcc dot gnu.org ---
The multiplication case also ICEs
void foom (V v, W w)
{
  bar (__builtin_shuffle (v, __builtin_shufflevector ((V){}, w, 4, 5) * v));
}

as mulv2di3 was implemented with a similar trick for TARGET_SVE.
I'll take this, once I figure out how to wire up the Neon modes through SVE...

[Bug target/109636] [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -mcpu=a64fx

2023-04-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109636

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-04-27
 Status|UNCONFIRMED |NEW
 CC||rsandifo at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #4 from ktkachov at gcc dot gnu.org ---
Confirmed. The operand that's blowing it up is:
(subreg:V2DI (reg/v:OI 97 [ w ]) 16)
at
rtx sve_op1 = simplify_gen_subreg (sve_mode, operands[1], mode, 0);

simplify_gen_subreg, lowpart_subreg, copy_to_mode_reg and force_reg all ICE :(

[Bug target/109636] [14 Regression] ICE: in paradoxical_subreg_p, at rtl.h:3205 with -O -mcpu=a64fx

2023-04-26 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109636

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #1)
> Are you sure this is not a regression also in GCC 13.1.0.
> The most obvious revision which caused this is r13-6620-gf23dc726875c26f2c3 .

I'd expect it's g:c69db3ef7f7d82a50f46038aa5457b7c8cc2d643 but haven't looked
deeper yet

[Bug tree-optimization/53947] [meta-bug] vectorizer missed-optimizations

2023-04-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
Bug 53947 depends on bug 109406, which changed state.

Bug 109406 Summary: Missing use of aarch64 SVE2 unpredicated integer multiply
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109406

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug target/109406] Missing use of aarch64 SVE2 unpredicated integer multiply

2023-04-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109406

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|NEW |RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 14

[Bug target/108779] AARCH64 should add an option to change TLS register location to support EL1/EL2/EL3 system registers

2023-04-21 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108779

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0
 Resolution|--- |FIXED

--- Comment #10 from ktkachov at gcc dot gnu.org ---
Implemented for GCC 14.

[Bug c/109553] New: Atomic operations vs const locations

2023-04-19 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109553

Bug ID: 109553
   Summary: Atomic operations vs const locations
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

When reasoning about optimal sequences for atomic operations for various
targets the issue of read-only memory locations keeps coming up, particularly
when talking about doing non-native larger-sized accesses locklessly

I wonder if the frontends in GCC should be more assertive with warnings on such
constructs. Consider, for example:
#include 

uint32_t
load_uint32_t (const uint32_t *a)
{
  return __atomic_load_n (a, __ATOMIC_ACQUIRE);
}

void
casa_uint32_t (const uint32_t *a, uint32_t *b, uint32_t *c)
{
  __atomic_compare_exchange_n (a, b, 3, 0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);
}

Both of these functions compile fine with GCC.
With Clang casa_uint32_t  gives a hard error:
error: address argument to atomic operation must be a pointer to non-const type
('const uint32_t *' (aka 'const unsigned int *') invalid)
  __atomic_compare_exchange_n (a, b, 3, 0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE);

I would argue that for both cases the compiler should emit something. I think
an error is appropriate for the __atomic_compare_exchange_n case, but even
for atomic load we may want to hint to the user to avoid doing an atomic load
from const types.

[Bug target/108840] Aarch64 doesn't optimize away shift counter masking

2023-04-19 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |14.0

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Fixed for GCC 14.

[Bug tree-optimization/109154] [13 regression] jump threading de-optimizes nested floating point comparisons

2023-04-05 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109154

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #43 from ktkachov at gcc dot gnu.org ---
Indeed, thank you for the high quality analysis and improvements!
Marking this as P1 as it's a regression on aarch64-linux in GCC 13 so we'd want
to track this for the release, but of course it's up to RMs for the final say.

[Bug target/109406] Missing use of aarch64 SVE2 unpredicated integer multiply

2023-04-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109406

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug target/109406] New: Missing use of aarch64 SVE2 unpredicated integer multiply

2023-04-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109406

Bug ID: 109406
   Summary: Missing use of aarch64 SVE2 unpredicated integer
multiply
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

For the testcase
#define N 1024

long long res[N];
long long in1[N];
long long in2[N];

void
mult (void)
{
  for (int i = 0; i < N; i++)
res[i] = in1[i] * in2[i];
}

With -O3 -march=armv8.5-a+sve2 we generate the loop:
ptrue   p1.b, all
whilelo p0.d, wzr, w2
.L2:
ld1d    z0.d, p0/z, [x4, x0, lsl 3]
ld1d    z1.d, p0/z, [x3, x0, lsl 3]
mul     z0.d, p1/m, z0.d, z1.d
st1d    z0.d, p0, [x1, x0, lsl 3]
incd    x0
whilelo p0.d, w0, w2
b.any   .L2
ret

SVE2 supports the MUL (vectors, unpredicated) instruction that would allow us
to eliminate the use of p1. Clang manages to do this (though it has other
inefficiencies) in https://godbolt.org/z/7xj6xEchx

[Bug tree-optimization/109401] New: Optimise max (a, b) + min (a, b) into a + b

2023-04-04 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109401

Bug ID: 109401
   Summary: Optimise max (a, b) + min (a, b) into a + b
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

The testcase

#include <algorithm>
#include <cstdint>

uint32_t
foo (uint32_t a, uint32_t b)
{
  return std::max (a, b) + std::min (a, b);
}

uint32_t
foom (uint32_t a, uint32_t b)
{
  return std::max (a, b) * std::min (a, b);
}

could optimise foo into a + b and foom into a * b.
Should be a matter of some match.pd patterns?

[Bug target/109332] Bug in gcc (13.0.1) support for ARM SVE, which randomly ignore the predict register

2023-03-29 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109332

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #1 from ktkachov at gcc dot gnu.org ---
That's expected. Please see
https://github.com/ARM-software/acle/blob/main/main/acle.md#sve-naming-convention
Since the input uses the _x form of the intrinsic svsub_n_s64_x the predication
behaviour is left to the compiler and the ACLE specifies:
"This form of predication removes the need to choose between zeroing and
merging in cases where the inactive elements are unimportant. The code
generator can then pick whichever form of instruction seems to give the best
code. This includes using unpredicated instructions, where available and
suitable."

So using an unpredicated sub instruction is appropriate here and not a bug.
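
For contrast, a small sketch: with the _m form the inactive lanes must come from
the first data input, so the predicate cannot be dropped, whereas the _x form
leaves that choice to the compiler.

#include <arm_sve.h>

/* _x: inactive lanes are unspecified, so an unpredicated SUB is a valid choice.  */
svint64_t
sub_x (svbool_t pg, svint64_t a)
{
  return svsub_n_s64_x (pg, a, 1);
}

/* _m: inactive lanes must keep the value of 'a', so predication is required.  */
svint64_t
sub_m (svbool_t pg, svint64_t a)
{
  return svsub_n_s64_m (pg, a, 1);
}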

[Bug tree-optimization/109176] [13 Regression] internal compiler error: in to_constant, at poly-int.h:504

2023-03-21 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109176

--- Comment #10 from ktkachov at gcc dot gnu.org ---
For the testcase, having it in gcc.target/aarch64/sve as
/* { dg-options "-O2" } */

#include <arm_sve.h>

svbool_t
foo (svint8_t a, svint8_t b, svbool_t c)
{
  svbool_t d = svcmplt_s8 (svptrue_pat_b8 (SV_ALL), a, b);
  return svsel_b (d, c, d);
}

would be fine.

[Bug tree-optimization/109176] [13 Regression] internal compiler error: in to_constant, at poly-int.h:504

2023-03-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109176

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Created attachment 54708
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54708&action=edit
Reduced testcase

Reduced testcase ICEs at -O2

[Bug tree-optimization/109176] internal compiler error: in to_constant, at poly-int.h:504

2023-03-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109176

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
   Target Milestone|--- |13.0
 Ever confirmed|0   |1
   Last reconfirmed||2023-03-17

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. Running reduction

[Bug middle-end/109153] missed vector constructor optimizations

2023-03-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109153

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2023-03-16
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Does the midend have a way of judging whether a constructor is
cheaper?

[Bug c++/108967] internal compiler error: in expand_debug_expr, at cfgexpand.cc:5450

2023-02-28 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108967

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Target||aarch64
   Last reconfirmed||2023-02-28
 Status|UNCONFIRMED |NEW
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Target Milestone|--- |13.0
   Keywords||ice-on-valid-code
  Known to fail||13.0

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed

[Bug rtl-optimization/106594] [13 Regression] sign-extensions no longer merged into addressing mode

2023-02-27 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #12 from ktkachov at gcc dot gnu.org ---
(In reply to Tamar Christina from comment #11)
> This patch seems to have stalled. CC'ing the maintainers as this is still a
> large regression for us.

Roger's latest updated patch was posted recently at
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612840.html

[Bug target/108840] Aarch64 doesn't optimize away shift counter masking

2023-02-24 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Created attachment 54531
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54531&action=edit
Candidate patch

Candidate patch attached.

[Bug tree-optimization/108901] [13 Regression] Testsuite failures in gcc.target/aarch64/sve/cond_*

2023-02-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108901

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Yes, they are fixed now. Thank you!

[Bug tree-optimization/108901] [13 Regression] Testsuite failures in gcc.target/aarch64/sve/cond_*

2023-02-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108901

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |13.0
   Priority|P3  |P1

[Bug tree-optimization/108901] New: [13 Regression] Testsuite failures in gcc.target/aarch64/sve/cond_*

2023-02-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108901

Bug ID: 108901
   Summary: [13 Regression] Testsuite failures in
gcc.target/aarch64/sve/cond_*
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: testsuite-fail
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---
Target: aarch64

After g:3da77f217c8b2089ecba3eb201e727c3fcdcd19d we're seeing testsuite
failures like:
gcc.target/aarch64/sve/cond_fmaxnm_7.c
gcc.target/aarch64/sve/cond_fminnm_7.c
gcc.target/aarch64/sve/cond_fmaxnm_8.c
gcc.target/aarch64/sve/cond_fminnm_8.c
gcc.target/aarch64/sve/cond_fminnm_6.c
gcc.target/aarch64/sve/fmla_2.c
gcc.target/aarch64/sve/cond_xorsign_2.c
gcc.target/aarch64/sve/cond_xorsign_1.c
gcc.target/aarch64/sve/cond_fmaxnm_6.c

on aarch64. I haven't looked into the cause; just reporting here for tracking.

[Bug target/108874] [10/11/12/13 Regression] Missing bswap detection

2023-02-22 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874

--- Comment #3 from ktkachov at gcc dot gnu.org ---
(In reply to Richard Biener from comment #2)
> The regression is probably rtl-optimization/target specific since we never
> had this kind of pattern detected on the tree/GIMPLE level and there's no
> builtin or IFN for this shuffling on u32.

FWIW, a colleague reported that he bisected the failure to
g:98e30e515f184bd63196d4d500a682fbfeb9635e, though I haven't tried it myself.
We do have patterns for these in aarch64 and arm, but combine would need to
match about 5 insns to get there, and that's beyond its current limit of 4.
[Bug tree-optimization/108874] [10/11/12/13 Regression] Missing bswap detection

2023-02-21 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874

--- Comment #1 from ktkachov at gcc dot gnu.org ---
(In reply to ktkachov from comment #0)
> If we look at the arm testcases in gcc.target/arm/rev16.c
> typedef unsigned int __u32;
> 
> __u32
> __rev16_32_alt (__u32 x)
> {
>   return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
>  | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
> }
> 
> __u32
> __rev16_32 (__u32 x)
> {
>   return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
>  | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
> }
> 

this isn't a simple __builtin_bswap16, as that returns a uint16_t; it is more
like a __builtin_bswap16 applied to each of the half-words of the u32.

[Bug tree-optimization/108874] New: [10/11/12/13 Regression] Missing bswap detection

2023-02-21 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108874

Bug ID: 108874
   Summary: [10/11/12/13 Regression] Missing bswap detection
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

If we look at the arm testcases in gcc.target/arm/rev16.c
typedef unsigned int __u32;

__u32
__rev16_32_alt (__u32 x)
{
  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
 | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
}

__u32
__rev16_32 (__u32 x)
{
  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
 | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
}

we should be able to generate rev16 instructions for aarch64 (and arm) i.e.
recognise a __builtin_bswap16 essentially.
GCC fails to do so and generates:
__rev16_32_alt:
lsr w1, w0, 8
lsl w0, w0, 8
and w1, w1, 16711935
and w0, w0, -16711936
orr w0, w1, w0
ret
__rev16_32:
lsl w1, w0, 8
lsr w0, w0, 8
and w1, w1, -16711936
and w0, w0, 16711935
orr w0, w1, w0
ret

whereas clang manages to recognise it all into:
__rev16_32_alt: // @__rev16_32_alt
rev16   w0, w0
ret
__rev16_32: // @__rev16_32
rev16   w0, w0
ret

Does the bswap pass need some tweaking, perhaps?

Looks like this worked fine with GCC 5 but broke in the GCC 6 timeframe, so I am
marking it as a regression.
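For reference, the operation can be stated at the source level in terms of the
existing bswap builtin (my own reformulation, illustrative only): rev16 on a
32-bit value is a byte swap applied to each 16-bit halfword independently.

__u32
__rev16_32_ref (__u32 x)
{
  /* bswap16 of the high halfword and bswap16 of the low halfword.  */
  return ((__u32) __builtin_bswap16 (x >> 16) << 16)
         | __builtin_bswap16 (x & 0xffff);
}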

[Bug target/108840] Aarch64 doesn't optimize away shift counter masking

2023-02-21 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108840

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
 Status|NEW |ASSIGNED

--- Comment #2 from ktkachov at gcc dot gnu.org ---
I have a patch to simplify and fix the aarch64 rtx costs for this case. I'll
aim it for GCC 14 as it's not a regression.

[Bug target/108779] AARCH64 should add an option to change TLS register location to support EL1/EL2/EL3 system registers

2023-02-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108779

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Created attachment 54459
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54459&action=edit
Candidate patch

Patch that implements -mtp= similarly to clang, in case you have the capability
to try it out.

[Bug target/108779] AARCH64 should add an option to change TLS register location to support EL1/EL2/EL3 system registers

2023-02-14 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108779

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |ktkachov at gcc dot 
gnu.org
   Last reconfirmed||2023-02-14

--- Comment #2 from ktkachov at gcc dot gnu.org ---
Confirmed. I have a patch I'm testing for it.
Since GCC 13 is in stage4 (regression and wrong-code fixes only) this would be
GCC 14 material. Would that timeline be okay with you?

[Bug target/108659] Suboptimal 128 bit atomics codegen on AArch64 and x64

2023-02-03 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108659

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
(In reply to Niall Douglas from comment #0)
> Related:
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80878
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94649
> - https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688
> 
> I got bitten by this again, latest GCC still does not emit single
> instruction 128 bit atomics, even when the -march is easily new enough. Here
> is a godbolt comparing latest MSVC, latest GCC and latest clang for the
> skylake-avx512 architecture, which unquestionably supports cmpxchg16b. Only
> clang emits the single instruction atomic:
> 
> https://godbolt.org/z/EnbeeW4az
> 
> I'm gathering from the issue comments and from the comments at
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104688 that you're going to
> wait for AMD to guarantee atomicity of SSE instructions before changing the
> codegen here, which makes sense. However I also wanted to raise potentially
> suboptimal 128 bit atomic codegen by GCC for AArch64 as compared to clang:
> 
> https://godbolt.org/z/oKv4o81nv
> 
> GCC emits `dmb` to force a global memory fence, whereas clang does not.
> 
> I think clang is in the right here, the seq_cst atomic semantics are not
> supposed to globally memory fence.

FWIW, the GCC codegen for aarch64 is at https://godbolt.org/z/qvx9484nY (arm
and aarch64 are different targets). It emits a call to libatomic, which for GCC
13 will use a lockless implementation when possible at runtime, see
g:d1288d850944f69a795e4ff444a427eba3fec11b.
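For concreteness, the kind of source behind that godbolt link is roughly the
following (my approximation): compiled at -O2 for aarch64, GCC currently emits a
call to __atomic_load_16 in libatomic rather than an inline sequence.

#include <stdatomic.h>

_Atomic __int128 shared;

__int128
load_shared (void)
{
  /* seq_cst 128-bit load; on aarch64 GCC this becomes a libatomic call.  */
  return atomic_load_explicit (&shared, memory_order_seq_cst);
}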

[Bug target/108495] [10/11/12/13 Regression] aarch64 ICE with __builtin_aarch64_rndr

2023-01-25 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108495

--- Comment #7 from ktkachov at gcc dot gnu.org ---
Yes, GCC could be more helpful here. The intrinsics and their use is documented
in the ACLE document:
https://github.com/ARM-software/acle/blob/main/main/acle.md#random-number-generation-intrinsics
There is work ongoing to augment it with more user-friendly information about
compiler flags, but GCC could keep track of the options used to gate these
builtins/intrinsics and report a hint.

[Bug target/108495] [10/11/12/13 Regression] aarch64 ICE with __builtin_aarch64_rndr

2023-01-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108495

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Keywords||ice-on-invalid-code
   Last reconfirmed||2023-01-23
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. That said, __builtin_aarch64_rndr is not supposed to be used
directly by the user. They should include <arm_acle.h> and use the __rndr
intrinsic instead.
That will give the appropriate error:
inlining failed in call to 'always_inline' '__rndr': target specific option
mismatch

Still, I suppose the compiler shouldn't ICE.
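For completeness, a sketch of the intended usage through <arm_acle.h> (assuming
a target with the rng feature enabled, e.g. -march=armv8.5-a+rng; per ACLE,
__rndr returns 0 on success and writes the value through the pointer):

#include <arm_acle.h>
#include <stdint.h>

uint64_t
get_random (void)
{
  uint64_t val;
  /* Retry until the hardware reports a valid random value.  */
  while (__rndr (&val) != 0)
    ;
  return val;
}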

[Bug tree-optimization/108446] New: GCC fails to elide udiv/msub when doing modulus by select of constants

2023-01-18 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108446

Bug ID: 108446
   Summary: GCC fails to elide udiv/msub when doing modulus by
select of constants
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

unsigned foo(int vl, unsigned len) {
  unsigned pad = vl <= 256 ? 128 : 256;
  return len % pad;
}

At -O2 aarch64 gcc generates:
foo:
cmp w0, 256
mov w2, 256
mov w0, 128
cselw2, w2, w0, gt
udivw0, w1, w2
msubw0, w0, w2, w1
ret

clang, for example can generate the cheaper:
foo:// @foo
cmp w0, #256
mov w8, #127
mov w9, #255
cselw8, w9, w8, gt
and w0, w8, w1
ret

The situation is similar on x86.
I suppose this could be a match.pd fix, or otherwise something done at expand
time?
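A source-level equivalent of the cheaper sequence, for illustration (my own
rewrite): both candidate moduli are powers of two, so the select can be applied
to the masks instead and the udiv/msub pair disappears.

unsigned
foo_opt (int vl, unsigned len)
{
  /* 128 and 256 are powers of two, so len % pad == len & (pad - 1).  */
  unsigned mask = vl <= 256 ? 127u : 255u;
  return len & mask;
}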

[Bug middle-end/88345] -Os overrides -falign-functions=N on the command line

2023-01-17 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88345

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org
   Last reconfirmed||2023-01-17

--- Comment #12 from ktkachov at gcc dot gnu.org ---
(In reply to Kito Cheng from comment #7)
> We are hitting this issue on RISC-V, and got some complain from linux kernel
> developers, but in different form as the original report, we found cold
> function or any function is marked as cold by `-fguess-branch-probability`
> are all not honor to the -falign-functions=N setting, that become problem on
> some linux kernel feature since they want to control the minimal alignment
> to make sure they can atomically update the instruction which require align
> to 4 byte.
> 
> However current GCC behavior can't guarantee that even -falign-functions=4
> is given, there is 3 option in my mind:
> 
> 1. Fix -falign-functions=N, let it work as expect on -Os and all cold
> functions
> 2. Force align to 4 byte if -fpatchable-function-entry is given, that's
> should be doable by adjust RISC-V's FUNCTION_BOUNDARY
> 3. Adjust RISC-V's FUNCTION_BOUNDARY to let it honor to -falign-functions=N
> 4. Adding a -malign-functions=N...Okay, I know that suck idea, x86 already
> deprecated that.
> 
> But I think ideally this should fixed by 1 option if possible.
> 
> Testcase from RISC-V kernel guy:
> ```
> /* { dg-do compile } */
> /* { dg-options "-march=rv64gc -mabi=lp64d -O1 -falign-functions=128" } */
> /* { dg-final { scan-assembler-times ".align 7" 2 } } */
> 
> // Using 128 byte align rather than 4 byte align since it easier to observe.
> 
> __attribute__((__cold__)) void a() {} // This function isn't align to 128
> byte
> void b() {} // This function align to 128 byte.
> ```
> 
> Proposed fix:
> ```
> diff --git a/gcc/varasm.c b/gcc/varasm.c
> index 49d5cda122f..6f8ed85fea9 100644
> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -1907,8 +1907,7 @@ assemble_start_function (tree decl, const char *fnname)
>   Note that we still need to align to DECL_ALIGN, as above,
>   because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all. 
> */
>if (! DECL_USER_ALIGN (decl)
> -  && align_functions.levels[0].log > align
> -  && optimize_function_for_speed_p (cfun))
> +  && align_functions.levels[0].log > align)
>  {
>  #ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
>int align_log = align_functions.levels[0].log;
> 
> ```

I think this patch makes sense given the extra information you and Mark have
provided. Would you mind testing it and posting it to gcc-patches for review
please?

[Bug rust/106072] [13 Regression] -Wnonnull warning breaks rust bootstrap

2022-12-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106072

--- Comment #18 from ktkachov at gcc dot gnu.org ---
(In reply to Richard Biener from comment #17)
> Fixed(?)

Yes on aarch64, thanks!

[Bug target/102218] 128-bit atomic compare and exchange does not honor memory model on AArch64 and Arm

2022-12-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102218

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org

--- Comment #3 from ktkachov at gcc dot gnu.org ---
Does this need to be backported to other release versions as it's a wrong-code
bug?

[Bug target/95751] [aarch64] Consider using ldapr for __atomic_load_n(acquire) on ARMv8.3-RCPC

2022-12-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95751

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-12-20
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
I had not seen this report at the time, but LDAPR generation has now been
implemented in GCC 13.1 for acquire loads with
https://gcc.gnu.org/g:0431e8ae5bdb854bda5f9005e41c8c4d03f6d74e and follow-ups.
Any testing/evaluation/feedback would be welcome.
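A minimal example of the kind of acquire load affected (my own testcase,
assuming a target with the RCPC feature, e.g. -march=armv8.3-a or a +rcpc
extension): with GCC 13 this should now be emitted as ldapr rather than ldar.

#include <stdatomic.h>

int
load_acquire (_Atomic int *p)
{
  /* Acquire load; RCPC's ldapr gives weaker ordering than ldar but is
     sufficient for memory_order_acquire.  */
  return atomic_load_explicit (p, memory_order_acquire);
}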

[Bug target/107209] [13 Regression] ICE: verify_gimple failed (error: statement marked for throw, but doesn't)

2022-12-20 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107209

--- Comment #5 from ktkachov at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #4)
> Looking at other backends, rs6000 uses in *gimple_fold_builtin gsi_replace
> (..., true);
> all the time, ix86 gsi_replace (..., false); all the time, alpha with true,
> aarch64 with true.  But perhaps what is more important if the builtins
> folded are declared nothrow or not, if they are nothrow, then they shouldn't
> have any EH edges at the start already and so it shouldn't matter what is
> used.

The vmulx_f64 intrinsic is not marked "nothrow" by the logic:
1284 static tree
1285 aarch64_get_attributes (unsigned int f, machine_mode mode)
1286 {
1287   tree attrs = NULL_TREE;
1288
1289   if (!aarch64_modifies_global_state_p (f, mode))
1290 {
1291   if (aarch64_reads_global_state_p (f, mode))
1292 attrs = aarch64_add_attribute ("pure", attrs);
1293   else
1294 attrs = aarch64_add_attribute ("const", attrs);
1295 }
1296
1297   if (!flag_non_call_exceptions || !aarch64_could_trap_p (f, mode))
1298 attrs = aarch64_add_attribute ("nothrow", attrs);
1299
1300   return aarch64_add_attribute ("leaf", attrs);
1301 }

aarch64_could_trap_p returns true for it as it can raise an FP exception.
Should that affect the nothrow attribute though? Shouldn't that be for C++
exceptions only?

[Bug middle-end/108140] ICE expanding __rbit

2022-12-16 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108140

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Target Milestone|--- |12.3
 CC||ktkachov at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-12-16
 Status|UNCONFIRMED |ASSIGNED

--- Comment #5 from ktkachov at gcc dot gnu.org ---
Confirmed the ICE and I'm testing a patch to fix that, thanks for the report

[Bug rust/108084] New: AArch64 Linux bootstrap failure in rust

2022-12-13 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108084

Bug ID: 108084
   Summary: AArch64 Linux bootstrap failure in rust
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: build
  Severity: normal
  Priority: P3
 Component: rust
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
CC: dkm at gcc dot gnu.org
  Target Milestone: ---
  Host: aarch64-none-linux-gnu
Target: aarch64-none-linux-gnu

Congratulations on getting the rust frontend committed!
When trying a bootstrap on aarch64-none-linux with
--enable-languages=c,c++,fortran,rust, I get a -Werror=nonnull failure:

In file included from $SRC/gcc/rust/parse/rust-parse.h:730,
 from $SRC/gcc/rust/expand/rust-macro-builtins.cc:25:
$SRC/gcc/rust/parse/rust-parse-impl.h: In member function
'Rust::AST::ClosureParam
Rust::Parser::parse_closure_param() [with
ManagedTokenSource = Rust::Lexer]':
$SRC/gcc/rust/parse/rust-parse-impl.h:8916:70: error: 'this' pointer is null
[-Werror=nonnull]
 8916 | std::move (type), std::move (outer_attrs));
  |  ^
In file included from $SRC/gcc/rust/parse/rust-parse.h:730,
 from $SRC/gcc/rust/expand/rust-macro-expand.h:23,
 from $SRC/gcc/rust/expand/rust-macro-expand.cc:19:
$SRC/gcc/rust/parse/rust-parse-impl.h: In member function
'Rust::AST::ClosureParam
Rust::Parser::parse_closure_param() [with
ManagedTokenSource = Rust::MacroInvocLexer]':
$SRC/gcc/rust/parse/rust-parse-impl.h:8916:70: error: 'this' pointer is null
[-Werror=nonnull]
 8916 | std::move (type), std::move (outer_attrs));
  |  ^
In file included from $SRC/gcc/rust/parse/rust-parse.h:730,
 from $SRC/gcc/rust/rust-session-manager.cc:23:
$SRC/gcc/rust/parse/rust-parse-impl.h: In member function
'Rust::AST::ClosureParam
Rust::Parser::parse_closure_param() [with
ManagedTokenSource = Rust::Lexer]':
$SRC/gcc/rust/parse/rust-parse-impl.h:8916:70: error: 'this' pointer is null
[-Werror=nonnull]
 8916 | std::move (type), std::move (outer_attrs));

[Bug target/108006] [13 Regression] ICE in aarch64_move_imm building 502.gcc_r

2022-12-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108006

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||wdijkstr at arm dot com

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Wilco, is this something you've touched recently?

[Bug target/108006] New: [13 Regression] ICE in aarch64_move_imm building 502.gcc_r

2022-12-07 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108006

Bug ID: 108006
   Summary: [13 Regression] ICE in aarch64_move_imm building
502.gcc_r
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Building 502.gcc_r from SPEC2017 with -O2 -mcpu=neoverse-v1 ICEs with trunk.
Reduced testcase:

void c();

short *foo;
short *bar;
void
a() {
  for (bar; bar < foo; bar++)
*bar = 999;
  c();
}

backtrace is:
during RTL pass: expand
ice.c: In function a:
ice.c:8:10: internal compiler error: in aarch64_move_imm, at
config/aarch64/aarch64.cc:5692
8 | *bar = 999;
  | ~^
0x129db4c aarch64_move_imm(unsigned long, machine_mode)
$SRC/gcc/config/aarch64/aarch64.cc:5692
0x12c01cd aarch64_expand_sve_const_vector
$SRC/gcc/config/aarch64/aarch64.cc:6516
0x12c63cb aarch64_expand_mov_immediate(rtx_def*, rtx_def*)
$SRC/gcc/config/aarch64/aarch64.cc:6996
0x18c3248 gen_movvnx8hi(rtx_def*, rtx_def*)
$SRC/gcc/config/aarch64/aarch64-sve.md:662
0xa09062 rtx_insn* insn_gen_fn::operator()(rtx_def*,
rtx_def*) const
$SRC/gcc/recog.h:407
0xa09062 emit_move_insn_1(rtx_def*, rtx_def*)
$SRC/gcc/expr.cc:4172
0xa095bb emit_move_insn(rtx_def*, rtx_def*)
$SRC/gcc/expr.cc:4342
0x9db8aa copy_to_mode_reg(machine_mode, rtx_def*)
$SRC/gcc/explow.cc:654
0xd0607d maybe_legitimize_operand
$SRC/gcc/optabs.cc:7809
0xd0607d maybe_legitimize_operands(insn_code, unsigned int, unsigned int,
expand_operand*)
$SRC/gcc/optabs.cc:7941
0xd06366 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
$SRC/gcc/optabs.cc:7960
0xd06592 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
$SRC/gcc/optabs.cc:8005
0xd05b17 expand_insn(insn_code, unsigned int, expand_operand*)
$SRC/gcc/optabs.cc:8036
0xb53fb7 expand_partial_store_optab_fn
$SRC/gcc/internal-fn.cc:2878
0xb54307 expand_MASK_STORE
$SRC/gcc/internal-fn.def:141
0xb59960 expand_internal_call(internal_fn, gcall*)
$SRC/gcc/internal-fn.cc:4436
0xb5997a expand_internal_call(gcall*)
$SRC/gcc/internal-fn.cc:
0x8b6161 expand_call_stmt
$SRC/gcc/cfgexpand.cc:2737
0x8b6161 expand_gimple_stmt_1

[Bug target/107988] [13 Regression] ICE: in extract_insn, at recog.cc:2791 (unrecognizable insn) on aarch64-unknown-linux-gnu

2022-12-06 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107988

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2022-12-06
 Ever confirmed|0   |1
 CC||ktkachov at gcc dot gnu.org,
   ||tnfchris at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from ktkachov at gcc dot gnu.org ---
Confirmed. Looks related to the recent div-by-special-constant changes, but it
ICEs only at -O0.

[Bug target/107830] [13 Regression] ICE in gen_aarch64_bitmask_udiv3, at ./insn-opinit.h:813

2022-11-23 Thread ktkachov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107830

ktkachov at gcc dot gnu.org changed:

   What|Removed |Added

 CC||tnfchris at gcc dot gnu.org

--- Comment #2 from ktkachov at gcc dot gnu.org ---
I think it's more likely Tamar's recent patches for that optab
