[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-05-07 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

Richard Biener  changed:

   What|Removed |Added

   Target Milestone|14.0|14.2

--- Comment #34 from Richard Biener  ---
GCC 14.1 is being released, retargeting bugs to GCC 14.2.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-05-06 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #33 from Jakub Jelinek  ---
That is still a hack, but guess can be acceptable for 14.22 and short term
trunk if the ARM maintainers approve it.
But, for GCC 15+, I think if the behavior is that when the predicate
constant/register is used in an instruction, regardless of the element mode it
actually performs per-byte predication, then it should be represented as
V16BImode, not V8BImode or V4BImode.
It is fine if instructions which produce the predicate mask like comparisons
produce V8BImode or V4BImode, but what consumes should use subreg of that to
V16BImode.
At least if the behavior is either perform the operation on all elements and
then based on the 16 bits in the predicate choose result between the newly
computed result and something else on byte by byte basis.  Or perhaps if the
operation is performed only
on elements where at least one predicate bit for the element is non-zero and
then merged.

I think it would be useful if you pointed at the docs how the instructions
exactly work or tried to explain it here.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-05-06 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #32 from Christophe Lyon  ---
Created attachment 58110
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58110=edit
patch v2

Here is another patch proposal along the lines of what you suggest in #c24

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #31 from Jakub Jelinek  ---
Seems most of the V4BI/V8BImode predicates are in UNSPECs, I think long term
turning them into just using there V16BImode might help.
Keep using V4BI/V8BImode for cases where we know it is all 0 or all 1 bits in
each element, say when some comparison produces the mask, and then just use
SUBREG to turn that into V16BImode.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #30 from Christophe Lyon  ---

> ./cc1 -quiet -isystem include/ -march=armv8.1-m.main+mve -mfloat-abi=hard
> pr114801.c  -mthumb -O2 -da

Thanks, for some reason -O2 had disappeared from my flags, it does ICE with it.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #29 from Jakub Jelinek  ---
With
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -2100,7 +2100,22 @@ function_expander::add_input_operand (insn_code icode,
rtx x)
   mode = GET_MODE (x);
 }
   else if (VALID_MVE_PRED_MODE (mode))
-x = gen_lowpart (mode, x);
+{
+  if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
+   {
+ /* In V8BI or V4BI each element has 2 or 4 bits, if those
+bits aren't all the same, gen_lowpart might ICE.  */
+ unsigned HOST_WIDE_INT xi = UINTVAL (x);
+ if ((xi & 0x) != ((xi >> 1) & 0x)
+ || (mode == V4BImode
+ && (xi & 0x) != ((xi >> 2) & 0x)))
+   x = force_reg (HImode, x);
+   }
+  else if (SUBREG_P (x))
+   /* gen_lowpart on a SUBREG can ICE.  */
+   x = force_reg (GET_MODE (x), x);
+  x = gen_lowpart (mode, x);
+}

   m_ops.safe_grow (m_ops.length () + 1, true);
   create_input_operand (_ops.last (), x, mode);

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #28 from Jakub Jelinek  ---
#include 
uint32x4_t test_9() {
  return vdupq_m_n_u32(vdupq_n_u32(0x), 0, 0x);
}

./cc1 -quiet -isystem include/ -march=armv8.1-m.main+mve -mfloat-abi=hard
pr114801.c  -mthumb -O2 -da
during RTL pass: cse1
dump file: pr114801.c.272r.cse1
pr114801.c: In function ‘test_9’:
pr114801.c:4:1: internal compiler error: in find_cached_value, at
rtx-vector-builder.cc:100
4 | }
  | ^
0x15a1370 rtx_vector_builder::find_cached_value()
../../gcc/rtx-vector-builder.cc:100
0x15a1123 rtx_vector_builder::build()
../../gcc/rtx-vector-builder.cc:64
0x15f2a57 native_decode_vector_rtx(machine_mode, vec const&, unsigned int, unsigned int, unsigned int)
../../gcc/simplify-rtx.cc:7269
0x15f2b7e native_decode_rtx(machine_mode, vec
const&, unsigned int)
../../gcc/simplify-rtx.cc:7289
0x15f3bd7 simplify_immed_subreg
../../gcc/simplify-rtx.cc:7529
0x15f431b simplify_context::simplify_subreg(machine_mode, rtx_def*,
machine_mode, poly_int<1u, unsigned long>)
../../gcc/simplify-rtx.cc:7603
0x105c627 simplify_subreg(machine_mode, rtx_def*, machine_mode, poly_int<1u,
unsigned long>)
../../gcc/rtl.h:3516
0x25c8293 equiv_constant
../../gcc/cse.cc:3804
0x25c65a0 fold_rtx
../../gcc/cse.cc:3138
0x25c9e7e cse_insn
../../gcc/cse.cc:4667
0x25cf1fd cse_extended_basic_block
../../gcc/cse.cc:6574
0x25cf70e cse_main
../../gcc/cse.cc:6719
0x25d1879 rest_of_handle_cse
../../gcc/cse.cc:7549
0x25d1990 execute
../../gcc/cse.cc:7594
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

(cross from x86_64-linux to armv7hl-linux-gnueabi).

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #27 from Christophe Lyon  ---
(In reply to Jakub Jelinek from comment #25)
> 
> Indeed, it ICEs e.g. during CSE.
> Though, that also means it is just about luck, if something isn't a
> CONST_INT at expansion time but simplified into CONST_INT later, it can ICE
> as well.

How did you test it to make it crash?
The (modifed) testcase compiles OK for me:
return vdupq_m_n_u32(vdupq_n_u32(0x), 0, 0x0acf);
return vdupq_m_n_u16(vdupq_n_u16(0x), 0, 0x1b0f);

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #26 from Christophe Lyon  ---
Thanks for testing, my build is still in progress.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #25 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #24)
> Another short term (14.1 only) possibility would be to force_reg a CONST_INT
> operand into a register if it didn't have all bits the same (and hope
> combine or whatever else won't simplify it again), i.e.
>else if (VALID_MVE_PRED_MODE (mode))
>  {
>if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
>  {
>/* In V8BI or V4BI each element has 2 or 4 bits, if those
>   bits aren't all the same, gen_lowpart might ICE.  */
>unsigned HOST_WIDE_INT xi = UINTVAL (xi);

UINTVAL (x) above.

>if ((xi & 0x) != ((xi >> 1) & 0x)
>|| (mode == V4BImode
>&& (xi & 0x) != ((xi >> 2) & 0x)))
>  x = force_reg (HImode, x);
>  }
>else if (SUBREG_P (x))
>  /* gen_lowpart on a SUBREG can ICE.  */
>  x = force_reg (GET_MODE (x), x);
>x = gen_lowpart (mode, x);
>  }
> If that fixes just the ICE during expansion but ICEs during fwprop, combine
> etc.,
> then you might try (temporarily) harder and hide the constant from the
> optimizers,
> e.g. instead of using force_reg emit some (set (reg:HI) (unspec:HI
> (const_int ...) UNSPEC_MVE_PRED)) insn that expands like (set (reg:HI)
> (const_int ...)) but the
> optimizers don't know that.

Indeed, it ICEs e.g. during CSE.
Though, that also means it is just about luck, if something isn't a CONST_INT
at expansion time but simplified into CONST_INT later, it can ICE as well.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #24 from Jakub Jelinek  ---
Another short term (14.1 only) possibility would be to force_reg a CONST_INT
operand into a register if it didn't have all bits the same (and hope combine
or whatever else won't simplify it again), i.e.
   else if (VALID_MVE_PRED_MODE (mode))
 {
   if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
 {
   /* In V8BI or V4BI each element has 2 or 4 bits, if those
  bits aren't all the same, gen_lowpart might ICE.  */
   unsigned HOST_WIDE_INT xi = UINTVAL (xi);
   if ((xi & 0x) != ((xi >> 1) & 0x)
   || (mode == V4BImode
   && (xi & 0x) != ((xi >> 2) & 0x)))
 x = force_reg (HImode, x);
 }
   else if (SUBREG_P (x))
 /* gen_lowpart on a SUBREG can ICE.  */
 x = force_reg (GET_MODE (x), x);
   x = gen_lowpart (mode, x);
 }
If that fixes just the ICE during expansion but ICEs during fwprop, combine
etc.,
then you might try (temporarily) harder and hide the constant from the
optimizers,
e.g. instead of using force_reg emit some (set (reg:HI) (unspec:HI (const_int
...) UNSPEC_MVE_PRED)) insn that expands like (set (reg:HI) (const_int ...))
but the
optimizers don't know that.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #23 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #22)
> Sure, that's what I'm worried about.
> 
> So we can:
> - leave this as-is for gcc-14 (known bug)
> 
> - commit what we discussed in #c15 #c16, (with an improved testcase as you
> mentioned on the list,) thus at least temporarily forcing canonicalization
> (preventing users from using 'weird' values)
> 
> - possibly revisit this for gcc-15 by handling predicates differently

This primarily needs to be ARM maintainer decision.

Anyway, if the predicate is really always a byte predicate, then perhaps best
representation would be to always use V16BImode for it and model it in the RTL
patterns as VEC_SELECT of a subreg of the actual operation to V16QImode (and
then subreg back).
That might accurately model what the hw does.
Or some UNSPECs with just HImode predicate and then hide what actually happens.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #22 from Christophe Lyon  ---
Sure, that's what I'm worried about.

So we can:
- leave this as-is for gcc-14 (known bug)

- commit what we discussed in #c15 #c16, (with an improved testcase as you
mentioned on the list,) thus at least temporarily forcing canonicalization
(preventing users from using 'weird' values)

- possibly revisit this for gcc-15 by handling predicates differently

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #21 from Jakub Jelinek  ---
I mean if you represent it incorrectly, there are very high changes that
simplify-rtx.cc will screw this up, because it certainly doesn't understand
what a MODE_VECTOR_BOOL mode with elements other than 0/1/-1 means.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #20 from Jakub Jelinek  ---
Pretending a boolean mode has non-boolean value is invalid RTL (or GIMPLE).
So, the rtx-vector-builder.cc change looks wrong to me.
If you want to handle the predicate elements with element values other than all
0s or all 1s some special way, like handling it as a predication, then I think
you should stop pretending it is a boolean mode, use partial int mode or
something similar instead.  But such a change doesn't feel safe for 14.1.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #19 from Christophe Lyon  ---
So basically values such as 0x are not UB and we want to accept them.

I tested:
diff --git a/gcc/rtx-vector-builder.cc b/gcc/rtx-vector-builder.cc
index 9509d9fc453..f89aa717903 100644
--- a/gcc/rtx-vector-builder.cc
+++ b/gcc/rtx-vector-builder.cc
@@ -96,8 +96,6 @@ rtx_vector_builder::find_cached_value ()
        return CONSTM1_RTX (m_mode);
       else if (elt == const0_rtx)
        return CONST0_RTX (m_mode);
-      else
-       gcc_unreachable ();
     }

   /* We can be called before the global vector constants are set up,
diff --git a/gcc/config/arm/arm-mve-builtins.cc
b/gcc/config/arm/arm-mve-builtins.cc
index 6a5775c67e5..6dc0b603dad 100644
--- a/gcc/config/arm/arm-mve-builtins.cc
+++ b/gcc/config/arm/arm-mve-builtins.cc
@@ -2205,7 +2205,13 @@ function_expander::add_input_operand (insn_code icode,
rtx x)
       mode = GET_MODE (x);
     }
   else if (VALID_MVE_PRED_MODE (mode))
-    x = gen_lowpart (mode, x);
+    {
+      if (SUBREG_P (x))
+       /* gen_lowpart on a SUBREG can ICE.  */
+       x = force_reg (GET_MODE (x), x);
+
+      x = gen_lowpart (mode, x);
+    }

   m_ops.safe_grow (m_ops.length () + 1, true);
   create_input_operand (_ops.last (), x, mode);


And it works: we generate
mov r2, #52428 for 0x
mov r3, #43690 for 0x

But I guess removing the call to gcc_unreachable breaks a strong assumption in
many places?

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #18 from avieira at gcc dot gnu.org ---
Sorry to be clear, the 'here' in the last sentence refers to supporting masks
as 0x to control the writing of the output register as the ISA allows,
rather than interpret 0x and 0x as the same mask.

I'll also see if I can propose a change to the ACLE specs to make this clearer.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-29 Thread avieira at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

avieira at gcc dot gnu.org changed:

   What|Removed |Added

 CC||avieira at gcc dot gnu.org

--- Comment #17 from avieira at gcc dot gnu.org ---
Before anything, it might be worth to redefine the testcase to something where
the predicate would have an effect in the result, for instance:

#include 
uint32x4_t test_9() {
  return vdupq_m_n_u32(vdupq_n_u32(0x), 0, 0x);
}

Next, it might be worth pointing out that the ISA does specify what happens
when a predicate mask does not have all bits set for a specific element.
Basically, the predicate mask operates on a per byte basis. Hence 16-bits in
the mask, controlling all 16-bytes in a vector register.

So for the above, the expected output would be {0x, 0x,
0x, 0x}.

Having said that I can see how you'd interpret the ACLE specs as defining such
a mask to be 'UB', but I believe the intent was to make clear that all bits
needed to be set if you wanted to true-predicate the full {32,16}-bit element.
This is the most common use, I can't imagine many users will be manipulating
the mask in such ways.

clang seems to follow this behavior generating an assembly sequence that leads
to the expected output, though they use vpsel probably due to some
canonicalization. And I'd prefer to be consistent with clang here.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #16 from Christophe Lyon  ---
Thanks for the suggestion, this works.
I've posted the patch + testcase:
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650086.html

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #15 from Jakub Jelinek  ---
https://developer.arm.com/documentation/101028/0012/14--M-profile-Vector-Extension--MVE--intrinsics
suggests that it is a UB if all the bits for a single element aren't the same
(i.e. false is all 0s, true is all 1s aka -1, anything else UB).
So the testcase is invalid at runtime, right?
It probably shouldn't error because it can be in dead code, and it certainly
shouldn't ICE, it could emit __builtin_trap or it can just canonicalize it any
way it likes it.

Seems you are canonicalizing that to 1s, given the HW behavior I think it would
be more correct to canonicalize to -1s.

I think you could simply canonicalize on the CONST_INT, so
   else if (VALID_MVE_PRED_MODE (mode))
 {
   if (CONST_INT_P (x) && (mode == V8BImode || mode == V4BImode))
 {
   /* In V8BI or V4BI each element has 2 or 4 bits, if those
  bits aren't all the same, it is UB and gen_lowpart might
  ICE.  Canonicalize all the 2 or 4 bits to all ones if
  any of them is non-zero.  */
   unsigned HOST_WIDE_INT xi = UINTVAL (xi);
   xi |= ((xi & 0x) << 1) | ((xi & 0x) >> 1);
   if (mode == V4BImode)
 xi |= ((xi & 0x) << 2) | ((xi & 0x) >> 2);
   x = gen_int_mode (xi, HImode);
 }
   else if (SUBREG_P (x))
 /* gen_lowpart on a SUBREG can ICE.  */
 x = force_reg (GET_MODE (x), x);
   x = gen_lowpart (mode, x);
 }

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #14 from Christophe Lyon  ---
Created attachment 58050
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58050=edit
patch proposal

Here is the patch I propose.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #13 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #12)
> (In reply to Jakub Jelinek from comment #11)
> > So, tried this under the debugger.  All VALID_MVE_PRED_MODE modes have the
> > same size, 2 bytes, so I'd go with
> >   else if (VALID_MVE_PRED_MODE (mode))
> > {
> >   /* unsigned short arguments to functions get promoted to int, undo
> > that.  */
> >   if (GET_MODE_SIZE (x) != GET_MODE_SIZE (HImode))
> > x = gen_lowpart (HImode, x);
> >   if (GET_MODE (x) != mode)
> > {
> >   /* Nested SUBREGs are invalid.  */
> >   if (SUBREG_P (x))
> > x = force_reg (GET_MODE (x), x);
> >   x = lowpart_subreg (mode, x,
> >   GET_MODE (x) == VOIDmode ? HImode : GET_MODE
> > (x));
> 
> This still crashes with mode == V*BI, because we reach
> rtx_vector_builder::find_cached_value() where elt is not a supported
> constant.

Ah, I was testing just V16BImode and V8BImode, with V16BImode even just
gen_lowpart
on
(const_int -13108 [0x]) works and gives
(const_vector:V16BI [
(const_int 0 [0]) repeated x2
(const_int 1 [0x1]) repeated x2
(const_int 0 [0]) repeated x2
(const_int 1 [0x1]) repeated x2
(const_int 0 [0]) repeated x2
(const_int 1 [0x1]) repeated x2
(const_int 0 [0]) repeated x2
(const_int 1 [0x1]) repeated x2
])
while for V8BImode it gives
(const_vector:V8BI [
(const_int 0 [0])
(const_int -1 [0x])
(const_int 0 [0])
(const_int -1 [0x])
(const_int 0 [0])
(const_int -1 [0x])
(const_int 0 [0])
(const_int -1 [0x])
])
Now, the question is what these weird B2Imode and B4Imode modes are about.
Are they really 2bit and 4bit booleans, with 0 being false, some value (1 or
all ones?) true, everything else UB?  Something else?
The 0x when it is 1 bit per element indeed is what the above V16BImode
CONST_VECTOR is, the 0x with 2 bits per element is 0 or -1 (but, shouldn't
that be UB?),
but with 0x with 4 bits per element it is element 0xc, that doesn't feel
right
for a boolean in any case.
native_decode_vector_rtx for the bool vectors does:
  unsigned int bit_index = first_byte * BITS_PER_UNIT + i * elt_bits;
  unsigned int byte_index = bit_index / BITS_PER_UNIT;
  unsigned int lsb = bit_index % BITS_PER_UNIT;
  unsigned int value = bytes[byte_index] >> lsb;
  builder.quick_push (gen_int_mode (value, GET_MODE_INNER (mode)));
and kind of expects that gen_int_mode canonicalizes it to some boolean value
(apparently it can handle both all ones and 1 as true, but not other values).
93if (elt == const1_rtx)
94  return CONST1_RTX (m_mode);
95else if (elt == constm1_rtx)
96  return CONSTM1_RTX (m_mode);
97else if (elt == const0_rtx)
98  return CONST0_RTX (m_mode);
99else
100 gcc_unreachable ();
Guess you can get similar ICE for V8BImode if some 2 bit pair is 2 (3 and 1 are
considered true, one of them -1, another 1) and 0 is false;
and for V4BImode obviously far more invalid values.
If the CPU somehow canonicalizes, say any non-zero 2-bit pair (or 4-bit pair)
is considered true (or say decide just based on most significant or least
significant bit), then perhaps you need to do that canonicalization by hand.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #12 from Christophe Lyon  ---
(In reply to Jakub Jelinek from comment #11)
> So, tried this under the debugger.  All VALID_MVE_PRED_MODE modes have the
> same size, 2 bytes, so I'd go with
>   else if (VALID_MVE_PRED_MODE (mode))
> {
>   /* unsigned short arguments to functions get promoted to int, undo
> that.  */
>   if (GET_MODE_SIZE (x) != GET_MODE_SIZE (HImode))
> x = gen_lowpart (HImode, x);
>   if (GET_MODE (x) != mode)
> {
>   /* Nested SUBREGs are invalid.  */
>   if (SUBREG_P (x))
> x = force_reg (GET_MODE (x), x);
>   x = lowpart_subreg (mode, x,
>   GET_MODE (x) == VOIDmode ? HImode : GET_MODE
> (x));

This still crashes with mode == V*BI, because we reach
rtx_vector_builder::find_cached_value() where elt is not a supported constant.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #11 from Jakub Jelinek  ---
So, tried this under the debugger.  All VALID_MVE_PRED_MODE modes have the same
size, 2 bytes, so I'd go with
  else if (VALID_MVE_PRED_MODE (mode))
{
  /* unsigned short arguments to functions get promoted to int, undo that. 
*/
  if (GET_MODE_SIZE (x) != GET_MODE_SIZE (HImode))
x = gen_lowpart (HImode, x);
  if (GET_MODE (x) != mode)
{
  /* Nested SUBREGs are invalid.  */
  if (SUBREG_P (x))
x = force_reg (GET_MODE (x), x);
  x = lowpart_subreg (mode, x,
  GET_MODE (x) == VOIDmode ? HImode : GET_MODE
(x));
}
}
It would surprise me if (subreg:V4BI (reg:SI 116 [ _3 ]) 0) gets through
validation given the differences in sizes, but if you really want to handle
that as before, then just:
  else if (VALID_MVE_PRED_MODE (mode))
{
  if (GET_CODE (x) == CONST_INT)
x = lowpart_subreg (mode, gen_lowpart (HImode, x), HImode);
  else if (GET_MODE (x) != mode)
{
  if (SUBREG_P (x))
x = force_reg (GET_MODE (x), x);
  x = gen_lowpart (mode, x);
}
}
i.e. use lowpart_subreg on gen_lowpart on CONST_INTs (gen_lowpart effectively
does
lowpart_subreg (HImode, x, DImode) in that case), the force_reg case to avoid
ICEs if say (subreg:SI (reg:DI ...) 0) is passed and gen_lowpart as before.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #10 from Jakub Jelinek  ---
(mode == HImode || mode == V16BImode) could be GET_MODE_SIZE (mode) == 2 too.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #9 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #7)
> This fails because down the call chain from lowpart_subreg, we reach
> gcc_unreachable in rtx_vector_builder::find_cached_value because m_mode ==
> V4BImode (so is MODE_VECTOR_BOOL), but is not a valid expected boolean
> constant vector.

Ah, so the builtin prototype says the argument is unsigned short (or unsigned
char?)
but mode (as the insn predicate mode) is V*BImode in that case?
Then it would still be passed as SImode, so one would need
if (GET_MODE (x) == SImode || GET_MODE (x) == VOIDmode)
  {
x = lowpart_subreg ((mode == HImode || mode == V16BImode) ? HImode :
QImode, x, SImode);
if (mode != HImode)
  x = lowpart_subreg (mode, x, mode == V16BImode ? HImode : QImode);
  }
Does that second lowpart_subreg create CONST_VECTOR you are looking for?

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #8 from Christophe Lyon  ---
(In reply to Jakub Jelinek from comment #5)
> Guess the primary question is why there is the gen_lowpart call at all.
> Is it that normally the mode of x is already right due to the prototypes of
> the builtins, with the exception that gcc likes to promote QImode/HImode
> arguments of calls to SImode, so is the intent in that case to just narrow
> down SImode back to HImode (seems VALID_MVE_PRED_MODE is only true for
> HImode from scalar MODE_INT modes)?
> 
We have mode == V4BImode (could be V16BI or V8BI, it depends on the intrinsic
being expanded)
and x is HImode.
The intent is to generate:
(set (reg:V4BI 122)
   (subreg:V4BI (reg:SI 116 [ _3 ]) 0))

This works if x is not a constant (this is what we have in trunk)

> If so, best would be to limit the call to just that case.
> So (completely untested):
> --- gcc/config/arm/arm-mve-builtins.cc.jj 2024-03-19 09:51:05.203631194 
> +0100
> +++ gcc/config/arm/arm-mve-builtins.cc2024-04-26 15:49:55.380344060 
> +0200
> @@ -2100,7 +2100,12 @@ function_expander::add_input_operand (in
>mode = GET_MODE (x);
>  }
>else if (VALID_MVE_PRED_MODE (mode))
> -x = gen_lowpart (mode, x);
> +{
> +  if (mode == HImode && GET_MODE (x) != HImode)
> + /* GCC promotes QI/HImode arguments to int, undo that
> +here.  */
> + x = lowpart_subreg (mode, x, SImode);

So we won't enter the 'if' since mode != HImode

> +}
>  
>m_ops.safe_grow (m_ops.length () + 1, true);
>create_input_operand (_ops.last (), x, mode);
> 
> I'd hope if the argument is a vector mode x already has that mode...

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #7 from Christophe Lyon  ---
(In reply to Jakub Jelinek from comment #4)
> (In reply to Christophe Lyon from comment #3)
> > lowpart_subreg does not work either.
> > 
> > If we put the predicates in a variable in the testcase, then in
> > function_expander::add_input_operand() x is already a (subreg/s/v:HI (reg:SI
> > 116 [ _3 ]) 0) so taking gen_lowpart of that is OK (we just want HImode to
> > get the 16 bits of predicates).
> 
> If x could be e.g. (subreg:SI (reg:DI ...) ...), then one needs to use for
> GET_CODE (x) == SUBREG && GET_MODE (x) != mode do a force_reg first.
> 

No sure I got this right:
if (GET_CODE (x) == SUBREG && GET_MODE (x) != mode)
  x = force_reg (mode, x);
breaks the assert in emit_move_insn because mode is V4BImode and 'y' is HImode.

> > But when predicates are passed as a constant as in the testcase, we have
> > x = (const_int -13108 [0x])
> > and gen_lowpart (HImode, x) fails on that.
> 
> Why doesn't lowpart_subreg (mode, x, GET_MODE (x) == VOIDmode ? DImode :
> GET_MODE (x));
> work?
> I.e. for CONST_INTs assume it is some constant in a mode equal or wider than
> mode.
> Unless mode can be TImode or x can be __int128 constants, that is.
>
This fails because down the call chain from lowpart_subreg, we reach
gcc_unreachable in rtx_vector_builder::find_cached_value because m_mode ==
V4BImode (so is MODE_VECTOR_BOOL), but is not a valid expected boolean constant
vector.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #6 from Jakub Jelinek  ---
(In reply to Jakub Jelinek from comment #5)
> So (completely untested):
> --- gcc/config/arm/arm-mve-builtins.cc.jj 2024-03-19 09:51:05.203631194 
> +0100
> +++ gcc/config/arm/arm-mve-builtins.cc2024-04-26 15:49:55.380344060 
> +0200
> @@ -2100,7 +2100,12 @@ function_expander::add_input_operand (in
>mode = GET_MODE (x);
>  }
>else if (VALID_MVE_PRED_MODE (mode))
> -x = gen_lowpart (mode, x);
> +{
> +  if (mode == HImode && GET_MODE (x) != HImode)
> + /* GCC promotes QI/HImode arguments to int, undo that
> +here.  */
> + x = lowpart_subreg (mode, x, SImode);

Perhaps even
  else
gcc_assert (GET_MODE (x) == mode);
here?

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #5 from Jakub Jelinek  ---
Guess the primary question is why there is the gen_lowpart call at all.
Is it that normally the mode of x is already right due to the prototypes of the
builtins, with the exception that gcc likes to promote QImode/HImode arguments
of calls to SImode, so is the intent in that case to just narrow down SImode
back to HImode (seems VALID_MVE_PRED_MODE is only true for HImode from scalar
MODE_INT modes)?

If so, best would be to limit the call to just that case.
So (completely untested):
--- gcc/config/arm/arm-mve-builtins.cc.jj   2024-03-19 09:51:05.203631194
+0100
+++ gcc/config/arm/arm-mve-builtins.cc  2024-04-26 15:49:55.380344060 +0200
@@ -2100,7 +2100,12 @@ function_expander::add_input_operand (in
   mode = GET_MODE (x);
 }
   else if (VALID_MVE_PRED_MODE (mode))
-x = gen_lowpart (mode, x);
+{
+  if (mode == HImode && GET_MODE (x) != HImode)
+   /* GCC promotes QI/HImode arguments to int, undo that
+  here.  */
+   x = lowpart_subreg (mode, x, SImode);
+}

   m_ops.safe_grow (m_ops.length () + 1, true);
   create_input_operand (_ops.last (), x, mode);

I'd hope if the argument is a vector mode x already has that mode...

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #4 from Jakub Jelinek  ---
(In reply to Christophe Lyon from comment #3)
> lowpart_subreg does not work either.
> 
> If we put the predicates in a variable in the testcase, then in
> function_expander::add_input_operand() x is already a (subreg/s/v:HI (reg:SI
> 116 [ _3 ]) 0) so taking gen_lowpart of that is OK (we just want HImode to
> get the 16 bits of predicates).

If x could be e.g. (subreg:SI (reg:DI ...) ...), then one needs to use for
GET_CODE (x) == SUBREG && GET_MODE (x) != mode do a force_reg first.

> But when predicates are passed as a constant as in the testcase, we have
> x = (const_int -13108 [0x])
> and gen_lowpart (HImode, x) fails on that.

Why doesn't lowpart_subreg (mode, x, GET_MODE (x) == VOIDmode ? DImode :
GET_MODE (x));
work?
I.e. for CONST_INTs assume it is some constant in a mode equal or wider than
mode.
Unless mode can be TImode or x can be __int128 constants, that is.

> I'm trying an approach to convert the constant into a vector of constants
> whose mode will V4BImode in this case.

[Bug target/114801] [14/15 Regression] arm: ICE in find_cached_value, at rtx-vector-builder.cc:100 with MVE intrinsics

2024-04-26 Thread clyon at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114801

--- Comment #3 from Christophe Lyon  ---
lowpart_subreg does not work either.

If we put the predicates in a variable in the testcase, then in
function_expander::add_input_operand() x is already a (subreg/s/v:HI (reg:SI
116 [ _3 ]) 0) so taking gen_lowpart of that is OK (we just want HImode to get
the 16 bits of predicates).

But when predicates are passed as a constant as in the testcase, we have
x = (const_int -13108 [0x])
and gen_lowpart (HImode, x) fails on that.

I'm trying an approach to convert the constant into a vector of constants whose
mode will V4BImode in this case.