[Bug tree-optimization/112450] RVV vectorization ICE in vect_get_loop_mask, at tree-vect-loop.cc:11037

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112450

--- Comment #1 from JuzheZhong  ---
Oh, I see: we have cond_xxx patterns for VLS modes such as V64HImode, but we
don't support partial vectorization for VLS modes.

VLS modes are supposed to be used for SIMD GNU vectorization.

As long as COND_XXX is enabled, the loop vectorizer considers the target to
support partial vectorization with masks, and since there is no while_ult, it
goes through the AVX512 partial vectorization path.

It seems that for conditional operations I should use a backend RTL pass to
work around that.

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

--- Comment #2 from JuzheZhong  ---
  if (loop_vinfo
      && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
      && mask_out_inactive)
    {
      if (cond_len_fn != IFN_LAST
          && direct_internal_fn_supported_p (cond_len_fn, vectype,
                                             OPTIMIZE_FOR_SPEED))
        vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num, vectype,
                              1);
      else if (cond_fn != IFN_LAST
               && direct_internal_fn_supported_p (cond_fn, vectype,
                                                  OPTIMIZE_FOR_SPEED))
        vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
                               vectype, NULL);
      else
        {
          if (dump_enabled_p ())
            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                             "can't use a fully-masked loop because no"
                             " conditional operation is available.\n");
          LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
        }
    }

We go through the second condition here and end up calling
vect_record_loop_mask.

It seems that we can't differentiate RVV VLS modes with cond_xxx.

RVV VLS modes only want COND_XXX in order to support

for (int i = 0; i < N; i++)
  out[i] = cond[i] ? a[i] + b[i] : c[i];

where N is a known iteration count.
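
For reference, the kind of loop described above can be written as a small self-contained C function. This is only a sketch: the function name cond_add, the short element type (chosen to match V64HImode), the output array and the trip count of 64 are assumptions made for illustration, not taken from the PR's test case.

```c
#include <assert.h>
#include <stddef.h>

/* Compile-time trip count; 64 is an assumed value matching V64HImode.  */
#define N 64

/* Conditional add over a known number of iterations.  Each element of
   OUT is A[i] + B[i] where COND[i] is nonzero, and C[i] otherwise.  */
void
cond_add (short *out, const short *cond, const short *a,
          const short *b, const short *c)
{
  assert (out && cond && a && b && c);
  for (size_t i = 0; i < N; i++)
    out[i] = cond[i] ? a[i] + b[i] : c[i];
}
```

Because N is a compile-time constant, a loop of this shape needs no loop mask or length control at all; the only masking involved is the per-element condition.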

2023-11-08 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

--- Comment #3 from JuzheZhong  ---
Adding cond_len patterns for VLS modes can work around this bug.
Even though COND_LEN_xxx is not eventually

Testing a patch to fix it.

2023-11-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Target|                            |riscv

--- Comment #4 from Richard Biener  ---
(In reply to JuzheZhong from comment #1)
> Oh. I see we have cond_xxx pattern for VLS modes.
> 
> like V64HImode. But we don't support partial vectorization for VLS modes.
> 
> VLS modes are supposed to be used as SIMD GNU vectorization.
> 
> As long as COND_XXX is enabled, loop vectorizer considers target support
> partial
> vectorization with mask and since no while_ult, then go through AVX512
> partial vectorization.

I think the bug is in the AVX512 code, where it probably lacks some guards.
But in theory even with RVV you can do mask-based vectorization of
partial loops; the AVX512 code doesn't require .WHILE_ULT but instead
uses regular compares.
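
To illustrate the point about regular compares, here is a minimal scalar C sketch of how a loop mask can be derived by comparing each lane index against the number of remaining iterations. The function name loop_mask_from_compare, the bit-per-lane uint64_t representation and the 64-lane limit are assumptions for illustration only; this is not the code GCC emits.

```c
#include <assert.h>
#include <stdint.h>

/* Build a bit-per-lane mask for one vector iteration: lane L is active
   when L < REMAINING.  This models an unsigned less-than compare of an
   index vector {0, 1, 2, ...} against a broadcast of REMAINING.  */
uint64_t
loop_mask_from_compare (unsigned lanes, uint64_t remaining)
{
  assert (lanes >= 1 && lanes <= 64);
  uint64_t mask = 0;
  for (unsigned lane = 0; lane < lanes; lane++)
    if ((uint64_t) lane < remaining)
      mask |= UINT64_C (1) << lane;
  return mask;
}
```

With 8 lanes and 3 iterations remaining, only the low three lanes are active; once at least 8 iterations remain, all lanes are active.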

I don't think you should work around this by disabling RVV patterns here.

I can have a look later at what happens.

> It seems that for conditional operations, I should use backend RTL PASS to
> walk around that.

2023-11-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

--- Comment #5 from JuzheZhong  ---
(In reply to Richard Biener from comment #4)
> (In reply to JuzheZhong from comment #1)
> > Oh. I see we have cond_xxx pattern for VLS modes.
> > 
> > like V64HImode. But we don't support partial vectorization for VLS modes.
> > 
> > VLS modes are supposed to be used as SIMD GNU vectorization.
> > 
> > As long as COND_XXX is enabled, loop vectorizer considers target support
> > partial
> > vectorization with mask and since no while_ult, then go through AVX512
> > partial vectorization.
> 
> I think the bug is in the AVX512 code where it probably lacks some guards.
> But in theory even with RVV you can do mask based vectorization of
> partial loops, the AVX512 code doesn't require .WHILE_ULT but instead
> uses regular compares.
> 
> I don't think you should work around this by disabling RVV patterns here.
> 
> I can have a look later what happens.
> 
> > It seems that for conditional operations, I should use backend RTL PASS to
> > walk around that.

Thanks a lot, Richi.

I was about to either disable the cond_xxx patterns or add cond_len_xxx
patterns to work around this issue.

Actually, we always apply partial vectorization to VLA modes, and we always
use VLS modes for SIMD GNU vectorization.

We enable cond_xxx for VLS modes to handle conditional operations, which
makes use of the match.pd patterns.

Here is an example:
https://godbolt.org/z/csx995anE

You can see that with cond_div on VLS modes we get much better codegen.

Anyway, I really appreciate you taking care of this issue!

2023-11-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-11-09
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #6 from Richard Biener  ---
So in fact RVV, with its single-bit element masks and the ability to
produce them from a V64QImode unsigned LT compare (but not from V64SImode?),
is supposed to be able to handle the "AVX512" style masking as far as
the checking in vect_verify_full_masking_avx512 is concerned.

What I failed to implement (and check) is that the mask types have an
integer mode, thus we run into

  if (known_eq (TYPE_VECTOR_SUBPARTS (rgm->type),
                TYPE_VECTOR_SUBPARTS (vectype)))
    return rgm->controls[index];

  /* Split the vector if needed.  Since we are dealing with integer mode
     masks with AVX512 we can operate on the integer representation
     performing the whole vector shifting.  */
  unsigned HOST_WIDE_INT factor;
  bool ok = constant_multiple_p (TYPE_VECTOR_SUBPARTS (rgm->type),
                                 TYPE_VECTOR_SUBPARTS (vectype), &factor);
  gcc_assert (ok);
  gcc_assert (GET_MODE_CLASS (TYPE_MODE (rgm->type)) == MODE_INT);

It would be fine if we didn't need to split the 64-element mask into
two halves for the V32SImode vector op we need to mask here.

We try to look at a subset of the mask by converting it to a same-size
integer type, right-shifting it, truncating, and converting back to the
mask type.  That might or might not be possible with RVV masks (and might
or might not be the "optimal" way to do things).
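
The convert, shift, truncate sequence described above can be modelled on plain integers. The following C sketch assumes a 64-bit integer-mode mask with one bit per element; the name extract_submask and its parameters are hypothetical and only mirror the roles of vpart and TYPE_VECTOR_SUBPARTS (vectype) in vect_get_loop_mask.

```c
#include <assert.h>
#include <stdint.h>

/* Return the PART-th group of SUB_LANES mask bits from WIDE_MASK.
   Viewing the mask as an integer, shift the wanted bits down to
   position 0 and truncate to the narrower mask width.  */
uint32_t
extract_submask (uint64_t wide_mask, unsigned sub_lanes, unsigned part)
{
  assert (sub_lanes >= 1 && sub_lanes <= 32);
  assert (part < 64 / sub_lanes);
  uint64_t shifted = wide_mask >> (sub_lanes * part);
  uint64_t keep = (UINT64_C (1) << sub_lanes) - 1;
  return (uint32_t) (shifted & keep);
}
```

For the case in this PR, a 64-element mask feeding a 32-element vector op, the two halves would be extract_submask (m, 32, 0) and extract_submask (m, 32, 1). This only works directly when the mask has an integer mode, which is what the assert in vect_get_loop_mask checks.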

We can "fix" this by doing

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a544bc9b059..c7a92354578 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11034,24 +11034,24 @@ vect_get_loop_mask (loop_vec_info loop_vinfo,
       bool ok = constant_multiple_p (TYPE_VECTOR_SUBPARTS (rgm->type),
                                      TYPE_VECTOR_SUBPARTS (vectype), &factor);
       gcc_assert (ok);
-      gcc_assert (GET_MODE_CLASS (TYPE_MODE (rgm->type)) == MODE_INT);
       tree mask_type = truth_type_for (vectype);
-      gcc_assert (GET_MODE_CLASS (TYPE_MODE (mask_type)) == MODE_INT);
       unsigned vi = index / factor;
       unsigned vpart = index % factor;
       tree vec = rgm->controls[vi];
       gimple_seq seq = NULL;
       vec = gimple_build (&seq, VIEW_CONVERT_EXPR,
-                          lang_hooks.types.type_for_mode
-                                (TYPE_MODE (rgm->type), 1), vec);
+                          lang_hooks.types.type_for_size
+                                (GET_MODE_BITSIZE (TYPE_MODE (rgm->type))
+                                 .to_constant (), 1), vec);
       /* For integer mode masks simply shift the right bits into position.  */
       if (vpart != 0)
         vec = gimple_build (&seq, RSHIFT_EXPR, TREE_TYPE (vec), vec,
                             build_int_cst (integer_type_node,
                                            (TYPE_VECTOR_SUBPARTS (vectype)
                                             * vpart)));
-      vec = gimple_convert (&seq, lang_hooks.types.type_for_mode
-                                    (TYPE_MODE (mask_type), 1), vec);
+      vec = gimple_convert (&seq, lang_hooks.types.type_for_size
+                                    (GET_MODE_BITSIZE (TYPE_MODE (mask_type))
+                                     .to_constant (), 1), vec);
       vec = gimple_build (&seq, VIEW_CONVERT_EXPR, mask_type, vec);
       if (seq)
         gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);

which then generates the "expected" partial vector code.  If you don't
want partial vectors for VLS modes then I guess we could also enhance
the vector_modes "iteration" to allow the target to override
--param vect-partial-vector-usage on a per-mode basis.

Or I can simply not "fix" the code above but instead add an integer mode
check to vect_verify_full_masking_avx512.  But as said, in principle this
scheme works.  That fix would be

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index a544bc9b059..0b364ac1c6e 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1462,7 +1462,10 @@ vect_verify_full_masking_avx512 (loop_vec_info loop_vinfo)
       if (!mask_type)
         continue;
 
-      if (TYPE_PRECISION (TREE_TYPE (mask_type)) != 1)
+      /* For now vect_get_loop_mask only supports integer mode masks
+         when we need to split it.  */
+      if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_INT
+          || TYPE_PRECISION (TREE_TYPE (mask_type)) != 1)
         {
           ok = false;
           break;

2023-11-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

--- Comment #7 from JuzheZhong  ---

breakpoint.vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
(gdb) p vectype->type_common.mode
$1 = E_V64HImode

From my observation, it seems to be V64HImode.

I tried your patch locally; it fixes the ICE now.

Thanks!

2023-11-09 Thread juzhe.zhong at rivai dot ai via Gcc-bugs

--- Comment #8 from JuzheZhong  ---

I think RVV won't use vec_pack/vec_unpack for masks, since we always use
len as the loop control.

I think it's fine to just disable it when the target doesn't support split
mask operations, as is the case for RVV.

2023-11-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #9 from Richard Biener  ---
OK, I'll include it in my next round of testing.

2023-11-09 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #10 from Richard Biener  ---
commit 8863a7990e9f0cd49c8900605a2c75a0e8886e85 (origin/master, origin/HEAD)
Author: Richard Biener 
Date:   Thu Nov 9 11:44:07 2023 +0100

    tree-optimization/112450 - avoid AVX512 style masking for BImode masks

    The following avoids running into the AVX512 style masking code for
    RVV which would theoretically be able to handle it if I were not
    relying on integer mode maskness in vect_get_loop_mask.  While that's
    easy to fix (patch in PR), the preference is to not have AVX512 style
    masking for RVV, thus the following.

            * tree-vect-loop.cc (vect_verify_full_masking_avx512):
            Check we have integer mode masks as required by
            vect_get_loop_mask.

2023-11-09 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

--- Comment #11 from CVS Commits  ---
The master branch has been updated by Pan Li :

https://gcc.gnu.org/g:83f66d90af69837f7c8fc88f8afb7074d4555394

commit r14-5278-g83f66d90af69837f7c8fc88f8afb7074d4555394
Author: Juzhe-Zhong 
Date:   Thu Nov 9 20:00:38 2023 +0800

RISC-V: Add PR112450 test to avoid regression

The ICE has been fixed by Richard:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112450.

Add a test to avoid future regressions. Committed.

PR target/112450

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr112450.c: New test.

2023-11-12 Thread pinskia at gcc dot gnu.org via Gcc-bugs

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |14.0