Re: [PATCH v5] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-07 Thread juzhe.zh...@rivai.ai
I am not sure whether loads/stores of FP16 vectors should be gated by ZVFHMIN or by ZVFH.
IMHO, loads/stores of FP16 are no different from loads/stores of INT16.
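
The point about loads and stores can be illustrated with a small scalar sketch in C. This is not GCC code: `copy_lanes16` and `fp16_bits_survive_copy` are hypothetical helpers, used only to show that moving 16-bit lanes never interprets their bits, so the same copy serves FP16 and INT16 payloads alike.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy N 16-bit lanes from SRC to DST.
   The lanes are treated as raw bits, which is all a load/store with
   16-bit elements has to do, whatever the element type is.  */
static void
copy_lanes16 (uint16_t *dst, const uint16_t *src, size_t n)
{
  assert (dst != NULL && src != NULL);
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i];
}

/* Return 1 if a small sample of FP16 bit patterns survives the copy.  */
static int
fp16_bits_survive_copy (void)
{
  /* IEEE binary16 encodings: 1.0, -2.0, +inf, a quiet NaN.  */
  const uint16_t src[4] = { 0x3C00, 0xC000, 0x7C00, 0x7E00 };
  uint16_t dst[4] = { 0, 0, 0, 0 };

  copy_lanes16 (dst, src, 4);
  for (size_t i = 0; i < 4; i++)
    if (dst[i] != src[i])
      return 0;
  return 1;
}
```

Whether the compiler should gate such moves on ZVFHMIN or ZVFH is still the open question of this message; the sketch only shows why the data movement itself is type-agnostic.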



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-07 16:06
To: gcc-patches
CC: juzhe.zhong; rdapp.gcc; jeffreyalaw; pan2.li; yanzhang.wang
Subject: [PATCH v5] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
From: Pan Li 
 
This patch refactors the requirements of both ZVFH and ZVFHMIN.
The related define_insn patterns and iterators now take their
requirements from ZVFHMIN and ZVFH.
 
Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Add requirement to VF,
VWEXTF and VWCONVERTI, add V_CONVERT_F and VCONVERTF.
* config/riscv/vector.md: Adjust FP convert to V_CONVERT_F
and VCONVERTF, and fix V_WHOLE and V_FRACT.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.
---
gcc/config/riscv/vector-iterators.md  | 79 +--
gcc/config/riscv/vector.md| 46 +--
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 25 ++
3 files changed, 104 insertions(+), 46 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f4946d84449..e6c2ecf7c86 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -296,13 +296,13 @@ (define_mode_iterator VWI_ZVE32 [
])
(define_mode_iterator VF [
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_ZVFH")
+  (VNx4HF "TARGET_ZVFH")
+  (VNx8HF "TARGET_ZVFH")
+  (VNx16HF "TARGET_ZVFH")
+  (VNx32HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -453,9 +453,8 @@ (define_mode_iterator V_WHOLE [
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN == 64")
   (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
   (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
@@ -477,7 +476,11 @@ (define_mode_iterator V_WHOLE [
(define_mode_iterator V_FRACT [
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI (VNx4QI "TARGET_MIN_VLEN > 32") 
(VNx8QI "TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") (VNx2HI "TARGET_MIN_VLEN > 32") (VNx4HI 
"TARGET_MIN_VLEN >= 128")
-  (VNx1HF "TARGET_MIN_VLEN < 128") (VNx2HF "TARGET_MIN_VLEN > 32") (VNx4HF 
"TARGET_MIN_VLEN >= 128")
+
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+
   (VNx1SI "TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN < 128") (VNx2SI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32 && TARGET_MIN_VLEN 
< 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -497,12 +500,12 @@ (define_mode_iterator VWEXTI [
])
(define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF &q

Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-07 Thread juzhe.zh...@rivai.ai
Hi, Richi.
Thanks for the review.

>> At least for VMAT_GATHER_SCATTER you wouldn't execute this function
>> but get into
>>This function belongs to tree-vect-data-refs.cc alongside the
>>other vect_create_data_ref_* functions.
I want the data reference pointers to be adjusted by the outcome of SELECT_VL
for contiguous load/store, gather load/scatter store, and load lanes/store lanes.

For contiguous load/store:
void f (int32_t *a, int32_t *b, int32_t *cond, int n)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i] = a[i] + b[i];
}

The optimal gimple IR should be:
...
len = SELECT_VL (VF)
...
v = LEN_MASK_LOAD (len, ptr1, mask)
v2 = LEN_MASK_LOAD (len, ptr2, mask)
v2 = v + v2;
LEN_MASK_STORE (v2, len, ptr3, mask)
ptr1 = ptr1 + len * 4 (adjust to byte).
ptr2 = ptr2 + len * 4 (adjust to byte).
ptr3 = ptr3 + len * 4 (adjust to byte).
...
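
As a rough scalar model of the IR above, the sketch below lets a length chosen per iteration drive both the work done and the pointer bumps. `select_vl` here is a hypothetical stand-in, not the GCC internal function: its policy (split the tail evenly when fewer than 2 * VF elements remain) is one choice the RVV specification permits for `vsetvli`, picked only to show a non-final iteration processing fewer than VF elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for SELECT_VL: choose how many elements to
   process this iteration, given REMAINING elements and at most VF
   elements per iteration.  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  assert (vf > 0);
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;   /* Balance the last two iterations.  */
  return vf;
}

/* Scalar model of the conditional add loop.  Returns the number of
   iterations taken.  */
static size_t
cond_add (int32_t *a, const int32_t *b, const int32_t *cond,
          size_t n, size_t vf)
{
  size_t iters = 0;
  while (n > 0)
    {
      size_t len = select_vl (n, vf);
      /* Models LEN_MASK_LOAD, the add, and LEN_MASK_STORE.  */
      for (size_t i = 0; i < len; i++)
        if (cond[i])
          a[i] += b[i];
      /* Pointer bumps: len elements, i.e. len * 4 bytes for int32_t.  */
      a += len;
      b += len;
      cond += len;
      n -= len;
      iters++;
    }
  return iters;
}
```

With n = 10 and VF = 4, this policy processes 4, 3 and 3 elements, so the pointer bump cannot be a loop-invariant VF * 4 and has to be computed from len inside the loop.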

For gather/scatter:
void f (int32_t *a, int32_t *b, int32_t *cond, int n, int m)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      a[i * m + n] = a[i * m + n] + b[i];
}

The optimal gimple IR should be:
...
len = SELECT_VL (VF)
...
v = LEN_MASK_GATHER_LOAD (len, ptr1, mask)
v2 = LEN_MASK_GATHER_LOAD (len, ptr2, mask)
v2 = v + v2;
LEN_MASK_SCATTER_STORE (v2, len, ptr3, mask)
ptr1 = ptr1 + len * 4 (adjust to byte).
ptr2 = ptr2 + len * 4 (adjust to byte).
ptr3 = ptr3 + len * 4 (adjust to byte).
...

load_lanes/store_lanes are similar.

Could you share more details on how I should write this code?


>>... before the loop we compute 'bump' and that you do not touch?  That
>>means you do not support ncopies != 1?  Can you place an assert
>>in the else branch of the if (j == 0) that LOOP_VINFO_USING_SELECT_VL_P
>>is false?
Ok.

>>because indeed you have to place the computation inside of the loop.
>>And also in the place we compute 'bump' add the case
>>LOOP_VINFO_USING_SELECT_VL_P initializing bump to NULL_TREE and
>>a comment indicating the variable bump is computed inside the loop.
>>For the case of ncopies != 1 how are you dealing with that?  I suppose
>>for j != ncopies - 1 the bump is actually exactly VF?
Yes, we only do the SELECT_VL for single-rgroup, so ncopies should always be 1.
For ncopies != 1, we always use VF.


Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-07 15:41
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH V3] VECT: Add SELECT_VL support
On Mon, 5 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> From: Ju-Zhe Zhong 
> 
> Co-authored-by: Richard Sandiford
> 
> This patch addresses comments from Richard and rebases to trunk.
> 
> This patch adds SELECT_VL middle-end support, which
> allows a target to apply target-dependent optimizations to the
> length calculation.
> 
> This patch is inspired by RVV ISA and LLVM:
> https://reviews.llvm.org/D99750
> 
> SELECT_VL behaves like LLVM's "get_vector_length", with
> the following properties:
> 
> 1. Only applies to single-rgroup loops.
> 2. Non-SLP only.
> 3. Adjusts the loop control IV.
> 4. Adjusts the data reference IV.
> 5. Allows processing fewer than VF elements in a non-final iteration.
> 
> Code:
> # void vvaddint32(size_t n, const int*x, const int*y, int*z)
> # { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
>
> Take RVV codegen for example:
> 
> Before this patch:
> vvaddint32:
> ble a0,zero,.L6
> csrr a4,vlenb
> srli a6,a4,2
> .L4:
> mv  a5,a0
> bleu a0,a6,.L3
> mv  a5,a6
> .L3:
> vsetvli zero,a5,e32,m1,ta,ma
> vle32.v v2,0(a1)
> vle32.v v1,0(a2)
> vsetvli a7,zero,e32,m1,ta,ma
> sub a0,a0,a5
> vadd.vv v1,v1,v2
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a3)
> add a2,a2,a4
> add a3,a3,a4
> add a1,a1,a4
> bne a0,zero,.L4
> .L6:
> ret
> 
> After this patch:
> 
> vvaddint32:
> vsetvli t0, a0, e32, ta, ma  # Set vector length based on 32-bit vectors
> vle32.v v0, (a1) # Get first vector
>   sub a0, a0, t0 # Decrement number done
>   slli t0, t0, 2 # Multiply number done by 4 bytes
>   add a1, a1, t0 # Bump pointer
> vle32.v v1, (a2) # Get second vector
>   add a2, a2, t0 # Bump pointer
> vadd.vv v2, v0, v1   # Sum vectors
> vse32.v v2, (a3) # Store result
>   add a3, a3, t0 # Bump pointer
>   bnez a0, vvaddint32# Loop back
>   ret# Finished
> 
> gcc/ChangeLog:
> 
> * doc/md.texi: Add SELECT_VL support.
> * internal-fn.def (SELECT_VL): Ditto.
> * optabs.def (OPTAB_D): Ditto.
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Ditto.
> * tree-v

Re: [PATCH v3] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.

2023-06-06 Thread juzhe.zh...@rivai.ai
Hi,

+  (VNx1SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8SF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")

Please add TARGET_VECTOR_ELEN_FP_32 here: for the FP16->FP32 conversion,
we need both ELEN_FP16 and ELEN_FP32 to be enabled.





juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-07 11:00
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v3] RISC-V: Refactor requirement of ZVFH and ZVFHMIN.
From: Pan Li 
 
This patch refactors the requirements of both ZVFH and ZVFHMIN.
The related define_insn patterns and iterators now take their
requirements from ZVFHMIN and ZVFH.
 
Please note that ZVFH covers the ZVFHMIN instructions. This patch
adds one test for this.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Add requirement to VF,
VWEXTF and VWCONVERTI, add V_CONVERT_F and VCONVERTF.
* config/riscv/vector.md: Adjust FP convert to V_CONVERT_F
and VCONVERTF.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c: New test.
---
gcc/config/riscv/vector-iterators.md  | 68 +--
gcc/config/riscv/vector.md| 46 ++---
.../riscv/rvv/base/zvfh-over-zvfhmin.c| 25 +++
3 files changed, 97 insertions(+), 42 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-over-zvfhmin.c
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index f4946d84449..1dc82bd44d3 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -296,13 +296,13 @@ (define_mode_iterator VWI_ZVE32 [
])
(define_mode_iterator VF [
-  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
-  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
-  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
-  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
+  (VNx1HF "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_ZVFH")
+  (VNx4HF "TARGET_ZVFH")
+  (VNx8HF "TARGET_ZVFH")
+  (VNx16HF "TARGET_ZVFH")
+  (VNx32HF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
@@ -497,12 +497,12 @@ (define_mode_iterator VWEXTI [
])
(define_mode_iterator VWEXTF [
-  (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
-  (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
-  (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN > 32")
-  (VNx32SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
+  (VNx1SF "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
+  (VNx2SF "TARGET_ZVFH")
+  (VNx4SF "TARGET_ZVFH")
+  (VNx8SF "TARGET_ZVFH")
+  (VNx16SF "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+  (VNx32SF "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
@@ -512,12 +512,12 @@ (define_mode_iterator VWEXTF [
])
(define_mode_iterator VWCONVERTI [
-  (VNx1SI "TARGET_MIN_VLEN < 128 && TARGET_VECTOR_ELEN_FP_16")
-  (VNx2SI "TARGET_VECTOR_ELEN_FP_16")
-  (VNx4SI "TARGET_VECTOR_ELEN_FP_16")
-  (VNx8SI "TARGET_VECTOR_ELEN_FP_16")
-  (VNx16SI "TARGET_MIN_VLEN > 32 && TARGET_VECTOR_ELEN_FP_16")
-  (VNx32SI "TARGET_MIN_VLEN >= 128 && TARGET_VECTOR_ELEN_FP_16")
+  (VNx1SI "TARGET_ZVFH && TARGET_MIN_VLEN < 128")
+  (VNx2SI "TARGET_ZVFH")
+  (VNx4SI "TARGET_ZVFH")
+  (VNx8SI "TARGET_ZVFH")
+  (VNx16SI "TARGET_ZVFH && TARGET_MIN_VLEN > 32")
+  (VNx32SI "TARGET_ZVFH && TARGET_MIN_VLEN >= 128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32 && 
TARGET_MIN_VLEN < 128")
   (VNx2DI "TARGET_VECTOR_ELEN_64 && TARGET_VECTOR_ELEN_FP_32")
@@ -526,6 +526,21 @@ (define_mode_iterator VWCONVERTI [
   (VNx16DI "TARGET_VECTOR_ELEN_64 &am

Re: Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe.zh...@rivai.ai
Hi, thanks Kito.

I have added comments as you suggested.

>> Do we have check builder.npatterns () must be power of 2 in somewhere?
I also added:
  /* We don't enable SLP for non-power of 2 NPATTERNS.  */
  if (!pow2p_hwi (d->perm.encoding().npatterns ()))
    return false;

too, to make sure we won't break and cause an ICE.

I will commit it soon.
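
For readers unfamiliar with the guard quoted above, here is a standalone sketch of a power-of-two test and of the early exit built on it. `is_power_of_2` and `slp_perm_supported_p` are illustrative names, not GCC functions, and this helper deliberately rejects zero, which may differ from how GCC's `pow2p_hwi` treats zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standalone illustration, not GCC's pow2p_hwi.
   True only for 1, 2, 4, 8, ...  */
static bool
is_power_of_2 (uint64_t x)
{
  /* A power of two has exactly one bit set, so clearing the lowest
     set bit with x & (x - 1) must leave zero.  */
  return x != 0 && (x & (x - 1)) == 0;
}

/* Model of the early exit: refuse to handle a permutation whose
   NPATTERNS is not a power of two.  */
static bool
slp_perm_supported_p (uint64_t npatterns)
{
  if (!is_power_of_2 (npatterns))
    return false;
  return true;
}
```

With this guard, NPATTERNS = 8 is accepted while NPATTERNS = 6 is rejected before any expansion code runs.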


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-07 10:38
To: juzhe.zh...@rivai.ai
CC: gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw; Robin Dapp; pan2.li
Subject: Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
A few comments, but all of them are asking for more comments :P
 
> @@ -398,6 +410,48 @@ rvv_builder::get_merge_scalar_mask (unsigned int 
> index_in_pattern) const
>return gen_int_mode (mask, inner_int_mode ());
>  }
>
> +/* Return true if the variable-length vector is single step.  */
> +bool
> +rvv_builder::single_step_npatterns_p () const
 
What is single_step_npatterns? Could you add more comments?
 
> +{
> +  if (nelts_per_pattern () != 3)
> +return false;
> +
> +  poly_int64 step
> += rtx_to_poly_int64 (elt (npatterns ())) - rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 0; i < npatterns (); i++)
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (i));
> +  poly_int64 ele1 = rtx_to_poly_int64 (elt (npatterns () + i));
> +  poly_int64 ele2 = rtx_to_poly_int64 (elt (npatterns () * 2 + i));
> +  poly_int64 diff1 = ele1 - ele0;
> +  poly_int64 diff2 = ele2 - ele1;
> +  if (maybe_ne (step, diff1) || maybe_ne (step, diff2))
> +   return false;
> +}
> +  return true;
> +}
> +
> +/* Return true if all elements of NPATTERNS are equal.
> +
> +   E.g. NPATTERNS = 4:
> + { 2, 2, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, ... }
> +   E.g. NPATTERNS = 8:
> + { 2, 2, 2, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 8, 8, 8, ... }
> +*/
> +bool
> +rvv_builder::npatterns_all_equal_p () const
> +{
> +  poly_int64 ele0 = rtx_to_poly_int64 (elt (0));
> +  for (unsigned int i = 1; i < npatterns (); i++)
> +{
> +  poly_int64 ele = rtx_to_poly_int64 (elt (i));
> +  if (!known_eq (ele, ele0))
> +   return false;
> +}
> +  return true;
> +}
> +
>  static unsigned
>  get_sew (machine_mode mode)
>  {
> @@ -425,7 +479,7 @@ const_vec_all_same_in_range_p (rtx x, HOST_WIDE_INT 
> minval,
> future.  */
>
>  static bool
> -const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, HOST_WIDE_INT 
> maxval)
> +const_vec_all_in_range_p (rtx vec, poly_int64 minval, poly_int64 maxval)
>  {
>if (!CONST_VECTOR_P (vec)
>|| GET_MODE_CLASS (GET_MODE (vec)) != MODE_VECTOR_INT)
> @@ -440,8 +494,10 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, 
> HOST_WIDE_INT maxval)
>for (int i = 0; i < nunits; i++)
>  {
>rtx vec_elem = CONST_VECTOR_ELT (vec, i);
> -  if (!CONST_INT_P (vec_elem)
> - || !IN_RANGE (INTVAL (vec_elem), minval, maxval))
> +  poly_int64 value;
> +  if (!poly_int_rtx_p (vec_elem, &value)
> + || maybe_lt (value, minval)
> + || maybe_gt (value, maxval))
> return false;
>  }
>return true;
> @@ -453,7 +509,7 @@ const_vec_all_in_range_p (rtx vec, HOST_WIDE_INT minval, 
> HOST_WIDE_INT maxval)
> future.  */
>
>  static rtx
> -gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
> +gen_const_vector_dup (machine_mode mode, poly_int64 val)
>  {
>rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
>return gen_const_vec_duplicate (mode, c);
> @@ -727,7 +783,10 @@ emit_vlmax_gather_insn (rtx target, rtx op, rtx sel)
>rtx elt;
>insn_code icode;
>machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>  {
>icode = code_for_pred_gather_scalar (data_mode);
>sel = elt;
> @@ -744,7 +803,10 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, 
> rtx sel, rtx mask)
>rtx elt;
>insn_code icode;
>machine_mode data_mode = GET_MODE (target);
> -  if (const_vec_duplicate_p (sel, &elt))
> +  machine_mode sel_mode = GET_MODE (sel);
> +  if (maybe_ne (GET_MODE_SIZE (data_mode), GET_MODE_SIZE (sel_mode)))
> +icode = code_for_pred_gatherei16 (data_mode);
> +  else if (const_vec_duplicate_p (sel, &elt))
>  {
>icode = code_for_pred_gather_scalar (data_mode);
>sel = elt;
> @@ -895,11 +957,130 @@ expand_const_vector

Re: Re: [PATCH] RISC-V: Fix ICE when include riscv_vector.h with rv64gcv

2023-06-06 Thread juzhe.zh...@rivai.ai
LGTM.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-07 10:22
To: pan2.li
CC: gcc-patches; juzhe.zhong; kito.cheng; yanzhang.wang; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix ICE when include riscv_vector.h with rv64gcv
lgtm, thanks for fixing this :)
 
On Wed, Jun 7, 2023 at 10:19 AM Pan Li via Gcc-patches
 wrote:
>
> From: Pan Li 
>
> This patch fixes the incorrect requirement of the vector
> builtin types for the ZVFH/ZVFHMIN extension. The incorrect requirement
> results in the ops mismatching the iterators, and an ICE is then
> triggered if ZVFH/ZVFHMIN is not given.
>
> Sorry for the inconvenience.
>
> Signed-off-by: Pan Li 
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-types.def
> (vfloat32mf2_t): Take RVV_REQUIRE_ELEN_FP_16 as requirement.
> (vfloat32m1_t): Ditto.
> (vfloat32m2_t): Ditto.
> (vfloat32m4_t): Ditto.
> (vfloat32m8_t): Ditto.
> (vint16mf4_t): Ditto.
> (vint16mf2_t): Ditto.
> (vint16m1_t): Ditto.
> (vint16m2_t): Ditto.
> (vint16m4_t): Ditto.
> (vint16m8_t): Ditto.
> (vuint16mf4_t): Ditto.
> (vuint16mf2_t): Ditto.
> (vuint16m1_t): Ditto.
> (vuint16m2_t): Ditto.
> (vuint16m4_t): Ditto.
> (vuint16m8_t): Ditto.
> (vint32mf2_t): Ditto.
> (vint32m1_t): Ditto.
> (vint32m2_t): Ditto.
> (vint32m4_t): Ditto.
> (vint32m8_t): Ditto.
> (vuint32mf2_t): Ditto.
> (vuint32m1_t): Ditto.
> (vuint32m2_t): Ditto.
> (vuint32m4_t): Ditto.
> (vuint32m8_t): Ditto.
> ---
>  .../riscv/riscv-vector-builtins-types.def | 66 +--
>  1 file changed, 33 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
> b/gcc/config/riscv/riscv-vector-builtins-types.def
> index bd3deae8340..589ea532727 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-types.def
> +++ b/gcc/config/riscv/riscv-vector-builtins-types.def
> @@ -518,23 +518,23 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
>  DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
>  DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
>
> -DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | 
> RVV_REQUIRE_MIN_VLEN_64)
> -DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> -DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> -DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> -DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
> +DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_16 | 
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_16)
>
>  DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
>  DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
>  DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
>  DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
>
> -DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> -DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
> RVV_REQUIRE_MIN_VLEN_64)
> +DEF_RVV_CONVERT_I_OPS (vint16mf2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_CONVERT_I_OPS (vint16m1_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_CONVERT_I_OPS (vint16m2_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_CONVERT_I_OPS (vint16m4_t, RVV_REQUIRE_ELEN_FP_16)
> +DEF_RVV_CONVERT_I_OPS (vint16m8_t, RVV_REQUIRE_ELEN_FP_16)
>
>  DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
>  DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
> @@ -546,12 +546,12 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
>  DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
>  DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
>
> -DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
> -DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
> -DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
> +DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, RVV_REQUIRE_ELEN_FP_16 | 
> RVV_REQUIRE_

Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe.zh...@rivai.ai
Ping for this patch. OK for trunk?
The following patches are blocked by this one.



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-06 12:16
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; 
Juzhe-Zhong
Subject: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
From: Juzhe-Zhong 
 
This patch enables basic VLA SLP auto-vectorization.
Consider the following case:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8 + 0] = b[i * 8 + 7] + 1;
      a[i * 8 + 1] = b[i * 8 + 7] + 2;
      a[i * 8 + 2] = b[i * 8 + 7] + 8;
      a[i * 8 + 3] = b[i * 8 + 7] + 4;
      a[i * 8 + 4] = b[i * 8 + 7] + 5;
      a[i * 8 + 5] = b[i * 8 + 7] + 6;
      a[i * 8 + 6] = b[i * 8 + 7] + 7;
      a[i * 8 + 7] = b[i * 8 + 7] + 3;
    }
}
 
To enable VLA SLP auto-vectorization, we should be able to handle the 
following const vectors:
 
1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
{ 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 
16, ... }
 
2. NPATTERNS = 8, NELTS_PER_PATTERN = 1. 
{ 1, 2, 8, 4, 5, 6, 7, 3, ... }
 
These vectors can be generated in the prologue.
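
The two constants above can be modeled in plain C. The sketch below is a simplified model of the NPATTERNS / NELTS_PER_PATTERN encoding, not GCC's implementation, and all function names are hypothetical. It assumes that with NELTS_PER_PATTERN = 3 each pattern continues as an arithmetic series whose step is the difference between its third and second elements. It also shows the closed form the prologue uses for the first constant and how the eight byte lanes of the second constant pack into one 64-bit scalar for a broadcast.

```c
#include <assert.h>
#include <stdint.h>

/* Element I of a vector encoded as NPATTERNS interleaved patterns with
   three leading elements each.  ENC holds the 3 * NPATTERNS encoded
   elements; later elements extend each pattern by its own step.  */
static int64_t
stepped_elt (const int64_t *enc, unsigned npatterns, unsigned i)
{
  assert (npatterns > 0);
  unsigned pattern = i % npatterns;
  unsigned idx = i / npatterns;   /* Position within the pattern.  */
  if (idx < 3)
    return enc[idx * npatterns + pattern];
  int64_t e1 = enc[1 * npatterns + pattern];
  int64_t e2 = enc[2 * npatterns + pattern];
  return e2 + (int64_t) (idx - 2) * (e2 - e1);
}

/* Closed form for the first constant: (vid >> 3) * 8.  */
static int64_t
index_via_vid (unsigned i)
{
  return (int64_t) (i >> 3) * 8;
}

/* Second constant: pack eight byte lanes, lowest lane in the lowest
   byte, into one 64-bit scalar.  */
static uint64_t
pack8 (const uint8_t bytes[8])
{
  uint64_t v = 0;
  for (int i = 7; i >= 0; i--)
    v = (v << 8) | bytes[i];
  return v;
}
```

Packing { 1, 2, 8, 4, 5, 6, 7, 3 } this way gives 0x0307060504080201, which is the value that (50790400 + 1541) shifted left by 32 plus (67633152 + 513) evaluates to.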
 
After this patch, we end up with the following codegen:
 
Prologue:
...
vsetvli a7,zero,e16,m2,ta,ma
vid.v   v4
vsrl.vi v4,v4,3
li  a3,8
vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 
8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
...
li  t1,67633152
addi t1,t1,513
li  a3,50790400
addi a3,a3,1541
slli a3,a3,32
add a3,a3,t1
vsetvli t1,zero,e64,m1,ta,ma
vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
...
LoopBody:
...
min a3,...
vsetvli zero,a3,e8,m1,ta,ma
vle8.v  v2,0(a6)
vsetvli a7,zero,e8,m1,ta,ma
vrgatherei16.vv v1,v2,v4
vadd.vv v1,v1,v3
vsetvli zero,a3,e8,m1,ta,ma
vse8.v  v1,0(a2)
add a6,a6,a4
add a2,a2,a4
mv  a3,a5
add a5,a5,t1
bgtu a3,a4,.L3
...
 
Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8,
since with 8-bit indices "vrgather.vv" can only address element indices up to 255,
while "vrgatherei16.vv" covers a larger range.
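
The index-range argument in the note can be written down as a small check. This is only an illustration of the arithmetic: the helper names are hypothetical, and the patch itself may select the instruction by a different test (for example by comparing the modes of the data and the selector).

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest element index that an unsigned index of SEW_BITS bits can hold.  */
static uint64_t
max_index_for_sew (unsigned sew_bits)
{
  return sew_bits >= 64 ? UINT64_MAX : ((UINT64_C (1) << sew_bits) - 1);
}

/* Hypothetical decision helper: true if SEW_BITS-wide indices cannot
   address all NUNITS elements, so wider (16-bit) indices are needed.  */
static bool
needs_wider_index (unsigned sew_bits, uint64_t nunits)
{
  return nunits > 0 && nunits - 1 > max_index_for_sew (sew_bits);
}
```

For SEW = 8, a vector of 256 elements is still addressable with 8-bit indices, but 512 elements are not, while 16-bit indices handle both.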
Epilogue:
lbu a5,799(a1)
addiw   a4,a5,1
sb  a4,792(a0)
addiw   a4,a5,2
sb  a4,793(a0)
addiw   a4,a5,8
sb  a4,794(a0)
addiw   a4,a5,4
sb  a4,795(a0)
addiw   a4,a5,5
sb  a4,796(a0)
addiw   a4,a5,6
sb  a4,797(a0)
addiw   a4,a5,7
sb  a4,798(a0)
addiw   a5,a5,3
sb  a5,799(a0)
ret
 
One last thing remains to be done: "Epilogue auto-vectorization", 
which needs VLS modes support.
I will add VLS modes support for "Epilogue auto-vectorization" in the future.
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (expand_vec_perm_const): New function.
* config/riscv/riscv-v.cc 
(rvv_builder::can_duplicate_repeating_sequence_p): Support POLY handling.
(rvv_builder::single_step_npatterns_p): New function.
(rvv_builder::npatterns_all_equal_p): Ditto.
(const_vec_all_in_range_p): Support POLY handling.
(gen_const_vector_dup): Ditto.
(emit_vlmax_gather_insn): Add vrgatherei16.
(emit_vlmax_masked_gather_mu_insn): Ditto.
(expand_const_vector): Add VLA SLP const vector support.
(expand_vec_perm): Support POLY.
(struct expand_vec_perm_d): New struct.
(shuffle_generic_patterns): New function.
(expand_vec_perm_const_1): Ditto.
(expand_vec_perm_const): Ditto.
* config/riscv/riscv.cc (riscv_vectorize_vec_perm_const): Ditto.
(TARGET_VECTORIZE_VEC_PERM_CONST): New targethook.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/scalable-1.c: Adapt testcase for VLA 
vectorizer.
* gcc.target/riscv/rvv/autovec/v-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x_zvl128b-1.c: Ditto.
* gcc.target/riscv/rvv/autovec/partial/slp-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-5.c: New test.
* gcc.target/riscv/rvv/autovec/partial/slp-6.c: New t

Re: Re: [PATCH V3] VECT: Change flow of decrement IV

2023-06-06 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.
Recently, we ran some experiments on our downstream RVV LLVM.
We changed "get_vector_length" (the same IR as GCC's "select_vl") into "umin", and it turns 
out that LLVM's SCEV analysis then succeeds.
The unroll pass also works in LLVM.
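
One plausible reading of why the "umin" form helps the analysis (an assumption, not something verified against LLVM here) is that the per-iteration length becomes a known function of the remaining count, so the trip count has a closed form. The C sketch below, with hypothetical helper names, compares the iterated loop against that closed form.

```c
#include <assert.h>
#include <stddef.h>

static size_t
umin (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Iterations of a length-controlled loop in which each step processes
   umin (remaining, vf) elements.  */
static size_t
trip_count_umin (size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iters = 0;
  while (n > 0)
    {
      n -= umin (n, vf);
      iters++;
    }
  return iters;
}

/* Closed form for the same loop: ceil (n / vf).  */
static size_t
trip_count_closed_form (size_t n, size_t vf)
{
  assert (vf > 0);
  return n / vf + (n % vf != 0);
}
```

An opaque length query offers no such formula, since the length it returns in a non-final iteration is up to the target.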

I think Richard's suggestion is very reasonable.

I haven't had time to look at how LLVM does this now, but I will take a look at 
it in the future.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-02 19:43
To: juzhe.zhong@rivai.ai
CC: rguenther; gcc-patches; linkw
Subject: Re: [PATCH V3] VECT: Change flow of decrement IV
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richi. I am going to merge it after Richard's final approval.
 
Thanks for checking, but no need to wait for a second ack from me!
Please go ahead and commit.
 
Richard
 


Re: Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization

2023-06-06 Thread juzhe.zh...@rivai.ai
I compared the codegen with aarch64:
https://godbolt.org/z/xTxPGcYMj 

asm of aarch64:
f:
add x5, x1, 7
mov x2, 0
cntb    x4
ptrue   p7.b, all
adrp    x3, .LC0
add x3, x3, :lo12:.LC0
index   z30.b, #0, #8
ld1rd   z29.d, p7/z, [x3]
zip1    z30.b, z30.b, z30.b
mov x3, 792
zip1    z30.b, z30.b, z30.b
whilelo p7.b, xzr, x3
zip1    z30.b, z30.b, z30.b
.L2:
ld1b    z31.b, p7/z, [x5, x2]
tbl z31.b, z31.b, z30.b
add z31.b, z31.b, z29.b
st1b    z31.b, p7, [x0, x2]
add x2, x2, x4
whilelo p7.b, x2, x3
b.any   .L2
Epilogue:
ldr b31, [x1, 799]
adrp    x1, .LC1
ldr d30, [x1, #:lo12:.LC1]
dup v31.8b, v31.b[0]
add v31.8b, v31.8b, v30.8b
str d31, [x0, 792]
ret

Here you can see that aarch64 uses Advanced SIMD (NEON) to vectorize the rest of the code (the epilogue).
With this RVV patch:
> Epilogue:
> lbu a5,799(a1)
> addiw   a4,a5,1
> sb  a4,792(a0)
> addiw   a4,a5,2
> sb  a4,793(a0)
> addiw   a4,a5,8
> sb  a4,794(a0)
> addiw   a4,a5,4
> sb  a4,795(a0)
> addiw   a4,a5,5
> sb  a4,796(a0)
> addiw   a4,a5,6
> sb  a4,797(a0)
> addiw   a4,a5,7
> sb  a4,798(a0)
> addiw   a5,a5,3
> sb  a5,799(a0)
> ret

Ideally, this scalar code should be vectorized as it is on aarch64.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-06 14:55
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; 
rdapp.gcc; pan2.li
Subject: Re: [PATCH] RISC-V: Support RVV VLA SLP auto-vectorization
On Tue, Jun 6, 2023 at 6:17 AM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch enables basic VLA SLP auto-vectorization.
> Consider the following case:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
>   for (int i = 0; i < 100; ++i)
>     {
>       a[i * 8 + 0] = b[i * 8 + 7] + 1;
>       a[i * 8 + 1] = b[i * 8 + 7] + 2;
>       a[i * 8 + 2] = b[i * 8 + 7] + 8;
>       a[i * 8 + 3] = b[i * 8 + 7] + 4;
>       a[i * 8 + 4] = b[i * 8 + 7] + 5;
>       a[i * 8 + 5] = b[i * 8 + 7] + 6;
>       a[i * 8 + 6] = b[i * 8 + 7] + 7;
>       a[i * 8 + 7] = b[i * 8 + 7] + 3;
>     }
> }
>
> To enable VLA SLP auto-vectorization, we should be able to handle the 
> following const vectors:
>
> 1. NPATTERNS = 8, NELTS_PER_PATTERN = 3.
> { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 
> 16, ... }
>
> 2. NPATTERNS = 8, NELTS_PER_PATTERN = 1.
> { 1, 2, 8, 4, 5, 6, 7, 3, ... }
>
> These vectors can be generated in the prologue.
>
> After this patch, we end up with the following codegen:
>
> Prologue:
> ...
> vsetvli a7,zero,e16,m2,ta,ma
> vid.v   v4
> vsrl.vi v4,v4,3
> li  a3,8
> vmul.vx v4,v4,a3  ===> v4 = { 0, 0, 0, 0, 0, 0, 0, 0, 8, 8, 8, 8, 8, 
> 8, 8, 8, 16, 16, 16, 16, 16, 16, 16, 16, ... }
> ...
> li  t1,67633152
> addi t1,t1,513
> li  a3,50790400
> addi a3,a3,1541
> slli a3,a3,32
> add a3,a3,t1
> vsetvli t1,zero,e64,m1,ta,ma
> vmv.v.x v3,a3   ===> v3 = { 1, 2, 8, 4, 5, 6, 7, 3, ... }
> ...
> LoopBody:
> ...
> min a3,...
> vsetvli zero,a3,e8,m1,ta,ma
> vle8.v  v2,0(a6)
> vsetvli a7,zero,e8,m1,ta,ma
> vrgatherei16.vv v1,v2,v4
> vadd.vv v1,v1,v3
> vsetvli zero,a3,e8,m1,ta,ma
> vse8.v  v1,0(a2)
> add a6,a6,a4
> add a2,a2,a4
> mv  a3,a5
> add a5,a5,t1
> bgtu a3,a4,.L3
> ...
>
> Note: we need to use "vrgatherei16.vv" instead of "vrgather.vv" for SEW = 8,
> since with 8-bit indices "vrgather.vv" can only address element indices up to 255,
> while "vrgatherei16.vv" covers a larger range.
> Epilogue:
> lbu a5,799(a1)
> addiw   a4,a5,1
> sb  a4,792(a0)
> addiw   a4,a5,2
> sb  a4,793(a0)
> addiw   a4,a5,8
> sb  a4,794(a0)
> addiw   a4,a5,4
> sb  a4,795(a0)
> addiw   a4,a5,5
> sb  a4,796(a0)
> addiw   a4,a5,6
> sb  a4,797(a0)
> addiw   a4,a5,7
> sb  a4,798(a0)
> addiw   a5,a5,3
> sb  a5,799(a0)
> ret
>
> There is one more last thing we need to do is the "Epilogue 
> auto-vectorization" whic

Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
I think we should split out the instruction patterns that belong to ZVFHMIN,
and add ZVFH gating to all the original iterators, for example VF, VWF, etc.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-06 09:32
To: juzhe.zh...@rivai.ai
CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here:
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does it mean zvfhmin also enables reduction?
 
I also have the same concern for V and VF in the last patch[1].
 
[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/
 
To give a more practical example of my concern:
 
We are using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16.
I don't think that is what we expect?
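The gating concern can be sketched in plain C. This is only a toy model, not GCC code: the struct and function names are hypothetical, and it assumes (as stated in this thread) that both zvfh and zvfhmin set the ELEN_FP_16 flag, while only zvfh provides FP16 arithmetic such as vfadd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two extensions under discussion.  */
struct rvv_target {
    bool zvfh;     /* full FP16 vector arithmetic */
    bool zvfhmin;  /* FP16 <-> FP32 conversions only */
};

static struct rvv_target make_target(bool zvfh, bool zvfhmin)
{
    struct rvv_target t = { zvfh, zvfhmin };
    return t;
}

/* Models the flag that both extensions set.  */
static bool elen_fp_16(struct rvv_target t)
{
    return t.zvfh || t.zvfhmin;
}

/* Gating FP16 arithmetic on ELEN_FP_16 alone: enabled for zvfhmin too.  */
static bool vfadd_f16_loose(struct rvv_target t)
{
    return elen_fp_16(t);
}

/* Gating FP16 arithmetic on zvfh: not enabled for zvfhmin alone.  */
static bool vfadd_f16_strict(struct rvv_target t)
{
    return t.zvfh;
}

/* Conversions are legitimate under either extension.  */
static bool vfwcvt_f16_enabled(struct rvv_target t)
{
    return t.zvfh || t.zvfhmin;
}
```

With only zvfhmin set, vfadd_f16_loose returns true while vfadd_f16_strict returns false, which is the unexpected enablement being pointed out.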
 


Re: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
Oh, yes. Thanks for catching this.
VF will be used in autovec, for example for vfadd.
When zvfhmin is specified, the vfadd autovec pattern will be enabled unexpectedly.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-06-06 09:32
To: juzhe.zh...@rivai.ai
CC: pan2.li; gcc-patches; Kito.cheng; yanzhang.wang
Subject: Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
> diff --git a/gcc/config/riscv/vector-iterators.md 
> b/gcc/config/riscv/vector-iterators.md
> index e4f2ba90799..c338e3c9003 100644
> --- a/gcc/config/riscv/vector-iterators.md
> +++ b/gcc/config/riscv/vector-iterators.md
> @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
> ])
> (define_mode_iterator VWF [
> +  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
> +  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
> +  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
> +  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
 
I am a little concerned about using TARGET_VECTOR_ELEN_FP_16 as the predicate here:
zvfhmin also sets the TARGET_VECTOR_ELEN_FP_16 flag,
so does that mean zvfhmin also enables reduction?
 
I also have the same concern for V and VF in the last patch[1].
 
[1] 
https://patchwork.sourceware.org/project/gcc/patch/20230605082043.1707158-1-pan2...@intel.com/
 
To give a more practical example of my concern:
 
We are using the V and VF iterators in autovec.md, and zvfhmin will set
MASK_VECTOR_ELEN_FP_16,
which means zvfhmin WILL enable most autovec patterns with fp16.
I don't think that is what we expect?
 


Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
LGTM



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 22:49
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point 
intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API for FP16 ZVFH floating-point reductions,
i.e. SEW=16 for the instructions below:
 
vfredosum vfredusum
vfredmax vfredmin
vfwredosum vfwredusum
 
Users can then leverage the intrinsic APIs to perform FP16-related
reduction operations. Please note that not all the intrinsic APIs are covered
in the test files; only some typical ones are picked because there are too
many. We will test the FP16-related intrinsic APIs in full soon.
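For readers unfamiliar with the reduction instructions listed above, here is a scalar reference sketch in C of what they compute. It is not the intrinsic API: the function names are invented for this example, and float/double stand in for FP16/FP32 because _Float16 is not available on every host compiler. The pairwise order used for the unordered sum is just one possible association; the hardware is free to pick another.

```c
#include <assert.h>
#include <stddef.h>

/* Small sample vector used to exercise the models below.  */
static const float sample[4] = { 1.0f, 2.0f, 3.0f, 4.0f };

/* vfredosum-like: strictly in element order, starting from init.  */
static float redosum(float init, const float *v, size_t n)
{
    float acc = init;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Helper: pairwise (tree) sum of n >= 1 elements.  */
static float pairwise_sum(const float *v, size_t n)
{
    if (n == 1)
        return v[0];
    size_t half = n / 2;
    return pairwise_sum(v, half) + pairwise_sum(v + half, n - half);
}

/* vfredusum-like: any association is allowed; a tree is used here.  */
static float redusum(float init, const float *v, size_t n)
{
    return n == 0 ? init : init + pairwise_sum(v, n);
}

/* vfwredosum-like: ordered sum accumulated in the wider type.  */
static double wredosum(double init, const float *v, size_t n)
{
    double acc = init;
    for (size_t i = 0; i < n; i++)
        acc += (double)v[i];
    return acc;
}

/* vfredmax-like: maximum of init and all elements (NaN handling omitted).  */
static float redmax(float init, const float *v, size_t n)
{
    float acc = init;
    for (size_t i = 0; i < n; i++)
        if (v[i] > acc)
            acc = v[i];
    return acc;
}
```

The ordered and unordered sums agree on small exact inputs like the sample above, but can differ once rounding is involved, which is why the ISA distinguishes the two.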
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat16mf4_t): Add vfloat16mf4_t to WF operations.
(vfloat16mf2_t): Likewise.
(vfloat16m1_t): Likewise.
(vfloat16m2_t): Likewise.
(vfloat16m4_t): Likewise.
(vfloat16m8_t): Likewise.
* config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64,
VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases.
---
.../riscv/riscv-vector-builtins-types.def |  7 +++
gcc/config/riscv/vector-iterators.md  | 12 
.../riscv/rvv/base/zvfh-intrinsic.c   | 58 ++-
3 files changed, 75 insertions(+), 2 deletions(-)
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 1e2491de6d6..bd3deae8340 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -634,6 +634,13 @@ DEF_RVV_WU_OPS (vuint32m2_t, 0)
DEF_RVV_WU_OPS (vuint32m4_t, 0)
DEF_RVV_WU_OPS (vuint32m8_t, 0)
+DEF_RVV_WF_OPS (vfloat16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WF_OPS (vfloat16mf2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m1_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m2_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m4_t, TARGET_ZVFH)
+DEF_RVV_WF_OPS (vfloat16m8_t, TARGET_ZVFH)
+
DEF_RVV_WF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_WF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
DEF_RVV_WF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index e4f2ba90799..c338e3c9003 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [
])
(define_mode_iterator VWF [
+  (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128")
+  (VNx2HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx4HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx8HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx16HF "TARGET_VECTOR_ELEN_FP_16")
+  (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32")
+  (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF 
"TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128")
])
(define_mode_iterator VWF_ZVE64 [
+  VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF
   VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF
])
@@ -1322,6 +1330,7 @@ (define_mode_attr VWLMUL1 [
   (VNx8HI "VNx4SI") (VNx16HI "VNx4SI") (VNx32HI "VNx4SI") (VNx64HI "VNx4SI")
   (VNx1SI "VNx2DI") (VNx2SI "VNx2DI") (VNx4SI "VNx2DI")
   (VNx8SI "VNx2DI") (VNx16SI "VNx2DI") (VNx32SI "VNx2DI")
+  (VNx1HF "VNx4SF") (VNx2HF "VNx4SF") (VNx4HF "VNx4SF") (VNx8HF "VNx4SF") 
(VNx16HF "VNx4SF") (VNx32HF "VNx4SF") (VNx64HF "VNx4SF")
   (VNx1SF "VNx2DF") (VNx2SF "VNx2DF")
   (VNx4SF "VNx2DF") (VNx8SF "VNx2DF") (VNx16SF "VNx2DF") (VNx32SF "VNx2DF")
])
@@ -1333,6 +1342,7 @@ (define_mode_attr VWLMUL1_ZVE64 [
   (VNx8HI "VNx2SI") (VNx16HI "VNx2SI") (VNx32HI "VNx2SI")
   (VNx1SI "VNx1DI") (VNx2SI "VNx1DI") (VNx4SI "VNx1DI")
   (VNx8SI "VNx1DI") (VNx16SI "VNx1DI")
+  (VNx1HF "VNx2SF") (VNx2HF "VNx2SF") (VNx4HF "VNx2SF") (VNx8HF "VNx2SF") 
(VNx16HF "VNx2SF") (VNx32HF "VNx2SF")
   (VNx1SF "VNx1DF") (VNx2SF "VNx1DF")
   (VNx4SF "VNx1DF") (VNx8SF "VNx1DF") (VNx16SF "VNx1DF")
])
@@ -1393,6 +1403,7 @@ (define_mode_attr vwlmul1 [
   (VNx8HI "vnx4si") (VNx16HI "vnx4si") (VNx32HI "vnx4si") (VNx64HI "vnx4si")
   (VNx1SI "vnx2di") (VNx2SI "vnx2di") (VNx4SI "vnx2di")
   (VNx8SI "vnx2di") (VNx16SI "vnx2di") (VNx32SI &qu

Re: Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
One more update: this just passed regression testing on x86.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-06-05 18:40
To: 钟居哲; gcc-patches
CC: richard.sandiford; rguenther
Subject: Re: [PATCH V3] VECT: Add SELECT_VL support
Hi, Richard and Richi.
Thanks for the help.
This patch passes bootstrap. OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-05 18:30
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
Co-authored-by: Richard Sandiford
 
This patch addresses the comments from Richard and is rebased to trunk.
 
This patch adds middle-end support for SELECT_VL, which
allows targets to apply target-dependent optimizations to the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM "get_vector_length", with
the following properties:
 
1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows a non-final iteration to process fewer than VF elements.
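The properties above can be sketched with a scalar C model of a strip-mined loop. This is an illustration only, not the vectorizer's output: select_vl here is a hypothetical stand-in that follows one choice RVV permits (splitting the last two iterations evenly when fewer than 2*VF elements remain), and the inner for loop stands in for a single length-controlled vector operation.

```c
#include <assert.h>
#include <stddef.h>

/* MIN_EXPR-style length: always the upper bound on what SELECT_VL
   may return.  */
static size_t min_vl(size_t remaining, size_t vf)
{
    return remaining < vf ? remaining : vf;
}

/* Hypothetical SELECT_VL: never exceeds MIN (remaining, vf), but may
   pick less.  Here the tail is balanced over the last two iterations.  */
static size_t select_vl(size_t remaining, size_t vf)
{
    assert(vf > 0);
    if (remaining <= vf)
        return remaining;
    if (remaining < 2 * vf)
        return (remaining + 1) / 2;   /* ceil (remaining / 2) */
    return vf;
}

/* z[i] = x[i] + y[i]; returns the number of "vector" iterations.  */
static size_t vvaddint32(size_t n, const int *x, const int *y, int *z,
                         size_t vf)
{
    size_t iterations = 0;
    while (n > 0) {
        size_t vl = select_vl(n, vf);
        for (size_t i = 0; i < vl; i++)   /* one vector op of length vl */
            z[i] = x[i] + y[i];
        x += vl;                          /* data reference IVs step by vl, */
        y += vl;                          /* which varies per iteration     */
        z += vl;
        n -= vl;                          /* decrementing loop control IV   */
        iterations++;
    }
    return iterations;
}

/* Self-check helper: returns 1 if the loop produces the scalar result.  */
static int check_vvaddint32(size_t n, size_t vf)
{
    int x[64], y[64], z[64];
    assert(n <= 64);
    for (size_t i = 0; i < n; i++) {
        x[i] = (int)i;
        y[i] = (int)(2 * i);
        z[i] = -1;
    }
    vvaddint32(n, x, y, z, vf);
    for (size_t i = 0; i < n; i++)
        if (z[i] != (int)(3 * i))
            return 0;
    return 1;
}
```

With n = 10 and vf = 4 this model processes 4, 3 and 3 elements, so a non-final iteration handles fewer than VF elements and the pointers advance by a variable step.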
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
 
---
gcc/doc/md.texi | 22 
gcc/internal-fn.def |  1 +
gcc/optabs.def  |  1 +
gcc/tree-vect-loop-manip.cc | 32 -
gcc/tree-vect-loop.cc   | 72 +
gcc/tree-vect-stmts.cc  | 66 ++
gcc/tree-vectorizer.h   |  6 
7 files changed, 191 insertions(+), 9 deletions(-)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
@end smallexample
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
@cindex @code{check_raw_ptrs@var{m}} instruction pattern
@item @samp{check_raw_ptrs@var{m}}
Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
OPTAB_D (len_load_optab, "len_load_$a")
OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
- &incr_gsi, insert_after, &index_before_incr,
- &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign 

Re: [PATCH V3] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.
Thanks for the help.
This patch passes bootstrap. OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-05 18:30
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
Co-authored-by: Richard Sandiford
 
This patch addresses the comments from Richard and is rebased to trunk.
 
This patch adds middle-end support for SELECT_VL, which
allows targets to apply target-dependent optimizations to the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM "get_vector_length", with
the following properties:
 
1. Only applies to single-rgroup loops.
2. Non-SLP only.
3. Adjusts the loop control IV.
4. Adjusts the data reference IVs.
5. Allows a non-final iteration to process fewer than VF elements.
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i
 
---
gcc/doc/md.texi | 22 
gcc/internal-fn.def |  1 +
gcc/optabs.def  |  1 +
gcc/tree-vect-loop-manip.cc | 32 -
gcc/tree-vect-loop.cc   | 72 +
gcc/tree-vect-stmts.cc  | 66 ++
gcc/tree-vectorizer.h   |  6 
7 files changed, 191 insertions(+), 9 deletions(-)
 
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 6a435eb4461..95f7fe1f802 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -4974,6 +4974,28 @@ for (i = 1; i < operand3; i++)
   operand0[i] = operand0[i - 1] && (operand1 + i < operand2);
@end smallexample
+@cindex @code{select_vl@var{m}} instruction pattern
+@item @code{select_vl@var{m}}
+Set operand 0 to the number of scalar iterations that should be handled
+by one iteration of a vector loop.  Operand 1 is the total number of
+scalar iterations that the loop needs to process and operand 2 is a
+maximum bound on the result (also known as the maximum ``vectorization
+factor'').
+
+The maximum value of operand 0 is given by:
+@smallexample
+operand0 = MIN (operand1, operand2)
+@end smallexample
+However, targets might choose a lower value than this, based on
+target-specific criteria.  Each iteration of the vector loop might
+therefore process a different number of scalar iterations, which in turn
+means that induction variables will have a variable step.  Because of
+this, it is generally not useful to define this instruction if it will
+always calculate the maximum value.
+
+This optab is only useful on targets that implement @samp{len_load_@var{m}}
+and/or @samp{len_store_@var{m}}.
+
@cindex @code{check_raw_ptrs@var{m}} instruction pattern
@item @samp{check_raw_ptrs@var{m}}
Check whether, given two pointers @var{a} and @var{b} and a length @var{len},
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 7fe742c2ae7..6f6fa7d37f9 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -153,6 +153,7 @@ DEF_INTERNAL_OPTAB_FN (VEC_SET, 0, vec_set, vec_set)
DEF_INTERNAL_OPTAB_FN (LEN_STORE, 0, len_store, len_store)
DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
+DEF_INTERNAL_OPTAB_FN (SELECT_VL, ECF_CONST | ECF_NOTHROW, select_vl, binary)
DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
   check_raw_ptrs, check_ptrs)
DEF_INTERNAL_OPTAB_FN (CHECK_WAR_PTRS, ECF_CONST | ECF_NOTHROW,
diff --git a/gcc/optabs.def b/gcc/optabs.def
index 695f5911b30..b637471b76e 100644
--- a/gcc/optabs.def
+++ b/gcc/optabs.def
@@ -476,3 +476,4 @@ OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
OPTAB_D (vec_shl_insert_optab, "vec_shl_insert_$a")
OPTAB_D (len_load_optab, "len_load_$a")
OPTAB_D (len_store_optab, "len_store_$a")
+OPTAB_D (select_vl_optab, "select_vl$a")
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index 3f735945e67..1c8100c1a1c 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -534,7 +534,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   _10 = (unsigned long) count_12(D);
   ...
   # ivtmp_9 = PHI 
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -549,15 +549,28 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
- &incr_gsi, insert_after, &index_before_incr,
- &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
- index_before_incr,
- nitems_step));
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+ {
+   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+  insert_afte

Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
>> But it would make the pointer IV updates more complex.  So let's
>> say that that's the reason for preferring solution 3.

Yes, I prefer solution 3, to avoid complex pointer IV updates; there is
no benefit in solution 2 (unlike the single-rgroup case).

I read your comments; they are more comprehensive than what I wrote.

I will send a V3 patch that incorporates your comments.

Thank you so much!


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 18:09
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
>>> No, I meant that the comment I quoted seemed to be saying that solution
>>> 3 wasn't possible.  The comment seemed to say that we would need to do
>>> solution 1.
> I am so sorry that I didn't write the comments accurately.
> Could you help me with the comments, based on what we have discussed above?
> (I think we are on the same page now.)
 
Yeah, agree we seem to be on the same page
 
>>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>>> E.g. is "vsetvli zero" cheaper than "vsetvli "?
>
>
> "vsetvli zero" is the same cost as "vsetvli gpr", 
>
> I think for (b),  solution 2 and solution 3 should be almost the same.
 
OK, thanks.  If we wanted to use solution 2 for (b), the condition
would be just:
 
  LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1
 
dropping the:
 
  LOOP_VINFO_LENS (loop_vinfo).length () == 1
 
But it would make the pointer IV updates more complex.  So let's
say that that's the reason for preferring solution 3.
 
So rather than:
 
+  /* If we're using decrement IV approach in loop control, we can use output of
+ SELECT_VL to adjust IV of loop control and data reference when it 
satisfies
+ the following checks:
+
+ (a) SELECT_VL is supported by the target.
+ (b) LOOP_VINFO is single-rgroup control.
+ (c) non-SLP.
+ (d) LOOP can not be unrolled.
+
+ Otherwise, we use MIN_EXPR approach.
+
+ 1. We only apply SELECT_VL on single-rgroup since:
+
+ (1). Multiple-rgroup controls N vector loads/stores would need N pointer
+   updates by variable amounts.
+ (2). SELECT_VL allows flexible length (<=VF) in each iteration.
+ (3). For decrement IV approach, we calculate the MAX length of the loop
+   and then deduce the length of each control from this MAX length.
+
+ Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
+ multiple-rgroup control, we need to generate multiple SELECT_VL to
+ carefully adjust length of each control. Such approach is very inefficient
+ and unprofitable for targets that are using a standalone instruction
+ to configure the length of each operation.
+ E.g. RISC-V vector use 'vsetvl' to configure the length of each operation.
 
how about:
 
  /* If a loop uses length controls and has a decrementing loop control IV,
 we will normally pass that IV through a MIN_EXPR to calculate the
 basis for the length controls.  E.g. in a loop that processes one
 element per scalar iteration, the number of elements would be
 MIN_EXPR , where N is the number of scalar iterations left.
 
 This MIN_EXPR approach allows us to use pointer IVs with an invariant
 step, since only the final iteration of the vector loop can have
 inactive lanes.
 
 However, some targets have a dedicated instruction for calculating the
 preferred length, given the total number of elements that still need to
 be processed.  This is encapsulated in the SELECT_VL internal function.
 
 If the target supports SELECT_VL, we can use it instead of MIN_EXPR
 to determine the basis for the length controls.  However, unlike the
 MIN_EXPR calculation, the SELECT_VL calculation can decide to make
 lanes inactive in any iteration of the vector loop, not just the last
 iteration.  This SELECT_VL approach therefore requires us to use pointer
 IVs with variable steps.
 
 Once we've decided how many elements should be processed by one
 iteration of the vector loop, we need to populate the rgroup controls.
 If a loop has multiple rgroups, we need to make sure that those rgroups
 "line up" (that is, they must be consistent about which elements are
 active and which aren't).  This is done by vect_adjust_loop_lens_control.
 
 In principle, it would be possible to use vect_adjust_loop_lens_control
 on either the result of a MIN_EXPR or the result of a SELECT_VL.
 However:
 
 (1) In practice, it only makes sense to use SELECT_VL when a vector
 operation will be controlled directly by the result.  It is not
 worth using SELECT_VL if it would only be the input to other
 calculations.
 
 (2) If we u

Re: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
LGTM,



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 16:20
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v2] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API for FP16 ZVFH floating-point, i.e.
SEW=16 for the instructions below:
 
vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt
 
Users can then leverage the intrinsic APIs to perform FP16-related
operations. Please note that not all the intrinsic APIs are covered in the
test files; only some typical ones are picked because there are too many.
We will test the FP16-related intrinsic APIs in full soon.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP=16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
 
Signed-off-by: Pan Li 
---
.../riscv/riscv-vector-builtins-types.def |  32 ++
gcc/config/riscv/vector-iterators.md  |  21 +
.../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
3 files changed, 471 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..1e2491de6d6 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, TARGET_ZVFH | RVV_REQUIRE_ELEN_FP_32)
+
DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_I_OPS (vint32m1_t, 0)
DEF_RVV_CONVERT_I_OPS (vint32m2_t, 0)
@@ -533,6 +546,13 @@ DEF_RVV_CONVERT_I_OPS (vint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_I_OPS (vint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, TARGET_ZVFH)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, TARGET_ZVFH)
+
DEF_RVV_CONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_U_OPS (vuint32m1_t, 0)
DEF_RVV_CONVERT_U_OPS (vuint32m2_t, 0)
@@ -543,11 +563,23 @@ DEF_RVV_CONVERT_U_OPS (vuint64m2_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m4_t, RVV_REQUIRE_ELEN_64)
DEF_RVV_CONVERT_U_OPS (vuint64m8_t, RVV_REQUIRE_ELEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, TARGET_ZVFH)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, TARGET_ZVFH)
+
DEF_RVV_WCONVERT_I_OPS (vint64m1_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_ELEN_64

Re: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in vector-iterators.md.

2023-06-05 Thread juzhe.zh...@rivai.ai
Thanks for catching this.
LGTM.



juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-06-05 16:18
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: Fix 'REQUIREMENT' for machine_mode 'MODE' in 
vector-iterators.md.
gcc/ChangeLog:
 
* config/riscv/vector-iterators.md: Fix 'REQUIREMENT' for machine_mode 
'MODE'.
* config/riscv/vector.md 
(@pred_indexed_store): change 
VNX16_QHSI to VNX16_QHSDI.
(@pred_indexed_store): Ditto.
---
gcc/config/riscv/vector-iterators.md | 26 +-
gcc/config/riscv/vector.md   |  6 +++---
2 files changed, 16 insertions(+), 16 deletions(-)
 
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 90743ed76c5..42cbbb49894 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -148,7 +148,7 @@
])
(define_mode_iterator VEEWEXT8 [
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
   (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
@@ -188,7 +188,7 @@
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx8SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx16SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx4DF "TARGET_VECTOR_ELEN_FP_64")
   (VNx8DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
@@ -199,7 +199,7 @@
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI (VNx16HI 
"TARGET_MIN_VLEN >= 128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI (VNx8SI "TARGET_MIN_VLEN >= 
128")
   (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
-  (VNx4DI "TARGET_VECTOR_ELEN_64")
+  (VNx4DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32")
@@ -213,11 +213,11 @@
   (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI (VNx16QI 
"TARGET_MIN_VLEN >= 128")
   (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI (VNx8HI "TARGET_MIN_VLEN >= 
128")
   (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI (VNx4SI "TARGET_MIN_VLEN >= 128")
-  (VNx1DI "TARGET_VECTOR_ELEN_64") (VNx2DI "TARGET_VECTOR_ELEN_64 && 
TARGET_MIN_VLEN >= 128")
+  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
   (VNx1SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN < 128")
   (VNx2SF "TARGET_VECTOR_ELEN_FP_32")
   (VNx4SF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
-  (VNx1DF "TARGET_VECTOR_ELEN_FP_64")
+  (VNx1DF "TARGET_VECTOR_ELEN_FP_64 && TARGET_MIN_VLEN < 128")
   (VNx2DF "TARGET_VECTOR_ELEN_FP_32 && TARGET_MIN_VLEN >= 128")
])
@@ -400,26 +400,26 @@
(define_mode_iterator VNX1_QHSDI [
   (VNx1QI "TARGET_MIN_VLEN < 128") (VNx1HI "TARGET_MIN_VLEN < 128") (VNx1SI 
"TARGET_MIN_VLEN < 128")
-  (VNx1DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx1DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128")
])
(define_mode_iterator VNX2_QHSDI [
   VNx2QI VNx2HI VNx2SI
-  (VNx2DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx2DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
(define_mode_iterator VNX4_QHSDI [
   VNx4QI VNx4HI VNx4SI
-  (VNx4DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx4DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
(define_mode_iterator VNX8_QHSDI [
   VNx8QI VNx8HI VNx8SI
-  (VNx8DI "TARGET_64BIT && TARGET_MIN_VLEN > 32")
+  (VNx8DI "TARGET_64BIT && TARGET_VECTOR_ELEN_64")
])
-(define_mode_iterator VNX16_QHSI [
-  VNx16QI VNx16HI (VNx16SI "TARGET_MIN_VLEN > 32") (VNx16DI "TARGET_MIN_VLEN 
>= 128")
+(define_mode_iterator VNX16_QHSDI [
+  VNx16QI VNx16HI (VNx16SI "T

Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> No, I meant that the comment I quoted seemed to be saying that solution
>> 3 wasn't possible.  The comment seemed to say that we would need to do
>> solution 1.
I am so sorry that I didn't write the comments accurately.
Could you help me with the comments, based on what we have discussed above?
(I think we are on the same page now.)
Hmm, I am not a native English speaker; I use Google Translate for comments
:).

>> When comparing solutions 2 and 3 for case (b), is solution 3 still better?
>> E.g. is "vsetvli zero" cheaper than "vsetvli "?


"vsetvli zero" is the same cost as "vsetvli gpr", 

I think for (b),  solution 2 and solution 3 should be almost the same.




juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 15:57
To: juzhe.zhong\@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
Richard Sandiford  writes:
> "juzhe.zh...@rivai.ai"  writes:
>> Hi, Richard. Thanks for the comments.
>>
>>>> If we use SELECT_VL to refer only to the target-independent ifn, I don't
>>>> see why this last bit is true.
>> Could you give me more details and information about this since I am not 
>> sure whether I catch up with you. 
>> You mean the current SELECT_VL is not an appropriate IFN?
>
> No, I meant that the comment I quoted seemed to be saying that solution
> 3 wasn't possible.  The comment seemed to say that we would need to do
> solution 1.
 
Sorry, I meant solution 2 rather than solution 3.
 


Re: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API

2023-06-05 Thread juzhe.zh...@rivai.ai
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_32)
Is this used by vfwcvt (FP16 -> FP32 conversion)? If so, you should add a ZVFHMIN 
or ZVFH requirement check.


+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, 0)

same


+DEF_RVV_CONVERT_U_OPS (vuint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_U_OPS (vuint16mf2_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m1_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m2_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m4_t, 0)
+DEF_RVV_CONVERT_U_OPS (vuint16m8_t, 0)

same

+DEF_RVV_WCONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_I_OPS (vint32m1_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m2_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m4_t, 0)
+DEF_RVV_WCONVERT_I_OPS (vint32m8_t, 0)


same

+DEF_RVV_WCONVERT_U_OPS (vuint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WCONVERT_U_OPS (vuint32m1_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m2_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m4_t, 0)
+DEF_RVV_WCONVERT_U_OPS (vuint32m8_t, 0)

same



Otherwise, LGTM.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-05 14:50
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH floating-point intrinsic API
From: Pan Li 
 
This patch supports the intrinsic API for FP16 ZVFH floating-point, i.e.
SEW=16, for the instructions below:
 
vfadd vfsub vfrsub vfwadd vfwsub
vfmul vfdiv vfrdiv vfwmul
vfmacc vfnmacc vfmsac vfnmsac vfmadd
vfnmadd vfmsub vfnmsub vfwmacc vfwnmacc vfwmsac vfwnmsac
vfsqrt vfrsqrt7 vfrec7
vfmin vfmax
vfsgnj vfsgnjn vfsgnjx
vmfeq vmfne vmflt vmfle vmfgt vmfge
vfclass vfmerge
vfmv
vfcvt vfwcvt vfncvt
 
Users can then leverage the intrinsic APIs to perform FP16-related
operations. Please note that not all the intrinsic APIs are covered in the
test files; only some typical ones are picked because there are too many. We
will test the FP16-related intrinsic APIs in full soon.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-types.def
(vfloat32mf2_t): New type for DEF_RVV_WEXTF_OPS.
(vfloat32m1_t): Ditto.
(vfloat32m2_t): Ditto.
(vfloat32m4_t): Ditto.
(vfloat32m8_t): Ditto.
(vint16mf4_t): New type for DEF_RVV_CONVERT_I_OPS.
(vint16mf2_t): Ditto.
(vint16m1_t): Ditto.
(vint16m2_t): Ditto.
(vint16m4_t): Ditto.
(vint16m8_t): Ditto.
(vuint16mf4_t): New type for DEF_RVV_CONVERT_U_OPS.
(vuint16mf2_t): Ditto.
(vuint16m1_t): Ditto.
(vuint16m2_t): Ditto.
(vuint16m4_t): Ditto.
(vuint16m8_t): Ditto.
(vint32mf2_t): New type for DEF_RVV_WCONVERT_I_OPS.
(vint32m1_t): Ditto.
(vint32m2_t): Ditto.
(vint32m4_t): Ditto.
(vint32m8_t): Ditto.
(vuint32mf2_t): New type for DEF_RVV_WCONVERT_U_OPS.
(vuint32m1_t): Ditto.
(vuint32m2_t): Ditto.
(vuint32m4_t): Ditto.
(vuint32m8_t): Ditto.
* config/riscv/vector-iterators.md: Add FP16 support for V,
VWCONVERTI, VCONVERT, VNCONVERT, VMUL1 and vlmul1.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/zvfh-intrinsic.c: New test.
---
.../riscv/riscv-vector-builtins-types.def |  32 ++
gcc/config/riscv/vector-iterators.md  |  21 +
.../riscv/rvv/base/zvfh-intrinsic.c   | 418 ++
3 files changed, 471 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def 
b/gcc/config/riscv/riscv-vector-builtins-types.def
index 9cb3aca992e..348aa05dd91 100644
--- a/gcc/config/riscv/riscv-vector-builtins-types.def
+++ b/gcc/config/riscv/riscv-vector-builtins-types.def
@@ -518,11 +518,24 @@ DEF_RVV_FULL_V_U_OPS (vuint64m2_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m4_t, RVV_REQUIRE_FULL_V)
DEF_RVV_FULL_V_U_OPS (vuint64m8_t, RVV_REQUIRE_FULL_V)
+DEF_RVV_WEXTF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | 
RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_WEXTF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m4_t, RVV_REQUIRE_ELEN_FP_32)
+DEF_RVV_WEXTF_OPS (vfloat32m8_t, RVV_REQUIRE_ELEN_FP_32)
+
DEF_RVV_WEXTF_OPS (vfloat64m1_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m2_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m4_t, RVV_REQUIRE_ELEN_FP_64)
DEF_RVV_WEXTF_OPS (vfloat64m8_t, RVV_REQUIRE_ELEN_FP_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf4_t, RVV_REQUIRE_MIN_VLEN_64)
+DEF_RVV_CONVERT_I_OPS (vint16mf2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m1_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m2_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m4_t, 0)
+DEF_RVV_CONVERT_I_OPS (vint16m8_t, 0)
+
DEF_RVV_CONVERT_I_OPS (vint32mf2_t, RVV_REQUIRE_MIN_VLEN_64)
DEF_RVV_CONVERT_I_

Re: Re: [PATCH V2] VECT: Add SELECT_VL support

2023-06-05 Thread juzhe.zh...@rivai.ai
 than the general 
scalar instruction (for example "min" is much cheaper than "vsetvli").
So I am 100% sure that solution 3 (current MIN flow in GCC) is much better than 
above:

max length = min (vf=8) ===> replaced "vsetvli" by "min"
length 1 = min (vf=4) ===> replaced "vsetvli" by "min"
length 2 = max length = length 1
...
vsetvli zero, length 1 <==insert by "VSETVL" PASS of RISC-V backend
load
vsetvli zero, length 2 <==insert by "VSETVL" PASS of RISC-V backend
load

This is much better than Solution 3 and avoids switching the "VL" register 
multiple times with "vsetvli".

Ok, you may want to ask: if "min" is much cheaper than "vsetvli", why do we need 
SELECT_VL?
The reason is that I want to optimize the special case (single-rgroup), since a 
single rgroup just uses a single length, 
unlike multiple-rgroup control, which has multiple length calculation statements:

Current flow of single-rgroup:

...
length = min (vf)
...
vsetvli zero, length <=== inserted by VSETVL PASS
load (pointer IV)
vadd.
...
pointer IV = pointer IV + VF

I want to optimize it into:

...
length = vsetvli (vf)
... <=== no need to insert vsetvli.
load (pointer IV)
vadd.
...
pointer IV = pointer IV + length (adjust in bytesize).

This flow is the same as in the RVV ISA and LLVM. 
Also, based on the "vsetvli" definition, we can allow "even distribution" in the 
last iterations.
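
The single-rgroup flow above can be sketched in scalar C. This is only a model under stated assumptions: select_vl is a hypothetical stand-in for the length returned by "vsetvli" (here simply min (remain, vf); real hardware may pick a different legal value), and the inner loop stands for one vector load + vadd + store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the length produced by "vsetvli": this sketch
   always returns min (remain, vf).  */
static size_t
select_vl (size_t remain, size_t vf)
{
  return remain < vf ? remain : vf;
}

/* dst[i] = a[i] + b[i], processed LENGTH elements per iteration.
   Returns the number of loop iterations executed.  */
size_t
vadd_strip_mined (int *dst, const int *a, const int *b, size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iterations = 0;
  size_t remain = n;
  while (remain > 0)
    {
      /* length = vsetvli (vf): one value drives both the vector
         operations and the pointer updates.  */
      size_t length = select_vl (remain, vf);
      for (size_t i = 0; i < length; i++) /* models load + vadd + store */
        dst[i] = a[i] + b[i];
      /* pointer IV = pointer IV + length  */
      dst += length;
      a += length;
      b += length;
      remain -= length;
      iterations++;
    }
  return iterations;
}
```

With n = 10 and vf = 4 this model runs three iterations with lengths 4, 4 and 2. Because the pointers advance by the selected length rather than by VF, a different legal choice of length in the last iterations would still cover every element exactly once.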

Hope my description is clear, feel free to comment.
Thanks so much.



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-06-05 14:21
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V2] VECT: Add SELECT_VL support
juzhe.zh...@rivai.ai writes:
> +  /* If we're using decrement IV approach in loop control, we can use output 
> of
> + SELECT_VL to adjust IV of loop control and data reference when it 
> satisfies
> + the following checks:
> +
> + (a) SELECT_VL is supported by the target.
> + (b) LOOP_VINFO is single-rgroup control.
> + (c) non-SLP.
> + (d) LOOP can not be unrolled.
> +
> + Otherwise, we use MIN_EXPR approach.
> +
> + 1. We only apply SELECT_VL on single-rgroup since:
> +
> + (1). Multiple-rgroup controls N vector loads/stores would need N pointer
> +   updates by variable amounts.
> + (2). SELECT_VL allows flexible length (<=VF) in each iteration.
> + (3). For decrement IV approach, we calculate the MAX length of the loop
> +   and then deduce the length of each control from this MAX length.
> +
> + Base on (1), (2) and (3) situations, if we try to use SELECT_VL on
> + multiple-rgroup control, we need to generate multiple SELECT_VL to
> + carefully adjust length of each control.
 
If we use SELECT_VL to refer only to the target-independent ifn, I don't
see why this last bit is true.  Like I said in the previous message,
when it comes to determining the length of each control, the approach we
take for MIN_EXPR IVs should work for SELECT_VL IVs.  The point is that,
in both cases, any inactive lanes are always the last lanes.
 
E.g. suppose that, for one particular iteration, SELECT_VL decides that
6 lanes should be active in a loop with VF==8.  If there is a 2-control
rgroup with 4 lanes each, the first control must be 4 and the second
control must be 2, just as if a MIN_EXPR had decided that 6 lanes of
the final iteration are active.
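
The example above can be written as a small C helper. This is a minimal sketch, not code from the vectorizer: control_length and its parameters are hypothetical names, and it only relies on the stated property that inactive lanes are always the last lanes.

```c
#include <assert.h>

/* Length of control INDEX in an rgroup whose controls each cover
   LANES_PER_CTRL lanes, given that the first ACTIVE lanes of this
   iteration are active (hypothetical helper for illustration).  */
unsigned
control_length (unsigned active, unsigned lanes_per_ctrl, unsigned index)
{
  assert (lanes_per_ctrl > 0);
  unsigned start = index * lanes_per_ctrl;
  if (active <= start)
    return 0; /* every lane of this control is inactive */
  unsigned left = active - start;
  return left < lanes_per_ctrl ? left : lanes_per_ctrl;
}
```

For 6 active lanes and two controls of 4 lanes each, control 0 gets 4 and control 1 gets 2, regardless of whether the 6 came from a SELECT_VL or from a MIN_EXPR.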
 
I'm not saying the decision itself is wrong.  But I think the explanation
could be clearer.
 
> + Such approach is very inefficient
> + and unprofitable for targets that are using a standalone instruction
> + to configure the length of each operation.
> + E.g. RISC-V vector use 'vsetvl' to configure the length of each 
> operation.
 
What I don't understand is why this isn't also a problem with the
fallback MIN_EXPR approach.  That is, with the same example as above,
but using MIN_EXPR IVs, I would have expected:
 
  VF == 8
 
  1-control rgroup "A":
A set by MIN_EXPR IV
 
  2-control rgroup "B1", "B2":
B1 = MIN (A, 4)
B2 = A - B1
 
and so the vectors controlled by A, B1 and B2 would all have different
lengths.
 
Is the point that, when using MIN_EXPR, this only happens in the
final iteration?  And that you use a tail/epilogue loop for that,
so that the main loop body operates on full vectors only?
 
Thanks,
Richard
 


Re: Re: [PATCH] RISC-V: Fix warning in predicated.md

2023-06-02 Thread juzhe.zh...@rivai.ai
Hi, I fixed it :
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620462.html 
Just feel free to commit it.

Thanks.


juzhe.zh...@rivai.ai
 
From: Andreas Schwab
Date: 2023-06-02 17:29
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix warning in predicated.md
../../gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
 
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
 


Re: Re: [PATCH] RISC-V: Fix warning in predicated.md

2023-06-02 Thread juzhe.zh...@rivai.ai
Oh, there are two INTVAL (op) == GET_MODE_MASK...
I only changed one :)
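
For readers unfamiliar with this class of warning, here is a generic C sketch of a signed/unsigned comparison and one way to make the conversion explicit. It is an assumption-laden illustration, not the fix from the linked patch: int64_t stands in for the signed value and uint64_t for the mode mask, and equals_mask is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Comparing a signed value with an unsigned mask converts the signed
   operand to unsigned implicitly, which is what -Wsign-compare reports.
   Writing the conversion explicitly keeps the same result and silences
   the warning (hypothetical helper for illustration).  */
int
equals_mask (int64_t value, uint64_t mask)
{
  return (uint64_t) value == mask;
}
```

Note that the explicit cast does not change the semantics: a value of -1 still compares equal to an all-ones mask, exactly as the implicit conversion would have done.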



juzhe.zh...@rivai.ai
 
From: Andreas Schwab
Date: 2023-06-02 17:29
To: juzhe.zhong
CC: gcc-patches; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix warning in predicated.md
../../gcc/gcc/config/riscv/predicates.md: In function ‘bool arith_operand_or_mode_mask(rtx, machine_mode)’:
../../gcc/gcc/config/riscv/predicates.md:33:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 (match_test "INTVAL (op) == GET_MODE_MASK (HImode)
 
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
 


Re: Re: [PATCH V3] VECT: Change flow of decrement IV

2023-06-02 Thread juzhe.zh...@rivai.ai
Thanks Richi. I am gonna merge it after Richard's final approval.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-06-02 16:56
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: [PATCH V3] VECT: Change flow of decrement IV
On Thu, 1 Jun 2023, juzhe.zh...@rivai.ai wrote:
 
> This patch is no difference from V2.
> Just add PR tree-optimization/109971 as Kewen's suggested.
> 
> Already bootstrapped and Regression on X86 no difference.
> 
> Ok for trunk ?
 
OK.
 
Richard.
 
> 
> juzhe.zh...@rivai.ai
>  
> From: juzhe.zhong
> Date: 2023-06-01 12:36
> To: gcc-patches
> CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
> Subject: [PATCH V3] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
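
The two loop shapes quoted above can be compared in plain C. This is a sketch under assumptions: the function names are hypothetical, unsigned arithmetic models the IV (so the final decrement may wrap harmlessly), and the exit test uses ">" to follow the GIMPLE shown in the diff (if (ivtmp_9 > POLY_INT_CST [4, 4])) rather than the ">=" written in the description.

```c
#include <assert.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Original flow: the IV step is the variable length itself.  */
unsigned
iterations_old (unsigned remain, unsigned vf)
{
  assert (remain > 0 && vf > 0);
  unsigned count = 0;
  do
    {
      remain -= MIN (vf, remain);
      count++;
    }
  while (remain != 0);
  return count;
}

/* New flow: the IV step is the constant VF and the exit test looks at
   the value before the decrement.  *TOTAL_LEN receives the sum of the
   per-iteration lengths.  */
unsigned
iterations_new (unsigned remain, unsigned vf, unsigned *total_len)
{
  assert (remain > 0 && vf > 0);
  unsigned count = 0;
  unsigned old_remain;
  *total_len = 0;
  do
    {
      old_remain = remain;
      *total_len += MIN (vf, remain); /* len = MIN (vf, remain) */
      remain -= vf; /* may wrap on the last iteration; unused afterwards */
      count++;
    }
  while (old_remain > vf);
  return count;
}
```

Both functions run the same number of iterations for any positive remain and vf, and the lengths of the new flow still add up to the original element count. The benefit of the new shape is that the IV now has a constant step, which is the form scalar evolution analysis handles best.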
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
>   PR tree-optimization/109971
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
> decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>...
> -ivtmp_35 = ivtmp_9 - _36;
> +ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>...
> -if (ivtmp_35 != 0)
> +if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>else
>  goto ; [16.67%]
> @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>tree step = rgc->controls.length () == 1 ? rgc->controls[0]
>: make_ssa_name (iv_type);
>/* Create decrement IV.  */
> -  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
> - insert_after, &index_before_incr, &index_after_incr);
> +  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
> + &incr_gsi, insert_after, &index_before_incr,
> + &index_after_incr);
>gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
> index_before_incr,
> nitems_step));
>*iv_step = step;
> -  return index_after_incr;
> +  *compare_step = nitems_step;
> +  return index_before_incr;
>  }
>/* Create increment IV.  */
> @@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>   arbitrarily pick the last.  */
>tree test_ctrl = NULL_TREE;
>tree iv_step = NULL_TREE;
> +  tree compare_step = NULL_TREE;
>rgroup_controls *rgc;
>rgroup_controls *iv_rgc = nullptr;
>unsigned int i;
> @@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
> &preheader_seq, &header_seq,
> loop_cond_gsi, rgc, niters,
> niters_skip, might_wrap_p,
> - &iv_step);
> + &iv_step, &compare_step);
> iv_rgc = rgc;
>   }
> @@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop 
> *loop,
>/* Get a boolean result that tells us whether to iterate.  */
>edge exit_edge = single_exit (loop);
> -  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
> -  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
> -  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
> - NULL_TREE, NULL_TREE);
> +  gcond *cond_stmt;
> +  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  gcc_assert (compare_step);
> +  tree_code code = (exit_edge->flags &a

Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-02 Thread juzhe.zh...@rivai.ai
Thanks. I am gonna wait for Jeff's or Kito's final approval.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-02 15:18
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
>>> I like the code examples in general but find them hard to read
>>> at lengths > 5-10 or so.  Could we condense this a bit?
> Ok, Do I need to send V2 ? Or condense the commit log when merged the patch?
 
Sure, just condense a bit. No need for V2.
 
Regards
Robin
 


Re: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-02 Thread juzhe.zh...@rivai.ai
Hi, Robin.

>> I like the code examples in general but find them hard to read
>> at lengths > 5-10 or so.  Could we condense this a bit?
Ok. Do I need to send V2, or can I condense the commit log when merging the patch?


>> I'm a bit wary about getting the costs
>> right for combine patterns but we can deal with this later.

No, you don't need to worry about combining extensions, and I don't think we 
need costs to adjust the combining of extensions.

For vmv.v.x + vadd.vv ==> vadd.vx, we can't claim that vadd.vx is better since 
it will increase scalar register pressure.
So, for such combining, I would like to take another approach and combine this 
pattern carefully with accurate register pressure calculation.

However, this patch is different:

vext.vf2 + vext.vf2 + vadd ==> vwadd.vv is always better.
I don't think it is possible that using vwadd.vv will be worse. 

Thanks.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-06-02 15:01
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
Hi Juzhe,
 
> ...
>vsetvli zero,t1,e8,m1,ta,ma
> vle8.v  v1,0(a4)
> vsetvli t3,zero,e16,m2,ta,ma
> vsext.vf2   v6,v1
> vsetvli zero,t1,e8,m1,ta,ma
> vle8.v  v1,0(a5)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a0,t4
> vzext.vf2   v4,v1
> vmul.vv v2,v4,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> vle8.v  v1,0(a6)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a1,t4
> vzext.vf2   v2,v1
> vmul.vv v4,v2,v4
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v4,0(t0)
> vsetvli t3,zero,e16,m2,ta,ma
> add t0,a2,t4
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> add t0,a3,t4
> vle8.v  v1,0(a7)
> vsetvli t3,zero,e16,m2,ta,ma
> sub t6,t6,t1
> vsext.vf2   v2,v1
> vmul.vv v2,v2,v6
> vsetvli zero,t1,e16,m2,ta,ma
> vse16.v v2,0(t0)
> ...
> 
> After this patch:
> ...
>   vsetvli zero,t1,e8,mf2,ta,ma
> vle8.v  v1,0(a4)
> vle8.v  v3,0(a5)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a0,t3
> vwmulsu.vv  v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> vle8.v  v2,0(a6)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a1,t3
> vwmulu.vv   v4,v3,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v4,0(t0)
> vsetvli t6,zero,e8,mf2,ta,ma
> add t0,a2,t3
> vwmulsu.vv  v3,v1,v2
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v3,0(t0)
> add t0,a3,t3
> vle8.v  v3,0(a7)
> vsetvli t6,zero,e8,mf2,ta,ma
> sub t4,t4,t1
> vwmul.vv    v2,v1,v3
> vsetvli zero,t1,e16,m1,ta,ma
> vse16.v v2,0(t0)
> ...
 
I like the code examples in general but find them hard to read
at lengths > 5-10 or so.  Could we condense this a bit?
 
> +(include "autovec-opt.md")
ACK for this.  We discussed before that not cluttering the regular
autovec.md with combine-targeted patterns too much so I'm in favor
of the separate file.
 
In total looks good to me.  I'm a bit wary about getting the costs
right for combine patterns but we can deal with this later.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && viota && vid

2023-06-01 Thread juzhe.zh...@rivai.ai
Oh. Yes. Thanks for catching this!
Will send V2 soon.



juzhe.zh...@rivai.ai
 
From: KuanLin Chen
Date: 2023-06-02 09:26
To: gcc-patches; juzhe.zhong
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Add _mu C++ overloaded intrinsics for load && 
viota && vid
Hi Juzhe,
 
I think fault_load_def::get_name should remove "instance.pred ==
PRED_TYPE_mu", right?
 
 wrote on Fri, 2 Jun 2023 at 7:05 AM:
>
> From: Juzhe-Zhong 
>
> Based on these:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/issues/232
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/pull/233
>
> Add _mu C++ overloaded intrinsics for load && viota && vid.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-vector-builtins-bases.cc: Add _mu overloaded 
> intrinsics.
>
> ---
>  gcc/config/riscv/riscv-vector-builtins-bases.cc | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
> b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> index a8113f6602b..498c6ba042e 100644
> --- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
> +++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
> @@ -164,7 +164,7 @@ public:
>{
>  if (STORE_P || LST_TYPE == LST_INDEXED)
>return true;
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> @@ -963,7 +963,7 @@ public:
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
>  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -  || pred == PRED_TYPE_tumu;
> +  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>}
>
>rtx expand (function_expander &e) const override
> @@ -979,7 +979,7 @@ public:
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
>  return pred == PRED_TYPE_tu || pred == PRED_TYPE_tum
> -  || pred == PRED_TYPE_tumu;
> +  || pred == PRED_TYPE_tumu || pred == PRED_TYPE_mu;
>}
>
>rtx expand (function_expander &e) const override
> @@ -1749,7 +1749,7 @@ public:
>
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> @@ -1794,7 +1794,7 @@ public:
>
>bool can_be_overloaded_p (enum predication_type_index pred) const override
>{
> -return pred != PRED_TYPE_none && pred != PRED_TYPE_mu;
> +return pred != PRED_TYPE_none;
>}
>
>rtx expand (function_expander &e) const override
> --
> 2.36.1
>
 


Re: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

2023-06-01 Thread juzhe.zh...@rivai.ai
Hi, forget about this patch.
Just go directly to the V2 patch with the same title.

That's the last patch I fine-tuned for integer widening auto-vectorization.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-01 15:31
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv 
instruction optimizations
From: Juzhe-Zhong 
 
This patch is to enhance vwmul.vv combine optimizations.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
  int16_t *__restrict dst3, int16_t *__restrict dst4,
  int8_t *__restrict a, int8_t *__restrict b,
  int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
{
  dst[i] = (int16_t) a[i] * (int16_t) b[i];
  dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
  dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
  dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
}
}
 
In such a complicated case, an operand is not used only once; it is used by 
multiple statements.
GCC's combine optimization will iterate over the combinations of the operands.
 
First round -> combine one of the operands and change vsext + vmul into vwmul.wv
Second round -> combine the other operand and change vwmul.wv into vwmul.vv
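
As a per-element sanity check of the two rounds, the following scalar C sketch models each form. It is an illustration only: the three function names are hypothetical and each one models a single lane, not the RTL patterns themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Before combine: both operands are sign-extended first (vsext.vf2),
   then multiplied at the wide width (vmul.vv).  */
int16_t
mul_after_extend (int8_t a, int8_t b)
{
  int16_t wa = a;
  int16_t wb = b;
  return (int16_t) (wa * wb);
}

/* After the first round: the pseudo vwmul.wv form, where one operand is
   already wide and the other extension is folded into the multiply.  */
int16_t
pseudo_vwmul_wv (int16_t wide, int8_t narrow)
{
  return (int16_t) (wide * (int16_t) narrow);
}

/* After the second round: vwmul.vv takes both narrow operands.  */
int16_t
vwmul_vv (int8_t a, int8_t b)
{
  return (int16_t) ((int) a * (int) b);
}
```

All three forms give the same result for every pair of int8_t inputs, since the product of two 8-bit signed values always fits in 16 bits. That is why each combine round is a pure strength reduction with no change in semantics.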
 
Notice that when I add a pseudo vwmul.wv pattern, it makes the vwmulsu.vv testcase fail
since GCC prefers this pattern order:
 
(mul: (zero_extend)
  (sign_extend))
 
So change the operand order of the vwmulsu.vv instruction.
 
gcc/ChangeLog:
 
* config/riscv/vector.md: Shift zero_extend and sign_extend order.
* config/riscv/autovec-opt.md: New file.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
 
---
gcc/config/riscv/autovec-opt.md   | 56 +++
gcc/config/riscv/vector.md|  9 +--
.../riscv/rvv/autovec/widen/widen-7.c | 27 +
.../rvv/autovec/widen/widen-complicate-3.c| 32 +++
.../riscv/rvv/autovec/widen/widen_run-7.c | 34 +++
5 files changed, 154 insertions(+), 4 deletions(-)
create mode 100644 gcc/config/riscv/autovec-opt.md
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
 
diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 000..5b7dc9bef8c
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,56 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zh...@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; We don't have vwmul.wv instruction like vwadd.wv in RVV.
+;; This pattern is an intermediate RTL IR as a pseudo vwmul.wv to enhance
+;; optimization of instructions combine.
+(define_insn_and_split "@pred_single_widen_mul"
+  [(set (match_operand:VWEXTI 0 "register_operand"  "=&vr,&vr")
+ (if_then_else:VWEXTI
+   (unspec:
+ [(match_operand: 1 "vector_mask_operand"   "vmWc1,vmWc1")
+  (match_operand 5 "vector_length_operand"  "   rK,   rK")
+  (match_operand 6 "const_int_operand"  "i,i")
+  (match_operand 7 "const_int_operand"  "i,i")
+  (match_operand 8 "const_int_operand"  "i,i")
+  (reg:SI VL_REGNUM)
+  (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+   (mult:VWEXTI
+ (any_extend:VWEXTI
+   (match_operand: 4 "register_operand" "   vr,   vr"))
+ (match_operand:VWEXTI 3 "register_operand" "   vr,   vr"))
+   (match_operand:VWEXTI 2 "vector_merge_operand"   "   vu,0")))]
+  "TARGET_VECTOR"
+  &quo

Re: Re: [PATCH V3] VECT: Change flow of decrement IV

2023-06-01 Thread juzhe.zh...@rivai.ai
Thanks Kewen. Let's wait for Richard and Richi.



juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-06-01 13:24
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V3] VECT: Change flow of decrement IV
Hi,
 
on 2023/6/1 13:00, juzhe.zh...@rivai.ai wrote:
> This patch is no difference from V2.
 
I support this patch based on the testing and SPEC2017 evaluation
results on Power (see my comments on patch v2).
 
> Just add PR tree-optimization/109971 as Kewen's suggested.
 
Thanks for adding that, I was expecting you will add that when you
are committing it, not really requesting one new version. :)  btw,
the PR marker(s) will trigger scripts to comment some commit info
(commit link, commit log) into the specified PR(s), people can
find some connections between PRs and (fixing or progressing forward)
commits easily.
 
BR,
Kewen
 
> 
> Already bootstrapped and Regression on X86 no difference.
> 
> Ok for trunk ?
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-06-01 12:36
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther 
> <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong 
> <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V3] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
>   PR tree-optimization/109971
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
>...
> -ivtmp_35 = ivtmp_9 - _36;
> +ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
>...
> -if (ivtmp_35 != 0)
> +if (ivtmp_9 > POLY_INT_CST [4, 4])
>  goto ; [83.33%]
>else
>  goto ; [16.67%]
> @@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>tree step = rgc->controls.length () == 1 ? rgc->controls[0]
>: make_ssa_name (iv

Re: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.

2023-06-01 Thread juzhe.zh...@rivai.ai
LGTM. 

We are waiting for FP16 vector to start floating-point auto-vectorizations

Thanks so much.


juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-06-01 15:17
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH] RISC-V: Introduce vfloat16m{f}*_t and their machine mode.
From: Pan Li 
 
This patch would like to introduce the built-in type vfloat16m{f}*_t, as
well as their machine mode VNx*HF. They depend on architecture zvfhmin
or zvfh.
 
When zvfhmin or zvfh is given, the macro TARGET_VECTOR_ELEN_FP_16 will
be true.
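
A minimal C sketch of this mapping, assuming a simplified table: the bit value (1 << 6) and the two extension names come from the diff below, while struct ext_flag, elen_flags_for and target_vector_elen_fp_16 are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MASK_VECTOR_ELEN_FP_16 (1 << 6) /* bit index as in the patch */

/* Hypothetical, simplified form of an extension-to-flag table entry.  */
struct ext_flag
{
  const char *ext;
  int mask;
};

/* Both extensions set the same bit, so either one enables FP16.  */
static const struct ext_flag ext_flag_table[] = {
  {"zvfhmin", MASK_VECTOR_ELEN_FP_16},
  {"zvfh", MASK_VECTOR_ELEN_FP_16},
};

/* OR together the masks of the N extension names in EXTS.  */
int
elen_flags_for (const char *const *exts, size_t n)
{
  assert (exts != NULL || n == 0);
  int flags = 0;
  size_t entries = sizeof ext_flag_table / sizeof ext_flag_table[0];
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < entries; j++)
      if (strcmp (exts[i], ext_flag_table[j].ext) == 0)
        flags |= ext_flag_table[j].mask;
  return flags;
}

/* Models the TARGET_VECTOR_ELEN_FP_16 test on a flags word.  */
int
target_vector_elen_fp_16 (int flags)
{
  return (flags & MASK_VECTOR_ELEN_FP_16) != 0;
}
```

Because the two names share one bit, this flag alone cannot tell zvfh from zvfhmin; it only says that FP16 vector element types are available.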
 
The underlying PATCH will implement the zvfhmin extension based on this.
 
Signed-off-by: Pan Li 
 
gcc/ChangeLog:
 
* common/config/riscv/riscv-common.cc: Add FP_16 mask to zvfhmin
and zvfh.
* config/riscv/genrvv-type-indexer.cc (valid_type): Allow FP16.
(main): Disable FP16 tuple.
* config/riscv/riscv-opts.h (MASK_VECTOR_ELEN_FP_16): New macro.
(TARGET_VECTOR_ELEN_FP_16): Ditto.
* config/riscv/riscv-vector-builtins.cc (check_required_extensions):
Add FP16.
* config/riscv/riscv-vector-builtins.def (vfloat16mf4_t): New type.
(vfloat16mf2_t): Ditto.
(vfloat16m1_t): Ditto.
(vfloat16m2_t): Ditto.
(vfloat16m4_t): Ditto.
(vfloat16m8_t): Ditto.
* config/riscv/riscv-vector-builtins.h (RVV_REQUIRE_ELEN_FP_16):
New macro.
* config/riscv/riscv-vector-switch.def (ENTRY): Allow FP16
machine mode based on TARGET_VECTOR_ELEN_FP_16.
---
gcc/common/config/riscv/riscv-common.cc|  2 ++
gcc/config/riscv/genrvv-type-indexer.cc|  7 +--
gcc/config/riscv/riscv-opts.h  |  4 
gcc/config/riscv/riscv-vector-builtins.cc  |  2 ++
gcc/config/riscv/riscv-vector-builtins.def | 20 +++
gcc/config/riscv/riscv-vector-builtins.h   |  1 +
gcc/config/riscv/riscv-vector-switch.def   | 23 ++
7 files changed, 49 insertions(+), 10 deletions(-)
 
diff --git a/gcc/common/config/riscv/riscv-common.cc 
b/gcc/common/config/riscv/riscv-common.cc
index e6ed3df9ea6..3247d526c0a 100644
--- a/gcc/common/config/riscv/riscv-common.cc
+++ b/gcc/common/config/riscv/riscv-common.cc
@@ -1248,6 +1248,8 @@ static const riscv_ext_flag_table_t 
riscv_ext_flag_table[] =
   {"zve64x",   &gcc_options::x_riscv_vector_elen_flags, MASK_VECTOR_ELEN_64},
   {"zve64f",   &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_32},
   {"zve64d",   &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_64},
+  {"zvfhmin",  &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
+  {"zvfh", &gcc_options::x_riscv_vector_elen_flags, 
MASK_VECTOR_ELEN_FP_16},
   {"zvl32b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL32B},
   {"zvl64b",&gcc_options::x_riscv_zvl_flags, MASK_ZVL64B},
diff --git a/gcc/config/riscv/genrvv-type-indexer.cc 
b/gcc/config/riscv/genrvv-type-indexer.cc
index 18e1b375396..8fc93ceaab4 100644
--- a/gcc/config/riscv/genrvv-type-indexer.cc
+++ b/gcc/config/riscv/genrvv-type-indexer.cc
@@ -54,7 +54,7 @@ valid_type (unsigned sew, int lmul_log2, bool float_p)
 case 8:
   return lmul_log2 >= -3 && !float_p;
 case 16:
-  return lmul_log2 >= -2 && !float_p;
+  return lmul_log2 >= -2;
 case 32:
   return lmul_log2 >= -1;
 case 64:
@@ -73,6 +73,9 @@ valid_type (unsigned sew, int lmul_log2, unsigned nf, bool 
float_p)
   if (nf > 8 || nf < 1)
 return false;
+  if (sew == 16 && nf != 1 && float_p) // Disable FP16 tuple in temporarily.
+return false;
+
   switch (lmul_log2)
 {
 case 1:
@@ -342,7 +345,7 @@ main (int argc, const char **argv)
fprintf (fp, ")\n");
  }
   // Build for vfloat
-  for (unsigned sew : {32, 64})
+  for (unsigned sew : {16, 32, 64})
 for (int lmul_log2 : {-3, -2, -1, 0, 1, 2, 3})
   for (unsigned nf : {1, 2, 3, 4, 5, 6, 7, 8})
{
diff --git a/gcc/config/riscv/riscv-opts.h b/gcc/config/riscv/riscv-opts.h
index 5f387d0e393..208a557b8ff 100644
--- a/gcc/config/riscv/riscv-opts.h
+++ b/gcc/config/riscv/riscv-opts.h
@@ -154,6 +154,8 @@ enum riscv_entity
#define MASK_VECTOR_ELEN_64(1 << 1)
#define MASK_VECTOR_ELEN_FP_32 (1 << 2)
#define MASK_VECTOR_ELEN_FP_64 (1 << 3)
+/* Align the bit index to riscv-vector-builtins.h.  */
+#define MASK_VECTOR_ELEN_FP_16 (1 << 6)
#define TARGET_VECTOR_ELEN_32 \
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_32) != 0)
@@ -163,6 +165,8 @@ enum riscv_entity
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_32) != 0)
#define TARGET_VECTOR_ELEN_FP_64 \
   ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_64) != 0)
+#define TARGET_VECTOR_ELEN_FP_16 \
+  ((riscv_vector_elen_flags & MASK_VECTOR_ELEN_FP_16) != 0)
#define MASK_ZVL32B(1 <<  0)
#define MASK_ZVL64B(1 <<  1)
diff --git a/gcc/config/riscv/riscv-vector-builtins.cc 
b/gcc/config/riscv/riscv-vector-builtins.cc
index 9fea70709fd..43bf6d8f262 100644
--- a/gcc

Re: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in riscv like x86_64 and arm.

2023-06-01 Thread juzhe.zh...@rivai.ai
I plan to implement BF16 vector support in GCC, but I am still waiting for the 
ISA to be ratified, since GCC policy doesn't allow un-ratified ISA extensions.

Currently, we are working on INT8, INT16, INT32, INT64, FP16, FP32 and FP64 
auto-vectorization.
It should be very simple to add BF16 to the current vector framework in GCC.

Thanks.


juzhe.zh...@rivai.ai
 
From: Li, Pan2
Date: 2023-06-01 14:57
To: juzhe.zh...@rivai.ai
Subject: FW: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 
in riscv like x86_64 and arm.
FYI.
 
-Original Message-
From: Gcc-patches  On Behalf 
Of Jin Ma via Gcc-patches
Sent: Thursday, June 1, 2023 2:51 PM
To: gcc-patches@gcc.gnu.org
Cc: shi...@iscas.ac.cn; kito.ch...@gmail.com; Jin Ma 
Subject: [RFC] RISC-V: Support risc-v bfloat16 This patch support bfloat16 in 
riscv like x86_64 and arm.
 
hi, 
 
Are there any new developments about Zfb? Are there any plans to implement the 
Zvfbfmin and Zvfbfwma expansion? I see that Zfb is being reviewed in llvm, 
maybe we should do the same on gcc.
 
Ref: https://reviews.llvm.org/D151313
 https://reviews.llvm.org/D150929
 


Re: [PATCH V3] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
This patch is no different from V2.
It just adds PR tree-optimization/109971, as Kewen suggested.

Already bootstrapped and regression-tested on X86 with no differences.

Ok for trunk ?


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-06-01 12:36
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V3] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Following Richi's suggestion, I change the current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
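
The two flows above can be modelled in plain C as a sanity check. This is only a sketch under assumptions: old_flow and new_flow are hypothetical names, the counters are unsigned so the final "remain -= vf" may wrap harmlessly, and remain and vf are assumed to be nonzero. The exit test is written as ">", following the "ivtmp_9 > POLY_INT_CST [4, 4]" condition in the diff, whereas the pseudocode above writes ">=".

```c
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Old flow: the IV steps by the MIN result, so its step is variable.
   Records each length in LENS (up to MAX_LENS) and returns the trip count.  */
size_t
old_flow (size_t remain, size_t vf, size_t *lens, size_t max_lens)
{
  size_t n = 0;
  do
    {
      size_t len = MIN (vf, remain);
      if (n < max_lens)
        lens[n] = len;
      n++;
      remain -= len;
    }
  while (remain != 0);
  return n;
}

/* New flow: the IV always steps by VF, and the exit test uses the value
   the IV had before the decrement.  */
size_t
new_flow (size_t remain, size_t vf, size_t *lens, size_t max_lens)
{
  size_t n = 0;
  size_t old_remain;
  do
    {
      old_remain = remain;
      size_t len = MIN (vf, remain);
      if (n < max_lens)
        lens[n] = len;
      n++;
      remain -= vf; /* May wrap on the last iteration; unused afterwards.  */
    }
  while (old_remain > vf);
  return n;
}
```

With remain = 58 and vf = 16, both functions run four iterations and record the lengths 16, 16, 16 and 10; only the new flow has a constant IV step.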
 
Include fixes from kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on X86 is on-going
 
Co-Authored by: Kewen Lin  
 
  PR tree-optimization/109971
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
- insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ &incr_gsi, insert_after, &index_before_incr,
+ &index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
&preheader_seq, &header_seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- &iv_step);
+ &iv_step, &compare_step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : 
NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3
 


Re: Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
Thanks, Kewen.
I have sent the V3 patch. Could you comment on it?
I want to make sure you support that patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-06-01 12:32
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; rguenther; gcc-patches
Subject: Re: [PATCH V2] VECT: Change flow of decrement IV
Hi Juzhe,
 
on 2023/6/1 08:31, juzhe.zh...@rivai.ai wrote:
> Bootstrapped and regression-tested on X86 with no unexpected differences.
> 
> Looking forward to Kewen's test report for this patch.
> 
 
This patch can be bootstrapped and regress-tested on
powerpc64-linux-gnu P9 and powerpc64le-linux-gnu P9/P10.
 
Also SPEC2017 int/fp bmks build and run successfully
with it on powerpc64le-linux-gnu P10 (with an explicit
parameter --param=vect-partial-vector-usage=2).
 
It can fix the 510.parest_r -5% degradation, and it sped up
525.x264_r +1%, 521.wrf_r +2.03%, 544.nab_r +1.27% and
549.fotonik3d_r +3.22%, but it degraded 503.bwaves_r -4%.  We have
some heuristics on load and load pct. for 503.bwaves_r on Power,
and I suspect it's related.  Considering that vect-partial-vector-usage=2
isn't the default on Power and that this can fix the exposed failures and
the parest_r degradation, I think the bwaves_r degradation should not
block this.  For the bwaves_r degradation, I'll have a further look later
and open a PR if it's an actual issue rather than just costing heuristics
having no effect.
 
btw, it would be better to add one PR marker line to associate
this with PR109971, something like:
 
PR tree-optimization/109971
 
Thanks!
 
BR,
Kewen
 
> Thanks.
> --
> juzhe.zh...@rivai.ai
> 
>  
> *From:* juzhe.zhong <mailto:juzhe.zh...@rivai.ai>
> *Date:* 2023-05-31 23:08
> *To:* gcc-patches <mailto:gcc-patches@gcc.gnu.org>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; rguenther 
> <mailto:rguent...@suse.de>; linkw <mailto:li...@linux.ibm.com>; Ju-Zhe Zhong 
> <mailto:juzhe.zh...@rivai.ai>
> *Subject:* [PATCH V2] VECT: Change flow of decrement IV
> From: Ju-Zhe Zhong 
>  
> Follow Richi's suggestion, I change current decrement IV flow from:
>  
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>  
> into:
>  
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>  
> to enhance SCEV.
>  
> Include fixes from kewen.
>  
>  
> This patch will need to wait for Kewen's test feedback.
>  
> Testing on X86 is on-going
>  
> Co-Authored by: Kewen Lin  
>  
> gcc/ChangeLog:
>  
> * tree-vect-loop-manip.cc (vect_set_loop_controls_directly): 
> Change decrement IV flow.
> (vect_set_loop_condition_partial_vectors): Ditto.
>  
> ---
> gcc/tree-vect-loop-manip.cc | 36 +---
> 1 file changed, 25 insertions(+), 11 deletions(-)
>  
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index acf3642ceb2..3f735945e67 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
> gimple_stmt_iterator loop_cond_gsi,
> rgroup_controls *rgc, tree niters,
> tree niters_skip, bool might_wrap_p,
> - tree *iv_step)
> + tree *iv_step, tree *compare_step)
> {
>tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
>tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> @@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>...
>vect__4.8_28 = .LEN_LOAD (_17, 32B

Re: [PATCH V2] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
Bootstrapped and regression-tested on X86 with no unexpected differences.

Looking forward to Kewen's test report for this patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-31 23:08
To: gcc-patches
CC: richard.sandiford; rguenther; linkw; Ju-Zhe Zhong
Subject: [PATCH V2] VECT: Change flow of decrement IV
From: Ju-Zhe Zhong 
 
Following Richi's suggestion, I change the current decrement IV flow from:
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
into:
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
to enhance SCEV.
 
Include fixes from kewen.
 
 
This patch will need to wait for Kewen's test feedback.
 
Testing on X86 is on-going
 
Co-Authored by: Kewen Lin  
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Change 
decrement IV flow.
(vect_set_loop_condition_partial_vectors): Ditto.
 
---
gcc/tree-vect-loop-manip.cc | 36 +---
1 file changed, 25 insertions(+), 11 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index acf3642ceb2..3f735945e67 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -483,7 +483,7 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
gimple_stmt_iterator loop_cond_gsi,
rgroup_controls *rgc, tree niters,
tree niters_skip, bool might_wrap_p,
- tree *iv_step)
+ tree *iv_step, tree *compare_step)
{
   tree compare_type = LOOP_VINFO_RGROUP_COMPARE_TYPE (loop_vinfo);
   tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
@@ -538,9 +538,9 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
-ivtmp_35 = ivtmp_9 - _36;
+ivtmp_35 = ivtmp_9 - POLY_INT_CST [4, 4];
   ...
-if (ivtmp_35 != 0)
+if (ivtmp_9 > POLY_INT_CST [4, 4])
 goto ; [83.33%]
   else
 goto ; [16.67%]
@@ -549,13 +549,15 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   tree step = rgc->controls.length () == 1 ? rgc->controls[0]
   : make_ssa_name (iv_type);
   /* Create decrement IV.  */
-  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
- insert_after, &index_before_incr, &index_after_incr);
+  create_iv (nitems_total, MINUS_EXPR, nitems_step, NULL_TREE, loop,
+ &incr_gsi, insert_after, &index_before_incr,
+ &index_after_incr);
   gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
   *iv_step = step;
-  return index_after_incr;
+  *compare_step = nitems_step;
+  return index_before_incr;
 }
   /* Create increment IV.  */
@@ -825,6 +827,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
  arbitrarily pick the last.  */
   tree test_ctrl = NULL_TREE;
   tree iv_step = NULL_TREE;
+  tree compare_step = NULL_TREE;
   rgroup_controls *rgc;
   rgroup_controls *iv_rgc = nullptr;
   unsigned int i;
@@ -861,7 +864,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
&preheader_seq, &header_seq,
loop_cond_gsi, rgc, niters,
niters_skip, might_wrap_p,
- &iv_step);
+ &iv_step, &compare_step);
iv_rgc = rgc;
  }
@@ -884,10 +887,21 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   /* Get a boolean result that tells us whether to iterate.  */
   edge exit_edge = single_exit (loop);
-  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : NE_EXPR;
-  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
-  gcond *cond_stmt = gimple_build_cond (code, test_ctrl, zero_ctrl,
- NULL_TREE, NULL_TREE);
+  gcond *cond_stmt;
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  gcc_assert (compare_step);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
+}
+  else
+{
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? EQ_EXPR : 
NE_EXPR;
+  tree zero_ctrl = build_zero_cst (TREE_TYPE (test_ctrl));
+  cond_stmt
+ = gimple_build_cond (code, test_ctrl, zero_ctrl, NULL_TREE, NULL_TREE);
+}
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
   /* The loop iterates (NITERS - 1) / VF + 1 times.
-- 
2.36.3
 


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
Thanks Richard.
It seems that this patch's approach is OK for trunk?
Maybe the only thing we should do is wait for Kewen's testing feedback, am I 
right?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-31 17:01
To: Richard Biener via Gcc-patches
CC: Richard Biener; juzhe.zhong\@rivai.ai; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
Richard Biener via Gcc-patches  writes:
> On Wed, 31 May 2023, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>> >
>> >> Hi, all. I have posted several of my investigations:
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
>> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html
>> >> 
>> >> It turns out that when "niters is a constant value and vf is a constant value",
>> >> this patch lets SCEV/IVOPTS optimize a lot for RVV too (take the 
>> >> testcase from IBM's testsuite for example), and I think this patch can fix 
>> >> IBM's cunroll issue.
>> >> Even though it will produce a 'mv' instruction in some other cases for 
>> >> RVV, I think Gain > Pain overall.
>> >> 
>> >> Actually, for the current flow:
>> >> 
>> >> step = MIN ()
>> >> ...
>> >> remain = remain - step.
>> >> 
>> >> I don't know how difficult it is to extend SCEV/IVOPTS to fix this issue.
>> >> So, could you make a decision for this patch?
>> >> 
>> >> I wonder whether we should apply the approach of this patch (the code 
>> >> can be refined after thorough review) or
>> >> we should extend SCEV/IVOPTS?
>> >
>> > I don't think we can do anything in SCEV for this which means we'd
>> > need to special-case this in niter analysis, in IVOPTs and any other
>> > passes that might be affected (and not fixed by handling it in niter
>> > analysis).  While improving niter analysis would be good (the user
>> > could write this pattern as well) I do not have time to try
>> > implementing that (I have no idea how ugly or robust it is going to be).
>> >
>> > So I think we should patch this up in the vectorizer itself like with
>> > your patch.  I'm going to wait for Richards input though since he
>> > seems to disagree.
>> 
>> I think my main disagreement is that the IV phi can be analysed
>> as a SCEV with sufficient work (realising that the MIN result is
>> always VF when the latch is executed).  That SCEV might be useful
>> “as is” for things like IVOPTS, without specific work in those passes.
>> (Although perhaps not too useful, since most other IVs will be upcounting.)
>
> I think we'd need another API for SCEV there then,
> analyze_scalar_evolution_for_latch () so we can disregard the
> value on the exit edges then.  That means we'd still need to touch
> all users and decide whether it's safe to use that or not.
 
I'd expect the phi for the IV with the constant step to have the same
value as the phi for the IV with a MIN step.  I realise that the phi
isn't the thing that matters for niters, but I'd expect IVOPTS to
consider both the phi and the adjusted value to be candidates.  Only the
phi can be a candidate with the MIN step, but it feels like it should
still be a candidate, even with current interfaces.
 
You know this stuff much better than I do though, so I'm almost
certainly oversimplifying/overlooking things.
 
Like I say, I don't object to the vectoriser change, so please
don't go down a rabbit hole on my account. :)
 
Thanks,
Richard
 
 


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
Oh, it's the correct fix. Thanks for catching this.




juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-05-31 15:38
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; rguenther
Subject: Re: [PATCH] VECT: Change flow of decrement IV
> Hi, Richi.
> 
>>> Note with SELECT_VL all bets will be off since as I understand the
>>> value it gives can vary from iteration to iteration (but we know
>>> a lower and maybe an upper bound?)
> Yes, on the RVV side, the SELECT_VL output can be in the range
> [ceil(avl/2), vlmax]; it can be any value in that range, depending on
> the hardware implementation.
> 
>>> So I think we should patch this up in the vectorizer itself like with
>>> your patch.  I'm going to wait for Richards input though since he
>>> seems to disagree.
> 
> According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971,
> Kewen is happy with this patch; it turns out this patch can fix Power's issue.
 
Yeah, the exposed degradation and failures can be fixed by this patch.
I'd expect both approaches (this patch or extending niter analysis and
others) should work for the exposed issues.
 
A new finding is that my SPEC2017 rerun with this patch exposed some
verification failures.  I ran regression testing on Power10, and it showed
a few failures too (mainly from Fortran).  By looking into one of them
(case gfortran.dg/array_alloc_2.f90), I think the patch needs some
adjustment to the chosen code according to exit_edge->flags, like:
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ef28711c58f..5d518460b6d 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -892,8 +892,9 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
   if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
 {
   gcc_assert (compare_step);
-  cond_stmt = gimple_build_cond (GT_EXPR, test_ctrl, compare_step,
-  NULL_TREE, NULL_TREE);
+  tree_code code = (exit_edge->flags & EDGE_TRUE_VALUE) ? LE_EXPR : 
GT_EXPR;
+  cond_stmt = gimple_build_cond (code, test_ctrl, compare_step, NULL_TREE,
+  NULL_TREE);
   gsi_insert_before (&loop_cond_gsi, cond_stmt, GSI_SAME_STMT);
 }
   else
 
I'm running regression testing again based on this adjustment, will see
if it can fix all exposed failures.
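
To illustrate why the comparison code has to follow the exit edge's polarity, here is a small self-contained model in C. It is only a sketch: the enum and the functions choose_code, eval_cond and loop_exits are hypothetical stand-ins, not GCC interfaces, and "exit_on_true" plays the role of (exit_edge->flags & EDGE_TRUE_VALUE).

```c
#include <stdbool.h>

/* Stand-ins for the two tree codes used in the adjustment above.  */
enum cmp_code { CMP_GT, CMP_LE };

/* Pick LE when the exit edge is the true edge, GT otherwise.  */
enum cmp_code
choose_code (bool exit_on_true)
{
  return exit_on_true ? CMP_LE : CMP_GT;
}

/* Evaluate "IV code STEP".  */
bool
eval_cond (enum cmp_code code, unsigned long iv, unsigned long step)
{
  return code == CMP_GT ? iv > step : iv <= step;
}

/* Return true if the loop exits for this IV value: control takes the
   exit edge when the condition's outcome matches the edge's polarity.  */
bool
loop_exits (bool exit_on_true, unsigned long iv, unsigned long step)
{
  bool cond = eval_cond (choose_code (exit_on_true), iv, step);
  return exit_on_true ? cond : !cond;
}
```

In this model the loop exits exactly when iv <= step for either polarity, whereas a hard-coded GT would invert the exit decision whenever the exit edge is the true edge.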
 
BR,
Kewen
 
> So, Let's wait for Richard's comments.
> 
> Thanks.
> ------
> juzhe.zh...@rivai.ai
> 
>  
> *From:* Richard Biener <mailto:rguent...@suse.de>
> *Date:* 2023-05-31 14:41
> *To:* juzhe.zh...@rivai.ai <mailto:juzhe.zh...@rivai.ai>
> *CC:* richard.sandiford <mailto:richard.sandif...@arm.com>; gcc-patches 
> <mailto:gcc-patches@gcc.gnu.org>; linkw <mailto:li...@linux.ibm.com>
> *Subject:* Re: Re: [PATCH] VECT: Change flow of decrement IV
> On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, all. I have posted several of my investigations:
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html
> >
> > It turns out that when "niters is a constant value and vf is a constant value",
> > this patch lets SCEV/IVOPTS optimize a lot for RVV too (take the 
> > testcase from IBM's testsuite for example), and I think this patch can fix 
> > IBM's cunroll issue.
> > Even though it will produce a 'mv' instruction in some other cases for 
> > RVV, I think Gain > Pain overall.
> >
> > Actually, for current flow:
> >
> > step = MIN ()
> > ...
> > remain = remain - step.
> >
> > I

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai

>> I'm just saying that to go forward the vectorizer change looks
>>more promising (also considering the pace RISC-V people are working at
>>...)

Yeah, RVV needs a lot of middle-end support:
SELECT_VL, LEN_MASK_LOAD/LEN_MASK_STORE, etc.

LEN_ADD for RVV reduction support, like COND_ADD for ARM SVE, etc.

SELECT_VL is still pending.

Without support in the middle-end, GCC cannot support powerful auto-vectorization 
(performance will be much worse than RVV LLVM).
And unfortunately, I am the only person working on middle-end support for RVV 
auto-vectorization. :)

I think we can get this patch merged and record the SCEV enhancement in 
Bugzilla to see whether we can improve that in the future.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-31 15:38
To: Richard Sandiford
CC: juzhe.zh...@rivai.ai; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Wed, 31 May 2023, Richard Sandiford wrote:
 
> Richard Biener  writes:
> > On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
> >
> >> Hi, all. I have posted several of my investigations:
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
> >> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html
> >> 
> >> It turns out that when "niters is a constant value and vf is a constant value",
> >> this patch lets SCEV/IVOPTS optimize a lot for RVV too (take the testcase 
> >> from IBM's testsuite for example), and I think this patch can fix IBM's 
> >> cunroll issue.
> >> Even though it will produce a 'mv' instruction in some other cases for 
> >> RVV, I think Gain > Pain overall.
> >> 
> >> Actually, for the current flow:
> >> 
> >> step = MIN ()
> >> ...
> >> remain = remain - step.
> >> 
> >> I don't know how difficult it is to extend SCEV/IVOPTS to fix this issue.
> >> So, could you make a decision for this patch?
> >> 
> >> I wonder whether we should apply the approach of this patch (the code can 
> >> be refined after thorough review) or
> >> we should extend SCEV/IVOPTS?
> >
> > I don't think we can do anything in SCEV for this which means we'd
> > need to special-case this in niter analysis, in IVOPTs and any other
> > passes that might be affected (and not fixed by handling it in niter
> > analysis).  While improving niter analysis would be good (the user
> > could write this pattern as well) I do not have time to try
> > implementing that (I have no idea how ugly or robust it is going to be).
> >
> > So I think we should patch this up in the vectorizer itself like with
> > your patch.  I'm going to wait for Richards input though since he
> > seems to disagree.
> 
> I think my main disagreement is that the IV phi can be analysed
> as a SCEV with sufficient work (realising that the MIN result is
> always VF when the latch is executed).  That SCEV might be useful
> “as is” for things like IVOPTS, without specific work in those passes.
> (Although perhaps not too useful, since most other IVs will be upcounting.)
 
I think we'd need another API for SCEV there then,
analyze_scalar_evolution_for_latch () so we can disregard the
value on the exit edges then.  That means we'd still need to touch
all users and decide whether it's safe to use that or not.
 
> I don't object though.  It just feels like we're giving up easily.
> And that's a bit frustrating, since this potential problem was flagged
> ahead of time.
 
Well, I expect that massaging SCEV and niter analysis will take
up quite some developer time while avoiding the situation in
the vectorizer is possible (and would fix the observed regressions).
We can always improve later here and I'd suggest to file an
enhancement bugreport with a simple C testcase using this kind of
iteration.
 
I'm just saying that to go forward the vectorizer change looks
more promising (also considering the pace RISC-V people are working at 
...)
 
Richard.
 
> > Note with SELECT_VL all bets will be off since as I understand the
> > value it gives can vary from iteration to iteration (but we know
> > a lower and maybe an upper bound?)
> 
> Right.  All IVs will have a variable step for SELECT_VL.
> 
> Thanks,
> Richard
> 
 


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-31 Thread juzhe.zh...@rivai.ai
Hi, Richard.

>> I don't object though.  It just feels like we're giving up easily.
>> And that's a bit frustrating, since this potential problem was flagged
>> ahead of time.

I can take a look at it. Would you mind giving me some hints?
In which pass should I do this? The "ivopts" pass?
Is it right that we can enhance the analysis when we see a statement like
remain = remain - step, where step comes from a MIN_EXPR (remain, vf)?
Then what do we need to do?
 
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-31 15:28
To: Richard Biener
CC: juzhe.zhong\@rivai.ai; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
Richard Biener  writes:
> On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
>
>> Hi, all. I have posted several of my investigations:
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html
>> 
>> It turns out that when "niters is a constant value and vf is a constant value",
>> this patch lets SCEV/IVOPTS optimize a lot for RVV too (take the testcase 
>> from IBM's testsuite for example), and I think this patch can fix IBM's 
>> cunroll issue.
>> Even though it will produce a 'mv' instruction in some other cases for RVV, 
>> I think Gain > Pain overall.
>> 
>> Actually, for the current flow:
>> 
>> step = MIN ()
>> ...
>> remain = remain - step.
>> 
>> I don't know how difficult it is to extend SCEV/IVOPTS to fix this issue.
>> So, could you make a decision for this patch?
>> 
>> I wonder whether we should apply the approach of this patch (the code can 
>> be refined after thorough review) or
>> we should extend SCEV/IVOPTS?
>
> I don't think we can do anything in SCEV for this which means we'd
> need to special-case this in niter analysis, in IVOPTs and any other
> passes that might be affected (and not fixed by handling it in niter
> analysis).  While improving niter analysis would be good (the user
> could write this pattern as well) I do not have time to try
> implementing that (I have no idea how ugly or robust it is going to be).
>
> So I think we should patch this up in the vectorizer itself like with
> your patch.  I'm going to wait for Richards input though since he
> seems to disagree.
 
I think my main disagreement is that the IV phi can be analysed
as a SCEV with sufficient work (realising that the MIN result is
always VF when the latch is executed).  That SCEV might be useful
“as is” for things like IVOPTS, without specific work in those passes.
(Although perhaps not too useful, since most other IVs will be upcounting.)
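
The observation about the MIN result can be checked with a small C model of the current flow. This is a sketch only: min_is_vf_on_latch is a hypothetical helper, and it assumes unsigned counters with remain > 0 and vf > 0.

```c
#include <stdbool.h>
#include <stddef.h>

/* Run the current flow (step = MIN (remain, vf); remain -= step) and
   return true if STEP equals VF on every iteration whose latch is
   executed, i.e. every iteration that is followed by another one.  */
bool
min_is_vf_on_latch (size_t remain, size_t vf)
{
  for (;;)
    {
      size_t step = remain < vf ? remain : vf;
      remain -= step;
      if (remain == 0)
        return true;  /* Exit edge taken; the final step may be < vf.  */
      if (step != vf)
        return false; /* Latch executed with a step other than vf.  */
    }
}
```

The function cannot return false: if anything remains after the subtraction, the step was smaller than the old remain, so MIN must have picked vf. Only the final iteration can see a shorter step.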
 
I don't object though.  It just feels like we're giving up easily.
And that's a bit frustrating, since this potential problem was flagged
ahead of time.
 
> Note with SELECT_VL all bets will be off since as I understand the
> value it gives can vary from iteration to iteration (but we know
> a lower and maybe an upper bound?)
 
Right.  All IVs will have a variable step for SELECT_VL.
 
Thanks,
Richard
 


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai

Hi, Richi.

>> Note with SELECT_VL all bets will be off since as I understand the
>> value it gives can vary from iteration to iteration (but we know
>> a lower and maybe an upper bound?)
Yes, on the RVV side, the SELECT_VL output can be in the range [ceil(avl/2), vlmax]; 
it can be any value in that range, depending on the hardware implementation.
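
For reference, the constraints the RVV specification places on vl for vsetvli can be written as a small checker in C. This is a sketch under the assumption that SELECT_VL maps onto vsetvli; vl_is_legal is a hypothetical name, and the [ceil(avl/2), vlmax] freedom applies to the middle case only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Return true if VL is a value the hardware may produce for the given
   AVL and VLMAX, following the vl constraints in the RVV specification.  */
bool
vl_is_legal (size_t avl, size_t vlmax, size_t vl)
{
  if (avl <= vlmax)
    return vl == avl;   /* Small AVL: vl must equal AVL.  */
  if (avl >= 2 * vlmax)
    return vl == vlmax; /* Large AVL: vl must equal VLMAX.  */

  /* VLMAX < AVL < 2 * VLMAX: any value in [ceil (AVL / 2), VLMAX].  */
  size_t lo = (avl + 1) / 2;
  return lo <= vl && vl <= vlmax;
}
```

For example, with vlmax = 16 and avl = 20 this model accepts any vl from 10 to 16, which is why the step of a SELECT_VL-based IV can vary between iterations.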

>> So I think we should patch this up in the vectorizer itself like with
>> your patch.  I'm going to wait for Richards input though since he
>> seems to disagree.

According to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971, 
Kewen is happy with this patch; it turns out this patch can fix Power's issue.
So, let's wait for Richard's comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-31 14:41
To: juzhe.zh...@rivai.ai
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
On Wed, 31 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi?all. I have posted my several investigations:
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 
> 
> Turns out when "niters is a constant value and vf is a constant value"
> This patch can allow SCEV/IVOPTS optimize a lot for RVV too (Take tesecase 
> from IBM's testsuite for example) and I think this patch can fix IBM's 
> cunroll issue.
> Even though it will produce a 'mv' instruction in some ohter cases for RVV, I 
> think Gain > Pain overal.
> 
> Actually, for current flow:
> 
> step = MIN ()
> ...
> remain = remain - step.
> 
> I don't know how difficult it would be to extend SCEV/IVOPTS to fix this issue.
> So, could you make a decision on this patch?
> 
> I wonder whether we should apply the approach of this patch (the code can be
> refined after review) or whether
> we should extend SCEV/IVOPTS?
 
I don't think we can do anything in SCEV for this which means we'd
need to special-case this in niter analysis, in IVOPTs and any other
passes that might be affected (and not fixed by handling it in niter
analysis).  While improving niter analysis would be good (the user
could write this pattern as well) I do not have time to try
implementing that (I have no idea how ugly or robust it is going to be).
 
So I think we should patch this up in the vectorizer itself like with
your patch.  I'm going to wait for Richards input though since he
seems to disagree.
 
Note with SELECT_VL all bets will be off since as I understand the
value it gives can vary from iteration to iteration (but we know
a lower and maybe an upper bound?)
 
Thanks,
Richard.
 
> Thanks. 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: 钟居哲
> Date: 2023-05-30 23:05
> To: rguenther
> CC: richard.sandiford; gcc-patches; linkw
> Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
> More information of power's testcase:
> 
> Before this patch:
> test_npeel_int16_t:
> lui a4,%hi(.LANCHOR0+130)
> lui a3,%hi(.LANCHOR1)
> addi a3,a3,%lo(.LANCHOR1)
> addi a4,a4,%lo(.LANCHOR0+130)
> li a5,58
> li a2,16
> vsetivli zero,16,e16,m1,ta,ma
> vl1re16.v v3,0(a3)
> vid.v v1
> .L5:
> minu a3,a5,a2
> vsetvli zero,a3,e16,m1,ta,ma
> sub a5,a5,a3
> vse16.v v1,0(a4)
> vsetivli zero,16,e16,m1,ta,ma
> addi a4,a4,32
> vadd.vv v1,v1,v3
> bne a5,zero,.L5
> ret
> 
> After this patch:
> test_npeel_int16_t:
> lui a5,%hi(.LANCHOR0)
> addi a5,a5,%lo(.LANCHOR0)
> li a1,16
> vsetivli zero,16,e16,m1,ta,ma
> addi a2,a5,130
> vid.v v1
> addi a3,a5,162
> vadd.vx v4,v1,a1
> addi a4,a5,194
> li a1,32
> vadd.vx v3,v1,a1
> vse16.v v1,0(a2)
> vse16.v v4,0(a3)
> vse16.v v3,0(a4)
> addi a5,a5,226
> li a1,48
> vadd.vx v2,v1,a1
> vsetivli zero,10,e16,m1,ta,ma
> vse16.v v2,0(a5)
> ret
> 
> Clearly, Power's testcase previously could not be unrolled on the RVV side,
> but with this patch it can.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 20:33
> To: juzhe.zhong
> CC: Richard Sandiford; gcc-patches; linkw
> Subject: Re: [PATCH] VECT: Change flow of decrement IV
> On Tue, 30 May 2023, juzhe.zhong wrote:
>  
> > This patch will generate as many "mov" instructions inside the loop as
> > there are rgroups. This is unacceptable. For example, if the number of
> > rgroups is 3, there will be 3 more instructions in the loop. If this patch
> > is necessary, I think I should find a way to fix it.
>  
> That's odd, you only need to adjust the IV which is used in the exit test,
> not all the others.
>  
> >  Replied Message 
> > From
>

Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, all. I have posted several investigations:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620101.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620105.html 
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620108.html 

It turns out that when "niters is a constant value and vf is a constant value",
this patch lets SCEV/IVOPTS optimize a lot for RVV too (take the testcase from
IBM's testsuite, for example), and I think this patch can fix IBM's cunroll issue.
Even though it will produce a 'mv' instruction in some other cases for RVV, I
think Gain > Pain overall.

Actually, for current flow:

step = MIN ()
...
remain = remain - step.

I don't know how difficult it would be to extend SCEV/IVOPTS to fix this issue.
So, could you make a decision on this patch?

I wonder whether we should apply the approach of this patch (the code can be
refined after review) or whether
we should extend SCEV/IVOPTS?

Thanks. 


juzhe.zh...@rivai.ai
 
From: 钟居哲
Date: 2023-05-30 23:05
To: rguenther
CC: richard.sandiford; gcc-patches; linkw
Subject: Re: Re: [PATCH] VECT: Change flow of decrement IV
More information of power's testcase:

Before this patch:
test_npeel_int16_t:
lui a4,%hi(.LANCHOR0+130)
lui a3,%hi(.LANCHOR1)
addi a3,a3,%lo(.LANCHOR1)
addi a4,a4,%lo(.LANCHOR0+130)
li a5,58
li a2,16
vsetivli zero,16,e16,m1,ta,ma
vl1re16.v v3,0(a3)
vid.v v1
.L5:
minu a3,a5,a2
vsetvli zero,a3,e16,m1,ta,ma
sub a5,a5,a3
vse16.v v1,0(a4)
vsetivli zero,16,e16,m1,ta,ma
addi a4,a4,32
vadd.vv v1,v1,v3
bne a5,zero,.L5
ret

After this patch:
test_npeel_int16_t:
lui a5,%hi(.LANCHOR0)
addi a5,a5,%lo(.LANCHOR0)
li a1,16
vsetivli zero,16,e16,m1,ta,ma
addi a2,a5,130
vid.v v1
addi a3,a5,162
vadd.vx v4,v1,a1
addi a4,a5,194
li a1,32
vadd.vx v3,v1,a1
vse16.v v1,0(a2)
vse16.v v4,0(a3)
vse16.v v3,0(a4)
addi a5,a5,226
li a1,48
vadd.vx v2,v1,a1
vsetivli zero,10,e16,m1,ta,ma
vse16.v v2,0(a5)
ret

Clearly, Power's testcase previously could not be unrolled on the RVV side,
but with this patch it can.
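
The unrolled listing can be cross-checked with a small C sketch. It assumes
the constants read off the listings above (58 elements, vf = 16, 2-byte
elements for e16, first store at .LANCHOR0+130). With niters and vf both
constant, the trip count and every chunk are known up front, which is what
makes complete unrolling possible. The struct and function names are made up
for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct chunk
{
  size_t byte_offset;   /* Store address relative to .LANCHOR0.  */
  size_t first_index;   /* Constant added to the vid.v result.  */
  size_t len;           /* Active elements in this chunk.  */
};

/* Fill OUT with one entry per vector iteration and return the trip
   count, which is ceil (niters / vf).  */
size_t
plan_chunks (size_t niters, size_t vf, size_t elem_size,
             size_t base_offset, struct chunk *out)
{
  assert (niters > 0 && vf > 0);
  size_t trips = (niters + vf - 1) / vf;
  for (size_t i = 0; i < trips; i++)
    {
      size_t done = i * vf;
      out[i].byte_offset = base_offset + done * elem_size;
      out[i].first_index = done;
      out[i].len = niters - done < vf ? niters - done : vf;
    }
  return trips;
}
```

For plan_chunks (58, 16, 2, 130, out) this gives four chunks at offsets 130,
162, 194 and 226, with index constants 0, 16, 32 and 48 and a final length of
10, matching the addi/li/vsetivli operands in the "after" listing.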


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 20:33
To: juzhe.zhong
CC: Richard Sandiford; gcc-patches; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
On Tue, 30 May 2023, juzhe.zhong wrote:
 
> This patch will generate as many "mov" instructions inside the loop as
> there are rgroups. This is unacceptable. For example, if the number of
> rgroups is 3, there will be 3 more instructions in the loop. If this patch
> is necessary, I think I should find a way to fix it.
 
That's odd, you only need to adjust the IV which is used in the exit test,
not all the others.
 
>  Replied Message 
> From
> Richard Sandiford
> Date
> 05/30/2023 19:41
> To
> juzhe.zh...@rivai.ai
> Cc
> gcc-patches,
> rguenther,
> linkw
> Subject
> Re: [PATCH] VECT: Change flow of decrement IV
> "juzhe.zh...@rivai.ai"  writes:
> > Before this patch:
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> >   sub   a2,a2,a5
> > bne a2,zero,.L3
> > .L5:
> > ret
> >
> > After this patch:
> >
> > foo:
> > ble a2,zero,.L5
> > csrr a3,vlenb
> > srli a4,a3,2
> > neg a7,a4   -->>>additional instruction
> > .L3:
> > minu a5,a2,a4
> > vsetvli zero,a5,e32,m1,ta,ma
> > vle32.v v2,0(a1)
> > vle32.v v1,0(a0)
> > vsetvli t1,zero,e32,m1,ta,ma
> > mv a6,a2  -->>>additional instruction
> > vadd.vv v1,v1,v2
> > vsetvli zero,a5,e32,m1,ta,ma
> > vse32.v v1,0(a0)
> > add a1,a1,a3
> > add a0,a0,a3
> > add a2,a2,a7
> > bgtu a6,a4,.L3
> > .L5:
> > ret
> >
> > There is 1 more instruction in the preheader and 1 more instruction in the loop.
> > But I think it's OK for RVV since we will definitely be using SELECT_VL, so
> > this issue will be gone.
> 
> But what about cases where you won't be using SELECT_VL, such as SLP?
> 
> Richard
> 
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai
>> How does it affect RVV code quality?  I thought you specifically chose
>> the previous approach because code quality was better that way.
Yes, the previous way is better for RVV.  But as I said, we will definitely use
SELECT_VL, and
with SELECT_VL we will be using remain - step (with the step produced by SELECT_VL).



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> ALL tests (decrement IV) of RVV are passed.
 
How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.
 
Richard
 


Re: Re: [PATCH] VECT: Change flow of decrement IV

2023-05-30 Thread juzhe.zh...@rivai.ai
Before this patch:
foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
  sub   a2,a2,a5
bne a2,zero,.L3
.L5:
ret

After this patch:

foo:
ble a2,zero,.L5
csrr a3,vlenb
srli a4,a3,2
neg a7,a4   -->>>additional instruction
.L3:
minu a5,a2,a4
vsetvli zero,a5,e32,m1,ta,ma
vle32.v v2,0(a1)
vle32.v v1,0(a0)
vsetvli t1,zero,e32,m1,ta,ma
mv a6,a2  -->>>additional instruction
vadd.vv v1,v1,v2
vsetvli zero,a5,e32,m1,ta,ma
vse32.v v1,0(a0)
add a1,a1,a3
add a0,a0,a3
add a2,a2,a7
bgtu a6,a4,.L3
.L5:
ret

There is 1 more instruction in the preheader and 1 more instruction in the loop.
But I think it's OK for RVV since we will definitely be using SELECT_VL, so this
issue will be gone.
That is fine as long as this flow is better for Power (SCEV).



juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-30 19:31
To: juzhe.zhong
CC: gcc-patches; rguenther; linkw
Subject: Re: [PATCH] VECT: Change flow of decrement IV
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Follow Richi's suggestion, I change current decrement IV flow from:
>
> do {
>remain -= MIN (vf, remain);
> } while (remain != 0);
>
> into:
>
> do {
>old_remain = remain;
>len = MIN (vf, remain);
>remain -= vf;
> } while (old_remain >= vf);
>
> to enhance SCEV.
>
> ALL tests (decrement IV) of RVV are passed.
 
How does it affect RVV code quality?  I thought you specifically chose
the previous approach because code quality was better that way.
 
Richard
 


Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, Richi.
I have sent a patch that follows your suggestion and changes the decrement IV
flow:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/620086.html 

It works well in RVV.

Could you take a look at it?
If it's OK, I will send the SELECT_VL patch based on this.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decremnt IV patch create fails on PowerPC
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok.
> 
> It seems that for this conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supportted_p (SELECT_VL...) to this is that right?
 
No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).
 
Richard.
 
> I have send SELECT_VL patch. I will add this in next SELECT_VL patch.
> 
> Let's wait Richard's more comments.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decremnt IV patch create fails on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi. Thanks for your analysis and helps.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> > I am not sure how to do that. Could you give me more informations?
> > 
> > I somehow understand your concern is that variable amount of IV will make
> > IVOPT fails. 
> > 
> > I have seen similar situation in LLVM (when apply variable IV,
> > they failed to interleave the vectorize code). I am not sure whether they
> > are the same reason for that.
> > 
> > For RVV, we not only want decrement IV style in vectorization but also
> > we want to apply SELECT_VL in single-rgroup which is most happen cases 
> > (LLVM also only apply get_vector_length in single vector length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is ok with decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: ???
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decremnt IV patch create fails on PowerPC
> > On Fri, 26 May 2023, ??? wrote:
> >  
> > > Yesterday's patch has been approved (decremnt IV support):
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > 
> > > However, it creates fails on PowerPC:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > 
> > > I am really sorry for causing inconvenience.
> > > 
> > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > > +&

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
>> No, I said the current scheme does sth along

>> do {
>>remain -= MIN (vf, remain);
>> } while (remain != 0);

>> and I suggest to instead do

>> do {
>>old_remain = remain;
>>len = MIN (vf, remain);
>>remain -= vf;
>> } while (old_remain >= vf);

>> basically since only the last iteration will have len < vf we can
>> ignore that remain -= vf will underflow there if we appropriately
>> rewrite the exit test to use the pre-decrement value.

Oh, I understand you now. I will definitely have a try and send a patch.

Thank you so much.

By the way, could you take a look at the SELECT_VL patch?
I guess you want to defer it to Richard, and I will wait, but I still think your
comments are very important.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 18:00
To: Kewen.Lin
CC: juzhe.zh...@rivai.ai; gcc-patches; richard.sandiford
Subject: Re: decremnt IV patch create fails on PowerPC
On Tue, 30 May 2023, Kewen.Lin wrote:
 
> on 2023/5/30 17:26, juzhe.zh...@rivai.ai wrote:
> > Ok.
> > 
> > It seems that for this conditions:
> > 
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > 
> > I should add direct_supportted_p (SELECT_VL...) to this is that right?
> 
> I guess not: with this condition, any target without SELECT_VL would be unable
> to leverage the new decrement scheme for lengths, and per your reply in PR109971
> you didn't mean to disable it.  IIUC, what Richi suggested is to introduce
> one new IV, just like the previous one, which has a non-variable step, so that
> SCEV can analyze it and the analyses based on it can do a good job.
 
No, I said the current scheme does something along the lines of
 
do {
   remain -= MIN (vf, remain);
} while (remain != 0);
 
and I suggest to instead do
 
do {
   old_remain = remain;
   len = MIN (vf, remain);
   remain -= vf;
} while (old_remain >= vf);
 
basically since only the last iteration will have len < vf we can
ignore that remain -= vf will underflow there if we appropriately
rewrite the exit test to use the pre-decrement value.
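
The two flows can be compared with a minimal C sketch that records the LEN
used in each iteration. Function names are hypothetical. Note one assumption:
the exit test below uses a strict `old_remain > vf`, as the posted RVV asm
does (bgtu a6,a4); as far as I can tell, with `>=` a count that is an exact
multiple of vf would run one extra iteration with len == 0.

```c
#include <assert.h>
#include <stddef.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Current flow: the step is MIN (vf, remain), so it varies.  Stores the
   length of each iteration in LENS and returns the iteration count.
   N must be nonzero, as the loop preheader guarantees.  */
size_t
lens_variable_step (size_t n, size_t vf, size_t *lens)
{
  assert (n > 0 && vf > 0);
  size_t remain = n, iters = 0;
  do
    {
      size_t len = MIN (vf, remain);
      lens[iters++] = len;
      remain -= len;
    }
  while (remain != 0);
  return iters;
}

/* Suggested flow: REMAIN always drops by the constant VF, and the exit
   test looks at the value from before the decrement.  */
size_t
lens_constant_step (size_t n, size_t vf, size_t *lens)
{
  assert (n > 0 && vf > 0);
  size_t remain = n, old_remain, iters = 0;
  do
    {
      old_remain = remain;
      lens[iters++] = MIN (vf, remain);
      /* May wrap around on the last iteration (unsigned arithmetic, so
         well defined); the wrapped value is never read.  */
      remain -= vf;
    }
  while (old_remain > vf);
  return iters;
}
```

For n = 58 and vf = 16 both functions produce the lengths 16, 16, 16, 10, but
only the second has an IV with a constant step that SCEV can describe.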
 
> Since this is mainly for targets without SELECT_VL capability, I can follow
> up this if you don't mind.
> 
> BR,
> Kewen
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 


Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
>> No, since powerpc is fine with decrementing VL it should also use it.
>>Instead you should make sure to produce SCEV analyzable IVs when
>>possible (when SELECT_VL is not or cannot be used).
OK. Would you mind giving me some guidance on how to rewrite the decrement IV?
I am not familiar with SCEV, and I am not sure how to make the decrement IV
analyzable by SCEV.



juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:50
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decremnt IV patch create fails on PowerPC
On Tue, 30 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Ok.
> 
> It seems that for this conditions:
> 
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> 
> I should add direct_supportted_p (SELECT_VL...) to this is that right?
 
No, since powerpc is fine with decrementing VL it should also use it.
Instead you should make sure to produce SCEV analyzable IVs when
possible (when SELECT_VL is not or cannot be used).
 
Richard.
 
> I have send SELECT_VL patch. I will add this in next SELECT_VL patch.
> 
> Let's wait Richard's more comments.
> 
> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-30 17:22
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: Re: decremnt IV patch create fails on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi, Richi. Thanks for your analysis and helps.
> > 
> > >> We could simply retain the original
> > >> incrementing IV for loop control and add the decrementing
> > >> IV for computing LEN in addition to that and leave IVOPTs
> > >> sorting out to eventually merge them (or not).
> > 
> > I am not sure how to do that. Could you give me more informations?
> > 
> > I somehow understand your concern is that variable amount of IV will make
> > IVOPT fails. 
> > 
> > I have seen similar situation in LLVM (when apply variable IV,
> > they failed to interleave the vectorize code). I am not sure whether they
> > are the same reason for that.
> > 
> > For RVV, we not only want decrement IV style in vectorization but also
> > we want to apply SELECT_VL in single-rgroup which is most happen cases 
> > (LLVM also only apply get_vector_length in single vector length).
> >
> > >>You can do some testing with a cross compiler, alternatively
> > >>there are powerpc machines in the GCC compile farm.
> > 
> > It seems that Power is ok with decrement IV since most cases are improved.
>  
> Well, but Power never will have SELECT_VL so at least for !SELECT_VL
> targets you should avoid having an IV with variable decrement.  As
> I said it should be easy to rewrite decrement IV to use a constant
> increment (when not using SELECT_VL) and testing the pre-decrement
> value in the exit test.
>  
> Richard.
> > I think Richard may help to explain decrement IV more clearly.
> > 
> > Thanks
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-05-26 14:46
> > To: ???
> > CC: gcc-patches; richard.sandiford; linkw
> > Subject: Re: decremnt IV patch create fails on PowerPC
> > On Fri, 26 May 2023, ??? wrote:
> >  
> > > Yesterday's patch has been approved (decremnt IV support):
> > > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > > 
> > > However, it creates fails on PowerPC:
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > > 
> > > I am really sorry for causing inconvenience.
> > > 
> > > I wonder, as we discussed:
> > > +  /* If we're vectorizing a loop that uses length "controls" and
> > > + can iterate more than once, we apply decrementing IV approach
> > > + in loop control.  */
> > > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > &

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
I think I prefer handling VLS modes like this.
These are the current VLA patterns:
(define_insn "@pred_"
  [(set (match_operand:VI 0 "register_operand"   "=vd, vd, vr, vr, vd, 
vd, vr, vr, vd, vd, vr, vr")
  (if_then_else:VI
(unspec:
  [(match_operand: 1 "vector_mask_operand" " vm, vm,Wc1, Wc1, vm, 
vm,Wc1,Wc1, vm, vm,Wc1,Wc1")
   (match_operand 5 "vector_length_operand"" rK, rK, rK,  rK, rK, rK, 
rK, rK, rK, rK, rK, rK")
   (match_operand 6 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 7 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (match_operand 8 "const_int_operand""  i,  i,  i,   i,  i,  i,  
i,  i,  i,  i,  i,  i")
   (reg:SI VL_REGNUM)
   (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
(any_int_binop:VI
  (match_operand:VI 3 "" "")
  (match_operand:VI 4 "" ""))
(match_operand:VI 2 "vector_merge_operand" 
"vu,0,vu,0,vu,0,vu,0,vu,0,vu,0")))]
  "TARGET_VECTOR"
  "@
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v.vv\t%0,%3,%4%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1
   v\t%0,%p1"
  [(set_attr "type" "")
   (set_attr "mode" "")])

(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
])

You can see there are no VLS modes in "VI". To support VLS, I think we
should extend the "VI" iterator:
(define_mode_iterator VI [
  (VNx1QI "TARGET_MIN_VLEN < 128") VNx2QI VNx4QI VNx8QI VNx16QI VNx32QI 
(VNx64QI "TARGET_MIN_VLEN > 32") (VNx128QI "TARGET_MIN_VLEN >= 128")
  (VNx1HI "TARGET_MIN_VLEN < 128") VNx2HI VNx4HI VNx8HI VNx16HI (VNx32HI 
"TARGET_MIN_VLEN > 32") (VNx64HI "TARGET_MIN_VLEN >= 128")
  (VNx1SI "TARGET_MIN_VLEN < 128") VNx2SI VNx4SI VNx8SI (VNx16SI 
"TARGET_MIN_VLEN > 32") (VNx32SI "TARGET_MIN_VLEN >= 128")
  (VNx1DI "TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN < 128") (VNx2DI 
"TARGET_VECTOR_ELEN_64")
  (VNx4DI "TARGET_VECTOR_ELEN_64") (VNx8DI "TARGET_VECTOR_ELEN_64") (VNx16DI 
"TARGET_VECTOR_ELEN_64 && TARGET_MIN_VLEN >= 128")
V4SI V2DI V8HI V16QI
])

Then we emit code directly with these VLS patterns, without any conversion.
This is the safe way to deal with VLS patterns.
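
For readers who think in source terms: the four modes added to the iterator
are fixed 16-byte vectors. A rough C-level picture, using the GNU vector
extension, is sketched below. The typedef names simply mirror the mode names
and are illustrative; whether a given target maps these types onto V4SI etc.
depends on its backend.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed-size (VLS) vector types: always 16 bytes, whatever the hardware
   vector length is.  */
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef int64_t v2di __attribute__ ((vector_size (16)));
typedef int16_t v8hi __attribute__ ((vector_size (16)));
typedef int8_t v16qi __attribute__ ((vector_size (16)));

static_assert (sizeof (v4si) == 16, "4 x 32-bit lanes");
static_assert (sizeof (v2di) == 16, "2 x 64-bit lanes");
static_assert (sizeof (v8hi) == 16, "8 x 16-bit lanes");
static_assert (sizeof (v16qi) == 16, "16 x 8-bit lanes");

/* An element-wise integer add on a fixed-size vector; this is the kind of
   operation an any_int_binop pattern instantiated for V4SI would match.  */
v4si
add_v4si (v4si a, v4si b)
{
  return a + b;
}
```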

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a 
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am 
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).
 
For the future it would be then good to have the vectorizer
re-vectorize loops with
VLS vector uses to VLA style?  I think there's a PR with a draft patch
from a few
years ago attached (from me) somewhere.  Currently the vectorizer will give
up when seeing vector operations in a loop but ideally those should simply
be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> 
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic V

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
>> For the future it would be then good to have the vectorizer
>>re-vectorize loops with
>>VLS vector uses to VLA style?
Not really. This patch just uses a trick to convert VLS vectors into VLA style,
since
that avoids defining the RVV patterns with VLS modes and saves a lot of work.

There is no benefit in converting VLS into VLA,
and I don't even consider it safe.

especially this code:
+   case MEM: 
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX); 

I feel it is unsafe code.

Actually, my original plan was to define new RVV patterns with new VLS modes
(the patterns are the same as the VLA patterns, only the modes differ)
and then emit code using these VLS RVV patterns.




juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:29
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; Kito.cheng; gcc-patches; palmer; kito.cheng; jeffreyalaw; 
pan2.li
Subject: Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai
 wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a 
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am 
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).
 
For the future it would be then good to have the vectorizer
re-vectorize loops with
VLS vector uses to VLA style?  I think there's a PR with a draft patch
from a few
years ago attached (from me) somewhere.  Currently the vectorizer will give
up when seeing vector operations in a loop but ideally those should simply
be SLPed.
 
> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> ____
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make 
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization 
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen 
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen 
> > can run on any CPU with RVV  vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.  That
> is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
> Robin
>
 


Re: Re: decremnt IV patch create fails on PowerPC

2023-05-30 Thread juzhe.zh...@rivai.ai
Ok.

It seems that for this condition:

+  /* If we're vectorizing a loop that uses length "controls" and
+ can iterate more than once, we apply decrementing IV approach
+ in loop control.  */
+  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
+  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
+  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
+  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+  && known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
+   LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
+LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
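
Read as a plain predicate, the condition above looks roughly like the C
sketch below. The struct is a hypothetical stand-in for the loop_vinfo fields
the macros read, and the iteration count and vectorization factor are
simplified to plain integers, whereas the real code compares them with
known_le.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the loop_vinfo fields used by the condition.  */
struct loop_info
{
  bool can_use_partial_vectors; /* LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P  */
  bool has_length_controls;     /* !LOOP_VINFO_LENS (...).is_empty ()  */
  int partial_load_store_bias;  /* LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS  */
  bool niters_known;            /* LOOP_VINFO_NITERS_KNOWN_P  */
  unsigned long niters;         /* LOOP_VINFO_INT_NITERS  */
  unsigned long vect_factor;    /* LOOP_VINFO_VECT_FACTOR  */
};

/* Return true if the loop would use the decrementing IV.  */
bool
use_decrementing_iv_p (const struct loop_info *li)
{
  assert (li != NULL);
  /* A loop known to fit in a single vector iteration never goes around
     again, so it is excluded.  */
  bool single_iteration = li->niters_known && li->niters <= li->vect_factor;
  return li->can_use_partial_vectors
         && li->has_length_controls
         && li->partial_load_store_bias == 0
         && !single_iteration;
}
```

Nothing in the predicate asks whether the target has SELECT_VL, which is why
it also fires on PowerPC.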

I should add direct_supportted_p (SELECT_VL...) to this, is that right?

I have sent the SELECT_VL patch. I will add this in the next SELECT_VL patch.

Let's wait for more comments from Richard.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 17:22
To: juzhe.zh...@rivai.ai
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: Re: decremnt IV patch create fails on PowerPC
On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi, Richi. Thanks for your analysis and helps.
> 
> >> We could simply retain the original
> >> incrementing IV for loop control and add the decrementing
> >> IV for computing LEN in addition to that and leave IVOPTs
> >> sorting out to eventually merge them (or not).
> 
> I am not sure how to do that. Could you give me more informations?
> 
> I somehow understand your concern is that variable amount of IV will make
> IVOPT fails. 
> 
> I have seen similar situation in LLVM (when apply variable IV,
> they failed to interleave the vectorize code). I am not sure whether they
> are the same reason for that.
> 
> For RVV, we not only want decrement IV style in vectorization but also
> we want to apply SELECT_VL in single-rgroup which is most happen cases (LLVM 
> also only apply get_vector_length in single vector length).
>
> >>You can do some testing with a cross compiler, alternatively
> >>there are powerpc machines in the GCC compile farm.
> 
> It seems that Power is ok with decrement IV since most cases are improved.
 
Well, but Power never will have SELECT_VL so at least for !SELECT_VL
targets you should avoid having an IV with variable decrement.  As
I said it should be easy to rewrite decrement IV to use a constant
increment (when not using SELECT_VL) and testing the pre-decrement
value in the exit test.
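 
A minimal sketch in plain C of the two loop shapes being contrasted here, assuming a scalar stand-in for the vectorized body. The names sum_variable_step, sum_constant_step and min_size are hypothetical and are not taken from the patch. The first form steps the IV by the per-iteration length; the second steps it by the constant vf and tests the pre-decrement value on exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: MIN (a, b) for size_t.  */
static size_t
min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Form 1: the IV is decremented by LEN, which varies in the last
   iteration.  Returns the number of iterations executed.  */
size_t
sum_variable_step (const int *x, size_t n, size_t vf, long *sum)
{
  assert (n > 0 && vf > 0);
  size_t remaining = n, base = 0, iters = 0;
  *sum = 0;
  do
    {
      size_t len = min_size (remaining, vf);
      for (size_t i = 0; i < len; i++)
        *sum += x[base + i];
      base += len;
      remaining -= len; /* variable step */
      iters++;
    }
  while (remaining != 0);
  return iters;
}

/* Form 2: the IV is decremented by the constant VF and the exit test
   looks at the value the IV had before the decrement.  */
size_t
sum_constant_step (const int *x, size_t n, size_t vf, long *sum)
{
  assert (n > 0 && vf > 0);
  size_t remaining = n, base = 0, iters = 0, before;
  *sum = 0;
  do
    {
      before = remaining;
      size_t len = min_size (before, vf);
      for (size_t i = 0; i < len; i++)
        *sum += x[base + i];
      base += vf;
      remaining -= vf; /* constant step; may wrap on the last iteration,
                          but the wrapped value is never used */
      iters++;
    }
  while (before > vf);
  return iters;
}
```

Both forms visit the same elements in the same number of iterations; only the second keeps the step loop-invariant.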
 
Richard.
> I think Richard may help to explain decrement IV more clearly.
> 
> Thanks
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-05-26 14:46
> To: juzhe.zh...@rivai.ai
> CC: gcc-patches; richard.sandiford; linkw
> Subject: Re: decremnt IV patch create fails on PowerPC
> On Fri, 26 May 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Yesterday's patch has been approved (decrement IV support):
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html 
> > 
> > However, it creates fails on PowerPC:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971 
> > 
> > I am really sorry for causing inconvenience.
> > 
> > I wonder about the condition we discussed:
> > +  /* If we're vectorizing a loop that uses length "controls" and
> > + can iterate more than once, we apply decrementing IV approach
> > + in loop control.  */
> > +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> > +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> > +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> > +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> > +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> > + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> > +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
> > 
> > These conditions cannot disable the decrement IV on PowerPC.
> > Should I add a target hook for it?
>  
> No.  I've put some analysis in the PR.  To me the question is
> why (without that SELECT_VL case) we need a decrementing IV
> _for the loop control_?  We could simply retain the original
> incrementing IV for loop control and add the decrementing
> IV for computing LEN in addition to that and leave IVOPTs
> sorting out to eventually merge them (or not).
>  
> Alternatively avoid the variable decrement as I wrote in the
> PR and do the exit test based on the previous IV value.
>  
> But as said all this won't work for the SELECT_VL case, but
> then its availability is something to key off rather than a
> new target hook?
>  
> > I could only bootstrap and regression-test the patch on x86.
> > I don't have an environment to test PowerPC. I am really sorry.
>  
> You can do some testing with a cross compiler, alternatively
> there are powerpc machines in the GCC compile farm.
>  
> Richard.
>  
> 
 
-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
 


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
In the future, we will definitely mix VLA and VLS-vlmin in the same
codegen, and it will not cause any issues.
For VLS-vlmin, I would prefer to use it in length-style auto-vectorization (I am not
sure yet since my SELECT_VL patch is not finished; I will check whether it works
while I am working on the SELECT_VL patch).

>> In general I don't have a good overview of which optimizations we gain by
>> such an approach or rather which ones are prevented by VLA altogether?
The VLS modes in these patches can help SLP auto-vectorization.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 17:05
To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
>>> but ideally the user would be able to specify -mrvv-size=32 for an
>>> implementation with 32 byte vectors and then vector lowering would make use
>>> of vectors up to 32 bytes?
> 
> Actually, we don't want to specify -mrvv-size = 32 to enable vectorization on 
> GNU vectors.
> You can take a look this example:
> https://godbolt.org/z/3jYqoM84h
> 
> GCC need to specify the mrvv size to enable GNU vectors and the codegen only 
> can run on CPU with vector-length = 128bit.
> However, LLVM doesn't need to specify the vector length, and the codegen can 
> run on any CPU with RVV  vector-length >= 128 bits.
> 
> This is what this patch want to do.
> 
> Thanks.
I think Richard's question was rather if it wasn't better to do it more
generically and lower vectors to what either the current cpu or what the
user specified rather than just 16-byte vectors (i.e. indeed a fixed
vlmin and not a fixed vlmin == fixed vlmax).
 
This patch assumes everything is fixed for optimization purposes and then
switches over to variable-length when nothing can be changed anymore.  That
is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
We would need to make sure that no pass after reload makes use of VLA
properties at all.
 
In general I don't have a good overview of which optimizations we gain by
such an approach or rather which ones are prevented by VLA altogether?
What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
what we would have for pure VLA?
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> but ideally the user would be able to specify -mrvv-size=32 for an
>> implementation with 32 byte vectors and then vector lowering would make use
>> of vectors up to 32 bytes?

Actually, we don't want to require -mrvv-size=32 to enable vectorization of
GNU vectors.
Please take a look at this example:
https://godbolt.org/z/3jYqoM84h 

GCC needs the RVV vector size to be specified to enable GNU vectors, and the
resulting code can only run on a CPU with vector length = 128 bits.
However, LLVM doesn't need the vector length to be specified, and its code can
run on any CPU with RVV vector length >= 128 bits.

This is what this patch wants to do.
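
For reference, a minimal sketch of the kind of fixed 16-byte GNU vector code under discussion. This is not the contents of the godbolt link above; the names v4si and add4 are illustrative assumptions. The vector_size attribute and element-wise + are GNU C extensions accepted by GCC and Clang.

```c
#include <string.h>

/* A 16-byte vector of four ints (GNU vector extension).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Add four ints at a time.  memcpy is used for the loads and stores so
   the sketch makes no alignment assumptions about A, B and OUT.  */
void
add4 (const int *a, const int *b, int *out)
{
  v4si va, vb, vc;
  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  vc = va + vb; /* one element-wise vector add */
  memcpy (out, &vc, sizeof vc);
}
```

With a fixed-vlmin scheme, a function like this would need only 128 bits of each vector register, which is why the same binary could run on any implementation whose vector length is at least 128 bits.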

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-30 15:13
To: Kito Cheng
CC: gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; rdapp.gcc; 
pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
On Tue, May 30, 2023 at 8:07 AM Kito Cheng via Gcc-patches
 wrote:
>
> GNU vector extensions are widely used around the world, and this patch
> enables them with the RISC-V vector extension. This helps people
> leverage existing code bases with RVV, and also lets them write vector programs in a
> familiar way.
>
> The idea of VLS code gen support is to emulate VLS operations with VLA operations
> of a specific length.
 
In the patch you added fixed 16 bytes vector modes, correct?  I've never
looked at how ARM deals with the GNU vector extensions but I suppose they
get mapped to NEON and not SVE so basically behave the same way here.
 
But I do wonder about the efficiency for RVV where there doesn't exist a
complementary fixed-length ISA.  Shouldn't vector lowering
(tree-vect-generic.cc) be enhanced to support lowering fixed-length vectors
to variable length ones with (variable) fixed length instead?  From your
patch I second-guess the RVV specification requires 16 byte vectors to be
available (or will your patch split the insns?) but ideally the user would
be able to specify -mrvv-size=32 for an implementation with 32 byte vectors
and then vector lowering would make use of vectors up to 32 bytes?
 
Also vector lowering will split smaller vectors not equal to the fixed size to
scalars unless you add all fixed length modes smaller than 16 bytes as well.
 
> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
>
> This is compatible with VLA vectorization.
>
> Only support move and binary part of operation patterns.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-modes.def: Introduce VLS modes.
> * config/riscv/riscv-protos.h (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::vls_mode_p): New.
> * config/riscv/riscv-v.cc (riscv_vector::minimal_vls_mode): New.
> (riscv_vector::vls_mode_p): New.
> (riscv_vector::vls_insn_expander): New.
> (riscv_vector::update_vls_mode): New.
> * config/riscv/riscv.cc (riscv_v_ext_mode_p): New.
> (riscv_v_adjust_nunits): Handle VLS type.
> (riscv_hard_regno_nregs): Ditto.
> (riscv_hard_regno_mode_ok): Ditto.
> (riscv_regmode_natural_size): Ditto.
> * config/riscv/vector-iterators.md (VLS): New.
> (VM): Handle VLS type.
> (vel): Ditto.
> * config/riscv/vector.md: Include vector-vls.md.
> * config/riscv/vector-vls.md: New file.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/rvv.exp: Add vls folder.
> * gcc.target/riscv/rvv/vls/binop-template.h: New test.
> * gcc.target/riscv/rvv/vls/binop-v.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/binop-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/move-template.h: New test.
> * gcc.target/riscv/rvv/vls/move-v.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/move-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-template.h: New test.
> * gcc.target/riscv/rvv/vls/load-store-v.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve32x.c: New test.
> * gcc.target/riscv/rvv/vls/load-store-zve64x.c: New test.
> * gcc.target/riscv/rvv/vls/vls-types.h: New test.
> ---
>  gcc/config/riscv/riscv-modes.def  |  3 +
>  gcc/config/riscv/riscv-protos.h   |  4 ++
>  gcc/config/riscv/riscv-v.cc   | 67 +++
>  gcc/config/riscv/riscv.cc | 27 +++-

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
>> why is the conversion after register allocation always
>> safe?
I am worried about this issue too.
I just noticed:

+   case MEM:
+ operands[i] = change_address (operands[i], vla_mode, NULL_RTX);

I am not sure whether it is safe.

>> Couldn't we "lower" the fixed-length vectors to VLA at some point and
>> how does everything relate to fixed-vlmax?

Let me explain why we need this patch (I call the approach fixed-vlmin).
Please take a look at this example:
https://godbolt.org/z/3jYqoM84h 

This is how LLVM works.
In this example, you can see that GCC needs --param=riscv-autovec-preference=fixed-vlmax 
-march=rv64gcv (the same as mrvv-vector-bits=128).
However, LLVM doesn't need the vector length to be specified.

The benefits:
1. We can vectorize this example without specifying the actual vector
length.
2. Code generated by GCC today can only run on a CPU with vector length = 128, whereas LLVM's can 
run on any RVV CPU with vector length >= 128.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-30 15:27
To: Kito Cheng; gcc-patches; palmer; kito.cheng; juzhe.zhong; jeffreyalaw; 
pan2.li
CC: rdapp.gcc
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
Hi Kito,
 
> GNU vector extensions are widely used around the world, and this patch
> enables them with the RISC-V vector extension. This helps people
> leverage existing code bases with RVV, and also lets them write vector programs in a
> familiar way.
> 
> The idea of VLS code gen support is to emulate VLS operations with VLA operations
> of a specific length.
> 
> Key design point is we defer the mode conversion (From VLS to VLA mode) after
> register allocation, it come with several advantages:
> - VLS pattern is much friendly for most optimization pass like combine.
> - Register allocator can spill/restore exact size of VLS type instead of
>   whole register.
> 
> This is compatible with VLA vectorization.
> 
> Only support move and binary part of operation patterns.
 
On a high-level:  Why do we need to do it this way and not any other way? :)
Some more comments/explanations would definitely help, i.e. prior art on
aarch64, what exactly is easier for combine and friends now (no undef and so
on) and, importantly, why is the conversion after register allocation always
safe?  Couldn't we "lower" the fixed-length vectors to VLA at some point and
how does everything relate to fixed-vlmax? Essentially this is a "separate"
backend similar to ARM NEON but we share most of the things and possibly grow
it in the future?
 
What would the alternative be?
 
That said, couldn't we reuse the existing binop tests?  If you don't like them
change the existing ones as well and reuse then?
 
> +/* Return the minimal containable VLA mode for MODE.  */
> +
> +machine_mode
> +minimal_vla_mode (machine_mode mode)
> +{
> +  gcc_assert (GET_MODE_NUNITS (mode).is_constant ());
> +  unsigned type_size = GET_MODE_NUNITS (mode).to_constant ();
 
Couldn't you use .require () right away?  Same in some other hunks.
 
Regards
Robin
 
 


Re: [PATCH] VECT: Add SELECT_VL support

2023-05-30 Thread juzhe.zh...@rivai.ai
Hi, this patch passed bootstrap.

Ok for trunk ?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-25 23:26
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH] VECT: Add SELECT_VL support
From: Ju-Zhe Zhong 
 
This patch adds SELECT_VL middle-end support, which
allows a target to apply target-dependent optimization to the
length calculation.
 
This patch is inspired by RVV ISA and LLVM:
https://reviews.llvm.org/D99750
 
SELECT_VL has the same behavior as LLVM "get_vector_length", with
the following properties:
 
1. It is only applied to single-rgroup loops.
2. It is not applied to SLP.
3. It adjusts the loop control IV.
4. It adjusts the data reference IVs.
5. It allows a non-final iteration to process fewer than VF elements.
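 
A minimal scalar sketch in C of the loop shape these properties describe. The select_vl function here is a hypothetical stand-in, not the GCC internal function: it is modeled on the freedom the RVV vsetvli rules give an implementation, which may return ceil(AVL / 2) when VLMAX < AVL < 2 * VLMAX. The parameter vf and the return value of vvaddint32 are illustrative additions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for SELECT_VL: returns how many elements this
   iteration handles.  It may return fewer than VF before the final
   iteration (property 5).  */
static size_t
select_vl (size_t remaining, size_t vf)
{
  assert (vf > 0);
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2; /* split the tail evenly */
  return vf;
}

/* z[i] = x[i] + y[i] for i in [0, n).  Returns the iteration count.  */
size_t
vvaddint32 (size_t n, const int *x, const int *y, int *z, size_t vf)
{
  size_t iters = 0;
  while (n > 0)
    {
      size_t vl = select_vl (n, vf);
      for (size_t i = 0; i < vl; i++) /* stands in for the vector body */
        z[i] = x[i] + y[i];
      x += vl; /* data reference IVs advance by vl (property 4) */
      y += vl;
      z += vl;
      n -= vl; /* loop control IV decrements by vl (property 3) */
      iters++;
    }
  return iters;
}
```

With n = 10 and vf = 4 this sketch runs three iterations of 4, 3 and 3 elements, where a MIN_EXPR-based length would give 4, 4 and 2.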
 
Code:
   # void vvaddint32(size_t n, const int*x, const int*y, int*z)
# { for (size_t i=0; i<n; i++) { z[i]=x[i]+y[i]; } }
   ...
-_36 = MIN_EXPR ;
+_36 = (MIN_EXPR | SELECT_VL) ;
   ...
   vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
   ...
@@ -551,9 +551,14 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   /* Create decrement IV.  */
   create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
insert_after, &index_before_incr, &index_after_incr);
-  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
- index_before_incr,
- nitems_step));
+  tree len = NULL_TREE;
+  if (LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo))
+ len = gimple_build (header_seq, IFN_SELECT_VL, iv_type,
+ index_before_incr, nitems_step);
+  else
+ len = gimple_build (header_seq, MIN_EXPR, iv_type, index_before_incr,
+ nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, len));
   *iv_step = step;
   return index_after_incr;
 }
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 5b7a0da0034..f67340976c8 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -974,6 +974,7 @@ _loop_vec_info::_loop_vec_info (class loop *loop_in, 
vec_info_shared *shared)
 can_use_partial_vectors_p (param_vect_partial_vector_usage != 0),
 using_partial_vectors_p (false),
 using_decrementing_iv_p (false),
+using_select_vl_p (false),
 epil_using_partial_vectors_p (false),
 partial_load_store_bias (0),
 peeling_for_gaps (false),
@@ -2737,6 +2738,14 @@ start_over:
LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
 LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
+  /* If we're using decrement IV and SELECT_VL is supported by the target.
+ Use output of SELECT_VL to adjust IV of loop control and data reference.
+ Note: We only use SELECT_VL on single-rgroup control.  */
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
+  && LOOP_VINFO_LENS (loop_vinfo).length () == 1
+  && !slp)
+LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
+
   /* If we're vectorizing an epilogue loop, the vectorized loop either needs
  to be able to handle fewer than VF scalars, or needs to have a lower VF
  than the main loop.  */
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 127b987cd62..8e8b0f71a4a 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -3147,6 +3147,61 @@ vect_get_data_ptr_increment (vec_info *vinfo,
   return iv_step;
}
+/* Prepare the pointer IVs which needs to be updated by a variable amount.
+   Such variable amount is the outcome of .SELECT_VL. In this case, we can
+   allow each iteration to process a flexible number of elements as long as
+   the number <= vf elements.
+
+   Return data reference according to SELECT_VL.
+   If new statements are needed, insert them before GSI.  */
+
+static tree
+get_select_vl_data_ref_ptr (vec_info *vinfo, stmt_vec_info stmt_info,
+ tree aggr_type, class loop *at_loop, tree offset,
+ tree *dummy, gimple_stmt_iterator *gsi,
+ bool simd_lane_access_p, vec_loop_lens *loop_lens,
+ dr_vec_info *dr_info,
+ vect_memory_access_type memory_access_type)
+{
+  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+  tree step = vect_dr_behavior (vinfo, dr_info)->step;
+
+  /* TODO: We don't support gather/scatter or load_lanes/store_lanes when
+ pointer IVs are updated by a variable amount, but we will support them
+ in the future.  */
+  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER
+   && memory_access_type != VMAT_LOAD_STORE_LANES);
+
+  /* When we support SELECT_VL pattern, we dynamic adjust
+ the memory address by .SELECT_VL result.
+
+ The result of .SELECT_VL is the number of elements to
+ be processed of each iteration. So the memory address
+ adjustment operation should be:
+
+ bytesize = GET_MODE_SIZE (element_mode (aggr_type));
+ addr = addr + .SELECT_VL (ARG..) * bytesize;
+  */
+  gimple *ptr_incr;
+  tree loop_len
+= vect_get_loop_len (loop_vinfo, gsi, loop_lens, 1, aggr_type, 0, 0);
+  tree len_type = TREE_TYPE (loop_len);
+  poly_uint64 bytesize = GET_MODE_SIZE (

Re: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-30 Thread juzhe.zh...@rivai.ai
Ok.  LGTM as long as you change the patch as I suggested.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-30 14:51
To: juzhe.zh...@rivai.ai
CC: gcc-patches; palmer; kito.cheng; jeffreyalaw; Robin Dapp; pan2.li
Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >> /* Return true if MODE is true VLS mode.  */
> >> bool
> >> vls_mode_p (machine_mode mode)
> >> {
> >>   switch (mode)
> >> {
> >> case E_V4SImode:
> >> case E_V2DImode:
> >> case E_V8HImode:
> >> case E_V16QImode:
> >>   return true;
> >> default:
> >>   return false;
> >> }
> >> }
>
> To be consistent, you should put these into riscv-vector-switch.def.
> That makes the function easier to extend. Change it like this,
> renaming it to riscv_v_ext_vls_mode_p:
>
> bool
> riscv_v_ext_vls_mode_p (machine_mode mode)
> {
> #define VLS_ENTRY(MODE, REQUIREMENT, ...)                                      \
>   case MODE##mode:                                                             \
>     return REQUIREMENT;
>   switch (mode)
> {
> #include "riscv-vector-switch.def"
> default:
>   return false;
> }
>   return false;
> }
>
> Then in riscv-vector-switch.def
> VLS_ENTRY (V4SI...
> VLS_ENTRY (V2DI..
> ...
> In the future, we extend more VLS modes in riscv-vector-switch.def
 
Good point, we should make this more consistent :)
 
> >>(define_insn_and_split "3"
> >>  [(set (match_operand:VLS 0 "register_operand" "=vr")
> >> (any_int_binop_no_shift:VLS
> >>  (match_operand:VLS 1 "register_operand" "vr")
> >>  (match_operand:VLS 2 "register_operand" "vr")))]
> >>  "TARGET_VECTOR"
> >>  "#"
> >>  "reload_completed"
> >>  [(const_int 0)]
> >>+{
> >>  machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode);
> >>  riscv_vector::vls_insn_expander (
> >>code_for_pred (, vla_mode), riscv_vector::RVV_BINOP,
> >>operands, mode, vla_mode);
> >>  DONE;
> >>})
>
> This pattern can work for current VLS modes so far since they are within 
> 0~31, if we add more VLSmodes such as V32QImode, V64QImode,
> it can't work . I am ok with this, but I should remind you early.
 
Yeah, I Know the problem, my thought is we will have another set of
VLS patterns for those NUNITS >= 32, and require one clobber with GPR.
 
> Add tests with -march=rv64gcv_zvl256b to see whether your testcase can 
> generate LMUL = mf2 vsetvli
>
> and -march=rv64gcv_zvl2048 make sure your testcase will not go into the VLS 
> modes (2048 * 1 / 8 > 128)
 
I guess I should loop over those option combinations instead of
using separate files that differ only in options.
 
>
>
> For VSETVL part, I didn't see you define attribute sew/vlmul ...ratio for VLS 
> modes.
>
> I wonder how these VLS modes emit correct VSETVL?
 
That's the magic I made here, I split the pattern after RA, but before
vsetvli, and convert all operands to VLA mode and use VLA pattern, so
that we don't need to modify any line of vsetvli stuff.
 


Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V

2023-05-29 Thread juzhe.zh...@rivai.ai

>> /* Return true if MODE is true VLS mode.  */
>> bool
>> vls_mode_p (machine_mode mode)
>> {
>>   switch (mode)
>> {
>> case E_V4SImode:
>> case E_V2DImode:
>> case E_V8HImode:
>> case E_V16QImode:
>>   return true;
>> default:
>>   return false;
>> }
>> }
To be consistent, you should put these into riscv-vector-switch.def.
That makes the function easier to extend. Change it like this,
renaming it to riscv_v_ext_vls_mode_p:
bool
riscv_v_ext_vls_mode_p (machine_mode mode)
{
#define VLS_ENTRY(MODE, REQUIREMENT, ...)                                      \
  case MODE##mode:                                                             \
    return REQUIREMENT;
  switch (mode)
    {
#include "riscv-vector-switch.def"
    default:
      return false;
    }
  return false;
}
Then in riscv-vector-switch.def:
VLS_ENTRY (V4SI...
VLS_ENTRY (V2DI..
...
In the future, we can add more VLS modes in riscv-vector-switch.def.
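
The suggestion above is the usual X-macro pattern. A self-contained sketch of it in C follows, assuming a toy mode enum and toy target flags; every name in it (mode_sketch, VLS_ENTRIES, vls_mode_p_sketch, target_vector, target_min_vlen, and the V32QI requirement) is hypothetical. The list is kept in a macro rather than a separate .def file only so that the example fits in one file.

```c
#include <stdbool.h>

/* Toy mode enum standing in for machine_mode.  */
enum mode_sketch { E_V16QI, E_V8HI, E_V4SI, E_V2DI, E_V32QI, E_OTHER };

/* Toy target state standing in for TARGET_VECTOR and TARGET_MIN_VLEN.  */
static bool target_vector = true;
static int target_min_vlen = 128;

/* The single list of VLS modes and their requirements.  In the scheme
   described above this list would live in a .def file and be pulled in
   with #include.  */
#define VLS_ENTRIES(ENTRY)                                  \
  ENTRY (V16QI, target_vector)                              \
  ENTRY (V8HI, target_vector)                               \
  ENTRY (V4SI, target_vector)                               \
  ENTRY (V2DI, target_vector)                               \
  ENTRY (V32QI, target_vector && target_min_vlen >= 256)

/* Return true if MODE is an enabled VLS mode.  Adding a mode means
   adding one line to VLS_ENTRIES; this function does not change.  */
bool
vls_mode_p_sketch (enum mode_sketch mode)
{
  switch (mode)
    {
#define VLS_ENTRY(MODE, REQUIREMENT) \
  case E_##MODE:                     \
    return REQUIREMENT;
      VLS_ENTRIES (VLS_ENTRY)
#undef VLS_ENTRY
    default:
      return false;
    }
}
```

The design benefit is that the mode list and the per-mode requirement live in one place, so other consumers can expand the same list with a different VLS_ENTRY definition.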

>>(define_insn_and_split "3"
>>  [(set (match_operand:VLS 0 "register_operand" "=vr")
>>  (any_int_binop_no_shift:VLS
>>(match_operand:VLS 1 "register_operand" "vr")
>>(match_operand:VLS 2 "register_operand" "vr")))]
>>  "TARGET_VECTOR"
>>  "#"
>>  "reload_completed"
>>  [(const_int 0)]
>>+{
>>  machine_mode vla_mode = riscv_vector::minimal_vla_mode (mode);
>>  riscv_vector::vls_insn_expander (
>>code_for_pred (, vla_mode), riscv_vector::RVV_BINOP,
>>operands, mode, vla_mode);
>>  DONE;
>>})
This pattern works for the current VLS modes since their numbers of elements are within 0~31.
If we add more VLS modes such as V32QImode or V64QImode,
it won't work. I am OK with this for now, but I wanted to point it out early.

>> # VLS test
>>gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/vls/*.\[cS\]]] \
>>  "" $CFLAGS
Add tests with -march=rv64gcv_zvl256b to see whether your testcase can
generate an LMUL = mf2 vsetvli,
and with -march=rv64gcv_zvl2048b to make sure your testcase
will not go into the VLS modes (2048 * 1 / 8 > 128).

For the VSETVL part, I didn't see you define the attributes sew/vlmul ...ratio for VLS
modes. I wonder how these VLS modes emit the correct VSETVL?
For example in vector.md:
(define_attr "sew" ""
  (cond [(eq_attr "mode" "VNx1QI,VNx2QI,VNx4QI,VNx8QI,VNx16QI,VNx32QI,VNx64QI,\
VNx1BI,VNx2BI,VNx4BI,VNx8BI,VNx16BI,VNx32BI,VNx64BI,\
VNx128QI,VNx128BI,VNx2x64QI,VNx2x32QI,VNx3x32QI,VNx4x32QI,\
VNx2x16QI,VNx3x16QI,VNx4x16QI,VNx5x16QI,VNx6x16QI,VNx7x16QI,VNx8x16QI,\
VNx2x8QI,VNx3x8QI,VNx4x8QI,VNx5x8QI,VNx6x8QI,VNx7x8QI,VNx8x8QI,\
VNx2x4QI,VNx3x4QI,VNx4x4QI,VNx5x4QI,VNx6x4QI,VNx7x4QI,VNx8x4QI,\
VNx2x2QI,VNx3x2QI,VNx4x2QI,VNx5x2QI,VNx6x2QI,VNx7x2QI,VNx8x2QI,\
VNx2x1QI,VNx3x1QI,VNx4x1QI,VNx5x1QI,VNx6x1QI,VNx7x1QI,VNx8x1QI")
   (const_int 8)
   (eq_attr "mode" "VNx1HI,VNx2HI,VNx4HI,VNx8HI,VNx16HI,VNx32HI,VNx64HI,\
VNx2x32HI,VNx2x16HI,VNx3x16HI,VNx4x16HI,\
VNx2x8HI,VNx3x8HI,VNx4x8HI,VNx5x8HI,VNx6x8HI,VNx7x8HI,VNx8x8HI,\
VNx2x4HI,VNx3x4HI,VNx4x4HI,VNx5x4HI,VNx6x4HI,VNx7x4HI,VNx8x4HI,\
VNx2x2HI,VNx3x2HI,VNx4x2HI,VNx5x2HI,VNx6x2HI,VNx7x2HI,VNx8x2HI,\
VNx2x1HI,VNx3x1HI,VNx4x1HI,VNx5x1HI,VNx6x1HI,VNx7x1HI,VNx8x1HI")
   (const_int 16)
   (eq_attr "mode" "VNx1SI,VNx2SI,VNx4SI,VNx8SI,VNx16SI,VNx32SI,\
VNx1SF,VNx2SF,VNx4SF,VNx8SF,VNx16SF,VNx32SF,\
VNx2x16SI,VNx2x8SI,VNx3x8SI,VNx4x8SI,\
VNx2x4SI,VNx3x4SI,VNx4x4SI,VNx5x4SI,VNx6x4SI,VNx7x4SI,VNx8x4SI,\
VNx2x2SI,VNx3x2SI,VNx4x2SI,VNx5x2SI,VNx6x2SI,VNx7x2SI,VNx8x2SI,\
VNx2x1SI,VNx3x1SI,VNx4x1SI,VNx5x1SI,VNx6x1SI,VNx7x1SI,VNx8x1SI,\
VNx2x16SF,VNx2x8SF,VNx3x8SF,VNx4x8SF,\
VNx2x4SF,VNx3x4SF,VNx4x4SF,VNx5x4SF,VNx6x4SF,VNx7x4SF,VNx8x4SF,\
VNx2x2SF,VNx3x2SF,VNx4x2SF,VNx5x2SF,VNx6x2SF,VNx7x2SF,VNx8x2SF,\
VNx2x1SF,VNx3x1SF,VNx4x1SF,VNx5x1SF,VNx6x1SF,VNx7x1SF,VNx8x1SF")
   (const_int 32)
   (eq_attr "mode" "VNx1DI,VNx2DI,VNx4DI,VNx8DI,VNx16DI,\
VNx1DF,VNx2DF,VNx4DF,VNx8DF,VNx16DF,\
    VNx2x8DI,VNx2x4DI,VNx3x4DI,VNx4x4DI,\
VNx2x2DI,VNx3x2DI,VNx4x2DI,VNx5x2DI,VNx6x2DI,VNx7x2DI,VNx8x2DI,\
VNx2x1DI,VNx3x1DI,VNx4x1DI,VNx5x1DI,VNx6x1DI,VNx7x1DI,VNx8x1DI,\
VNx2x8DF,VNx2x4DF,VNx3x4DF,VNx4x4DF,\
VNx2x2DF,VNx3x2DF,VNx4x2DF,VNx5x2DF,VNx6x2DF,VNx7x2DF,VNx8x2DF,\
VNx2x1DF,VNx3x1DF,VNx4x1DF,VNx5x1DF,VNx6x1DF,VNx7x1DF,VNx8x1DF")
   (const_int 64)]
  (const_int INVALID_ATTRIBUTE)))




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-30 14:06
To: gcc-patches; palmer; kito.cheng; juzh

Re: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV auto-vectorization support

2023-05-29 Thread juzhe.zh...@rivai.ai
Ok for trunk ?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-29 12:35
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add floating-point to integer conversion RVV 
auto-vectorization support
From: Juzhe-Zhong 
 
We can't yet support floating-point operations that depend on FRM
(for example, vfadd support is blocked) since the RVV intrinsic doc
is not updated and we can't support mode switching for them.
 
However, we can support floating-point to integer conversion now since it does not
depend on FRM and does not need mode switching support ('rtz' conversions are
independent of FRM).
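 
A minimal sketch of the scalar loop shape this kind of pattern targets, assuming 32-bit float to 32-bit int conversion; the function name convert_f32_to_i32 is illustrative and not taken from the testsuite files. In C, a cast from floating point to integer discards the fractional part (truncation toward zero), which is why it corresponds to the 'rtz' conversions and does not consult the dynamic rounding mode.

```c
#include <stddef.h>

/* dst[i] = (int) src[i]: each cast truncates toward zero, so 2.7f
   becomes 2 and -2.7f becomes -2.  Values must be representable in int,
   otherwise the behavior is undefined in C.  */
void
convert_f32_to_i32 (int *restrict dst, const float *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = (int) src[i];
}
```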
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New pattern.
* config/riscv/iterators.md: New attribute.
* config/riscv/vector-iterators.md: New attribute.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c: New 
test.
* gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h: New 
test.
 
---
gcc/config/riscv/autovec.md   | 23 
gcc/config/riscv/iterators.md |  4 +-
gcc/config/riscv/vector-iterators.md  |  5 ++
.../rvv/autovec/conversions/vfcvt_rtz-run.c   | 52 +++
.../autovec/conversions/vfcvt_rtz-rv32gcv.c   |  6 +++
.../autovec/conversions/vfcvt_rtz-rv64gcv.c   |  6 +++
.../autovec/conversions/vfcvt_rtz-template.h  | 15 ++
7 files changed, 110 insertions(+), 1 deletion(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/conversions/vfcvt_rtz-template.h
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b24867ae4d0..3989ffb26ee 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -478,6 +478,29 @@
   DONE;
})
+;; =
+;; == Conversions
+;; =
+
+;; -
+;;  [INT<-FP] Conversions
+;; -
+;; Includes:
+;; - vfcvt.rtz.xu.f.v
+;; - vfcvt.rtz.x.f.v
+;; -
+
+(define_expand "2"
+  [(set (match_operand: 0 "register_operand")
+ (any_fix:
+   (match_operand:VF 1 "register_operand")))]
+  "TARGET_VECTOR"
+{
+  insn_code icode = code_for_pred (, mode);
+  riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
+  DONE;
+})
+
;; =
;; == Unary arithmetic
;; =
diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
index 8afe98e4410..d374a10810c 100644
--- a/gcc/config/riscv/iterators.md
+++ b/gcc/config/riscv/iterators.md
@@ -225,7 +225,9 @@
(ss_minus "sssub")
(us_minus "ussub")
(sign_extend "extend")
- (zero_extend "zero_extend")])
+ (zero_extend "zero_extend")
+ (fix "fix_trunc")
+ (unsigned_fix "fixuns_trunc")])
;;  code attributes
(define_code_attr or_optab [(ior "ior")
diff --git a/gcc/config/riscv/vector-iterators.md 
b/gcc/config/riscv/vector-iterators.md
index 70fb5b80b1b..937ec3c7f67 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -1208,6 +1208,11 @@
   (VNx1DF "VNx1DI") (VNx2DF "VNx2DI") (VNx4DF "VNx4DI") (VNx8DF "VNx8DI") 
(VNx16DF "VNx16DI")
])
+(define_mode_attr vconvert [
+  (VNx1SF "vnx1si") (VNx2SF "vnx2si") (VNx4SF "vnx4si") (VNx8SF "vnx8si") 
(VNx16SF "vnx16si") (VNx32SF "vnx32si")
+  (VNx1DF "vnx1di") (VNx2DF "vnx2di") (VNx4DF "vnx4di") (VNx8DF "vnx8di") 
(VNx16DF "vnx16di")
+])
+
(define_mode_attr VNCONVERT [
   (VNx1SF "VNx1HI") (VNx2SF "VNx2HI") (VNx4SF "VNx4HI") (VNx8SF "VNx8HI") 
(VNx16SF "VNx16HI") (VNx32SF "VNx32HI")
   (VNx1DI "VNx1SF") (VNx2DI "VNx2SF") (VNx4DI "VNx4SF") (VNx8DI "VNx8SF") 
(VNx16DI "VNx16SF")
diff --git 
a/gcc/testsuite/gcc.target/riscv/r

Re: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support

2023-05-29 Thread juzhe.zh...@rivai.ai
Hi, this patch uses the same implementation as the FMA patch, which has been merged.
Ok for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-29 14:53
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FNMA auto-vectorization support
From: Juzhe-Zhong 
 
Like FMA, add FNMA (VNMSAC or VNMSUB) auto-vectorization support.
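 
A minimal sketch of the integer loop shape that an fnma pattern of the form op3 - op1 * op2 matches, assuming 32-bit elements; the function name fnma_i32 is illustrative and is not one of the new test files.

```c
#include <stddef.h>

/* dst[i] = dst[i] - a[i] * b[i], i.e. the accumulator minus a product.
   This is the negated multiply-add form that vnmsac/vnmsub implement.  */
void
fnma_i32 (int *restrict dst, const int *restrict a, const int *restrict b,
          size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] -= a[i] * b[i];
}
```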
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (fnma4): New pattern.
(*fnma): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/ternop/ternop-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-6.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c: New test.
 
---
gcc/config/riscv/autovec.md   |  45 
.../riscv/rvv/autovec/ternop/ternop-4.c   |  28 +
.../riscv/rvv/autovec/ternop/ternop-5.c   |  34 ++
.../riscv/rvv/autovec/ternop/ternop-6.c   |  33 ++
.../riscv/rvv/autovec/ternop/ternop_run-4.c   |  84 ++
.../riscv/rvv/autovec/ternop/ternop_run-5.c   | 104 ++
.../riscv/rvv/autovec/ternop/ternop_run-6.c   | 104 ++
7 files changed, 432 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-5.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-6.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-6.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index eff3e484fb4..a1028d71467 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -606,3 +606,48 @@
   }
   [(set_attr "type" "vimuladd")
(set_attr "mode" "")])
+
+;; -
+;;  [INT] VNMSAC and VNMSUB
+;; -
+;; Includes:
+;; - vnmsac
+;; - vnmsub
+;; -
+
+(define_expand "fnma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+   (minus:VI
+ (match_operand:VI 3 "register_operand"   " vr")
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " vr")
+   (match_operand:VI 2 "register_operand" " vr"))))
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fnma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (minus:VI
+   (match_operand:VI 3 "register_operand"   " vr,  0,   vr")
+   (mult:VI
+ (match_operand:VI 1 "register_operand" " %0, vr,   vr")
+ (match_operand:VI 2 "register_operand" " vr, vr,   vr"))))
+   (clobber (match_scratch:SI 4 "=r,r,r"))]
+  "TARGET_VECTOR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+PUT_MODE (operands[4], Pmode);
+riscv_vector::emit_vlmax_vsetvl (mode, operands[4]);
+if (which_alternative == 2)
+  emit_insn (gen_rtx_SET (operands[0], operands[3]));
+rtx ops[] = {operands[0], operands[1], operands[2], operands[3], 
operands[0]};
+riscv_vector::emit_vlmax_ternary_insn (code_for_pred_minus_mul 
(mode),
+riscv_vector::RVV_TERNOP, ops, operands[4]);
+DONE;
+  }
+  [(set_attr "type" "vimuladd")
+   (set_attr "mode" "")])
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
new file mode 100644
index 000..22d11de89a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-4.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include 
+
+#define TEST_TYPE(TYPE) \
+  __attribute__ ((noipa)) void ternop_##TYPE (TYPE *__restrict dst, \
+   TYPE *__restrict a,  \
+   TYPE *__restrict b, int n)   \
+  { \
+for (int i = 0; i < n; i++) \
+  dst[i] += -(a[i] *

Re: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

2023-05-28 Thread juzhe.zh...@rivai.ai
Yes.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-29 12:36
To: juzhe.zh...@rivai.ai
CC: Kito.cheng; Robin Dapp; gcc-patches; jeffreyalaw; palmer; palmer; pan2.li
Subject: Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
OK. Just to make sure: this only appears on trunk, right?

On Mon, May 29, 2023 at 12:19, juzhe.zh...@rivai.ai wrote:
This patch fixes a VSETVL pass bug. OK for trunk?



juzhe.zh...@rivai.ai

From: juzhe.zhong
Date: 2023-05-26 11:01
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li; 
Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization
From: Juzhe-Zhong 

Fix bug reported here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109974

PR target/109974

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (source_equal_p): Fix ICE.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr109974.c: New test.

---
gcc/config/riscv/riscv-vsetvl.cc  | 30 ++-
.../gcc.target/riscv/rvv/vsetvl/pr109974.c| 17 +++
2 files changed, 46 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c

diff --git a/gcc/config/riscv/riscv-vsetvl.cc b/gcc/config/riscv/riscv-vsetvl.cc
index 9847d649d1d..fe55f4ccd30 100644
--- a/gcc/config/riscv/riscv-vsetvl.cc
+++ b/gcc/config/riscv/riscv-vsetvl.cc
@@ -1138,7 +1138,35 @@ source_equal_p (insn_info *insn1, insn_info *insn2)
 return false;
   if (!rtx_equal_p (SET_SRC (single_set1), SET_SRC (single_set2)))
 return false;
-  gcc_assert (insn1->uses ().size () == insn2->uses ().size ());
+  /* RTL_SSA uses include REG_NOTE. Consider this following case:
+
+ insn1 RTL:
+ (insn 41 39 42 4 (set (reg:DI 26 s10 [orig:159 loop_len_46 ] [159])
+   (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+ (reg:DI 14 a4 [276]))) 408 {*umindi3}
+ (expr_list:REG_EQUAL (umin:DI (reg:DI 15 a5 [orig:201 _149 ] [201])
+ (const_int 2 [0x2]))
+ (nil)))
+ The RTL_SSA uses of this instruction has 2 uses:
+ 1. (reg:DI 15 a5 [orig:201 _149 ] [201]) - twice.
+ 2. (reg:DI 14 a4 [276]) - once.
+
+ insn2 RTL:
+ (insn 38 353 351 4 (set (reg:DI 27 s11 [orig:160 loop_len_47 ] [160])
+   (umin:DI (reg:DI 15 a5 [orig:199 _146 ] [199])
+ (reg:DI 14 a4 [276]))) 408 {*umindi3}
+ (expr_list:REG_EQUAL (umin:DI (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200])
+ (const_int 2 [0x2]))
+ (nil)))
+  The RTL_SSA uses of this instruction has 3 uses:
+ 1. (reg:DI 15 a5 [orig:199 _146 ] [199]) - once
+ 2. (reg:DI 14 a4 [276]) - once
+ 3. (reg:DI 28 t3 [orig:200 ivtmp_147 ] [200]) - once
+
+  Return false when insn1->uses ().size () != insn2->uses ().size ()
+  */
+  if (insn1->uses ().size () != insn2->uses ().size ())
+return false;
   for (size_t i = 0; i < insn1->uses ().size (); i++)
 if (insn1->uses ()[i] != insn2->uses ()[i])
   return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c 
b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
new file mode 100644
index 000..06a8562ebab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/vsetvl/pr109974.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv32gcv_zbb -mabi=ilp32d --param riscv-autovec-preference=fixed-vlmax -O3" } */
+
+#include 
+
+void
+func (int8_t *__restrict x, int64_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i++, j +=2 )
+  {
+x[i + 0] += 1;
+y[j + 0] += 1;
+y[j + 1] += 2;
+  }
+}
+
+/* { dg-final { scan-assembler {vsetvli} { target { no-opts "-O0" no-opts "-O1" no-opts "-Os" no-opts "-Oz" no-opts "-g" no-opts "-funroll-loops" } } } } */
-- 
2.36.3
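
For illustration only (not part of the patch): the effect of replacing the gcc_assert with an early return can be sketched outside GCC. This is a minimal sketch, assuming a hypothetical `uses_equal_p` helper and plain register numbers standing in for the RTL-SSA use objects; neither is GCC API. Two instructions whose use lists differ in length, as in the REG_EQUAL example in the comment above, are reported as "not equal" instead of aborting, and the element-wise loop never reads past the shorter list.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an RTL-SSA use list: just register numbers.
using use_list = std::vector<int>;

// Returns true only when both lists hold the same uses in the same order.
// Lists of different length are treated as unequal rather than asserted on.
bool uses_equal_p(const use_list &uses1, const use_list &uses2) {
  if (uses1.size() != uses2.size())
    return false;
  for (std::size_t i = 0; i < uses1.size(); i++)
    if (uses1[i] != uses2[i])
      return false;
  return true;
}
```

With the register numbers from the comment (a5 = 15, a4 = 14, t3 = 28), the lists {15, 14} and {15, 14, 28} compare as different, which is the case that used to trip the assertion.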



Re: [PATCH] RISC-V: Fix VSETVL PASS ICE on SLP auto-vectorization

2023-05-28 Thread juzhe.zh...@rivai.ai
This patch fixes a VSETVL pass bug. OK for trunk?



juzhe.zh...@rivai.ai
 


Re: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-28 Thread juzhe.zh...@rivai.ai
This is an existing bug in GCC 13. I think I should split this into two patches.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-29 11:17
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; rdapp.gcc; jeffreyalaw; pan2.li
Subject: Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
LGTM, but with one question.
 
On Fri, May 26, 2023 at 7:36 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch supports the FMA auto-vectorization pattern.
> 1. Let RA decide between vmacc and vmadd.
> 2. Fix a bug in vector.md which generates incorrect information for the
>    VSETVL pass when testing ternop-3.c.
 
Does this bug also appear in GCC 13, or is it a new bug introduced on trunk?
 


Re: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support

2023-05-28 Thread juzhe.zh...@rivai.ai
Ping. OK for trunk?



juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-26 19:35
To: gcc-patches
CC: kito.cheng; palmer; rdapp.gcc; jeffreyalaw; kito.cheng; pan2.li; Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Add RVV FMA auto-vectorization support
From: Juzhe-Zhong 
 
This patch supports the FMA auto-vectorization pattern.
1. Let RA decide between vmacc and vmadd.
2. Fix a bug in vector.md which generates incorrect information for the
   VSETVL pass when testing ternop-3.c.
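
For illustration only (not part of the patch): point 1 can be sketched as a decision made after register allocation. vmacc computes vd = vs1 * vs2 + vd (the addend lives in the destination), while vmadd computes vd = vs1 * vd + vs2 (a multiplicand lives in the destination). The sketch below assumes a hypothetical `fma_regs` struct of plain register numbers and a hypothetical `pick_fma_insn` helper; neither is GCC code. The no-overlap fallback (copy the addend into the destination, then use vmacc) mirrors what the related fnma split does for its third alternative; that the fma split behaves the same way is an assumption.

```cpp
// Hypothetical register numbers for dest = op1 * op2 + op3.
struct fma_regs {
  int dest, op1, op2, op3;
};

enum class fma_choice {
  vmacc,           // addend (op3) already sits in dest
  vmadd,           // a multiplicand (op1 or op2) already sits in dest
  copy_then_vmacc  // nothing overlaps: move op3 into dest first
};

// Choose the instruction form once register allocation is known.
fma_choice pick_fma_insn(const fma_regs &r) {
  if (r.op3 == r.dest)
    return fma_choice::vmacc;
  if (r.op1 == r.dest || r.op2 == r.dest)
    return fma_choice::vmadd;
  return fma_choice::copy_then_vmacc;
}
```

Deferring this choice until after reload is what lets the register allocator pick whichever overlap is cheapest instead of committing to one form at expand time.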
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (fma4): New pattern.
(*fma): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): New enum.
(emit_vlmax_ternary_insn): New function.
* config/riscv/riscv-v.cc (emit_vlmax_ternary_insn): Ditto.
* config/riscv/vector.md: Fix vimuladd instruction bug.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add ternary tests
* gcc.target/riscv/rvv/autovec/ternop/ternop-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop-3.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c: New test.
 
---
gcc/config/riscv/autovec.md   |  65 +++
gcc/config/riscv/riscv-protos.h   |   2 +
gcc/config/riscv/riscv-v.cc   |  20 
gcc/config/riscv/vector.md|   2 +-
.../riscv/rvv/autovec/ternop/ternop-1.c   |  28 +
.../riscv/rvv/autovec/ternop/ternop-2.c   |  34 ++
.../riscv/rvv/autovec/ternop/ternop-3.c   |  33 ++
.../riscv/rvv/autovec/ternop/ternop_run-1.c   |  84 ++
.../riscv/rvv/autovec/ternop/ternop_run-2.c   | 104 ++
.../riscv/rvv/autovec/ternop/ternop_run-3.c   | 104 ++
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
11 files changed, 477 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/ternop/ternop_run-3.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7fe4d94de39..04825df1210 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -373,3 +373,68 @@
 DONE;
   }
)
+
+;; =
+;; == Ternary arithmetic
+;; =
+
+;; -
+;;  [INT] VMACC and VMADD
+;; -
+;; Includes:
+;; - vmacc
+;; - vmadd
+;; -
+
+;; We can't expand FMA for the following reasons:
+;; 1. Before RA, we don't know which multiply-add instruction is the ideal one.
+;;    The vmacc is the ideal instruction when operands[3] overlaps operands[0].
+;;    The vmadd is the ideal instruction when operands[1|2] overlaps operands[0].
+;; 2. According to vector.md, the multiply-add patterns have a 'merge' operand which
+;;    is operands[5]. Since operands[5] should overlap operands[0], this operand
+;;    should be allocated the same regno as operands[1|2|3].
+;; 3. The 'merge' operand is always a real merge operand and we don't allow an undefined
+;;    operand.
+;; 4. The operation of the FMA pattern needs a VLMAX vsetvli, which needs a VL operand.
+;;
+;; In this situation, we design the codegen of FMA as follows:
+;; 1. Clobber a scratch in the expand pattern of FMA.
+;; 2. Let RA decide which input operand (operands[1|2|3]) overlaps operands[0].
+;; 3. Generate instructions (vmacc or vmadd) according to the register allocation
+;;    result after reload_completed.
+(define_expand "fma4"
+  [(parallel
+[(set (match_operand:VI 0 "register_operand" "=vr")
+   (plus:VI
+ (mult:VI
+   (match_operand:VI 1 "register_operand" " vr")
+   (match_operand:VI 2 "register_operand" " vr"))
+ (match_operand:VI 3 "register_operand"   " vr")))
+ (clobber (match_scratch:SI 4))])]
+  "TARGET_VECTOR"
+  {})
+
+(define_insn_and_split "*fma"
+  [(set (match_operand:VI 0 "register_operand" "=vr, vr, ?&vr")
+ (plus:VI
+   (mult:VI
+ (match_operand:VI 1 "register_operand

Re: Re: decremnt IV patch create fails on PowerPC

2023-05-26 Thread juzhe.zh...@rivai.ai
Hi Richi, thanks for your analysis and help.

>> We could simply retain the original
>> incrementing IV for loop control and add the decrementing
>> IV for computing LEN in addition to that and leave IVOPTs
>> sorting out to eventually merge them (or not).

I am not sure how to do that. Could you give me more information?

If I understand correctly, your concern is that an IV with a variable step
will make IVOPTs fail.

I have seen a similar situation in LLVM (when applying a variable IV,
it failed to interleave the vectorized code). I am not sure whether the
cause is the same.

For RVV, we not only want the decrementing IV style in vectorization, we also
want to apply SELECT_VL for a single rgroup, which is the most common case (LLVM
also only applies get_vector_length for a single vector length).

>>You can do some testing with a cross compiler, alternatively
>>there are powerpc machines in the GCC compile farm.

It seems that Power is OK with the decrementing IV, since most cases are improved.

I think Richard may be able to explain the decrementing IV more clearly.

Thanks


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-26 14:46
To: 钟居哲
CC: gcc-patches; richard.sandiford; linkw
Subject: Re: decremnt IV patch create fails on PowerPC
On Fri, 26 May 2023, 钟居哲 wrote:
 
> Yesterday's patch has been approved (decrement IV support):
> https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619663.html
>
> However, it causes failures on PowerPC:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109971
>
> I am really sorry for causing the inconvenience.
>
> I wonder, as we discussed:
> +  /* If we're vectorizing a loop that uses length "controls" and
> + can iterate more than once, we apply decrementing IV approach
> + in loop control.  */
> +  if (LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo)
> +  && !LOOP_VINFO_LENS (loop_vinfo).is_empty ()
> +  && LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) == 0
> +  && !(LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> +&& known_le (LOOP_VINFO_INT_NITERS (loop_vinfo),
> + LOOP_VINFO_VECT_FACTOR (loop_vinfo))))
> +LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo) = true;
>
> These conditions cannot disable the decrementing IV on PowerPC.
> Should I add a target hook for it?
 
No.  I've put some analysis in the PR.  To me the question is
why (without that SELECT_VL case) we need a decrementing IV
_for the loop control_?  We could simply retain the original
incrementing IV for loop control and add the decrementing
IV for computing LEN in addition to that and leave IVOPTs
sorting out to eventually merge them (or not).
 
Alternatively avoid the variable decrement as I wrote in the
PR and do the exit test based on the previous IV value.
 
But as said, all this won't work for the SELECT_VL case, but
then its availability is something to key off rather than a
new target hook?
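
For illustration only (not part of the original mail): the first suggestion, keeping the incrementing IV for loop control and adding a separate decrementing IV that only feeds LEN, can be sketched in scalar code. This is a minimal sketch under assumptions: `lens_per_iteration`, `n` and `vf` are hypothetical names, `vf` stands for the number of lanes per vector iteration, and wrap-around of the incrementing IV near the type's maximum is ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the LEN each vector iteration would use for n scalar elements.
std::vector<unsigned> lens_per_iteration(unsigned n, unsigned vf) {
  assert(vf > 0);
  std::vector<unsigned> lens;
  unsigned remaining = n;                 // decrementing IV: only feeds LEN
  for (unsigned i = 0; i < n; i += vf) {  // incrementing IV: loop control
    unsigned len = std::min(remaining, vf);
    lens.push_back(len);
    remaining -= len;
  }
  return lens;
}
```

Here the exit test never depends on the variable decrement, so the loop-control IV keeps a constant step; only `remaining` changes by a variable amount.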
 
> I could only run bootstrap and regression tests for the patch on x86.
> I don't have an environment to test PowerPC. I am really sorry.
 
You can do some testing with a cross compiler, alternatively
there are powerpc machines in the GCC compile farm.
 
Richard.
 


Re: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

2023-05-25 Thread juzhe.zh...@rivai.ai
I realize that both TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES and
TARGET_VECTORIZE_RELATED_MODE will partially enable some auto-vectorization
even when preferred_simd_mode does not, i.e. when we don't specify
--param=riscv-autovec-preference.

So please add autovec_use_vlmax_p to both of these target hook
implementations.

+opt_machine_mode
+vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
+ poly_uint64 nunits)
+{
+  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
+  poly_uint64 min_units;
+  if (riscv_v_ext_mode_p (vector_mode)
+  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+  GET_MODE_SIZE (element_mode), &min_units))

Change it into:

+opt_machine_mode
+vectorize_related_mode (machine_mode vector_mode, scalar_mode element_mode,
+ poly_uint64 nunits)
+{
+  /* TODO: We will support RVV VLS auto-vectorization mode in the future. */
+  poly_uint64 min_units;
+  if (riscv_v_ext_vector_mode_p (vector_mode) &&  autovec_use_vlmax_p ()
+  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
+  GET_MODE_SIZE (element_mode), &min_units))


And

+unsigned int
+autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (TARGET_VECTOR)
+{

You don't need TARGET_VECTOR since you already gate it in:

+/* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES.  */
+unsigned int
+riscv_autovectorize_vector_modes (vector_modes *modes, bool all)
+{
+  if (TARGET_VECTOR)
+return riscv_vector::autovectorize_vector_modes (modes, all);
+
+  return default_autovectorize_vector_modes (modes, all);
+}

so please change it to:

+unsigned int
+autovectorize_vector_modes (vector_modes *modes, bool)
+{
+  if (autovec_use_vlmax_p ())
+{

Do this just like in riscv_vector::preferred_simd_modes.

For the rest, let Kito chime in with more comments.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 17:03
To: gcc-patches; Kito Cheng; palmer; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.
Hi,
 
this patch implements the autovec expanders for sign and zero extension
patterns as well as the accompanying truncations.  In order to use them
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode
and extend VNx4SI to VNx4DI.  They are still going to be expanded in the
future.
 
vf4 and vf8 truncations are emulated by truncating two and three times
respectively.
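
For illustration only (not part of the patch): the emulation can be sketched with scalar casts, where each narrowing step halves the element width, as a single vncvt would. The helper names below are hypothetical, and unsigned types are used so that every step is a well-defined modular truncation.

```cpp
#include <cstdint>

// One narrowing step halves the element width.
inline uint32_t narrow64(uint64_t v) { return static_cast<uint32_t>(v); }
inline uint16_t narrow32(uint32_t v) { return static_cast<uint16_t>(v); }
inline uint8_t narrow16(uint16_t v) { return static_cast<uint8_t>(v); }

// vf4: 32 -> 8 bits takes two narrowing steps.
inline uint8_t trunc_vf4(uint32_t v) { return narrow16(narrow32(v)); }

// vf8: 64 -> 8 bits takes three narrowing steps.
inline uint8_t trunc_vf8(uint64_t v) {
  return narrow16(narrow32(narrow64(v)));
}
```

Because each step only drops high bits, the chained result equals a direct truncation to the narrow type.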
 
The patch also adds tests and changes some expectations for already
existing ones.
 
Combine does not yet handle binary operations of two widened operands
as we are missing the necessary split/rewrite patterns.  These will be
added at a later time.
 
Co-authored-by: Juzhe Zhong 
 
riscv.exp testsuite is unchanged.  zero-scratch-regs-3.c seems
to FAIL in vcondu but that already happens on trunk.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New
expander.
(2): Ditto.
(2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_mode_p): Declare.
(vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): Export.
(riscv_autovectorize_vector_modes): Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c

Re: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-25 Thread juzhe.zh...@rivai.ai
+(define_expand "abs2"
+  [(set (match_operand:VI 0 "register_operand")
+(match_operand:VI 1 "register_operand"))]
+  "TARGET_VECTOR"
+{
+  rtx zero = gen_const_vec_duplicate (mode, GEN_INT (0));
+  machine_mode mask_mode = riscv_vector::get_mask_mode (mode).require ();
+  rtx mask = gen_reg_rtx (mask_mode);
+  riscv_vector::expand_vec_cmp (mask, LT, operands[1], zero);
+
+  /* For masking we need two more operands than a regular unop, the mask
+ itself and the maskoff operand.  */
+  rtx ops[] = {operands[0], mask, operands[1], operands[1]};
+  riscv_vector::emit_vlmax_masked_insn (code_for_pred (NEG, mode),
+ riscv_vector::RVV_UNOP + 2, ops);
+  DONE;
+})

+/* This function emits a masked instruction.  */
+void
+emit_vlmax_masked_insn (unsigned icode, int op_num, rtx *ops)
+{
+  machine_mode dest_mode = GET_MODE (ops[0]);
+  machine_mode mask_mode = get_mask_mode (dest_mode).require ();
+  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
+/*FULLY_UNMASKED_P*/ false,
+/*USE_REAL_MERGE_P*/ true,
+/*HAS_AVL_P*/ true,
+/*VLMAX_P*/ true, dest_mode, mask_mode);
+  e.set_policy (TAIL_ANY);
+  e.set_policy (MASK_ANY);
+  e.emit_insn ((enum insn_code) icode, ops);
+}

I think it's logically incorrect.  For ABS, you want:

operands[0] = operands[1] > 0 ? operands[1] : (-operands[1])
So you should emit the following sequence:

vmslt v0,v1,0
vneg v1,v1,v0.t (should use mask undisturbed)

Here I see you set:
e.set_policy (MASK_ANY); which is incorrect.
You should use e.set_policy (MASK_UNDISTURBED); instead.

Your testcases fail to catch this issue (you should create a testcase that
catches this bug with this patch's implementation).

Besides, 
riscv_vector::RVV_UNOP + 2, ops);

You should not use RVV_UNOP + 2. Instead, you should add an enum value called
RVV_UNOP_MU and use that.
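
For illustration only: the difference between the two mask policies can be modelled in scalar code. This is a minimal sketch with hypothetical helper names. Under the mask-agnostic policy the RVV specification allows inactive elements either to keep their value or to be overwritten with all ones; the second function models the all-ones outcome to show why it breaks ABS. Negation goes through unsigned arithmetic so that the most negative value wraps instead of overflowing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Wrapping negate, so INT32_MIN does not trigger signed overflow.
static int32_t wrapping_neg(int32_t v) {
  return static_cast<int32_t>(0u - static_cast<uint32_t>(v));
}

// Masked negate where inactive lanes keep the merge value (src).
std::vector<int32_t> abs_mask_undisturbed(const std::vector<int32_t> &src) {
  std::vector<int32_t> dst = src;
  for (std::size_t i = 0; i < src.size(); i++)
    if (src[i] < 0)  // lane is active in the "less than zero" mask
      dst[i] = wrapping_neg(src[i]);
  return dst;
}

// Same operation, but inactive lanes are overwritten with all ones,
// which the mask-agnostic policy permits.
std::vector<int32_t> abs_mask_agnostic_all_ones(
    const std::vector<int32_t> &src) {
  std::vector<int32_t> dst(src.size());
  for (std::size_t i = 0; i < src.size(); i++)
    dst[i] = src[i] < 0 ? wrapping_neg(src[i]) : -1;
  return dst;
}
```

A run test whose input contains non-negative values would catch the difference: those lanes come out unchanged in the first model but as -1 in the second.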

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 18:08
To: gcc-patches; Kito Cheng; palmer; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH v2] RISC-V: Implement autovec abs, vneg, vnot.
Hi,
 
this patch implements abs2, vneg2 and vnot2 expanders
for integer vector registers and adds tests for them.
 
v2 is rebased against Juzhe's latest refactoring.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): Add vneg/vnot.
(abs2): Add.
* config/riscv/riscv-protos.h (emit_vlmax_masked_insn): Declare.
* config/riscv/riscv-v.cc (emit_vlmax_masked_insn): New
function.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add unop tests.
* gcc.target/riscv/rvv/autovec/unop/abs-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/abs-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vneg-template.h: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-run.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vnot-template.h: New test.
---
gcc/config/riscv/autovec.md   | 45 ++-
gcc/config/riscv/riscv-protos.h   |  1 +
gcc/config/riscv/riscv-v.cc   | 16 +++
.../riscv/rvv/autovec/unop/abs-run.c  | 29 
.../riscv/rvv/autovec/unop/abs-rv32gcv.c  |  7 +++
.../riscv/rvv/autovec/unop/abs-rv64gcv.c  |  7 +++
.../riscv/rvv/autovec/unop/abs-template.h | 26 +++
.../riscv/rvv/autovec/unop/vneg-run.c | 29 
.../riscv/rvv/autovec/unop/vneg-rv32gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vneg-rv64gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vneg-template.h| 18 
.../riscv/rvv/autovec/unop/vnot-run.c | 43 ++
.../riscv/rvv/autovec/unop/vnot-rv32gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vnot-rv64gcv.c |  6 +++
.../riscv/rvv/autovec/unop/vnot-template.h| 22 +
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|  2 +
16 files changed, 268 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-run.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv32gcv.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/abs-template.h
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-run.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv32gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-rv64gcv.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/vneg-template.h
create mode 100644 gcc/te

Re: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread juzhe.zh...@rivai.ai
Yeah, I see. Removing it makes the run testcase fail.
Now I have found the issue: it comes from storing the step in the iv_rgroup, as you suggested.

When I tried that, the IR looks correct but it triggers an ICE:
0x18c8d41 process_bb
../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:7933
0x18cb6d9 do_rpo_vn_1
../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8544
0x18cbd35 do_rpo_vn(function*, edge_def*, bitmap_head*, bool, bool, vn_lookup_ki
../../../riscv-gcc/gcc/tree-ssa-sccvn.cc:8646
0x19d42d2 execute
../../../riscv-gcc/gcc/tree-vectorizer.cc:1385

This is the IR:

loop_len_76 = MIN_EXPR ;

  loop_len_66 = MIN_EXPR ;   -> store the step in rgroup instead of LOOP_VINFO
  _103 = loop_len_66;  -> reuse the MIN value

  loop_len_66 = MIN_EXPR ;

  _104 = _103 - loop_len_66;  ->use MIN - loop_len_66

  loop_len_65 = MIN_EXPR <_104, 4>;
  _105 = _104 - loop_len_65;
  loop_len_64 = MIN_EXPR <_105, 4>;
  loop_len_63 = _105 - loop_len_64;

Previously I stored the "MIN_EXPR ;" in the LOOP_VINFO, not the rgroup.

So the previous version was correct and had no ICE:

  loop_len_76 = MIN_EXPR ;

 _103 = MIN_EXPR ;-> Step store in the LOOP_VINFO (S)

  loop_len_66 = MIN_EXPR <_103, 4>; 

  _104 = _103 - loop_len_66;  ->  use MIN - loop_len_66

  loop_len_65 = MIN_EXPR <_104, 4>;
  _105 = _104 - loop_len_65;
  loop_len_64 = MIN_EXPR <_105, 4>;
  loop_len_63 = _105 - loop_len_64;
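
For illustration only: the chain of MIN_EXPR and subtraction statements above splits one step value into four per-vector lengths. The sketch below reproduces that chain in scalar code under assumptions: `split_lens` is a hypothetical name, `total` plays the role of _103, and `lanes` is the constant 4 from the dump.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Split `total` items (at most 4 * lanes) across four vector controls.
std::array<unsigned, 4> split_lens(unsigned total, unsigned lanes) {
  assert(total <= 4 * lanes);
  std::array<unsigned, 4> lens{};
  unsigned rest = total;              // _103
  for (int i = 0; i < 3; i++) {
    lens[i] = std::min(rest, lanes);  // loop_len_66, loop_len_65, loop_len_64
    rest -= lens[i];                  // _104, _105, then the final remainder
  }
  lens[3] = rest;                     // loop_len_63: whatever is left
  return lens;
}
```

The chain only works if the first value is the un-clamped step; starting from an already clamped loop_len_66 would make every later length zero.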

Could you help me with this?
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 18:19
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by 
variable amount support
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard. Thanks for the comments.
>
> >> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
> >>     || !iv_rgc
> >>     || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> >>         != rgc->max_nscalars_per_iter * rgc->factor))
> >>   {
> >>     /* See whether zero-based IV would ever generate all-false masks
> >>        or zero length before wrapping around.  */
> >>     bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
> >>
> >>     /* Set up all controls for this group.  */
> >>     test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
> >>                                                  &preheader_seq,
> >>                                                  &header_seq,
> >>                                                  loop_cond_gsi, rgc,
> >>                                                  niters, niters_skip,
> >>                                                  might_wrap_p);
> >>
> >>     iv_rgc = rgc;
> >>   }
>
>
> Could you tell me why you add:
> >> (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
> >>  != rgc->max_nscalars_per_iter * rgc->factor) ?
 
The patch creates IVs with the following step:
 
  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, MIN_EXPR,
index_before_incr,
nitems_step));
 
If nitems_step is the same for two IVs, those IVs will always be equal.
 
So having multiple IVs with the same nitems_step is redundant.
 
nitems_step is calculated as follows:
 
  unsigned int nitems_per_iter = rgc->max_nscalars_per_iter * rgc->factor;
  ...
  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
  ...
 
  if (nitems_per_iter != 1)
{
  ...
  tree iv_factor = build_int_cst (iv_type, nitems_per_iter);
  ...
  nitems_step = gimple_build (preheader_seq, MULT_EXPR, iv_type,
  nitems_step, iv_factor);
  ...
}
 
so nitems_step is equal to:
 
  rgc->max_nscalars_per_iter * rgc->factor * VF
 
VF is fixed for a loop, so nitems_step is equal for two different
rgroup_controls if:
 
  rgc->max_nscalars_per_iter * rgc->factor
 
is the same for those rgroup_controls.
 
Please try the example I posted earlier today. I think you'll see that,
without the:
 
  (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
   != rgc->max_nscalars_per_iter * rgc->factor)
 
you'll have two IVs with the same step (because their MIN_EXPRs have
the same bound).
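
For illustration only: the effect of the extra condition can be sketched by counting how many IVs the loop over the rgroups would create in decrementing-IV mode. This is a minimal sketch under assumptions: `rgroup` is a hypothetical reduced struct holding only the two fields that determine the step, `count_ivs` is a hypothetical name, and the check for empty controls is left out.

```cpp
#include <vector>

// Hypothetical reduced rgroup: only the fields that determine the IV step.
struct rgroup {
  unsigned max_nscalars_per_iter;
  unsigned factor;
};

// Counts the IVs set up when a new IV is created only if its step
// differs from the step of the most recently created IV.
unsigned count_ivs(const std::vector<rgroup> &controls) {
  const rgroup *iv_rgc = nullptr;
  unsigned ivs = 0;
  for (const rgroup &rgc : controls) {
    if (!iv_rgc
        || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
            != rgc.max_nscalars_per_iter * rgc.factor)) {
      ++ivs;  // stands in for vect_set_loop_controls_directly
      iv_rgc = &rgc;
    }
  }
  return ivs;
}
```

Two rgroups with the same product share one IV, while a different product gets its own. Note that, as in the structure quoted above, the comparison is only against the rgroup of the most recently created IV.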
 
Thanks,
Richard
 


Re: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread juzhe.zh...@rivai.ai
Hi, Richard. Thanks for the comments.

>> if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
>>     || !iv_rgc
>>     || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>         != rgc->max_nscalars_per_iter * rgc->factor))
>>   {
>>     /* See whether zero-based IV would ever generate all-false masks
>>        or zero length before wrapping around.  */
>>     bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
>>
>>     /* Set up all controls for this group.  */
>>     test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
>>                                                  &preheader_seq,
>>                                                  &header_seq,
>>                                                  loop_cond_gsi, rgc,
>>                                                  niters, niters_skip,
>>                                                  might_wrap_p);
>>
>>     iv_rgc = rgc;
>>   }


Could you tell me why you added:
>> (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
>>  != rgc->max_nscalars_per_iter * rgc->factor)?

When I have this in the condition, it ICEs and the generated IR is:
loop_len_76 = MIN_EXPR ;
  loop_len_66 = MIN_EXPR ;
  loop_len_66 = MIN_EXPR ;
  loop_len_65 = MIN_EXPR <0, 4>;

  _103 = -loop_len_65;

  loop_len_64 = MIN_EXPR <_103, 4>;
  loop_len_63 = _103 - loop_len_64;

When I remove it, it works.

Should I remove it?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 17:02
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V15] VECT: Add decrement IV iteration loop control by 
variable amount support
Thanks, this looks functionally correct to me.  And I agree it handles
the cases that previously needed multiplication.
 
But I think it regresses code quality when no multiplication was needed.
We can now generate duplicate IVs.  Perhaps ivopts would remove the
duplicates, but it might be hard, because of the variable steps.
 
For example, we would generate duplicate IVs for non-SLP code that
operates on multiple vector sizes.  (Can't remember what the status
of unpack/truncate patterns is on RVV.)  But it also shows up for SLP.
E.g., I would expect duplicate IVs for:
 
#include <stdint.h>

uint16_t x[100];
uint32_t y[200];
 
void f() {
  for (int i = 0; i < 100; i += 2) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[i + 0] += 1;
    y[i + 1] += 2;
  }
}
 
So I think the call to vect_set_loop_controls_directly does still
need to be inside an "if".  But the "if" condition should be based
on whether the IV step is different.  As discussed yesterday, the
IV step is different if nitems_per_iter, aka:
 
  max_nscalars_per_iter * factor
 
is different.
 
Because of that, I think I was wrong to suggest storing the IV in
loop_vinfo.  It should probably be stored in rgroup_controls instead.
 
Then we could have a structure like this:
 
  rgroup_controls *rgc;
  rgroup_controls *iv_rgc = nullptr;
  ...
  FOR_EACH_VEC_ELT (*controls, i, rgc)
    if (!rgc->controls.is_empty ())
      {
        ...
        if (!LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
            || !iv_rgc
            || (iv_rgc->max_nscalars_per_iter * iv_rgc->factor
                != rgc->max_nscalars_per_iter * rgc->factor))
          {
            /* See whether zero-based IV would ever generate all-false masks
               or zero length before wrapping around.  */
            bool might_wrap_p = vect_rgroup_iv_might_wrap_p (loop_vinfo, rgc);
 
            /* Set up all controls for this group.  */
            test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
                                                         &preheader_seq,
                                                         &header_seq,
                                                         loop_cond_gsi, rgc,
                                                         niters, niters_skip,
                                                         might_wrap_p);
 
            iv_rgc = rgc;
          }
 
        if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo)
            && rgc->controls.length () > 1)
          {
            ...your code, using the iv in iv_rgc...;
          }
      }
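 
To make the effect of that condition concrete, here is a minimal standalone sketch in C. It is only a toy model, not vectorizer code: struct rgroup_model, count_direct_setups and example_count_direct_setups are hypothetical names, and the model keeps just the three fields needed to decide whether a new IV would be set up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for GCC's rgroup_controls.  */
struct rgroup_model
{
  unsigned max_nscalars_per_iter;
  unsigned factor;
  unsigned ncontrols; /* 0 models an rgroup whose controls are empty.  */
};

/* Number of items covered per iteration; this determines the IV step.  */
static unsigned
nitems_per_iter (const struct rgroup_model *rgc)
{
  return rgc->max_nscalars_per_iter * rgc->factor;
}

/* Count how often the "set up controls directly" path would run.  With a
   decrementing IV, a new setup happens only when the step differs from the
   step of the most recently set up rgroup.  */
unsigned
count_direct_setups (const struct rgroup_model *groups, size_t n,
                     int decrementing_iv_p)
{
  const struct rgroup_model *iv_rgc = NULL;
  unsigned setups = 0;

  for (size_t i = 0; i < n; i++)
    {
      const struct rgroup_model *rgc = &groups[i];
      if (rgc->ncontrols == 0)
        continue;
      if (!decrementing_iv_p
          || !iv_rgc
          || nitems_per_iter (iv_rgc) != nitems_per_iter (rgc))
        {
          setups++;
          iv_rgc = rgc;
        }
    }
  return setups;
}

void
example_count_direct_setups (void)
{
  /* Steps are 2, 2 and 4: the second rgroup can reuse the first IV.  */
  struct rgroup_model groups[] = { { 2, 1, 1 }, { 1, 2, 2 }, { 4, 1, 4 } };
  assert (count_direct_setups (groups, 3, 1) == 2);
  /* Without a decrementing IV, every used rgroup is set up directly.  */
  assert (count_direct_setups (groups, 3, 0) == 3);
}
```

Note that, like the structure above, the model compares only against the most recently set up rgroup, not against every earlier one.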
 
Some other comments:
 
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..f9d92ced982 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -468,6 +468,38 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
>standard_iv_increment_position (loop, &incr_gsi, &insert_after);
> +  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  /* single rgroup:
 
Instead of "single rgroup"

Re: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.

2023-05-25 Thread juzhe.zh...@rivai.ai
Hi, Robin.

>>+extern bool riscv_v_ext_mode_p (machine_mode mode);
No, we don't need it as a global extern.

>> +  if (riscv_v_ext_mode_p (vector_mode)
>>+  && multiple_p (BYTES_PER_RISCV_VECTOR * ((int) riscv_autovec_lmul),
>>+  GET_MODE_SIZE (element_mode), &min_units))

Use riscv_v_ext_vector_mode_p instead, since riscv_v_ext_mode_p also includes
tuple modes.
You should not use tuple modes in related_mode. Tuple modes will be used in the
array mode target hook and by vec_load_lanes/vec_store_lanes.

Otherwise LGTM, since I have already reviewed this twice.
Please wait for Kito's final approval.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-25 17:03
To: gcc-patches; Kito Cheng; palmer; juzhe.zh...@rivai.ai; jeffreyalaw
CC: rdapp.gcc
Subject: [PATCH] RISC-V: Add autovec sign/zero extension and truncation.
Hi,
 
this patch implements the autovec expanders for sign and zero extension
patterns as well as the accompanying truncations.  In order to use them
additional mode_attr iterators as well as vectorizer hooks are required.
Using these hooks we can e.g. vectorize with VNx4QImode as base mode
and extend VNx4SI to VNx4DI.  They are still going to be expanded in the
future.
 
vf4 and vf8 truncations are emulated by truncating two and three times
respectively.
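 
As a scalar illustration of that emulation (a sketch only; the helper names below are hypothetical and unsigned types are used so that every conversion is well defined in C), a vf4 or vf8 truncation can be modelled as a chain of narrowing steps that each halve the element width:

```c
#include <assert.h>
#include <stdint.h>

/* Each helper models one narrowing step that halves the element width.  */
static uint32_t narrow_64_to_32 (uint64_t v) { return (uint32_t) v; }
static uint16_t narrow_32_to_16 (uint32_t v) { return (uint16_t) v; }
static uint8_t narrow_16_to_8 (uint16_t v) { return (uint8_t) v; }

/* vf4: 64-bit to 16-bit elements, emulated by truncating twice.  */
uint16_t
trunc_vf4 (uint64_t v)
{
  return narrow_32_to_16 (narrow_64_to_32 (v));
}

/* vf8: 64-bit to 8-bit elements, emulated by truncating three times.  */
uint8_t
trunc_vf8 (uint64_t v)
{
  return narrow_16_to_8 (narrow_32_to_16 (narrow_64_to_32 (v)));
}

void
example_trunc (void)
{
  uint64_t v = 0x1122334455667788ULL;
  assert (trunc_vf4 (v) == 0x7788);
  assert (trunc_vf8 (v) == 0x88);
  /* The chained result matches a direct truncation.  */
  assert (trunc_vf8 (v) == (uint8_t) v);
}
```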
 
The patch also adds tests and changes some expectations for already
existing ones.
 
Combine does not yet handle binary operations of two widened operands
as we are missing the necessary split/rewrite patterns.  These will be
added at a later time.
 
Co-authored-by: Juzhe Zhong 
 
riscv.exp testsuite is unchanged.  zero-scratch-regs-3.c seems
to FAIL in vcondu but that already happens on trunk.
 
Regards
Robin
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (2): New
expander.
(2): Ditto.
(2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
(trunc2): Ditto.
* config/riscv/riscv-protos.h (riscv_v_ext_mode_p): Declare.
(vectorize_related_mode): Define.
(autovectorize_vector_modes): Define.
* config/riscv/riscv-v.cc (vectorize_related_mode): Implement
hook.
(autovectorize_vector_modes): Implement hook.
* config/riscv/riscv.cc (riscv_v_ext_tuple_mode_p): Export.
(riscv_autovectorize_vector_modes): Implement target hook.
(riscv_vectorize_related_mode): Implement target hook.
(TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_MODES): Define.
(TARGET_VECTORIZE_RELATED_MODE): Define.
* config/riscv/vector-iterators.md: Add lowercase versions of
mode_attr iterators.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/shift-rv32gcv.c: Adjust
expectation.
* gcc.target/riscv/rvv/autovec/binop/shift-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-run.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vdiv-template.h: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32f_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve32x_zvl128b-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64d-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64f-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/zve64x-2.c: Ditto.
* gcc.target/riscv/rvv/rvv.exp: Add new conversion tests.
* gcc.target/riscv/rvv/vsetvl/avl_single-38.c: Do not vectorize.
* gcc.target/riscv/rvv/vsetvl/avl_single-47.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-48.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/avl_single-49.c: Ditto.
* gcc.target/riscv/rvv/vsetvl/imm_switch-8.c: Ditto.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vncvt-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vsext-template.h: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-run.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv32gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-rv64gcv.c: New test.
* gcc.target/riscv/rvv/autovec/conversions/vzext-template.h: New test.
---
gcc/config/riscv/autovec.md   | 104 ++
gcc/config/riscv/riscv-protos.h   |   5 +
gcc/config/riscv/riscv-v.cc   |  83 ++
gcc/config/riscv/riscv.cc |  31 +-
gcc/config/riscv/vector-iterators.md  |  33 +-
.../riscv/rvv/autovec/binop/shift-rv32gcv.c   |   1 -
.../riscv/rvv/autovec/binop/shift-rv64gcv.c   |   5 +-
.../riscv/rvv/autovec/binop/vdiv-run.c|   4 +-
.../riscv/rvv/autovec/binop/vdiv-rv32gcv.c|   7 +-
.../riscv/rvv/autovec/binop/vdiv-rv64gcv.

Re: [PATCH V15] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-25 Thread juzhe.zh...@rivai.ai
Bootstrap && Regression on X86 passed.

Ok for trunk ?


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-25 10:58
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V15] VECT: Add decrement IV iteration loop control by variable 
amount support
From: Ju-Zhe Zhong 
 
This patch supports a decrementing IV by following the flow designed by Richard:
 
(1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
call vect_set_loop_controls_directly.
 
(2) vect_set_loop_controls_directly calculates "step" as in your patch.
If rgc has 1 control, this step is the SSA name created for that control.
Otherwise the step is a fresh SSA name, as in your patch.
 
(3) vect_set_loop_controls_directly stores this step somewhere for later
use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.
 
(4) After the vect_set_loop_controls_directly call above, and outside
the "if" statement that now contains vect_set_loop_controls_directly,
check whether rgc->controls.length () > 1.  If so, use
vect_adjust_loop_lens_control to set the controls based on S.
 
Then the only caller of vect_adjust_loop_lens_control is
vect_set_loop_condition_partial_vectors.  And the starting
step for vect_adjust_loop_lens_control is always S.
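 
The following C sketch models how a stored step S might be distributed over several length controls. It is an assumption-laden illustration, not the patch's code: split_step, min_sz and example_split_step are hypothetical names, and the MIN-then-subtract chain is inferred from the loop_len IR dumps quoted elsewhere in this thread.

```c
#include <assert.h>
#include <stddef.h>

static size_t
min_sz (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Split step S over NCTRLS length controls, each covering at most PER_CTRL
   items.  Every control but the last takes MIN (rest, PER_CTRL); the last
   one takes whatever remains.  Assumes s <= nctrls * per_ctrl.  */
void
split_step (size_t s, size_t per_ctrl, size_t nctrls, size_t *lens)
{
  size_t rest = s;

  for (size_t i = 0; i + 1 < nctrls; i++)
    {
      lens[i] = min_sz (rest, per_ctrl);
      rest -= lens[i];
    }
  if (nctrls > 0)
    lens[nctrls - 1] = rest;
}

void
example_split_step (void)
{
  size_t lens[4];

  /* 10 items over four controls of up to 4 items each: 4, 4, 2, 0.  */
  split_step (10, 4, 4, lens);
  assert (lens[0] == 4 && lens[1] == 4 && lens[2] == 2 && lens[3] == 0);
}
```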
 
This patch has been well tested for single-rgroup and multiple-rgroup (SLP) cases
and passes all testcases in the RISC-V port.
 
It also passes the multiple-rgroup (non-SLP) tests, which were run on vec_pack_trunc.
 
Fix bugs of V14 patch:
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c execution 
test
 
This patch passed all testcases listed above.
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_set_loop_controls_directly): Add 
decrement IV support.
(vect_adjust_loop_lens_control): Ditto.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): New variables.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
(LOOP_VINFO_DECREMENTING_IV_STEP): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c: New 
test.
 
---
.../rvv/autovec/partial/multiple_rgroup-3.c   | 288 ++
.../rvv/autovec/partial/multiple_rgroup-4.c   |  75 +
.../autovec/partial/multiple_rgroup_run-3.c   |  36 +++
.../autovec/partial/multiple_rgroup_run-4.c   |  15 +
gcc/tree-vect-loop-manip.cc   | 153 ++
gcc/tree-vect-loop.cc |  13 +
gcc/tree-vectorizer.h |  12 +
7 files changed, 592 insertions(+)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-4.c
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
new file mode 100644
index 000..9579749c285
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-3.c
@@ -0,0 +1,288 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax" } */
+
+#include 
+
+void __attribute__ ((noinline, noclone))
+f0 (int8_t *__restrict x, int16_t *__restrict y, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  x[i + 0] += 1;
+  x[i + 1] += 2;
+  x[i + 2] += 3;
+  x[i + 3] += 4;
+  y[j + 0] += 1;
+  y[j + 1] += 2;
+  y[j + 2] += 3;
+  y[j + 3] += 4;
+  y[j + 4] += 5;
+  y[j + 5] += 6;
+  y[j + 6] += 7;
+  y[j + 7] += 8;
+}
+}
+
+void __attribute__ ((optimize (0)))
+f0_init (int8_t *__restrict x, int8_t *__restrict x2, int16_t *__restrict y,
+ int16_t *__restrict y2, int n)
+{
+  for (int i = 0, j = 0; i < n; i += 4, j += 8)
+{
+  

Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
>> It's highly unlikely we'll switch from the mechanisms we're using.
>>They're pretty deeply embedded into how all the ports are developed and
>>work.

We just took a look at the build files. It seems that define_insn generates a
very large number of functions. Is there a chance to optimize this?
I believe the TableGen mechanism in LLVM is well optimized with respect to
generated files and functions, so it is not affected much as the number of
instructions goes up.
Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 12:07
To: juzhe.zh...@rivai.ai; kito.cheng
CC: jeffreyalaw; palmer; vineetg; Kito.cheng; gcc-patches; Patrick O'Neill; 
macro
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 21:54, juzhe.zh...@rivai.ai wrote:
>  >> IIRC LLVM is using the table driven mechanism, so it's less impact 
> on the
>>>compilation time when the instruction becomes more and more.
> Oh, I see. Could you share more details ?
> Maybe we can support this in GCC.
It's highly unlikely we'll switch from the mechanisms we're using. 
They're pretty deeply embedded into how all the ports are developed and 
work.
 
The first step is to figure out what's exploding.  I strongly suspect 
we'll be able to see this in a cross, but again, the magnitude will be 
smaller.
 
jeff
 


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
>> IIRC LLVM is using the table driven mechanism, so it's less impact on the
>> compilation time when the instruction becomes more and more.
Oh, I see. Could you share more details ?
Maybe we can support this in GCC.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-25 11:53
To: juzhe.zh...@rivai.ai
CC: jeffreyalaw; palmer; vineetg; Kito.cheng; gcc-patches; Patrick O'Neill; 
jlaw; macro
Subject: Re: Re: RISC-V Bootstrap problems
Jojo has a patch that tries to split those things, which should help with this,
but it seems it has not landed.
 
https://patchwork.ozlabs.org/project/gcc/patch/20201104015315.81416-1-jiejie_r...@c-sky.com/
 
 
> How about LLVM? Can kito help with this issue?
> LLVM has already supported full intrinsics for a long time and no issues.
 
IIRC LLVM uses a table-driven mechanism, so compilation time is less affected
as the number of instructions grows.
 
 
On Thu, May 25, 2023 at 11:46 AM juzhe.zh...@rivai.ai
 wrote:
>
> segment intrinsics are really huge amount.
>
> Even though I have tried to optimized them, still we have the issues..
>
> How about LLVM? Can kito help with this issue?
> LLVM has already support full intrinsics for a long time and no issues.
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
>
> From: Jeff Law
> Date: 2023-05-25 11:43
> To: Palmer Dabbelt; Vineet Gupta
> CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
> juzhe.zh...@rivai.ai
> Subject: Re: RISC-V Bootstrap problems
>
>
> On 5/24/23 17:13, Palmer Dabbelt wrote:
> > On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
>
> [ ... big snip ... ]
>
> >>
> >> Never mind. Looks like I found the issue - with just trial and error and
> >> no idea of how this stuff works.
> >> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
> >> Running full tests now.
> >
> > Thanks!
> Marginally related.  I was able to bisect the "hang" when 3-staging the
> trunk on RISC-V with qemu user mode emulation.
>
> So it wasn't actually hanging, but after the introduction of segment
> intrinsics the compilation time for insn-emit explodes -- previously I
> could do a full 3-stage bootstrap, build the glibc & the kernel, then
> test c/c++/fortran in ~10 hours.
>
> Now just building insn-emit.o alone takes ~10 hours in that environment.
>   I suspect (but have not yet confirmed) that we should see a huge
> compile-time spike in cross builds as well, though obviously it won't be
> as bad since we're not using qemu emulation.
>
> Clearly something isn't scaling well.  I don't know if we've got a crazy
> large function in there, a crazy number of functions or something that's
> just triggering a compile-time scaling problem.  Whatever it is, we
> probably need to address it.
>
> jeff
>
>
 


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
Besides, we don't see compilation issues when cross-compiling (with segment
intrinsics).
But I do agree we need to address this issue.

As far as I know, GCC compiles insn-emit in a single thread on a single core.
Can we compile it with multiple threads on multiple cores to speed up the build?

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 11:43
To: Palmer Dabbelt; Vineet Gupta
CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
juzhe.zh...@rivai.ai
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 17:13, Palmer Dabbelt wrote:
> On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
 
[ ... big snip ... ]
 
>>
>> Never mind. Looks like I found the issue - with just trial and error and
>> no idea of how this stuff works.
>> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
>> Running full tests now.
> 
> Thanks!
Marginally related.  I was able to bisect the "hang" when 3-staging the 
trunk on RISC-V with qemu user mode emulation.
 
So it wasn't actually hanging, but after the introduction of segment 
intrinsics the compilation time for insn-emit explodes -- previously I 
could do a full 3-stage bootstrap, build the glibc & the kernel, then 
test c/c++/fortran in ~10 hours.
 
Now just building insn-emit.o alone takes ~10 hours in that environment. 
  I suspect (but have not yet confirmed) that we should see a huge 
compile-time spike in cross builds as well, though obviously it won't be 
as bad since we're not using qemu emulation.
 
Clearly something isn't scaling well.  I don't know if we've got a crazy 
large function in there, a crazy number of functions or something that's 
just triggering a compile-time scaling problem.  Whatever it is, we 
probably need to address it.
 
jeff
 
 


Re: Re: RISC-V Bootstrap problems

2023-05-24 Thread juzhe.zh...@rivai.ai
There is a really huge number of segment intrinsics.

Even though I have tried to optimize them, we still have this issue.

How about LLVM? Can Kito help with this issue?
LLVM has supported the full set of intrinsics for a long time without issues.

Thanks.


juzhe.zh...@rivai.ai
 
From: Jeff Law
Date: 2023-05-25 11:43
To: Palmer Dabbelt; Vineet Gupta
CC: kito.cheng; gcc-patches; Kito Cheng; Patrick O'Neill; Jeff Law; macro; 
juzhe.zh...@rivai.ai
Subject: Re: RISC-V Bootstrap problems
 
 
On 5/24/23 17:13, Palmer Dabbelt wrote:
> On Wed, 24 May 2023 16:12:20 PDT (-0700), Vineet Gupta wrote:
 
[ ... big snip ... ]
 
>>
>> Never mind. Looks like I found the issue - with just trial and error and
>> no idea of how this stuff works.
>> The torture-{init,finish} needs to be in riscv.exp not rvv.exp
>> Running full tests now.
> 
> Thanks!
Marginally related.  I was able to bisect the "hang" when 3-staging the 
trunk on RISC-V with qemu user mode emulation.
 
So it wasn't actually hanging, but after the introduction of segment 
intrinsics the compilation time for insn-emit explodes -- previously I 
could do a full 3-stage bootstrap, build the glibc & the kernel, then 
test c/c++/fortran in ~10 hours.
 
Now just building insn-emit.o alone takes ~10 hours in that environment. 
  I suspect (but have not yet confirmed) that we should see a huge 
compile-time spike in cross builds as well, though obviously it won't be 
as bad since we're not using qemu emulation.
 
Clearly something isn't scaling well.  I don't know if we've got a crazy 
large function in there, a crazy number of functions or something that's 
just triggering a compile-time scaling problem.  Whatever it is, we 
probably need to address it.
 
jeff
 
 


Re: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence

2023-05-24 Thread juzhe.zh...@rivai.ai
* machmode.h (VECTOR_BOOL_MODE_P): New macro.
--- a/gcc/machmode.h
+++ b/gcc/machmode.h
@@ -134,6 +134,10 @@ extern const unsigned char mode_class[NUM_MACHINE_MODES];
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM   \
|| GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
 
+/* Nonzero if MODE is a vector bool mode.  */
+#define VECTOR_BOOL_MODE_P(MODE)   \
+  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)  \
+
Why did you add this? It is not used anywhere, so you should drop it.



juzhe.zh...@rivai.ai
 
From: pan2.li
Date: 2023-05-25 11:09
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; yanzhang.wang
Subject: [PATCH v6] RISC-V: Using merge approach to optimize repeating sequence
From: Pan Li 
 
This patch optimizes VLS vector initialization for repeating sequences.
It replaces the vslide1down sequence with a vmerge, using a simple cost
model in which every instruction has a cost of 1.
 
Given the following code, compiled with
-march=rv64gcv_zvl256b --param riscv-autovec-preference=fixed-vlmax:
 
typedef int64_t vnx32di __attribute__ ((vector_size (256)));
 
__attribute__ ((noipa)) void
f_vnx32di (int64_t a, int64_t b, int64_t *out)
{
  vnx32di v = {
    a, b, a, b, a, b, a, b,
    a, b, a, b, a, b, a, b,
    a, b, a, b, a, b, a, b,
    a, b, a, b, a, b, a, b,
  };
  *(vnx32di *) out = v;
}
 
Before this patch:
vslide1down.vx (x31 times)
 
After this patch:
li a5,-1431654400
addi a5,a5,-1365
li a3,-1431654400
addi a3,a3,-1366
slli a5,a5,32
add a5,a5,a3
vsetvli a4,zero,e64,m8,ta,ma
vmv.v.x v8,a0
vmv.s.x v0,a5
vmerge.vxm v8,v8,a1,v0
vs8r.v v8,0(a2)
 
Since we don't have SEW = 128 in vec_duplicate, we can't combine a and b into
a single SEW = 128 element and then broadcast that element.
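 
A scalar C model of the merge approach may help here. It is a sketch under assumptions: NELTS, repeating_ab_mask, init_by_merge and example_init_by_merge are hypothetical names, the broadcast loop stands in for vmv.v.x and the masked loop for vmerge.vxm. For 32 elements only the low 32 mask bits matter, and they match the low half of the 0xAAAA... constant that the li/addi/slli/add sequence above builds.

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 32

/* Mask with bit i set for every element that takes the second scalar,
   i.e. the odd positions of the a, b, a, b, ... sequence.  */
uint64_t
repeating_ab_mask (void)
{
  uint64_t mask = 0;

  for (unsigned i = 0; i < NELTS; i++)
    if (i % 2 == 1)
      mask |= (uint64_t) 1 << i;
  return mask;
}

/* Broadcast A to all elements, then overwrite the masked elements with B.  */
void
init_by_merge (int64_t a, int64_t b, int64_t out[NELTS])
{
  uint64_t mask = repeating_ab_mask ();

  for (unsigned i = 0; i < NELTS; i++)
    out[i] = a;                 /* models vmv.v.x */
  for (unsigned i = 0; i < NELTS; i++)
    if ((mask >> i) & 1)
      out[i] = b;               /* models vmerge.vxm */
}

void
example_init_by_merge (void)
{
  int64_t out[NELTS];

  assert (repeating_ab_mask () == 0xAAAAAAAAu);
  init_by_merge (7, -3, out);
  for (unsigned i = 0; i < NELTS; i++)
    assert (out[i] == (i % 2 == 0 ? 7 : -3));
}
```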
 
Signed-off-by: Pan Li 
Co-Authored by: Juzhe-Zhong 
 
gcc/ChangeLog:
 
* config/riscv/riscv-protos.h (enum insn_type): New type.
* config/riscv/riscv-v.cc (RVV_INSN_OPERANDS_MAX): New macro.
(rvv_builder::can_duplicate_repeating_sequence_p): Align the
referenced class member.
(rvv_builder::get_merged_repeating_sequence):
(rvv_builder::repeating_sequence_use_merge_profitable_p): New
function to evaluate the optimization cost.
(rvv_builder::get_merge_scalar_mask): New function to get the
merge mask.
(emit_scalar_move_insn): New function to emit vmv.s.x.
(emit_vlmax_integer_move_insn): New function to emit vlmax vmv.v.x.
(emit_nonvlmax_integer_move_insn): New function to emit nonvlmax
vmv.v.x.
(get_repeating_sequence_dup_machine_mode): New function to get
the dup machine mode.
(expand_vector_init_merge_repeating_sequence): New function to
perform the optimization.
(expand_vec_init): Add this vector init optimization.
* config/riscv/riscv.h (BITS_PER_WORD): New macro.
* machmode.h (VECTOR_BOOL_MODE_P): New macro.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c: New test.
 
Signed-off-by: Pan Li 
---
gcc/config/riscv/riscv-protos.h   |   1 +
gcc/config/riscv/riscv-v.cc   | 225 +-
gcc/config/riscv/riscv.h  |   1 +
gcc/machmode.h|   4 +
.../vls-vlmax/init-repeat-sequence-1.c|  21 ++
.../vls-vlmax/init-repeat-sequence-2.c|  24 ++
.../vls-vlmax/init-repeat-sequence-3.c|  25 ++
.../vls-vlmax/init-repeat-sequence-4.c|  15 ++
.../vls-vlmax/init-repeat-sequence-5.c|  17 ++
.../vls-vlmax/init-repeat-sequence-run-1.c|  47 
.../vls-vlmax/init-repeat-sequence-run-2.c|  46 
.../vls-vlmax/init-repeat-sequence-run-3.c|  41 
12 files changed, 461 insertions(+), 6 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-3.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-4.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-5.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/vls-vlmax/init-repeat-sequence-run-3.c
 
diff --git a/gcc/config/riscv/riscv-protos.h b/

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread juzhe.zh...@rivai.ai
Hi, Richard.
After several tries with your testcases (which I have already added to the V15
patch), I think "using a new IV" is better than "multiplication".

Now:
 loop_len_34 = MIN_EXPR ;
  _74 = MIN_EXPR ;   --> the multiplication approach would change this
into  _74 = loop_len_34  * 2;
  loop_len_48 = MIN_EXPR <_74, 4>;
  _77 = _74 - loop_len_48;
  loop_len_49 = MIN_EXPR <_77, 4>;
  _78 = _77 - loop_len_49;
  loop_len_50 = MIN_EXPR <_78, 4>;
  loop_len_51 = _78 - loop_len_50;

I prefer "new IV" since it looks more reasonable and gives better codegen.
Could you take a look at it?
V15 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619534.html 
  
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 04:05
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
I'll look at the samples tomorrow, but just to address one thing:
 
钟居哲  writes:
>>> What gives the best code in these cases?  Is emitting a multiplication
>>> better?  Or is using a new IV better?
> Could you give me more detail information about "new refresh IV" approach.
> I'd like to try that.
 
By “using a new IV” I meant calling vect_set_loop_controls_directly
for every rgroup, not just the first.  So in the earlier example,
there would be one decrementing IV for x and one decrementing IV for y.
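 
For readers less familiar with the idea, a decrementing IV for a single array can be sketched in scalar C as below. This is only a model with hypothetical names (add_one_decrementing, example_decrementing_iv); vf stands for the maximum number of elements handled per vector iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add 1 to each of the N elements of X, processing at most VF elements per
   iteration.  REMAIN is the decrementing IV: it starts at N and drops by the
   number of elements actually processed.  Returns the iteration count.  */
size_t
add_one_decrementing (uint16_t *x, size_t n, size_t vf)
{
  size_t remain = n;
  size_t iters = 0;

  if (vf == 0)
    return 0; /* Guard against a loop that would never make progress.  */

  while (remain > 0)
    {
      /* Length for this iteration: MIN (remain, vf).  */
      size_t len = remain < vf ? remain : vf;

      for (size_t j = 0; j < len; j++)
        x[j] += 1;

      x += len;
      remain -= len; /* Decrement by a variable amount.  */
      iters++;
    }
  return iters;
}

void
example_decrementing_iv (void)
{
  uint16_t x[10] = { 0 };

  /* 10 elements with vf = 4 take iterations of length 4, 4 and 2.  */
  assert (add_one_decrementing (x, 10, 4) == 3);
  for (size_t i = 0; i < 10; i++)
    assert (x[i] == 1);
}
```

With one such IV per rgroup, x and y in the earlier example would each get their own remaining-count variable.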
 
Thanks,
Richard
 
 
 


Re: Re: [V2 COMMITTED] RISC-V: Add RVV mask logic auto-vectorization

2023-05-24 Thread juzhe.zh...@rivai.ai
>
> From: Juzhe-Zhong 
>
> This patch is adding mask logic auto-vectorization.
> define the pattern as "define_insn_and_split" to allow
 
>don't forgot to update here ^

I noticed that I missed the ChangeLog here. Is that what you want me to fix in
the commit log?




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 15:31
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; pan2.li
Subject: Re: [V2 COMMITTED] RISC-V: Add RVV mask logic auto-vectorization
LGTM, just one comment in git comment, no need v3, just commit with
the fix is fine :)
 
On Wed, May 24, 2023 at 3:28 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is adding mask logic auto-vectorization.
> define the pattern as "define_insn_and_split" to allow
 
don't forget to update here ^
 
> combine PASS easily combine series instructions.
>
> For example:
> combine vmxor.mm + vmnot.m into vmxnor.mm
>
> Build success and regression PASS
>
> And committed.
 


Re: Re: [PATCH] RISC-V: Add RVV mask logic auto-vectorization

2023-05-24 Thread juzhe.zh...@rivai.ai
Thanks, Kito.
I will change it to define_insn_and_split and send V2 soon.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 15:18
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Add RVV mask logic auto-vectorization
Just one comment: define_insn_and_split should be used in this
scenario rather than define_insn_and_rewrite since you are not really
rewriting.
 
You can commit after updating to define_insn_and_split :)
 
On Wed, May 24, 2023 at 3:04 PM  wrote:
>
> From: Juzhe-Zhong 
>
> This patch is adding mask logic auto-vectorization.
> define the pattern as "define_insn_and_split" to allow
> combine PASS easily combine series instructions.
>
> For example:
> combine vmxor.mm + vmnot.m into vmxnor.mm
>
> Build success and regression PASS
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/autovec.md (3): New pattern.
> (one_cmpl2): Ditto.
> (*not): Ditto.
> (*n): Ditto.
> * config/riscv/riscv-v.cc (expand_vec_cmp_float): Change to one_cmpl.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/cmp/vcond-4.c: New test.
> * gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c: New test.
>
> ---
>  gcc/config/riscv/autovec.md   | 95 +++
>  gcc/config/riscv/riscv-v.cc   |  7 +-
>  .../riscv/rvv/autovec/cmp/vcond-4.c   | 53 +++
>  .../riscv/rvv/autovec/cmp/vcond_run-4.c   | 35 +++
>  4 files changed, 187 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-4.c
>  create mode 100644 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-4.c
>
> diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
> index 4eeeab624a4..cacf27e4e60 100644
> --- a/gcc/config/riscv/autovec.md
> +++ b/gcc/config/riscv/autovec.md
> @@ -163,6 +163,101 @@
>DONE;
>  })
>
> +;; -
> +;;  [BOOL] Binary logical operations
> +;; -
> +;; Includes:
> +;; - vmand.mm
> +;; - vmxor.mm
> +;; - vmor.mm
> +;; -
> +
> +(define_insn_and_rewrite "3"
> +  [(set (match_operand:VB 0 "register_operand" "=vr")
> +   (any_bitwise:VB (match_operand:VB 1 "register_operand" " vr")
> +   (match_operand:VB 2 "register_operand" " vr")))]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& can_create_pseudo_p ()"
> +  {
> +insn_code icode = code_for_pred (, mode);
> +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, operands);
> +DONE;
> +  }
> +  [(set_attr "type" "vmalu")
> +   (set_attr "mode" "")])
> +
> +;; -
> +;;  [BOOL] Inverse
> +;; -
> +;; Includes:
> +;; - vmnot.m
> +;; -
> +
> +(define_insn_and_rewrite "one_cmpl2"
> +  [(set (match_operand:VB 0 "register_operand" "=vr")
> +   (not:VB (match_operand:VB 1 "register_operand" " vr")))]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& can_create_pseudo_p ()"
> +  {
> +insn_code icode = code_for_pred_not (mode);
> +riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, operands);
> +DONE;
> +  }
> +  [(set_attr "type" "vmalu")
> +   (set_attr "mode" "")])
> +
> +;; -
> +;;  [BOOL] Binary logical operations (inverted second input)
> +;; -
> +;; Includes:
> +;; - vmandnot.mm
> +;; - vmornot.mm
> +;; -
> +
> +(define_insn_and_rewrite "*not"
> +  [(set (match_operand:VB 0 "register_operand"   "=vr")
> +   (bitmanip_bitwise:VB
> + (not:VB (match_operand:VB 2 "register_operand" " vr"))
> + (match_operand:VB 1 "register_operand" " vr")))]
> +  "TARGET_VECTOR"
> +  "#"
> +  "&& can_creat

Re: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Thanks a lot. Some of the comments have already been addressed in V4,
but please disregard the V4 patch.

Could you continue the review with the V5 patch that I just sent?
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619366.html 
All of your comments have been addressed there.
Thanks.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 11:20
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; Richard 
Sandiford
Subject: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx mask, rtx maskoff, rtx op0,
> +   rtx op1)
> ...
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP + 2] = {target, mask, maskoff, cmp, op0, op1};
> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP + 2, ops);
 
It's too magic.
 
> +/* This function emits cmp instruction.  */
> +void
> +emit_vlmax_cmp_insn (unsigned icode, int op_num, rtx *ops)
> +{
> +  machine_mode mode = GET_MODE (ops[0]);
> +  bool fully_unmasked_p = op_num == RVV_CMP_OP ? true : false;
> +  bool use_real_merge_p = op_num == RVV_CMP_OP ? false : true;
 
Don't do that; please split this function into two.
 
> +  /* We have a maximum of 11 operands for RVV instruction patterns according 
> to
> +   * vector.md.  */
> +  insn_expander<11> e (/*OP_NUM*/ op_num, /*HAS_DEST_P*/ true,
> +  /*FULLY_UNMASKED_P*/ fully_unmasked_p,
> +  /*USE_REAL_MERGE_P*/ use_real_merge_p,
> +  /*HAS_AVL_P*/ true,
> +  /*VLMAX_P*/ true,
> +  /*DEST_MODE*/ mode, /*MASK_MODE*/ mode);
> +  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);
> +  e.emit_insn ((enum insn_code) icode, ops);
> +}
> +
>  /* Expand series const vector.  */
>
>  void
> +void
> +expand_vec_cmp (rtx target, rtx_code code, rtx op0, rtx op1)
> +{
> +  machine_mode mask_mode = GET_MODE (target);
> +  machine_mode data_mode = GET_MODE (op0);
> +  insn_code icode = get_cmp_insn_code (code, data_mode);
> +
> +  if (code == LTGT)
> +{
> +  rtx gt = gen_reg_rtx (mask_mode);
> +  rtx lt = gen_reg_rtx (mask_mode);
> +  expand_vec_cmp (gt, GT, op0, op1);
> +  expand_vec_cmp (lt, LT, op0, op1);
> +  icode = code_for_pred (IOR, mask_mode);
> +  rtx ops[3] = {target, gt, lt};
 
rtx ops[] = {target, gt, lt};
 
> +  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  return;
> +}
> +
> +  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, op0, op1);
> +  rtx ops[RVV_CMP_OP] = {target, cmp, op0, op1};
 
rtx ops[] = {target, cmp, op0, op1};
 
> +  emit_vlmax_cmp_insn (icode, RVV_CMP_OP, ops);
> +}
> +
 
> +  /* There is native support for the inverse comparison.  */
> +  code = reverse_condition_maybe_unordered (code);
> +  if (code == ORDERED)
> +emit_move_insn (target, eq0);
> +  else
> +expand_vec_cmp (eq0, code, eq0, eq0, op0, op1);
> +
> +  if (can_invert_p)
> +{
> +  emit_move_insn (target, eq0);
> +  return true;
> +}
> +  insn_code icode = code_for_pred_not (mask_mode);
> +  rtx ops[RVV_UNOP] = {target, eq0};
> +  emit_vlmax_insn (icode, RVV_UNOP, ops);
 
rtx ops[] = {target, eq0};
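
As a rough illustration of the "rtx ops[] = {...}" suggestion above, here is a
standalone sketch (not GCC code) of why letting the compiler deduce the array
length helps: the length can then be checked against the expected operand
count at compile time. Operand, kUnop, kBinop and emit_checked are hypothetical
names invented for this example; only the counts 2 and 3 mirror how RVV_UNOP
and RVV_BINOP are used with two- and three-element arrays in the quoted code.

```cpp
#include <cstddef>

// Hypothetical stand-in for GCC's rtx; it only exists to make the sketch
// self-contained.
struct Operand { int id; };

// Hypothetical operand counts, loosely modelled on RVV_UNOP / RVV_BINOP.
constexpr std::size_t kUnop = 2;
constexpr std::size_t kBinop = 3;

// N is deduced from the array that is passed in, so the caller never writes
// the length by hand.  The expected count is an explicit template argument
// and a mismatch is rejected at compile time.
template <std::size_t Expected, std::size_t N>
std::size_t emit_checked (const Operand (&ops)[N])
{
  static_assert (N == Expected, "operand count does not match the pattern");
  (void) ops;  // A real emitter would expand the operands here.
  return N;
}
```

With this shape, writing "Operand ops[] = {a, b};" and then calling
emit_checked<kBinop> (ops) fails to compile instead of reading past the end of
the array at run time.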
 


Re: [PATCH V4] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Hi, this patch addresses all of Robin's comments.

It is also a prerequisite for my current middle-end work.
Without it, I can't support the len_mask_xxx middle-end patterns, since
the mask is generated by a comparison.

For example,
for (int i = 0; i < n; i++)
  if (cond[i])
    a[i] = b[i];

We need len_mask_load/len_mask_store for such code, and I am going to support
them in the middle-end after this patch is merged.
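
To make the role of the comparison-generated mask concrete, here is a scalar
C++ model of the loop above. It is only a sketch of the intended semantics
(a length-controlled, masked store per chunk), not the GCC internal functions
themselves. The function names and the "vl" parameter are made up for the
example, vl is assumed to be nonzero, and all three vectors are assumed to have
the same size.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of one vector iteration: handle LEN elements starting at BASE
// and store b[i] into a[i] only where the mask is set.
void masked_copy_chunk (std::vector<int> &a, const std::vector<int> &b,
                        const std::vector<int> &cond, std::size_t base,
                        std::size_t len)
{
  for (std::size_t i = base; i < base + len; ++i)
    {
      bool mask = cond[i] != 0;  // The mask comes from a comparison.
      if (mask)
        a[i] = b[i];             // Inactive elements keep their old value.
    }
}

// Strip-mine the loop with a hypothetical vector length VL (must be > 0).
void conditional_copy (std::vector<int> &a, const std::vector<int> &b,
                       const std::vector<int> &cond, std::size_t vl)
{
  std::size_t n = a.size ();
  for (std::size_t base = 0; base < n; base += vl)
    {
      // Length control: the last chunk may be shorter than VL.
      std::size_t len = (n - base < vl) ? n - base : vl;
      masked_copy_chunk (a, b, cond, base, len);
    }
}
```

The point of the sketch is that both controls are needed at once: the length
handles the tail of the loop and the mask handles the "if (cond[i])".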

Both integer and floating-point (ordered and unordered) comparisons are tested.
Build and regression tests passed.

Ok for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-24 11:11
To: gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc; 
Juzhe-Zhong; Richard Sandiford
Subject: [PATCH V4] RISC-V: Add RVV comparison autovectorization
From: Juzhe-Zhong 
 
This patch enables RVV comparison auto-vectorization, including
floating-point ordered and unordered comparisons.
 
The testcases are borrowed from Richard,
so Richard is included as co-author.
 
Co-Authored-By: Richard Sandiford 
 
gcc/ChangeLog:
 
* config/riscv/autovec.md (@vcond_mask_): New pattern.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.
(vcond): Ditto.
(vcondu): Ditto.
* config/riscv/riscv-protos.h (enum insn_type): Add new enum.
(emit_vlmax_merge_insn): New function.
(emit_vlmax_cmp_insn): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float):Ditto.
(expand_vcond):Ditto.
* config/riscv/riscv-v.cc (emit_vlmax_merge_insn): Ditto.
(emit_vlmax_cmp_insn): Ditto.
(get_cmp_insn_code): Ditto.
(expand_vec_cmp): Ditto.
(expand_vec_cmp_float): Ditto.
(expand_vcond): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/rvv.exp: Add RVV comparison testcases.
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond-3.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c: New test.
* gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c: New test.
 
---
gcc/config/riscv/autovec.md   | 112 
gcc/config/riscv/riscv-protos.h   |   8 +
gcc/config/riscv/riscv-v.cc   | 242 ++
.../riscv/rvv/autovec/cmp/vcond-1.c   | 157 
.../riscv/rvv/autovec/cmp/vcond-2.c   |  75 ++
.../riscv/rvv/autovec/cmp/vcond-3.c   |  13 +
.../riscv/rvv/autovec/cmp/vcond_run-1.c   |  49 
.../riscv/rvv/autovec/cmp/vcond_run-2.c   |  76 ++
.../riscv/rvv/autovec/cmp/vcond_run-3.c   |   6 +
gcc/testsuite/gcc.target/riscv/rvv/rvv.exp|   2 +
10 files changed, 740 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond-3.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cmp/vcond_run-3.c
 
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 7c87b6012f6..4eeeab624a4 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -162,3 +162,115 @@
riscv_vector::RVV_BINOP, operands);
   DONE;
})
+
+;; =
+;; == Comparisons and selects
+;; =
+
+;; -
+;;  [INT,FP] Select based on masks
+;; -
+;; Includes merging patterns for:
+;; - vmerge.vv
+;; - vmerge.vx
+;; - vfmerge.vf
+;; -
+
+(define_expand "@vcond_mask_"
+  [(match_operand:V 0 "register_operand")
+   (match_operand: 3 "register_operand")
+   (match_operand:V 1 "nonmemory_operand")
+   (match_operand:V 2 "register_operand")]
+  "TARGET_VECTOR"
+  {
+/* The order of vcond_mask is opposite to pred_merge.  */
+std::swap (operands[1], operands[2]);
+riscv_vector::emit_vlmax_merge_insn (code_for_pred_merge (mode),
+riscv_vector::RVV_MERGE_OP, operands);
+DONE;
+  }
+)
+
+;; -
+;;  [INT,FP] Comparisons
+;; -
+;; Includes:
+;; - vms.
+;; -
+
+(define_expand "vec_cmp&

Re: Re: [PATCH] RISC-V: Fix incorrect code of touching inaccessible memory address

2023-05-23 Thread juzhe.zh...@rivai.ai
Thanks. I fixed it by separating the VL operand from the normal operands.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619356.html 

Does it look more reasonable to you?
I have just finished the build test and the regression run.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-24 10:10
To: juzhe.zhong
CC: gcc-patches; kito.cheng; palmer; palmer; jeffreyalaw; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Fix incorrect code of touching inaccessible memory 
address
I am a little hesitant about that, since I feel the vl and the normal
operands should be passed separately; otherwise the meaning of m_op_num is
kind of unclear. We have comments there, but I think it's not ideal since it
is really context sensitive and hard to determine.
 
I also suspect that gcc_assert (ops[m_op_num]); is not very useful, since it
may simply be an out-of-range access if we forget to pass the vl
operand.
 
I am thinking we might need to introduce something like llvm::ArrayRef
to have a better sanity check, e.g. to check the length of ops.
Another option is to use std::vector, which achieves the same
purpose but at a higher cost.
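
A minimal sketch of the ArrayRef-style idea, written as standalone C++ rather
than against GCC's headers. OperandRef, Operand and has_vl_operand are
hypothetical names for this example; this is not llvm::ArrayRef and not an
existing GCC class. The only point is that a view which carries its length can
check an index before it is used.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for rtx.
struct Operand { int id; };

// Non-owning view over an operand array that remembers its length, so an
// index can be validated instead of silently reading out of range.
class OperandRef
{
public:
  template <std::size_t N>
  OperandRef (const Operand (&ops)[N]) : m_data (ops), m_size (N) {}

  std::size_t size () const { return m_size; }

  const Operand &operator[] (std::size_t i) const
  {
    assert (i < m_size && "operand index out of range");
    return m_data[i];
  }

private:
  const Operand *m_data;
  std::size_t m_size;
};

// Returns true if an extra (VL) operand follows the NUM_OPS operation
// operands.  The decision is based on the length, not on reading
// ops[num_ops].
bool has_vl_operand (OperandRef ops, std::size_t num_ops)
{
  return ops.size () > num_ops;
}
```

Compared with std::vector, such a view does not allocate or copy; it only
pairs the existing pointer with a size.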
 
 
On Wed, May 24, 2023 at 9:57 AM  wrote:
>
> From: Juzhe-Zhong 
>
> For VLMAX situation, rtx len = ops[m_op_num] is incorrect since
> the last element the ops array should be ops[m_op_num - 1];
>
> I notice this issue when I am debugging code.
> This is a code bug even though the following codes will hide this issue.
> We still should need this minor fix.
>
> Built && Regression PASSed.
>
> Ok for trunk?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc: Fix bug of touching inaccessible memory.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index fa61a850a22..a0992773644 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -169,7 +169,11 @@ public:
>
>  if (m_needs_avl_p)
>{
> -   rtx len = ops[m_op_num];
> +   /* The variable "m_op_num" means the real operation operands except VL
> +  operand. For VLMAX patterns (no VL operand), the last operand is
> +  ops[m_op_num -1]. Wheras for non-VLMAX patterns, the last operand 
> is
> +  VL operand which is ops[m_op_num].  */
> +   rtx len = NULL_RTX;
> if (m_vlmax_p)
>   {
> if (const_vlmax_p (m_dest_mode))
> @@ -185,6 +189,20 @@ public:
> len = gen_reg_rtx (Pmode);
> emit_vlmax_vsetvl (m_dest_mode, len);
>   }
> +   else
> + {
> +   /* According to LRA mov pattern in vector.md. The VL operand 
> is
> +  always the last operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
> + }
> + }
> +   else
> + {
> +   /* For non-VLMAX patterns. The VL operand is always the last
> +* operand.  */
> +   gcc_assert (ops[m_op_num]);
> +   len = ops[m_op_num];
>   }
> add_input_operand (len, Pmode);
>}
> --
> 2.36.3
>
 


Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
I always finish the build and the regression testsuite before posting patches.



juzhe.zh...@rivai.ai
 
From: juzhe.zh...@rivai.ai
Date: 2023-05-24 09:37
To: palmer
CC: gcc-patches; kito.cheng; Kito.cheng; jeffreyalaw; Robin Dapp
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
Yes, I built it and regression has passed.



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:37
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> Yeah. Can I merge it?
 
You built it?  Then I'm fine with merging it.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Palmer Dabbelt
> Date: 2023-05-24 09:32
> To: juzhe.zhong
> CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
> Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
> expander
> On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>> From: Juzhe-Zhong 
>>
>> This simple patch fixes the magic number, remove magic number make codes 
>> more reasonable.
>>
>> Ok for trunk ?
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
>> (expand_const_vector): Ditto.
>> (legitimize_move): Ditto.
>> (sew64_scalar_helper): Ditto.
>> (expand_tuple_move): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 53 +
>>  gcc/config/riscv/riscv.cc   |  2 +-
>>  2 files changed, 26 insertions(+), 29 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 478a052a779..fa61a850a22 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>int shift = exact_log2 (INTVAL (step));
>>rtx shift_amount = gen_int_mode (shift, Pmode);
>>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
>> -   rtx ops[3] = {step_adj, vid, shift_amount};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, shift_amount};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  
> Looks like it also removes the "riscv_vector" namespace from some of the 
> constants?  No big deal, it's just a different cleanup (assuming it 
> still builds and such).
>  
>>  }
>>else
>>  {
>>insn_code icode = code_for_pred_scalar (MULT, mode);
>> -   rtx ops[3] = {step_adj, vid, step};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, step};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>>  }
>>  }
>>
>> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>rtx result = gen_reg_rtx (mode);
>>insn_code icode = code_for_pred_scalar (PLUS, mode);
>> -  rtx ops[3] = {result, step_adj, base};
>> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +  rtx ops[] = {result, step_adj, base};
>> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>>emit_move_insn (dest, result);
>>  }
>>  }
>> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>>gcc_assert (
>>  const_vec_duplicate_p (src, &elt)
>>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
>> -  rtx ops[2] = {target, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] = {target, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return;
>>  }
>>
>> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>>  we use vmv.v.i instruction.  */
>>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  {
>>elt = force_reg (elt_mode, elt);
>> -   rtx ops[2] = {tmp, elt};
>> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
>>

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
Yes, I built it and regression has passed.



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:37
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc
Subject: Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:34:00 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> Yeah. Can I merge it?
 
You built it?  Then I'm fine with merging it.
 
> 
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Palmer Dabbelt
> Date: 2023-05-24 09:32
> To: juzhe.zhong
> CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
> Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
> expander
> On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
>> From: Juzhe-Zhong 
>>
>> This simple patch fixes the magic number, remove magic number make codes 
>> more reasonable.
>>
>> Ok for trunk ?
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
>> (expand_const_vector): Ditto.
>> (legitimize_move): Ditto.
>> (sew64_scalar_helper): Ditto.
>> (expand_tuple_move): Ditto.
>> (expand_vector_init_insert_elems): Ditto.
>> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>>
>> ---
>>  gcc/config/riscv/riscv-v.cc | 53 +
>>  gcc/config/riscv/riscv.cc   |  2 +-
>>  2 files changed, 26 insertions(+), 29 deletions(-)
>>
>> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
>> index 478a052a779..fa61a850a22 100644
>> --- a/gcc/config/riscv/riscv-v.cc
>> +++ b/gcc/config/riscv/riscv-v.cc
>> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>int shift = exact_log2 (INTVAL (step));
>>rtx shift_amount = gen_int_mode (shift, Pmode);
>>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
>> -   rtx ops[3] = {step_adj, vid, shift_amount};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, shift_amount};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  
> Looks like it also removes the "riscv_vector" namespace from some of the 
> constants?  No big deal, it's just a different cleanup (assuming it 
> still builds and such).
>  
>>  }
>>else
>>  {
>>insn_code icode = code_for_pred_scalar (MULT, mode);
>> -   rtx ops[3] = {step_adj, vid, step};
>> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +   rtx ops[] = {step_adj, vid, step};
>> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>>  }
>>  }
>>
>> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>>  {
>>rtx result = gen_reg_rtx (mode);
>>insn_code icode = code_for_pred_scalar (PLUS, mode);
>> -  rtx ops[3] = {result, step_adj, base};
>> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
>> +  rtx ops[] = {result, step_adj, base};
>> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>>emit_move_insn (dest, result);
>>  }
>>  }
>> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>>gcc_assert (
>>  const_vec_duplicate_p (src, &elt)
>>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
>> -  rtx ops[2] = {target, src};
>> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
>> ops);
>> +  rtx ops[] = {target, src};
>> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>return;
>>  }
>>
>> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>>  we use vmv.v.i instruction.  */
>>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>>  {
>> -   rtx ops[2] = {tmp, src};
>> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
>> -ops);
>> +   rtx ops[] = {tmp, src};
>> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>>  }
>>else
>>  {
>>elt = force_reg (elt_mode, elt);
>> -   rtx ops[2] = {tmp, elt};
>> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
>> -riscv_vector::RVV_UNOP, ops);
>> +   rtx ops[] = {tmp, elt};
>> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>>  }
>>
>>if (tmp != target)
>> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>>rtx tmp = gen_reg_rtx (mode);

Re: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization expander

2023-05-23 Thread juzhe.zh...@rivai.ai
Yeah. Can I merge it?



juzhe.zh...@rivai.ai
 
From: Palmer Dabbelt
Date: 2023-05-24 09:32
To: juzhe.zhong
CC: gcc-patches; Kito Cheng; kito.cheng; jeffreyalaw; rdapp.gcc; juzhe.zhong
Subject: Re: [PATCH V2] RISC-V: Fix magic number of RVV auto-vectorization 
expander
On Tue, 23 May 2023 18:28:48 PDT (-0700), juzhe.zh...@rivai.ai wrote:
> From: Juzhe-Zhong 
>
> This simple patch fixes the magic number, remove magic number make codes more 
> reasonable.
>
> Ok for trunk ?
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-v.cc (expand_vec_series): Remove magic number.
> (expand_const_vector): Ditto.
> (legitimize_move): Ditto.
> (sew64_scalar_helper): Ditto.
> (expand_tuple_move): Ditto.
> (expand_vector_init_insert_elems): Ditto.
> * config/riscv/riscv.cc (vector_zero_call_used_regs): Ditto.
>
> ---
>  gcc/config/riscv/riscv-v.cc | 53 +
>  gcc/config/riscv/riscv.cc   |  2 +-
>  2 files changed, 26 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 478a052a779..fa61a850a22 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -406,14 +406,14 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>int shift = exact_log2 (INTVAL (step));
>rtx shift_amount = gen_int_mode (shift, Pmode);
>insn_code icode = code_for_pred_scalar (ASHIFT, mode);
> -   rtx ops[3] = {step_adj, vid, shift_amount};
> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +   rtx ops[] = {step_adj, vid, shift_amount};
> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
 
Looks like it also removes the "riscv_vector" namespace from some of the 
constants?  No big deal, it's just a different cleanup (assuming it 
still builds and such).
 
>  }
>else
>  {
>insn_code icode = code_for_pred_scalar (MULT, mode);
> -   rtx ops[3] = {step_adj, vid, step};
> -   emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +   rtx ops[] = {step_adj, vid, step};
> +   emit_vlmax_insn (icode, RVV_BINOP, ops);
>  }
>  }
>
> @@ -428,8 +428,8 @@ expand_vec_series (rtx dest, rtx base, rtx step)
>  {
>rtx result = gen_reg_rtx (mode);
>insn_code icode = code_for_pred_scalar (PLUS, mode);
> -  rtx ops[3] = {result, step_adj, base};
> -  emit_vlmax_insn (icode, riscv_vector::RVV_BINOP, ops);
> +  rtx ops[] = {result, step_adj, base};
> +  emit_vlmax_insn (icode, RVV_BINOP, ops);
>emit_move_insn (dest, result);
>  }
>  }
> @@ -445,8 +445,8 @@ expand_const_vector (rtx target, rtx src)
>gcc_assert (
>  const_vec_duplicate_p (src, &elt)
>  && (rtx_equal_p (elt, const0_rtx) || rtx_equal_p (elt, const1_rtx)));
> -  rtx ops[2] = {target, src};
> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, 
> ops);
> +  rtx ops[] = {target, src};
> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>return;
>  }
>
> @@ -458,16 +458,14 @@ expand_const_vector (rtx target, rtx src)
>  we use vmv.v.i instruction.  */
>if (satisfies_constraint_vi (src) || satisfies_constraint_Wc0 (src))
>  {
> -   rtx ops[2] = {tmp, src};
> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
> -ops);
> +   rtx ops[] = {tmp, src};
> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>  }
>else
>  {
>elt = force_reg (elt_mode, elt);
> -   rtx ops[2] = {tmp, elt};
> -   emit_vlmax_insn (code_for_pred_broadcast (mode),
> -riscv_vector::RVV_UNOP, ops);
> +   rtx ops[] = {tmp, elt};
> +   emit_vlmax_insn (code_for_pred_broadcast (mode), RVV_UNOP, ops);
>  }
>
>if (tmp != target)
> @@ -536,9 +534,8 @@ legitimize_move (rtx dest, rtx src)
>rtx tmp = gen_reg_rtx (mode);
>if (MEM_P (src))
>  {
> -   rtx ops[2] = {tmp, src};
> -   emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP,
> -ops);
> +   rtx ops[] = {tmp, src};
> +   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>  }
>else
>  emit_move_insn (tmp, src);
> @@ -548,8 +545,8 @@ legitimize_move (rtx dest, rtx src)
>if (satisfies_constraint_vu (src))
>  return false;
>
> -  rtx ops[2] = {dest, src};
> -  emit_vlmax_insn (code_for_pred_mov (mode), riscv_vector::RVV_UNOP, ops);
> +  rtx ops[] = {dest, src};
> +  emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
>return true;
>  }
>
> @@ -813,7 +810,7 @@ sew64_scalar_helper (rtx *operands, rtx *scalar_op, rtx 
> vl,
>  *scalar_op = force_reg (scalar_mode, *scalar_op);
>
> 

Re: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
OK. Let's wait for further comments from Kito.
Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-24 05:07
To: 钟居哲; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; Jeff Law; 
richard.sandiford
Subject: Re: [PATCH V2] RISC-V: Add RVV comparison autovectorization
>>> Don't you want to use your shiny new operand passing style here as
>>> with the other expanders?
> H, I do this just following ARM code style.
> You can see I do pass rtx[] for expand_vcond and pass rtx,rtx,rtx for 
> expand_vec_cmp.
> Well, I just follow ARM SVE implementation (You can check aarch64-sve.md, we 
> are the same)  :)
> If don't like it, could give me more information then I change it for you.
 
It doesn't matter that much in the end.  I just wondered why we introduced
a new style of passing operands to the insn_expander and then immediately did
not use it in the first follow-up :)
 
Nit:
+  e.set_policy (op_num == RVV_CMP_OP ? MASK_UNDISTURBED : MASK_ANY);
 
This looks weird in an emit__cmp_insn.  Without a comment it's unclear
why anything else but a CMP_OP would ever be used here.  The double meaning
of the enum (that I wanted to be an instruction type rather than a "number
of operands") doesn't help.  But well, fixable in the future.  We just
need to make sure not to accumulate too many of these warts.
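
One way to picture the "instruction type rather than number of operands" idea
is the standalone sketch below. InsnKind, operand_count and uses_real_merge are
hypothetical names, not part of the patch; the counts 4 and 6 are taken from
the two ops arrays in the quoted expand_vec_cmp code ({target, cmp, op0, op1}
and {target, mask, maskoff, cmp, op0, op1}).

```cpp
#include <cstddef>

// Hypothetical instruction kinds.  The enumerator names the kind of
// instruction and does not double as an operand count.
enum class InsnKind { Unop, Binop, Cmp, MaskedCmp };

// The operand count is derived from the kind.
constexpr std::size_t operand_count (InsnKind kind)
{
  switch (kind)
    {
    case InsnKind::Unop: return 2;
    case InsnKind::Binop: return 3;
    case InsnKind::Cmp: return 4;
    case InsnKind::MaskedCmp: return 6;
    }
  return 0;
}

// Properties such as "uses a real merge operand" are also keyed on the kind
// instead of being inferred from the operand count.
constexpr bool uses_real_merge (InsnKind kind)
{
  return kind == InsnKind::MaskedCmp;
}
```

With such a split, a check like "op_num == RVV_CMP_OP" would become a question
about the instruction kind, which reads less like a magic number.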
 
From the expander side V3 looks clean now.  The integer parts look OK to me
but I haven't checked the FP side at all.
 
Regards
Robin
 


Re: [PATCH V12] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-23 Thread juzhe.zh...@rivai.ai
Bootstrap on X86 passed.
Ok for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-22 16:38
To: gcc-patches
CC: richard.sandiford; rguenther; Ju-Zhe Zhong
Subject: [PATCH V12] VECT: Add decrement IV iteration loop control by variable 
amount support
From: Ju-Zhe Zhong 
 
gcc/ChangeLog:
 
* tree-vect-loop-manip.cc (vect_adjust_loop_lens_control): New function.
(vect_set_loop_controls_directly): Add decrement IV support.
(vect_set_loop_condition_partial_vectors): Ditto.
* tree-vect-loop.cc: Ditto.
* tree-vectorizer.h (LOOP_VINFO_USING_DECREMENTING_IV_P): New macro.
 
---
gcc/tree-vect-loop-manip.cc | 184 +++-
gcc/tree-vect-loop.cc   |  10 ++
gcc/tree-vectorizer.h   |   8 ++
3 files changed, 199 insertions(+), 3 deletions(-)
 
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index ff6159e08d5..94b38d1e0fb 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -385,6 +385,66 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
rgroup_controls *dest_rgm,
   return false;
}
+/* Try to use adjust loop lens for non-SLP multiple-rgroups.
+
+ _36 = MIN_EXPR ;
+
+ First length (MIN (X, VF/N)):
+   loop_len_15 = MIN_EXPR <_36, VF/N>;
+
+ Second length:
+   tmp = _36 - loop_len_15;
+   loop_len_16 = MIN (tmp, VF/N);
+
+ Third length:
+   tmp2 = tmp - loop_len_16;
+   loop_len_17 = MIN (tmp2, VF/N);
+
+ Last length:
+   loop_len_18 = tmp2 - loop_len_17;
+*/
+
+static void
+vect_adjust_loop_lens_control (tree iv_type, gimple_seq *seq,
+rgroup_controls *dest_rgm,
+rgroup_controls *src_rgm, tree step)
+{
+  tree ctrl_type = dest_rgm->type;
+  poly_uint64 nitems_per_ctrl
+= TYPE_VECTOR_SUBPARTS (ctrl_type) * dest_rgm->factor;
+  tree length_limit = build_int_cst (iv_type, nitems_per_ctrl);
+
+  for (unsigned int i = 0; i < dest_rgm->controls.length (); ++i)
+{
+  if (!step)
+ step = src_rgm->controls[i / dest_rgm->controls.length ()];
+  tree ctrl = dest_rgm->controls[i];
+  if (i == 0)
+ {
+   /* First iteration: MIN (X, VF/N) capped to the range [0, VF/N].  */
+   gassign *assign
+ = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+   gimple_seq_add_stmt (seq, assign);
+ }
+  else if (i == dest_rgm->controls.length () - 1)
+ {
+   /* Last iteration: Remain capped to the range [0, VF/N].  */
+   gassign *assign = gimple_build_assign (ctrl, MINUS_EXPR, step,
+ dest_rgm->controls[i - 1]);
+   gimple_seq_add_stmt (seq, assign);
+ }
+  else
+ {
+   /* (MIN (remain, VF*I/N)) capped to the range [0, VF/N].  */
+   step = gimple_build (seq, MINUS_EXPR, iv_type, step,
+dest_rgm->controls[i - 1]);
+   gassign *assign
+ = gimple_build_assign (ctrl, MIN_EXPR, step, length_limit);
+   gimple_seq_add_stmt (seq, assign);
+ }
+}
+}
+
/* Helper for vect_set_loop_condition_partial_vectors.  Generate definitions
for all the rgroup controls in RGC and return a control that is nonzero
when the loop needs to iterate.  Add any new preheader statements to
@@ -468,9 +528,78 @@ vect_set_loop_controls_directly (class loop *loop, 
loop_vec_info loop_vinfo,
   gimple_stmt_iterator incr_gsi;
   bool insert_after;
   standard_iv_increment_position (loop, &incr_gsi, &insert_after);
-  create_iv (build_int_cst (iv_type, 0), PLUS_EXPR, nitems_step, NULL_TREE,
-  loop, &incr_gsi, insert_after, &index_before_incr,
-  &index_after_incr);
+  if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
+{
+  nitems_total = gimple_convert (preheader_seq, iv_type, nitems_total);
+  tree step = make_ssa_name (iv_type);
+  /* Create decrement IV.  */
+  create_iv (nitems_total, MINUS_EXPR, step, NULL_TREE, loop, &incr_gsi,
+ insert_after, &index_before_incr, &index_after_incr);
+  tree temp = gimple_build (header_seq, MIN_EXPR, iv_type,
+ index_before_incr, nitems_step);
+  gimple_seq_add_stmt (header_seq, gimple_build_assign (step, temp));
+
+  if (rgc->max_nscalars_per_iter == 1)
+ {
+   /* single rgroup:
+  ...
+  _10 = (unsigned long) count_12(D);
+  ...
+  # ivtmp_9 = PHI 
+  _36 = MIN_EXPR ;
+  ...
+  vect__4.8_28 = .LEN_LOAD (_17, 32B, _36, 0);
+  ...
+  ivtmp_35 = ivtmp_9 - _36;
+  ...
+  if (ivtmp_35 != 0)
+goto ; [83.33%]
+  else
+goto ; [16.67%]
+   */
+   gassign *assign = gimple_build_assign (rgc->controls[0], step);
+   gimple_seq_add_stmt (header_seq, assign);
+ }
+  else
+ {
+   /* Multiple rgroup (SLP):
+  ...
+  _38 = (unsigned long) bnd.7_29;
+  _39 = _38 * 2;
+  ...
+  # ivtmp_41 = PHI 
+  ...
+  _43 = MIN_EXPR ;
+  loop_len_26 = MIN_EXPR <_43, 16>;
+  loop_len_25 = _43 - loop_len_26;
+  ...
+  .LEN_STORE (_6, 8B, loop_len_26, ...);
+ 
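
The length arithmetic described in the comments of this patch can be modelled
in plain C++ as follows. This is a sketch of the arithmetic only, not of the
gimple that the vectorizer builds. The function names are invented for the
example, vf is assumed to be nonzero, and split_length assumes x <= n * limit
so that the last control also stays within the limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Lengths handled per iteration by a decrementing IV: each iteration
// processes MIN (remaining, vf) elements and subtracts that amount from the
// counter until it reaches zero.
std::vector<std::size_t> decrementing_iv_lengths (std::size_t total,
                                                  std::size_t vf)
{
  std::vector<std::size_t> lens;
  std::size_t remaining = total;
  while (remaining != 0)
    {
      std::size_t step = std::min (remaining, vf);
      lens.push_back (step);
      remaining -= step;
    }
  return lens;
}

// Split one iteration's length X across N controls, each capped at LIMIT
// (VF/N in the patch comment): every control but the last takes
// MIN (remainder, LIMIT) and the last one takes whatever is left.
std::vector<std::size_t> split_length (std::size_t x, std::size_t n,
                                       std::size_t limit)
{
  std::vector<std::size_t> lens (n);
  std::size_t remaining = x;
  for (std::size_t i = 0; i < n; ++i)
    {
      lens[i] = (i + 1 == n) ? remaining : std::min (remaining, limit);
      remaining -= lens[i];
    }
  return lens;
}
```

For example, 10 elements with vf = 4 are processed in chunks of 4, 4 and 2,
and a length of 10 split over four controls with a limit of 4 gives 4, 4, 2
and 0.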

Re: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Oh, Thanks.
Let's wait for Kito's final approval.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-23 17:44
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Refactor the framework of RVV auto-vectorization
Hi Juzhe,
 
thanks, IMHO it's clearer with the changes now.  There are still
things that could be improved but it is surely an improvement over
what we currently have.  Therefore I'd vote to go ahead so we can
continue with more expanders and changes.
 
Still, we should be prepared for more refactoring changes in the future.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai
Yeah. I know. 
Like ARM does everywhere:
(define_expand "vcond"
  [(set (match_operand:SVE_ALL 0 "register_operand")
  (if_then_else:SVE_ALL
(match_operator 3 "comparison_operator"
  [(match_operand:SVE_I 4 "register_operand")
   (match_operand:SVE_I 5 "nonmemory_operand")])
(match_operand:SVE_ALL 1 "nonmemory_operand")
(match_operand:SVE_ALL 2 "nonmemory_operand")))]
  "TARGET_SVE &&  == "
  {
aarch64_expand_sve_vcond (mode, mode, operands);
DONE;
  }
)

Passing "operands" makes the code much cleaner.

Hi, kito. Could you take a look at the V2 refactor patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619291.html 
This is important for us since we can't post more autovec patches without the
refactor patch.

Thanks


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-23 16:45
To: juzhe.zh...@rivai.ai
CC: Robin Dapp; gcc-patches; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: Re: [PATCH] RISC-V: Refactor the framework of RVV 
auto-vectorization
> ARM uses rtx operands[] in many places and I personally prefer this way since
> it will make codes much cleaner.
> I dislike the way making the function argument with multiple operand ,like 
> this:
> void func(rtx dest, rtx src1, rtx src2, )
> If we are doing this, we will need to add helpers forever...
 
Don't forget we are using C++, so we have function overloading or
default arguments :)
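
A small sketch of what that could look like, using made-up names (Operand and
emit_op are not GCC functions): one overload per operand shape, plus a
defaulted trailing argument for an optional VL operand. The return value is
just the number of operands that would be emitted, so the sketch can be
checked.

```cpp
#include <cstddef>

// Hypothetical stand-in for rtx.
struct Operand { int id; };

// Overload for a unary operation: destination plus one source.
std::size_t emit_op (Operand dest, Operand src)
{
  (void) dest;
  (void) src;
  return 2;
}

// Overload for a binary operation.  The VL operand is optional: a null
// pointer here stands for the VLMAX case.
std::size_t emit_op (Operand dest, Operand src1, Operand src2,
                     const Operand *vl = nullptr)
{
  (void) dest;
  (void) src1;
  (void) src2;
  return vl ? 4 : 3;
}
```

Callers then write emit_op (d, s), emit_op (d, a, b) or emit_op (d, a, b, &vl)
without a separately named helper for each shape.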
 


Re: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization

2023-05-23 Thread juzhe.zh...@rivai.ai

Hi, Robin.

>> Why does a store not have a destination (as commented below)?
OK, V2 patch will have more comments.

>> m_all_unmasked_p or m_fully_unmasked_p?
OK.

>> Apart from the insn-centric name, couldn't we also decide this
>> based on the context later?  In the vector-builtins.cc we have
>> use_real_mask_p and use_real_merge_p that do this.
Ok. V2 will follow builtin framework

>> This means "has avl operand" I suppose?  From the caller's point
>> of view (and also the vsetvl pass) something like "needs avl" or so
>> would be more descriptive but undecided here.
Ok.

>> Do we need to expose these in the constructor?  As far as I can
>> tell we can decide later whether the instruction has a policy
>> or not (as I did in my patch, depending on whether all inputs
>> are masks or so).

Maybe we can add helpers to set policies. I will send V2 so you can take a look.

>> Having the mask mode be automatically deduced from the destination
>>is good, it was just obnoxious before to pass ...
Ok

>> I don't particularly like the names ;) Going back to vlmax and
>> nonvlmax I don't mind but do we really need to have the policies
>> encoded in the name now?  Especially since "many" is a word and
>> the default is ANY anyway.  Why not emit_vlmax_insn/emit_vlmax_op
>> for now and add the tu/mu later?

Ok

>> You can just drop the "The number = 11 is because" and say
>> "We have a maximum of 11 operands for...".
>> The eleven arguments seem a bit clunky here ;)  I would suggest
>> changing this again in the future but for now let's just go ahead
>> with it in order to make progress.
Ok

>> The rtx operands[] array I like least of the changes in this patch.
>> It's essentially an untyped array whose meaning is dependent on context
>> containing source operands and the length that is sometimes empty and
>> sometimes not.  I can't think of something that wouldn't complicate things
>> though but before we at least had functions called _len that would take
>> a length (NULL or not) and _vlmax that wouldn't.  It's pretty easy to mess
>> up here on the caller's side.

ARM uses rtx operands[] in many places, and I personally prefer this approach since
it makes the code much cleaner.
I dislike passing each operand as a separate function argument, like this:
void func(rtx dest, rtx src1, rtx src2, )
If we do that, we will need to keep adding helpers forever...
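
To make the trade-off concrete, here is a minimal sketch of the two calling
styles. It does not use GCC's real rtx type or any real RISC-V helper:
"operand", "emit_binop" and "emit_with_array" are hypothetical names made up
for illustration, and operands are modelled as plain strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for GCC's rtx: an operand is just a name here.
using operand = std::string;

// Style 1: one helper per operand count.  A ternary operation would need
// a second helper, a quaternary one a third, and so on.
std::string
emit_binop (const operand &dest, const operand &src1, const operand &src2)
{
  return dest + " = op(" + src1 + ", " + src2 + ")";
}

// Style 2: a single helper taking an operand array plus its length.
// ops[0] is the destination, ops[1..n-1] are the sources.
std::string
emit_with_array (const operand *ops, std::size_t n)
{
  assert (n >= 1);
  std::string text = ops[0] + " = op(";
  for (std::size_t i = 1; i < n; ++i)
    {
      if (i > 1)
        text += ", ";
      text += ops[i];
    }
  return text + ")";
}
```

The array style covers any operand count with one function, which is the
point made above. The cost is the one Robin raises: nothing in the signature
says which slot means what, so the caller has to get the order and the count
right.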

Sending V2 patch soon.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-23 16:06
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Refactor the framework of RVV auto-vectorization
Hi Juzhe,
 
in general I find the revised structure quite logical and it is definitely
an improvement.  Some abstractions are still a bit leaky but we can always
refactor "on the fly".  Some comments on the general parts, skipping
over the later details.
 
>   bool m_has_dest_p;
 
Why does a store not have a destination (as commented below)?
 
>   /* It't true if the pattern uses all trues mask operand.  */
>   bool m_use_all_trues_mask_p;
 
m_all_unmasked_p or m_fully_unmasked_p?
 
>   /* It's true if the pattern uses undefined merge operand.  */
>   bool m_use_undef_merge_p;
 
Apart from the insn-centric name, couldn't we also decide this
based on the context later?  In the vector-builtins.cc we have
use_real_mask_p and use_real_merge_p that do this.
 
>   bool m_has_avl_p;
 
This means "has avl operand" I suppose?  From the caller's point
of view (and also the vsetvl pass) something like "needs avl" or so
would be more descriptive but undecided here.
 
>   bool m_vlmax_p;
>   bool m_has_tail_policy_p;
>   bool m_has_mask_policy_p;
 
Do we need to expose these in the constructor?  As far as I can
tell we can decide later whether the instruction has a policy
or not (as I did in my patch, depending on whether all inputs
are masks or so).
 
>   enum tail_policy m_tail_policy;
>   enum mask_policy m_mask_policy;
 
>   machine_mode m_dest_mode;
>   machine_mode m_mask_mode;
 
Having the mask mode be automatically deduced from the destination
is good, it was just obnoxious before to pass ...
 
> Currently, we have "emit_vlmax_tany_many" and "emit_nonvlmax_tany_many".
 
I don't particularly like the names ;) Going back to vlmax and
nonvlmax I don't mind but do we really need to have the policies
encoded in the name now?  Especially since "many" is a word and
the default is ANY anyway.  Why not emit_vlmax_insn/emit_vlmax_op
for now and add the tu/mu later?
> #define RVV_BINOP_NUM 3 (numb

Re: Re: [PATCH] RISC-V: Add RVV comparison autovectorization

2023-05-22 Thread juzhe.zh...@rivai.ai
I will send the refactoring patch first, and then the comparison patch.
The refactoring patch will be applicable to all future work, and it should come
first, since I have already implemented all the RVV auto-vectorization patterns
and know what we will need in the future.

Thanks.


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-22 20:26
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; Kito.cheng; palmer; jeffreyalaw; richard.sandiford
Subject: Re: [PATCH] RISC-V: Add RVV comparison autovectorization
> I do refactoring since we are going to have many different
> auto-vectorization patterns, for example: cond_add, etc.
> 
> I should make the current framework suitable for all of them to
> simplify the future work.
 
That's good in general but can't it wait until the respective
changes go in?  I don't know how much you intend to change but
it will be easier to review as well if we don't change parts now
that might be used differently in the future. On top, we won't
get everything right with the first shot anyway.
 
Regards
Robin
 


Re: Re: [PATCH] RISC-V: Add RVV comparison autovectorization

2023-05-22 Thread juzhe.zh...@rivai.ai
Yes, I am working on it, but I noticed that the current framework is really
ugly and bad.
I am going to refactor it before I send the comparison support.

I am refactoring because we are going to have many different auto-vectorization
patterns, for example: cond_add, etc.

I should make the current framework suitable for all of them to simplify
future work.

Thanks. 


juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-22 20:14
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; Kito.cheng; palmer; jeffreyalaw; richard.sandiford
Subject: Re: [PATCH] RISC-V: Add RVV comparison autovectorization
> Thanks Robin. Address comment.
 
Did you intend to send an update here already or are you working
on it?  Just wondering because you just sent another refactoring
patch.
 
Regards
Robin
 


Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-22 Thread juzhe.zh...@rivai.ai

>> Not sure if you've covered this already in another thread but IIRC
>> RVV uses "with-len" not only for loads and stores but for arithmetic
>> instructions as well which is where (3) fails.  Fortunately RVV uses
>> element counts(?)

Yes, RVV uses element counts. But I did discover that we have bugs in some
arithmetic operations.
For example, for division we definitely need len_div (...), like cond_div in
ARM SVE.
But this is another story. I have supported the full feature set of RVV in my
downstream GCC, and it has worked well for a year (I think I have fixed all
potential issues for RVV).
So you can imagine I will post more middle-end patches for RVV
auto-vectorization in the future.

Thanks. 


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-05-22 18:12
To: Richard Sandiford; juzhe.zh...@rivai.ai; gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
On Fri, May 19, 2023 at 12:59 PM Richard Sandiford via Gcc-patches
 wrote:
>
> "juzhe.zh...@rivai.ai"  writes:
> >>> I don't think this is a property of decrementing IVs.  IIUC it's really
> >>> a property of rgl->factor == 1 && factor == 1, where factor would need
> >>> to be passed in by the caller.  Because of that, it should probably be
> >>> a separate patch.
> > Is it right that I just post this part code as a separate patch then merge 
> > it?
>
> No, not in its current form.  Like I say, the test should be based on
> factors rather than TYPE_VECTOR_SUBPARTS.  But a fix for this problem
> should come before the changes to IVs.
>
> >>> That is, current LOAD_LEN targets have two properties (IIRC):
> >>> (1) all vectors used in a given piece of vector code have the same byte 
> >>> size
> >>> (2) lengths are measured in bytes rather than elements
> >>> For all cases, including SVE, the number of controls needed for a scalar
> >>> statement is equal to the number of vectors needed for that scalar
> >>> statement.
> >>> Because of (1), on current LOAD_LEN targets, the number of controls
> >>> needed for a scalar statement is also proportional to the total number
> >>> of bytes occupied by the vectors generated for that scalar statement.
> >>> And because of (2), the total number of bytes is the only thing that
> >>> matters, so all users of a particular control can use the same control
> >>> value.
> >>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
> >>> control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
> >>> means 16 elements.  V16QI's nscalars_per_iter would therefore be double
> >>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
> >>> even out.
> >>> The code structurally supports targets that count in elements rather
> >>> than bytes, so that factor==1 for all element types.  See the
> >>> "rgl->factor == 1 && factor == 1" case in:
> >>>   if (rgl->max_nscalars_per_iter < nscalars_per_iter)
> >>>     {
> >>>       /* For now, we only support cases in which all loads and stores fall back
> >>>          to VnQI or none do.  */
> >>>       gcc_assert (!rgl->max_nscalars_per_iter
> >>>                   || (rgl->factor == 1 && factor == 1)
> >>>                   || (rgl->max_nscalars_per_iter * rgl->factor
> >>>                       == nscalars_per_iter * factor));
> >>>       rgl->max_nscalars_per_iter = nscalars_per_iter;
> >>>       rgl->type = vectype;
> >>>       rgl->factor = factor;
> >>>     }
> >>> But it hasn't been tested, since no current target uses it.
> >>> I think the above part of the patch shows that the current "factor is
> >>> always 1" path is in fact broken, and the patch is a correctness fix on
> >>> targets that measure in elements rather than bytes.
> >>> So I think the above part of the patch should go in ahead of the IV 
> >>> changes.
> >>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
> > Since length control measured in bytes instead of elements is not
> > appropriate for RVV, do you mean I can't support RVV auto-vectorization in
> > the upstream GCC middle-end and can only support it in my downstream? Is
> > that right?
>
> No.  I haven't said in this or previous reviews that something cannot be
> supported in upstream GCC.
>
> I'm saying that the code in theory supports counting in bytes *or*
>

Re: Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.

2023-05-22 Thread juzhe.zh...@rivai.ai
Yeah, I agree with Kito.
For example, I see you have renamed "get_prefer_***" to "get_preferred_**".
I think this NFC change should be a separate patch.

Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-22 17:05
To: Robin Dapp
CC: 钟居哲; gcc-patches; palmer; Michael Collison; Jeff Law
Subject: Re: [PATCH] RISC-V: Implement autovec abs, vneg, vnot.
So I expect you will also apply those refactorings to Juzhe's new changes?
If so, I would like to have a separate NFC refactor patch if possible.
 
e.g.
Juzhe's vec_cmp/vcond -> NFC refactor patch -> abs, vneg, vnot
 
On Mon, May 22, 2023 at 4:59 PM Robin Dapp  wrote:
>
> As discussed with Juzhe off-list, I will rebase this patch against
> Juzhe's vec_cmp/vcond patch once that hits the trunk.
>
> Regards
>  Robin
 


Re: Re: [PATCH] RISC-V: Add RVV comparison autovectorization

2023-05-22 Thread juzhe.zh...@rivai.ai
Thanks Robin. Address comment.



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-05-22 16:07
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; palmer; jeffreyalaw; Richard Sandiford
Subject: Re: [PATCH] RISC-V: Add RVV comparison autovectorization
Hi Juzhe,
 
thanks.  Some remarks inline.
 
> +;; Integer (signed) vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> + [(match_operand:VI 4 "register_operand")
> +  (match_operand:VI 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
> +  GET_MODE_NUNITS (mode))"
> +  {
> +riscv_vector::expand_vcond (mode, operands);
> +DONE;
> +  }
> +)
> +
> +;; Integer vcondu.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcondu"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> + [(match_operand:VI 4 "register_operand")
> +  (match_operand:VI 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
> +  GET_MODE_NUNITS (mode))"
> +  {
> +riscv_vector::expand_vcond (mode, operands);
> +DONE;
> +  }
> +)
 
These do exactly the same (as do their aarch64 heirs).  As you are a friend
of iterators usually I guess you didn't use one for clarity here?  Also, I
didn't see that we do much of immediate-range enforcement in expand_vcond.
 
> +
> +;; Floating-point vcond.  Don't enforce an immediate range here, since it
> +;; depends on the comparison; leave it to riscv_vector::expand_vcond instead.
> +(define_expand "vcond"
> +  [(set (match_operand:V 0 "register_operand")
> + (if_then_else:V
> +   (match_operator 3 "comparison_operator"
> + [(match_operand:VF 4 "register_operand")
> +  (match_operand:VF 5 "nonmemory_operand")])
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "nonmemory_operand")))]
> +  "TARGET_VECTOR && known_eq (GET_MODE_NUNITS (mode),
> +  GET_MODE_NUNITS (mode))"
> +  {
> +riscv_vector::expand_vcond (mode, operands);
> +DONE;
> +  }
> +)
 
It comes a bit as a surprise to add float comparisons before any other
float autovec patterns are in.  I'm not against it but would wait for
other comments here.  If the tests are sourced from aarch64 they have
been reviewed often enough that we can be fairly sure to do the right
thing though.  I haven't checked the expander and inversion things
closely now though.
 
> +
> +;; -
> +;;  [INT,FP] Select based on masks
> +;; -
> +;; Includes merging patterns for:
> +;; - vmerge.vv
> +;; - vmerge.vx
> +;; - vfmerge.vf
> +;; -
> +
> +(define_expand "vcond_mask_"
> +  [(match_operand:V 0 "register_operand")
> +   (match_operand: 3 "register_operand")
> +   (match_operand:V 1 "nonmemory_operand")
> +   (match_operand:V 2 "register_operand")]
> +  "TARGET_VECTOR"
> +  {
> +riscv_vector::emit_merge_op (operands[0], operands[2],
> +operands[1], operands[3]);
> +DONE;
> +  }
> +)
 
Order of operands is a bit surprising, see below.
 
> +  void add_fixed_operand (rtx x)
> +  {
> +create_fixed_operand (&m_ops[m_opno++], x);
> +gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
> +  void add_integer_operand (rtx x)
> +  {
> +create_integer_operand (&m_ops[m_opno++], INTVAL (x));
> +gcc_assert (m_opno <= MAX_OPERANDS);
> +  }
>void add_all_one_mask_operand (machine_mode mode)
>{
>  add_input_operand (CONSTM1_RTX (mode), mode);
> @@ -85,11 +95,14 @@ public:
>{
>  add_input_operand (RVV_VUNDEF (mode), mode);
>}
> -  void add_policy_operand (enum tail_policy vta, enum mask_policy vma)
> +  void add_policy_operand (enum tail_policy

Re: Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is counting elements

2023-05-22 Thread juzhe.zh...@rivai.ai
Hi, Richard.
I have rebased onto trunk and sent the updated patch for "decrement IV support":
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619115.html 

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-22 16:00
To: juzhe.zhong
CC: gcc-patches; rguenther; pan2.li
Subject: Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is 
counting elements
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Address comments from Richard that splits the patch of fixing multiple-rgroup
> handling of length counting elements.
>
> This patch is fixing issue of handling multiple-rgroup of length is counting 
> elements
>
> Before this patch, multiple rgroup run fail:
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
>
> After this patch, These tests are all passed.
 
Thanks, looks great.  A couple of minor comments below:
 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 905145ae97b..a13d6f5e898 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,8 +10364,9 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
 
The new parameters need to be documented.  How about:
 
/* Given a complete set of lengths LENS, extract length number INDEX
   for an rgroup that operates on NVECTORS vectors of type VECTYPE,
   where 0 <= INDEX < NVECTORS.  Return a value that contains FACTOR
   multiplied by the number of elements that should be processed.
   Insert any set-up statements before GSI.  */
 
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index, unsigned int factor)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
> @@ -10400,6 +10401,27 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (rgl->factor == 1 && factor == 1)
> +{
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +  return loop_len;
> +}
>else
>  return rgl->controls[index];
 
This looks right, but I think it'd be clearer to rearrange things slightly:
 
  if (use_bias_adjusted_len)
return rgl->bias_adjusted_ctrl;
 
  tree loop_len = rgl->controls[index];
  if (rgl->factor == 1 && factor == 1)
{
  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
  if (maybe_ne (nunits1, nunits2))
{
  /* A loop len for data type X can be reused for data type Y
 if X has N times more elements than Y and 

Re: Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is counting elements

2023-05-22 Thread juzhe.zh...@rivai.ai
Thanks, Richard.
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619111.html 
Would you mind taking another look at this patch?
I copied the code from your comments and tested it.
All the tests passed.
OK for trunk?

>> The patch is OK for trunk with those changes, thanks.  Once it's pushed,
>> could you post the updated decrementing IV patch?
Sure, I am working on it.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-22 16:00
To: juzhe.zhong
CC: gcc-patches; rguenther; pan2.li
Subject: Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is 
counting elements
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> Address comments from Richard that splits the patch of fixing multiple-rgroup
> handling of length counting elements.
>
> This patch is fixing issue of handling multiple-rgroup of length is counting 
> elements
>
> Before this patch, multiple rgroup run fail:
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
> test
> FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
> test
>
> After this patch, These tests are all passed.
 
Thanks, looks great.  A couple of minor comments below:
 
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 905145ae97b..a13d6f5e898 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,8 +10364,9 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
 
The new parameters need to be documented.  How about:
 
/* Given a complete set of lengths LENS, extract length number INDEX
   for an rgroup that operates on NVECTORS vectors of type VECTYPE,
   where 0 <= INDEX < NVECTORS.  Return a value that contains FACTOR
   multiplied by the number of elements that should be processed.
   Insert any set-up statements before GSI.  */
 
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index, unsigned int factor)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
> @@ -10400,6 +10401,27 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (rgl->factor == 1 && factor == 1)
> +{
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +  return loop_len;
> +}
>else
>  return rgl->controls[index];
 
This looks right, but I think it'd be clearer to rearrange things slightly:
 
  if (use_bias_adjusted_len)
return rgl->bias_adjusted_ctrl;
 
  tree loop_len = rgl->controls[index];
  if (rgl->factor == 1 && factor == 1)
{
  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type

Re: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is counting elements

2023-05-22 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.
This patch passes bootstrap on x86, and the regression results show no unexpected changes.
OK for trunk?

Thanks.


juzhe.zh...@rivai.ai
 
From: juzhe.zhong
Date: 2023-05-22 10:08
To: gcc-patches
CC: richard.sandiford; rguenther; pan2.li; Ju-Zhe Zhong
Subject: [PATCH V12] VECT: Fix issue of multiple-rgroup for length is counting 
elements
From: Ju-Zhe Zhong 
 
Address comments from Richard by splitting out the fix for multiple-rgroup
handling when the length counts elements.
 
This patch fixes the handling of multiple rgroups when the length counts
elements.
 
Before this patch, the multiple-rgroup run tests fail:
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c execution 
test
FAIL: gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c execution 
test
 
After this patch, these tests all pass.
 
gcc/ChangeLog:
 
* tree-vect-loop.cc (vect_get_loop_len): Fix issue for multiple-rgroup 
of length.
* tree-vect-stmts.cc (vectorizable_store): Ditto.
(vectorizable_load): Ditto.
* tree-vectorizer.h (vect_get_loop_len): Ditto.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h: New test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c: New 
test.
* gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c: New 
test.
 
---
.../rvv/autovec/partial/multiple_rgroup-1.c   |   6 +
.../rvv/autovec/partial/multiple_rgroup-1.h   | 304 ++
.../rvv/autovec/partial/multiple_rgroup-2.c   |   6 +
.../rvv/autovec/partial/multiple_rgroup-2.h   | 546 ++
.../autovec/partial/multiple_rgroup_run-1.c   |  19 +
.../autovec/partial/multiple_rgroup_run-2.c   |  19 +
gcc/tree-vect-loop.cc |  26 +-
gcc/tree-vect-stmts.cc|  28 +-
gcc/tree-vectorizer.h |   5 +-
9 files changed, 944 insertions(+), 15 deletions(-)
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-2.h
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-1.c
create mode 100644 
gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup_run-2.c
 
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
new file mode 100644
index 000..69cc3be78f7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.c
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param 
riscv-autovec-preference=fixed-vlmax" } */
+
+#include "multiple_rgroup-1.h"
+
+TEST_ALL (test_1)
diff --git 
a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h 
b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
new file mode 100644
index 000..fbc49f4855d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/multiple_rgroup-1.h
@@ -0,0 +1,304 @@
+#include 
+#include 
+
+#define test_1(TYPE1, TYPE2)   
\
+  void __attribute__ ((noinline, noclone)) 
\
+  test_1_##TYPE1_##TYPE2 (TYPE1 *__restrict f, TYPE2 *__restrict d, TYPE1 x,   
\
+   TYPE1 x2, TYPE2 y, int n)\
+  {
\
+for (i

Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread juzhe.zh...@rivai.ai
Hi, Richard. Thanks for the comments.

Would you mind telling me whether it is possible to get decrement IV
support into the GCC middle-end?

If yes, could you tell me what I should do next with the patches? I am
confused, because it seems that the implementation in this
patch should be abandoned entirely and the whole thing needs to be rewritten.

Would you mind giving me more information?

Thanks. 


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-19 18:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).
 
juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
>  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>/* Populate the rgroup's len array, if this is the first time we've
>   used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (maybe_ne (nunits1, nunits2))
> + {
> +   /* A loop len for data type X can be reused for data type Y
> +  if X has N times more elements than Y and if Y's elements
> +  are N times bigger than X's.  */
> +   gcc_assert (multiple_p (nunits1, nunits2));
> +   unsigned int factor = exact_div (nunits1, nunits2).to_constant ();
> +   gimple_seq seq = NULL;
> +   loop_len = gimple_build (&seq, RDIV_EXPR, iv_type, loop_len,
> +build_int_cst (iv_type, factor));
> +   if (seq)
> + gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> + }
> +  return loop_len;
> +}
 
I don't think this is a property of decrementing IVs.  IIUC it's really
a property of rgl->factor == 1 && factor == 1, where factor would need
to be passed in by the caller.  Because of that, it should probably be
a separate patch.
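
As an illustration of the length reuse in the hunk quoted above, here is a
small sketch that uses plain integers instead of trees and poly_ints. The
name reuse_loop_len is hypothetical and not part of GCC, and the sketch
assumes a target that counts lengths in elements (factor == 1 for every
type).

```cpp
#include <cassert>

// Hypothetical helper, not GCC code.  LOOP_LEN was computed for a vector
// type with NUNITS1 elements.  Return the length to use for a vector type
// with NUNITS2 elements, assuming lengths are counted in elements.
unsigned int
reuse_loop_len (unsigned int loop_len, unsigned int nunits1,
                unsigned int nunits2)
{
  if (nunits1 == nunits2)
    return loop_len;

  // X has N times more elements than Y, and Y's elements are N times
  // bigger than X's, so Y needs a length that is N times smaller.
  assert (nunits1 % nunits2 == 0);
  unsigned int n = nunits1 / nunits2;
  return loop_len / n;
}
```

For example, a length of 12 computed for a 16-element vector becomes 3 when
reused for a 4-element vector of the same byte size.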
 
That is, current LOAD_LEN targets have two properties (IIRC):
 
(1) all vectors used in a given piece of vector code have the same byte size
(2) lengths are measured in bytes rather than elements
 
For all cases, including SVE, the number of controls needed for a scalar
statement is equal to the number of vectors needed for that scalar
statement.
 
Because of (1), on current LOAD_LEN targets, the number of controls
needed for a scalar statement is also proportional to the total number
of bytes occupied by the vectors generated for that scalar statement.
And because of (2), the total number of bytes is the only thing that
matters, so all users of a particular control can use the same control
value.
 
E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
means 16 elements.  V16QI's nscalars_per_iter would therefore be double
V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
even out.
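
The arithmetic in that example can be checked with a few lines of code. This
is only a sketch with plain integers: vec_mode_model and the model_*
functions are made-up names, and the assumption that the factor equals the
element size in bytes is taken from the byte-counting example above, not
from the GCC sources.

```cpp
#include <cassert>

// Plain-integer model of a fixed-size vector mode (an assumption made for
// illustration; GCC itself uses machine_mode and poly_int here).
struct vec_mode_model
{
  unsigned int nunits;     // elements per vector
  unsigned int elt_bytes;  // bytes per element
};

// Scalars handled per iteration by NVECTORS vectors of mode M.
constexpr unsigned int
model_nscalars_per_iter (unsigned int nvectors, vec_mode_model m)
{
  return nvectors * m.nunits;
}

// In this byte-counting model the factor is the element size in bytes.
constexpr unsigned int
model_factor (vec_mode_model m)
{
  return m.elt_bytes;
}

constexpr vec_mode_model v16qi = {16, 1};
constexpr vec_mode_model v8hi = {8, 2};

// 2xV16QI: 32 scalars * factor 1; 2xV8HI: 16 scalars * factor 2.
static_assert (model_nscalars_per_iter (2, v16qi) * model_factor (v16qi)
               == model_nscalars_per_iter (2, v8hi) * model_factor (v8hi),
               "2xV16QI and 2xV8HI cover the same number of bytes");
```

Both products come to 32 bytes, which is why the two rgroups can share one
control value without adjustment on a byte-counting target.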
 
The code structurally supports targets that count in elements rather
than bytes, so that factor==1 for all element types.  See the
"rgl->factor == 1 && factor == 1" case in:
 
  if (rgl->max_nscalars_per_iter < nscalars_per_iter)
{
  /* For now, we only support cases in which all loads and stores fall back
to VnQI or none do.  */
  gcc_assert (!rgl->max_nscalars_per_iter
  || (rgl->factor == 1 && factor == 1)
  || (rgl->max_nscalars_per_iter * rgl->factor
  == nscalars_per_iter * factor));
  rgl->max_nscalars_per_iter = nscalars_per_iter;
  rgl->type = vectype;
  rgl->factor = factor;
}
 
But it hasn't been tested, since no current target uses it.
 
I think the above part of the patch shows that the current [...]

Re: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer

2023-05-19 Thread juzhe.zh...@rivai.ai
>> I don't think this is a property of decrementing IVs.  IIUC it's really
>> a property of rgl->factor == 1 && factor == 1, where factor would need
>> to be passed in by the caller.  Because of that, it should probably be
>> a separate patch.
Is it OK if I post this part of the code as a separate patch and then merge it?

>> That is, current LOAD_LEN targets have two properties (IIRC):
>> (1) all vectors used in a given piece of vector code have the same byte size
>> (2) lengths are measured in bytes rather than elements
>> For all cases, including SVE, the number of controls needed for a scalar
>> statement is equal to the number of vectors needed for that scalar
>> statement.
>> Because of (1), on current LOAD_LEN targets, the number of controls
>> needed for a scalar statement is also proportional to the total number
>> of bytes occupied by the vectors generated for that scalar statement.
>> And because of (2), the total number of bytes is the only thing that
>> matters, so all users of a particular control can use the same control
>> value.
>> E.g. on current LOAD_LEN targets, 2xV16QI and 2xV8HI would use the same
>> control (with no adjustment).  2xV16QI means 32 elements, while 2xV8HI
>> means 16 elements.  V16QI's nscalars_per_iter would therefore be double
>> V8HI's, but V8HI's factor would be double V16QI's (2 vs 1), so things
>> even out.
>> The code structurally supports targets that count in elements rather
>> than bytes, so that factor==1 for all element types.  See the
>> "rgl->factor == 1 && factor == 1" case in:
>>   if (rgl->max_nscalars_per_iter < nscalars_per_iter)
>>     {
>>       /* For now, we only support cases in which all loads and stores fall back
>>          to VnQI or none do.  */
>>       gcc_assert (!rgl->max_nscalars_per_iter
>>                   || (rgl->factor == 1 && factor == 1)
>>                   || (rgl->max_nscalars_per_iter * rgl->factor
>>                       == nscalars_per_iter * factor));
>>       rgl->max_nscalars_per_iter = nscalars_per_iter;
>>       rgl->type = vectype;
>>       rgl->factor = factor;
>>     }
>> But it hasn't been tested, since no current target uses it.
>> I think the above part of the patch shows that the current "factor is
>> always 1" path is in fact broken, and the patch is a correctness fix on
>> targets that measure in elements rather than bytes.
>> So I think the above part of the patch should go in ahead of the IV changes.
>> But the test should be based on factor rather than TYPE_VECTOR_SUBPARTS.
A length control measured in bytes instead of elements is not appropriate for RVV.
Do you mean that I can't support RVV auto-vectorization in the upstream GCC
middle-end and can only support it in my downstream? Is that right?


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-19 18:23
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V11] VECT: Add decrement IV support in Loop Vectorizer
Thanks for the update.  I'll split this review into two pieces.
Second piece to follow (not sure when, but hopefully soon).
 
juzhe.zh...@rivai.ai writes:
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index ed0166fedab..6f49bdee009 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -10364,12 +10375,14 @@ vect_record_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
> rgroup that operates on NVECTORS vectors, where 0 <= INDEX < NVECTORS.  */
>  
>  tree
> -vect_get_loop_len (loop_vec_info loop_vinfo, vec_loop_lens *lens,
> -unsigned int nvectors, unsigned int index)
> +vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
> +vec_loop_lens *lens, unsigned int nvectors, tree vectype,
> +unsigned int index)
>  {
>rgroup_controls *rgl = &(*lens)[nvectors - 1];
>bool use_bias_adjusted_len =
>  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) != 0;
> +  tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
>  
>/* Populate the rgroup's len array, if this is the first time we've
>   used it.  */
> @@ -10400,6 +10413,26 @@ vect_get_loop_len (loop_vec_info loop_vinfo, 
> vec_loop_lens *lens,
>  
>if (use_bias_adjusted_len)
>  return rgl->bias_adjusted_ctrl;
> +  else if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
> +{
> +  tree loop_len = rgl->controls[index];
> +  poly_int64 nunits1 = TYPE_VECTOR_SUBPARTS (rgl->type);
> +  poly_int64 nunits2 = TYPE_VECTOR_SUBPARTS (ve

Re: Re: [PATCH V5] RISC-V: Using merge approach to optimize repeating sequence in vec_init

2023-05-16 Thread juzhe.zh...@rivai.ai
>> Does it mean we assume inner_int_mode is DImode? (because sizeof (uint64_t))
>> or it should be something like `for (unsigned int i = 0; i <
>> (GET_MODE_SIZE(inner_int_mode ()) * 8 / npatterns ()); i++)` ?
No, sizeof (uint64_t) only refers to the type of the local variable: uint64_t mask = 0;

>> Do you mind adding more comments about this? What does it check and what does it do?
We check known_gt (GET_MODE_SIZE (dup_mode), BYTES_PER_RISCV_VECTOR) because we
use a vector integer mode to generate the mask. For example, to generate the
mask 0b01010101010101, we use a scalar register holding the value 0b010101010...,
then vmv.v.x it into a vector. That vector is then used as the mask.
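
As a rough illustration of the broadcast-then-merge idea, here is a standalone sketch that models the vector as a std::vector and the mask as a scalar bit pattern. It is not the backend code: init_repeating_pair is a hypothetical name, and the sketch assumes mask bit i controls lane i.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build {a, b, a, b, ...} with nelts lanes by broadcasting `a` and then
// merging `b` into the lanes selected by a scalar mask.
std::vector<int> init_repeating_pair(int a, int b, unsigned nelts) {
  assert(nelts <= 64);  // the mask is modelled as one 64-bit scalar

  std::vector<int> v(nelts, a);  // v = broadcast (a)

  // Scalar holding the repeating bit pattern; bit i set means lane i
  // takes `b`.  Here the odd lanes are selected.
  uint64_t mask = 0;
  for (unsigned i = 1; i < nelts; i += 2)
    mask |= uint64_t{1} << i;

  // v = merge (v, b, mask)
  for (unsigned i = 0; i < nelts; ++i)
    if ((mask >> i) & 1)
      v[i] = b;
  return v;
}
```

The point of the approach is that the mask is produced once as an integer pattern, so the number of steps does not grow with the number of elements the way per-element inserts would.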

>> Why this only hide in else? I guess I have this question is because I
>> don't fully understand the logic of the if condition?

Because we can't use a vector floating-point instruction to generate a mask.

>> nit: builder.inner_mode () rather than GET_MODE_INNER (dup_mode)?

They are the same. I can change it using GET_MODE_INNER

>> And I would like to have more comments explaining why we need force_reg here.
Because without it we get an ICE.




juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-05-17 11:21
To: juzhe.zhong
CC: gcc-patches; palmer; jeffreyalaw
Subject: Re: [PATCH V5] RISC-V: Using merge approach to optimize repeating 
sequence in vec_init
> +
> +/* Get the mask for merge approach.
> +
> + Consider such following case:
> +   {a, b, a, b, a, b, a, b, a, b, a, b, a, b, a, b}
> + To merge "a", the mask should be 1010
> + To merge "b", the mask should be 0101
> +*/
> +rtx
> +rvv_builder::get_merge_mask_bitfield (unsigned int index) const
> +{
> +  uint64_t base_mask = (1ULL << index);
> +  uint64_t mask = 0;
> +  for (unsigned int i = 0; i < (sizeof (uint64_t) * 8 / npatterns ()); i++)
> +mask |= base_mask << (i * npatterns ());
> +  return gen_int_mode (mask, inner_int_mode ());
 
Does it mean we assume inner_int_mode is DImode? (because sizeof (uint64_t))
or it should be something like `for (unsigned int i = 0; i <
(GET_MODE_SIZE(inner_int_mode ()) * 8 / npatterns ()); i++)` ?
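
To show what bounding the loop by the mode size would look like, here is a standalone sketch of the same mask computation with the width passed in explicitly. It is only a model: merge_mask_bitfield and the width_bits parameter are hypothetical, and a plain uint64_t stands in for the rtx constant.

```cpp
#include <cassert>
#include <cstdint>

// Repeat a single set bit at position `index` every `npatterns` bits,
// filling `width_bits` bits of the result.
uint64_t merge_mask_bitfield(unsigned index, unsigned npatterns,
                             unsigned width_bits) {
  assert(width_bits > 0 && width_bits <= 64);
  assert(npatterns > 0 && npatterns <= width_bits);
  assert(index < npatterns);

  uint64_t base_mask = uint64_t{1} << index;
  uint64_t mask = 0;
  // The largest shift is below width_bits, so it never reaches 64.
  for (unsigned i = 0; i < width_bits / npatterns; ++i)
    mask |= base_mask << (i * npatterns);
  return mask;
}
```

With width_bits == 64 this matches the loop bound in the quoted patch. A smaller width yields the same low bits and leaves the upper bits clear.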
 
> +}
> +
>  /* Subroutine of riscv_vector_expand_vector_init.
> Works as follows:
> (a) Initialize TARGET by broadcasting element NELTS_REQD - 1 of BUILDER.
> @@ -1226,6 +1307,107 @@ expand_vector_init_insert_elems (rtx target, const 
> rvv_builder &builder,
>  }
>  }
>
> +/* Emit vmv.s.x instruction.  */
> +
> +static void
> +emit_scalar_move_op (rtx dest, rtx src, machine_mode mask_mode)
> +{
> +  insn_expander<8> e;
> +  machine_mode mode = GET_MODE (dest);
> +  rtx scalar_move_mask = gen_scalar_move_mask (mask_mode);
> +  e.set_dest_and_mask (scalar_move_mask, dest, mask_mode);
> +  e.add_input_operand (src, GET_MODE_INNER (mode));
> +  e.set_len_and_policy (const1_rtx, false);
> +  e.expand (code_for_pred_broadcast (mode), false);
> +}
> +
> +/* Emit merge instruction.  */
> +
> +static void
> +emit_merge_op (rtx dest, rtx src1, rtx src2, rtx mask)
> +{
> +  insn_expander<8> e;
> +  machine_mode mode = GET_MODE (dest);
> +  e.set_dest_merge (dest);
> +  e.add_input_operand (src1, mode);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +e.add_input_operand (src2, mode);
> +  else
> +e.add_input_operand (src2, GET_MODE_INNER (mode));
> +
> +  e.add_input_operand (mask, GET_MODE (mask));
> +  e.set_len_and_policy (NULL_RTX, true, true, false);
> +  if (VECTOR_MODE_P (GET_MODE (src2)))
> +e.expand (code_for_pred_merge (mode), false);
> +  else
> +e.expand (code_for_pred_merge_scalar (mode), false);
> +}
> +
> +/* Use merge approach to initialize the vector with repeating sequence.
> + v = {a, b, a, b, a, b, a, b}.
> +
> + v = broadcast (a).
> + mask = 0b01010101
> + v = merge (v, b, mask)
> +*/
> +static void
> +expand_vector_init_merge_repeating_sequence (rtx target,
> +const rvv_builder &builder)
> +{
> +  machine_mode mask_mode = get_mask_mode (builder.mode ()).require ();
> +  machine_mode dup_mode = builder.mode ();
> +  if (known_gt (GET_MODE_SIZE (dup_mode), BYTES_PER_RISCV_VECTOR))
> +{
> +  poly_uint64 nunits
> +   = exact_div (BYTES_PER_RISCV_VECTOR, builder.inner_units ());
> +  dup_mode = get_vector_mode (builder.inner_int_mode (), nunits).require 
> ();
> +}
 
Do you mind adding more comments about this? What does it check and what does it do?
 
> +  else
> +{
> +  if (FLOAT_MODE_P (dup_mode))
> +   {
> + poly_uint64 nunits = GET_MODE_NUNITS (dup_mode);
> + dup_mode
> +   = get_vector_mode (builder.inner_int_mode (), nuni

Re: Re: RISC-V Test Errors and Failures

2023-05-16 Thread juzhe.zh...@rivai.ai
Oh, I see. Kito has added /* { dg-do run { target { riscv_vector } } } */
But not all RVV tests use this, and I am not sure whether it works.
I think Kito can answer that.
If it does, I think we should add it to all of them.

Thanks.


juzhe.zh...@rivai.ai
 
From: Andrew Pinski
Date: 2023-05-17 10:02
To: juzhe.zh...@rivai.ai
CC: gcc-patches; palmer; Kito.cheng
Subject: Re: RISC-V Test Errors and Failures
On Tue, May 16, 2023 at 6:58 PM juzhe.zh...@rivai.ai
 wrote:
>
> Hi, Palmer.
> I saw your patch showed there are a lot of run-time failures (execution
> failures) in C++.
> bug-*.C
>
> These are RVV API intrinsics tests that came from Kito, and I have
> already fixed all of them.
> I just double-checked again and they all pass.
> I think your regression environment may not set up the simulator (QEMU
> or SPIKE or GEM5) correctly.
> For example, the vector extension may not be enabled in the simulator.
> I don't know; you may want to try that.
 
So on x86_64, we test to see if you have the right vector unit before
running those tests? The same thing was true on powerpc (and I think
aarch64 does the same for SVE now too). The reason why I am asking is
that I would need to run the testsuite using the simulator as set up
for the RISC-V ISA I am using rather than the one with everything on.
So does the RVV runtime testsuite test whether you can run RVV
before running the tests (or run them and report that they passed)?
 
Thanks,
Andrew Pinski
 
>
> Thanks.
>
>
> juzhe.zh...@rivai.ai
 


RISC-V Test Errors and Failures

2023-05-16 Thread juzhe.zh...@rivai.ai
Hi, Palmer.
I saw your patch showed there are a lot of run-time failures (execution
failures) in C++.
bug-*.C

These are RVV API intrinsics tests that came from Kito, and I have already
fixed all of them.
I just double-checked again and they all pass.
I think your regression environment may not set up the simulator (QEMU
or SPIKE or GEM5) correctly.
For example, the vector extension may not be enabled in the simulator.
I don't know; you may want to try that.

Thanks.


juzhe.zh...@rivai.ai


Re: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-16 Thread juzhe.zh...@rivai.ai
Hi, Richard. Forget about the V10 patch. Just go directly to the V11 patch.
I am so sorry that I sent V10; I originally did not notice that Case 2 and
Case 3 are exactly the same.
I apologize for that. I have reviewed the V11 patch twice, and it seems
much more reasonable and easier to understand than before.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 16:30
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
> RVV infrastructure in RISC-V backend status:
> 1. All RVV instruction patterns related to intrinsics are finished (they
> will be used not only by intrinsics but also by autovec in the future).
> 2. In the case of autovec, we have finished len_load/len_store (they are
> used temporarily and will be removed after I support
> len_mask_load/len_mask_store in the middle-end),
>    binary integer autovec patterns,
>    and the vec_init pattern.
>    That's all we have so far.
 
Thanks.
 
> As for testing this patch, I have multiple rgroup testcases locally;
> do you mean you want me to post them together with this patch?
> Since I am going to put them in the RISC-V backend testsuite, I was planning
> to post them after this patch is finished and merged into trunk.
> What do you suggest?
 
It would be useful to include the tests with the patch itself (as a patch
to the testsuite).  It doesn't matter that the tests are riscv-specific.
 
Obviously it would be more appropriate for the riscv maintainers to
review the riscv tests.  But keeping the tests with the patch helps when
reviewing the code, and also ensures that code isn't committed and then
never tested.
 
Richard
 


Re: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-16 Thread juzhe.zh...@rivai.ai
Hi, Richard and Richi.
I am so sorry for sending you garbage patches (my mistake, I sent RISC-V
patches to you).

I finally realized that Case 2 and Case 3 are exactly the same sequence!
I have combined them into a single function called "vect_adjust_loop_lens_control".

I have sent the V11 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618724.html 

I think this patch is reasonable now!
Could you take a look at it?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 16:30
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
> RVV infrastructure in RISC-V backend status:
> 1. All RVV instruction patterns related to intrinsics are finished (they
> will be used not only by intrinsics but also by autovec in the future).
> 2. In the case of autovec, we have finished len_load/len_store (they are
> used temporarily and will be removed after I support
> len_mask_load/len_mask_store in the middle-end),
>    binary integer autovec patterns,
>    and the vec_init pattern.
>    That's all we have so far.
 
Thanks.
 
> As for testing this patch, I have multiple rgroup testcases locally;
> do you mean you want me to post them together with this patch?
> Since I am going to put them in the RISC-V backend testsuite, I was planning
> to post them after this patch is finished and merged into trunk.
> What do you suggest?
 
It would be useful to include the tests with the patch itself (as a patch
to the testsuite).  It doesn't matter that the tests are riscv-specific.
 
Obviously it would be more appropriate for the riscv maintainers to
review the riscv tests.  But keeping the tests with the patch helps when
reviewing the code, and also ensures that code isn't committed and then
never tested.
 
Richard
 


Re: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer

2023-05-16 Thread juzhe.zh...@rivai.ai
Hi, Richard.
I have sent V10:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618718.html 

I can't combine the implementations of Case 2 and Case 3. In Case 2, all
controls (len) come from the same rgc,
but in Case 3 each control (len) comes from a different rgc.
Can you help me with that?
Also, I have appended my testcases to this patch.
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-16 16:30
To: juzhe.zhong@rivai.ai
CC: gcc-patches; rguenther
Subject: Re: [PATCH V7] VECT: Add decrement IV support in Loop Vectorizer
"juzhe.zh...@rivai.ai"  writes:
> Hi, Richard.
>
> RVV infrastructure in RISC-V backend status:
> 1. All RVV instruction patterns related to intrinsics are finished (they
> will be used not only by intrinsics but also by autovec in the future).
> 2. In the case of autovec, we have finished len_load/len_store (they are
> used temporarily and will be removed after I support
> len_mask_load/len_mask_store in the middle-end),
>    binary integer autovec patterns,
>    and the vec_init pattern.
>    That's all we have so far.
 
Thanks.
 
> As for testing this patch, I have multiple rgroup testcases locally;
> do you mean you want me to post them together with this patch?
> Since I am going to put them in the RISC-V backend testsuite, I was planning
> to post them after this patch is finished and merged into trunk.
> What do you suggest?
 
It would be useful to include the tests with the patch itself (as a patch
to the testsuite).  It doesn't matter that the tests are riscv-specific.
 
Obviously it would be more appropriate for the riscv maintainers to
review the riscv tests.  But keeping the tests with the patch helps when
reviewing the code, and also ensures that code isn't committed and then
never tested.
 
Richard
 

