[PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-19 Thread juzhe . zhong
From: Ju-Zhe Zhong 

Hi, Richard and Richi.

This patch adds support for floating-point in-order reduction with length-based loop control.

Consider the following case:

float foo (float *__restrict a, int n)
{
  float result = 1.0;
  for (int i = 0; i < n; i++)
   result += a[i];
  return result;
}

When compiled **without** -ffast-math on ARM SVE, we end up with:

loop_mask = WHILE_ULT
result = MASK_FOLD_LEFT_PLUS (...loop_mask...)
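
To illustrate why the reduction has to stay in order without -ffast-math, here is a minimal C sketch (not part of the patch). The function names sum_in_order and sum_two_lanes are hypothetical, and the two-lane split is only an assumed stand-in for what a reassociating vectorizer might do. The accumulators are volatile so that each partial sum is rounded to float. Unlike foo above, the sketch starts from 0.0 for simplicity.

```c
#include <stddef.h>

/* Strict left-to-right sum: the order a fold-left reduction must preserve.  */
float sum_in_order (const float *a, size_t n)
{
  volatile float acc = 0.0f;
  for (size_t i = 0; i < n; i++)
    acc = acc + a[i];
  return acc;
}

/* Hypothetical reassociated sum: even and odd elements are accumulated
   in separate lanes and combined at the end.  */
float sum_two_lanes (const float *a, size_t n)
{
  volatile float lane0 = 0.0f;
  volatile float lane1 = 0.0f;
  size_t i = 0;
  for (; i + 1 < n; i += 2)
    {
      lane0 = lane0 + a[i];
      lane1 = lane1 + a[i + 1];
    }
  if (i < n)
    lane0 = lane0 + a[i];
  return lane0 + lane1;
}
```

For the input { 1e8f, 1.0f, -1e8f, 1.0f } the in-order sum is 1.0 (the first 1.0f is absorbed by 1e8f), while the two-lane sum is 2.0. The two results differ, so the vectorizer must not reorder the additions unless reassociation is allowed.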

For RVV, we use length-based loop control instead of a mask.

So, with this patch, we expect to see:

loop_len = SELECT_VL
result = MASK_LEN_FOLD_LEFT_PLUS (...loop_len...)
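
As a rough scalar model of the two statements above (a sketch, not the GCC implementation), the following C code emulates a length-controlled in-order reduction. VF, select_vl, mask_len_fold_left_plus and foo_len are hypothetical names chosen for illustration. The operand order (accumulator, vector, mask, len, bias) follows the 5-operand call built in this patch, and the assumption is that element i takes part when i < len + bias and mask[i] is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed fixed vector width for this scalar model.  */
#define VF 4

/* Hypothetical stand-in for SELECT_VL: number of elements to process
   in this iteration, at most VF.  */
static size_t select_vl (size_t remaining)
{
  return remaining < VF ? remaining : VF;
}

/* Scalar model of MASK_LEN_FOLD_LEFT_PLUS: add the active elements of VEC
   to ACC strictly from left to right.  Element i is active when
   i < LEN + BIAS and MASK[i] is true (an assumption of this sketch).  */
static float
mask_len_fold_left_plus (float acc, const float *vec, const bool *mask,
                         size_t len, int bias)
{
  ptrdiff_t active = (ptrdiff_t) len + bias;
  for (ptrdiff_t i = 0; i < active && i < VF; i++)
    if (mask[i])
      acc += vec[i];
  return acc;
}

/* Length-controlled version of foo: the mask is all-true (as the patch
   builds with build_minus_one_cst) and the tail is handled by LEN.  */
float foo_len (const float *a, size_t n)
{
  static const bool all_true[VF] = { true, true, true, true };
  float result = 1.0f;
  for (size_t i = 0; i < n; )
    {
      size_t len = select_vl (n - i);
      result = mask_len_fold_left_plus (result, a + i, all_true, len, 0);
      i += len;
    }
  return result;
}
```

Because each chunk is folded from left to right into the same accumulator, foo_len performs the additions in the same order as the scalar loop in foo.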

gcc/ChangeLog:

* tree-vect-loop.cc (get_masked_reduction_fn): Add mask_len_fold_left.
(vectorize_fold_left_reduction): Ditto.
(vectorizable_reduction): Ditto.
(vect_transform_reduction): Ditto.

---
 gcc/tree-vect-loop.cc | 41 -
 1 file changed, 36 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index b44fb9c7712..59ab7879d55 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6800,11 +6800,13 @@ static internal_fn
 get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
 {
   internal_fn mask_reduc_fn;
+  internal_fn mask_len_reduc_fn;
 
   switch (reduc_fn)
 {
 case IFN_FOLD_LEFT_PLUS:
   mask_reduc_fn = IFN_MASK_FOLD_LEFT_PLUS;
+  mask_len_reduc_fn = IFN_MASK_LEN_FOLD_LEFT_PLUS;
   break;
 
 default:
@@ -6814,6 +6816,9 @@ get_masked_reduction_fn (internal_fn reduc_fn, tree vectype_in)
   if (direct_internal_fn_supported_p (mask_reduc_fn, vectype_in,
  OPTIMIZE_FOR_SPEED))
 return mask_reduc_fn;
+  if (direct_internal_fn_supported_p (mask_len_reduc_fn, vectype_in,
+ OPTIMIZE_FOR_SPEED))
+return mask_len_reduc_fn;
   return IFN_LAST;
 }
 
@@ -6834,7 +6839,8 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
   gimple *reduc_def_stmt,
   tree_code code, internal_fn reduc_fn,
   tree ops[3], tree vectype_in,
-  int reduc_index, vec_loop_masks *masks)
+  int reduc_index, vec_loop_masks *masks,
+  vec_loop_lens *lens)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree vectype_out = STMT_VINFO_VECTYPE (stmt_info);
@@ -6896,8 +6902,18 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 {
   gimple *new_stmt;
   tree mask = NULL_TREE;
+  tree len = NULL_TREE;
+  tree bias = NULL_TREE;
   if (LOOP_VINFO_FULLY_MASKED_P (loop_vinfo))
mask = vect_get_loop_mask (loop_vinfo, gsi, masks, vec_num, vectype_in, i);
+  if (LOOP_VINFO_FULLY_WITH_LENGTH_P (loop_vinfo))
+   {
+ len = vect_get_loop_len (loop_vinfo, gsi, lens, vec_num, vectype_in,
+  i, 1);
+ signed char biasval = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
+ bias = build_int_cst (intQI_type_node, biasval);
+ mask = build_minus_one_cst (truth_type_for (vectype_in));
+   }
 
   /* Handle MINUS by adding the negative.  */
   if (reduc_fn != IFN_LAST && code == MINUS_EXPR)
@@ -6917,7 +6933,10 @@ vectorize_fold_left_reduction (loop_vec_info loop_vinfo,
 the preceding operation.  */
   if (reduc_fn != IFN_LAST || (mask && mask_reduc_fn != IFN_LAST))
{
- if (mask && mask_reduc_fn != IFN_LAST)
+ if (len && mask && mask_reduc_fn != IFN_LAST)
+   new_stmt = gimple_build_call_internal (mask_reduc_fn, 5, reduc_var,
+  def0, mask, len, bias);
+ else if (mask && mask_reduc_fn != IFN_LAST)
new_stmt = gimple_build_call_internal (mask_reduc_fn, 3, reduc_var,
   def0, mask);
  else
@@ -7979,6 +7998,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
   else if (loop_vinfo && LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo))
 {
   vec_loop_masks *masks = &LOOP_VINFO_MASKS (loop_vinfo);
+  vec_loop_lens *lens = &LOOP_VINFO_LENS (loop_vinfo);
   internal_fn cond_fn = get_conditional_internal_fn (op.code, op.type);
 
   if (reduction_type != FOLD_LEFT_REDUCTION
@@ -8006,8 +8026,17 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
  LOOP_VINFO_CAN_USE_PARTIAL_VECTORS_P (loop_vinfo) = false;
}
   else
-   vect_record_loop_mask (loop_vinfo, masks, ncopies * vec_num,
-  vectype_in, NULL);
+   {
+ internal_fn mask_reduc_fn
+   = get_masked_reduction_fn (reduc_fn, vectype_in);
+
+ if (mask_reduc_fn == IFN_MASK_LEN_FOLD_LEFT_PLUS)
+   vect_record_loop_len (loop_vinfo, lens, ncopies * vec_num,
+ vectype_in, 1);
+ else
+   ve

Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-19 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

I just noticed that we recently started calling things MASK_LEN
(instead of LEN_MASK before) with the reductions.  Wouldn't we want
to be consistent here?  Especially as the length takes precedence.
I realize the preparational work like optabs is already upstream
but still wanted to bring it up.

Regards
 Robin


Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-20 Thread Richard Biener via Gcc-patches
On Thu, 20 Jul 2023, Robin Dapp wrote:

> Hi Juzhe,
> 
> I just noticed that we recently started calling things MASK_LEN
> (instead of LEN_MASK before) with the reductions.  Wouldn't we want
> to be consistent here?  Especially as the length takes precedence.
> I realize the preparational work like optabs is already upstream
> but still wanted to bring it up.

Didn't notice that but yes, consistency would be nice to have.

Richard.


Re: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-19 Thread juzhe.zh...@rivai.ai
Yes, we want to be consistent. I am planning to rename all previous
"LEN_MASK_*" to "MASK_LEN_*".



juzhe.zh...@rivai.ai
 
From: Robin Dapp
Date: 2023-07-20 14:22
To: juzhe.zhong; gcc-patches
CC: rdapp.gcc; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control
Hi Juzhe,
 
I just noticed that we recently started calling things MASK_LEN
(instead of LEN_MASK before) with the reductions.  Wouldn't we want
to be consistent here?  Especially as the length takes precedence.
I realize the preparational work like optabs is already upstream
but still wanted to bring it up.
 
Regards
Robin
 


Re: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-20 Thread juzhe.zh...@rivai.ai
Hi, Richard.

I plan to change all LEN_MASK into MASK_LEN.

Starting with LEN_MASK_GATHER_LOAD/LEN_MASK_SCATTER_STORE,
we noticed that keeping the mask operand in the same position as in the
original mask_* patterns makes the code cleaner and easier to maintain.

Thanks


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-07-20 15:21
To: Robin Dapp
CC: juzhe.zhong; gcc-patches; richard.sandiford
Subject: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control
On Thu, 20 Jul 2023, Robin Dapp wrote:
 
> Hi Juzhe,
> 
> I just noticed that we recently started calling things MASK_LEN
> (instead of LEN_MASK before) with the reductions.  Wouldn't we want
> to be consistent here?  Especially as the length takes precedence.
> I realize the preparational work like optabs is already upstream
> but still wanted to bring it up.
 
Didn't notice that but yes, consistency would be nice to have.
 
Richard.
 


Re: Re: [PATCH] VECT: Support floating-point in-order reduction for length loop control

2023-07-21 Thread juzhe.zh...@rivai.ai
Hi, all. After the previous cleanup patches,
everything related to "mask && len" is now consistent.

I have rebased onto trunk and sent a V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-July/625159.html 

Thanks.


juzhe.zh...@rivai.ai
 