Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-23 Thread juzhe.zh...@rivai.ai
I have reorder the functions so that we won't mess up deleted functions and new 
functions.
V2 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628237.html 

>> Why need this exception?

Because we have this piece code here for fusion in "EMPTY" block:
new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index);
The expr may not have a reall avl source which is considered as incompatible.
However, in this case, we should skip the compatible check, just use merge to 
compute demand info.

>>Make sure I understand this correctly: it's worth if thoe edges has
>>different probability?
>>If all probability is same, then it's not worth?
The probability is supposed to help for picking the optimal VSETVL info for 
incompatible demand infos.

Consider this following case:

void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, 
size_t cond2)
{
  for (size_t i = 0; i < n; i++)
{
  if (i== cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
*(vint8mf8_t*)(out + i + 100) = v;
  } else {
vbool1_t v = *(vbool1_t*)(in + i + 400);
*(vbool1_t*)(out + i + 400) = v;
  }
}
}

Both VSETVLs are incompatible since one want e8mf8, the other wants e8m8.
For if (i == cond) is very low probability (It could only be accessed 0 times 
or once)
We want to hoist the e8m8 to get optimal codegen like this:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
vsetvli a4,zero,e8,m8,ta,ma
.L5:
beq a3,a5,.L12
vlm.v v1,0(a0)
vsm.v v1,0(a1)
.L4:
addi a5,a5,1
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a6,a1,-1200
addi t1,a0,-1200
vle8.v v1,0(t1)
vse8.v v1,0(a6)
vsetvli a4,zero,e8,m8,ta,ma
j .L4


Wheras the other case is like this:
void f (int32_t * restrict in, int32_t * restrict out, size_t n, size_t cond, 
size_t cond2)
{
  for (size_t i = 0; i < n; i++)
{
  if (i > cond) {
vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
*(vint8mf8_t*)(out + i + 100) = v;
  } else {
vbool1_t v = *(vbool1_t*)(in + i + 400);
*(vbool1_t*)(out + i + 400) = v;
  }
}
}

Both condition probabilities are equal, so we don't want to take any of them as 
higher priority,
so the codegen should be:
f:
beq a2,zero,.L10
addi a0,a0,1600
addi a1,a1,1600
li a5,0
j .L5
.L12:
vsetvli a7,zero,e8,mf8,ta,ma
addi a5,a5,1
vle8.v v1,0(a6)
vse8.v v1,0(a4)
addi a0,a0,4
addi a1,a1,4
beq a2,a5,.L10
.L5:
addi a4,a1,-1200
addi a6,a0,-1200
bltu a3,a5,.L12
vsetvli t1,zero,e8,m8,ta,ma
addi a5,a5,1
vlm.v v1,0(a0)
vsm.v v1,0(a1)
addi a0,a0,4
addi a1,a1,4
bne a2,a5,.L5
.L10:
ret



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &am

Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread juzhe.zh...@rivai.ai
>> This seems relax the compatiblitly check to allow optimize more case,
>> if so this should be a sperated patch.
This is not a optimization fix, It's an bug fix.

Since fusion for these 2 demands:
1. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW).
2. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW) and 
demand RATIO.

The new fusion demand should include RATIO demand but it didn't before. It's an 
bug.
It's lucky that previous tests didn't expose such bug before refactor.
But such bug is exposed after refactor.

I committed it with a separate patch.
Thanks.


juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>  gcc_assert (this->compatible_p (merge_info)
> && "Can't merge incompatible demanded infos");
 
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) 
> const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>  {
> -  const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -  if (!pred_block_info.local_dem.valid_or_dirty_p ()
> - && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +  if (prob == profile_probability::uninitialized ())
> +   prob = vector_block_infos[e->dest->index].probability;
> +  else if (prob == vector_block_infos[e->dest->index].probability)
> continue;
> -  return false;
> +  else
> +   return true;
 
Make sure I understand this correctly: it's worth if thoe edges has
different probability?
 
>  }
> -  return true;
> +  return false;
 
If all probability is same, then it's not worth?
 
Plz add few comments no matter my understand is right or not :)
 
>  }
>
>  bool
 
> @@ -2428,12 +2377,12 @@ vector_infos_m

Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread 钟居哲
>> I saw you has update serveral testcase, why update instead of add new 
>> testcase??
Since original testcase failed after this patch.

>> could you say more about why some testcase added __riscv_vadd_vv_i8mf8
>> or add some more dependency of vl variable?
These are 2 separate questions.

1. Why some testcase added __riscv_vadd_vv_i8mf8.
This is because the original testcase is too fragile and easily fail.
 Consider this following case:

  for (...)
 if (cond)
   vsetvl e8mf8
   load
   store
 else
   vsetvl e16mf4
   load
   store
This example, we know that both "e8mf8" and "e16mf4" are compatible, so we can 
either put a vsevl e8mf8 or vsetvli e16mf4 before the
for...loop and elide all vsetvlis inside the loop. 
Before this patch, the codegen result is vsetvli e8mf8, after this patch, the 
codegen result is vsetvli e16mf4.
They are both legal and optimal codegen.

To avoid future potential unnecessary test report failure, I added "vadd" which 
demand both SEW and LMUL and only allow e8mf8.
Such testcase doesn't change our testing goal, since our goal of this testcase 
is to test LCM ability of fusing VSETVL and compute the 
optimal location of vsetvl.

2. Why add some more dependency of vl variable ?
Well, as I told you previously.
HARD_EMPTY and DIRTY_WITH_KILLED_AVL is supposed to optimize this following
case:

li a6, 101.
vsetvli e8mf8
for ...
li a5,101
vsetvli e16mf4
for ...

This case happens since we set "li" cost too low that previous pass failed to 
optimized them.
I don't think we should optimize such corner case in VSETVL PASS which 
complicates the implementation seriously and 
mess up the code quality.

So after I remove them, the codegen for such case will generate one more 
"vsetvli" (only one more dynamic run-time instruction count).
I note if we make all "li" inside a loop, the issue will be gone and VSETVL 
PASS can achieve optimal codegen.

To fix this failure of such testcases, instead of "vl= 101", I make them "vl = 
a + 101", then the assembly check remain and pass.

Thanks.



juzhe.zh...@rivai.ai
 
From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally
 
So I will try my best to review this closely to make it more close to
the perfect :)
 
I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?
 
 
 
> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}
 
This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.
 
>return true;
>  }
 
 
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))
 
I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?
 
>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);
 
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)
 
Why need this exception?
 
>  gcc_assert (this->compatible_p (merge_info)
>

Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-22 Thread Kito Cheng via Gcc-patches
It's really great improvement, it's drop some state like HARD_EMPTY
and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to
understand!
also this also fundamentally improved the phase 3, although one
concern is the time complexity might be come more higher order,
(and it's already high enough in fact.)
but mostly those vectorized code are only appeard within the inner
most loop, so that is acceptable in generally

So I will try my best to review this closely to make it more close to
the perfect :)

I saw you has update serveral testcase, why update instead of add new testcase??
could you say more about why some testcase added __riscv_vadd_vv_i8mf8
or add some more dependency of vl variable?



> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
> const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +{
> +  if (info2.demand_p (DEMAND_GE_SEW))
> +   return info1.get_sew () < info2.get_sew ();
> +  else if (!info2.demand_p (DEMAND_SEW))
> +   return false;
> +}

This seems relax the compatiblitly check to allow optimize more case,
if so this should be a sperated patch.

>return true;
>  }


> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>  return;
>if (optimize == 0 && !has_vtype_op (rinsn))
>  return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))

I didn't get this change, could you explan few more about that? it was
early exit for non vsetvl insn, but now it allowed that now?

>  return;
>m_state = VALID;
>extract_insn_cached (rinsn);

> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const 
> vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -enum merge_type type) const
> +enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)

Why need this exception?

>  gcc_assert (this->compatible_p (merge_info)
> && "Can't merge incompatible demanded infos");

> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) 
> const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>  {
> -  const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -  if (!pred_block_info.local_dem.valid_or_dirty_p ()
> - && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +  if (prob == profile_probability::uninitialized ())
> +   prob = vector_block_infos[e->dest->index].probability;
> +  else if (prob == vector_block_infos[e->dest->index].probability)
> continue;
> -  return false;
> +  else
> +   return true;

Make sure I understand this correctly: it's worth if thoe edges has
different probability?

>  }
> -  return true;
> +  return false;

If all probability is same, then it's not worth?

Plz add few comments no matter my understand is right or not :)

>  }
>
>  bool

> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap 
> bitdata) const
>sbitmap_iterator sbi;
>
>EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -if (ratio == -1)
> -  ratio = vector_exprs[bb_index]->get_ratio ();
> -else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -  return false;
> -  }
> +{
> +  if (ratio == -1)
> +   ratio = vector_exprs[bb_index]->get_ratio ();
> +  else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +   return false;
> +}
>return true;
>  }

Split this into a NFC patch, you can commit that without asking review.

> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, 
> insn_info *insn,
> ] UNSPEC_VPREDICATE)
> (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
> (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) 
> "rvv.c":8:12
> -2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +"rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} 
> (expr_list:REG_EQUIV
>  (mem/c:RVVM4DI (reg:DI 10 a0 [142]) 

Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-21 Thread Kito Cheng via Gcc-patches
I think I could do some details review tomorrow on the plane, I am free
from the meeting hell tomorrow :p



Robin Dapp via Gcc-patches  於 2023年8月21日 週一 23:24
寫道:

> Hi Juzhe,
>
> thanks, this is a reasonable approach and improves readability noticeably.
> LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
> looked closely into the vsetvl pass before and cannot entirely review it
> quickly.  As we already have good test coverage there is not much that
> can go wrong IMHO.
>
> Regards
>  Robin
>


Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-21 Thread Robin Dapp via Gcc-patches
Hi Juzhe,

thanks, this is a reasonable approach and improves readability noticeably.
LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
looked closely into the vsetvl pass before and cannot entirely review it
quickly.  As we already have good test coverage there is not much that
can go wrong IMHO.

Regards
 Robin


[PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS

2023-08-20 Thread Juzhe-Zhong
This patch refactors the Phase 3 (Demand fusion) and rename it into Earliest 
fusion.
I do the refactor for the following reasons:
  
  1. Current implementation of phase 3 is doing too many things which makes the 
code quality
 quite messy and not easy to maintain.
  2. The demand fusion I do previously is we explicitly make the fusion 
including how to fuse
 VSETVLs, where to make the VSETVL fusion happens, check the VSETVL fusion 
point (location)
 whether it is correct and optimal...etc.

 We are dong these things too much so I added these following functions:

enum fusion_type get_backward_fusion_type (const bb_info *,
 const vector_insn_info &);
bool hard_empty_block_p (const bb_info *, const vector_insn_info &) 
const;
bool backward_demand_fusion (void);
bool forward_demand_fusion (void);
bool cleanup_illegal_dirty_blocks (void);

 to make sure the VSETV fusion is optimal and correct. I found in may 
downstream testing it is
 not the reliable and optimal approach.

 Instead, this patch is to use 'compute_earliest' which is the function of 
LCM to fuse multiple
 'compatible' VSETVL demand info if they are having same earliest edge.  We 
let LCM decide almost
 everything of demand fusion for us. The only thing we do (Not the LCM do) 
is just checking the
 VSETVLs demand info are compatible or not. That's all we need to do.
 I belive such approach is much more reliable and optimal than before (We 
have many testcases already to check this refactor patch).
  3. Using LCM approach to do the demand fusion is more reliable and better CFG 
than before.
  ...

Here is the basics of this patch approach:

Consider this following case:

for
  for 
for
  ...
 for
   if (...)
 VSETVL 1 demand: RATIO = 32 and TU policy.
   else if (...)
 VSETVL 2 demand: SEW = 16.
   else
 VSETVL 3 demand: MU policy.

   - 'compute_earliest' which output the earliest edge of VSETVL 1, VSETVL 2 
and VSETVL 3.
 They are having same earliest edge which is outside the 1th inner-most 
loop.
   
   - Then, we check these 3 VSETVL demand info are compatible so fuse them into 
a single VSETVL info:
 demand SEW = 16, LMUL = MF2, TU, MU.
   
   - Then the later phase (phase 4) LCM PRE (partial reduandancy elimination) 
will hoist such VSETVL
 to the outer-most loop. So that we can get optimal codegen.

This patch is depending on: 
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627948.html

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New 
function.
(find_reg_killed_by): Delete.
(after_or_same_p): New function.
(has_vsetvl_killed_avl_p):Delete.
(anticipatable_occurrence_p): Adapt function.
(get_same_bb_set): Delete.
(any_set_in_bb_p): Ditto.
(change_insn): Format.
(ge_sew_ratio_unavailable_p): Fix bug.
(backward_propagate_worthwhile_p): Delete.
(vector_insn_info::parse_insn): Adapt function.
(vector_insn_info::merge): Ditto.
(vector_insn_info::dump): Ditto.
(vector_infos_manager::vector_infos_manager): Refactor Phase 3.
(vector_infos_manager::all_empty_predecessor_p): Delete.
(vector_infos_manager::all_same_ratio_p): Refactor Phase 3.
(vector_infos_manager::all_same_avl_p): Ditto.
(vector_infos_manager::create_bitmap_vectors): Ditto.
(vector_infos_manager::free_bitmap_vectors): Ditto.
(vector_infos_manager::dump): Ditto.
(pass_vsetvl::update_block_info): New function.
(enum fusion_type): Refactor Phase 3.
(pass_vsetvl::get_backward_fusion_type): Delete.
(demands_can_be_fused_p): New function.
(pass_vsetvl::hard_empty_block_p): Delete.
(earliest_pred_can_be_fused_p): New function.
(pass_vsetvl::backward_demand_fusion): Delete.
(pass_vsetvl::earliest_fusion): New function.
(pass_vsetvl::forward_demand_fusion): Delete.
(pass_vsetvl::demand_fusion): Ditto.
(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
(pass_vsetvl::compute_local_properties): Adapt function.
(pass_vsetvl::refine_vsetvls): Ditto.
(pass_vsetvl::cleanup_vsetvls): Ditto.
(pass_vsetvl::commit_vsetvls): Ditto.
(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
(get_first_vsetvl_before_rvv_insns): Ditto.
(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
(pass_vsetvl::cleanup_earliest_vsetvls): New function.
(pass_vsetvl::df_post_optimization): Adapt function.
(pass_vsetvl::compute_probabilities): Ditto.
(pass_vsetvl::lazy_vsetvl): Ditto.
* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
* config/riscv/riscv-vsetvl.h: Refactor Phase 3.
* config/riscv/t-riscv: