Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
I have reordered the functions so that deleted functions and new functions are no longer mixed up in the diff.

V2 patch: https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628237.html

>> Why need this exception?

Because we have this piece of code for fusion in an "EMPTY" block:

  new_info = expr.merge (expr, GLOBAL_MERGE, eg->src->index);

Here expr may not have a real AVL source, which is considered incompatible. In this case we should skip the compatibility check and just use merge to compute the demand info.

>> Make sure I understand this correctly: it's worthwhile if those edges have
>> different probabilities?
>> If all probabilities are the same, then it's not worthwhile?

The probability is meant to help pick the optimal VSETVL info when the demand infos are incompatible.

Consider the following case:

void f (int32_t * restrict in, int32_t * restrict out, size_t n,
        size_t cond, size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i == cond)
        {
          vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
          *(vint8mf8_t*)(out + i + 100) = v;
        }
      else
        {
          vbool1_t v = *(vbool1_t*)(in + i + 400);
          *(vbool1_t*)(out + i + 400) = v;
        }
    }
}

The two VSETVLs are incompatible, since one wants e8mf8 and the other wants e8m8.
Since the if (i == cond) branch has very low probability (it can be taken only zero times or once), we want to hoist the e8m8 vsetvli to get optimal codegen like this:

f:
        beq a2,zero,.L10
        addi a0,a0,1600
        addi a1,a1,1600
        li a5,0
        vsetvli a4,zero,e8,m8,ta,ma
.L5:
        beq a3,a5,.L12
        vlm.v v1,0(a0)
        vsm.v v1,0(a1)
.L4:
        addi a5,a5,1
        addi a0,a0,4
        addi a1,a1,4
        bne a2,a5,.L5
.L10:
        ret
.L12:
        vsetvli a7,zero,e8,mf8,ta,ma
        addi a6,a1,-1200
        addi t1,a0,-1200
        vle8.v v1,0(t1)
        vse8.v v1,0(a6)
        vsetvli a4,zero,e8,m8,ta,ma
        j .L4

Whereas the other case is like this:

void f (int32_t * restrict in, int32_t * restrict out, size_t n,
        size_t cond, size_t cond2)
{
  for (size_t i = 0; i < n; i++)
    {
      if (i > cond)
        {
          vint8mf8_t v = *(vint8mf8_t*)(in + i + 100);
          *(vint8mf8_t*)(out + i + 100) = v;
        }
      else
        {
          vbool1_t v = *(vbool1_t*)(in + i + 400);
          *(vbool1_t*)(out + i + 400) = v;
        }
    }
}

Here both conditions have equal probability, so we don't want to give either of them higher priority, and the codegen should be:

f:
        beq a2,zero,.L10
        addi a0,a0,1600
        addi a1,a1,1600
        li a5,0
        j .L5
.L12:
        vsetvli a7,zero,e8,mf8,ta,ma
        addi a5,a5,1
        vle8.v v1,0(a6)
        vse8.v v1,0(a4)
        addi a0,a0,4
        addi a1,a1,4
        beq a2,a5,.L10
.L5:
        addi a4,a1,-1200
        addi a6,a0,-1200
        bltu a3,a5,.L12
        vsetvli t1,zero,e8,m8,ta,ma
        addi a5,a5,1
        vlm.v v1,0(a0)
        vsm.v v1,0(a1)
        addi a0,a0,4
        addi a1,a1,4
        bne a2,a5,.L5
.L10:
        ret

juzhe.zh...@rivai.ai

From: Kito Cheng
Date: 2023-08-22 23:35
To: Kito Cheng
CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law
Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
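The probability rule discussed in this reply can be sketched in isolation. This is only a minimal illustration, not the code in riscv-vsetvl.cc: fusion_worthwhile_p and the integer percentages are hypothetical stand-ins for earliest_fusion_worthwhile_p and GCC's profile_probability.

```cpp
#include <initializer_list>

// Hypothetical model: each successor block carries a probability,
// represented here as an integer percentage (an assumption; GCC uses
// profile_probability).
//
// Returns true when the successors do not all share one probability,
// i.e. when one path is hotter than another and picking its VSETVL
// info for the earliest edge may pay off.
constexpr bool fusion_worthwhile_p(std::initializer_list<int> succ_prob)
{
  const int *first = succ_prob.begin();
  for (const int *p = first; p != succ_prob.end(); ++p)
    if (*p != *first)
      return true;   // found a successor with a different probability
  return false;      // all equal (or no successors): no path to prefer
}
```

With assumed probabilities of 1 and 99 (roughly the i == cond case) the sketch returns true; with 50 and 50 (the i > cond case) it returns false.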
Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
>> This seems to relax the compatibility check to allow optimizing more cases;
>> if so, this should be a separate patch.

This is not an optimization; it is a bug fix. Consider the fusion of these 2 demands:

1. demand SEW and GE_SEW (meaning a SEW at least as large as a specific SEW).
2. demand SEW and GE_SEW (same meaning) and demand RATIO.

The fused demand should include the RATIO demand, but it did not before. That is a bug. We were lucky that the previous tests did not expose it before the refactor; the refactor exposes it.

I committed it as a separate patch.

Thanks.

juzhe.zh...@rivai.ai
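For reference, the ge_sew_ratio_unavailable_p hunk under discussion can be modelled in isolation to see exactly where the old and new predicates differ. This is a sketch transcribed from the removed and added lines of the diff; DemandInfo, unavailable_old and unavailable_new are hypothetical names, and the struct keeps only the fields the hunk reads.

```cpp
// Hypothetical, reduced model of a demand info.
struct DemandInfo {
  bool demand_lmul;
  bool demand_sew;
  bool demand_ge_sew;
  int sew;
};

// Behaviour before the patch (the removed lines).
constexpr bool unavailable_old(DemandInfo info1, DemandInfo info2)
{
  if (!info2.demand_lmul && info2.demand_ge_sew)
    return info1.sew < info2.sew;
  return true;
}

// Behaviour after the patch (the added lines).
constexpr bool unavailable_new(DemandInfo info1, DemandInfo info2)
{
  if (!info2.demand_lmul)
    {
      if (info2.demand_ge_sew)
        return info1.sew < info2.sew;
      else if (!info2.demand_sew)
        return false;   // the only case whose result changes
    }
  return true;
}
```

In this model the two versions disagree only when info2 demands none of LMUL, SEW and GE_SEW: the old predicate returns true and the new one returns false.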
Re: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
>> I saw you have updated several testcases; why update instead of adding new
>> testcases?

Because the original testcases fail after this patch.

>> Could you say more about why some testcases added __riscv_vadd_vv_i8mf8
>> or added more dependencies on the vl variable?

These are 2 separate questions.

1. Why did some testcases add __riscv_vadd_vv_i8mf8?

This is because the original testcases are too fragile and fail easily. Consider the following case:

for (...)
  if (cond)
    vsetvl e8mf8
    load
    store
  else
    vsetvl e16mf4
    load
    store

In this example "e8mf8" and "e16mf4" are compatible, so we can put either a vsetvli e8mf8 or a vsetvli e16mf4 before the for loop and elide all vsetvlis inside the loop.

Before this patch the codegen result is vsetvli e8mf8; after this patch it is vsetvli e16mf4. Both are legal and optimal codegen.

To avoid unnecessary test failures in the future, I added a "vadd", which demands both SEW and LMUL and only allows e8mf8.

This does not change the goal of the testcase, which is to test the ability of LCM to fuse VSETVLs and to compute the optimal location of the vsetvl.

2. Why add more dependencies on the vl variable?

As I told you previously, HARD_EMPTY and DIRTY_WITH_KILLED_AVL are meant to optimize the following case:

li a6,101
vsetvli e8mf8
for ...

li a5,101
vsetvli e16mf4
for ...

This case happens because the cost of "li" is set so low that earlier passes fail to optimize it. I don't think we should optimize such a corner case in the VSETVL pass, since it seriously complicates the implementation and hurts code quality.

So after I remove them, the codegen for such a case has one more "vsetvli" (only one more dynamic instruction at run time).

I note that if all the "li" are inside a loop, the issue goes away and the VSETVL pass achieves optimal codegen.

To fix the failing testcases, instead of "vl = 101" I use "vl = a + 101"; the assembly checks then stay the same and pass.

Thanks.
juzhe.zh...@rivai.ai
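One way to see why e8mf8 and e16mf4 can stand in for each other in the example from this reply is the SEW/LMUL ratio. The following is a minimal sketch, assuming the load and store in each branch demand only that ratio (the mail does not list the exact demands); Lmul and sew_lmul_ratio are hypothetical names, not GCC functions.

```cpp
// LMUL written as a fraction num/den: mf8 is {1, 8}, mf4 is {1, 4},
// m8 is {8, 1}.
struct Lmul {
  int num;
  int den;
};

// SEW/LMUL ratio, computed as SEW * den / num.
constexpr int sew_lmul_ratio(int sew, Lmul lmul)
{
  return sew * lmul.den / lmul.num;
}

constexpr int ratio_e8mf8  = sew_lmul_ratio(8,  Lmul{1, 8});   // 64
constexpr int ratio_e16mf4 = sew_lmul_ratio(16, Lmul{1, 4});   // 64
constexpr int ratio_e8m8   = sew_lmul_ratio(8,  Lmul{8, 1});   // 1
```

e8mf8 and e16mf4 share the ratio 64, so either configuration satisfies a ratio-only demand, whereas e8m8 from the earlier vbool1_t example has ratio 1 and cannot be exchanged with e8mf8.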
Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
It's a really great improvement: it drops some states like HARD_EMPTY and DIRTY_WITH_KILLED_AVL, which makes the algorithm easier to understand!

It also fundamentally improves phase 3, although one concern is that the time complexity might become higher order (and it's already high enough, in fact). But vectorized code mostly appears only within the innermost loop, so that is generally acceptable.

So I will try my best to review this closely to bring it closer to perfect :)

I saw you have updated several testcases; why update instead of adding new testcases? Could you say more about why some testcases added __riscv_vadd_vv_i8mf8 or added more dependencies on the vl variable?

> @@ -1423,8 +1409,13 @@ static bool
>  ge_sew_ratio_unavailable_p (const vector_insn_info &info1,
>                              const vector_insn_info &info2)
>  {
> -  if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW))
> -    return info1.get_sew () < info2.get_sew ();
> +  if (!info2.demand_p (DEMAND_LMUL))
> +    {
> +      if (info2.demand_p (DEMAND_GE_SEW))
> +        return info1.get_sew () < info2.get_sew ();
> +      else if (!info2.demand_p (DEMAND_SEW))
> +        return false;
> +    }

This seems to relax the compatibility check to allow optimizing more cases; if so, this should be a separate patch.

>    return true;
>  }
> @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn)
>      return;
>    if (optimize == 0 && !has_vtype_op (rinsn))
>      return;
> -  if (optimize > 0 && !vsetvl_insn_p (rinsn))
> +  if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn))

I didn't get this change; could you explain a bit more? It used to be an early exit for non-vsetvl insns, but now those are allowed?
>      return;
>    m_state = VALID;
>    extract_insn_cached (rinsn);
> @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const vector_insn_info &info1,
>
>  vector_insn_info
>  vector_insn_info::merge (const vector_insn_info &merge_info,
> -                         enum merge_type type) const
> +                         enum merge_type type, unsigned bb_index) const
>  {
> -  if (!vsetvl_insn_p (get_insn ()->rtl ()))
> +  if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info)

Why need this exception?

>      gcc_assert (this->compatible_p (merge_info)
>                  && "Can't merge incompatible demanded infos");
> @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs (
>  }
>
>  bool
> -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) const
> +vector_infos_manager::earliest_fusion_worthwhile_p (
> +  const basic_block cfg_bb) const
>  {
> -  hash_set pred_cfg_bbs = get_all_predecessors (cfg_bb);
> -  for (const basic_block pred_cfg_bb : pred_cfg_bbs)
> +  edge e;
> +  edge_iterator ei;
> +  profile_probability prob = profile_probability::uninitialized ();
> +  FOR_EACH_EDGE (e, ei, cfg_bb->succs)
>      {
> -      const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index];
> -      if (!pred_block_info.local_dem.valid_or_dirty_p ()
> -          && !pred_block_info.reaching_out.valid_or_dirty_p ())
> +      if (prob == profile_probability::uninitialized ())
> +        prob = vector_block_infos[e->dest->index].probability;
> +      else if (prob == vector_block_infos[e->dest->index].probability)
>          continue;
> -      return false;
> +      else
> +        return true;

Make sure I understand this correctly: it's worthwhile if those edges have different probabilities?

>      }
> -  return true;
> +  return false;

If all probabilities are the same, then it's not worthwhile?
Please add a few comments, whether or not my understanding is right :)

>  }
>
>  bool
> @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap bitdata) const
>    sbitmap_iterator sbi;
>
>    EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi)
> -  {
> -    if (ratio == -1)
> -      ratio = vector_exprs[bb_index]->get_ratio ();
> -    else if (vector_exprs[bb_index]->get_ratio () != ratio)
> -      return false;
> -  }
> +    {
> +      if (ratio == -1)
> +        ratio = vector_exprs[bb_index]->get_ratio ();
> +      else if (vector_exprs[bb_index]->get_ratio () != ratio)
> +        return false;
> +    }
>    return true;
>  }

Split this into an NFC patch; you can commit that without asking for review.

> @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, insn_info *insn,
>  ] UNSPEC_VPREDICATE)
>  (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137])
>  (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5
> -[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) "rvv.c":8:12
> -2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
> +[140] (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF)))
> +"rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV
>  (mem/c:RVVM4DI (reg:DI 10 a0 [142])
Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
I think I can do a detailed review tomorrow on the plane; I am free from meeting hell tomorrow :p

Robin Dapp via Gcc-patches wrote on Mon, Aug 21, 2023 at 23:24:

> Hi Juzhe,
>
> thanks, this is a reasonable approach and improves readability noticeably.
> LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
> looked closely into the vsetvl pass before and cannot entirely review it
> quickly. As we already have good test coverage there is not much that
> can go wrong IMHO.
>
> Regards
> Robin
>
Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
Hi Juzhe,

thanks, this is a reasonable approach and improves readability noticeably.
LGTM but I'd like to wait for other opinions (e.g. by Kito) as I haven't
looked closely into the vsetvl pass before and cannot entirely review it
quickly. As we already have good test coverage there is not much that
can go wrong IMHO.

Regards
Robin
[PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS
This patch refactors Phase 3 (Demand fusion) and renames it to Earliest fusion.
I did the refactor for the following reasons:

1. The current implementation of phase 3 does too many things, which makes the code messy and hard to maintain.

2. The demand fusion I implemented previously does the fusion explicitly: how to fuse VSETVLs, where the VSETVL fusion happens, whether the fusion point (location) is correct and optimal, etc.

   We were doing too much of this ourselves, so I had added the following functions:

     enum fusion_type get_backward_fusion_type (const bb_info *, const vector_insn_info &);
     bool hard_empty_block_p (const bb_info *, const vector_insn_info &) const;
     bool backward_demand_fusion (void);
     bool forward_demand_fusion (void);
     bool cleanup_illegal_dirty_blocks (void);

   to make sure the VSETVL fusion is optimal and correct. In my downstream testing I found that this approach is neither reliable nor optimal.

   Instead, this patch uses 'compute_earliest', which is an LCM function, to fuse multiple 'compatible' VSETVL demand infos if they have the same earliest edge. We let LCM decide almost everything about demand fusion for us. The only thing we do ourselves (not LCM) is check whether the VSETVL demand infos are compatible. That's all we need to do. I believe this approach is much more reliable and optimal than before (we already have many testcases to check this refactor patch).

3. Using the LCM approach for demand fusion is more reliable and gives a better CFG than before.

...

Here are the basics of this patch's approach.

Consider the following case:

for
  for
    for
      ...
        for
          if (...)
            VSETVL 1 demand: RATIO = 32 and TU policy.
          else if (...)
            VSETVL 2 demand: SEW = 16.
          else
            VSETVL 3 demand: MU policy.

- 'compute_earliest' outputs the earliest edge of VSETVL 1, VSETVL 2 and VSETVL 3. They have the same earliest edge, which is outside the 1st innermost loop.

- Then we check that these 3 VSETVL demand infos are compatible, and so fuse them into a single VSETVL info: demand SEW = 16, LMUL = MF2, TU, MU.

- Then the later phase (phase 4), LCM PRE (partial redundancy elimination), hoists that VSETVL to the outermost loop, so that we get optimal codegen.

This patch depends on:
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627948.html

gcc/ChangeLog:

	* config/riscv/riscv-vsetvl.cc (vsetvl_vtype_change_only_p): New function.
	(find_reg_killed_by): Delete.
	(after_or_same_p): New function.
	(has_vsetvl_killed_avl_p): Delete.
	(anticipatable_occurrence_p): Adapt function.
	(get_same_bb_set): Delete.
	(any_set_in_bb_p): Ditto.
	(change_insn): Format.
	(ge_sew_ratio_unavailable_p): Fix bug.
	(backward_propagate_worthwhile_p): Delete.
	(vector_insn_info::parse_insn): Adapt function.
	(vector_insn_info::merge): Ditto.
	(vector_insn_info::dump): Ditto.
	(vector_infos_manager::vector_infos_manager): Refactor Phase 3.
	(vector_infos_manager::all_empty_predecessor_p): Delete.
	(vector_infos_manager::all_same_ratio_p): Refactor Phase 3.
	(vector_infos_manager::all_same_avl_p): Ditto.
	(vector_infos_manager::create_bitmap_vectors): Ditto.
	(vector_infos_manager::free_bitmap_vectors): Ditto.
	(vector_infos_manager::dump): Ditto.
	(pass_vsetvl::update_block_info): New function.
	(enum fusion_type): Refactor Phase 3.
	(pass_vsetvl::get_backward_fusion_type): Delete.
	(demands_can_be_fused_p): New function.
	(pass_vsetvl::hard_empty_block_p): Delete.
	(earliest_pred_can_be_fused_p): New function.
	(pass_vsetvl::backward_demand_fusion): Delete.
	(pass_vsetvl::earliest_fusion): New function.
	(pass_vsetvl::forward_demand_fusion): Delete.
	(pass_vsetvl::demand_fusion): Ditto.
	(pass_vsetvl::cleanup_illegal_dirty_blocks): Ditto.
	(pass_vsetvl::compute_local_properties): Adapt function.
	(pass_vsetvl::refine_vsetvls): Ditto.
	(pass_vsetvl::cleanup_vsetvls): Ditto.
	(pass_vsetvl::commit_vsetvls): Ditto.
	(pass_vsetvl::local_eliminate_vsetvl_insn): Ditto.
	(get_first_vsetvl_before_rvv_insns): Ditto.
	(pass_vsetvl::global_eliminate_vsetvl_insn): Ditto.
	(pass_vsetvl::cleanup_earliest_vsetvls): New function.
	(pass_vsetvl::df_post_optimization): Adapt function.
	(pass_vsetvl::compute_probabilities): Ditto.
	(pass_vsetvl::lazy_vsetvl): Ditto.
	* config/riscv/riscv-vsetvl.def (DEF_SEW_LMUL_FUSE_RULE): Fix bug.
	* config/riscv/riscv-vsetvl.h: Refactor Phase 3.
	* config/riscv/t-riscv:
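The three-demand example from the patch description can be worked through with a small model. This is a sketch under simplifying assumptions, not the pass's data structures: Demand, demands_compatible, fuse and lmul_denominator are hypothetical names, 0 stands for "not demanded", and policies are fused with a plain OR, which is much cruder than the real fusion rules.

```cpp
// Hypothetical, reduced demand info.
struct Demand {
  int sew;     // demanded SEW, 0 = not demanded
  int ratio;   // demanded SEW/LMUL ratio, 0 = not demanded
  bool tu;     // tail-undisturbed policy demanded
  bool mu;     // mask-undisturbed policy demanded
};

// Two demands are compatible here if every field demanded by both agrees.
constexpr bool demands_compatible(Demand a, Demand b)
{
  return (a.sew == 0 || b.sew == 0 || a.sew == b.sew)
         && (a.ratio == 0 || b.ratio == 0 || a.ratio == b.ratio);
}

// Fuse two compatible demands into one that satisfies both.
constexpr Demand fuse(Demand a, Demand b)
{
  return Demand{a.sew ? a.sew : b.sew, a.ratio ? a.ratio : b.ratio,
                a.tu || b.tu, a.mu || b.mu};
}

// LMUL = SEW / RATIO. Assuming ratio >= sew, LMUL is the fraction
// 1/(ratio / sew); this returns that denominator (2 means MF2).
constexpr int lmul_denominator(Demand d)
{
  return d.ratio / d.sew;
}

constexpr Demand vsetvl1{0, 32, true, false};   // RATIO = 32 and TU
constexpr Demand vsetvl2{16, 0, false, false};  // SEW = 16
constexpr Demand vsetvl3{0, 0, false, true};    // MU
constexpr Demand fused = fuse(fuse(vsetvl1, vsetvl2), vsetvl3);
```

In this model the fused demand has SEW = 16 and RATIO = 32, so LMUL = 16/32 = 1/2 (MF2), with both TU and MU set, matching the single VSETVL info given in the description.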