>> This seems relax the compatiblitly check to allow optimize more case, >> if so this should be a sperated patch. This is not a optimization fix, It's an bug fix.
Since fusion for these 2 demands: 1. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW). 2. demand SEW and GE_SEW (meaning demand a SEW larger than a specific SEW) and demand RATIO. The new fusion demand should include RATIO demand but it didn't before. It's an bug. It's lucky that previous tests didn't expose such bug before refactor. But such bug is exposed after refactor. I committed it with a separate patch. Thanks. juzhe.zh...@rivai.ai From: Kito Cheng Date: 2023-08-22 23:35 To: Kito Cheng CC: Robin Dapp; Juzhe-Zhong; GCC Patches; Jeff Law Subject: Re: [PATCH] RISC-V: Refactor Phase 3 (Demand fusion) of VSETVL PASS It's really great improvement, it's drop some state like HARD_EMPTY and DIRTY_WITH_KILLED_AVL which make this algorithm more easy to understand! also this also fundamentally improved the phase 3, although one concern is the time complexity might be come more higher order, (and it's already high enough in fact.) but mostly those vectorized code are only appeard within the inner most loop, so that is acceptable in generally So I will try my best to review this closely to make it more close to the perfect :) I saw you has update serveral testcase, why update instead of add new testcase?? could you say more about why some testcase added __riscv_vadd_vv_i8mf8 or add some more dependency of vl variable? > @@ -1423,8 +1409,13 @@ static bool > ge_sew_ratio_unavailable_p (const vector_insn_info &info1, > const vector_insn_info &info2) > { > - if (!info2.demand_p (DEMAND_LMUL) && info2.demand_p (DEMAND_GE_SEW)) > - return info1.get_sew () < info2.get_sew (); > + if (!info2.demand_p (DEMAND_LMUL)) > + { > + if (info2.demand_p (DEMAND_GE_SEW)) > + return info1.get_sew () < info2.get_sew (); > + else if (!info2.demand_p (DEMAND_SEW)) > + return false; > + } This seems relax the compatiblitly check to allow optimize more case, if so this should be a sperated patch. > return true; > } > @@ -1815,7 +1737,7 @@ vector_insn_info::parse_insn (rtx_insn *rinsn) > return; > if (optimize == 0 && !has_vtype_op (rinsn)) > return; > - if (optimize > 0 && !vsetvl_insn_p (rinsn)) > + if (optimize > 0 && vsetvl_discard_result_insn_p (rinsn)) I didn't get this change, could you explan few more about that? it was early exit for non vsetvl insn, but now it allowed that now? > return; > m_state = VALID; > extract_insn_cached (rinsn); > @@ -2206,9 +2128,9 @@ vector_insn_info::fuse_mask_policy (const > vector_insn_info &info1, > > vector_insn_info > vector_insn_info::merge (const vector_insn_info &merge_info, > - enum merge_type type) const > + enum merge_type type, unsigned bb_index) const > { > - if (!vsetvl_insn_p (get_insn ()->rtl ())) > + if (!vsetvl_insn_p (get_insn ()->rtl ()) && *this != merge_info) Why need this exception? > gcc_assert (this->compatible_p (merge_info) > && "Can't merge incompatible demanded infos"); > @@ -2403,18 +2348,22 @@ vector_infos_manager::get_all_available_exprs ( > } > > bool > -vector_infos_manager::all_empty_predecessor_p (const basic_block cfg_bb) > const > +vector_infos_manager::earliest_fusion_worthwhile_p ( > + const basic_block cfg_bb) const > { > - hash_set<basic_block> pred_cfg_bbs = get_all_predecessors (cfg_bb); > - for (const basic_block pred_cfg_bb : pred_cfg_bbs) > + edge e; > + edge_iterator ei; > + profile_probability prob = profile_probability::uninitialized (); > + FOR_EACH_EDGE (e, ei, cfg_bb->succs) > { > - const auto &pred_block_info = vector_block_infos[pred_cfg_bb->index]; > - if (!pred_block_info.local_dem.valid_or_dirty_p () > - && !pred_block_info.reaching_out.valid_or_dirty_p ()) > + if (prob == profile_probability::uninitialized ()) > + prob = vector_block_infos[e->dest->index].probability; > + else if (prob == vector_block_infos[e->dest->index].probability) > continue; > - return false; > + else > + return true; Make sure I understand this correctly: it's worth if thoe edges has different probability? > } > - return true; > + return false; If all probability is same, then it's not worth? Plz add few comments no matter my understand is right or not :) > } > > bool > @@ -2428,12 +2377,12 @@ vector_infos_manager::all_same_ratio_p (sbitmap > bitdata) const > sbitmap_iterator sbi; > > EXECUTE_IF_SET_IN_BITMAP (bitdata, 0, bb_index, sbi) > - { > - if (ratio == -1) > - ratio = vector_exprs[bb_index]->get_ratio (); > - else if (vector_exprs[bb_index]->get_ratio () != ratio) > - return false; > - } > + { > + if (ratio == -1) > + ratio = vector_exprs[bb_index]->get_ratio (); > + else if (vector_exprs[bb_index]->get_ratio () != ratio) > + return false; > + } > return true; > } Split this into a NFC patch, you can commit that without asking review. > @@ -907,8 +893,8 @@ change_insn (function_info *ssa, insn_change change, > insn_info *insn, > ] UNSPEC_VPREDICATE) > (plus:RVVM4DI (reg/v:RVVM4DI 104 v8 [orig:137 op1 ] [137]) > (sign_extend:RVVM4DI (vec_duplicate:RVVM4SI (reg:SI 15 a5 > - [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) > "rvv.c":8:12 > - 2784 {pred_single_widen_addsvnx8di_scalar} (expr_list:REG_EQUIV > + [140])))) (unspec:RVVM4DI [ (const_int 0 [0]) ] UNSPEC_VUNDEF))) > + "rvv.c":8:12 2784 {pred_single_widen_addsvnx8di_scalar} > (expr_list:REG_EQUIV > (mem/c:RVVM4DI (reg:DI 10 a0 [142]) [1 <retval>+0 S[64, 64] A128]) > (expr_list:REG_EQUAL (if_then_else:RVVM4DI (unspec:RVVMF8BI [ > (const_vector:RVVMF8BI repeat [ Split this into a NFC patch, you can commit that without asking review. > @@ -2777,6 +2770,17 @@ pass_vsetvl::update_vector_info (const insn_info *i, > m_vector_manager->vector_insn_infos[i->uid ()] = new_info; > } > > +void > +pass_vsetvl::update_block_info (int index, profile_probability prob, > + vector_insn_info new_info) const vector_insn_info &new_info > +{ > + m_vector_manager->vector_block_infos[index].probability = prob; > + if (m_vector_manager->vector_block_infos[index].local_dem > + == m_vector_manager->vector_block_infos[index].reaching_out) > + m_vector_manager->vector_block_infos[index].local_dem = new_info; > + m_vector_manager->vector_block_infos[index].reaching_out = new_info; > +} > + { auto &block_info = m_vector_manager->vector_block_infos[index]; block_info.probability = prob; if (block_info.local_dem == block_info.reaching_out) block_info.local_dem = new_info; block_info.reaching_out = new_info; } > /* Simple m_vsetvl_insert vsetvl for optimize == 0. */ > void > pass_vsetvl::simple_vsetvl (void) const > + for (insn_info *i = earliest_pred->end_insn ()->prev_nondebug_insn (); > + real_insn_and_same_bb_p (i, earliest_pred) > + && after_or_same_p (i, last_insn); > + i = i->prev_nondebug_insn ()) > { > + if (!vl && find_access (i->defs (), REGNO (avl))) > + return false; > + if (vl && find_access (i->defs (), REGNO (vl))) > + return false; > + if (vl && find_access (i->uses (), REGNO (vl))) > + return false; should we check `i->is_call () || i->is_asm ()`? > @@ -3892,7 +3408,7 @@ pass_vsetvl::refine_vsetvls (void) const > basic_block cfg_bb; > FOR_EACH_BB_FN (cfg_bb, cfun) > { > - auto info = get_block_info(cfg_bb).local_dem; > + auto info = get_block_info (cfg_bb).local_dem; > insn_info *insn = info.get_insn (); > if (!info.valid_p ()) > continue; Split this into a NFC patch, you can commit that without asking review. > @@ -3938,8 +3454,7 @@ pass_vsetvl::cleanup_vsetvls () > basic_block cfg_bb; > FOR_EACH_BB_FN (cfg_bb, cfun) > { > - auto &info > - = get_block_info(cfg_bb).reaching_out; > + auto &info = get_block_info (cfg_bb).reaching_out; > gcc_assert (m_vector_manager->expr_set_num ( > m_vector_manager->vector_del[cfg_bb->index]) > <= 1); Split this into a NFC patch, you can commit that without asking review. > @@ -3951,9 +3466,7 @@ pass_vsetvl::cleanup_vsetvls () > info.set_unknown (); > else > { > - const auto dem > - = get_block_info(cfg_bb) > - .local_dem; > + const auto dem = get_block_info (cfg_bb).local_dem; > gcc_assert (dem == *m_vector_manager->vector_exprs[i]); > insn_info *insn = dem.get_insn (); > gcc_assert (insn && insn->rtl ()); Split this into a NFC patch, you can commit that without asking review. > @@ -4020,33 +3543,10 @@ pass_vsetvl::commit_vsetvls (void) > for (const bb_info *bb : crtl->ssa->bbs ()) > { > basic_block cfg_bb = bb->cfg_bb (); > - const auto reaching_out > - = get_block_info(cfg_bb).reaching_out; > + const auto reaching_out = get_block_info (cfg_bb).reaching_out; Split this into a NFC patch, you can commit that without asking review. > @@ -4263,7 +3783,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info > *bb) const > > /* Local AVL compatibility checking is simpler than global, we only > need to check the REGNO is same. */ > - if (prev_dem.valid_or_dirty_p () && prev_dem.skip_avl_compatible_p > (curr_dem) > + if (prev_dem.valid_or_dirty_p () > + && prev_dem.skip_avl_compatible_p (curr_dem) > && local_avl_compatible_p (prev_avl, curr_avl)) > { > /* curr_dem and prev_dem is compatible! */ Split this into a NFC patch, you can commit that without asking review. >@@ -4655,8 +4240,7 @@ pass_vsetvl::compute_probabilities (void) > for (const bb_info *bb : crtl->ssa->bbs ()) > { > basic_block cfg_bb = bb->cfg_bb (); >- auto &curr_prob >- = get_block_info(cfg_bb).probability; >+ auto &curr_prob = get_block_info (cfg_bb).probability; > > /* GCC assume entry block (bb 0) are always so > executed so set its probability as "always". */ Split this into a NFC patch, you can commit that without asking review. > @@ -4669,8 +4253,7 @@ pass_vsetvl::compute_probabilities (void) > gcc_assert (curr_prob.initialized_p ()); > FOR_EACH_EDGE (e, ei, cfg_bb->succs) > { > - auto &new_prob > - = get_block_info(e->dest).probability; > + auto &new_prob = get_block_info (e->dest).probability; > if (!new_prob.initialized_p ()) > new_prob = curr_prob * e->probability; > else if (new_prob == profile_probability::always ()) Split this into a NFC patch, you can commit that without asking review. > @@ -4298,7 +3819,8 @@ pass_vsetvl::local_eliminate_vsetvl_insn (const bb_info > *bb) const > none exists or if a user RVV instruction is enountered > prior to any vsetvl. */ > static rtx_insn * > -get_first_vsetvl_before_rvv_insns (basic_block cfg_bb) > +get_first_vsetvl_before_rvv_insns (basic_block cfg_bb, > + enum vsetvl_type insn_type) > { add gcc_assert (insn_type == VSETVL_DISCARD_RESULT || insn_type == VSETVL_VTYPE_CHANGE_ONLY). > rtx_insn *rinsn; > FOR_BB_INSNS (cfg_bb, rinsn) > @@ -4310,7 +3832,11 @@ get_first_vsetvl_before_rvv_insns (basic_block cfg_bb) > if (has_vtype_op (rinsn) || vsetvl_insn_p (rinsn)) > return nullptr; > > - if (vsetvl_discard_result_insn_p (rinsn)) > + if (insn_type == VSETVL_DISCARD_RESULT > + && vsetvl_discard_result_insn_p (rinsn)) > + return rinsn; > + if (insn_type == VSETVL_VTYPE_CHANGE_ONLY > + && vsetvl_vtype_change_only_p (rinsn)) > return rinsn; > } > return nullptr; > diff --git a/gcc/config/riscv/riscv-vsetvl.def > b/gcc/config/riscv/riscv-vsetvl.def > index 7a73149f1da..7289c01efcf 100644 > --- a/gcc/config/riscv/riscv-vsetvl.def > +++ b/gcc/config/riscv/riscv-vsetvl.def > @@ -319,7 +319,7 @@ DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ > DEMAND_FALSE, > /*RATIO*/ DEMAND_TRUE, /*GE_SEW*/ DEMAND_FALSE, > /*NEW_DEMAND_SEW*/ true, > /*NEW_DEMAND_LMUL*/ false, > - /*NEW_DEMAND_RATIO*/ false, > + /*NEW_DEMAND_RATIO*/ true, This seems relax the compatiblitly check to allow optimize more case, if so this should be a sperated patch. > /*NEW_DEMAND_GE_SEW*/ true, first_sew, > vlmul_for_first_sew_second_ratio, second_ratio) > DEF_SEW_LMUL_FUSE_RULE (/*SEW*/ DEMAND_TRUE, /*LMUL*/ DEMAND_FALSE, > @@ -386,7 +337,8 @@ public: > bool compatible_avl_p (const avl_info &) const; > bool compatible_vtype_p (const vl_vtype_info &) const; > bool compatible_p (const vl_vtype_info &) const; > - vector_insn_info merge (const vector_insn_info &, enum merge_type) const; > + vector_insn_info merge (const vector_insn_info &, enum merge_type, > + unsigned = 0) const; it seems weired to set bb_index as 0 by default? > > rtl_ssa::insn_info *get_insn () const { return m_insn; } > const bool *get_demands (void) const { return m_demands; } > diff --git a/gcc/config/riscv/t-riscv b/gcc/config/riscv/t-riscv > index 1252d6f851a..f3ce66ccdd4 100644 > --- a/gcc/config/riscv/t-riscv > +++ b/gcc/config/riscv/t-riscv > @@ -62,7 +62,8 @@ riscv-vsetvl.o: $(srcdir)/config/riscv/riscv-vsetvl.cc \ > $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(REGS_H) \ > $(TARGET_H) tree-pass.h df.h rtl-ssa.h cfgcleanup.h insn-config.h \ > insn-attr.h insn-opinit.h tm-constrs.h cfgrtl.h cfganal.h lcm.h \ > - predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h > + predict.h profile-count.h $(srcdir)/config/riscv/riscv-vsetvl.h \ > + $(srcdir)/config/riscv/riscv-vsetvl.def This should be a seperate fix and backport to GCC 13 as well, pre-approve for both master and GCC-13 branch for this fix. > $(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \ > $(srcdir)/config/riscv/riscv-vsetvl.cc