Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,

general remark upfront:  Please add function-level comments for all functions.  This makes reading and reviewing much easier.  I had to sweep back and forth quite a bit.

> +
> +static int
> +get_last_live_range (const vec &live_ranges, tree var)
> +{
> +  unsigned int ix;
> +  var_live_range *live_range;
> +  FOR_EACH_VEC_ELT_REVERSE (live_ranges, ix, live_range)
> +    if (live_range->var == var)
> +      return ix;
> +  return -1;
> +}

From reading the usage site of this function it looks like we could benefit from having the live ranges be a hash_map as well?  That way we wouldn't need to scan through the list every time.  Something like hash_map>.  It looks like we only consider the range end anyway.

> +      int index = get_last_live_range (live_ranges, var);

That way we could avoid some worst-case behavior here for pathological inputs.

> +      if (index == -1)
> +        {
> +          var_live_range range = {var, 0, point};
> +          live_ranges.safe_push (range);
> +        }

Please add a comment that we assume the variable is live from the start of this block.

> +      else
> +        live_ranges[index].end = point;

And here a comment that we will grow the live range for each use.
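The hash_map suggestion above can be sketched in standalone C++ as follows.  This is only an illustration under stated assumptions: std::unordered_map stands in for GCC's hash_map, a std::string stands in for the tree variable, and live_range_info, live_range_map and record_use are hypothetical names that do not appear in the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the patch's var_live_range.  The variable itself
// becomes the map key, so only the two program points are stored here.
struct live_range_info
{
  int start;
  int end;
};

// Assumption: std::unordered_map replaces GCC's hash_map so that the sketch
// compiles on its own; std::string replaces `tree`.
using live_range_map = std::unordered_map<std::string, live_range_info>;

// Record a use of VAR at program point POINT.
void
record_use (live_range_map &ranges, const std::string &var, int point)
{
  assert (point >= 0);
  auto it = ranges.find (var);
  if (it == ranges.end ())
    // First sighting: assume VAR is live from the start of this block.
    ranges.emplace (var, live_range_info{0, point});
  else
    // Later use: grow the live range up to this point.
    it->second.end = point;
}
```

With such a map each use is handled by one expected constant-time lookup rather than a reverse scan over the vector.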
> +static bool
> +live_range_conflict_p (const var_live_range &live_range1,
> +                       const var_live_range &live_range2)
> +{
> +  if (live_range1.start >= live_range2.end)
> +    return false;
> +  if (live_range1.end <= live_range2.start)
> +    return false;
> +  if (live_range2.start >= live_range1.end)
> +    return false;
> +  if (live_range2.end <= live_range1.start)
> +    return false;
> +  return true;
> +}

Rename to live_range_overlap_p and simplify to return a.end >= b.start || b.end >= a.start;

> +
> +static unsigned int
> +max_number_of_live_regs (const basic_block bb,
> +                         const vec &live_ranges,
> +                         machine_mode biggest_mode, int lmul)
> +{
> +  unsigned int max_nregs = 0;
> +  unsigned int i, j, k;
> +  unsigned int live_point = 0;
> +  for (i = 0; i < live_ranges.length (); i++)
> +    {
> +      auto_vec conflict_live_ranges;
> +      var_live_range live_range = live_ranges[i];
> +      conflict_live_ranges.safe_push (live_range);
> +      unsigned int min_point = live_range.start;
> +      unsigned int max_point = live_range.end;
> +      for (j = 0; j < live_ranges.length (); j++)
> +        {
> +          if (j == i)
> +            continue;
> +          if (live_range_conflict_p (live_range, live_ranges[j]))
> +            {
> +              conflict_live_ranges.safe_push (live_ranges[j]);
> +              min_point
> +                = std::min (min_point, (unsigned int) live_ranges[j].start);
> +              max_point
> +                = std::max (max_point, (unsigned int) live_ranges[j].end);
> +            }
> +        }
> +      for (j = min_point; j <= max_point; j++)
> +        {
> +          unsigned int nregs = 0;
> +          for (k = 0; k < conflict_live_ranges.length (); k++)
> +            {
> +              if (j >= (unsigned int) conflict_live_ranges[k].start
> +                  && j <= (unsigned int) conflict_live_ranges[k].end)
> +                {
> +                  machine_mode mode
> +                    = TYPE_MODE (TREE_TYPE (conflict_live_ranges[k].var));
> +                  nregs += compute_nregs_for_mode (mode, biggest_mode, lmul);
> +                }
> +            }
> +          if (nregs > max_nregs)
> +            {
> +              max_nregs = nregs;
> +              live_point = j;
> +            }
> +        }
> +    }

This looks pretty quadratic in the number of live ranges (or even cubic?).
Can't it be done more efficiently using a sliding-window approach by sorting the live ranges according to their start point before?  Also std::min/max -> MIN/MAX.

> +
> +  /* Collect user explicit RVV type. */
> +  hash_set all_preds = get_all_predecessors (bb);
> +  hash_set all_succs = get_all_successors (bb);

As mentioned before, maybe dominator info could help here?

> +  for (i = 0; i < cfun->gimple_df->ssa_names->length (); i++)
> +    {
> +      tree t = ssa_name (i);
> +      if (!t)
> +        continue;
> +      machine_mode mode = TYPE_MODE (TREE_TYPE (t));
> +      if (!lookup_vector_type_attribute (TREE_TYPE (t))
> +          && !riscv_v_ext_vls_mode_p (mode))
> +        continue;
> +
> +      gimple *def = SSA_NAME_DEF_STMT (t);
> +      if (gimple_bb (def) && !all_preds.contains (gimple_bb (def)))
> +        continue;
> +      const ssa_use_operand_t *const head = &(SSA_NAME_IMM_USE_NODE (t));
> +      const ssa_use_operand_t *ptr;
> +
> +      for (ptr = head->next; ptr != head; ptr = ptr->next)
> +        {
> +          if (USE_STMT (ptr) && !is_gimple_debug (USE_STMT (ptr)))
> +            {
> +              if (all_succs.contains (gimple_bb (USE_STMT (ptr
> +                {

Reverse the conditions and continue, i.e. if (!USE_STMT || is_gimple_debug || !all_succs.contains).

>
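Two of the suggestions in this review, the simplified overlap predicate and the sort-then-sweep replacement for the nested loops, can be sketched in standalone C++.  The sketch rests on assumptions: `range` is a hypothetical simplification of var_live_range that carries a precomputed register count instead of a tree and a mode, and max_live_regs is an invented name.  A range is treated as live at every point from start to end inclusive, as in the quoted inner loop, so ranges that merely touch at an endpoint are counted together; that may differ from what the patch intends.  Note that the form equivalent to the four quoted checks is a conjunction of strict comparisons; a disjunction of >= comparisons would also hold for disjoint ranges.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical simplified live range: two program points plus the number of
// registers the variable would occupy (precomputed for this sketch).
struct range
{
  int start;
  int end;
  unsigned nregs;
};

// Same truth table as the four checks in the quoted live_range_conflict_p:
// the ranges conflict unless one ends at or before the start of the other.
bool
live_range_overlap_p (const range &a, const range &b)
{
  return a.start < b.end && b.start < a.end;
}

// Maximum total nregs live at any single program point.  Each range adds
// its nregs at `start` and removes them at `end + 1`; sorting the events
// and keeping a running sum costs O(n log n) for n ranges.
unsigned
max_live_regs (const std::vector<range> &ranges)
{
  std::vector<std::pair<int, long>> events;
  for (const range &r : ranges)
    {
      assert (r.start <= r.end);
      events.push_back ({r.start, static_cast<long> (r.nregs)});
      events.push_back ({r.end + 1, -static_cast<long> (r.nregs)});
    }
  // Pairs sort by point first, then by delta, so removals at a point are
  // applied before additions at the same point.
  std::sort (events.begin (), events.end ());

  long live = 0;
  long best = 0;
  for (const auto &e : events)
    {
      live += e.second;
      best = std::max (best, live);
    }
  return static_cast<unsigned> (best);
}
```

For the ranges {0, 4, 2}, {2, 6, 1}, {5, 9, 4} and {10, 12, 3} the running sum peaks at points 5 and 6, where the second and third ranges are both live.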
Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
On 9/5/23 15:39, 钟居哲 wrote:
>> - Why don't we use the normal reverse postorder (or postorder) approach of computing live ranges?  Is that because we don't really need full global live ranges?
>
> Yes.  We don't need global live ranges.
>
>> - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already know the answer but an explanation (in a comment) would still be useful.
>
> The existing code can't help.  I have tried many times.

I would expect it to be fairly hard to use for this purpose.  I've tried to use it in other contexts as well without success.

Jeff
Re: Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
> - Why don't we use the normal reverse postorder (or postorder) approach of computing live ranges?  Is that because we don't really need full global live ranges?

Yes.  We don't need global live ranges.

> - Why can't we use existing code i.e. tree-ssa-live?  I suspect I already know the answer but an explanation (in a comment) would still be useful.

The existing code can't help.  I have tried many times.

> - Do we really need get_all_predecessors/get_all_successors?  As they're only used for "defined before" and "used after", at first glance it looks like some kind of dominance info could help there but I didn't really check in detail.

Yes.  At first I wanted to use dominance analysis, but I am not sure whether we can use df_analyze () in the cost model framework.  It is worth trying.

> - Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related passes?  I don't mind maps but just wonder if it's on purpose, for convenience or something else.

I don't know how to use a bitmap to substitute for the current approach of using a map.

> Besides, it might help to rename program_points_map (into program_points_per_bb or so).  At first it looked quadratic to me but we're just iterating over the program points of a BB.

Ok.

juzhe.zh...@rivai.ai

From: Robin Dapp
Date: 2023-09-06 05:02
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; jeffreyalaw
Subject: Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model

Hi Juzhe,

I think the general approach makes sense and it doesn't need to be perfect from the beginning as we can always iterate on it.  Before continuing with a more detailed review (hopefully tomorrow) some high-level questions upfront.  It would help to document some of these choices so it's easier to understand the rationale.

- Why don't we use the normal reverse postorder (or postorder) approach of computing live ranges?  Is that because we don't really need full global live ranges?
- Why can't we use existing code i.e. tree-ssa-live?
  I suspect I already know the answer but an explanation (in a comment) would still be useful.
- Do we really need get_all_predecessors/get_all_successors?  As they're only used for "defined before" and "used after", at first glance it looks like some kind of dominance info could help there but I didn't really check in detail.
- Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related passes?  I don't mind maps but just wonder if it's on purpose, for convenience or something else.

Besides, it might help to rename program_points_map (into program_points_per_bb or so).  At first it looked quadratic to me but we're just iterating over the program points of a BB.

Regards
 Robin
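On the bitmap question raised in this exchange, one possible shape of a bitmap-based formulation is sketched below in standalone C++.  It is not taken from the patch or from vsetvl.cc.  The assumptions are that every variable has a dense index (comparable to an SSA version number), that std::vector<bool> stands in for GCC's sbitmap, and that `stmt` and max_live_in_block are hypothetical names.  The sketch counts live variables only; weighting each variable by the registers its mode needs is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical statement: variables are dense indices into the live bitmap.
struct stmt
{
  int def;                // index defined by this statement, or -1 for none
  std::vector<int> uses;  // indices used by this statement
};

// Walk BLOCK backwards starting from its live-out set LIVE (one bit per
// variable) and return the largest number of simultaneously live variables
// seen at any program point.
std::size_t
max_live_in_block (const std::vector<stmt> &block, std::vector<bool> live)
{
  auto count = [&live] () {
    return static_cast<std::size_t> (
      std::count (live.begin (), live.end (), true));
  };

  std::size_t best = count ();
  for (auto it = block.rbegin (); it != block.rend (); ++it)
    {
      if (it->def >= 0)
        {
          assert (static_cast<std::size_t> (it->def) < live.size ());
          live[it->def] = false;  // not live above its definition
        }
      for (int u : it->uses)
        {
          assert (u >= 0 && static_cast<std::size_t> (u) < live.size ());
          live[u] = true;         // live from this use upwards
        }
      best = std::max (best, count ());
    }
  return best;
}
```

The live set is a single bitmap that is updated in place, so no per-variable range records are needed to find the peak within one block.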
Re: [PATCH V2] RISC-V: Support Dynamic LMUL Cost model
Hi Juzhe,

I think the general approach makes sense and it doesn't need to be perfect from the beginning as we can always iterate on it.  Before continuing with a more detailed review (hopefully tomorrow) some high-level questions upfront.  It would help to document some of these choices so it's easier to understand the rationale.

- Why don't we use the normal reverse postorder (or postorder) approach of computing live ranges?  Is that because we don't really need full global live ranges?
- Why can't we use existing code i.e. tree-ssa-live?  I suspect I already know the answer but an explanation (in a comment) would still be useful.
- Do we really need get_all_predecessors/get_all_successors?  As they're only used for "defined before" and "used after", at first glance it looks like some kind of dominance info could help there but I didn't really check in detail.
- Why don't we use bitmaps/sbitmaps like in vsetvl.cc and other related passes?  I don't mind maps but just wonder if it's on purpose, for convenience or something else.

Besides, it might help to rename program_points_map (into program_points_per_bb or so).  At first it looked quadratic to me but we're just iterating over the program points of a BB.

Regards
 Robin
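To make the dominance question above concrete, here is a minimal standalone C++ sketch of the textbook iterative dominator computation.  Inside GCC one would presumably rely on the existing dominance interface (calculate_dominance_info and dominated_by_p) rather than anything like this; whether that is usable from the cost model is left open in the thread.  The `cfg` alias and compute_dominators are hypothetical names, block 0 is assumed to be the entry, and every block is assumed reachable.  Dominance is also a stricter property than "is some predecessor", so this is not a drop-in replacement for get_all_predecessors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// preds[b] lists the predecessor blocks of block b; block 0 is the entry.
using cfg = std::vector<std::vector<int>>;

// Returns dom, where dom[b][a] is true iff block a dominates block b.
std::vector<std::vector<bool>>
compute_dominators (const cfg &preds)
{
  const std::size_t n = preds.size ();
  assert (n > 0);

  // Start with "every block dominates every block", except that the entry
  // is dominated only by itself, then shrink the sets to a fixed point.
  std::vector<std::vector<bool>> dom (n, std::vector<bool> (n, true));
  dom[0].assign (n, false);
  dom[0][0] = true;

  bool changed = true;
  while (changed)
    {
      changed = false;
      for (std::size_t b = 1; b < n; b++)
        {
          // Intersect the dominator sets of all predecessors ...
          std::vector<bool> next (n, true);
          for (int p : preds[b])
            for (std::size_t a = 0; a < n; a++)
              next[a] = next[a] && dom[p][a];
          // ... and add the block itself.
          next[b] = true;
          if (next != dom[b])
            {
              dom[b] = next;
              changed = true;
            }
        }
    }
  return dom;
}
```

For a diamond-shaped CFG (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3) the join block 3 is dominated by the entry and by itself, but by neither arm, even though both arms are predecessors.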