Hi Kewen, Sorry for the late reply. The patch's most important change is below cost computation:
> @@ -5890,6 +5973,10 @@ determine_iv_cost (struct ivopts_data *data, struct > iv_cand *cand) > cost_step = add_cost (data->speed, TYPE_MODE (TREE_TYPE (base))); > cost = cost_step + adjust_setup_cost (data, cost_base.cost); > > + /* Consider additional step updates during unrolling. */ > + if (data->consider_reg_offset_for_unroll_p && !cand->reg_offset_p) > + cost += (data->current_loop->estimated_unroll - 1) * cost_step; This is a bit strange, to me the add instructions are additional computation caused by unrolling+addressing_mode, rather than a native part in candidate itself. Specifically, an additional cost is needed if candidates (without reg_offset_p) are chosen for the address type group/uses. > + > /* Prefer the original ivs unless we may gain something by replacing it. > The reason is to make debugging simpler; so this is not relevant for > artificial ivs created by other optimization passes. */ > > @@ -3654,6 +3729,14 @@ set_group_iv_cost (struct ivopts_data *data, > return; > } > > + /* Since we priced more on non reg_offset IV cand step cost, we should > scale > + up the appropriate IV group costs. Simply consider USE_COMPARE at the > + loop exit, FIXME if multiple exits supported or no loop exit comparisons > + matter. */ > + if (data->consider_reg_offset_for_unroll_p > + && group->vuses[0]->type != USE_COMPARE) > + cost *= (HOST_WIDE_INT) data->current_loop->estimated_unroll; Not quite follow here, giving "pricing more on on-reg_offset IV cand" doesn't make much sense to me. Also why generic type uses are not skipped? We want to model the cost required for address computation, however, for generic type uses there is no way to save the computation in "address expression". Once unrolled, the computation is always there? And what's the impact on targets supporting [base + index + offset] addressing mode? Given the patch is not likely to harm because rtl loop unrolling is (or was?) by default disabled, so I am OK once the above comments are addressed. I wonder if it's possible to get 10% of (all which should be unrolled) loops unrolled (conservatively) on gimple and enable it by default at O3, rather than teaching ivopts to model a future pass which not likely be used outside of benchmarks? Thanks, bin > + > if (data->consider_all_candidates) > { > group->cost_map[cand->id].cand = cand; On Thu, May 28, 2020 at 8:24 PM Kewen.Lin <li...@linux.ibm.com> wrote: > > > gcc/ChangeLog > > 2020-MM-DD Kewen Lin <li...@gcc.gnu.org> > > * tree-ssa-loop-ivopts.c (struct iv_group): New field reg_offset_p. > (struct iv_cand): New field reg_offset_p. > (struct ivopts_data): New field consider_reg_offset_for_unroll_p. > (dump_groups): Dump group with reg_offset_p. > (record_group): Initialize reg_offset_p. > (mark_reg_offset_groups): New function. > (find_interesting_uses): Call mark_reg_offset_groups. > (add_candidate_1): Update reg_offset_p if derived from reg_offset_p > group. > (set_group_iv_cost): Scale up group cost with estimate_unroll_factor > if > consider_reg_offset_for_unroll_p. > (determine_iv_cost): Increase step cost with estimate_unroll_factor if > consider_reg_offset_for_unroll_p. > (tree_ssa_iv_optimize_loop): Call estimate_unroll_factor, update > consider_reg_offset_for_unroll_p. > > ----