On Mon, 3 Jul 2017, Richard Sandiford wrote: > Richard Biener <rguent...@suse.de> writes: > > On Mon, 12 Jun 2017, Bin.Cheng wrote: > >> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford > >> <richard.sandif...@linaro.org> wrote: > >> > "Bin.Cheng" <amker.ch...@gmail.com> writes: > >> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford > >> >> <richard.sandif...@linaro.org> wrote: > >> >>> Sorry to return this old patch, but: > >> >>> > >> >>> Bin Cheng <bin.ch...@arm.com> writes: > >> >>>> -/* Calculate the number of iterations under which scalar loop will be > >> >>>> - preferred than vectorized loop. NITERS_PROLOG is the number of > >> >>>> - iterations of prolog loop. If it's integer const, the integer > >> >>>> - number is also passed by INT_NITERS_PROLOG. VF is vector factor; > >> >>>> - TH is the threshold for vectorized loop if CHECK_PROFITABILITY is > >> >>>> - true. This function also store upper bound of the result in > >> >>>> BOUND. */ > >> >>>> +/* Calculate the number of iterations above which vectorized loop > >> >>>> will be > >> >>>> + preferred than scalar loop. NITERS_PROLOG is the number of > >> >>>> iterations > >> >>>> + of prolog loop. If it's integer const, the integer number is > >> >>>> also passed > >> >>>> + in INT_NITERS_PROLOG. BOUND_PROLOG is the upper bound (included) > >> >>>> of > >> >>>> + number of iterations of prolog loop. VFM1 is vector factor minus > >> >>>> one. > >> >>>> + If CHECK_PROFITABILITY is true, TH is the threshold below which > >> >>>> scalar > >> >>>> + (rather than vectorized) loop will be executed. This function > >> >>>> stores > >> >>>> + upper bound (included) of the result in BOUND_SCALAR. */ > >> >>>> > >> >>>> static tree > >> >>>> vect_gen_scalar_loop_niters (tree niters_prolog, int > >> >>>> int_niters_prolog, > >> >>>> - int bound_prolog, int vf, int th, int > >> >>>> *bound, > >> >>>> - bool check_profitability) > >> >>>> + int bound_prolog, int vfm1, int th, > >> >>>> + int *bound_scalar, bool > >> >>>> check_profitability) > >> >>>> { > >> >>>> tree type = TREE_TYPE (niters_prolog); > >> >>>> tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog, > >> >>>> - build_int_cst (type, vf)); > >> >>>> + build_int_cst (type, vfm1)); > >> >>>> > >> >>>> - *bound = vf + bound_prolog; > >> >>>> + *bound_scalar = vfm1 + bound_prolog; > >> >>>> if (check_profitability) > >> >>>> { > >> >>>> - th++; > >> >>>> + /* TH indicates the minimum niters of vectorized loop, while we > >> >>>> + compute the maximum niters of scalar loop. */ > >> >>>> + th--; > >> >>> > >> >>> Are you sure about this last change? It looks like it should be > >> >>> dropping > >> >> Hi Richard, > >> >> Thanks for spotting this. I vaguely remember I got this from the way > >> >> how niter/th was checked in previous peeling code, but did't double > >> >> check it now. I tend to believe there is inconsistence about th, > >> >> especially with comment like: > >> >> > >> >> /* Threshold of number of iterations below which vectorzation will > >> >> not be > >> >> performed. It is calculated from MIN_PROFITABLE_ITERS and > >> >> PARAM_MIN_VECT_LOOP_BOUND. */ > >> >> unsigned int th; > >> >> > >> >> I also tend to believe the inconsistence was introduced partly because > >> >> niters in vectorizer stands for latch_niters + 1, while latch_niters > >> >> in rest of the compiler. > >> >> > >> >> and..., > >> >> > >> >>> the increment rather than replacing it with a decrement. > >> >>> > >> >>> It looks like the threshold is already the maximum niters for the > >> >>> scalar > >> >>> loop. It's set by: > >> >>> > >> >>> min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) > >> >>> * vectorization_factor) - 1); > >> >>> > >> >>> /* Use the cost model only if it is more conservative than user > >> >>> specified > >> >>> threshold. */ > >> >>> th = (unsigned) min_scalar_loop_bound; > >> >>> if (min_profitable_iters > >> >>> && (!min_scalar_loop_bound > >> >>> || min_profitable_iters > min_scalar_loop_bound)) > >> >>> th = (unsigned) min_profitable_iters; > >> >>> > >> >>> LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th; > >> >>> > >> >>> (Note the "- 1" in the min_scalar_loop_bound. The multiplication > >> >>> result > >> >>> is the minimum niters for the vector loop.) > >> >> To be honest, min_scalar_loop_bound is more likely for something's > >> >> lower bound which is the niters for the vector loop. If it refers to > >> >> the niters scalar loop, it is in actuality the "max" value we should > >> >> use. I am not quite sure here, partly because I am not a native > >> >> speaker. > >> >> > >> >>> > >> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters > >> >>> for > >> >>> which the vector loop is used, but vect_estimate_min_profitable_iters > >> >>> instead returns the largest niters for which the scalar loop should be > >> >>> preferred: > >> >>> > >> >>> /* Cost model disabled. */ > >> >>> if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))) > >> >>> { > >> >>> dump_printf_loc (MSG_NOTE, vect_location, "cost model > >> >>> disabled.\n"); > >> >>> *ret_min_profitable_niters = 0; > >> >>> *ret_min_profitable_estimate = 0; > >> >>> return; > >> >>> } > >> >>> [...] > >> >>> /* Because the condition we create is: > >> >>> if (niters <= min_profitable_iters) > >> >>> then skip the vectorized loop. */ > >> >>> min_profitable_iters--; > >> >>> [...] > >> >>> min_profitable_estimate --; > >> >>> > >> >>> Also, when deciding whether the threshold needs to be used, we have: > >> >>> > >> >>> th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo); > >> >>> if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1 > >> >>> && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > >> >>> { > >> >>> if (dump_enabled_p ()) > >> >>> dump_printf_loc (MSG_NOTE, vect_location, > >> >>> "Profitability threshold is %d loop > >> >>> iterations.\n", > >> >>> th); > >> >>> check_profitability = true; > >> >>> } > >> >> Yes, this indeed indicates that th refers to the maximum niters of the > >> >> scalar loop. Anyway, we should resolve the inconsistence and make it > >> >> more clear in variable names etc.. > >> > > >> > Yeah, agree the variable names are really confusing. Given the choice > >> > between changing the names to match the current meaning and changing > >> > the meaning to match the current names, the latter looks like it would > >> > give cleaner code. I'm happy to do that it that sounds like the way > >> > to go. > >> I am all for this direction, but Richard B may have some to say here. > >> BTW, will you also fix the other inconsistence you found? > > > > I'm all to making the code clearer but I guess it will remain a > > tricky area. Each time I need to change sth there I need half an > > hour to recap how all the stuff is tied together... > > > > So please go forward. Many bonus points if you manage to > > create testcases covering the mistakes you fix. > > No testcases, sorry, but I built the patched compiler for > aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and > powerpc64le-linux-gnu, compiled the testsuite with -O3, > and compared the result with: > > diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c > --- gcc/tree-vect-loop.c 2017-07-01 16:18:55.951581540 +0100 > +++ /hdd/richards/compare/old/gcc/tree-vect-loop.c 2017-07-01 > 16:00:20.266729964 +0100 > @@ -2247,8 +2247,10 @@ > /* Niters for at least one iteration of vectorized loop. */ > niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo); > /* One additional iteration because of peeling for gap. */ > - if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) > + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) > niters_th++; > + /* Convert to maximum number of scalar iters. */ > + niters_th--; > if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th) > LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th; > } > diff -ur gcc/tree-vect-loop-manip.c > /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c > --- gcc/tree-vect-loop-manip.c 2017-07-01 16:18:55.951581540 +0100 > +++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c 2017-07-01 > 14:00:26.775542411 +0100 > @@ -1143,9 +1143,6 @@ > *bound_scalar = vfm1 + bound_prolog; > if (check_profitability) > { > - /* TH indicates the minimum niters of vectorized loop, while we > - compute the maximum niters of scalar loop. */ > - th--; > /* Peeling for constant times. */ > if (int_niters_prolog >= 0) > { > > which is the patch that would make everything consistent with the > current meaning (where the "minimum number of iterations" for which > vectorisation was profitable were actually the maximum number of > iterations for which vectorisation wasn't profitable). > > Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu. > OK to install?
Ok. Thanks, Richard. > Richard > > > 2017-07-03 Richard Sandiford <richard.sandif...@linaro.org> > > gcc/ > * tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound, > min_profitable_iters, and th as inclusive lower bounds. > Fix LOOP_VINFO_PEELING_FOR_GAPS condition. > (vect_estimate_min_profitable_iters): Return inclusive lower bounds > for min_profitable_iters and min_profitable_estimate. > (vect_transform_loop): Treat th as an inclusive lower bound. > * tree-vect-loop-manip.c (vect_loop_versioning): Likewise. > > Index: gcc/tree-vect-loop.c > =================================================================== > --- gcc/tree-vect-loop.c 2017-07-03 08:42:51.149380545 +0100 > +++ gcc/tree-vect-loop.c 2017-07-03 08:50:43.507370205 +0100 > @@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_ > goto again; > } > > - min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) > - * vectorization_factor) - 1); > + min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND) > + * vectorization_factor); > > /* Use the cost model only if it is more conservative than user specified > threshold. */ > - th = (unsigned) min_scalar_loop_bound; > - if (min_profitable_iters > - && (!min_scalar_loop_bound > - || min_profitable_iters > min_scalar_loop_bound)) > - th = (unsigned) min_profitable_iters; > + th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters); > > LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th; > > if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > - && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th) > + && LOOP_VINFO_INT_NITERS (loop_vinfo) < th) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_ > estimated_niter = max_niter; > if (estimated_niter != -1 > && ((unsigned HOST_WIDE_INT) estimated_niter > - <= MAX (th, (unsigned)min_profitable_estimate))) > + < MAX (th, (unsigned) min_profitable_estimate))) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > @@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_ > > /* Decide whether we need to create an epilogue loop to handle > remaining scalar iterations. */ > - th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1) > - / LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > - * LOOP_VINFO_VECT_FACTOR (loop_vinfo); > + th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) > + / LOOP_VINFO_VECT_FACTOR (loop_vinfo)) > + * LOOP_VINFO_VECT_FACTOR (loop_vinfo)); > > if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) > && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0) > @@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_ > /* Niters for at least one iteration of vectorized loop. */ > niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo); > /* One additional iteration because of peeling for gap. */ > - if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) > + if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)) > niters_th++; > if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th) > LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th; > @@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop > if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost) > { > if (vec_outside_cost <= 0) > - min_profitable_iters = 1; > + min_profitable_iters = 0; > else > { > min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * > vf > @@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop > min_profitable_iters = > min_profitable_iters < vf ? vf : min_profitable_iters; > > - /* Because the condition we create is: > - if (niters <= min_profitable_iters) > - then skip the vectorized loop. */ > - min_profitable_iters--; > - > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > " Runtime profitability threshold = %d\n", > @@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop > SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC */ > > if (vec_outside_cost <= 0) > - min_profitable_estimate = 1; > + min_profitable_estimate = 0; > else > { > min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * > vf > @@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop > / ((scalar_single_iter_cost * vf) > - vec_inside_cost); > } > - min_profitable_estimate --; > min_profitable_estimate = MAX (min_profitable_estimate, > min_profitable_iters); > if (dump_enabled_p ()) > dump_printf_loc (MSG_NOTE, vect_location, > @@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_ > run at least the vectorization factor number of times checking > is pointless, too. */ > th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo); > - if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1 > + if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) > && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)) > { > if (dump_enabled_p ()) > Index: gcc/tree-vect-loop-manip.c > =================================================================== > --- gcc/tree-vect-loop-manip.c 2017-07-03 08:42:46.863565062 +0100 > +++ gcc/tree-vect-loop-manip.c 2017-07-03 08:50:43.506370248 +0100 > @@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop > bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo); > > if (check_profitability) > - cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters, > + cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters, > build_int_cst (TREE_TYPE (scalar_loop_iters), > th)); > > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)