Thanks. I modified the patch so that the max allowed peel iterations can be specified via a parameter. Testing on going. Ok for trunk ?
David On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davi...@google.com> > wrote: >> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davi...@google.com> >>> wrote: >>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davi...@google.com> >>>>> wrote: >>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener >>>>>> <richard.guent...@gmail.com> wrote: >>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li >>>>>>> <davi...@google.com> wrote: >>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations, >>>>>>>> but there is no simple way to turn on loop vectorization alone. The >>>>>>>> logic for default O3 setting is also complicated. >>>>>>>> >>>>>>>> In this patch, two new options are introduced: >>>>>>>> >>>>>>>> 1) -ftree-loop-vectorize >>>>>>>> >>>>>>>> This option is used to turn on loop vectorization only. option >>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny >>>>>>>> business of Init(2) is needed. With this change, -ftree-vectorize >>>>>>>> becomes a simple alias to -ftree-loop-vectorize + >>>>>>>> -ftree-slp-vectorize. >>>>>>>> >>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is: >>>>>>>> >>>>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize >>>>>>>> >>>>>>>> With the new change it becomes: >>>>>>>> >>>>>>>> -O3 -fno-loop-vectorize >>>>>>>> >>>>>>>> >>>>>>>> To turn on only loop vectorize at O2, the old way is >>>>>>>> >>>>>>>> -O2 -ftree-vectorize -fno-slp-vectorize >>>>>>>> >>>>>>>> The new way is >>>>>>>> >>>>>>>> -O2 -ftree-loop-vectorize >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2) -ftree-vect-loop-peeling >>>>>>>> >>>>>>>> This option is used to turn on/off loop peeling for alignment. In the >>>>>>>> long run, this should be folded into the cheap cost model proposed by >>>>>>>> Richard. This option is also useful in scenarios where peeling can >>>>>>>> introduce runtime problems: >>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be >>>>>>>> common in practice. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Patch attached. Compiler boostrapped. Ok after testing? >>>>>>> >>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not >>>>>>> on 2). >>>>>> >>>>>> Ok. Can you also comment on 2) ? >>>>> >>>>> I think we want to decide how granular we want to control the vectorizer >>>>> and using which mechanism. My cost-model re-org makes >>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like >>>>> a step backwards in this context. >>>> >>>> Using cost model to do a coarse grain control/configuration is >>>> certainly something we want, but having a fine grain control is still >>>> useful. >>>> >>>>> >>>>> So, can you summarize what pieces (including versioning) of the vectorizer >>>>> you'd want to be able to disable separately? >>>> >>>> Loop peeling seems to be the main one. There is also a correctness >>>> issue related. For instance, the following code is common in practice, >>>> but loop peeling wrongly assumes initial base-alignment and generates >>>> aligned mov instruction after peeling, leading to SEGV. Peeling is >>>> not something we can blindly turned on -- even when it is on, there >>>> should be a way to turn it off explicitly: >>>> >>>> char a[10000]; >>>> >>>> void foo(int n) >>>> { >>>> int* b = (int*)(a+n); >>>> int i = 0; >>>> for (; i < 1000; ++i) >>>> b[i] = 1; >>>> } >>>> >>>> int main(int argn, char** argv) >>>> { >>>> foo(argn); >>>> } >>> >>> But that's just a bug that should be fixed (looking into it). >>> >>>>> Just disabling peeling for >>>>> alignment may get you into the versioning for alignment path (and thus >>>>> an unvectorized loop at runtime). >>>> >>>> This is not true for target supporting mis-aligned access. I have not >>>> seen a case where alignment driver loop version happens on x86. >>>> >>>>>Also it's know that the alignment peeling >>>>> code needs some serious TLC (it's outcome depends on the order of DRs, >>>>> the cost model it uses leaves to be desired as we cannot distinguish >>>>> between unaligned load and store costs). >>>> >>>> Yet another reason to turn it off as it is not effective anyways? >>> >>> As said I'll disable all remains of -ftree-vect-loop-version with the cost >>> model >>> patch because it wasn't guarding versioning for aliasing but only versioning >>> for alignment. >>> >>> We have to be consistent here - if we add a way to disable peeling for >>> alignment then we certainly don't want to remove the ability to disable >>> versioning for alignment, no? >> >> We already have the ability to turn off versioning -- via --param. It >> is a more natural way to fine tune a pass instead of introducing a -f >> option. For this reason, your planned deprecation of the option is a >> good thing to do. >> >> For consistency, I think we should introduce a new parameter to turn >> on/off peeling. This can also be tied closely with the cost model >> change. >> >> The proposed patch attached. Does this one look ok? > > I'd principally support adding a param but rather would for example make > it limit the number of peeled iterations (zero turning off the feature). As > you implemented it it will give false messages for > > supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); > do_peeling = vector_alignment_reachable_p (dr); > if (do_peeling) > { > ... > } > else > { > if (!aligned_access_p (dr)) > { > if (dump_enabled_p ()) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "vector alignment may not be reachable\n"); > break; > } > } > > the else path. I'd restructure this as > > if (!vector_alignment_reachable_p (dr)) > { > if (!aligned_access_p (dr)) > .... > continue; > } > > if (known_alignment_for_access_p (dr)) > { > ... check param against the number of iterations to peel for this DR .... > } > else > { > ... check param against max iterations ... > } > > an alternative is a tri-state param that would either allow no peeling at all, > just peeling with known number of iterations or all kind of peelings. > > Thanks, > Richard. > > >> thanks, >> >> David >> >> >> >>> >>> Richard. >>> >>>> >>>> thanks, >>>> >>>> David >>>> >>>>> >>>>> Richard. >>>>> >>>>>>> >>>>>>> I've stopped a quick try doing 1) myself because >>>>>>> >>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options >>>>>>> opts->x_flag_ipa_reference = false; >>>>>>> break; >>>>>>> >>>>>>> + case OPT_ftree_vectorize: >>>>>>> + if (!opts_set->x_flag_tree_loop_vectorize) >>>>>>> + opts->x_flag_tree_loop_vectorize = value; >>>>>>> + if (!opts_set->x_flag_tree_slp_vectorize) >>>>>>> + opts->x_flag_tree_slp_vectorize = value; >>>>>>> + break; >>>>>>> >>>>>>> doesn't look obviously correct. Does that handle >>>>>>> >>>>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize >>>>>>> >>>>>>> or >>>>>>> >>>>>>> -ftree-loop-vectorize -fno-tree-vectorize >>>>>>> >>>>>>> properly? Currently at least >>>>>>> >>>>>>> -ftree-slp-vectorize -fno-tree-vectorize >>>>>>> >>>>>>> doesn't "work". >>>>>> >>>>>> >>>>>> Right -- same is true for -fprofile-use option. FDO enables some >>>>>> passes, but can not re-enable them if they are flipped off before. >>>>>> >>>>>>> >>>>>>> That said, the option machinery doesn't handle an option being an alias >>>>>>> for two other options, so it's mechanism to contract positives/negatives >>>>>>> doesn't work here and the override hooks do not work reliably for >>>>>>> repeated options. >>>>>>> >>>>>>> Or am I wrong here? Should we care at all? Joseph? >>>>>> >>>>>> We should probably just document the behavior. Even better, we should >>>>>> deprecate the old option. >>>>>> >>>>>> thanks, >>>>>> >>>>>> David >>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> Richard. >>>>>>> >>>>>>>> >>>>>>>> thanks, >>>>>>>> >>>>>>>> David
Index: params.def =================================================================== --- params.def (revision 202834) +++ params.def (working copy) @@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check", 10, 0, 0) +DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT, + "vect-max-peeling-for-alignment", + "Max number of loop peels to enhancement alignment of data references in a loop", + -1, 0, 0) + DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS, "max-cselib-memory-locations", "The maximum memory locations recorded by cselib", Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 202834) +++ tree-vect-data-refs.c (working copy) @@ -1718,6 +1718,30 @@ vect_enhance_data_refs_alignment (loop_v if (do_peeling) { + unsigned max_allowed_peel + = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT); + if (max_allowed_peel != (unsigned)-1) + { + unsigned max_peel = npeel; + if (max_peel == 0) + { + gimple dr_stmt = DR_STMT (dr0); + stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt); + tree vtype = STMT_VINFO_VECTYPE (vinfo); + max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1; + } + if (max_peel > max_allowed_peel) + { + do_peeling = false; + if (dump_enabled_p ()) + dump_printf_loc (MSG_NOTE, vect_location, + "Disable peeling, max peels reached: %d\n", max_peel); + } + } + } + + if (do_peeling) + { stmt_info_for_cost *si; void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo); Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 202834) +++ doc/invoke.texi (working copy) @@ -9451,6 +9451,10 @@ The maximum number of run-time checks th doing loop versioning for alias in the vectorizer. See option @option{-ftree-vect-loop-version} for more information. +@item vect-max-peeling-for-alignment +The maximum number of loop peels to enhance access alignment +for vectorizer. Value -1 means 'no limit'. + @item max-iterations-to-track The maximum number of iterations of a loop the brute-force algorithm for analysis of the number of iterations of the loop tries to evaluate.