On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davi...@google.com> > wrote: >> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davi...@google.com> >>> wrote: >>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davi...@google.com> >>>>> wrote: >>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations, >>>>>> but there is no simple way to turn on loop vectorization alone. The >>>>>> logic for default O3 setting is also complicated. >>>>>> >>>>>> In this patch, two new options are introduced: >>>>>> >>>>>> 1) -ftree-loop-vectorize >>>>>> >>>>>> This option is used to turn on loop vectorization only. option >>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny >>>>>> business of Init(2) is needed. With this change, -ftree-vectorize >>>>>> becomes a simple alias to -ftree-loop-vectorize + >>>>>> -ftree-slp-vectorize. >>>>>> >>>>>> For instance, to turn on only slp vectorize at O3, the old way is: >>>>>> >>>>>> -O3 -fno-tree-vectorize -ftree-slp-vectorize >>>>>> >>>>>> With the new change it becomes: >>>>>> >>>>>> -O3 -fno-loop-vectorize >>>>>> >>>>>> >>>>>> To turn on only loop vectorize at O2, the old way is >>>>>> >>>>>> -O2 -ftree-vectorize -fno-slp-vectorize >>>>>> >>>>>> The new way is >>>>>> >>>>>> -O2 -ftree-loop-vectorize >>>>>> >>>>>> >>>>>> >>>>>> 2) -ftree-vect-loop-peeling >>>>>> >>>>>> This option is used to turn on/off loop peeling for alignment. In the >>>>>> long run, this should be folded into the cheap cost model proposed by >>>>>> Richard. This option is also useful in scenarios where peeling can >>>>>> introduce runtime problems: >>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html which happens to be >>>>>> common in practice. >>>>>> >>>>>> >>>>>> >>>>>> Patch attached. Compiler boostrapped. Ok after testing? >>>>> >>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on >>>>> 2). >>>> >>>> Ok. Can you also comment on 2) ? >>> >>> I think we want to decide how granular we want to control the vectorizer >>> and using which mechanism. My cost-model re-org makes >>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like >>> a step backwards in this context. >> >> Using cost model to do a coarse grain control/configuration is >> certainly something we want, but having a fine grain control is still >> useful. >> >>> >>> So, can you summarize what pieces (including versioning) of the vectorizer >>> you'd want to be able to disable separately? >> >> Loop peeling seems to be the main one. There is also a correctness >> issue related. For instance, the following code is common in practice, >> but loop peeling wrongly assumes initial base-alignment and generates >> aligned mov instruction after peeling, leading to SEGV. Peeling is >> not something we can blindly turned on -- even when it is on, there >> should be a way to turn it off explicitly: >> >> char a[10000]; >> >> void foo(int n) >> { >> int* b = (int*)(a+n); >> int i = 0; >> for (; i < 1000; ++i) >> b[i] = 1; >> } >> >> int main(int argn, char** argv) >> { >> foo(argn); >> } > > But that's just a bug that should be fixed (looking into it). > >>> Just disabling peeling for >>> alignment may get you into the versioning for alignment path (and thus >>> an unvectorized loop at runtime). >> >> This is not true for target supporting mis-aligned access. I have not >> seen a case where alignment driver loop version happens on x86. >> >>>Also it's know that the alignment peeling >>> code needs some serious TLC (it's outcome depends on the order of DRs, >>> the cost model it uses leaves to be desired as we cannot distinguish >>> between unaligned load and store costs). >> >> Yet another reason to turn it off as it is not effective anyways? > > As said I'll disable all remains of -ftree-vect-loop-version with the cost > model > patch because it wasn't guarding versioning for aliasing but only versioning > for alignment. > > We have to be consistent here - if we add a way to disable peeling for > alignment then we certainly don't want to remove the ability to disable > versioning for alignment, no?
We already have the ability to turn off versioning -- via --param. It is a more natural way to fine tune a pass instead of introducing a -f option. For this reason, your planned deprecation of the option is a good thing to do. For consistency, I think we should introduce a new parameter to turn on/off peeling. This can also be tied closely with the cost model change. The proposed patch attached. Does this one look ok? thanks, David > > Richard. > >> >> thanks, >> >> David >> >>> >>> Richard. >>> >>>>> >>>>> I've stopped a quick try doing 1) myself because >>>>> >>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options >>>>> opts->x_flag_ipa_reference = false; >>>>> break; >>>>> >>>>> + case OPT_ftree_vectorize: >>>>> + if (!opts_set->x_flag_tree_loop_vectorize) >>>>> + opts->x_flag_tree_loop_vectorize = value; >>>>> + if (!opts_set->x_flag_tree_slp_vectorize) >>>>> + opts->x_flag_tree_slp_vectorize = value; >>>>> + break; >>>>> >>>>> doesn't look obviously correct. Does that handle >>>>> >>>>> -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize >>>>> >>>>> or >>>>> >>>>> -ftree-loop-vectorize -fno-tree-vectorize >>>>> >>>>> properly? Currently at least >>>>> >>>>> -ftree-slp-vectorize -fno-tree-vectorize >>>>> >>>>> doesn't "work". >>>> >>>> >>>> Right -- same is true for -fprofile-use option. FDO enables some >>>> passes, but can not re-enable them if they are flipped off before. >>>> >>>>> >>>>> That said, the option machinery doesn't handle an option being an alias >>>>> for two other options, so it's mechanism to contract positives/negatives >>>>> doesn't work here and the override hooks do not work reliably for >>>>> repeated options. >>>>> >>>>> Or am I wrong here? Should we care at all? Joseph? >>>> >>>> We should probably just document the behavior. Even better, we should >>>> deprecate the old option. >>>> >>>> thanks, >>>> >>>> David >>>> >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> >>>>>> thanks, >>>>>> >>>>>> David
Index: doc/invoke.texi =================================================================== --- doc/invoke.texi (revision 202660) +++ doc/invoke.texi (working copy) @@ -9451,6 +9451,9 @@ The maximum number of run-time checks th doing loop versioning for alias in the vectorizer. See option @option{-ftree-vect-loop-version} for more information. +@item vect-do-peeling-for-alignment +Turn on/off alignment enhancing loop peeling for vectorizer. + @item max-iterations-to-track The maximum number of iterations of a loop the brute-force algorithm for analysis of the number of iterations of the loop tries to evaluate. Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 202660) +++ tree-vect-data-refs.c (working copy) @@ -1349,6 +1349,7 @@ vect_enhance_data_refs_alignment (loop_v unsigned int i, j; bool do_peeling = false; bool do_versioning = false; + int peeling_enabled; bool stat; gimple stmt; stmt_vec_info stmt_info; @@ -1396,6 +1397,7 @@ vect_enhance_data_refs_alignment (loop_v - The cost of peeling (the extra runtime checks, the increase in code size). */ + peeling_enabled = PARAM_VALUE (PARAM_VECT_DO_PEELING_FOR_ALIGNMENT); FOR_EACH_VEC_ELT (datarefs, i, dr) { stmt = DR_STMT (dr); @@ -1420,7 +1422,7 @@ vect_enhance_data_refs_alignment (loop_v continue; supportable_dr_alignment = vect_supportable_dr_alignment (dr, true); - do_peeling = vector_alignment_reachable_p (dr); + do_peeling = (peeling_enabled && vector_alignment_reachable_p (dr)); if (do_peeling) { if (known_alignment_for_access_p (dr)) Index: params.def =================================================================== --- params.def (revision 202660) +++ params.def (working copy) @@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA "Bound on number of runtime checks inserted by the vectorizer's loop versioning for alias check", 10, 0, 0) +DEFPARAM(PARAM_VECT_DO_PEELING_FOR_ALIGNMENT, + "vect-do-peeling-for-alignment", + "Do loop peeling to enhancement alignment of data references in a loop", + 1, 0, 0) + DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS, "max-cselib-memory-locations", "The maximum memory locations recorded by cselib",