Re: New GCC options for loop vectorization

Xinliang David Li Wed, 18 Sep 2013 13:21:38 -0700

On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davi...@google.com> 
> wrote:
>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davi...@google.com> 
>>> wrote:
>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li <davi...@google.com> 
>>>>> wrote:
>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>> logic for default O3 setting is also complicated.
>>>>>>
>>>>>> In this patch, two new options are introduced:
>>>>>>
>>>>>> 1) -ftree-loop-vectorize
>>>>>>
>>>>>> This option is used to turn on loop vectorization only. option
>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>> -ftree-slp-vectorize.
>>>>>>
>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>
>>>>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>
>>>>>> With the new change it becomes:
>>>>>>
>>>>>>     -O3 -fno-loop-vectorize
>>>>>>
>>>>>>
>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>
>>>>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>
>>>>>> The new way is
>>>>>>
>>>>>>     -O2 -ftree-loop-vectorize
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>
>>>>>> This option is used to turn on/off loop peeling for alignment.  In the
>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>> Richard.  This option is also useful in scenarios where peeling can
>>>>>> introduce runtime problems:
>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>>>>> common in practice.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>
>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not on 
>>>>> 2).
>>>>
>>>> Ok. Can you also comment on 2) ?
>>>
>>> I think we want to decide how granular we want to control the vectorizer
>>> and using which mechanism.  My cost-model re-org makes
>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>> a step backwards in this context.
>>
>> Using cost model to do a coarse grain control/configuration is
>> certainly something we want, but having a fine grain control is still
>> useful.
>>
>>>
>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>> you'd want to be able to disable separately?
>>
>> Loop peeling seems to be the main one. There is also a correctness
>> issue related. For instance, the following code is common in practice,
>> but loop peeling wrongly assumes initial base-alignment and generates
>> aligned mov instruction after peeling, leading to SEGV.  Peeling is
>> not something we can blindly turned on -- even when it is on, there
>> should be a way to turn it off explicitly:
>>
>> char a[10000];
>>
>> void foo(int n)
>> {
>>   int* b = (int*)(a+n);
>>   int i = 0;
>>   for (; i < 1000; ++i)
>>     b[i] = 1;
>> }
>>
>> int main(int argn, char** argv)
>> {
>>   foo(argn);
>> }
>
> But that's just a bug that should be fixed (looking into it).
>
>>>  Just disabling peeling for
>>> alignment may get you into the versioning for alignment path (and thus
>>> an unvectorized loop at runtime).
>>
>> This is not true for target supporting mis-aligned access. I have not
>> seen a case where alignment driver loop version happens on x86.
>>
>>>Also it's know that the alignment peeling
>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>> the cost model it uses leaves to be desired as we cannot distinguish
>>> between unaligned load and store costs).
>>
>> Yet another reason to turn it off as it is not effective anyways?
>
> As said I'll disable all remains of -ftree-vect-loop-version with the cost 
> model
> patch because it wasn't guarding versioning for aliasing but only versioning
> for alignment.
>
> We have to be consistent here - if we add a way to disable peeling for
> alignment then we certainly don't want to remove the ability to disable
> versioning for alignment, no?


We already have the ability to turn off versioning -- via --param. It
is a more natural way to fine tune a pass instead of introducing a -f
option. For this reason, your planned deprecation of the option is a
good thing to do.

For consistency, I think we should introduce a new parameter to turn
on/off peeling. This can also be tied closely with the cost model
change.

The proposed patch attached. Does this one look ok?

thanks,

David



>
> Richard.
>
>>
>> thanks,
>>
>> David
>>
>>>
>>> Richard.
>>>
>>>>>
>>>>> I've stopped a quick try doing 1) myself because
>>>>>
>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>          opts->x_flag_ipa_reference = false;
>>>>>        break;
>>>>>
>>>>> +    case OPT_ftree_vectorize:
>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>> +      break;
>>>>>
>>>>> doesn't look obviously correct.  Does that handle
>>>>>
>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>
>>>>> or
>>>>>
>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>
>>>>> properly?  Currently at least
>>>>>
>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>
>>>>> doesn't "work".
>>>>
>>>>
>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>> passes, but can not re-enable them if they are flipped off before.
>>>>
>>>>>
>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>> doesn't work here and the override hooks do not work reliably for
>>>>> repeated options.
>>>>>
>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>
>>>> We should probably just document the behavior. Even better, we should
>>>> deprecate the old option.
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> David

Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 202660)
+++ doc/invoke.texi     (working copy)
@@ -9451,6 +9451,9 @@ The maximum number of run-time checks th
 doing loop versioning for alias in the vectorizer.  See option
 @option{-ftree-vect-loop-version} for more information.
 
+@item vect-do-peeling-for-alignment
+Turn on/off alignment enhancing loop peeling for vectorizer.
+
 @item max-iterations-to-track
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c       (revision 202660)
+++ tree-vect-data-refs.c       (working copy)
@@ -1349,6 +1349,7 @@ vect_enhance_data_refs_alignment (loop_v
   unsigned int i, j;
   bool do_peeling = false;
   bool do_versioning = false;
+  int peeling_enabled;
   bool stat;
   gimple stmt;
   stmt_vec_info stmt_info;
@@ -1396,6 +1397,7 @@ vect_enhance_data_refs_alignment (loop_v
      - The cost of peeling (the extra runtime checks, the increase
        in code size).  */
 
+  peeling_enabled = PARAM_VALUE (PARAM_VECT_DO_PEELING_FOR_ALIGNMENT);
   FOR_EACH_VEC_ELT (datarefs, i, dr)
     {
       stmt = DR_STMT (dr);
@@ -1420,7 +1422,7 @@ vect_enhance_data_refs_alignment (loop_v
        continue;
 
       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
-      do_peeling = vector_alignment_reachable_p (dr);
+      do_peeling = (peeling_enabled && vector_alignment_reachable_p (dr));
       if (do_peeling)
         {
           if (known_alignment_for_access_p (dr))
Index: params.def
===================================================================
--- params.def  (revision 202660)
+++ params.def  (working copy)
@@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
          "Bound on number of runtime checks inserted by the vectorizer's loop 
versioning for alias check",
          10, 0, 0)
 
+DEFPARAM(PARAM_VECT_DO_PEELING_FOR_ALIGNMENT,
+         "vect-do-peeling-for-alignment",
+         "Do loop peeling to enhancement alignment of data references in a 
loop",
+         1, 0, 0)
+
 DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
         "max-cselib-memory-locations",
         "The maximum memory locations recorded by cselib",

Re: New GCC options for loop vectorization

Reply via email to