Re: New GCC options for loop vectorization

Xinliang David Li Mon, 23 Sep 2013 13:38:18 -0700

Thanks. I modified the patch so that the max allowed peel iterations
can be specified via a parameter. Testing on going. Ok for trunk ?


David

On Mon, Sep 23, 2013 at 4:31 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Wed, Sep 18, 2013 at 10:21 PM, Xinliang David Li <davi...@google.com> 
> wrote:
>> On Tue, Sep 17, 2013 at 1:20 AM, Richard Biener
>> <richard.guent...@gmail.com> wrote:
>>> On Mon, Sep 16, 2013 at 10:24 PM, Xinliang David Li <davi...@google.com> 
>>> wrote:
>>>> On Mon, Sep 16, 2013 at 3:13 AM, Richard Biener
>>>> <richard.guent...@gmail.com> wrote:
>>>>> On Fri, Sep 13, 2013 at 5:16 PM, Xinliang David Li <davi...@google.com> 
>>>>> wrote:
>>>>>> On Fri, Sep 13, 2013 at 1:30 AM, Richard Biener
>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>> On Thu, Sep 12, 2013 at 10:31 PM, Xinliang David Li 
>>>>>>> <davi...@google.com> wrote:
>>>>>>>> Currently -ftree-vectorize turns on both loop and slp vectorizations,
>>>>>>>> but there is no simple way to turn on loop vectorization alone. The
>>>>>>>> logic for default O3 setting is also complicated.
>>>>>>>>
>>>>>>>> In this patch, two new options are introduced:
>>>>>>>>
>>>>>>>> 1) -ftree-loop-vectorize
>>>>>>>>
>>>>>>>> This option is used to turn on loop vectorization only. option
>>>>>>>> -ftree-slp-vectorize also becomes a first class citizen, and no funny
>>>>>>>> business of Init(2) is needed.  With this change, -ftree-vectorize
>>>>>>>> becomes a simple alias to -ftree-loop-vectorize +
>>>>>>>> -ftree-slp-vectorize.
>>>>>>>>
>>>>>>>> For instance, to turn on only slp vectorize at O3, the old way is:
>>>>>>>>
>>>>>>>>      -O3 -fno-tree-vectorize -ftree-slp-vectorize
>>>>>>>>
>>>>>>>> With the new change it becomes:
>>>>>>>>
>>>>>>>>     -O3 -fno-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>> To turn on only loop vectorize at O2, the old way is
>>>>>>>>
>>>>>>>>     -O2 -ftree-vectorize -fno-slp-vectorize
>>>>>>>>
>>>>>>>> The new way is
>>>>>>>>
>>>>>>>>     -O2 -ftree-loop-vectorize
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2) -ftree-vect-loop-peeling
>>>>>>>>
>>>>>>>> This option is used to turn on/off loop peeling for alignment.  In the
>>>>>>>> long run, this should be folded into the cheap cost model proposed by
>>>>>>>> Richard.  This option is also useful in scenarios where peeling can
>>>>>>>> introduce runtime problems:
>>>>>>>> http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html  which happens to be
>>>>>>>> common in practice.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Patch attached. Compiler boostrapped. Ok after testing?
>>>>>>>
>>>>>>> I'd like you to split 1) and 2), mainly because I agree on 1) but not 
>>>>>>> on 2).
>>>>>>
>>>>>> Ok. Can you also comment on 2) ?
>>>>>
>>>>> I think we want to decide how granular we want to control the vectorizer
>>>>> and using which mechanism.  My cost-model re-org makes
>>>>> ftree-vect-loop-version a no-op (basically removes it), so 2) looks like
>>>>> a step backwards in this context.
>>>>
>>>> Using cost model to do a coarse grain control/configuration is
>>>> certainly something we want, but having a fine grain control is still
>>>> useful.
>>>>
>>>>>
>>>>> So, can you summarize what pieces (including versioning) of the vectorizer
>>>>> you'd want to be able to disable separately?
>>>>
>>>> Loop peeling seems to be the main one. There is also a correctness
>>>> issue related. For instance, the following code is common in practice,
>>>> but loop peeling wrongly assumes initial base-alignment and generates
>>>> aligned mov instruction after peeling, leading to SEGV.  Peeling is
>>>> not something we can blindly turned on -- even when it is on, there
>>>> should be a way to turn it off explicitly:
>>>>
>>>> char a[10000];
>>>>
>>>> void foo(int n)
>>>> {
>>>>   int* b = (int*)(a+n);
>>>>   int i = 0;
>>>>   for (; i < 1000; ++i)
>>>>     b[i] = 1;
>>>> }
>>>>
>>>> int main(int argn, char** argv)
>>>> {
>>>>   foo(argn);
>>>> }
>>>
>>> But that's just a bug that should be fixed (looking into it).
>>>
>>>>>  Just disabling peeling for
>>>>> alignment may get you into the versioning for alignment path (and thus
>>>>> an unvectorized loop at runtime).
>>>>
>>>> This is not true for target supporting mis-aligned access. I have not
>>>> seen a case where alignment driver loop version happens on x86.
>>>>
>>>>>Also it's know that the alignment peeling
>>>>> code needs some serious TLC (it's outcome depends on the order of DRs,
>>>>> the cost model it uses leaves to be desired as we cannot distinguish
>>>>> between unaligned load and store costs).
>>>>
>>>> Yet another reason to turn it off as it is not effective anyways?
>>>
>>> As said I'll disable all remains of -ftree-vect-loop-version with the cost 
>>> model
>>> patch because it wasn't guarding versioning for aliasing but only versioning
>>> for alignment.
>>>
>>> We have to be consistent here - if we add a way to disable peeling for
>>> alignment then we certainly don't want to remove the ability to disable
>>> versioning for alignment, no?
>>
>> We already have the ability to turn off versioning -- via --param. It
>> is a more natural way to fine tune a pass instead of introducing a -f
>> option. For this reason, your planned deprecation of the option is a
>> good thing to do.
>>
>> For consistency, I think we should introduce a new parameter to turn
>> on/off peeling. This can also be tied closely with the cost model
>> change.
>>
>> The proposed patch attached. Does this one look ok?
>
> I'd principally support adding a param but rather would for example make
> it limit the number of peeled iterations (zero turning off the feature).  As
> you implemented it it will give false messages for
>
>       supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>       do_peeling = vector_alignment_reachable_p (dr);
>       if (do_peeling)
>         {
> ...
>         }
>       else
>         {
>           if (!aligned_access_p (dr))
>             {
>               if (dump_enabled_p ())
>                 dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                  "vector alignment may not be reachable\n");
>               break;
>             }
>         }
>
> the else path.  I'd restructure this as
>
>      if (!vector_alignment_reachable_p (dr))
>        {
>           if (!aligned_access_p (dr))
>          ....
>           continue;
>        }
>
>      if (known_alignment_for_access_p (dr))
>       {
> ... check param against the number of iterations to peel for this DR ....
>       }
>      else
>       {
> ... check param against max iterations ...
>       }
>
> an alternative is a tri-state param that would either allow no peeling at all,
> just peeling with known number of iterations or all kind of peelings.
>
> Thanks,
> Richard.
>
>
>> thanks,
>>
>> David
>>
>>
>>
>>>
>>> Richard.
>>>
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>>>
>>>>> Richard.
>>>>>
>>>>>>>
>>>>>>> I've stopped a quick try doing 1) myself because
>>>>>>>
>>>>>>> @@ -1691,6 +1695,12 @@ common_handle_option (struct gcc_options
>>>>>>>          opts->x_flag_ipa_reference = false;
>>>>>>>        break;
>>>>>>>
>>>>>>> +    case OPT_ftree_vectorize:
>>>>>>> +      if (!opts_set->x_flag_tree_loop_vectorize)
>>>>>>> + opts->x_flag_tree_loop_vectorize = value;
>>>>>>> +      if (!opts_set->x_flag_tree_slp_vectorize)
>>>>>>> + opts->x_flag_tree_slp_vectorize = value;
>>>>>>> +      break;
>>>>>>>
>>>>>>> doesn't look obviously correct.  Does that handle
>>>>>>>
>>>>>>>   -ftree-vectorize -fno-tree-loop-vectorize -ftree-vectorize
>>>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>   -ftree-loop-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> properly?  Currently at least
>>>>>>>
>>>>>>>   -ftree-slp-vectorize -fno-tree-vectorize
>>>>>>>
>>>>>>> doesn't "work".
>>>>>>
>>>>>>
>>>>>> Right -- same is true for -fprofile-use option. FDO enables some
>>>>>> passes, but can not re-enable them if they are flipped off before.
>>>>>>
>>>>>>>
>>>>>>> That said, the option machinery doesn't handle an option being an alias
>>>>>>> for two other options, so it's mechanism to contract positives/negatives
>>>>>>> doesn't work here and the override hooks do not work reliably for
>>>>>>> repeated options.
>>>>>>>
>>>>>>> Or am I wrong here?  Should we care at all?  Joseph?
>>>>>>
>>>>>> We should probably just document the behavior. Even better, we should
>>>>>> deprecate the old option.
>>>>>>
>>>>>> thanks,
>>>>>>
>>>>>> David
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>>
>>>>>>>> thanks,
>>>>>>>>
>>>>>>>> David

Index: params.def
===================================================================
--- params.def  (revision 202834)
+++ params.def  (working copy)
@@ -544,6 +544,11 @@ DEFPARAM(PARAM_VECT_MAX_VERSION_FOR_ALIA
          "Bound on number of runtime checks inserted by the vectorizer's loop 
versioning for alias check",
          10, 0, 0)
 
+DEFPARAM(PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT,
+         "vect-max-peeling-for-alignment",
+         "Max number of loop peels to enhancement alignment of data references 
in a loop",
+         -1, 0, 0)
+
 DEFPARAM(PARAM_MAX_CSELIB_MEMORY_LOCATIONS,
         "max-cselib-memory-locations",
         "The maximum memory locations recorded by cselib",
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c       (revision 202834)
+++ tree-vect-data-refs.c       (working copy)
@@ -1718,6 +1718,30 @@ vect_enhance_data_refs_alignment (loop_v
 
       if (do_peeling)
         {
+          unsigned max_allowed_peel
+            = PARAM_VALUE (PARAM_VECT_MAX_PEELING_FOR_ALIGNMENT);
+          if (max_allowed_peel != (unsigned)-1)
+            {
+              unsigned max_peel = npeel;
+              if (max_peel == 0)
+                {
+                  gimple dr_stmt = DR_STMT (dr0);
+                  stmt_vec_info vinfo = vinfo_for_stmt (dr_stmt);
+                  tree vtype = STMT_VINFO_VECTYPE (vinfo);
+                  max_peel = TYPE_VECTOR_SUBPARTS (vtype) - 1;
+                }
+              if (max_peel > max_allowed_peel)
+                {
+                  do_peeling = false;
+                  if (dump_enabled_p ())
+                    dump_printf_loc (MSG_NOTE, vect_location,
+                        "Disable peeling, max peels reached: %d\n", max_peel);
+                }
+            }
+        }
+
+      if (do_peeling)
+        {
          stmt_info_for_cost *si;
          void *data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
Index: doc/invoke.texi
===================================================================
--- doc/invoke.texi     (revision 202834)
+++ doc/invoke.texi     (working copy)
@@ -9451,6 +9451,10 @@ The maximum number of run-time checks th
 doing loop versioning for alias in the vectorizer.  See option
 @option{-ftree-vect-loop-version} for more information.
 
+@item vect-max-peeling-for-alignment
+The maximum number of loop peels to enhance access alignment
+for vectorizer. Value -1 means 'no limit'.
+
 @item max-iterations-to-track
 The maximum number of iterations of a loop the brute-force algorithm
 for analysis of the number of iterations of the loop tries to evaluate.

Re: New GCC options for loop vectorization

Reply via email to