Re: [PATCH, vec-tails] Support loop epilogue vectorization

Bin.Cheng Wed, 09 Nov 2016 04:41:28 -0800

On Wed, Nov 9, 2016 at 12:12 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> I am familiar with SVE extension and understand that implemented
> approach might be not suitable for ARM. The main point is that only
> load/store instructions are masked but all other calculations are not
> (we did special conversion for reduction statements to implement
> merging predication semantic). For SVE peeling for niters is not
> required but it is not true for x86 -  we must determine what
> vectorization scheme is more profitable: loop combining (the only
> essential for SVE) or separate epilogue vectorization using masking or
> less vectorization factor. So I'd like to have the full list of
> required changes to our implementation to try to remove them.
Hmm, sorry that my comment gave impression that I was trying to hold
back the patch, it's not what I meant by any means.  Also it's not
related to SVE, As a matter of fact, I haven't read any document about
SVE yet.  Sorry again for the false impression conveyed by previous
messages.


Thanks,
bin
>
> Thanks.
> Yuri.
>
> 2016-11-09 14:46 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>:
>> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Thanks Richard for your comments.
>>> Your proposed to handle epilogue loop just like normal short-trip loop
>>> but this is not exactly truth since e.g. epilogue must not be peeled
>>> for alignment.
>> Yes there must be some differences, my motivation is to minimize that
>> so we don't need to specially check normal/epilogue loops at too many
>> places.
>> Of course it's just my feeling when going through the patch set, and
>> could be wrong.
>>
>> Thanks,
>> bin
>>>
>>> It is not clear for me what are my next steps? Should I re-design the
>>> patch completely or simply decompose the whole patch to different
>>> parts? But it means that we must start review process from beginning
>>> but release is closed to its end.
>>> Note also that i work for Intel till the end of year and have not idea
>>> who will continue working on this project.
>>>
>>> Any help will be appreciated.
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.ch...@gmail.com>:
>>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Hi All,
>>>>>
>>>>> I re-send all patches sent by Ilya earlier for review which support
>>>>> vectorization of loop epilogues and loops with low trip count. We
>>>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not
>>>>> approved by Jeff.
>>>>>
>>>>> I did re-base of all patches and performed bootstrapping and
>>>>> regression testing that did not show any new failures. Also all
>>>>> changes related to new vect_do_peeling algorithm have been changed
>>>>> accordingly.
>>>>>
>>>>> Is it OK for trunk?
>>>>
>>>> Hi,
>>>> I can't approve patches, but had some comments after going through the
>>>> implementation.
>>>>
>>>> One confusing part is cost model change, as well as the way it's used
>>>> to decide how epilogue loop should be vectorized.  Given vect-tail is
>>>> disabled at the moment and the cost change needs further tuning, is it
>>>> reasonable to split this part out and get vectorization part
>>>> reviewed/submitted independently?  For example, let user specified
>>>> parameters make the decision for now.  Cost and target dependent
>>>> changes should go in at last, this could make the patch easier to
>>>> read.
>>>>
>>>> The implementation computes/shares quite amount information from main
>>>> loop to epilogue loop vectorization.  Furthermore, variables/fields
>>>> for such information are somehow named in a misleading way.  For
>>>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the
>>>> flag controlling whether epilogue loop should be vectorized with
>>>> masking.  However, it's actually controlled by exactly the same flag
>>>> as whether epilogue loop should be combined into the main loop with
>>>> masking:
>>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>>
>>>>    slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>>
>>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>>> +    vect_combine_loop_epilogue (loop_vinfo);
>>>> +
>>>>    /* Reduce loop iterations by the vectorization factor.  */
>>>>    scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>>                expected_iterations / vf);
>>>>
>>>> IMHO, we should decouple main loop vectorization and epilogue
>>>> vectorization as much as possible by sharing as few information as we
>>>> can.  The general idea is to handle epilogue loop just like normal
>>>> short-trip loop.  For example, we can rename
>>>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something
>>>> else), and we don't differentiate its meaning between main and
>>>> epilogue(short-trip) loop.  It only indicates the current loop should
>>>> be vectorized with masking no matter it's a main loop or epilogue
>>>> loop, and it works just like the current implementation.
>>>>
>>>> After this change, we can refine vectorization and make it more
>>>> general for normal loop and epilogue(short trip) loop.  For example,
>>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER  for epilogue
>>>> loop and use it to control how it should be vectorized:
>>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>>> +    {
>>>> +      LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>>> +      LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>>>> +    }
>>>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>>>> +       && min_profitable_combine_iters >= 0)
>>>> +    {
>>>>
>>>> This works, but not that good for understanding or maintaining.
>>>>
>>>> Thanks,
>>>> bin

Re: [PATCH, vec-tails] Support loop epilogue vectorization

Reply via email to