Any more comments?


thanks,
Cong


On Wed, Nov 13, 2013 at 6:06 PM, Cong Hou <co...@google.com> wrote:
> Ping?
>
>
> thanks,
> Cong
>
>
> On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote:
>> Hi James
>>
>> Sorry for the late reply.
>>
>>
>> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh
>> <james.greenha...@arm.com> wrote:
>>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote:
>>>> > Thank you for your detailed explanation.
>>>> >
>>>> > Once GCC detects a reduction operation, it will automatically
>>>> > accumulate all elements in the vector after the loop. In the loop the
>>>> > reduction variable is always a vector whose elements are reductions of
>>>> > corresponding values from other vectors. Therefore in your case the
>>>> > only instruction you need to generate is:
>>>> >
>>>> >     VABAL   ops[3], ops[1], ops[2]
>>>> >
>>>> > It is OK to accumulate everything into a single element of the vector
>>>> > inside the loop (if one instruction can do this), but you have to make
>>>> > sure the other elements of the vector remain zero so that the final
>>>> > result is correct.
>>>> >
>>>> > If the documentation is confusing, check the entry for
>>>> > udot_prod (just above usad in md.texi), as it behaves very similarly
>>>> > to usad. In fact, I copied the text from there and made some
>>>> > changes. As those two instruction patterns are both for vectorization,
>>>> > their behavior should not be difficult to explain.
>>>> >
>>>> > If you have more questions or think that the documentation is still
>>>> > inadequate, please let me know.
>>>
>>> Hi Cong,
>>>
>>> Thanks for your reply.
>>>
>>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and
>>> DOT_PROD_EXPR and I see that the same ambiguity exists for
>>> DOT_PROD_EXPR. Can you please add a note in your tree.def
>>> that SAD_EXPR, like DOT_PROD_EXPR, can be expanded as either:
>>>
>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>   tmp2 = ABS_EXPR (tmp)
>>>   arg3 = PLUS_EXPR (tmp2, arg3)
>>>
>>> or:
>>>
>>>   tmp = WIDEN_MINUS_EXPR (arg1, arg2)
>>>   tmp2 = ABS_EXPR (tmp)
>>>   arg3 = WIDEN_SUM_EXPR (tmp2, arg3)
>>>
>>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a
>>> value of the same (widened) type as arg3.
>>>
>>
>>
>> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I
>> mentioned it in tree.def).
>>
>>
>>> Also, while looking for the history of DOT_PROD_EXPR I spotted this
>>> patch:
>>>
>>>   [autovect] [patch] detect mult-hi and sad patterns
>>>   http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html
>>>
>>> I wonder what the reason was for that patch to be dropped?
>>>
>>
>> It has been 8 years. I have no idea why that patch was never
>> accepted. There was not even a reply in that thread. But I believe it
>> is very important to recognize the SAD pattern. ARM also provides
>> instructions for it.
>>
>>
>> Thank you again for your comments!
>>
>>
>> thanks,
>> Cong
>>
>>
>>
>>> Thanks,
>>> James
>>>
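
To make the reduction behavior described earlier in the thread concrete, here is a minimal sketch in plain C, not actual vectorizer output. It assumes a hypothetical 4-lane vector (LANES) and emulates the vector reduction variable with a small array; the function names are made up for illustration. Each lane accumulates its own partial sum inside the loop, and the lanes are only added together after the loop.

```c
#include <stdint.h>
#include <stdlib.h>

#define LANES 4  /* hypothetical vector width, chosen for illustration */

/* Scalar reference: sum of absolute differences over n bytes. */
uint32_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* Emulates the vectorized form: acc[] plays the role of the vector
   reduction variable, and its lanes are only summed after the loop. */
uint32_t sad_lanes(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t acc[LANES] = {0};
    size_t i = 0;

    /* Main loop: each lane accumulates its own partial result. */
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += (uint32_t)abs((int)a[i + l] - (int)b[i + l]);

    /* Epilogue: the final reduction across lanes. */
    uint32_t sum = 0;
    for (size_t l = 0; l < LANES; l++)
        sum += acc[l];

    /* Scalar tail for elements that do not fill a whole vector. */
    for (; i < n; i++)
        sum += (uint32_t)abs((int)a[i] - (int)b[i]);

    return sum;
}
```

Because the epilogue sums every lane, it does not matter how the partial results are distributed across the lanes inside the loop, as long as their total is right. That is why accumulating everything into one lane is fine only if the other lanes stay zero.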

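The expansion sequence James describes for SAD_EXPR can be sketched per element in scalar C as below. This is only an illustration under assumptions: unsigned 8-bit inputs and a signed 32-bit accumulator are chosen arbitrarily, the function names are hypothetical, and WIDEN_MINUS_EXPR is a notional operation here since, as noted in the thread, it does not currently exist as a tree code.

```c
#include <stdint.h>

/* One element-wise step of the sequence:
     tmp  = WIDEN_MINUS_EXPR (arg1, arg2)
     tmp2 = ABS_EXPR (tmp)
     arg3 = PLUS_EXPR (tmp2, arg3)
   The subtraction is done in the widened type of arg3. */
static inline int32_t sad_step(uint8_t arg1, uint8_t arg2, int32_t arg3)
{
    int32_t tmp  = (int32_t)arg1 - (int32_t)arg2; /* widening signed subtract */
    int32_t tmp2 = tmp < 0 ? -tmp : tmp;          /* absolute value */
    return tmp2 + arg3;                           /* accumulate */
}

/* For contrast: if the difference is truncated back to 8 bits before
   taking the absolute value, a negative difference wraps around and
   the accumulated result is wrong. */
static inline int32_t sad_step_narrow(uint8_t arg1, uint8_t arg2, int32_t arg3)
{
    uint8_t tmp = (uint8_t)(arg1 - arg2);         /* wraps modulo 256 */
    return (int32_t)tmp + arg3;
}
```

With arg1 = 3 and arg2 = 10, the widened version adds 7 to the accumulator, while the narrow version adds 249. This is the reason the subtraction has to be a widening, signed operation.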