OK. Thank you very much for your review, Richard! thanks, Cong
On Tue, Jun 24, 2014 at 4:19 AM, Richard Biener <richard.guent...@gmail.com> wrote: > On Tue, Dec 3, 2013 at 2:06 AM, Cong Hou <co...@google.com> wrote: >> Hi Richard >> >> Could you please take a look at this patch and see if it is ready for >> the trunk? The patch is pasted as a text file here again. > > (found it) > > The patch is ok for trunk. (please consider re-testing before you commit) > > Thanks, > Richard. > >> Thank you very much! >> >> >> Cong >> >> >> On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote: >>> Hi James >>> >>> Sorry for the late reply. >>> >>> >>> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh >>> <james.greenha...@arm.com> wrote: >>>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote: >>>>> > Thank you for your detailed explanation. >>>>> > >>>>> > Once GCC detects a reduction operation, it will automatically >>>>> > accumulate all elements in the vector after the loop. In the loop the >>>>> > reduction variable is always a vector whose elements are reductions of >>>>> > corresponding values from other vectors. Therefore in your case the >>>>> > only instruction you need to generate is: >>>>> > >>>>> > VABAL ops[3], ops[1], ops[2] >>>>> > >>>>> > It is OK if you accumulate the elements into one in the vector inside >>>>> > of the loop (if one instruction can do this), but you have to make >>>>> > sure other elements in the vector should remain zero so that the final >>>>> > result is correct. >>>>> > >>>>> > If you are confused about the documentation, check the one for >>>>> > udot_prod (just above usad in md.texi), as it has very similar >>>>> > behavior as usad. Actually I copied the text from there and did some >>>>> > changes. As those two instruction patterns are both for vectorization, >>>>> > their behavior should not be difficult to explain. >>>>> > >>>>> > If you have more questions or think that the documentation is still >>>>> > improper please let me know. >>>> >>>> Hi Cong, >>>> >>>> Thanks for your reply. >>>> >>>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and >>>> DOT_PROD_EXPR and I see that the same ambiguity exists for >>>> DOT_PROD_EXPR. Can you please add a note in your tree.def >>>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either: >>>> >>>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >>>> tmp2 = ABS_EXPR (tmp) >>>> arg3 = PLUS_EXPR (tmp2, arg3) >>>> >>>> or: >>>> >>>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) >>>> tmp2 = ABS_EXPR (tmp) >>>> arg3 = WIDEN_SUM_EXPR (tmp2, arg3) >>>> >>>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a >>>> a value of the same (widened) type as arg3. >>>> >>> >>> >>> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I >>> mentioned it in tree.def). >>> >>> >>>> Also, while looking for the history of DOT_PROD_EXPR I spotted this >>>> patch: >>>> >>>> [autovect] [patch] detect mult-hi and sad patterns >>>> http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html >>>> >>>> I wonder what the reason was for that patch to be dropped? >>>> >>> >>> It has been 8 years.. I have no idea why this patch is not accepted >>> finally. There is even no reply in that thread. But I believe the SAD >>> pattern is very important to be recognized. ARM also provides >>> instructions for it. >>> >>> >>> Thank you for your comment again! >>> >>> >>> thanks, >>> Cong >>> >>> >>> >>>> Thanks, >>>> James >>>>