It has been 8 months since this patch is posted. I have addressed all comments to this patch.
The SAD pattern is very useful for some multimedia algorithms like ffmpeg. This patch will greatly improve the performance of such algorithms. Could you please have a look again and check if it is OK for the trunk? If it is necessary I can re-post this patch in a new thread. Thank you! Cong On Tue, Dec 17, 2013 at 10:04 AM, Cong Hou <co...@google.com> wrote: > > Ping? > > > thanks, > Cong > > > On Mon, Dec 2, 2013 at 5:06 PM, Cong Hou <co...@google.com> wrote: > > Hi Richard > > > > Could you please take a look at this patch and see if it is ready for > > the trunk? The patch is pasted as a text file here again. > > > > Thank you very much! > > > > > > Cong > > > > > > On Mon, Nov 11, 2013 at 11:25 AM, Cong Hou <co...@google.com> wrote: > >> Hi James > >> > >> Sorry for the late reply. > >> > >> > >> On Fri, Nov 8, 2013 at 2:55 AM, James Greenhalgh > >> <james.greenha...@arm.com> wrote: > >>>> On Tue, Nov 5, 2013 at 9:58 AM, Cong Hou <co...@google.com> wrote: > >>>> > Thank you for your detailed explanation. > >>>> > > >>>> > Once GCC detects a reduction operation, it will automatically > >>>> > accumulate all elements in the vector after the loop. In the loop the > >>>> > reduction variable is always a vector whose elements are reductions of > >>>> > corresponding values from other vectors. Therefore in your case the > >>>> > only instruction you need to generate is: > >>>> > > >>>> > VABAL ops[3], ops[1], ops[2] > >>>> > > >>>> > It is OK if you accumulate the elements into one in the vector inside > >>>> > of the loop (if one instruction can do this), but you have to make > >>>> > sure other elements in the vector should remain zero so that the final > >>>> > result is correct. > >>>> > > >>>> > If you are confused about the documentation, check the one for > >>>> > udot_prod (just above usad in md.texi), as it has very similar > >>>> > behavior as usad. Actually I copied the text from there and did some > >>>> > changes. As those two instruction patterns are both for vectorization, > >>>> > their behavior should not be difficult to explain. > >>>> > > >>>> > If you have more questions or think that the documentation is still > >>>> > improper please let me know. > >>> > >>> Hi Cong, > >>> > >>> Thanks for your reply. > >>> > >>> I've looked at Dorit's original patch adding WIDEN_SUM_EXPR and > >>> DOT_PROD_EXPR and I see that the same ambiguity exists for > >>> DOT_PROD_EXPR. Can you please add a note in your tree.def > >>> that SAD_EXPR, like DOT_PROD_EXPR can be expanded as either: > >>> > >>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) > >>> tmp2 = ABS_EXPR (tmp) > >>> arg3 = PLUS_EXPR (tmp2, arg3) > >>> > >>> or: > >>> > >>> tmp = WIDEN_MINUS_EXPR (arg1, arg2) > >>> tmp2 = ABS_EXPR (tmp) > >>> arg3 = WIDEN_SUM_EXPR (tmp2, arg3) > >>> > >>> Where WIDEN_MINUS_EXPR is a signed MINUS_EXPR, returning a > >>> a value of the same (widened) type as arg3. > >>> > >> > >> > >> I have added it, although we currently don't have WIDEN_MINUS_EXPR (I > >> mentioned it in tree.def). > >> > >> > >>> Also, while looking for the history of DOT_PROD_EXPR I spotted this > >>> patch: > >>> > >>> [autovect] [patch] detect mult-hi and sad patterns > >>> http://gcc.gnu.org/ml/gcc-patches/2005-10/msg01394.html > >>> > >>> I wonder what the reason was for that patch to be dropped? > >>> > >> > >> It has been 8 years.. I have no idea why this patch is not accepted > >> finally. There is even no reply in that thread. But I believe the SAD > >> pattern is very important to be recognized. ARM also provides > >> instructions for it. > >> > >> > >> Thank you for your comment again! > >> > >> > >> thanks, > >> Cong > >> > >> > >> > >>> Thanks, > >>> James > >>>