On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote: > > > > On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> > > wrote: > > > >> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: > >> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh > >> <james.greenha...@arm.com> wrote: > >>> > >>> Hi, > >>> > >>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as > >>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as > >>> the hardware. That is to say, the pseudo-code sequence: > >> > >> > >> Only for signed integer types. You should be able to use an unsigned > >> integer type here instead. > > > > If anything, I think that puts us in a worse position. > > Not if you cast it back. > > > > The issue that > > inspires this patch is that GCC will happily fold: > > > > t1 = ABS_EXPR (x) > > t2 = GE_EXPR (t1, 0) > > > > to > > > > t2 = TRUE > > > > Surely an unsigned integer type is going to suffer the same fate? Certainly > > I > > can imagine somewhere in the compiler there being a fold path for: > > Yes but if add a cast from the unsigned type to the signed type gcc does not > optimize that. If it does it is a bug since the overflow is defined there.
I'm not sure I understand, are you saying I want to fold to: t1 = VIEW_CONVERT_EXPR (x, unsigned) t2 = ABS_EXPR (t1) t3 = VIEW_CONVERT_EXPR (t2, signed) Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each other out leading to an overall NOP? It might just be Friday morning and a lack of coffee talking, but I think I need you to spell this one out to me in big letters! > > > > (unsigned >= 0) == TRUE > > > >>> > >>> a = vabs_s8 (vdup_n_s8 (-128)); > >>> assert (a >= 0); > >>> > >>> does not hold. As in hardware > >>> > >>> abs (-128) == -128 > >>> > >>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should > >>> avoid > >>> it. In fact, we have to be even more careful than that, and keep the > >>> integer > >>> vabs intrinsics as an unspec in the back end. > >> > >> No it is not. The mistake is to use signed integer types here. Just > >> add a conversion to an unsigned integer vector and it will work > >> correctly. > >> In fact the ABS rtl code is not undefined for the overflow. > > > > Here we are covering ourselves against a seperate issue. For auto-vectorized > > code we want the SABD combine patterns to kick in whenever sensible. For > > intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: > > > > vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) > > > > So in this case, the combine would be erroneous. Likewise SABA. > > This sounds like it would problematic for unsigned types and not just for > vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8 > instead. Since in rtl overflow and underflow is defined to be wrapping. There are no vabs_u8/vabd_u8 so I don't see how we can reach this point with unsigned types. Further, I have never thought of RTL having signed and unsigned types, just a bag of bits. We'll want to use unspec for the intrinsic version of vabd_s8 - but we'll want to specify the (abs (minus (reg) (reg))) behaviour so that auto-vectorized code can pick it up. So in the end we'll have these patterns: (abs (abs (reg))) (intrinsic_abs (unspec [(reg)] UNSPEC_ABS)) (abd (abs (minus (reg) (reg)))) (intrinsic_abd (unspec [(reg) (reg)] UNSPEC_ABD)) (aba (plus (abs (minus (reg) (reg))) (reg))) (intrinsic_aba (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg))) which should give us reasonable auto-vectorized code without triggering any of the issues mapping the semantics of the instructions to intrinsics. Thanks, James > > Thanks, > Andrew Pinski > > > > > Thanks, > > James >