On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote: > On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh > <james.greenha...@arm.com> wrote: > > > > Hi, > > > > Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as > > undefined/impossible, the neon intrinsics vabs intrinsics should behave as > > the hardware. That is to say, the pseudo-code sequence: > > > Only for signed integer types. You should be able to use an unsigned > integer type here instead.
If anything, I think that puts us in a worse position. The issue that inspires this patch is that GCC will happily fold: t1 = ABS_EXPR (x) t2 = GE_EXPR (t1, 0) to t2 = TRUE Surely an unsigned integer type is going to suffer the same fate? Certainly I can imagine somewhere in the compiler there being a fold path for: (unsigned >= 0) == TRUE > > > > a = vabs_s8 (vdup_n_s8 (-128)); > > assert (a >= 0); > > > > does not hold. As in hardware > > > > abs (-128) == -128 > > > > Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should > > avoid > > it. In fact, we have to be even more careful than that, and keep the integer > > vabs intrinsics as an unspec in the back end. > > No it is not. The mistake is to use signed integer types here. Just > add a conversion to an unsigned integer vector and it will work > correctly. > In fact the ABS rtl code is not undefined for the overflow. Here we are covering ourselves against a seperate issue. For auto-vectorized code we want the SABD combine patterns to kick in whenever sensible. For intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow: vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y) So in this case, the combine would be erroneous. Likewise SABA. Thanks, James