Re: [AArch64] Fix integer vabs intrinsics

James Greenhalgh Fri, 02 May 2014 03:29:26 -0700

On Fri, May 02, 2014 at 10:29:06AM +0100, pins...@gmail.com wrote:
> 
> 
> > On May 2, 2014, at 2:21 AM, James Greenhalgh <james.greenha...@arm.com> 
> > wrote:
> > 
> >> On Fri, May 02, 2014 at 10:00:15AM +0100, Andrew Pinski wrote:
> >> On Fri, May 2, 2014 at 1:48 AM, James Greenhalgh
> >> <james.greenha...@arm.com> wrote:
> >>> 
> >>> Hi,
> >>> 
> >>> Unlike the mid-end's concept of an ABS_EXPR, which treats overflow as
> >>> undefined/impossible, the neon intrinsics vabs intrinsics should behave as
> >>> the hardware. That is to say, the pseudo-code sequence:
> >> 
> >> 
> >> Only for signed integer types.  You should be able to use an unsigned
> >> integer type here instead.
> > 
> > If anything, I think that puts us in a worse position.
> 
> Not if you cast it back. 
> 
> 
> > The issue that
> > inspires this patch is that GCC will happily fold:
> > 
> >  t1 = ABS_EXPR (x)
> >  t2 = GE_EXPR (t1, 0)
> > 
> > to
> > 
> >  t2 = TRUE
> > 
> > Surely an unsigned integer type is going to suffer the same fate? Certainly 
> > I
> > can imagine somewhere in the compiler there being a fold path for:
> 
> Yes but if add a cast from the unsigned type to the signed type gcc does not
> optimize that. If it does it is a bug since the overflow is defined there.


I'm not sure I understand, are you saying I want to fold to:

  t1 = VIEW_CONVERT_EXPR (x, unsigned)
  t2 = ABS_EXPR (t1)
  t3 = VIEW_CONVERT_EXPR (t2, signed)

Surely ABS_EXPR (unsigned) is a nop, and the two VIEW_CONVERTs cancel each
other out leading to an overall NOP? It might just be Friday morning and a
lack of coffee talking, but I think I need you to spell this one out to
me in big letters!

> > 
> >  (unsigned >= 0) == TRUE
> > 
> >>> 
> >>>  a = vabs_s8 (vdup_n_s8 (-128));
> >>>  assert (a >= 0);
> >>> 
> >>> does not hold. As in hardware
> >>> 
> >>>  abs (-128) == -128
> >>> 
> >>> Folding vabs intrinsics to an ABS_EXPR is thus a mistake, and we should 
> >>> avoid
> >>> it. In fact, we have to be even more careful than that, and keep the 
> >>> integer
> >>> vabs intrinsics as an unspec in the back end.
> >> 
> >> No it is not.  The mistake is to use signed integer types here.  Just
> >> add a conversion to an unsigned integer vector and it will work
> >> correctly.
> >> In fact the ABS rtl code is not undefined for the overflow.
> > 
> > Here we are covering ourselves against a seperate issue. For auto-vectorized
> > code we want the SABD combine patterns to kick in whenever sensible. For
> > intrinsics code, in the case where vsub_s8 (x, y) would cause an underflow:
> > 
> >  vabs_s8 (vsub_s8 (x, y)) != vabd_s8 (x, y)
> > 
> > So in this case, the combine would be erroneous. Likewise SABA.
> 
> This sounds like it would problematic for unsigned types  and not just for
> vabs_s8 with vsub_s8. So I think you should be using unspec for vabd_s8
> instead. Since in rtl overflow and underflow is defined to be wrapping. 

There are no vabs_u8/vabd_u8 so I don't see how we can reach this point
with unsigned types. Further, I have never thought of RTL having signed
and unsigned types, just a bag of bits. We'll want to use unspec for the
intrinsic version of vabd_s8 - but we'll want to specify the

  (abs (minus (reg) (reg)))

behaviour so that auto-vectorized code can pick it up.

So in the end we'll have these patterns:

  (abs
    (abs (reg)))

  (intrinsic_abs
    (unspec [(reg)] UNSPEC_ABS))

  (abd
    (abs (minus (reg) (reg))))

  (intrinsic_abd
    (unspec [(reg) (reg)] UNSPEC_ABD))

  (aba
    (plus (abs (minus (reg) (reg))) (reg)))

  (intrinsic_aba
    (plus (unspec [(reg) (reg)] UNSPEC_ABD) (reg)))

which should give us reasonable auto-vectorized code without triggering any
of the issues mapping the semantics of the instructions to intrinsics.

Thanks,
James

> 
> Thanks,
> Andrew Pinski
> 
> > 
> > Thanks,
> > James
>

Re: [AArch64] Fix integer vabs intrinsics

Reply via email to