https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108583

--- Comment #18 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> > 
> > Ack, that also tracks with what I tried before, we don't indeed track ranges
> > for vector ops. The general case can still be handled slightly better
> > (I think) but it doesn't become as clear of a win as this one.
> > 
> > > You probably did so elsewhere some time ago, but what exactly are those
> > > four instructions?  (pointers to specifications appreciated)
> > 
> > For NEON we use:
> > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/ADDHN--ADDHN2--Add-returning-High-Narrow-
> 
> so that's an add + pack high
> 

Yes, though with no overflow: the addition is done in twice the precision of
the original type. So it's more a widening add + pack high, which narrows it
back and zero extends.

> > https://developer.arm.com/documentation/ddi0596/2021-12/SIMD-FP-Instructions/UADDW--UADDW2--Unsigned-Add-Wide-
> 
> and that unpacks (zero-extends) the high/low part of one operand of an add
> 
> I wonder if we'd open-code the pack / unpack and use regular add whether
> combine can synthesize uaddw and addhn?  The pack and unpack would be
> vec_perms on GIMPLE (plus V_C_E).

I don't think so for addhn, because it doesn't truncate the top bits: it
truncates the bottom bits and keeps the high half.

The instruction does
    element1 = Elem[operand1, e, 2*esize];
    element2 = Elem[operand2, e, 2*esize];

So it widens on input. 
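
As a rough scalar sketch of the above (one lane only, and assuming 16-bit
inputs narrowing to 8 bits; the function names are made up for illustration
and are not GCC or ACLE names), the difference between addhn, uaddw and a
plain truncating pack looks something like this:

```c
#include <stdint.h>

/* Hypothetical scalar model of one ADDHN lane (16-bit in, 8-bit out).
   The add happens at the wide width and the HIGH half is kept.  */
static inline uint8_t addhn_lane_u16(uint16_t a, uint16_t b)
{
    uint16_t sum = (uint16_t)(a + b);   /* wraps modulo 2^16 */
    return (uint8_t)(sum >> 8);         /* drop the bottom 8 bits */
}

/* Hypothetical scalar model of one UADDW lane: the narrow operand is
   zero-extended and added to the wide operand.  */
static inline uint16_t uaddw_lane_u8(uint16_t wide, uint8_t narrow)
{
    return (uint16_t)(wide + (uint16_t)narrow);
}

/* For contrast: a regular add followed by a truncating pack keeps the
   LOW half, i.e. it drops the top bits instead.  */
static inline uint8_t add_pack_low_lane_u16(uint16_t a, uint16_t b)
{
    return (uint8_t)(a + b);
}
```

With a = 0x1234 and b = 0x0101 the wide sum is 0x1335, so the addhn model
returns 0x13 while the truncating pack returns 0x35.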

> 
> So the difficulty here will be to decide whether that's in the end
> better than what the pattern handling code does now, right?  Because
> I think most targets will be able to do the above but lacking the
> special adds it will be slower because of the extra packing/unpacking?
> 
> That said, can we possibly do just that costing (would be a first in
> the pattern code I guess) with a target hook?  Or add optabs for
> the addh operations so we can query support?

We could. The alternative wouldn't be correct for costing, I think: if we
generate *+ , vec_perm that's going to be more expensive.
