Richard Biener <richard.guent...@gmail.com> writes:
> On Tue, Feb 2, 2021 at 5:19 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> Richard Biener <richard.guent...@gmail.com> writes:
>> > On Tue, Feb 2, 2021 at 4:03 PM Richard Sandiford
>> > <richard.sandif...@arm.com> wrote:
>> >>
>> >> Richard Biener <richard.guent...@gmail.com> writes:
>> >> > On Mon, Feb 1, 2021 at 6:54 PM Joel Hutton <joel.hut...@arm.com> wrote:
>> >> >>
>> >> >> Hi Richard(s),
>> >> >>
>> >> >> I'm just looking to see if I'm going about this the right way, based
>> >> >> on the discussion we had on IRC. I've managed to hack something
>> >> >> together, I've attached a (very) WIP patch which gives the correct
>> >> >> codegen for the testcase in question
>> >> >> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98772). It would
>> >> >> obviously need to support other widening patterns and differentiate
>> >> >> between big/little endian among other things.
>> >> >>
>> >> >> I added a backend pattern because I wasn't quite clear which changes
>> >> >> to make in order to allow the existing backend patterns to be used
>> >> >> with a V8QI, or how to represent V16QI where we don't care about the
>> >> >> top/bottom 8. I made some attempt in optabs.c, which is in the patch
>> >> >> commented out, but I'm not sure if I'm going about this the right way.
>> >> >
>> >> > Hmm, as said, I'd try to arrange like illustrated in the attachment,
>> >> > confined to vectorizable_conversion. The
>> >> > only complication might be sub-optimal code-gen for the vector-vector
>> >> > CTOR compensating for the input
>> >> > vector (on RTL that would be a paradoxical subreg from say V4HI to V8HI)
>> >>
>> >> Yeah. I don't really like this because it means that it'll be
>> >> impossible to remove the redundant work in gimple. The extra elements
>> >> are just a crutch to satisfy the type system.
>> >
>> > We can certainly devise a more clever way to represent a paradoxical
>> > subreg,
>> > but at least the actual operation (WIDEN_MINUS_LOW) would match what
>> > the hardware can do.
>>
>> At least for the Arm ISAs, the low parts are really 64-bit → 128-bit
>> operations. E.g. the low-part intrinsic for signed 8-bit integers is:
>>
>>   int16x8_t vsubl_s8 (int8x8_t __a, int8x8_t __b);
>>
>> whereas the high-part intrinsic is:
>>
>>   int16x8_t vsubl_high_s8 (int8x16_t __a, int8x16_t __b);
>>
>> So representing the low part as a 128-bit → 128-bit operation is already
>> a little artificial.
>
> that's intrinsics - but I guess the actual machine instruction is different?
FWIW, the instructions are the same. E.g. for AArch64 it's:

  ssubl v0.8h, v0.8b, v1.8b

(8b being a 64-bit vector and 8h being a 128-bit vector) instead of:

  ssubl v0.8h, v0.16b, v1.16b

The AArch32 lowpart is:

  vsubl.s16 q0, d0, d1

where a q register joins together two d registers.

>> > OTOH we could simply accept half of a vector for
>> > the _LOW (little-endian) or _HIGH (big-endian) op and have the expander
>> > deal with subreg frobbing? Not that I'd like that very much though, even
>> > a VIEW_CONVERT <v8hi> (v4hi-reg) would be cleaner IMHO (not sure
>> > how to go about endianness here ... the _LOW/_HIGH paints us into some
>> > corner here)
>>
>> I think it only makes sense for the low part. But yeah, I guess that
>> would work (although I agree it doesn't seem very appealing :-)).
>>
>> > A new IFN (direct optab?) means targets with existing support for _LO/HI
>> > do not automatically benefit which is a shame.
>>
>> In practice this will only affect targets that choose to use mixed
>> vector sizes, and I think it's reasonable to optimise only for the
>> case in which such targets support widening conversions. So what
>> do you think about the idea of emitting separate conversions and
>> a normal subtract? We'd be relying on RTL to fuse them together,
>> but at least there would be no redundancy to eliminate.
>
> So in vectorizable_conversion for the widen-minus you'd check
> whether you can do a v4qi -> v4hi and then emit a conversion
> and a wide minus?

Yeah.

Richard

> I guess as long as vectorizer costing behaves
> as if the op is fused that's a similarly OK trick as a V_C_E or a
> vector CTOR.
>
> Richard.
>
>> Thanks,
>> Richard
>> >
>> >> As far as Joel's patch goes, I was imagining that the new operation
>> >> would be an internal function rather than a tree code. However,
>> >> if we don't want that, maybe we should just emit separate conversions
>> >> and a normal subtraction, like we would for (signed) x - (unsigned) y.
>> >>
>> >> Thanks,
>> >> Richard
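
For readers following the thread, the idea of emitting separate conversions
and a normal subtract can be sketched in plain C. This is only a scalar model
under stated assumptions, not GCC or Arm code: the helper names
(widen_minus_lo, widen_convert, minus_v8hi, forms_agree, check_all_pairs) are
hypothetical, and each loop stands in for one 8-lane vector operation. It
illustrates that sign-extending both 64-bit inputs and then doing an ordinary
16-bit subtraction yields the same lanes as a fused 64-bit → 128-bit widening
subtract, which is the equivalence the thread relies on when it leaves the
fusing to RTL.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Hypothetical model of a fused 64-bit -> 128-bit widening subtract:
   each signed 8-bit lane is sign-extended to 16 bits, then subtracted.  */
static void
widen_minus_lo (int16_t out[LANES], const int8_t a[LANES],
                const int8_t b[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (int16_t) ((int16_t) a[i] - (int16_t) b[i]);
}

/* Hypothetical model of the unfused form, step 1: a v8qi -> v8hi
   sign-extending conversion.  */
static void
widen_convert (int16_t out[LANES], const int8_t in[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (int16_t) in[i];
}

/* Unfused form, step 2: an ordinary 16-bit lane-wise subtraction.  */
static void
minus_v8hi (int16_t out[LANES], const int16_t a[LANES],
            const int16_t b[LANES])
{
  for (int i = 0; i < LANES; i++)
    out[i] = (int16_t) (a[i] - b[i]);
}

/* Return 1 if the fused and unfused forms agree on every lane.  */
static int
forms_agree (const int8_t a[LANES], const int8_t b[LANES])
{
  int16_t fused[LANES], wa[LANES], wb[LANES], split[LANES];

  widen_minus_lo (fused, a, b);
  widen_convert (wa, a);
  widen_convert (wb, b);
  minus_v8hi (split, wa, wb);

  for (int i = 0; i < LANES; i++)
    if (fused[i] != split[i])
      return 0;
  return 1;
}

/* Compare the two forms for every pair of 8-bit values, broadcast
   across all lanes.  Aborts via assert on any mismatch.  */
static void
check_all_pairs (void)
{
  for (int x = -128; x <= 127; x++)
    for (int y = -128; y <= 127; y++)
      {
        int8_t a[LANES], b[LANES];
        for (int i = 0; i < LANES; i++)
          {
            a[i] = (int8_t) x;
            b[i] = (int8_t) y;
          }
        assert (forms_agree (a, b));
      }
}
```

Note that the conversions have to come before the subtraction: a result such
as 127 - (-128) = 255 does not fit in a signed 8-bit lane, so subtracting
first and widening afterwards would not be equivalent.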