> -----Original Message----- > From: Richard Sandiford <richard.sandif...@arm.com> > Sent: 09 July 2021 12:40 > To: Jonathan Wright <jonathan.wri...@arm.com> > Cc: gcc-patches@gcc.gnu.org; Kyrylo Tkachov <kyrylo.tkac...@arm.com> > Subject: Re: [PATCH] aarch64: Use unions for vector tables in vqtbl[234] > intrinsics > > Jonathan Wright <jonathan.wri...@arm.com> writes: > > Hi, > > > > As subject, this patch uses a union instead of constructing a new opaque > > vector structure for each of the vqtbl[234] Neon intrinsics in arm_neon.h. > > This simplifies the header file and also improves code generation - > > superfluous move instructions were emitted for every register > > extraction/set in this additional structure. > > > > This change is safe because the C-level vector structure types e.g. > > uint8x16x4_t already provide a tie for sequential register allocation > > - which is required by the TBL instructions. > > > > Regression tested and bootstrapped on aarch64-none-linux-gnu - no > > issues. > > > > Ok for master? > > Looks good, but I think we should have some tests to defend the > RA improvements. E.g. have things like: > > #include <arm_neon.h> > > … > > uint8x8_t > f2_u8 (uint8x16x2_t x, uint8x8_t y) > { > return vqtbl2_u8 (x, y); > } > > … > > and add a scan-assembler-not for moves. > > Union punning is UB for standard C++, but I think in practice we're > not going to be able to treat it as such for GCC. This would be > far from the only thing to rely on union punning for correctness.
Could we use some reinterpret_cast or bit_cast (or the builtins the rely on) for C++? This may involve separate definitions for C and C++, which may not be worth it... No objections to the patch from me to be clear. Thanks, Kyrill > > Thanks, > Richard