RE: [PATCH] aarch64: Reimplement vget_low* intrinsics

Kyrylo Tkachov via Gcc-patches Fri, 05 Feb 2021 01:47:17 -0800

Hi Richard,

> -----Original Message-----
> From: Richard Biener <richard.guent...@gmail.com>
> Sent: 05 February 2021 09:25
> To: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] aarch64: Reimplement vget_low* intrinsics
> 
> On Fri, Feb 5, 2021 at 9:59 AM Kyrylo Tkachov via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi all,
> >
> > We can do better on the vget_low* intrinsics.
> > Currently they reinterpret their argument into a V2DI vector and extract the
> low "lane",
> > reinterpreting that back into the shorter vector.
> > This is functionally correct and generates a sequence of subregs and a
> vec_select that, by itself,
> > gets optimised away eventually.
> > However it's bad when we want to use the result in a other SIMD
> operations.
> > Then the subreg-vec_select-subreg combo blocks many combine patterns.
> >
> > This patch reimplements them to emit a proper low vec_select from the
> start.
> > It generates much cleaner RTL and allows for more aggressive
> combinations, particularly
> > with the patterns that Jonathan has been pushing lately.
> >
> > Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-
> none-elf.
> > Pushing to trunk.
> 
> Just to remind you folks that we're in stage4 which means fixes to
> regressions
> (or wrong-code) only.  aarch64 is a primary target and you should provide a
> good
> example of following the rules we set up for GCC development.


Apologies for the stream of such patches this late in development.
Indeed it is quite late in the development cycle and I'll be more careful for 
the rest of stage4.

> 
> I'd expect _at least_ a short sentence on why you think this change is
> absolutely
> required for GCC 11.

It was mostly reports from some users on really bad code generation with 
intrinsics, similar to PR94442.
The root cause for most of these is are implementations of the intrinsics with 
inline assembly, which can be fixed in a mostly-mechanical way, thus the 
similarly-looking patches.
I appreciate though that this needs to be weighed against the stability 
requirements at this stage...
> 
> The change also comes with zero testcases and zero bug references.

Indeed, I could have elaborated more. The recent changes have been targeted at 
the intrinsics in arm_neon.h.
We have quite a detailed testsuite for them at 
gcc.target/aarch64/advsimd-intrinsics that exercises them, which is why I felt 
confident to push changes in that area at this stage. 

> 
> Sorry for this particular change taking the fire, I just picked a random one 
> of
> the non-regression change-storm I'm seeing for arm/aarch64 recently.
> 

Thanks keeping me honest.
Kyrill

> Thanks for your consideration,
> Richard.
> 
> > Thanks,
> > Kyrill
> >
> > Thanks,
> > Kyrill
> >
> > gcc/ChangeLog:
> >
> >         * config/aarch64/aarch64-simd-builtins.def (get_low): Define 
> > builtin.
> >         * config/aarch64/aarch64-simd.md (aarch64_get_low<mode>): Define.
> >         * config/aarch64/arm_neon.h (__GET_LOW): Delete.
> >         (vget_low_f16): Reimplement using new builtin.
> >         (vget_low_f32): Likewise.
> >         (vget_low_f64): Likewise.
> >         (vget_low_p8): Likewise.
> >         (vget_low_p16): Likewise.
> >         (vget_low_p64): Likewise.
> >         (vget_low_s8): Likewise.
> >         (vget_low_s16): Likewise.
> >         (vget_low_s32): Likewise.
> >         (vget_low_s64): Likewise.
> >         (vget_low_u8): Likewise.
> >         (vget_low_u16): Likewise.
> >         (vget_low_u32): Likewise.
> >         (vget_low_u64): Likewise.

RE: [PATCH] aarch64: Reimplement vget_low* intrinsics

Reply via email to