Re: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n intrinsics

Christophe Lyon via Gcc-patches Tue, 03 Aug 2021 05:53:43 -0700

On Mon, Jul 19, 2021 at 2:34 PM Prathamesh Kulkarni <
prathamesh.kulka...@linaro.org> wrote:


> On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
> >
> > On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
> > <christophe.lyon....@gmail.com> wrote:
> > >
> > > Hi Prathamesh,
> > >
> > > > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >>
> > >>
> > >> > -----Original Message-----
> > >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > >> > Sent: 05 July 2021 10:18
> > >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > >> > <kyrylo.tkac...@arm.com>
> > >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> > >> > intrinsics
> > >> >
> > >> > Hi Kyrill,
> > > >> > I assume this patch is OK to commit after bootstrap+testing?
> > >>
> > >> Yes.
> > >> Thanks,
> > >> Kyrill
> > >>
> > >
> > >
> > > The updated testcase fails on some configs:
> > > > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 times
> > > > FAIL:  gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times vdup\\.16\\tq[0-9]+, r[0-9]+ 3
> > >
> > > > For instance on arm-none-eabi with default configuration flags (mode/cpu/fpu)
> > > > and default runtestflags.
> > > > The same toolchain config also fails on this test when overriding runtestflags with:
> > > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> > >
> > > Can you fix this please?
> > Hi Christophe,
> > Sorry for the breakage, I will take a look.
> The issue is with the following function:
>
> float16x8_t f2 (float16x8_t __a, float16_t __b) {
>   return __a * __b;
> }
>
> With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it
> generates:
> f2:
>         ldrh    ip, [sp]        @ __fp16
>         vmov    d18, r0, r1  @ v8hf
>         vmov    d19, r2, r3
>         vdup.16 q8, ip
>         vmul.f16        q8, q8, q9
>         vmov    r0, r1, d16  @ v8hf
>         vmov    r2, r3, d17
>         bx      lr
>
> It correctly generates vdup, but IIUC, r0-r3 are used up in loading
> 'a' into q9 (d18 / d19),
> and it uses ip for loading 'b' and ends up with vdup q8, ip, and thus
> the scan for "vdup\\.16\\tq[0-9]+, r[0-9]+" fails.
> I tried to adjust the scan to the following to accommodate ip:
> /* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } } */
> but that still FAILs, because the log shows:
> gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+,
> (r[0-9]+|ip) found 6 times
>
> Could you suggest how I should adjust the test so that the second operand
> can be either r[0-9]+ or the ip register?
>
>
Sorry for the delay, I was on vacation.

I don't know offhand how to adjust the test. Did you check why it matched
6 times?
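
One possible explanation, offered as a guess and not checked against the
testsuite sources: if scan-assembler-times counts the elements of the list
returned by Tcl's "regexp -inline -all", then every match contributes the
full match plus one element per capturing group, so 3 real matches with one
"(...)" group would be reported as 6. The Python sketch below only imitates
that counting rule. The helper name count_like_inline_all and the three
sample assembly lines are made up for illustration.

```python
import re

# Three vdup.16 lines standing in for the three functions the test expects.
# The register choices here are illustrative, not taken from a real log.
asm = (
    "\tvdup.16\tq8, r0\n"
    "\tvdup.16\tq9, r1\n"
    "\tvdup.16\tq8, ip\n"
)


def count_like_inline_all(pattern, text):
    """Imitate counting the list from Tcl's `regexp -inline -all`.

    Assumption: each match adds the full match plus one element per
    capturing group to a flat list, and the harness counts list elements.
    """
    elements = []
    for m in re.finditer(pattern, text):
        elements.append(m.group(0))   # the full match
        elements.extend(m.groups())   # one entry per capturing group
    return len(elements)


capturing = r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)"
non_capturing = r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)"

print(count_like_inline_all(capturing, asm))      # 6
print(count_like_inline_all(non_capturing, asm))  # 3
```

If that is what is happening, a non-capturing group, (?:r[0-9]+|ip), in the
dg-final pattern might bring the count back to 3.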

Christophe


> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > >> >
> > >> > Thanks,
> > >> > Prathamesh
>
