On Mon, Jul 19, 2021 at 2:34 PM Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> wrote:
> On Thu, 15 Jul 2021 at 16:46, Prathamesh Kulkarni
> <prathamesh.kulka...@linaro.org> wrote:
> >
> > On Thu, 15 Jul 2021 at 14:47, Christophe Lyon
> > <christophe.lyon....@gmail.com> wrote:
> > >
> > > Hi Prathamesh,
> > >
> > > On Mon, Jul 5, 2021 at 11:25 AM Kyrylo Tkachov via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> > >>
> > >> > -----Original Message-----
> > >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org>
> > >> > Sent: 05 July 2021 10:18
> > >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov
> > >> > <kyrylo.tkac...@arm.com>
> > >> > Subject: [ARM] PR66791: Replace builtins for fp and unsigned vmul_n
> > >> > intrinsics
> > >> >
> > >> > Hi Kyrill,
> > >> > I assume this patch is OK to commit after bootstrap+testing ?
> > >>
> > >> Yes.
> > >> Thanks,
> > >> Kyrill
> > >>
> > >
> > > The updated testcase fails on some configs:
> > > gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, r[0-9]+ found 2 times
> > > FAIL: gcc.target/arm/armv8_2-fp16-neon-2.c scan-assembler-times vdup\\.16\\tq[0-9]+, r[0-9]+ 3
> > >
> > > For instance on arm-none-eabi with default configuration flags (mode/cpu/fpu)
> > > and default runtestflags.
> > > The same toolchain config also fails on this test when overriding runtestflags with:
> > > -mthumb/-mfloat-abi=soft/-march=armv6s-m
> > > -mthumb/-mfloat-abi=soft/-march=armv7-m
> > > -mthumb/-mfloat-abi=soft/-march=armv8.1-m.main
> > >
> > > Can you fix this please?
> > Hi Christophe,
> > Sorry for the breakage, I will take a look.
> The issue is for the following function;
>
> float16x8_t f2 (float16x8_t __a, float16_t __b) {
>   return __a * __b;
> }
>
> With -O2 -ffast-math -mfloat-abi=softfp -march=armv8.2-a+fp16, it generates:
> f2:
>         ldrh    ip, [sp]        @ __fp16
>         vmov    d18, r0, r1     @ v8hf
>         vmov    d19, r2, r3
>         vdup.16 q8, ip
>         vmul.f16        q8, q8, q9
>         vmov    r0, r1, d16     @ v8hf
>         vmov    r2, r3, d17
>         bx      lr
>
> It correctly generates vdup, but IIUC, r0-r3 are used up in loading
> 'a' into q9 (d18 / d19),
> and it uses ip for loading 'b' and ends up with vdup q8, ip, and thus
> the scan for "vdup\\.16\\tq[0-9]+, r[0-9]+" fails.
> I tried to adjust the scan to following to accommodate ip:
> /* { dg-final { scan-assembler-times {vdup\.16\tq[0-9]+, (r[0-9]+|ip)} 3 } } */
> but that still FAIL's because log shows:
> gcc.target/arm/armv8_2-fp16-neon-2.c: vdup\\.16\\tq[0-9]+, (r[0-9]+|ip) found 6 times
>
> Could you suggest how should I adjust the test, so the second operand
> can be either r[0-9]+ or ip register ?
>

Sorry for the delay, I was on vacation.

I don't know off-hand how to adjust the test, did you check why it matched 6 times?

Christophe

> Thanks,
> Prathamesh
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > >
> > > Christophe
> > >
> > >> >
> > >> > Thanks,
> > >> > Prathamesh
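
On the question of why the adjusted pattern was "found 6 times": one possible explanation, not confirmed anywhere in this thread, is that scan-assembler-times counts the elements of the list returned by Tcl's `regexp -inline -all`. That list holds, for every match, the whole match followed by each capturing group, so a pattern with one `(...)` group would report twice the number of real matches. The sketch below imitates that list in Python to show the arithmetic. The helper `tcl_inline_all` and the three-line `asm` sample are hypothetical illustrations, not code or output from the GCC testsuite.

```python
import re

def tcl_inline_all(pattern, text):
    """Imitate the flat list Tcl's `regexp -inline -all` returns:
    for each match, the whole match followed by every capture group.
    (Hypothetical helper, assumed to mirror how the count is taken.)"""
    items = []
    for m in re.finditer(pattern, text):
        items.append(m.group(0))
        items.extend("" if g is None else g for g in m.groups())
    return items

# Hypothetical assembler excerpt with three vdup.16 instructions,
# one of which uses ip as the source register.
asm = "\tvdup.16\tq8, ip\n\tvdup.16\tq9, r0\n\tvdup.16\tq10, r1\n"

capturing = r"vdup\.16\tq[0-9]+, (r[0-9]+|ip)"        # one capture group
non_capturing = r"vdup\.16\tq[0-9]+, (?:r[0-9]+|ip)"  # no capture group

count_capturing = len(tcl_inline_all(capturing, asm))
count_non_capturing = len(tcl_inline_all(non_capturing, asm))

print(count_capturing, count_non_capturing)
```

If that assumption holds, writing the alternation as a non-capturing group, `(?:r[0-9]+|ip)`, in the dg-final directive would keep the expected count at 3. This would need to be checked against the actual test log before relying on it.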