On Thu, 2 Sept 2021 at 14:32, Christophe Lyon <christophe.lyon....@gmail.com> wrote: > > > > On Tue, Aug 24, 2021 at 10:17 AM Kyrylo Tkachov <kyrylo.tkac...@arm.com> > wrote: >> >> >> >> > -----Original Message----- >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> >> > Sent: 24 August 2021 09:01 >> > To: Christophe Lyon <christophe.lyon....@gmail.com> >> > Cc: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; gcc Patches <gcc- >> > patc...@gcc.gnu.org> >> > Subject: Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n >> > intrinsics >> > >> > On Tue, 17 Aug 2021 at 11:55, Prathamesh Kulkarni >> > <prathamesh.kulka...@linaro.org> wrote: >> > > >> > > On Thu, 12 Aug 2021 at 19:04, Christophe Lyon >> > > <christophe.lyon....@gmail.com> wrote: >> > > > >> > > > >> > > > >> > > > On Thu, Aug 12, 2021 at 1:54 PM Prathamesh Kulkarni >> > <prathamesh.kulka...@linaro.org> wrote: >> > > >> >> > > >> On Wed, 11 Aug 2021 at 22:23, Christophe Lyon >> > > >> <christophe.lyon....@gmail.com> wrote: >> > > >> > >> > > >> > >> > > >> > >> > > >> > On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches <gcc- >> > patc...@gcc.gnu.org> wrote: >> > > >> >> >> > > >> >> >> > > >> >> >> > > >> >> > -----Original Message----- >> > > >> >> > From: Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> >> > > >> >> > Sent: 24 June 2021 12:11 >> > > >> >> > To: gcc Patches <gcc-patches@gcc.gnu.org>; Kyrylo Tkachov >> > > >> >> > <kyrylo.tkac...@arm.com> >> > > >> >> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n >> > intrinsics >> > > >> >> > >> > > >> >> > Hi, >> > > >> >> > This patch replaces builtins for vdup_n and vmov_n. >> > > >> >> > The patch results in regression for pr51534.c. >> > > >> >> > Consider following function: >> > > >> >> > >> > > >> >> > uint8x8_t f1 (uint8x8_t a) { >> > > >> >> > return vcgt_u8(a, vdup_n_u8(0)); >> > > >> >> > } >> > > >> >> > >> > > >> >> > code-gen before patch: >> > > >> >> > f1: >> > > >> >> > vmov.i32 d16, #0 @ v8qi >> > > >> >> > vcgt.u8 d0, d0, d16 >> > > >> >> > bx lr >> > > >> >> > >> > > >> >> > code-gen after patch: >> > > >> >> > f1: >> > > >> >> > vceq.i8 d0, d0, #0 >> > > >> >> > vmvn d0, d0 >> > > >> >> > bx lr >> > > >> >> > >> > > >> >> > I am not sure which one is better tho ? >> > > >> >> >> > > >> > >> > > >> > Hi Prathamesh, >> > > >> > >> > > >> > This patch introduces a regression on non-hardfp configs (eg arm- >> > linux-gnueabi or arm-eabi): >> > > >> > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan- >> > assembler-times vmov.i32[ \t]+[dD][0-9]+, #0xffffffff 3 >> > > >> > FAIL: gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan- >> > assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3 >> > > >> > >> > > >> > Can you fix this? >> > > >> The issue is, for following test: >> > > >> >> > > >> #include <arm_neon.h> >> > > >> >> > > >> uint8x8_t f1 (uint8x8_t a) { >> > > >> return vcge_u8(a, vdup_n_u8(0)); >> > > >> } >> > > >> >> > > >> armhf code-gen: >> > > >> f1: >> > > >> vmov.i32 d0, #0xffffffff @ v8qi >> > > >> bx lr >> > > >> >> > > >> arm softfp code-gen: >> > > >> f1: >> > > >> mov r0, #-1 >> > > >> mov r1, #-1 >> > > >> bx lr >> > > >> >> > > >> The code-gen for both is same upto split2 pass: >> > > >> >> > > >> (insn 10 6 11 2 (set (reg/i:V8QI 16 s0) >> > > >> (const_vector:V8QI [ >> > > >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> > > >> ])) "foo.c":5:1 1052 {*neon_movv8qi} >> > > >> (expr_list:REG_EQUAL (const_vector:V8QI [ >> > > >> (const_int -1 [0xffffffffffffffff]) repeated x8 >> > > >> ]) >> > > >> (nil))) >> > > >> (insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1 >> > > >> (nil)) >> > > >> >> > > >> and for softfp target, split2 pass splits the assignment to r0 and r1: >> > > >> >> > > >> (insn 15 6 16 2 (set (reg:SI 0 r0) >> > > >> (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 >> > {*thumb2_movsi_vfp} >> > > >> (nil)) >> > > >> (insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ]) >> > > >> (const_int -1 [0xffffffffffffffff])) "foo.c":5:1 740 >> > {*thumb2_movsi_vfp} >> > > >> (nil)) >> > > >> (insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1 >> > > >> (nil)) >> > > >> >> > > >> I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp targets ? >> > > >> >> > > > Yes, probably, or try with check-function-bodies. >> > > Hi, >> > > Sorry for the late response. Does the attached patch look OK ? >> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577532.html >> >> Ok. > > > > Sorry Prathamesh, this does not quite work. See > https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/r12-3294-g7a6f40d0452ec76e126c2612dcfa32f3c73e2315/report-build-info.html > (red cells in the gcc column) > > Can you have a look? Ah, for arm-none-eabi target the code-gen for test_vcge_u8_uint8x8_t is: mvn r0, #0 mvn r1, #0 bx lr
instead of: mov r0, #-1 mov r1, #-1 bx lr I guess both code-gen sequences are correct, but I am not sure under which circumstance either sequence gets generated. To accomodate for both, I tried to check for either pattern with OR: /* { dg-final { scan-assembler-times "(mov\[ \]+r\[0-9\]+, #-1)|(mvn\[ \]+r\[0-9\]+, #0)" 6 { target { arm_softfp_ok } } } } */ but that doesn't seem to work. Do you have any suggestions on how to proceed ? Thanks, Prathamesh > > Thanks > > Christophe > >> Thanks, >> Kyrill >> >> > >> > Thanks, >> > Prathamesh >> > > >> > > Thanks, >> > > Prathamesh >> > > > >> > > > Christophe >> > > > >> > > >> Thanks, >> > > >> Prathamesh >> > > >> > >> > > >> > Thanks >> > > >> > >> > > >> > Christophe >> > > >> > >> > > >> > >> > > >> >> >> > > >> >> I think they're equivalent in practice, in any case the patch >> > > >> >> itself is >> > good (move away from RTL builtins). >> > > >> >> Ok. >> > > >> >> Thanks, >> > > >> >> Kyrill >> > > >> >> >> > > >> >> > >> > > >> >> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi, >> > > >> >> > which is due to a missed opt in lowering. I had filed it as >> > > >> >> > PR98435, and posted a fix for it here: >> > > >> >> > https://gcc.gnu.org/pipermail/gcc-patches/2021- >> > June/572648.html >> > > >> >> > >> > > >> >> > Thanks, >> > > >> >> > Prathamesh