RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

Li, Pan2 via Gcc-patches Tue, 18 Apr 2023 02:12:02 -0700

Update the PATCH v2 here, 
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615937.html.


Running the boostrap/regression test, and keep you posted.

Pan

-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel....@gcc.gnu.org> On Behalf 
Of Li, Pan2 via Gcc-patches
Sent: Tuesday, April 18, 2023 4:20 PM
To: Richard Biener <richard.guent...@gmail.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; 
rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com>; 
richard.sandif...@arm.com
Subject: RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

I look into the IOR simplification code for this optimization. Mostly I try to 
implement them with generic vector operations.

Pan

-----Original Message-----
From: Richard Biener <richard.guent...@gmail.com>
Sent: Tuesday, April 18, 2023 4:01 PM
To: Li, Pan2 <pan2...@intel.com>
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@sifive.com; 
rguent...@suse.de; Wang, Yanzhang <yanzhang.w...@intel.com>; 
richard.sandif...@arm.com
Subject: Re: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

On Tue, Apr 18, 2023 at 9:59 AM Richard Biener <richard.guent...@gmail.com> 
wrote:
>
> On Tue, Apr 18, 2023 at 3:31 AM Li, Pan2 via Gcc-patches 
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Passed the X86 bootstrap and regression tests.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Li, Pan2 <pan2...@intel.com>
> > Sent: Monday, April 17, 2023 10:50 PM
> > To: gcc-patches@gcc.gnu.org
> > Cc: juzhe.zh...@rivai.ai; kito.ch...@sifive.com; rguent...@suse.de; 
> > Li, Pan2 <pan2...@intel.com>; Wang, Yanzhang 
> > <yanzhang.w...@intel.com>; richard.sandif...@arm.com
> > Subject: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion
> >
> > From: Pan Li <pan2...@intel.com>
> >
> > This patch add the optimization for the vector IOR(V1, NOT V1). Assume we 
> > have below sample code.
> >
> > vbool32_t test_shortcut_for_riscv_vmorn_case_5(vbool32_t v1, size_t vl) {
> >   return __riscv_vmorn_mm_b32(v1, v1, vl); }

Btw, this also shows you might want to consider inlining the intrinsics in 
target specific folding or implement them with generic vector operations in the 
headers.

> >
> > Before this patch:
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vlm.v    v24,0(a1)
> > vsetvli  zero,a2,e8,mf4,ta,ma
> > vmorn.mm v24,v24,v24
> > vsetvli  a5,zero,e8,mf4,ta,ma
> > vsm.v    v24,0(a0)
> > ret
> >
> > After this patch:
> > vsetvli zero,a2,e8,mf4,ta,ma
> > vmset.m v24
> > vsetvli a5,zero,e8,mf4,ta,ma
> > vsm.v   v24,0(a0)
> > ret
> >
> > Or in RTL's perspective,
> > from:
> > (ior:VNx2BI (reg/v:VNx2BI 137 [ v1 ]) (not:VNx2BI (reg/v:VNx2BI 137 
> > [ v1 ])))
> > to:
> > (const_vector:VNx2BI repeat [ (const_int 1 [0x1]) ])
> >
> > The similar optimization like VMANDN has enabled already. There should be 
> > no difference execpt the operator when compare the VMORN and VMANDN for 
> > such kind of optimization. The patch allows the VECTOR_BOOL IOR(V1, NOT V1) 
> > simplification besides the existing SCALAR_INT mode.
> >
> > gcc/ChangeLog:
> >
> >         * machmode.h (VECTOR_BOOL_MODE_P):
> >         * simplify-rtx.cc (valid_mode_for_ior_simplification_p):
> >         (simplify_context::simplify_binary_operation_1):
> >
> > gcc/testsuite/ChangeLog:
> >
> >         * gcc.target/riscv/rvv/base/mask_insn_shortcut.c:
> >         * gcc.target/riscv/simplify_ior_optimization.c: New test.
> >
> > Signed-off-by: Pan Li <pan2...@intel.com>
> > ---
> >  gcc/machmode.h                                |  4 ++
> >  gcc/simplify-rtx.cc                           | 10 +++-
> >  .../riscv/rvv/base/mask_insn_shortcut.c       |  3 +-
> >  .../riscv/simplify_ior_optimization.c         | 50 +++++++++++++++++++
> >  4 files changed, 63 insertions(+), 4 deletions(-)  create mode
> > 100644 gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> >
> > diff --git a/gcc/machmode.h b/gcc/machmode.h index
> > f1865c1ef42..771bae89cb7 100644
> > --- a/gcc/machmode.h
> > +++ b/gcc/machmode.h
> > @@ -134,6 +134,10 @@ extern const unsigned char 
> > mode_class[NUM_MACHINE_MODES];
> >     || GET_MODE_CLASS (MODE) == MODE_VECTOR_ACCUM       \
> >     || GET_MODE_CLASS (MODE) == MODE_VECTOR_UACCUM)
> >
> > +/* Nonzero if MODE is a vector bool mode.  */
> > +#define VECTOR_BOOL_MODE_P(MODE)                       \
> > +  (GET_MODE_CLASS (MODE) == MODE_VECTOR_BOOL)          \
> > +
> >  /* Nonzero if MODE is a scalar integral mode.  */
> >  #define SCALAR_INT_MODE_P(MODE)                        \
> >    (GET_MODE_CLASS (MODE) == MODE_INT           \
> > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 
> > ee75079917f..eff27b835bf 100644
> > --- a/gcc/simplify-rtx.cc
> > +++ b/gcc/simplify-rtx.cc
> > @@ -57,6 +57,12 @@ neg_poly_int_rtx (machine_mode mode, const_rtx i)
> >    return immed_wide_int_const (-wi::to_poly_wide (i, mode), mode); 
> > }
> >
> > +static bool
> > +valid_mode_for_ior_simplification_p (machine_mode mode) {
> > +  return SCALAR_INT_MODE_P (mode) || VECTOR_BOOL_MODE_P (mode); }
> > +
> >  /* Test whether expression, X, is an immediate constant that represents
> >     the most significant bit of machine mode MODE.  */
> >
> > @@ -3332,8 +3338,8 @@ simplify_context::simplify_binary_operation_1 
> > (rtx_code code,
> >        if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
> >            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
> >           && ! side_effects_p (op0)
> > -         && SCALAR_INT_MODE_P (mode))
> > -       return constm1_rtx;
> > +         && valid_mode_for_ior_simplification_p (mode))
>
> for simple predicates like this please do not split them out, it makes 
> understanding the code more difficult.
>
> > +       return CONST1_RTX (mode);
>
> shouldn't this be CONSTM1_RTX (mode)?  Why is this only valid for 
> VECTOR_BOOL and not also for VECTOR_INT?  You're citing AND and that 
> does
>
>       /* A & (~A) -> 0 */
>       if (((GET_CODE (op0) == NOT && rtx_equal_p (XEXP (op0, 0), op1))
>            || (GET_CODE (op1) == NOT && rtx_equal_p (XEXP (op1, 0), op0)))
>           && ! side_effects_p (op0)
>           && GET_MODE_CLASS (mode) != MODE_CC)
>         return CONST0_RTX (mode);
>
> so why differ and not use the same GET_MODE_CLASS (mode) != MODE_CC condition?
>
> Richard.
>
> >
> >        /* (ior A C) is C if all bits of A that might be nonzero are on in 
> > C.  */
> >        if (CONST_INT_P (op1)
> > diff --git
> > a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > index 83cc4a1b5a5..57d0241675a 100644
> > --- a/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/mask_insn_shortcut.c
> > @@ -233,9 +233,8 @@ vbool64_t
> > test_shortcut_for_riscv_vmxnor_case_6(vbool64_t v1, size_t vl) {
> >  /* { dg-final { scan-assembler-not {vmxor\.mm\s+v[0-9]+,\s*v[0-9]+} 
> > } } */
> >  /* { dg-final { scan-assembler-not {vmor\.mm\s+v[0-9]+,\s*v[0-9]+} 
> > } } */
> >  /* { dg-final { scan-assembler-not {vmnor\.mm\s+v[0-9]+,\s*v[0-9]+} 
> > } } */
> > -/* { dg-final { scan-assembler-times 
> > {vmorn\.mm\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 7 } } */
> >  /* { dg-final { scan-assembler-not 
> > {vmxnor\.mm\s+v[0-9]+,\s*v[0-9]+} } } */
> >  /* { dg-final { scan-assembler-times {vmclr\.m\s+v[0-9]+} 14 } } */
> > -/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 7 } } */
> > +/* { dg-final { scan-assembler-times {vmset\.m\s+v[0-9]+} 14 } } */
> >  /* { dg-final { scan-assembler-times {vmmv\.m\s+v[0-9]+,\s*v[0-9]+}
> > 14 } } */
> >  /* { dg-final { scan-assembler-times 
> > {vmnot\.m\s+v[0-9]+,\s*v[0-9]+} 14 } } */ diff --git 
> > a/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > new file mode 100644
> > index 00000000000..ec3bd0baf03
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/simplify_ior_optimization.c
> > @@ -0,0 +1,50 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-march=rv64gc -mabi=lp64 -O2" } */
> > +
> > +#include <stdint.h>
> > +
> > +uint8_t test_simplify_ior_scalar_case_0 (uint8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint16_t test_simplify_ior_scalar_case_1 (uint16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint32_t test_simplify_ior_scalar_case_2 (uint32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +uint64_t test_simplify_ior_scalar_case_3 (uint64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int8_t test_simplify_ior_scalar_case_4 (int8_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int16_t test_simplify_ior_scalar_case_5 (int16_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int32_t test_simplify_ior_scalar_case_6 (int32_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +int64_t test_simplify_ior_scalar_case_7 (int64_t a) {
> > +  return a | ~a;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*-1} 6 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*255} 1 } } */
> > +/* { dg-final { scan-assembler-times {li\s+a[0-9]+,\s*65536} 1 } } 
> > +*/
> > +/* { dg-final { scan-assembler-not {or\s+a[0-9]+} } } */
> > +/* { dg-final { scan-assembler-not {not\s+a[0-9]+} } } */
> > --
> > 2.34.1
> >

RE: [PATCH] RISC-V: Allow Vector IOR(V1, NOT V1) optimiztion

Reply via email to