Jeff,

just a heads-up that that trunk (i.e., the soon-to-be GCC14) still
generates the suboptimal sequence:
  https://godbolt.org/z/K9YYEPsvY

Thanks,
Philipp.


On Mon, 21 Nov 2022 at 18:00, Philipp Tomsich <philipp.toms...@vrull.eu> wrote:
>
> On Sun, 20 Nov 2022 at 17:38, Jeff Law <jeffreya...@gmail.com> wrote:
> >
> >
> > On 11/9/22 16:10, Philipp Tomsich wrote:
> > > The current method of treating shifts of extended values on RISC-V
> > > frequently causes sequences of 3 shifts, despite the presence of the
> > > 'zero_extendsidi2_shifted' pattern.
> > >
> > > Consider:
> > >      unsigned long f(unsigned int a, unsigned long b)
> > >      {
> > >              a = a << 1;
> > >              unsigned long c = (unsigned long) a;
> > >              c = b + (c<<4);
> > >              return c;
> > >      }
> > > which will present at combine-time as:
> > >      Trying 7, 8 -> 9:
> > >          7: r78:SI=r81:DI#0<<0x1
> > >            REG_DEAD r81:DI
> > >          8: r79:DI=zero_extend(r78:SI)
> > >            REG_DEAD r78:SI
> > >          9: r72:DI=r79:DI<<0x4
> > >            REG_DEAD r79:DI
> > >      Failed to match this instruction:
> > >      (set (reg:DI 72 [ _1 ])
> > >          (and:DI (ashift:DI (reg:DI 81)
> > >                  (const_int 5 [0x5]))
> > >       (const_int 68719476704 [0xfffffffe0])))
> > > and produce the following (optimized) assembly:
> > >      f:
> > >       slliw   a5,a0,1
> > >       slli    a5,a5,32
> > >       srli    a5,a5,28
> > >       add     a0,a5,a1
> > >       ret
> > >
> > > The current way of handling this (in 'zero_extendsidi2_shifted')
> > > doesn't apply for two reasons:
> > > - this is seen before reload, and
> > > - (more importantly) the constant mask is not 0xfffffffful.
> > >
> > > To address this, we introduce a generalized version of shifting
> > > zero-extended values that supports any mask of consecutive ones as
> > > long as the number of training zeros is the inner shift-amount.
> > >
> > > With this new split, we generate the following assembly for the
> > > aforementioned function:
> > >      f:
> > >       slli    a0,a0,33
> > >       srli    a0,a0,28
> > >       add     a0,a0,a1
> > >       ret
> > >
> > > Unfortunately, all of this causes some fallout (especially in how it
> > > interacts with Zb* extensions and zero_extract expressions formed
> > > during combine): this is addressed through additional instruction
> > > splitting and handling of zero_extract.
> > >
> > > gcc/ChangeLog:
> > >
> > >       * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
> > >          as an and:DI.
> > >       (*andi_add.uw): New pattern.
> > >       (*slli_slli_uw): New pattern.
> > >       (*shift_then_shNadd.uw): New pattern.
> > >       (*slliuw): Rename to riscv_slli_uw.
> > >       (riscv_slli_uw): Renamed from *slliuw.
> > >       (*zeroextract<GPR:mode><ANYI:mode>2_highbits): New pattern.
> > >       (*zero_extract<GPR:mode>): New pattern, which will be split to
> > >       shift-left + shift-right.
> > >       * config/riscv/predicates.md (dimode_shift_operand):
> > >       * config/riscv/riscv.md (*zero_extract<GPR:mode>_lowbits):
> > >       (zero_extendsidi2_shifted): Rename.
> > >       (*zero_extendsidi2_shifted): Generalize.
> > >       (*shift<GPR:MODE>_truthvalue): New pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >       * gcc.target/riscv/shift-shift-6.c: New test.
> > >       * gcc.target/riscv/shift-shift-7.c: New test.
> > >       * gcc.target/riscv/shift-shift-8.c: New test.
> > >       * gcc.target/riscv/shift-shift-9.c: New test.
> > >       * gcc.target/riscv/snez.c: New test.
> > >
> > > Commit notes:
> > > - Depends on a predicate posted in "RISC-V: Optimize branches testing
> > >    a bit-range or a shifted immediate".  Depending on the order of
> > >    applying these, I'll take care to pull that part out of the other
> > >    patch if needed.
> > >
> > > Version-changes: 2
> > > - refactor
> > > - optimise for additional corner cases and deal with fallout
> > >
> > > Version-changes: 3
> > > - removed the [WIP] from the commit message (no other changes)
> > >
> > > Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
> > > ---
> > >
> > > (no changes since v1)
> > >
> > >   gcc/config/riscv/bitmanip.md                  | 142 ++++++++++++++----
> > >   gcc/config/riscv/predicates.md                |   5 +
> > >   gcc/config/riscv/riscv.md                     |  75 +++++++--
> > >   .../gcc.target/riscv/shift-shift-6.c          |  14 ++
> > >   .../gcc.target/riscv/shift-shift-7.c          |  16 ++
> > >   .../gcc.target/riscv/shift-shift-8.c          |  20 +++
> > >   .../gcc.target/riscv/shift-shift-9.c          |  15 ++
> > >   gcc/testsuite/gcc.target/riscv/snez.c         |  14 ++
> > >   8 files changed, 261 insertions(+), 40 deletions(-)
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c
> > >   create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index 78fdf02c2ec..06126ac4819 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -29,7 +29,20 @@
> > >     [(set_attr "type" "bitmanip,load")
> > >      (set_attr "mode" "DI")])
> > >
> > > -(define_insn "riscv_shNadd<X:mode>"
> > > +;; We may end up forming a slli.uw with an immediate of 0 (while
> > > +;; splitting through "*slli_slli_uw", below).
> > > +;; Match this back to a zext.w
> > > +(define_insn "*zext.w"
> > > +  [(set (match_operand:DI 0 "register_operand" "=r")
> > > +     (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
> > > +                        (const_int 0))
> > > +             (const_int 4294967295)))]
> > > +  "TARGET_64BIT && TARGET_ZBA"
> > > +  "zext.w\t%0,%1"
> > > +  [(set_attr "type" "bitmanip")
> > > +   (set_attr "mode" "DI")])
> >
> > Would it be better to detect that we're going to create a shift count of
> > zero  and a -1 mask and emit RTL for a SI->DI zero extension directly
> > rather than having this pattern?
>
> This is an attempt to use the RTL template in the slli_slli_uw insn-and-split.
> Of course, we can emit the pattern directly ... it is a question of
> what is more readable (which may come down to personal preference).
>
> Let's try it for the next version and we can still go back to what we had…
>
> > It overall looks sensible -- I didn't check all the conditions and such,
> > just the overall structure.
> >
> >
> > Jeff
> >
> >

Reply via email to