On 11/9/22 16:10, Philipp Tomsich wrote:
The current method of treating shifts of extended values on RISC-V
frequently causes sequences of 3 shifts, despite the presence of the
'zero_extendsidi2_shifted' pattern.

Consider:
     unsigned long f(unsigned int a, unsigned long b)
     {
             a = a << 1;
             unsigned long c = (unsigned long) a;
             c = b + (c<<4);
             return c;
     }
which will present at combine-time as:
     Trying 7, 8 -> 9:
         7: r78:SI=r81:DI#0<<0x1
           REG_DEAD r81:DI
         8: r79:DI=zero_extend(r78:SI)
           REG_DEAD r78:SI
         9: r72:DI=r79:DI<<0x4
           REG_DEAD r79:DI
     Failed to match this instruction:
     (set (reg:DI 72 [ _1 ])
         (and:DI (ashift:DI (reg:DI 81)
                 (const_int 5 [0x5]))
        (const_int 68719476704 [0xfffffffe0])))
and produce the following (optimized) assembly:
     f:
        slliw   a5,a0,1
        slli    a5,a5,32
        srli    a5,a5,28
        add     a0,a5,a1
        ret

The current way of handling this (in 'zero_extendsidi2_shifted')
doesn't apply for two reasons:
- this is seen before reload, and
- (more importantly) the constant mask is not 0xfffffffful.

To address this, we introduce a generalized version of shifting
zero-extended values that supports any mask of consecutive ones as
long as the number of trailing zeros matches the inner shift amount.
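
As an illustration (not part of the patch), the identity behind this
split can be written in C as follows; the parameter names n (the
inner shift amount) and w (the number of ones in the mask) are mine,
and a 64-bit register with n + w <= 64 is assumed:
     /* (x << n) & mask, where mask consists of w consecutive ones
        followed by n trailing zeros, equals one left shift followed
        by one logical right shift.  */
     unsigned long shift_of_masked_value(unsigned long x,
                                         unsigned n, unsigned w)
     {
             return (x << (64 - w)) >> (64 - w - n);
     }
For the function f above, n is 5 and w is 31 (the mask 0xfffffffe0
covers bits 5 through 35), which yields the slli-by-33/srli-by-28
pair shown below.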

With this new split, we generate the following assembly for the
aforementioned function:
     f:
        slli    a0,a0,33
        srli    a0,a0,28
        add     a0,a0,a1
        ret

Unfortunately, all of this causes some fallout (especially in how it
interacts with Zb* extensions and zero_extract expressions formed
during combine): this is addressed through additional instruction
splitting and handling of zero_extract.
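
For reference, the shift-left + shift-right split of a zero_extract
mentioned above amounts to the following sketch (the function and
parameter names are mine; a 64-bit register and pos + width <= 64
are assumed):
     /* Extract "width" bits starting at bit "pos" into the low bits.  */
     unsigned long zero_extract_via_shifts(unsigned long x,
                                           unsigned pos, unsigned width)
     {
             return (x << (64 - pos - width)) >> (64 - width);
     }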

gcc/ChangeLog:

        * config/riscv/bitmanip.md (*zext.w): Match a zext.w expressed
        as an and:DI.
        (*andi_add.uw): New pattern.
        (*slli_slli_uw): New pattern.
        (*shift_then_shNadd.uw): New pattern.
        (*slliuw): Rename to riscv_slli_uw.
        (riscv_slli_uw): Renamed from *slliuw.
        (*zeroextract<GPR:mode><ANYI:mode>2_highbits): New pattern.
        (*zero_extract<GPR:mode>): New pattern, which will be split to
        shift-left + shift-right.
        * config/riscv/predicates.md (dimode_shift_operand): New predicate.
        * config/riscv/riscv.md (*zero_extract<GPR:mode>_lowbits): New pattern.
        (zero_extendsidi2_shifted): Rename.
        (*zero_extendsidi2_shifted): Generalize.
        (*shift<GPR:MODE>_truthvalue): New pattern.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/shift-shift-6.c: New test.
        * gcc.target/riscv/shift-shift-7.c: New test.
        * gcc.target/riscv/shift-shift-8.c: New test.
        * gcc.target/riscv/shift-shift-9.c: New test.
        * gcc.target/riscv/snez.c: New test.

Commit notes:
- Depends on a predicate posted in "RISC-V: Optimize branches testing
   a bit-range or a shifted immediate".  Depending on the order in
   which these are applied, I'll pull that part out of the other
   patch if needed.

Version-changes: 2
- refactor
- optimise for additional corner cases and deal with fallout

Version-changes: 3
- removed the [WIP] from the commit message (no other changes)

Signed-off-by: Philipp Tomsich <philipp.toms...@vrull.eu>
---

(no changes since v1)

  gcc/config/riscv/bitmanip.md                  | 142 ++++++++++++++----
  gcc/config/riscv/predicates.md                |   5 +
  gcc/config/riscv/riscv.md                     |  75 +++++++--
  .../gcc.target/riscv/shift-shift-6.c          |  14 ++
  .../gcc.target/riscv/shift-shift-7.c          |  16 ++
  .../gcc.target/riscv/shift-shift-8.c          |  20 +++
  .../gcc.target/riscv/shift-shift-9.c          |  15 ++
  gcc/testsuite/gcc.target/riscv/snez.c         |  14 ++
  8 files changed, 261 insertions(+), 40 deletions(-)
  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-6.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-7.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-8.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/shift-shift-9.c
  create mode 100644 gcc/testsuite/gcc.target/riscv/snez.c

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 78fdf02c2ec..06126ac4819 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -29,7 +29,20 @@
    [(set_attr "type" "bitmanip,load")
     (set_attr "mode" "DI")])
-(define_insn "riscv_shNadd<X:mode>"
+;; We may end up forming a slli.uw with an immediate of 0 (while
+;; splitting through "*slli_slli_uw", below).
+;; Match this back to a zext.w
+(define_insn "*zext.w"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+       (and:DI (ashift:DI (match_operand:DI 1 "register_operand" "r")
+                          (const_int 0))
+               (const_int 4294967295)))]
+  "TARGET_64BIT && TARGET_ZBA"
+  "zext.w\t%0,%1"
+  [(set_attr "type" "bitmanip")
+   (set_attr "mode" "DI")])

Would it be better to detect that we're going to create a shift count of zero and an all-ones (0xffffffff) mask, and emit RTL for a SI->DI zero extension directly, rather than having this pattern?


Overall it looks sensible -- I didn't check all the conditions and such, just the overall structure.


Jeff

