In pr123010 we have a case where we should be getting a single slliw, but instead we get a 3-insn sequence.

As noted in the BZ we had this before combine:

(insn 6 3 13 2 (set (reg:DI 137)
         (sign_extend:DI (ashift:SI (subreg/s/u:SI (reg/v:DI 135 [ a ]) 0)
                 (const_int 1 [0x1])))) "j.c":10:14 312 {ashlsi3_extend}
      (expr_list:REG_DEAD (reg/v:DI 135 [ a ])
         (nil)))
Which is exactly what we want.  That's a single slliw instruction.  Then combine generates this:


(insn 6 3 13 2 (parallel [
             (set (reg:DI 137)
                 (sign_extract:DI (reg:DI 139 [ a ])
                     (const_int 31 [0x1f])
                     (const_int 0 [0])))
             (clobber (scratch:DI))
         ]) "j.c":10:14 333 {*extractdi3}
      (expr_list:REG_DEAD (reg:DI 139 [ a ])
         (nil)))
(insn 13 6 14 2 (set (reg/i:DI 10 a0)
         (ashift:DI (reg:DI 137)
             (const_int 1 [0x1]))) "j.c":11:1 297 {ashldi3}
      (expr_list:REG_DEAD (reg:DI 137)
         (nil)))
Which is due to a define_insn_and_split misbehaving a bit.

The first approach was to define an insn for the case where we left shift a sign extracted bitfield where the sign bit of the bitfield gets shifted into bit 31.  The theory being this might be a reasonably common occurrence and having a pattern for it might be useful (and there's a similar pattern one could write for a small number of zero extended fields getting shifted left as well).  That turns out to be a problem, though, as the sign extension is obfuscated, making it harder to track the state of the sign bits and thus harder to eliminate later sign extensions.  Those regressions can be fixed, but doing so requires a revamp of the sign/zero extension patterns to eliminate several of the define_insn_and_splits.  That's a larger change than I really want to do right now.


We could also throttle back the most problematic define_insn_and_split.  It's likely viable, though probably not the best use of time given my desire to clean up the define_insn_and_splits, including this one.

We can also recognize this case and simplify it.  Essentially, when we have (ashift (sign_extract ...)), recognize the case where we're extracting a bitfield starting at bit 0 and the bitfield is shifted such that the sign bit of the bitfield moves into bit position 7, 15 or 31.  In that case we can simplify it to (sign_extend (ashift ...)).

This patch takes the last of those three approaches.  Bootstrapped and regression tested on x86_64 and riscv (BPI and Pioneer), as well as going through all the embedded targets without regressions.  I was somewhat worried about loongarch due to a pattern in its machine description, but the two tests added with that pattern still pass, so it seems OK too.


Pushing to the trunk.

Jeff

commit bde7b018d4557538cdc8bef57ab073f3ec0d25b8
Author: Jeff Law <[email protected]>
Date:   Sun Jan 4 09:30:01 2026 -0700

    [PR target/123010] Simplify shift of sign extracted field to a sign extending shift
    
    In pr123010 we have a case where we should be getting a single slliw, but
    instead we get a 3-insn sequence.
    
    As noted in the BZ we had this before combine:
    
    > (insn 6 3 13 2 (set (reg:DI 137)
    >         (sign_extend:DI (ashift:SI (subreg/s/u:SI (reg/v:DI 135 [ a ]) 0)
    >                 (const_int 1 [0x1])))) "j.c":10:14 312 {ashlsi3_extend}
    >      (expr_list:REG_DEAD (reg/v:DI 135 [ a ])
    >         (nil)))
    Which is exactly what we want.  That's a single slliw instruction.  Then
    combine generates this:
    
    > (insn 6 3 13 2 (parallel [
    >             (set (reg:DI 137)
    >                 (sign_extract:DI (reg:DI 139 [ a ])
    >                     (const_int 31 [0x1f])
    >                     (const_int 0 [0])))
    >             (clobber (scratch:DI))
    >         ]) "j.c":10:14 333 {*extractdi3}
    >      (expr_list:REG_DEAD (reg:DI 139 [ a ])
    >         (nil)))
    > (insn 13 6 14 2 (set (reg/i:DI 10 a0)
    >         (ashift:DI (reg:DI 137)
    >             (const_int 1 [0x1]))) "j.c":11:1 297 {ashldi3}
    >      (expr_list:REG_DEAD (reg:DI 137)
    >         (nil)))
    Which is due to a define_insn_and_split misbehaving a bit.
    
    The first approach was to define an insn for the case where we left
    shift a sign extracted bitfield where the sign bit of the bitfield gets
    shifted into bit 31.  The theory being this might be a reasonably common
    occurrence and having a pattern for it might be useful (and there's a
    similar pattern one could write for a small number of zero extended
    fields getting shifted left as well).
    That turns out to be a problem, though, as the sign extension is
    obfuscated, making it harder to track the state of the sign bits and
    thus harder to eliminate later sign extensions.  Those regressions can
    be fixed, but doing so requires a revamp of the sign/zero extension
    patterns to eliminate several of the define_insn_and_splits.  That's a
    larger change than I really want to do right now.
    
    We could also throttle back the most problematic define_insn_and_split.
    It's likely viable, though probably not the best use of time given my
    desire to clean up the define_insn_and_splits, including this one.
    
    We can also recognize this case and simplify it.  Essentially, when we
    have (ashift (sign_extract ...)), recognize the case where we're
    extracting a bitfield starting at bit 0 and the bitfield is shifted
    such that the sign bit of the bitfield moves into bit position 7, 15
    or 31.  In that case we can simplify it to (sign_extend (ashift ...)).
    
    This patch takes the last of those three approaches.  Bootstrapped and
    regression tested on x86_64 and riscv (BPI and Pioneer), as well as
    going through all the embedded targets without regressions.  I was
    somewhat worried about loongarch due to a pattern in its machine
    description, but the two tests added with that pattern still pass, so
    it seems OK too.
    
            PR target/123010
    gcc/
            * simplify-rtx.cc (simplify_binary_operation_1, case ASHIFT):
            Simplify case where a left shift of the sign extracted field
            can be turned into a sign extension of a left shift.
    
    gcc/testsuite/
            * gcc.target/riscv/pr123010.c: New test.

diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index b9843bd3078..75655787202 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -4793,6 +4793,43 @@ simplify_ashift:
          rtx new_op1 = immed_wide_int_const (c, int_mode);
          return simplify_gen_binary (MULT, int_mode, op0, new_op1);
        }
+
+      /* If we're shifting left a signed bitfield extraction and the
+        shift count + bitfield size is a natural integral mode and
+        the field starts at offset 0 (counting from the LSB), then
+        this can be simplified to a sign extension of a left shift.
+
+        Some ISAs (RISC-V 64-bit) have inherent support for such
+        instructions and it's better for various optimizations to
+        express as a SIGN_EXTEND rather than a shifted SIGN_EXTRACT.  */
+      if (GET_CODE (op0) == SIGN_EXTRACT
+         && REG_P (XEXP (op0, 0))
+         /* The size of the bitfield, the location of the bitfield and
+            shift count must be CONST_INTs.  */
+         && CONST_INT_P (op1)
+         && CONST_INT_P (XEXP (op0, 1))
+         && CONST_INT_P (XEXP (op0, 2)))
+       {
+         int size = INTVAL (op1) + INTVAL (XEXP (op0, 1));
+         machine_mode smaller_mode;
+         /* Now we need to verify the size of the bitfield plus the shift
+            count is an integral mode and smaller than MODE.  This is
+            requirement for using SIGN_EXTEND.  We also need to verify the
+            field starts at bit location 0 and that the subreg lowpart also
+            starts at zero.  */
+         if (int_mode_for_size (size, size).exists (&smaller_mode)
+             && mode > smaller_mode
+             && (subreg_lowpart_offset (smaller_mode, mode).to_constant ()
+                 == UINTVAL (XEXP (op0, 2)))
+             && XEXP (op0, 2) == CONST0_RTX (mode))
+           {
+             /* Everything passed.  So we just need to get the subreg of the
+                original input, shift it and sign extend the result.  */
+             rtx op = gen_lowpart (smaller_mode, XEXP (op0, 0));
+             rtx x = gen_rtx_ASHIFT (smaller_mode, op, op1);
+             return gen_rtx_SIGN_EXTEND (mode, x);
+           }
+       }
       goto canonicalize_shift;
 
     case LSHIFTRT:
diff --git a/gcc/testsuite/gcc.target/riscv/pr123010.c b/gcc/testsuite/gcc.target/riscv/pr123010.c
new file mode 100644
index 00000000000..f7fe5a4484d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr123010.c
@@ -0,0 +1,17 @@
+/* { dg-do compile { target rv64 } } */
+/* { dg-options "-O3 -march=rv64gc -mabi=lp64" { target rv64 } } */
+#include <stdint-gcc.h>
+
+#define N 2
+
+uint32_t mulu(uint32_t a) {
+    return a * N;
+}
+
+int32_t muls(int32_t a) {
+    return a * N;
+}
+
+/* { dg-final { scan-assembler-times "slliw\t" 2 { target rv64 } } } */
+/* { dg-final { scan-assembler-not "sr\[al\]i\t" } } */
+
