This patch addresses a code quality regression on x86_64 related to
PR 123236.  That original PR (and the related PR 101266) concerns
tree-level optimizations, where this problem should also be fixed, but
it also reveals a regression in the RTL optimizers.

A motivating test case (on x86_64) is:

unsigned int foo(unsigned int a) {
  unsigned long long t = a;
  return t >> 4;
}

which with -O2 currently generates two 64-bit shifts:

foo:    movq    %rdi, %rax
        salq    $32, %rax
        shrq    $36, %rax
        ret

With this patch, we now generate a single 32-bit shift:

foo:    movl    %edi, %eax
        shrl    $4, %eax
        ret

which matches what GCC generated prior to GCC 11.
Likewise, for the signed (SIGN_EXTRACT/SIGN_EXTEND) case:

int bar(int a) {
  long long t = a;
  return t >> 4;
}

Before:
bar:    movslq  %edi, %rax
        sarq    $4, %rax
        ret

After:
bar:    movl    %edi, %eax
        sarl    $4, %eax
        ret
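
As a sanity check on the two pairs of sequences above, the following C
sketch compares the widen/shift/truncate form with a plain 32-bit shift.
The function names are hypothetical (they are not part of the patch),
and the signed case assumes GCC's arithmetic right shift of negative
values, which ISO C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* foo as written: zero-extend to 64 bits, shift, truncate to 32 bits.  */
uint32_t foo_wide (uint32_t a)
{
  uint64_t t = a;
  return (uint32_t) (t >> 4);
}

/* The narrow form: a single 32-bit logical shift.  */
uint32_t foo_narrow (uint32_t a)
{
  return a >> 4;
}

/* bar as written: sign-extend to 64 bits, shift, truncate to 32 bits.
   The shifted value always fits in 32 bits, so the conversion is exact.  */
int32_t bar_wide (int32_t a)
{
  int64_t t = a;
  return (int32_t) (t >> 4);
}

/* The narrow form: a single 32-bit arithmetic shift (assumes GCC's
   behaviour for right shifts of negative values).  */
int32_t bar_narrow (int32_t a)
{
  return a >> 4;
}

/* Return 1 if the wide and narrow forms agree on a few boundary values.  */
int widths_agree (void)
{
  static const uint32_t us[] = { 0u, 1u, 15u, 16u, 0x7fffffffu,
                                 0x80000000u, 0xfffffff0u, 0xffffffffu };
  static const int32_t ss[] = { INT32_MIN, -17, -16, -1, 0, 1, 16,
                                INT32_MAX };
  for (unsigned i = 0; i < sizeof us / sizeof us[0]; i++)
    if (foo_wide (us[i]) != foo_narrow (us[i]))
      return 0;
  for (unsigned i = 0; i < sizeof ss / sizeof ss[0]; i++)
    if (bar_wide (ss[i]) != bar_narrow (ss[i]))
      return 0;
  return 1;
}
```

The sample values only spot-check the boundaries; the equivalence holds
for all inputs because the bits shifted in from above bit 31 are copies
of the zero or sign extension.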

The underlying cause of the RTL-level regression is that some
RTL expressions that were previously expressed as {ZERO,SIGN}_EXTEND
are now sometimes represented as the equivalent {ZERO,SIGN}_EXTRACT,
and that not all simplifications of TRUNCATE({ZERO,SIGN}_EXTEND) are
implemented for TRUNCATE({ZERO,SIGN}_EXTRACT).
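
To illustrate the equivalence mentioned above, here is a small C model
(a sketch only, assuming !BITS_BIG_ENDIAN bit numbering) of a DImode
extract, next to the extend-then-shift form from the test cases.  The
helper names are hypothetical and do not exist in GCC; the signed shift
again assumes GCC's arithmetic right shift.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (zero_extract:DI x len pos): LEN bits of X starting at bit
   POS (counted from the least significant bit), zero-extended.
   Requires 0 < len < 64 and len + pos <= 64.  */
uint64_t zero_extract64 (uint64_t x, unsigned len, unsigned pos)
{
  uint64_t mask = (UINT64_C (1) << len) - 1;
  return (x >> pos) & mask;
}

/* Model of (sign_extract:DI x len pos): the same field, sign-extended
   from its top bit.  Same requirements as zero_extract64.  */
int64_t sign_extract64 (uint64_t x, unsigned len, unsigned pos)
{
  uint64_t field = zero_extract64 (x, len, pos);
  uint64_t sign = UINT64_C (1) << (len - 1);
  /* Both operands are below 2^63, so the conversions are exact.  */
  return (int64_t) (field ^ sign) - (int64_t) sign;
}

/* Model of (lshiftrt:DI (zero_extend:DI a) 4).  */
uint64_t extend_then_shift_unsigned (uint32_t a)
{
  return (uint64_t) a >> 4;
}

/* Model of (ashiftrt:DI (sign_extend:DI a) 4).  */
int64_t extend_then_shift_signed (int32_t a)
{
  return (int64_t) a >> 4;
}
```

For a 32-bit input, shifting the extension right by 4 selects the same
28-bit field at position 4 that the extract does, which is why both
representations can appear for the same computation.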

Thanks to Segher, simplify_rtx does handle some truncations of extracts
(see https://gcc.gnu.org/pipermail/gcc-patches/2016-November/463629.html),
but unfortunately this code (and subsequent tweaks) doesn't quite match
the cases that we care about.

Previously:
Trying 6, 7 -> 8:
    6: r103:DI=sign_extend(r105:SI)
      REG_DEAD r105:SI
    7: {r104:DI=r103:DI>>0x4;clobber flags:CC;}
      REG_DEAD r103:DI
      REG_UNUSED flags:CC
    8: r102:SI=r104:DI#0
      REG_DEAD r104:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 102 [ _4 ]) 0)
    (zero_extend:DI (subreg:SI (sign_extract:DI (reg:SI 105 [ aD.2962 ])
                (const_int 28 [0x1c])
                (const_int 4 [0x4])) 0)))

With the tweaks below, this becomes:
Trying 6, 7 -> 8:
    6: r103:DI=sign_extend(r105:SI)
      REG_DEAD r105:SI
    7: {r104:DI=r103:DI>>0x4;clobber flags:CC;}
      REG_DEAD r103:DI
      REG_UNUSED flags:CC
    8: r102:SI=r104:DI#0
      REG_DEAD r104:DI
Successfully matched this instruction:
(set (reg:SI 102 [ _4 ])
    (ashiftrt:SI (reg:SI 105 [ aD.2962 ])
        (const_int 4 [0x4])))
allowing combination of insns 6, 7 and 8
original costs 4 + 4 + 4 = 12
replacement cost 4
deferring deletion of insn with uid = 7.
deferring deletion of insn with uid = 6.
modifying insn i3     8: {r102:SI=r105:SI>>0x4;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r105:SI
deferring rescan insn with uid = 8.

This requires two minor tweaks.  The first is that the paradoxical
subreg assignment (set (subreg:DI (reg:SI 102) 0) (expr:DI)) can
(sometimes) be simplified to (set (reg:SI 102) (truncate:SI (expr:DI))),
especially if the (truncate:SI (expr:DI)) can itself be simplified.
The second is that (truncate:SI (sign_extract:DI (reg:SI 102) x y))
slips through the existing simplify_truncation transformations as
the {zero,sign}_extract has a different mode to its first operand.
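
The !BITS_BIG_ENDIAN condition used by simplify_truncation,
precision >= len + pos, can be pictured with the following C sketch.
It models SImode and DImode sign_extracts with hypothetical helpers
(not GCC functions) and checks whether truncating the wide extract gives
the same bits as extracting from the truncated operand.  A reg:SI
operand corresponds to a 64-bit value whose upper 32 bits are zero.

```c
#include <assert.h>
#include <stdint.h>

/* Model of (sign_extract:DI x len pos), !BITS_BIG_ENDIAN.
   Requires 0 < len < 64 and len + pos <= 64.  */
int64_t sign_extract_di (uint64_t x, unsigned len, unsigned pos)
{
  uint64_t field = (x >> pos) & ((UINT64_C (1) << len) - 1);
  uint64_t sign = UINT64_C (1) << (len - 1);
  return (int64_t) (field ^ sign) - (int64_t) sign;
}

/* Model of (sign_extract:SI x len pos), !BITS_BIG_ENDIAN.
   Requires 0 < len < 32 and pos < 32; bits above bit 31 read as zero,
   so the result is only meaningful when len + pos <= 32.  */
int32_t sign_extract_si (uint32_t x, unsigned len, unsigned pos)
{
  uint32_t field = (x >> pos) & ((UINT32_C (1) << len) - 1);
  uint32_t sign = UINT32_C (1) << (len - 1);
  return (int32_t) (field ^ sign) - (int32_t) sign;
}

/* Return 1 if (truncate:SI (sign_extract:DI x len pos)) has the same
   bits as (sign_extract:SI (truncate:SI x) len pos).  */
int truncation_commutes (uint64_t x, unsigned len, unsigned pos)
{
  uint32_t wide = (uint32_t) (uint64_t) sign_extract_di (x, len, pos);
  uint32_t narrow = (uint32_t) sign_extract_si ((uint32_t) x, len, pos);
  return wide == narrow;
}
```

With len = 28 and pos = 4 the field lies entirely within the low 32
bits (28 + 4 <= 32), so the two forms agree and the SImode extract is
the arithmetic shift by 4 that combine now matches.  With len = 4 and
pos = 30 the field straddles bit 32, the condition fails, and the two
forms can differ.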

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures.  Ok for mainline?

2026-01-09  Roger Sayle  <[email protected]>

gcc/ChangeLog
        PR rtl-optimization/123236
        * combine.cc (simplify_set): Attempt to simplify SETs where the
        destination is a low-part paradoxical subreg between scalar integer
        modes.
        * simplify-rtx.cc (simplify_context::simplify_truncation): Handle
        cases where a ZERO_EXTRACT or SIGN_EXTRACT has a different mode
        to (at least as wide as) its first operand.

gcc/testsuite/ChangeLog
        PR rtl-optimization/123236
        * gcc.target/i386/pr123236.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/combine.cc b/gcc/combine.cc
index 66963222efc..b0c930ae5f5 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7066,6 +7066,27 @@ simplify_set (rtx x)
       src = SET_SRC (x), dest = SET_DEST (x);
     }
 
+  /* If we have (set (subreg:M (reg:N DEST)) SRC) with M wider than N,
+     we have a paradoxical subreg destination, such as created by the
+     clause above.  Check if we can simplify (subreg:N SRC) to eliminate
+     the (paradoxical) subreg.  */
+  else if (GET_CODE (dest) == SUBREG
+          && subreg_lowpart_p (dest)
+          && paradoxical_subreg_p (dest)
+          && SCALAR_INT_MODE_P (mode)
+          && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (dest))))
+    {
+      rtx tmp = simplify_gen_unary (TRUNCATE, GET_MODE (SUBREG_REG (dest)),
+                                   src, mode);
+      if (tmp)
+       {
+         SUBST (SET_DEST (x), SUBREG_REG (dest));
+         SUBST (SET_SRC (x), tmp);
+         dest = SET_DEST (x);
+         src = SET_SRC (x);
+       }
+    }
+
   /* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
      would require a paradoxical subreg.  Replace the subreg with a
      zero_extend to avoid the reload that would otherwise be required.
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 8016e02e925..e27ca8be5eb 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -727,12 +727,10 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
        }
     }
 
-  /* Turn (truncate:M1 (*_extract:M2 (reg:M2) (len) (pos))) into
-     (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
-     changing len.  */
+  /* Turn (truncate:M1 (*_extract:M2 (reg:M3) (len) (pos))) into
+     (*_extract:M1 (truncate:M1 (reg:M3)) (len) (pos')) if possible.  */
   if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
-      && REG_P (XEXP (op, 0))
-      && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
+      && precision <= GET_MODE_UNIT_PRECISION (GET_MODE (XEXP (op, 0)))
       && CONST_INT_P (XEXP (op, 1))
       && CONST_INT_P (XEXP (op, 2)))
     {
@@ -741,7 +739,8 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
       unsigned HOST_WIDE_INT pos = UINTVAL (XEXP (op, 2));
       if (BITS_BIG_ENDIAN && pos >= op_precision - precision)
        {
-         op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
+         if (GET_MODE (op0) != mode)
+           op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
          if (op0)
            {
              pos -= op_precision - precision;
@@ -751,7 +750,8 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
        }
       else if (!BITS_BIG_ENDIAN && precision >= len + pos)
        {
-         op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
+         if (GET_MODE (op0) != mode)
+           op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
          if (op0)
            return simplify_gen_ternary (GET_CODE (op), mode, mode, op0,
                                         XEXP (op, 1), XEXP (op, 2));
diff --git a/gcc/testsuite/gcc.target/i386/pr123236.c b/gcc/testsuite/gcc.target/i386/pr123236.c
new file mode 100644
index 00000000000..b369b6fd299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr123236.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+unsigned int foo(unsigned int a) {
+  unsigned long long t = a;
+  return t >> 4;
+}
+
+int bar(int a) {
+  long long t = a;
+  return t >> 4;
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler-not "salq" } } */
+/* { dg-final { scan-assembler-not "shrq" } } */
+/* { dg-final { scan-assembler-not "movslq" } } */
+/* { dg-final { scan-assembler-not "sarq" } } */
