This patch addresses a code quality regression on x86_64 related to
PR 123236. That original PR (and the related PR 101266) concerns
tree-level optimizations, where this problem should also be fixed, but it
also reveals a regression in the RTL optimizers.
A motivating test case (on x86_64) is:
unsigned int foo(unsigned int a) {
  unsigned long long t = a;
  return t >> 4;
}
which with -O2 currently generates (2 64-bit shifts):
foo:    movq    %rdi, %rax
        salq    $32, %rax
        shrq    $36, %rax
        ret
with this patch, we now generate (1 32-bit shift):
foo:    movl    %edi, %eax
        shrl    $4, %eax
        ret
which matches what GCC generated prior to GCC 11.
Likewise, for the signed (SIGN_EXTRACT/SIGN_EXTEND) case:
int bar(int a) {
  long long t = a;
  return t >> 4;
}
Before:
bar:    movslq  %edi, %rax
        sarq    $4, %rax
        ret
After:
bar:    movl    %edi, %eax
        sarl    $4, %eax
        ret
The underlying cause of the RTL-level regression is that some
RTL expressions that were previously expressed as {ZERO,SIGN}_EXTEND
are now sometimes represented as the equivalent {ZERO,SIGN}_EXTRACT,
and that not all simplifications of TRUNCATE({ZERO,SIGN}_EXTEND) are
implemented for TRUNCATE({ZERO,SIGN}_EXTRACT).
Thanks to Segher, simplify_rtx does handle some truncations of extracts
(see https://gcc.gnu.org/pipermail/gcc-patches/2016-November/463629.html),
but unfortunately this code (and subsequent tweaks) doesn't quite match
the cases that we care about.
Previously:
Trying 6, 7 -> 8:
6: r103:DI=sign_extend(r105:SI)
REG_DEAD r105:SI
7: {r104:DI=r103:DI>>0x4;clobber flags:CC;}
REG_DEAD r103:DI
REG_UNUSED flags:CC
8: r102:SI=r104:DI#0
REG_DEAD r104:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 102 [ _4 ]) 0)
(zero_extend:DI (subreg:SI (sign_extract:DI (reg:SI 105 [ aD.2962 ])
(const_int 28 [0x1c])
(const_int 4 [0x4])) 0)))
With the tweaks below, this becomes:
Trying 6, 7 -> 8:
6: r103:DI=sign_extend(r105:SI)
REG_DEAD r105:SI
7: {r104:DI=r103:DI>>0x4;clobber flags:CC;}
REG_DEAD r103:DI
REG_UNUSED flags:CC
8: r102:SI=r104:DI#0
REG_DEAD r104:DI
Successfully matched this instruction:
(set (reg:SI 102 [ _4 ])
(ashiftrt:SI (reg:SI 105 [ aD.2962 ])
(const_int 4 [0x4])))
allowing combination of insns 6, 7 and 8
original costs 4 + 4 + 4 = 12
replacement cost 4
deferring deletion of insn with uid = 7.
deferring deletion of insn with uid = 6.
modifying insn i3 8: {r102:SI=r105:SI>>0x4;clobber flags:CC;}
REG_UNUSED flags:CC
REG_DEAD r105:SI
deferring rescan insn with uid = 8.
This requires two minor tweaks. The first is that the paradoxical
subreg assignment (set (subreg:DI (reg:SI 102) 0) (expr:DI)) can
(sometimes) be simplified to (set (reg:SI 102) (truncate:SI (expr:DI))),
especially if the (truncate:SI (expr:DI)) can itself be simplified.
The second is that (truncate:SI (sign_extract:DI (reg:SI 102) x y))
slips through the existing simplify_truncation transformations, as the
{zero,sign}_extract has a different mode from its first operand.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32},
with no new failures. Ok for mainline?
2026-01-09 Roger Sayle <[email protected]>
gcc/ChangeLog
PR rtl-optimization/123236
* combine.cc (simplify_set): Attempt to simplify SETs where the
destination is a low-part paradoxical subreg between scalar integer
modes.
* simplify-rtx.cc (simplify_context::simplify_truncation): Handle
cases where a ZERO_EXTRACT or SIGN_EXTRACT has a different mode
to (at least as wide as) its first operand.
gcc/testsuite/ChangeLog
PR rtl-optimization/123236
* gcc.target/i386/pr123236.c: New test case.
Thanks in advance,
Roger
--
diff --git a/gcc/combine.cc b/gcc/combine.cc
index 66963222efc..b0c930ae5f5 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7066,6 +7066,27 @@ simplify_set (rtx x)
src = SET_SRC (x), dest = SET_DEST (x);
}
+ /* If we have (set (subreg:M (reg:N DEST) 0) SRC) with M wider than N,
+ we have a paradoxical subreg destination, such as created by the
+ clause above.  Check if we can simplify (truncate:N SRC) to eliminate
+ the (paradoxical) subreg.  */
+ else if (GET_CODE (dest) == SUBREG
+ && subreg_lowpart_p (dest)
+ && paradoxical_subreg_p (dest)
+ && SCALAR_INT_MODE_P (mode)
+ && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (dest))))
+ {
+ rtx tmp = simplify_gen_unary (TRUNCATE, GET_MODE (SUBREG_REG (dest)),
+ src, mode);
+ if (tmp)
+ {
+ SUBST (SET_DEST (x), SUBREG_REG (dest));
+ SUBST (SET_SRC (x), tmp);
+ dest = SET_DEST (x);
+ src = SET_SRC (x);
+ }
+ }
+
/* If we have (set FOO (subreg:M (mem:N BAR) 0)) with M wider than N, this
would require a paradoxical subreg. Replace the subreg with a
zero_extend to avoid the reload that would otherwise be required.
diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
index 8016e02e925..e27ca8be5eb 100644
--- a/gcc/simplify-rtx.cc
+++ b/gcc/simplify-rtx.cc
@@ -727,12 +727,10 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
}
}
- /* Turn (truncate:M1 (*_extract:M2 (reg:M2) (len) (pos))) into
- (*_extract:M1 (truncate:M1 (reg:M2)) (len) (pos')) if possible without
- changing len. */
+ /* Turn (truncate:M1 (*_extract:M2 (reg:M3) (len) (pos))) into
+ (*_extract:M1 (truncate:M1 (reg:M3)) (len) (pos')) if possible. */
if ((GET_CODE (op) == ZERO_EXTRACT || GET_CODE (op) == SIGN_EXTRACT)
- && REG_P (XEXP (op, 0))
- && GET_MODE (XEXP (op, 0)) == GET_MODE (op)
+ && precision <= GET_MODE_UNIT_PRECISION (GET_MODE (XEXP (op, 0)))
&& CONST_INT_P (XEXP (op, 1))
&& CONST_INT_P (XEXP (op, 2)))
{
@@ -741,7 +739,8 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
unsigned HOST_WIDE_INT pos = UINTVAL (XEXP (op, 2));
if (BITS_BIG_ENDIAN && pos >= op_precision - precision)
{
- op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
+ if (GET_MODE (op0) != mode)
+ op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
if (op0)
{
pos -= op_precision - precision;
@@ -751,7 +750,8 @@ simplify_context::simplify_truncation (machine_mode mode, rtx op,
}
else if (!BITS_BIG_ENDIAN && precision >= len + pos)
{
- op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
+ if (GET_MODE (op0) != mode)
+ op0 = simplify_gen_unary (TRUNCATE, mode, op0, GET_MODE (op0));
if (op0)
return simplify_gen_ternary (GET_CODE (op), mode, mode, op0,
XEXP (op, 1), XEXP (op, 2));
diff --git a/gcc/testsuite/gcc.target/i386/pr123236.c b/gcc/testsuite/gcc.target/i386/pr123236.c
new file mode 100644
index 00000000000..b369b6fd299
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr123236.c
@@ -0,0 +1,18 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2" } */
+
+unsigned int foo(unsigned int a) {
+ unsigned long long t = a;
+ return t >> 4;
+}
+
+int bar(int a) {
+ long long t = a;
+ return t >> 4;
+}
+
+/* { dg-final { scan-assembler-not "movq" } } */
+/* { dg-final { scan-assembler-not "salq" } } */
+/* { dg-final { scan-assembler-not "shrq" } } */
+/* { dg-final { scan-assembler-not "movslq" } } */
+/* { dg-final { scan-assembler-not "sarq" } } */