So this turned out to be a neat little test and while the fuzzer found
it on RISC-V, I wouldn't be surprised if the underlying issue is also
the root cause of the loongarch issue with ext-dce.
The key issue is that if we have something like
(set (dest) (any_extend (subreg (source))))
If the subreg object is marked with SUBREG_PROMOTED and the
sign/unsigned state matches the any_extend opcode, then combine (and I
guess anything using simplify-rtx) may simplify that to
(set (dest) (source))
That implies that bits outside the mode of the subreg are actually live
and valid. This needs to be accounted for during liveness computation.
We have to be careful here though. If we're too conservative about
setting additional bits live, then we'll inhibit the desired
optimization in the coremark examples. To do a good job we need to know
the extension opcode.
I'm extremely unhappy with how the use handling works in ext-dce. It
mixes different conceptual steps and has horribly complex control flow.
It only handles a subset of the unary/binary opcodes, etc etc. It's
just damn mess. It's going to need some more noodling around.
In the mean time this is a bit hacky in that it depends on non-obvious
behavior to know it can get the extension opcode, but I don't want to
leave the trunk in a broken state while I figure out the refactoring
problem.
Bootstrapped and regression tested on x86 and tested on the crosses.
Pushing to the trunk.
Jeff
commit 34fb0feca71f763b2fbe832548749666d34a4a76
Author: Jeff Law <j...@ventanamicro.com>
Date: Thu Jul 25 12:32:28 2024 -0600
[PR rtl-optimization/116039] Fix life computation for promoted subregs
So this turned out to be a neat little test and while the fuzzer found it on
RISC-V, I wouldn't be surprised if the underlying issue is also the root
cause
of the loongarch issue with ext-dce.
The key issue is that if we have something like
(set (dest) (any_extend (subreg (source))))
If the subreg object is marked with SUBREG_PROMOTED and the sign/unsigned
state
matches the any_extend opcode, then combine (and I guess anything using
simplify-rtx) may simplify that to
(set (dest) (source))
That implies that bits outside the mode of the subreg are actually live and
valid. This needs to be accounted for during liveness computation.
We have to be careful here though. If we're too conservative about setting
additional bits live, then we'll inhibit the desired optimization in the
coremark examples. To do a good job we need to know the extension opcode.
I'm extremely unhappy with how the use handling works in ext-dce. It mixes
different conceptual steps and has horribly complex control flow. It only
handles a subset of the unary/binary opcodes, etc etc. It's just damn mess.
It's going to need some more noodling around.
In the mean time this is a bit hacky in that it depends on non-obvious
behavior
to know it can get the extension opcode, but I don't want to leave the
trunk in
a broken state while I figure out the refactoring problem.
Bootstrapped and regression tested on x86 and tested on the crosses.
Pushing to the trunk.
PR rtl-optimization/116039
gcc/
* ext-dce.cc (ext_dce_process_uses): Add some comments about
concerns
with current code. Mark additional bit groups as live when we have
an extension of a suitably promoted subreg.
gcc/testsuite
* gcc.dg/torture/pr116039.c: New test.
diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc
index c94d1fc3414..14f163a01d6 100644
--- a/gcc/ext-dce.cc
+++ b/gcc/ext-dce.cc
@@ -667,6 +667,12 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
if (modify && !skipped_dest && (dst_mask & ~src_mask) == 0)
ext_dce_try_optimize_insn (insn, x);
+ /* Stripping the extension here just seems wrong on multiple
+ levels. It's source side handling, so it seems like it
+ belongs in the loop below. Stripping here also makes it
+ harder than necessary to properly handle live bit groups
+ for (ANY_EXTEND (SUBREG)) where the SUBREG has
+ SUBREG_PROMOTED state. */
dst_mask &= src_mask;
src = XEXP (src, 0);
code = GET_CODE (src);
@@ -674,8 +680,8 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
/* Optimization is done at this point. We just want to make
sure everything that should get marked as live is marked
- from here onward. */
-
+ from here onward. Shouldn't the backpropagate step happen
+ before optimization? */
dst_mask = carry_backpropagate (dst_mask, code, src);
/* We will handle the other operand of a binary operator
@@ -688,7 +694,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
/* We're inside a SET and want to process the source operands
making things live. Breaking from this loop will cause
the iterator to work on sub-rtxs, so it is safe to break
- if we see something we don't know how to handle. */
+ if we see something we don't know how to handle.
+
+ This code is just hokey as it really just handles trivial
+ unary and binary cases. Otherwise the loop exits and we
+ continue iterating on sub-rtxs, but outside the set context.
*/
unsigned HOST_WIDE_INT save_mask = dst_mask;
for (;;)
{
@@ -704,10 +714,26 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
y = XEXP (y, 0);
else if (SUBREG_P (y) && SUBREG_BYTE (y).is_constant ())
{
- /* For anything but (subreg (reg)), break the inner loop
- and process normally (conservatively). */
- if (!REG_P (SUBREG_REG (y)))
+ /* We really want to know the outer code here, ie do we
+ have (ANY_EXTEND (SUBREG ...)) as we need to know if
+ the extension matches the SUBREG_PROMOTED state. In
+ that case optimizers can turn the extension into a
+ simple copy. Which means that bits outside the
+ SUBREG's mode are actually live.
+
+ We don't want to mark those bits live unnecessarily
+ as that inhibits extension elimination in important
+ cases such as those in Coremark. So we need that
+ outer code. */
+ if (!REG_P (SUBREG_REG (y))
+ || (SUBREG_PROMOTED_VAR_P (y)
+ && ((GET_CODE (SET_SRC (x)) == SIGN_EXTEND
+ && SUBREG_PROMOTED_SIGNED_P (y))
+ || (GET_CODE (SET_SRC (x)) == ZERO_EXTEND
+ && SUBREG_PROMOTED_UNSIGNED_P (y)))))
break;
+
+ /* The SUBREG's mode determine the live width. */
bit = subreg_lsb (y).to_constant ();
if (dst_mask)
{
@@ -785,6 +811,11 @@ ext_dce_process_uses (rtx_insn *insn, rtx obj,
HOST_WIDE_INT size = GET_MODE_BITSIZE (GET_MODE (x)).to_constant ();
HOST_WIDE_INT rn = 4 * REGNO (SUBREG_REG (x));
+ /* If this is a promoted subreg, then more of it may be live than
+ is otherwise obvious. */
+ if (SUBREG_PROMOTED_VAR_P (x))
+ size = GET_MODE_BITSIZE (GET_MODE (SUBREG_REG (x))).to_constant ();
+
bitmap_set_bit (livenow, rn);
if (size > 8)
bitmap_set_bit (livenow, rn + 1);
diff --git a/gcc/testsuite/gcc.dg/torture/pr116039.c
b/gcc/testsuite/gcc.dg/torture/pr116039.c
new file mode 100644
index 00000000000..d67b9326de7
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116039.c
@@ -0,0 +1,20 @@
+/* { dg-do run } */
+/* { dg-additional-options "-fsigned-char -fno-strict-aliasing -fwrapv" } */
+
+extern void abort (void);
+
+int c[12];
+char d[12];
+int *f = c;
+int *z = (int *)1;
+long long y;
+int main() {
+ c[9] = 0xff;
+ for (int i = 0; i < 12; i += 3)
+ d[9] = z ? f[i] : 0;
+ for (long i = 0; i < 12; ++i)
+ y ^= d[i];
+ if (y != -1)
+ abort ();
+ return 0;
+}