https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106594

--- Comment #8 from Roger Sayle <roger at nextmovesoftware dot com> ---
Time for a status update.  The PR title is a little misleading; sign-extensions
aren't really the problem.  It turns out that zero-extensions aren't always
optimized as well as the corresponding sign-extensions (due to their alternate
possible RTL representations), even on symmetric targets like aarch64.
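
To make the alternate representations concrete, here's a hedged C-level sketch
(hypothetical functions, not the PR's testcase): each of the following returns
the low 8 bits of its argument zero-extended to 64 bits, but the same operation
can be represented in RTL as a ZERO_EXTEND, as an AND with the precision mask,
or as a pair of shifts, and only some of those forms match a single instruction
on the target.

#include <stdint.h>

uint64_t zext_extend (uint8_t c)  { return c; }                 /* ZERO_EXTEND */
uint64_t zext_mask   (uint64_t x) { return x & 0xff; }          /* AND with precision mask */
uint64_t zext_shifts (uint64_t x) { return (x << 56) >> 56; }   /* pair of shifts */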

The following patch solves the underlying issue reported in the PR:

diff --git a/gcc/combine.cc b/gcc/combine.cc
index a5fabf3..d739cae 100644
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -7288,7 +7288,17 @@ expand_compound_operation (rtx x)
          && (STORE_FLAG_VALUE & ~GET_MODE_MASK (inner_mode)) == 0)
        return SUBREG_REG (XEXP (x, 0));

+      /* If ZERO_EXTEND is cheap on this target, do nothing,
+        i.e. don't attempt to convert it to a pair of shifts.  */
+      if (set_src_cost (x, mode, optimize_this_for_speed_p)
+         <= COSTS_N_INSNS (1))
+       return x;
     }
+  /* Likewise, if SIGN_EXTEND is cheap, do nothing.  */
+  else if (GET_CODE (x) == SIGN_EXTEND
+          && set_src_cost (x, mode, optimize_this_for_speed_p)
+             <= COSTS_N_INSNS (1))
+    return x;

   /* If we reach here, we want to return a pair of shifts.  The inner
      shift is a left shift of BITSIZE - POS - LEN bits.  The outer


The explanation/motivation for this fix is in the code immediately above in
expand_compound_operation, which considers converting SIGN_EXTEND into
ZERO_EXTEND, but checks set_src_cost to confirm that it'd be cheaper than the
pair of shifts mentioned at the bottom.  By converting SIGN_EXTEND to
ZERO_EXTEND earlier (for example during the tree-ssa passes), the new
ZERO_EXTEND now passes through the above logic and always gets converted to a
pair of shifts, regardless of the cost, triggering the regression.
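
For reference, here's a hedged sketch (hypothetical helper, illustration only)
of what the shift-pair expansion computes, using the inner/outer shift amounts
described in the comment above the patched code: extracting LEN bits at
position POS from a BITSIZE-bit value shifts left by BITSIZE - POS - LEN and
then (logically, for zero-extension) right by BITSIZE - LEN.  A plain
zero-extension of the low 8 bits of a 64-bit value is POS = 0, LEN = 8,
BITSIZE = 64, i.e. (x << 56) >> 56.

#include <stdint.h>

/* Illustration only, specialized to a 64-bit mode; assumes
   0 <= pos, 0 < len and pos + len <= 64.  A sign-extension would use an
   arithmetic right shift for the outer shift instead.  */
uint64_t
extract_via_shift_pair (uint64_t x, int pos, int len)
{
  const int bitsize = 64;
  x <<= bitsize - pos - len;        /* inner left shift */
  return x >> (bitsize - len);      /* outer logical right shift */
}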

Thanks to Tamar for confirming the above change results in a significant
performance benefit (sometimes ~5%) on aarch64.

The complication is that the above change, preserving ZERO_EXTEND/SIGN_EXTEND
in GCC's preferred canonical RTL when these operations are cheap, exposes a
number of latent bugs and missed optimization opportunities (on at least x86
and aarch64).  I'm currently bootstrapping and regression testing several
fixes to address these issues and minimize disruption should the above change
be approved.

Hence we're making progress, but a clean "fix" isn't ready just yet.

p.s. HP is right that REG_EQUAL notes may help in the cases where x & 0xff is
cheaper than x & 0x23: if we're really doing an AND, then including the
nonzero_bits information is free, but if this is a zero-extension, the AND is
implicit and will only be recognized as (part of) an addressing mode if the
precision mask isn't modified by nonzero_bits.
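
A hedged source-level sketch of that last point (hypothetical functions, not
from the PR; whether the extension actually folds into the address depends on
the target's addressing modes): keeping the full precision mask lets the AND
stay implicit in the zero-extension, while a mask narrowed below the precision
mask forces an explicit AND instruction.

#include <stdint.h>

/* The index is zero-extended from 32 to 64 bits; the AND with the SImode
   precision mask (0xffffffff) is implicit and can, e.g. via uxtw
   register-offset addressing on aarch64, be absorbed into the load.  */
int
load_precision_mask (int *base, uint64_t i)
{
  return base[(uint32_t) i];
}

/* With a mask narrower than the precision mask (e.g. refined by
   nonzero_bits), an explicit AND instruction is still required.  */
int
load_narrowed_mask (int *base, uint64_t i)
{
  return base[i & 0x23];
}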
