On 09/08/18 06:48, Jeff Law wrote:
On 08/07/2018 02:11 PM, Richard Sandiford wrote:
Hi Vlad,

Thanks for the patch.

Vlad Lazar <vlad.la...@arm.com> writes:
Hi.

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded
into a register in fewer instructions. The cases are as follows:
i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent form can be performed in
fewer instructions. This is done on a statement-by-statement basis, right
before the GIMPLE statement is expanded to RTL. Therefore, it requires a
correct implementation of the TARGET_INSN_COST hook. The change is general
and applies to any integer comparison, regardless of types or location.

For example, on AArch64 for the following code:

int foo (int x)
{
    return x > 0xfefffffe;
}

it generates:

mov     w1, -16777217
cmp     w0, w1
cset    w0, cs

instead of:

mov     w1, 65534
movk    w1, 0xfeff, lsl 16
cmp     w0, w1
cset    w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu;
there are no regressions.

Looks like a useful feature.  I'm playing devil's advocate to some
extent here, but did you consider instead doing this during the
expansion functions themselves?  In some ways that feels more
natural because we're already operating on rtxes at that point.
It also has the advantage that it would trap comparisons that didn't
exist at the gimple level, such as when computing the carry for a
doubleword add/sub.
I've got no strong opinions on doing it in cfgexpand vs the expansion
functions themselves.  I'm happy to have you setting overall direction
here Richard.

I do worry about the amount of RTL we generate and throw away during
cost computation.  Though it's just for comparisons, so it may not be
terrible.

I wouldn't be surprised if ports aren't particularly accurate in their
costing computations for this kind of use -- but that's nothing new.  We
run into it every time we use rtx costing in a new place.  I'm
comfortable having targets fault in improvements for this kind of use.

Jeff

Thank you for the feedback.

I agree with Richard's opinion that the change feels more natural in the
actual expanders, so I've reimplemented the feature in the expansion
functions.  It's worth mentioning that it now also applies the optimization
to comparisons in ternary operators, and I have added a test which reflects
such a case.

Regarding the amount of RTL generated, I have tried to keep it at a minimum,
by only generating the immediate value moves.

I've bootstrapped and regtested again and there are no regressions.
See the updated patch below.

Thanks,
Vlad

---

This patch optimises the choice of immediates in integer comparisons. Integer
comparisons allow for different choices (e.g. a > b is equivalent to a >= b+1)
and there are cases where an incremented/decremented immediate can be loaded
into a register in fewer instructions. The cases are as follows:
i)   a >  b or a >= b + 1
ii)  a <= b or a <  b + 1
iii) a >= b or a >  b - 1
iv)  a <  b or a <= b - 1

For each comparison we check whether the equivalent form can be performed in
fewer instructions. This is done in the gimple expanders. Therefore, it
requires a correct implementation of the TARGET_INSN_COST hook. The change
is general and applies to any integer comparison, regardless of types or
location.

For example, on AArch64 for the following code:

int foo (int x)
{
  return x > 0xfefffffe;
}

it generates:

mov     w1, -16777217
cmp     w0, w1
cset    w0, cs

instead of:

mov     w1, 65534
movk    w1, 0xfeff, lsl 16
cmp     w0, w1
cset    w0, hi

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu;
there are no regressions.

Thanks,
Vlad

gcc/testsuite/
2018-08-10  Vlad Lazar  <vlad.la...@arm.com>

        * gcc.target/aarch64/imm_choice_comparison.c: New test.

gcc/
2018-08-10  Vlad Lazar  <vlad.la...@arm.com>

        * expmed.h (canonicalize_comparison, canonicalized_cmp_code): New
        declarations.
        * expmed.c (canonicalize_comparison, canonicalized_cmp_code): New
        implementations.
        (emit_store_flag_1): Add call to canonicalize_comparison.
        * optabs.c (prepare_cmp_insn): Likewise.

---

diff --git a/gcc/expmed.h b/gcc/expmed.h
index 2890d9c9bbd034f01030dd551d544bf73e73b784..86a32a643fdd0fc9f396bd2c7904244bd484df16 100644
--- a/gcc/expmed.h
+++ b/gcc/expmed.h
@@ -702,6 +702,10 @@ extern rtx emit_store_flag (rtx, enum rtx_code, rtx, rtx, machine_mode,
 extern rtx emit_store_flag_force (rtx, enum rtx_code, rtx, rtx,
                                  machine_mode, int, int);
+extern void canonicalize_comparison (machine_mode, enum rtx_code *, rtx *);
+
+extern enum rtx_code canonicalized_cmp_code (enum rtx_code);
+
 /* Choose a minimal N + 1 bit approximation to 1/D that can be used to
    replace division by D, and put the least significant N bits of the result
    in *MULTIPLIER_PTR and return the most significant bit.  */
diff --git a/gcc/expmed.c b/gcc/expmed.c
index 4c74e7dc2ea448fb1c1dd69d197b4bca23668b44..bdd19062cf0dda3a6b7ffe04698bf832e80fca75 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -5538,6 +5538,9 @@ emit_store_flag_1 (rtx target, enum rtx_code code, rtx op0, rtx op1,
   if (mode == VOIDmode)
     mode = GET_MODE (op0);
+  if (CONST_SCALAR_INT_P (op1))
+    canonicalize_comparison (mode, &code, &op1);
+
   /* For some comparisons with 1 and -1, we can convert this to
      comparisons with zero.  This will often produce more opportunities for
      store-flag insns.  */
@@ -6153,6 +6156,97 @@ emit_store_flag_force (rtx target, enum rtx_code code, rtx op0, rtx op1,
   return target;
 }
+
+/* Choose the more appropriate immediate in comparisons.  The purpose of this
+   is to end up with an immediate which can be loaded into a register in fewer
+   moves, if possible.
+
+   For each integer comparison there exists an equivalent choice:
+     i)   a >  b or a >= b + 1
+     ii)  a <= b or a <  b + 1
+     iii) a >= b or a >  b - 1
+     iv)  a <  b or a <= b - 1
+
+   The function is called from emit_store_flag_1 and prepare_cmp_insn during
+   RTL expansion of comparisons.  It assumes that the first operand of the
+   comparison is a register and the second is a constant.
+
+   MODE is the mode of the first operand.
+   CODE points to the comparison code.
+   IMM points to the rtx containing the immediate.  */
+
+void
+canonicalize_comparison (machine_mode mode, enum rtx_code *code, rtx *imm)
+{
+  int to_add = 0;
+
+  if (!SCALAR_INT_MODE_P (mode))
+    return;
+
+  /* Extract the immediate value from the rtx.  */
+  wide_int imm_val = wi::shwi (INTVAL (*imm), mode);
+
+  if (*code == GT || *code == GTU || *code == LE || *code == LEU)
+    to_add = 1;
+
+  if (*code == GE || *code == GEU || *code == LT || *code == LTU)
+    to_add = -1;
+
+  /* Check for overflow/underflow in the case of signed values and
+     wrapping around in the case of unsigned values.  If any occur
+     cancel the optimization.  */
+  signop sgn = (*code == GTU || *code == GEU || *code == LTU || *code == LEU)
+               ? UNSIGNED : SIGNED;
+  wide_int max_val = wi::max_value (mode, sgn);
+  wide_int min_val = wi::min_value (mode, sgn);
+  if ((wi::cmp (imm_val, max_val, sgn) == 0 && to_add == 1)
+      || (wi::cmp (imm_val, min_val, sgn) == 0 && to_add == -1))
+    return;
+
+  /* Generate a mov instruction for both cases and see whether the change
+     was profitable.  */
+  wide_int imm_modif = wi::add (imm_val, to_add);
+
+  rtx reg = gen_reg_rtx (mode);
+  rtx new_imm = GEN_INT (imm_modif.to_shwi ());
+
+  rtx_insn *old_rtx = gen_move_insn (reg, *imm);
+  rtx_insn *new_rtx = gen_move_insn (reg, new_imm);
+
+  /* Update the immediate and the code.  */
+  if (insn_cost (old_rtx, true) > insn_cost (new_rtx, true))
+    {
+      *code = canonicalized_cmp_code (*code);
+      *imm = new_imm;
+    }
+}
+
+/* Helper function for canonicalize_comparison.  It returns the comparison
+   code of the equivalent comparison.  */
+
+enum rtx_code
+canonicalized_cmp_code (enum rtx_code code)
+{
+  switch (code)
+    {
+    case GT:
+      return GE;
+    case GE:
+      return GT;
+    case LT:
+      return LE;
+    case LE:
+      return LT;
+    case GTU:
+      return GEU;
+    case GEU:
+      return GTU;
+    case LTU:
+      return LEU;
+    case LEU:
+      return LTU;
+
+    default:
+      return code;
+    }
+}
+
 
 /* Perform possibly multi-word comparison and conditional jump to LABEL
    if ARG1 OP ARG2 true where ARG1 and ARG2 are of mode MODE.  This is
diff --git a/gcc/optabs.c b/gcc/optabs.c
index cadf4676c986c8430baafab8ef5282e890d36308..6052222c90ce559ae3b11a40fbd2ebbf4a6bc8dc 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -3812,6 +3812,9 @@ prepare_cmp_insn (rtx x, rtx y, enum rtx_code comparison, rtx size,
   gcc_assert (methods == OPTAB_DIRECT || methods == OPTAB_WIDEN
              || methods == OPTAB_LIB_WIDEN);
+  if (CONST_SCALAR_INT_P (y))
+    canonicalize_comparison (mode, &comparison, &y);
+
   /* If we are optimizing, force expensive constants into a register.  */
   if (CONSTANT_P (x) && optimize
       && (rtx_cost (x, mode, COMPARE, 0, optimize_insn_for_speed_p ())
diff --git a/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
new file mode 100644
index 0000000000000000000000000000000000000000..ebc44d6dbc7287d907603d77d7b54496de177c4b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/imm_choice_comparison.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* Go from four moves to two.  */
+
+int
+foo (long long x)
+{
+  return x <= 0x1999999999999998;
+}
+
+int
+GT (unsigned int x)
+{
+  return x > 0xfefffffe;
+}
+
+int
+LE (unsigned int x)
+{
+  return x <= 0xfefffffe;
+}
+
+int
+GE (long long x)
+{
+  return x >= 0xff000000;
+}
+
+int
+LT (int x)
+{
+  return x < 0xff000000;
+}
+
+/* Optimize the immediate in conditionals.  */
+
+int
+check (int x, int y)
+{
+  if (x > y && GT (x))
+    return 100;
+
+  return x;
+}
+
+int
+tern (int x)
+{
+  return x >= 0xff000000 ? 5 : -3;
+}
+
+/* Only one movk instruction should remain in total.  */
+/* { dg-final { scan-assembler-times "movk" 1 } } */
