Please disregard the original patch and see this updated version.


On 07/20/2018 10:31 AM, Sam Tebbs wrote:
Hi all,

Here is an updated patch that does the following:

* Adds a new constraint in config/aarch64/constraints.md to check for a constant integer that is left consecutive. This addresses Richard Henderson's suggestion about combining the aarch64_is_left_consecutive call and the const_int match in the pattern.

* Merges the two patterns defined into one.

* Changes the pattern's type attribute to bfm.

* Improved the comment above the aarch64_is_left_consecutive implementation.

* Makes the pattern use the GPI iterator to accept smaller integer sizes (an extra test is added to check for this).

* Improves the tests in combine_bfxil.c to ensure they aren't optimised away and that they check for the pattern's correctness.

Below is a new changelog and the example given before.

Is this OK for trunk?

This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).

Example:

unsigned long long combine(unsigned long long a, unsigned long long b) {
  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}

void read(unsigned long long a, unsigned long long b, unsigned long long *c) {
  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0xffffffff
  and   x4, x0, #0xffffffff00000000
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret



Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions.


gcc/
2018-07-11  Sam Tebbs  <sam.te...@arm.com>

        PR target/85628
        * config/aarch64/aarch64.md (*aarch64_bfxil):
        Define.
        * config/aarch64/constraints.md (Ulc): Define
        * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
        Define.
        * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function.

gcc/testsuite
2018-07-11  Sam Tebbs  <sam.te...@arm.com>

        PR target/85628
        * gcc.target/aarch64/combine_bfxil.c: New file.
        * gcc.target/aarch64/combine_bfxil_2.c: New file.


On 07/19/2018 02:02 PM, Sam Tebbs wrote:
Hi Richard,

Thanks for the feedback. I find that using "is_left_consecutive" is more descriptive than checking for it being a power of 2 - 1, since it describes the requirement (having consecutive ones from the MSB) more explicitly. I would be happy to change it though if that is the consensus.

I have addressed your point about just returning the string instead of using output_asm_insn and have changed it locally. I'll send an updated patch soon.


On 07/17/2018 02:33 AM, Richard Henderson wrote:
On 07/16/2018 10:10 AM, Sam Tebbs wrote:
+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
      return SImode;
  }
  +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
...
+(define_insn "*aarch64_bfxil"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+            (match_operand 3 "const_int_operand"))
+        (and:DI (match_operand:DI 2 "register_operand" "0")
+            (match_operand 4 "const_int_operand"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])
+    && aarch64_is_left_consecutive (INTVAL (operands[4]))"
Better is to use a define_predicate to merge both that second test and the
const_int_operand.

(I'm not sure about the "left_consecutive" language either.
Isn't it more descriptive to say that op3 is a power of 2 minus 1?)

(define_predicate "pow2m1_operand"
   (and (match_code "const_int")
        (match_test "exact_pow2 (INTVAL(op) + 1) > 0")))

and use

   (match_operand:DI 3 "pow2m1_operand")

and then just the

   INTVAL (operands[3]) == ~INTVAL (operands[4])

test.

Also, don't omit the modes for the constants.
Also, there's no reason this applies only to DI mode;
use the GPI iterator and %<w> in the output template.

+    HOST_WIDE_INT op3 = INTVAL (operands[3]);
+    operands[3] = GEN_INT (ceil_log2 (op3));
+    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+    return "";
You can just return the string that you passed to output_asm_insn.

+  }
+  [(set_attr "type" "bfx")]
The other aliases of the BFM insn use type "bfm";
"bfx" appears to be aliases of UBFM and SBFM.
Not that it appears to matter to the scheduling
descriptions, but it is inconsistent.


r~



diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 514ddc4..b025cd6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -558,4 +558,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
+bool aarch64_is_left_consecutive (HOST_WIDE_INT);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d75d45f..bf084cb 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
     return SImode;
 }
 
+/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive
+   ones from the MSB.  */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+  return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
 /* Implement TARGET_CONSTANT_ALIGNMENT.  Make strings word-aligned so
    that strcpy from constants will be faster.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..a841db8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4844,6 +4844,29 @@
   [(set_attr "type" "rev")]
 )
 
+(define_insn "*aarch64_bfxil<mode>"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r")
+    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0")
+		    (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
+	    (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
+		    (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
+  "INTVAL (operands[3]) == ~INTVAL (operands[4])"
+  {
+    switch (which_alternative)
+    {
+      case 0:
+	operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3])));
+	return "bfxil\\t%<w>0, %<w>1, 0, %3";
+      case 1:
+	operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4])));
+	return "bfxil\\t%<w>0, %<w>2, 0, %3";
+      default:
+	gcc_unreachable ();
+    }
+  }
+  [(set_attr "type" "bfm")]
+)
+
 ;; There are no canonicalisation rules for the position of the lshiftrt, ashift
 ;; operations within an IOR/AND RTX, therefore we have two patterns matching
 ;; each valid permutation.
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index 32a0fa6..46a4764 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -172,6 +172,13 @@
   A constraint that matches the immediate constant -1."
   (match_test "op == constm1_rtx"))
 
+(define_constraint "Ulc"
+ "@internal
+ A constraint that matches a constant integer whose bits are consecutive ones
+ from the MSB."
+ (and (match_code "const_int")
+      (match_test "aarch64_is_left_consecutive (ival)")))
+
 (define_constraint "Usv"
   "@internal
    A constraint that matches a VG-based constant that can be loaded by
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 0000000..e55f62a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,81 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort(void);
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
+}
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xffffffffff000000ll) | (b & 0x0000000000ffffffll);
+}
+
+unsigned int
+combine_balanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffff0000ll) | (b & 0x0000ffffll);
+}
+
+unsigned int
+combine_unbalanced_int (unsigned int a, unsigned int b)
+{
+  return (a & 0xffffff00ll) | (b & 0x000000ffll);
+}
+
+__attribute__ ((noinline)) void
+foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_balanced (a, b);
+  *d = combine_balanced (b, a);
+}
+
+__attribute__ ((noinline)) void
+foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d)
+{
+  *c = combine_unbalanced (a, b);
+  *d = combine_unbalanced (b, a);
+}
+
+void
+foo4 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_balanced_int(a, b);
+  *d = combine_balanced_int(b, a);
+}
+
+void
+foo5 (unsigned int a, unsigned int b, unsigned int *c, unsigned int *d)
+{
+  *c = combine_unbalanced_int(a, b);
+  *d = combine_unbalanced_int(b, a);
+}
+
+int
+main(void)
+{
+  unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
+  foo3(a, b, &c, &d);
+  if(c != 0x0123456789543210) abort();
+  if(d != 0xfedcba9876abcdef) abort();
+  foo2(a, b, &c, &d);
+  if(c != 0x0123456776543210) abort();
+  if(d != 0xfedcba9889abcdef) abort();
+
+  unsigned int a2 = 0x01234567, b2 = 0xFEDCBA98, c2, d2;
+  foo4(a2, b2, &c2, &d2);
+  if(c2 != 0x0123ba98) abort();
+  if(d2 != 0xfedc4567) abort();
+  foo5(a2, b2, &c2, &d2);
+  if(c2 != 0x01234598) abort();
+  if(d2 != 0xfedcba67) abort();
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bfxil\\t" 8 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
new file mode 100644
index 0000000..0fc1404
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_non_consecutive (unsigned long long a, unsigned long long b)
+{
+  return (a & 0xfffffff200f00000ll) | (b & 0x00001000ffffffffll);
+}
+
+void
+foo4 (unsigned long long a, unsigned long long b, unsigned long long *c,
+  unsigned long long *d) {
+  /* { dg-final { scan-assembler-not "bfxil\\t" } } */
+  *c = combine_non_consecutive (a, b);
+  *d = combine_non_consecutive (b, a);
+}

Reply via email to