At present vec_perm with non-const indices is not handled on bigendian, so gcc
generates generic, slow code. This patch fixes up TBL to reverse the indices
within each input vector (following Richard Henderson's suggestion of using an
XOR with (nelts - 1) rather than a complicated mask/add/subtract,
http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01285.html), and enables the code
for bigendian.
Regression tested on aarch64_be-none-elf with no changes. (This is as expected:
in all affected cases, gcc was already producing correct non-arch-specific code
using scalar ops. However, I have manually verified for various tests in
c-c++-common/torture/vshuf-v* that (a) TBL instructions are now produced, and
(b) a version of the compiler that produces TBLs without the index correction
fails those tests.)
Note that tests c-c++-common/torture/vshuf-{v16hi,v4df,v4di,v8si} (i.e. the
32-byte vectors) were already failing prior to this patch and are not affected.
gcc/ChangeLog:
2014-04-23 Alan Lawrence <alan.lawre...@arm.com>
* config/aarch64/aarch64-simd.md (vec_perm): Enable for bigendian.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm): Remove assert
against bigendian and adjust indices.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 73aee2c..e14e9b0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4002,7 +4002,7 @@
(match_operand:VB 1 "register_operand")
(match_operand:VB 2 "register_operand")
(match_operand:VB 3 "register_operand")]
- "TARGET_SIMD && !BYTES_BIG_ENDIAN"
+ "TARGET_SIMD"
{
aarch64_expand_vec_perm (operands[0], operands[1],
operands[2], operands[3]);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d332741..6875b58 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -7763,18 +7763,24 @@ aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
enum machine_mode vmode = GET_MODE (target);
unsigned int i, nelt = GET_MODE_NUNITS (vmode);
bool one_vector_p = rtx_equal_p (op0, op1);
- rtx rmask[MAX_VECT_LEN], mask;
-
- gcc_checking_assert (!BYTES_BIG_ENDIAN);
+ rtx mask;
/* The TBL instruction does not use a modulo index, so we must take care
of that ourselves. */
- mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
- for (i = 0; i < nelt; ++i)
- rmask[i] = mask;
- mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+ mask = aarch64_simd_gen_const_vector_dup (vmode,
+ one_vector_p ? nelt - 1 : 2 * nelt - 1);
sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);
+ /* For big-endian, we also need to reverse the index within the vector
+ (but not which vector). */
+ if (BYTES_BIG_ENDIAN)
+ {
+ /* If one_vector_p, mask is a vector of (nelt - 1)'s already. */
+ if (!one_vector_p)
+ mask = aarch64_simd_gen_const_vector_dup (vmode, nelt - 1);
+ sel = expand_simple_binop (vmode, XOR, sel, mask,
+ NULL, 0, OPTAB_LIB_WIDEN);
+ }
aarch64_expand_vec_perm_1 (target, op0, op1, sel);
}