On 16/12/15 17:44, Charles Baylis wrote:
Hi
Hi Charles, sorry for the delay on this one.
This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP and VZIP on armeb-* targets. It also fixes the definition of the vuzpq_* and vzipq_* NEON intrinsics which use incorrect lane specifiers in the use of __builtin_shuffle(). The problem with arm_neon.h can be seen by temporarily altering arm_expand_vec_perm_const_1() to unconditionally return false. If this is done, the vuzp/vzip tests in the advsimd execution tests will fail. With these patches, this is no longer the case. The problem is caused by the weird mapping of architectural lane order to gcc lane order in big endian. For 64 bit vectors, the order is simply reversed, but 128 bit vectors are treated as 2 64 bit vectors where the lane ordering is reversed inside those. This is due to the memory ordering defined by the EABI. There is a large comment in gcc/config/arm.c above output_move_neon() which describes this in more detail. The arm_evpc_neon_vuzp() and arm_evpc_neon_vzip() functions do not allow for this lane order, instead treating the lane order as simply reversed in 128 bit vectors. These patches fix this. I have included a test case for vuzp, but I don't have one for vzip. Tested with make check on arm-unknown-linux-gnueabihf with no regressions Tested with make check on armeb-unknown-linux-gnueabihf. Some gcc.dg/vect tests fail due to no longer being vectorized. I haven't analysed these, but it is expected since vuzp is not usable for the shuffle patterns for which it was previously used. There are also a few new PASSes.
Indeed I see the new passes on armeb-none-eabi. However, the new FAILs that I see are ICEs, not just vectorisation failures, so they need to be looked at. The ICEs that I see are: FAIL: gcc.dg/torture/vshuf-v4hi.c -O2 (internal compiler error) FAIL: gcc.dg/torture/vshuf-v8qi.c -O2 (internal compiler error) The backtrace looks like: 0x81c9eb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) $SRC/gcc/expr.c:9239 0x8044cc expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) $SRC/gcc/expr.c:9562 0x80a851 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) $SRC/gcc/expr.c:7947 0x811bf0 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, bool, tree_node*) $SRC/gcc/expr.c:5406 0x814a7f expand_assignment(tree_node*, tree_node*, bool) $SRC/gcc/expr.c:5175 0x709da5 expand_gimple_stmt_1 $SRC/gcc/cfgexpand.c:3606 0x709da5 expand_gimple_stmt $SRC/gcc/cfgexpand.c:3702 0x70c3a6 expand_gimple_basic_block $SRC/gcc/cfgexpand.c:5708 0x70fd58 execute $SRC/gcc/cfgexpand.c:6323 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <http://gcc.gnu.org/bugs.html> for instructions. Seems that the code in expr.c asserts that expand_vec_perm returned a non-NULL result. I'll look at the patches in more detail, but in the meantime I notice that there are some GNU style issues that should be resolved, like starting comments with a capital letter, two spaces after full stop, two spaces between full stop and close comment, as well as some lines over 80 characters. The check_GNU_style.sh script in the contrib/ directory can help catch some (if not all) of these. Also, can you please send any follow-up versions of the two patches as separate emails, so that we can more easily keep track of what's comment goes to which patch. Thanks, Kyrill
Patch 1 (vuzp): gcc/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * config/arm/arm.c (arm_neon_endian_lane_map): New function. (arm_neon_vector_pair_endian_lane_map): New function. (arm_evpc_neon_vuzp): Allow for big endian lane order. * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big endian. (vuzpq_s16): Likewise. (vuzpq_s32): Likewise. (vuzpq_f32): Likewise. (vuzpq_u8): Likewise. (vuzpq_u16): Likewise. (vuzpq_u32): Likewise. (vuzpq_p8): Likewise. (vuzpq_p16): Likewise. gcc/testsuite/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * gcc.c-torture/execute/pr68532.c: New test. Patch 2 (vzip) gcc/ChangeLog: 2015-12-15 Charles Baylis <charles.bay...@linaro.org> * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane order. * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big endian. (vzipq_s16): Likewise. (vzipq_s32): Likewise. (vzipq_f32): Likewise. (vzipq_u8): Likewise. (vzipq_u16): Likewise. (vzipq_u32): Likewise. (vzipq_p8): Likewise. (vzipq_p16): Likewise.