On 16/12/15 17:44, Charles Baylis wrote:
Hi

Hi Charles,
sorry for the delay on this one.


This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP
and VZIP on armeb-* targets. It also fixes the definition of the
vuzpq_* and vzipq_*  NEON intrinsics which use incorrect lane
specifiers in the use of __builtin_shuffle().

The problem with arm_neon.h can be seen by temporarily altering
arm_expand_vec_perm_const_1() to unconditionally return false. If this
is done, the vuzp/vzip tests in the advsimd execution tests will fail.
With these patches, this is no longer the case.

The problem is caused by the weird mapping of architectural lane order
to gcc lane order in big endian. For 64 bit vectors, the order is
simply reversed, but 128 bit vectors are treated as 2 64 bit vectors
where the lane ordering is reversed inside those. This is due to the
memory ordering defined by the EABI. There is a large comment in
gcc/config/arm.c above output_move_neon() which describes this in more
detail.

The arm_evpc_neon_vuzp() and  arm_evpc_neon_vzip() functions do not
allow for this lane order, instead treating the lane order as simply
reversed in 128 bit vectors. These patches fix this. I have included a
test case for vuzp, but I don't have one for vzip.

Tested with make check on arm-unknown-linux-gnueabihf with no regressions
Tested with make check on armeb-unknown-linux-gnueabihf. Some
gcc.dg/vect tests fail due to no longer being vectorized. I haven't
analysed these, but it is expected since vuzp is not usable for the
shuffle patterns for which it was previously used. There are also a
few new PASSes.

Indeed I see the new passes on armeb-none-eabi.
However, the new FAILs that I see are ICEs, not just vectorisation failures,
so they need to be looked at.

The ICEs that I see are:
FAIL: gcc.dg/torture/vshuf-v4hi.c   -O2  (internal compiler error)
FAIL: gcc.dg/torture/vshuf-v8qi.c   -O2  (internal compiler error)

The backtrace looks like:
0x81c9eb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, 
expand_modifier)
        $SRC/gcc/expr.c:9239
0x8044cc expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
        $SRC/gcc/expr.c:9562
0x80a851 expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, 
rtx_def**, bool)
        $SRC/gcc/expr.c:7947
0x811bf0 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, bool, 
tree_node*)
        $SRC/gcc/expr.c:5406
0x814a7f expand_assignment(tree_node*, tree_node*, bool)
        $SRC/gcc/expr.c:5175
0x709da5 expand_gimple_stmt_1
        $SRC/gcc/cfgexpand.c:3606
0x709da5 expand_gimple_stmt
        $SRC/gcc/cfgexpand.c:3702
0x70c3a6 expand_gimple_basic_block
        $SRC/gcc/cfgexpand.c:5708
0x70fd58 execute
        $SRC/gcc/cfgexpand.c:6323
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

Seems that the code in expr.c asserts that expand_vec_perm returned a non-NULL 
result.

I'll look at the patches in more detail, but in the meantime I notice that 
there are some
GNU style issues that should be resolved, like starting comments with a capital 
letter,
two spaces after full stop, two spaces between full stop and close comment, as 
well as some
lines over 80 characters. The check_GNU_style.sh script in the contrib/ 
directory can help
catch some (if not all) of these.

Also, can you please send any follow-up versions of the two patches as separate 
emails,
so that we can more easily keep track of what's comment goes to which patch.

Thanks,
Kyrill


Patch 1 (vuzp):

gcc/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

         * config/arm/arm.c (arm_neon_endian_lane_map): New function.
         (arm_neon_vector_pair_endian_lane_map): New function.
         (arm_evpc_neon_vuzp): Allow for big endian lane order.
         * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big
         endian.
         (vuzpq_s16): Likewise.
         (vuzpq_s32): Likewise.
         (vuzpq_f32): Likewise.
         (vuzpq_u8): Likewise.
         (vuzpq_u16): Likewise.
         (vuzpq_u32): Likewise.
         (vuzpq_p8): Likewise.
         (vuzpq_p16): Likewise.

gcc/testsuite/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

         * gcc.c-torture/execute/pr68532.c: New test.


Patch 2 (vzip)

gcc/ChangeLog:

2015-12-15  Charles Baylis  <charles.bay...@linaro.org>

         * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane
         order.
         * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big
         endian.
         (vzipq_s16): Likewise.
         (vzipq_s32): Likewise.
         (vzipq_f32): Likewise.
         (vzipq_u8): Likewise.
         (vzipq_u16): Likewise.
         (vzipq_u32): Likewise.
         (vzipq_p8): Likewise.
         (vzipq_p16): Likewise.

Reply via email to