ping^2
On 13 January 2016 at 13:37, Charles Baylis <charles.bay...@linaro.org> wrote: > ping > > On 16 December 2015 at 17:44, Charles Baylis <charles.bay...@linaro.org> > wrote: >> Hi >> >> This patch addresses incorrect recognition of VEC_PERM_EXPRs as VUZP >> and VZIP on armeb-* targets. It also fixes the definition of the >> vuzpq_* and vzipq_* NEON intrinsics which use incorrect lane >> specifiers in the use of __builtin_shuffle(). >> >> The problem with arm_neon.h can be seen by temporarily altering >> arm_expand_vec_perm_const_1() to unconditionally return false. If this >> is done, the vuzp/vzip tests in the advsimd execution tests will fail. >> With these patches, this is no longer the case. >> >> The problem is caused by the weird mapping of architectural lane order >> to gcc lane order in big endian. For 64 bit vectors, the order is >> simply reversed, but 128 bit vectors are treated as 2 64 bit vectors >> where the lane ordering is reversed inside those. This is due to the >> memory ordering defined by the EABI. There is a large comment in >> gcc/config/arm.c above output_move_neon() which describes this in more >> detail. >> >> The arm_evpc_neon_vuzp() and arm_evpc_neon_vzip() functions do not >> allow for this lane order, instead treating the lane order as simply >> reversed in 128 bit vectors. These patches fix this. I have included a >> test case for vuzp, but I don't have one for vzip. >> >> Tested with make check on arm-unknown-linux-gnueabihf with no regressions >> Tested with make check on armeb-unknown-linux-gnueabihf. Some >> gcc.dg/vect tests fail due to no longer being vectorized. I haven't >> analysed these, but it is expected since vuzp is not usable for the >> shuffle patterns for which it was previously used. There are also a >> few new PASSes. >> >> >> Patch 1 (vuzp): >> >> gcc/ChangeLog: >> >> 2015-12-15 Charles Baylis <charles.bay...@linaro.org> >> >> * config/arm/arm.c (arm_neon_endian_lane_map): New function. >> (arm_neon_vector_pair_endian_lane_map): New function. >> (arm_evpc_neon_vuzp): Allow for big endian lane order. >> * config/arm/arm_neon.h (vuzpq_s8): Adjust shuffle patterns for big >> endian. >> (vuzpq_s16): Likewise. >> (vuzpq_s32): Likewise. >> (vuzpq_f32): Likewise. >> (vuzpq_u8): Likewise. >> (vuzpq_u16): Likewise. >> (vuzpq_u32): Likewise. >> (vuzpq_p8): Likewise. >> (vuzpq_p16): Likewise. >> >> gcc/testsuite/ChangeLog: >> >> 2015-12-15 Charles Baylis <charles.bay...@linaro.org> >> >> * gcc.c-torture/execute/pr68532.c: New test. >> >> >> Patch 2 (vzip) >> >> gcc/ChangeLog: >> >> 2015-12-15 Charles Baylis <charles.bay...@linaro.org> >> >> * config/arm/arm.c (arm_evpc_neon_vzip): Allow for big endian lane >> order. >> * config/arm/arm_neon.h (vzipq_s8): Adjust shuffle patterns for big >> endian. >> (vzipq_s16): Likewise. >> (vzipq_s32): Likewise. >> (vzipq_f32): Likewise. >> (vzipq_u8): Likewise. >> (vzipq_u16): Likewise. >> (vzipq_u32): Likewise. >> (vzipq_p8): Likewise. >> (vzipq_p16): Likewise.