On Wed, May 14, 2014 at 2:52 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> Hi,
>
> Due to differences in how the ARM C Language Extensions and gcc's vector
> extensions deal with indices within vectors, the __builtin_shuffle masks
> used to implement the ZIP, UZP and TRN Neon Intrinsics in arm_neon.h are
> correct only for little-endian. (The problem on bigendian has recently been
> revealed by new tests in gcc.target/arm/simd/.)
>
> This patch corrects the indices using "#ifdef __ARM_BIG_ENDIAN" through
> arm_neon.h. I've tested all the arm-specific tests (arm.exp acle.exp
> aapcs.exp simd.exp neon.exp) on both arm-none-eabi and armeb-none-eabi and
> there are no regressions, and on armeb-none-eabi this patch fixes FAIL ->
> PASS for simd/v{uzp,zip,trn}*_1.c.
>
> Note the patch also modifies gcc.target/arm/pr48252.c. A bit of diving into
> the history of this test reveals
>    *the test was first written in the days when the arm_neon.h
> implementation used builtins such as __builtin_neon_vzipv8qi (which were
> thus correct for bigendian).
>    *In SVN rev 189294, ZIP intrinsics were rewritten to use
> __builtin_shuffle (with little-endian masks); this broke pr48252.c on
> bigendian, but this was not detected until...
>    *In SVN rev 191200, in which pr48252.c was modified to expect different
> results according to endianness - that is, "fixing" the test to match the
> broken implementation. (I have verified that this updated test failed on the
> original __builtin_neon_vzipv8qi implementation.)

Thanks for fixing this up - This patch is ok. Please apply to trunk
and backport to 4.9 after letting it bake on trunk for a few days.

I suspect 4.8 is also afflicted from a quick look - can you please
double check and then apply.

If this goes in we need to kill the ml for 4.8 and 4.9 now.

regards
Ramana

>
> The fix to pr48252.c here largely reverts to the original form although
> keeps the (correct, proper) use of vld1.
>
> gcc/ChangeLog:
>
>         * config/arm/arm_neon.h (vtrn_s8, vtrn_s16, vtrn_u8, vtrn_u16,
> vtrn_p8,
>         vtrn_p16, vtrn_s32, vtrn_f32, vtrn_u32, vtrnq_s8, vtrnq_s16,
> vtrnq_s32,
>         vtrnq_f32, vtrnq_u8, vtrnq_u16, vtrnq_u32, vtrnq_p8, vtrnq_p16,
> vzip_s8,
>         vzip_s16, vzip_u8, vzip_u16, vzip_p8, vzip_p16, vzip_s32, vzip_f32,
>         vzip_u32, vzipq_s8, vzipq_s16, vzipq_s32, vzipq_f32, vzipq_u8,
>         vzipq_u16, vzipq_u32, vzipq_p8, vzipq_p16, vuzp_s8, vuzp_s16,
> vuzp_s32,
>         vuzp_f32, vuzp_u8, vuzp_u16, vuzp_u32, vuzp_p8, vuzp_p16, vuzpq_s8,
>         vuzpq_s16, vuzpq_s32, vuzpq_f32, vuzpq_u8, vuzpq_u16, vuzpq_u32,
>         vuzpq_p8, vuzpq_p16): Correct mask for bigendian.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/arm/pr48252.c (main): Expect same result as
> endian-neutral.
>
> Cheers, Alan

Reply via email to