Hi,

Much like the zip intrinsics, the vuzp_* intrinsics are implemented with inline
asm, which prevents compiler analysis. This series replaces those with calls to
__builtin_shuffle, which produces the same** assembler instructions.

(**except for two-element vectors where UZP and ZIP are equivalent and the
backend outputs ZIP.)
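
For illustration only, here is a minimal sketch of the idea using GCC's generic
vector extensions rather than the real arm_neon.h types. The names v4i16,
v4u16, v4i16x2 and sketch_uzp_s16 are hypothetical stand-ins, not taken from
the patches, and the sketch assumes little-endian lane numbering:

```c
#include <stdint.h>

/* Hypothetical stand-ins for int16x4_t / uint16x4_t / int16x4x2_t.  */
typedef int16_t v4i16 __attribute__ ((vector_size (8)));
typedef uint16_t v4u16 __attribute__ ((vector_size (8)));
typedef struct { v4i16 val[2]; } v4i16x2;

/* Unzip two 4-lane vectors: val[0] collects the even-indexed lanes of the
   concatenation {a, b}, val[1] the odd-indexed lanes.  In the mask,
   indices 0-3 select from a and 4-7 select from b.  */
static inline v4i16x2
sketch_uzp_s16 (v4i16 a, v4i16 b)
{
  v4i16x2 r;
  r.val[0] = __builtin_shuffle (a, b, (v4u16) { 0, 2, 4, 6 });
  r.val[1] = __builtin_shuffle (a, b, (v4u16) { 1, 3, 5, 7 });
  return r;
}
```

Because the masks are compile-time constants, the compiler can see through the
shuffle instead of treating it as an opaque asm block. With only two lanes the
unzip masks would be {0, 2} and {1, 3}, which are the same as the zip masks,
hence the footnote above.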

The first patch adds a set of tests, which pass with the current asm implementation;
the second patch reimplements the intrinsics with __builtin_shuffle;
the third patch adds equivalent ARM tests, sharing the test bodies from the first patch.

OK for stage 1?

Cheers, Alan

