On Fri, Jan 26, 2018 at 02:21:11PM +0000, Richard Sandiford wrote: > I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks > the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered > lanes. This meant that both the SVE patterns and the handling of > fully-masked loops were wrong. > > The patch deals with that by making sure that all vec_unpack* optabs > are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate > define_insn. This in turn meant that we can get rid of the duplication > between the signed and unsigned patterns for predicates. (We provide > implementations of both the signed and unsigned optabs because the sign > doesn't matter for predicates: every element contains only one > significant bit.) > > Also, the float unpacks need to unpack one half of the input vector, > but the unpacked upper bits are "don't care". There are two obvious > ways of handling that: use an unpack (filling with zeros) or use a ZIP > (filling with a duplicate of the low bits). The code previously used > unpacks, but the sequence involved a subreg that is semantically an > element reverse on big-endian targets. Using the ZIP patterns avoids > that, and at the moment there's no reason to prefer one over the other > for performance reasons, so the patch switches to ZIP unconditionally. > As the comment says, it would be easy to optimise this later if UUNPK > turns out to be better for some implementations. > > Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu. > I think the tree-vect-loop-manip.c part is obvious, but OK for the > AArch64 bits?
The AArch64 bits are OK by me, if this direction is appropriate for the tree-vect-loop-manip.c parts. Thanks, James > > Richard > > > 2018-01-26 Richard Sandiford <richard.sandif...@linaro.org> > > gcc/ > * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks): > Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR > for big-endian. > * config/aarch64/iterators.md (hi_lanes_optab): New int attribute. > * config/aarch64/aarch64-sve.md > (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to... > (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this. > (*extend<mode><Vwide>2): Rename to... > (aarch64_sve_extend<mode><Vwide>2): ...this. > (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand, > renaming the old pattern to... > (aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define > unsigned packs. > (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a > define_expand, renaming the old pattern to... > (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this. > (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete. > (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into > account when deciding which SVE instruction the optab should use. > (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise. > > gcc/testsuite/ > * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather > than unpacks. > * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise. > * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.