On Fri, Jan 26, 2018 at 4:04 PM, Richard Sandiford <richard.sandif...@linaro.org> wrote: > Richard Biener <richard.guent...@gmail.com> writes: >> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford >> <richard.sandif...@linaro.org> wrote: >>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks >>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered >>> lanes. This meant that both the SVE patterns and the handling of >>> fully-masked loops were wrong. >>> >>> The patch deals with that by making sure that all vec_unpack* optabs >>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate >>> define_insn. This in turn meant that we can get rid of the duplication >>> between the signed and unsigned patterns for predicates. (We provide >>> implementations of both the signed and unsigned optabs because the sign >>> doesn't matter for predicates: every element contains only one >>> significant bit.) >>> >>> Also, the float unpacks need to unpack one half of the input vector, >>> but the unpacked upper bits are "don't care". There are two obvious >>> ways of handling that: use an unpack (filling with zeros) or use a ZIP >>> (filling with a duplicate of the low bits). The code previously used >>> unpacks, but the sequence involved a subreg that is semantically an >>> element reverse on big-endian targets. Using the ZIP patterns avoids >>> that, and at the moment there's no reason to prefer one over the other >>> for performance reasons, so the patch switches to ZIP unconditionally. >>> As the comment says, it would be easy to optimise this later if UUNPK >>> turns out to be better for some implementations. >>> >>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu. >>> I think the tree-vect-loop-manip.c part is obvious, but OK for the >>> AArch64 bits? >>> >>> Richard >>> >>> >>> 2018-01-26 Richard Sandiford <richard.sandif...@linaro.org> >>> >>> gcc/ >>> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks): >>> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR >>> for big-endian. >>> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute. >>> * config/aarch64/aarch64-sve.md >>> (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to... >>> (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this. >>> (*extend<mode><Vwide>2): Rename to... >>> (aarch64_sve_extend<mode><Vwide>2): ...this. >>> (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand, >>> renaming the old pattern to... >>> (aarch64_sve_punpk<perm_hilo>_<mode>): ...this. Only define >>> unsigned packs. >>> (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a >>> define_expand, renaming the old pattern to... >>> (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this. >>> (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete. >>> (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into >>> account when deciding which SVE instruction the optab should use. >>> (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise. >>> >>> gcc/testsuite/ >>> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather >>> than unpacks. >>> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise. >>> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise. >>> >>> Index: gcc/tree-vect-loop-manip.c >>> =================================================================== >>> --- gcc/tree-vect-loop-manip.c 2018-01-13 18:02:00.948360274 +0000 >>> +++ gcc/tree-vect-loop-manip.c 2018-01-26 14:00:15.717916957 +0000 >>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se >>> { >>> tree src = src_rgm->masks[i / 2]; >>> tree dest = dest_rgm->masks[i]; >>> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR >>> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1) >>> + ? VEC_UNPACK_HI_EXPR >>> : VEC_UNPACK_LO_EXPR); >> >> This looks broken anyways -- the element oder on GIMPLE is the same >> for little and big-endian. I see two other BYTES_BIG_ENDIAN uses in >> tree-vect-*, not sure when they crept in but they are all broken. > > Are you sure? Everything apart from this new code seems consistently > to treat UNPACK_HI as "high part of the register" rather than > "high memory address/lane number". I think changing it now would > break powerpc64 and mips.
Looks like so. But we agreed on changing this, looks like this is a left-over. (I'm always looking at constant folding for how it's supposed to work). Looks like VEC_WIDEN_LSHIFT_HI/O_EXPR doesn't have constant folding. Richard. > Thanks, > Richard