On Fri, Jan 26, 2018 at 02:21:11PM +0000, Richard Sandiford wrote:
> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
> lanes.  This meant that both the SVE patterns and the handling of
> fully-masked loops were wrong.
> 
> The patch deals with that by making sure that all vec_unpack* optabs
> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
> define_insn.  This in turn meant that we can get rid of the duplication
> between the signed and unsigned patterns for predicates.  (We provide
> implementations of both the signed and unsigned optabs because the sign
> doesn't matter for predicates: every element contains only one
> significant bit.)
> 
> Also, the float unpacks need to unpack one half of the input vector,
> but the unpacked upper bits are "don't care".  There are two obvious
> ways of handling that: use an unpack (filling with zeros) or use a ZIP
> (filling with a duplicate of the low bits).  The code previously used
> unpacks, but the sequence involved a subreg that is semantically an
> element reverse on big-endian targets.  Using the ZIP patterns avoids
> that, and at the moment there's no reason to prefer one over the other
> for performance reasons, so the patch switches to ZIP unconditionally.
> As the comment says, it would be easy to optimise this later if UUNPK
> turns out to be better for some implementations.
> 
> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
> I think the tree-vect-loop-manip.c part is obvious, but OK for the
> AArch64 bits?

The AArch64 bits are OK by me, if this direction is appropriate for
the tree-vect-loop-manip.c parts.

Thanks,
James

> 
> Richard
> 
> 
> 2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>
> 
> gcc/
>       * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>       Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>       for big-endian.
>       * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>       * config/aarch64/aarch64-sve.md
>       (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
>       (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
>       (*extend<mode><Vwide>2): Rename to...
>       (aarch64_sve_extend<mode><Vwide>2): ...this.
>       (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
>       renaming the old pattern to...
>       (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define
>       unsigned packs.
>       (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
>       define_expand, renaming the old pattern to...
>       (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
>       (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
>       (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
>       account when deciding which SVE instruction the optab should use.
>       (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.
> 
> gcc/testsuite/
>       * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>       than unpacks.
>       * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>       * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

Reply via email to