Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

Richard Biener Fri, 26 Jan 2018 07:22:53 -0800

On Fri, Jan 26, 2018 at 4:04 PM, Richard Sandiford
<richard.sandif...@linaro.org> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
>> <richard.sandif...@linaro.org> wrote:
>>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
>>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
>>> lanes.  This meant that both the SVE patterns and the handling of
>>> fully-masked loops were wrong.
>>>
>>> The patch deals with that by making sure that all vec_unpack* optabs
>>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
>>> define_insn.  This in turn meant that we can get rid of the duplication
>>> between the signed and unsigned patterns for predicates.  (We provide
>>> implementations of both the signed and unsigned optabs because the sign
>>> doesn't matter for predicates: every element contains only one
>>> significant bit.)
>>>
>>> Also, the float unpacks need to unpack one half of the input vector,
>>> but the unpacked upper bits are "don't care".  There are two obvious
>>> ways of handling that: use an unpack (filling with zeros) or use a ZIP
>>> (filling with a duplicate of the low bits).  The code previously used
>>> unpacks, but the sequence involved a subreg that is semantically an
>>> element reverse on big-endian targets.  Using the ZIP patterns avoids
>>> that, and at the moment there's no reason to prefer one over the other
>>> for performance reasons, so the patch switches to ZIP unconditionally.
>>> As the comment says, it would be easy to optimise this later if UUNPK
>>> turns out to be better for some implementations.
>>>
>>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
>>> I think the tree-vect-loop-manip.c part is obvious, but OK for the
>>> AArch64 bits?
>>>
>>> Richard
>>>
>>>
>>> 2018-01-26  Richard Sandiford  <richard.sandif...@linaro.org>
>>>
>>> gcc/
>>>         * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>>>         Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>>>         for big-endian.
>>>         * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>>>         * config/aarch64/aarch64-sve.md
>>>         (*aarch64_sve_<perm_insn><perm_hilo><mode>): Rename to...
>>>         (aarch64_sve_<perm_insn><perm_hilo><mode>): ...this.
>>>         (*extend<mode><Vwide>2): Rename to...
>>>         (aarch64_sve_extend<mode><Vwide>2): ...this.
>>>         (vec_unpack<su>_<perm_hilo>_<mode>): Turn into a define_expand,
>>>         renaming the old pattern to...
>>>         (aarch64_sve_punpk<perm_hilo>_<mode>): ...this.  Only define
>>>         unsigned packs.
>>>         (vec_unpack<su>_<perm_hilo>_<SVE_BHSI:mode>): Turn into a
>>>         define_expand, renaming the old pattern to...
>>>         (aarch64_sve_<su>unpk<perm_hilo>_<SVE_BHSI:mode>): ...this.
>>>         (*vec_unpacku_<perm_hilo>_<mode>_no_convert): Delete.
>>>         (vec_unpacks_<perm_hilo>_<mode>): Take BYTES_BIG_ENDIAN into
>>>         account when deciding which SVE instruction the optab should use.
>>>         (vec_unpack<su_optab>_float_<perm_hilo>_vnx4si): Likewise.
>>>
>>> gcc/testsuite/
>>>         * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>>>         than unpacks.
>>>         * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>>>         * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>>>
>>> Index: gcc/tree-vect-loop-manip.c
>>> ===================================================================
>>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +0000
>>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +0000
>>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
>>>         {
>>>           tree src = src_rgm->masks[i / 2];
>>>           tree dest = dest_rgm->masks[i];
>>> -         tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
>>> +         tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
>>> +                           ? VEC_UNPACK_HI_EXPR
>>>                             : VEC_UNPACK_LO_EXPR);
>>
>> This looks broken anyways -- the element oder on GIMPLE is the same
>> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
>> tree-vect-*, not sure when they crept in but they are all broken.
>
> Are you sure?  Everything apart from this new code seems consistently
> to treat UNPACK_HI as "high part of the register" rather than
> "high memory address/lane number".  I think changing it now would
> break powerpc64 and mips.


Looks like so.  But we agreed on changing this, looks like this is a left-over.
(I'm always looking at constant folding for how it's supposed to work).
Looks like VEC_WIDEN_LSHIFT_HI/O_EXPR doesn't have constant folding.

Richard.

> Thanks,
> Richard

Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

Reply via email to