On Mon, Sep 22, 2014 at 1:21 PM, Richard Biener
<richard.guent...@gmail.com> wrote:
> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
>> The end goal here is to remove this code from tree-vect-loop.c
>> (vect_create_epilog_for_reduction):
>>
>>       if (BYTES_BIG_ENDIAN)
>>         bitpos = size_binop (MULT_EXPR,
>>                              bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) -
>> 1),
>>                              TYPE_SIZE (scalar_type));
>>       else
>>
>> as this is the root cause of PR/61114 (see testcase there, failing on all
>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard Biener,
>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is
>> suspicious". The code snippet above is used on two paths:
>>
>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR =
>> reduc_[us](plus|min|max)_optab.
>> The optab is documented as "the scalar result is stored in the least
>> significant bits of operand 0", but the tree code as "the first element in
>> the vector holding the result of the reduction of all elements of the
>> operand". This mismatch means that when the tree code is folded, the code
>> snippet above reads the result from the wrong end of the vector.
>>
>> The strategy (as per
>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define new
>> tree codes and optabs that produce scalar results directly; this seems
>> better than tying (the element of the vector into which the result is
>> placed) to (the endianness of the target), and avoids generating extra moves
>> on current bigendian targets. However, the previous optabs are retained for
>> now as a migration strategy so as not to break existing backends; moving
>> individual platforms over will follow.
>>
>> A complication here is on AArch64, where we directly generate
>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily
>> remove this folding in order to decouple the midend and AArch64 backend.
>
> Sounds fine.  I hope we can transition all backends for 5.0 and remove
> the vector variant optabs (maybe renaming the scalar ones).
>
>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e.
>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the optab
>> is defined in an endianness-dependent way, leading to significant
>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab is
>> never used!). Few platforms appear to handle vec_shr_optab (and fewer
>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change
>> the existing optab to be endianness-neutral.
>>
>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13
>> updates that implementation to fit the new endianness-neutral specification,
>> serving as a guide for other existing backends. Patches/RFCs 15 and 16 are
>> equivalents for MIPS and PowerPC; I haven't tested these but hope they act
>> as useful pointers for the port maintainers.
>>
>> Finally patch 14 cleans up the affected part of tree-vect-loop.c
>> (vect_create_epilog_for_reduction).
>
> As said during the individual patches review I'd like the vectorizer to
> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with
> only whole-element amounts).  This means we can remove
> VEC_RSHIFT_EXPR.  It also means that if the backend defines
> vec_perm_const (which it really should) it can handle the special
> permutes that boil down to a possibly more efficient vector shift
> there (a good optimization anyway).  Until it does that all backends
> would at least create correct code (with the endian dependent
> vec_shr removed).

It seems only Alpha completely lacks vec_perm_const but implements
vec_shr.

Richard.

> Richard.
>
>> --Alan
>>

Reply via email to