Ok....well, I see a path forward, somewhere there....
however (bah), I can't push that subset of patches - I came back from a week's holiday and misremembered - the AArch64 changes depend upon the introduction of the _scal_optabs, not just the tree changes :( . I'll try to post optab migration patches next week for x86, rs6000 (mostly, I haven't figured out paired.md yet), ARM, and IA64 (fwiw; has only v2sf reductions). Having looked at MIPS/Loongson I'm feeling a bit bewildered and not sure how to proceed, so I think I must ask the MIPS maintainers (CC'd) for assistance: how can one add a vec_extract, to produce a scalar result, to the end of each reduc_ optab ? --Alan Richard Biener wrote: > > On Mon, Oct 6, 2014 at 7:30 PM, Alan Lawrence <alan.lawre...@arm.com> wrote: >> >> Ok, so unless there are objections, I plan to commit patches 1, 2, 4, 5, and >> >> 6, >> >> which have been previously approved, in that sequence. (Of those, all bar >> >> patch >> >> 2 are AArch64 only.) I think this is better than maintaining an >> >> ever-expanding >> >> patch series. > > > > Agreed. > > >> >> Then I'll get to work on migrating all backends to the new _scal_ optab (and >> >> removing the vector optab). Certainly I'd like to replace vec_shr/l with >> >> vec_perm_expr too, but I'm conscious that the end of stage 1 is approaching! > > > > I suppose we all are. It will last until end of October at least > > (stage1 of gcc 4.9 > > ended Nov 22th, certainly a bit late). > > > > I do expect we will continue merging already developed / posted stuff through > > stage3 (as usual). > > > > That said, it would be really nice to get rid of VEC_RSHIFT_EXPR. > > > > Thanks, > > Richard. > > >> >> --Alan >> >> >> >> >> >> >> >> >> >> Richard Biener wrote: >>> >>> On Thu, Sep 18, 2014 at 1:41 PM, Alan Lawrence <alan.lawre...@arm.com> >>> >>> wrote: >>>> >>>> The end goal here is to remove this code from tree-vect-loop.c >>>> >>>> (vect_create_epilog_for_reduction): >>>> >>>> >>>> >>>> if (BYTES_BIG_ENDIAN) >>>> >>>> bitpos = size_binop (MULT_EXPR, >>>> >>>> bitsize_int (TYPE_VECTOR_SUBPARTS (vectype) >>>> >>>> - >>>> >>>> 1), >>>> >>>> TYPE_SIZE (scalar_type)); >>>> >>>> else >>>> >>>> >>>> >>>> as this is the root cause of PR/61114 (see testcase there, failing on all >>>> >>>> bigendian targets supporting reduc_[us]plus_optab). Quoting Richard >>>> >>>> Biener, >>>> >>>> "all code conditional on BYTES/WORDS_BIG_ENDIAN in tree-vect* is >>>> >>>> suspicious". The code snippet above is used on two paths: >>>> >>>> >>>> >>>> (Path 1) (patches 1-6) Reductions using REDUC_(PLUS|MIN|MAX)_EXPR = >>>> >>>> reduc_[us](plus|min|max)_optab. >>>> >>>> The optab is documented as "the scalar result is stored in the least >>>> >>>> significant bits of operand 0", but the tree code as "the first element >>>> >>>> in >>>> >>>> the vector holding the result of the reduction of all elements of the >>>> >>>> operand". This mismatch means that when the tree code is folded, the code >>>> >>>> snippet above reads the result from the wrong end of the vector. >>>> >>>> >>>> >>>> The strategy (as per >>>> >>>> https://gcc.gnu.org/ml/gcc-patches/2014-08/msg00041.html) is to define >>>> >>>> new >>>> >>>> tree codes and optabs that produce scalar results directly; this seems >>>> >>>> better than tying (the element of the vector into which the result is >>>> >>>> placed) to (the endianness of the target), and avoids generating extra >>>> >>>> moves >>>> >>>> on current bigendian targets. However, the previous optabs are retained >>>> >>>> for >>>> >>>> now as a migration strategy so as not to break existing backends; moving >>>> >>>> individual platforms over will follow. >>>> >>>> >>>> >>>> A complication here is on AArch64, where we directly generate >>>> >>>> REDUC_PLUS_EXPRs from intrinsics in gimple_fold_builtin; I temporarily >>>> >>>> remove this folding in order to decouple the midend and AArch64 backend. >>> >>> >>> >>> Sounds fine. I hope we can transition all backends for 5.0 and remove >>> >>> the vector variant optabs (maybe renaming the scalar ones). >>> >>> >>>> >>>> (Path 2) (patches 7-13) Reductions using whole-vector-shifts, i.e. >>>> >>>> VEC_RSHIFT_EXPR and vec_shr_optab. Here the tree code as well as the >>>> >>>> optab >>>> >>>> is defined in an endianness-dependent way, leading to significant >>>> >>>> complication in fold-const.c. (Moreover, the "equivalent" vec_shl_optab >>>> >>>> is >>>> >>>> never used!). Few platforms appear to handle vec_shr_optab (and fewer >>>> >>>> bigendian - I see only PowerPC and MIPS), so it seems pertinent to change >>>> >>>> the existing optab to be endianness-neutral. >>>> >>>> >>>> >>>> Patch 10 defines vec_shr for AArch64, for the old specification; patch 13 >>>> >>>> updates that implementation to fit the new endianness-neutral >>>> >>>> specification, >>>> >>>> serving as a guide for other existing backends. Patches/RFCs 15 and 16 >>>> >>>> are >>>> >>>> equivalents for MIPS and PowerPC; I haven't tested these but hope they >>>> >>>> act >>>> >>>> as useful pointers for the port maintainers. >>>> >>>> >>>> >>>> Finally patch 14 cleans up the affected part of tree-vect-loop.c >>>> >>>> (vect_create_epilog_for_reduction). >>> >>> >>> >>> As said during the individual patches review I'd like the vectorizer to >>> >>> use a VEC_PERM_EXPR instead of VEC_RSHIFT_EXPR (with >>> >>> only whole-element amounts). This means we can remove >>> >>> VEC_RSHIFT_EXPR. It also means that if the backend defines >>> >>> vec_perm_const (which it really should) it can handle the special >>> >>> permutes that boil down to a possibly more efficient vector shift >>> >>> there (a good optimization anyway). Until it does that all backends >>> >>> would at least create correct code (with the endian dependent >>> >>> vec_shr removed). >>> >>> >>> >>> Richard. >>> >>> >>>> >>>> --Alan >>>> >>>>