Hi all,

We've received requests to optimise the attached intrinsics testcase.
We currently generate:
foo_1:
        uaddlp  v0.4s, v0.8h
        uaddlv  d31, v0.4s
        fmov    x0, d31
        ret
foo_2:
        uaddlp  v0.4s, v0.8h
        addv    s31, v0.4s
        fmov    w0, s31
        ret
foo_3:
        saddlp  v0.4s, v0.8h
        addv    s31, v0.4s
        fmov    w0, s31
        ret

The widening pairwise addition instructions (UADDLP/SADDLP) can be omitted if
we're just doing an ADDV afterwards.
Making this optimisation would be quite simple if we had a standard RTL PLUS
vector reduction code.
As we don't, we can use UNSPEC_ADDV as a stand-in.
This patch expresses the SADDLV and UADDLV instructions as an UNSPEC_ADDV over
a widened input, thus removing the need for separate UNSPEC_SADDLV and
UNSPEC_UADDLV codes.
To optimise the testcases involved we add two splitters that match a vector
addition where all participating elements are taken and widened from the same
vector and then fed into an UNSPEC_ADDV. In that case we can remove the vector
PLUS and emit the simple RTL for SADDLV/UADDLV.
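
As an illustration of why the pairwise add is redundant here, the following is
a scalar model in plain C (not the attached testcase, and not the patch
itself). The model_* helper names are hypothetical and only approximate the
lane semantics of the instructions for the 8h -> 4s case shown above; the
point is that reducing the pairwise-widened sums gives the same value as
widening and reducing all lanes directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of UADDLP Vd.4S, Vn.8H:
   add adjacent 16-bit lanes into 32-bit lanes.  */
static void model_uaddlp(const uint16_t in[8], uint32_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = (uint32_t) in[2 * i] + (uint32_t) in[2 * i + 1];
}

/* Hypothetical scalar model of ADDV Sd, Vn.4S: sum the four 32-bit lanes.  */
static uint32_t model_addv(const uint32_t in[4])
{
  uint32_t sum = 0;
  for (int i = 0; i < 4; i++)
    sum += in[i];
  return sum;
}

/* Hypothetical scalar model of UADDLV Sd, Vn.8H:
   zero-extend every 16-bit lane to 32 bits and sum them all.  */
static uint32_t model_uaddlv(const uint16_t in[8])
{
  uint32_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += (uint32_t) in[i];
  return sum;
}

/* Signed counterparts (SADDLP followed by ADDV, versus SADDLV).  */
static int32_t model_saddlp_addv(const int16_t in[8])
{
  int32_t sum = 0;
  for (int i = 0; i < 4; i++)
    sum += (int32_t) in[2 * i] + (int32_t) in[2 * i + 1];
  return sum;
}

static int32_t model_saddlv(const int16_t in[8])
{
  int32_t sum = 0;
  for (int i = 0; i < 8; i++)
    sum += (int32_t) in[i];
  return sum;
}

/* Returns 1 when UADDLP + ADDV and a single UADDLV agree on the input.  */
static int unsigned_reductions_agree(const uint16_t in[8])
{
  uint32_t pairs[4];
  model_uaddlp(in, pairs);
  return model_addv(pairs) == model_uaddlv(in);
}

/* Returns 1 when SADDLP + ADDV and a single SADDLV agree on the input.  */
static int signed_reductions_agree(const int16_t in[8])
{
  return model_saddlp_addv(in) == model_saddlv(in);
}
```

The sum of eight 16-bit values always fits in 32 bits, so no intermediate
result in this model can overflow, which is why the two forms agree for every
input.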

Bootstrapped and tested on aarch64-none-linux-gnu.
Pushing to trunk.

Thanks,
Kyrill

gcc/ChangeLog:

        * config/aarch64/aarch64-protos.h (aarch64_parallel_select_half_p):
        Define prototype.
        (aarch64_pars_overlap_p): Likewise.
        * config/aarch64/aarch64-simd.md (aarch64_<su>addlv<mode>):
        Express in terms of UNSPEC_ADDV.
        (*aarch64_<su>addlv<VDQV_L:mode>_ze<GPI:mode>): Likewise.
        (*aarch64_<su>addlv<mode>_reduction): Define.
        (*aarch64_uaddlv<mode>_reduction_2): Likewise.
        * config/aarch64/aarch64.cc (aarch64_parallel_select_half_p):
        Define.
        (aarch64_pars_overlap_p): Likewise.
        * config/aarch64/iterators.md (UNSPEC_SADDLV, UNSPEC_UADDLV): Delete.
        (VQUADW): New mode attribute.
        (VWIDE2X_S): Likewise.
        (USADDLV): Delete.
        (su): Delete handling of UNSPEC_SADDLV, UNSPEC_UADDLV.
        * config/aarch64/predicates.md (vect_par_cnst_select_half): Define.

gcc/testsuite/ChangeLog:

        * gcc.target/aarch64/simd/addlv_1.c: New test.

Attachment: addlv2.patch
