Hi all,

The aarch64_addpdi pattern is redundant: the reduc_plus_scal_<mode>
pattern can already generate the required form of the ADDP instruction,
and it is mostly folded to GIMPLE early on, so it can benefit from more
optimisations. It turns out, though, that we were missing the folding
for the unsigned variants. This patch adds that folding and wires up the
vpaddd_u64 and vpaddd_s64 intrinsics through reduc_plus_scal_<mode>
instead, so that we can remove the redundant pattern and get more
optimisation earlier.
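
For readers less familiar with these intrinsics, here is a minimal sketch
of the behaviour that has to be preserved: both add the two 64-bit lanes
of a vector and return a scalar. The helper names (sum_lanes_u64,
sum_lanes_s64, self_check) are made up for illustration and are not part
of the patch, and the non-AArch64 branch is only a scalar model I am
assuming matches the wrapping behaviour of ADDP, so the snippet can be
built on any host.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

/* On AArch64, use the real intrinsics touched by this patch. */
uint64_t sum_lanes_u64(const uint64_t lanes[2])
{
    return vpaddd_u64(vld1q_u64(lanes));
}

int64_t sum_lanes_s64(const int64_t lanes[2])
{
    return vpaddd_s64(vld1q_s64(lanes));
}
#else
/* Elsewhere, a scalar model (assumption): add the two lanes modulo 2^64. */
uint64_t sum_lanes_u64(const uint64_t lanes[2])
{
    return lanes[0] + lanes[1];
}

int64_t sum_lanes_s64(const int64_t lanes[2])
{
    /* Add as unsigned to avoid signed-overflow undefined behaviour. */
    return (int64_t)((uint64_t)lanes[0] + (uint64_t)lanes[1]);
}
#endif

/* Hypothetical self-check covering the signed and unsigned variants. */
void self_check(void)
{
    const uint64_t u[2] = { 40, 2 };
    const int64_t  s[2] = { -5, 12 };
    const uint64_t w[2] = { UINT64_MAX, 1 };

    assert(sum_lanes_u64(u) == 42);
    assert(sum_lanes_s64(s) == 7);
    assert(sum_lanes_u64(w) == 0);   /* unsigned addition wraps */
}
```

With the patch applied, both AArch64 helpers should still end up as a
single ADDP, but reach it through the reduc_plus_scal_<mode> path rather
than aarch64_addpdi.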

Bootstrapped and tested on aarch64-none-linux-gnu and aarch64_be-none-elf.
Pushing to trunk.
Thanks,
Kyrill

gcc/ChangeLog:

        * config/aarch64/aarch64-builtins.cc
        (aarch64_general_gimple_fold_builtin): Handle unsigned
        reduc_plus_scal_ builtins.
        * config/aarch64/aarch64-simd-builtins.def (addp): Delete DImode
        instances.
        * config/aarch64/aarch64-simd.md (aarch64_addpdi): Delete.
        * config/aarch64/arm_neon.h (vpaddd_s64): Reimplement with
        __builtin_aarch64_reduc_plus_scal_v2di.
        (vpaddd_u64): Reimplement with
        __builtin_aarch64_reduc_plus_scal_v2di_uu.
Attachment: vpaddd.patch