[Bug target/97875] suboptimal loop vectorization

cvs-commit at gcc dot gnu.org via Gcc-bugs Tue, 12 Jan 2021 08:51:28 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97875


--- Comment #7 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Christophe Lyon <cl...@gcc.gnu.org>:

https://gcc.gnu.org/g:25bef68902f42f414f99626cefb2d3df81de7dc8

commit r11-6616-g25bef68902f42f414f99626cefb2d3df81de7dc8
Author: Christophe Lyon <christophe.l...@linaro.org>
Date:   Tue Jan 12 16:47:27 2021 +0000

    arm: Add movmisalign patterns for MVE (PR target/97875)

    This patch adds new movmisalign<mode>_mve_load and store patterns for
    MVE to help vectorization. They are very similar to their Neon
    counterparts, but use different iterators and instructions.

    Indeed MVE supports fewer vector modes than Neon, so we use the
    MVE_VLD_ST iterator where Neon uses VQX.

    Since the supported modes are different from the ones valid for
    arithmetic operators, we introduce two new sets of macros:

    ARM_HAVE_NEON_<MODE>_LDST
      true if Neon has vector load/store instructions for <MODE>

    ARM_HAVE_<MODE>_LDST
      true if any vector extension has vector load/store instructions for
      <MODE>

    We move the movmisalign<mode> expander from neon.md to vec-common.md, and
    replace the TARGET_NEON enabler with ARM_HAVE_<MODE>_LDST.

    The patch also updates the mve-vneg.c test to scan for the better code
    generation when loading and storing the vectors involved: it checks
    that no 'orr' instruction is generated to cope with misalignment at
    runtime.
    This test was chosen among the other mve tests, but any other should
    be OK. Using a plain vector copy loop (dest[i] = a[i]) is not a good
    test because the compiler chooses to use memcpy.
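
For readers without the testsuite at hand, the kind of loop behind test_vneg_s32x4 can be sketched as below. This is an illustrative stand-in, not the contents of mve-vneg.c: the function body, the parameter names, and the use of __restrict__ are assumptions made for the example.

```c
#include <stdint.h>

/* Illustrative stand-in for the test function named in the commit message.
   The parameter names and the __restrict__ qualifier are assumptions. */
void test_vneg_s32x4(int32_t *__restrict__ dest, int32_t *a)
{
    /* Four 32-bit elements fill exactly one 128-bit MVE Q register. */
    for (int i = 0; i < 4; i++)
        dest[i] = -a[i];
}
```

Neither pointer is known to be 16-byte aligned here, so the vectorizer needs a way to express a misaligned vector load and store if it is to avoid a runtime alignment check.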

    For instance we now generate:
    test_vneg_s32x4:
            vldrw.32       q3, [r1]
            vneg.s32  q3, q3
            vstrw.32       q3, [r0]
            bx      lr

    instead of:
    test_vneg_s32x4:
            orr     r3, r1, r0
            lsls    r3, r3, #28
            bne     .L15
            vldrw.32        q3, [r1]
            vneg.s32  q3, q3
            vstrw.32        q3, [r0]
            bx      lr
            .L15:
            push    {r4, r5}
            ldrd    r2, r3, [r1, #8]
            ldrd    r5, r4, [r1]
            rsbs    r2, r2, #0
            rsbs    r5, r5, #0
            rsbs    r4, r4, #0
            rsbs    r3, r3, #0
            strd    r5, r4, [r0]
            pop     {r4, r5}
            strd    r2, r3, [r0, #8]
            bx      lr
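
The control flow of the older sequence can be approximated in C as follows. This is a reading of the assembly above, not code from the patch: the helper names are hypothetical, and the plain loop on the aligned path only stands in for the vldrw/vneg/vstrw instructions.

```c
#include <stdint.h>

/* Mirrors "orr r3, r1, r0; lsls r3, r3, #28; bne .L15": OR the two
   addresses together and test whether the low four bits are all zero,
   i.e. whether both pointers are 16-byte aligned. Hypothetical helper. */
int both_aligned_16(const void *dest, const void *a)
{
    uintptr_t bits = (uintptr_t)dest | (uintptr_t)a;
    return (bits & 15u) == 0;
}

/* Hypothetical C rendering of the versioned code the compiler emitted. */
void vneg_s32x4_versioned(int32_t *dest, const int32_t *a)
{
    if (both_aligned_16(dest, a)) {
        /* Aligned path: stands in for the three vector instructions. */
        for (int i = 0; i < 4; i++)
            dest[i] = -a[i];
    } else {
        /* Fallback at .L15: load all four elements, negate each with a
           scalar rsbs, then store them back in pairs. */
        int32_t e0 = a[0], e1 = a[1], e2 = a[2], e3 = a[3];
        dest[0] = -e0;
        dest[1] = -e1;
        dest[2] = -e2;
        dest[3] = -e3;
    }
}
```

With the new movmisalign patterns, only the body of the first branch remains and the alignment test disappears.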

    2021-01-12  Christophe Lyon  <christophe.l...@linaro.org>

            PR target/97875
            gcc/
            * config/arm/arm.h (ARM_HAVE_NEON_V8QI_LDST): New macro.
            (ARM_HAVE_NEON_V16QI_LDST, ARM_HAVE_NEON_V4HI_LDST): Likewise.
            (ARM_HAVE_NEON_V8HI_LDST, ARM_HAVE_NEON_V2SI_LDST): Likewise.
            (ARM_HAVE_NEON_V4SI_LDST, ARM_HAVE_NEON_V4HF_LDST): Likewise.
            (ARM_HAVE_NEON_V8HF_LDST, ARM_HAVE_NEON_V4BF_LDST): Likewise.
            (ARM_HAVE_NEON_V8BF_LDST, ARM_HAVE_NEON_V2SF_LDST): Likewise.
            (ARM_HAVE_NEON_V4SF_LDST, ARM_HAVE_NEON_DI_LDST): Likewise.
            (ARM_HAVE_NEON_V2DI_LDST): Likewise.
            (ARM_HAVE_V8QI_LDST, ARM_HAVE_V16QI_LDST): Likewise.
            (ARM_HAVE_V4HI_LDST, ARM_HAVE_V8HI_LDST): Likewise.
            (ARM_HAVE_V2SI_LDST, ARM_HAVE_V4SI_LDST, ARM_HAVE_V4HF_LDST):
            Likewise.
            (ARM_HAVE_V8HF_LDST, ARM_HAVE_V4BF_LDST, ARM_HAVE_V8BF_LDST):
            Likewise.
            (ARM_HAVE_V2SF_LDST, ARM_HAVE_V4SF_LDST, ARM_HAVE_DI_LDST):
            Likewise.
            (ARM_HAVE_V2DI_LDST): Likewise.
            * config/arm/mve.md (*movmisalign<mode>_mve_store): New pattern.
            (*movmisalign<mode>_mve_load): New pattern.
            * config/arm/neon.md (movmisalign<mode>): Move to ...
            * config/arm/vec-common.md: ... here.

            PR target/97875
            gcc/testsuite/
            * gcc.target/arm/simd/mve-vneg.c: Update test.
