[Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector

cvs-commit at gcc dot gnu.org via Gcc-bugs Fri, 16 May 2025 11:27:21 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100165


--- Comment #8 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pengxuan Zheng <pzh...@gcc.gnu.org>:

https://gcc.gnu.org/g:0417a630811404c2362060b7e15f99e5a4a0d76a

commit r16-703-g0417a630811404c2362060b7e15f99e5a4a0d76a
Author: Pengxuan Zheng <quic_pzh...@quicinc.com>
Date:   Mon May 12 10:12:11 2025 -0700

    aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]

    We can optimize AND with certain vector of immediates as FMOV if the result
    of the AND is as if the upper lane of the input vector is set to zero and
    the lower lane remains unchanged.

    For example, at present:

    v4hi
    f_v4hi (v4hi x)
    {
      return x & (v4hi){ 0xffff, 0xffff, 0, 0 };
    }

    generates:

    f_v4hi:
            movi    d31, 0xffffffff
            and     v0.8b, v0.8b, v31.8b
            ret

    With this patch, it generates:

    f_v4hi:
            fmov    s0, s0
            ret

    Changes since v1:
    * v2: Simplify the mask checking logic by using native_decode_int and
    address a few other review comments.

            PR target/100165

    gcc/ChangeLog:

            * config/aarch64/aarch64-protos.h (aarch64_output_fmov): New
            prototype.
            (aarch64_simd_valid_and_imm_fmov): Likewise.
            * config/aarch64/aarch64-simd.md (and<mode>3<vczle><vczbe>): Allow
            FMOV codegen.
            * config/aarch64/aarch64.cc (aarch64_simd_valid_and_imm_fmov): New.
            (aarch64_output_fmov): Likewise.
            * config/aarch64/constraints.md (Df): New constraint.
            * config/aarch64/predicates.md (aarch64_reg_or_and_imm): Update
            predicate to support FMOV codegen.

    gcc/testsuite/ChangeLog:

            * gcc.target/aarch64/fmov-1-be.c: New test.
            * gcc.target/aarch64/fmov-1-le.c: New test.
            * gcc.target/aarch64/fmov-2-be.c: New test.
            * gcc.target/aarch64/fmov-2-le.c: New test.

    Signed-off-by: Pengxuan Zheng <quic_pzh...@quicinc.com>
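
For readers who want to try the example from the commit message, here is a
minimal, self-contained sketch in GNU C. It is not part of the commit. The
v4hi typedef is an assumption (the commit message does not show it), and
keep_low_two_lanes, same_v4hi and check_f_v4hi are hypothetical helpers added
only for illustration. The sketch checks lane by lane that the AND with
{ 0xffff, 0xffff, 0, 0 } keeps lanes 0 and 1 and zeroes lanes 2 and 3, which
is the effect the commit message describes for "fmov s0, s0".

```c
#include <assert.h>

/* Assumed typedef: four 16-bit lanes in a 64-bit vector.  Unsigned lanes
   are used here so that 0xffff fits without a conversion warning.  */
typedef unsigned short v4hi __attribute__ ((vector_size (8)));

/* The function from the commit message.  */
v4hi
f_v4hi (v4hi x)
{
  return x & (v4hi){ 0xffff, 0xffff, 0, 0 };
}

/* Hypothetical reference model: copy lanes 0 and 1, zero lanes 2 and 3.  */
v4hi
keep_low_two_lanes (v4hi x)
{
  v4hi r = { x[0], x[1], 0, 0 };
  return r;
}

/* Hypothetical helper: return 1 if all four lanes of A and B are equal.  */
int
same_v4hi (v4hi a, v4hi b)
{
  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Hypothetical self-check: the AND and the reference model agree.  */
void
check_f_v4hi (void)
{
  v4hi x = { 0x1234, 0xabcd, 0x5678, 0xffff };
  assert (same_v4hi (f_v4hi (x), keep_low_two_lanes (x)));
}
```

Calling check_f_v4hi () from a test driver exercises the equivalence on any
target that supports GNU vector extensions. To compare the generated code
with the listings in the commit message, compile f_v4hi for aarch64 with
optimization enabled and inspect the assembly; whether FMOV appears depends
on the compiler containing this commit.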

[Bug target/100165] fmov could be used to zero out the upper bits instead of movi/zip or movi/ins with __builtin_shuffle and zero vector
