https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95905

--- Comment #3 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:b1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1

commit r11-6636-gb1d1e2b54c6b9cf13f021176ba37d24cc4dc2fe1
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Wed Jan 13 11:28:48 2021 +0100

    i386, expand: Optimize also 256-bit and 512-bit permutations as vpmovzx if possible [PR95905]

    The following patch implements what I've talked about, i.e. to no longer
    force operands of vec_perm_const into registers in the generic code, but
    let each of the (currently 8) targets force them into registers
    individually, giving the targets better control over whether and when
    that happens and allowing them to do something special with some
    particular operands.
    It then adds define_insn_and_split patterns that turn the 256-bit and
    512-bit permutations into vpmovzx* (only the bw, wd and dq cases; in
    theory we could add define_insn_and_split patterns also for the bd, bq
    and wq).
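
A minimal sketch of why such a permutation can be emitted as a zero
extension, written as a portable scalar model rather than with GCC vector
types. It assumes a 32-byte (256-bit) vector and x86 little-endian lane
layout; the names perm2 and perm_matches_zero_extend are hypothetical and
are not part of GCC or of this patch.

```c
#include <stddef.h>
#include <stdint.h>

enum { N = 32 };  /* bytes per vector, modelling a 256-bit V32QI operand */

/* Scalar model of a two-operand constant permutation: selector values
   0..N-1 pick a byte from a, values N..2N-1 pick a byte from b.  */
static void perm2(const uint8_t *a, const uint8_t *b,
                  const uint8_t *sel, uint8_t *out)
{
    for (size_t i = 0; i < N; i++)
        out[i] = sel[i] < N ? a[sel[i]] : b[sel[i] - N];
}

/* Interleave the low N/2 bytes of src with bytes of an all-zero vector,
   then check that every resulting 16-bit little-endian lane equals the
   corresponding source byte zero-extended to 16 bits.
   Returns 1 when all lanes match, 0 otherwise.  */
static int perm_matches_zero_extend(const uint8_t *src)
{
    const uint8_t zero[N] = { 0 };
    uint8_t sel[N], out[N];

    for (size_t i = 0; i < N / 2; i++) {
        sel[2 * i] = (uint8_t) i;            /* low byte of lane: src[i] */
        sel[2 * i + 1] = (uint8_t) (N + i);  /* high byte of lane: zero[i] */
    }
    perm2(src, zero, sel, out);

    for (size_t i = 0; i < N / 2; i++) {
        /* Assemble the lane explicitly so the check is endian-independent. */
        uint16_t lane = (uint16_t) (out[2 * i] | (out[2 * i + 1] << 8));
        if (lane != (uint16_t) src[i])
            return 0;
    }
    return 1;
}
```

With this selector, the result has the same bytes as a byte-to-word zero
extension of the low half of src, which is the bw case named above. The wd
and dq cases follow the same idea with wider elements.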

    2021-01-13  Jakub Jelinek  <ja...@redhat.com>

            PR target/95905
            * optabs.c (expand_vec_perm_const): Don't force v0 and v1 into
            registers before calling targetm.vectorize.vec_perm_const, only
            after that.
            * config/i386/i386-expand.c (ix86_vectorize_vec_perm_const): Handle
            two argument permutation when one operand is zero vector and only
            after that force operands into registers.
            * config/i386/sse.md (*avx2_zero_extendv16qiv16hi2_1): New
            define_insn_and_split pattern.
            (*avx512bw_zero_extendv32qiv32hi2_1): Likewise.
            (*avx512f_zero_extendv16hiv16si2_1): Likewise.
            (*avx2_zero_extendv8hiv8si2_1): Likewise.
            (*avx512f_zero_extendv8siv8di2_1): Likewise.
            (*avx2_zero_extendv4siv4di2_1): Likewise.
            * config/mips/mips.c (mips_vectorize_vec_perm_const): Force
            operands into registers.
            * config/arm/arm.c (arm_vectorize_vec_perm_const): Likewise.
            * config/sparc/sparc.c (sparc_vectorize_vec_perm_const): Likewise.
            * config/ia64/ia64.c (ia64_vectorize_vec_perm_const): Likewise.
            * config/aarch64/aarch64.c (aarch64_vectorize_vec_perm_const):
            Likewise.
            * config/rs6000/rs6000.c (rs6000_vectorize_vec_perm_const):
            Likewise.
            * config/gcn/gcn.c (gcn_vectorize_vec_perm_const): Likewise.  Use
            std::swap.

            * gcc.target/i386/pr95905-2.c: Use scan-assembler-times instead of
            scan-assembler.  Add tests with zero vector as first
            __builtin_shuffle operand.
            * gcc.target/i386/pr95905-3.c: New test.
            * gcc.target/i386/pr95905-4.c: New test.
