https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93594
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jakub Jelinek <ja...@gcc.gnu.org>:

https://gcc.gnu.org/g:3f740c67dbb90177aa71d3c60ef9b0fd2f44dbd9

commit r10-6472-g3f740c67dbb90177aa71d3c60ef9b0fd2f44dbd9
Author: Jakub Jelinek <ja...@redhat.com>
Date:   Thu Feb 6 11:08:59 2020 +0100

    i386: Improve avx* vector concatenation [PR93594]

    The following testcase shows that for _mm256_set*_m128i and similar
    intrinsics, we sometimes generate bad code.  All 4 routines express
    the same thing, a 128-bit vector zero-padded to a 256-bit vector, but
    only the 3rd one actually emits the desired
      vmovdqa %xmm0, %xmm0
    insn; the others emit
      vpxor %xmm1, %xmm1, %xmm1
      vinserti128 $0x1, %xmm1, %ymm0, %ymm0
    instead.  The problem is that the cast builtins use UNSPEC_CAST, which
    is simplified by a splitter after reload, but during combine it
    prevents optimizations.  We do have avx_vec_concat* patterns that
    generate efficient code, both for this low-half + zero concatenation
    special case and for other cases too, so the following
    define_insn_and_split simply recognizes an avx_vec_concat made of the
    low half of a cast and some other register.

    2020-02-06  Jakub Jelinek  <ja...@redhat.com>

            PR target/93594
            * config/i386/predicates.md (avx_identity_operand): New
            predicate.
            * config/i386/sse.md (*avx_vec_concat<mode>_1): New
            define_insn_and_split.

            * gcc.target/i386/avx2-pr93594.c: New test.
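
As a sketch of the kind of source involved (function names are
illustrative, and this is not necessarily the committed
gcc.target/i386/avx2-pr93594.c; it also assumes a GCC recent enough to
provide _mm256_zextsi128_si256), each routine below zero-extends a
128-bit vector to 256 bits:

  #include <x86intrin.h>

  __m256i
  f1 (__m128i x)  /* low half = x, high half = zero */
  {
    return _mm256_setr_m128i (x, _mm_setzero_si128 ());
  }

  __m256i
  f2 (__m128i x)  /* same, with _mm256_set_m128i operand order */
  {
    return _mm256_set_m128i (_mm_setzero_si128 (), x);
  }

  __m256i
  f3 (__m128i x)  /* explicit zero-extension intrinsic */
  {
    return _mm256_zextsi128_si256 (x);
  }

  __m256i
  f4 (__m128i x)  /* insert x into the low lane of an all-zero vector */
  {
    return _mm256_inserti128_si256 (_mm256_setzero_si256 (), x, 0);
  }

With the new pattern, each of these should collapse at -O2 -mavx2 to a
single vmovdqa %xmm0, %xmm0, which is sufficient because a VEX-encoded
128-bit move zeroes bits 128-255 of the destination as a side effect.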