https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94864

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:27de9aa152141e7f3ee66372647d0f2cd94c4b90

commit r14-3381-g27de9aa152141e7f3ee66372647d0f2cd94c4b90
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Jul 12 15:01:47 2023 +0200

    tree-optimization/94864 - vector insert of vector extract simplification

    The PRs ask for optimizing

      _1 = BIT_FIELD_REF <b_3(D), 64, 64>;
      result_4 = BIT_INSERT_EXPR <a_2(D), _1, 64>;

    to a vector permutation.  The following implements this as a
    match.pd pattern, improving code generation on x86_64.
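For illustration, C source of roughly the following shape is what typically yields the BIT_FIELD_REF / BIT_INSERT_EXPR pair quoted above. This is a minimal sketch using the GCC/Clang vector extensions; the names `insert_of_extract` and `permute_model` are hypothetical and do not come from the patch or its testcases. The second function is a scalar model of a two-input permute (GCC exposes the real operation as `__builtin_shuffle`); with selector { 0, 3 } it should compute the same value as the extract-plus-insert form.

```c
typedef double v2df __attribute__((vector_size(16)));

/* Extract element 1 of b (bits 64..127) and insert it at element 1 of a.
   Assumed to correspond to the BIT_FIELD_REF + BIT_INSERT_EXPR form.  */
v2df insert_of_extract(v2df a, v2df b)
{
    v2df result = a;
    result[1] = b[1];
    return result;
}

/* Scalar model of a two-input permute: selector values 0..1 pick
   from a, values 2..3 pick from b.  Illustrative helper only.  */
v2df permute_model(v2df a, v2df b, int sel0, int sel1)
{
    double cat[4] = { a[0], a[1], b[0], b[1] };
    v2df r = a;
    r[0] = cat[sel0];
    r[1] = cat[sel1];
    return r;
}
```

Here `permute_model(a, b, 0, 3)` keeps element 0 of `a` and takes element 1 of `b`, which is the single permutation the pattern aims to produce instead of two separate operations.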

    On the RTL level we face the issue that backend patterns inconsistently
    use vec_merge and vec_select of vec_concat to represent permutes.

    I think using a (supported) permute is almost always better
    than an extract plus insert, maybe excluding the case where we
    extract element zero and that's aliased to a register that can be
    used directly for insertion (not sure how to query that).

    The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
    where we now expand from

     __A_28 = VEC_PERM_EXPR <x2.8_9, x1.9_10, { 0, 9, 10, 11, 12, 13, 14, 15 }>;

    instead of

     _28 = BIT_FIELD_REF <x2.8_9, 16, 0>;
     __A_29 = BIT_INSERT_EXPR <x1.9_10, _28, 0>;

    producing a vpblendw instruction instead of the expected vmovsh.  That's
    either a missed vec_perm_const expansion optimization or, even better,
    an improvement - Zen4, for example, has 4 ports to execute vpblendw
    but only 3 for executing vmovsh, and both instructions have the same size.
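To make the selector { 0, 9, 10, 11, 12, 13, 14, 15 } concrete, here is a hedged sketch that models the permute in scalar code and compares it with the extract-plus-insert form. It assumes 8 elements of 16 bits each and uses `unsigned short` in place of the half-precision type so that it builds without AVX512-FP16 support; `insert_elt0` and `vec_perm_model` are hypothetical names, not GCC internals. Selector values 0..7 index the first operand and 8..15 index the second.

```c
typedef unsigned short v8hi __attribute__((vector_size(16)));

/* Extract-plus-insert form: element 0 of x2 replaces element 0 of x1.  */
v8hi insert_elt0(v8hi x1, v8hi x2)
{
    v8hi r = x1;
    r[0] = x2[0];
    return r;
}

/* Scalar model of a two-input permute over 8 lanes: sel[k] in 0..7
   selects from 'first', sel[k] in 8..15 selects from 'second'.  */
v8hi vec_perm_model(v8hi first, v8hi second, const unsigned sel[8])
{
    v8hi r = first;
    for (int k = 0; k < 8; k++) {
        unsigned s = sel[k] % 16;
        r[k] = s < 8 ? first[s] : second[s - 8];
    }
    return r;
}
```

With `first = x2` and `second = x1`, as in the VEC_PERM_EXPR above, lane 0 comes from x2 and lanes 1..7 come from x1, so both functions should return the same vector.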

    The patch XFAILs the sub-testcase.

            PR tree-optimization/94864
            PR tree-optimization/94865
            PR tree-optimization/93080
            * match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
            for vector insertion from vector extraction.

            * gcc.target/i386/pr94864.c: New testcase.
            * gcc.target/i386/pr94865.c: Likewise.
            * gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
            * gcc.dg/tree-ssa/forwprop-40.c: Likewise.
            * gcc.dg/tree-ssa/forwprop-41.c: Likewise.
