https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99656

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfch...@gcc.gnu.org>:

https://gcc.gnu.org/g:c3a2bc6daaa2d278cb5f323e2df4b8c2af4198ac

commit r11-7736-gc3a2bc6daaa2d278cb5f323e2df4b8c2af4198ac
Author: Tamar Christina <tamar.christ...@arm.com>
Date:   Fri Mar 19 14:29:36 2021 +0000

    slp: remove unneeded permute calculation (PR99656)

    The attached testcase ICEs because, as you showed on the PR, we have one
    child which is an internal node with a PERM of EVENEVEN and one with TOP.

    The problem is that while we can conceptually merge the permute itself into
    EVENEVEN, merging the lanes doesn't really make sense.

    That said, we no longer even require the merged lanes as we create the
    permutes based on the KIND directly.

    This patch just removes all of that code.

    Unfortunately it still won't vectorize with the cost model enabled due to
    the blend that's created combining the load and the external:

            note: node 0x51f2ce8 (max_nunits=1, refcnt=1)
            note: op: VEC_PERM_EXPR
            note:       { }
            note:       lane permutation { 0[0] 1[1] }
            note:       children 0x51f23e0 0x51f2578
            note: node 0x51f23e0 (max_nunits=2, refcnt=1)
            note: op template: _16 = REALPART_EXPR <*t1_9(D)>;
            note:       stmt 0 _16 = REALPART_EXPR <*t1_9(D)>;
            note:       stmt 1 _16 = REALPART_EXPR <*t1_9(D)>;
            note:       load permutation { 0 0 }
            note: node (external) 0x51f2578 (max_nunits=1, refcnt=1)
            note:       { _18, _18 }

    which is costed as the load-and-split, plus the external splat, plus the
    blend combining them, while in reality it's just a scalar load and insert.

    The compiler (with the cost model disabled) generates

            ldr     q1, [x19]
            dup     v1.2d, v1.d[0]
            ldr     d0, [x19, 8]
            fneg    d0, d0
            ins     v1.d[1], v0.d[0]

    while really it should be

            ldp     d1, d0, [x19]
            fneg    d0, d0
            ins     v1.d[1], v0.d[0]

    but that's for another time.

    gcc/ChangeLog:

            PR tree-optimization/99656
            * tree-vect-slp-patterns.c (linear_loads_p,
            complex_add_pattern::matches, is_eq_or_top,
            vect_validate_multiplication, complex_mul_pattern::matches,
            complex_fms_pattern::matches): Remove complex_perm_kinds_t.
            * tree-vectorizer.h (complex_load_perm_t): Removed.
            (slp_tree_to_load_perm_map_t): Use complex_perm_kinds_t instead of
            complex_load_perm_t.

    gcc/testsuite/ChangeLog:

            PR tree-optimization/99656
            * gfortran.dg/vect/pr99656.f90: New test.
