[Bug middle-end/115597] New: [15 Regression] vectorizer takes 20+ h compiling 510.parest in SPECCPU2017 since g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452

tnfchris at gcc dot gnu.org via Gcc-bugs Sun, 23 Jun 2024 01:24:40 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597


            Bug ID: 115597
           Summary: [15 Regression] vectorizer takes 20+ h compiling
                    510.parest in SPECCPU2017 since
                    g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

Created attachment 58496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58496&action=edit
slp dump graph

Since:

commit 46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 (HEAD)
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Jun 19 12:57:27 2024 +0200

    tree-optimization/114413 - SLP CSE after permute optimization

    We currently fail to re-CSE SLP nodes after optimizing permutes
    which results in off cost estimates.  For gcc.dg/vect/bb-slp-32.c
    this shows in not re-using the SLP node with the load and arithmetic
    for both the store and the reduction.  The following implements
    CSE by re-bst-mapping nodes as finalization part of vect_optimize_slp.

    I've tried to make the CSE part of permute materialization but it
    isn't a very good fit there.  I've not bothered to implement something
    more complete, also handling external defs or defs without
    SLP_TREE_SCALAR_STMTS.

    I realize this might result in more BB SLP which in turn might slow
    down code given costing for BB SLP is difficult (even that we now
    vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might be not a good idea).
    This is nevertheless feeding more accurate info to costing which is
    good.

            PR tree-optimization/114413
            * tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map):
            New function, split out from ...
            (vect_analyze_slp): ... here.  Call it.
            (vect_cse_slp_nodes): New function.
            (vect_optimize_slp): Call it.

            * gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.

Compiling 510.parest_r now takes an extremely long time (20+ hours).

The problem seems to be that vect_cse_slp_nodes visits the same nodes more
than once. It looks like the function has no visited set, and the hot loop in
parest (when vectorizable thanks to libmvec) has many TWO_OPERANDS nodes, one
of which is rooted at the top level.

vect_cse_slp_nodes seems to skip VEC_PERM_EXPR nodes but not their children,
so it ends up visiting the same subgraphs multiple times. The graph in parest
has so many TWO_OPERANDS nodes that compilation essentially never finishes.

I believe this function needs a visited node set.
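
To illustrate why a missing visited set can blow up like this, here is a
minimal standalone sketch. It is not GCC code: Node, walk_naive, walk_visited
and link_chain are hypothetical names, and the graph is an artificial chain in
which every node references the next one twice, standing in for nodes that
share a child subgraph. Without a visited set the number of visits doubles per
level; with one, each node is processed once.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SLP node: only the child edges matter here.
struct Node {
  std::vector<Node *> children;
};

// Recursive walk with no visited set: a subgraph shared by several
// parents is walked again for every path that reaches it.
std::size_t walk_naive(const Node *n) {
  std::size_t visits = 1;
  for (const Node *c : n->children)
    visits += walk_naive(c);
  return visits;
}

// Same walk with a visited set: a node already seen is skipped, so the
// return value is the number of distinct nodes reached.
std::size_t walk_visited(const Node *n,
                         std::unordered_set<const Node *> &visited) {
  if (!visited.insert(n).second)
    return 0;
  std::size_t visits = 1;
  for (const Node *c : n->children)
    visits += walk_visited(c, visited);
  return visits;
}

// Link nodes[i] to nodes[i + 1] twice, so every level shares its child.
// The vector must not be resized afterwards, as that would invalidate
// the stored pointers.
void link_chain(std::vector<Node> &nodes) {
  for (std::size_t i = 0; i + 1 < nodes.size(); ++i)
    nodes[i].children = {&nodes[i + 1], &nodes[i + 1]};
}
```

For a chain of n nodes built this way, walk_naive makes 2^n - 1 visits while
walk_visited makes n, which is the kind of gap that would turn a large SLP
graph into a compile that does not finish in practice.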

Example backtrace:

#334 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4132f40: 0x3df2ec0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#335 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41321a0: 0x3df2b90) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#336 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4130b00: 0x3df2860) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#337 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348a0: 0x3df2530) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#338 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3b8b0d0: 0x3df2310) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#339 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41348f0: 0x3dee928) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#340 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x4134500: 0x3dee460) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#341 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c14600: 0x3ded690) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#342 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3ca75f0: 0x3de7910) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#343 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3e28590: 0x3de8768) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#344 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3c2e4b8: 0x3de7778) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#345 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x3da5e58: 0x3de7dd8) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#346 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0,
node=@0x41d0770: 0x3de7f70) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#347 0x00000000018a1f20 in vect_optimize_slp (vinfo=0x3e291c0) at
/opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6128
