https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115597

            Bug ID: 115597
           Summary: [15 Regression] vectorizer takes 20+ h compiling
                    510.parest in SPECCPU2017 since
                    g:46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Keywords: compile-time-hog
          Severity: normal
          Priority: P3
         Component: middle-end
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
                CC: rguenth at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

Created attachment 58496
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58496&action=edit
slp dump graph

Since:

commit 46bb4ce4d30ab749d40f6f4cef6f1fb7c7813452 (HEAD)
Author: Richard Biener <rguent...@suse.de>
Date:   Wed Jun 19 12:57:27 2024 +0200

    tree-optimization/114413 - SLP CSE after permute optimization

    We currently fail to re-CSE SLP nodes after optimizing permutes
    which results in off cost estimates.  For gcc.dg/vect/bb-slp-32.c
    this shows in not re-using the SLP node with the load and arithmetic
    for both the store and the reduction.  The following implements
    CSE by re-bst-mapping nodes as finalization part of vect_optimize_slp.

    I've tried to make the CSE part of permute materialization but it
    isn't a very good fit there.  I've not bothered to implement
    something more complete, also handling external defs or defs
    without SLP_TREE_SCALAR_STMTS.

    I realize this might result in more BB SLP which in turn might slow
    down code given costing for BB SLP is difficult (even that we now
    vectorize gcc.dg/vect/bb-slp-32.c on x86_64 might be not a good idea).
    This is nevertheless feeding more accurate info to costing which is
    good.

            PR tree-optimization/114413
            * tree-vect-slp.cc (release_scalar_stmts_to_slp_tree_map):
            New function, split out from ...
            (vect_analyze_slp): ... here.  Call it.
            (vect_cse_slp_nodes): New function.
            (vect_optimize_slp): Call it.

            * gcc.dg/vect/bb-slp-32.c: Expect CSE and vectorization on x86.

Compilation takes an extremely long time in 510.parest_r.

The problem seems to be that vect_cse_slp_nodes visits the same nodes twice.
It looks like the function has no visited set, and the hot loop in parest
(when vectorizable thanks to libmvec) has many TWO_OPERANDS nodes, one of
which is rooted at the top level.

vect_cse_slp_nodes seems to skip VEC_PERM_EXPR but not its children, so it
ends up visiting the same subgraphs multiple times.

The graph in parest has so many TWO_OPERANDS nodes that compilation
essentially never finishes.

I believe this function needs a visited node set.

Example backtrace:

#334 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x4132f40: 0x3df2ec0)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#335 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x41321a0: 0x3df2b90)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#336 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x4130b00: 0x3df2860)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#337 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x41348a0: 0x3df2530)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#338 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3b8b0d0: 0x3df2310)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#339 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x41348f0: 0x3dee928)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#340 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x4134500: 0x3dee460)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#341 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3c14600: 0x3ded690)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#342 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3ca75f0: 0x3de7910)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#343 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3e28590: 0x3de8768)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#344 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3c2e4b8: 0x3de7778)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#345 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x3da5e58: 0x3de7dd8)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#346 0x00000000018a1e14 in vect_cse_slp_nodes (bst_map=0x41627a0, node=@0x41d0770: 0x3de7f70)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6111
#347 0x00000000018a1f20 in vect_optimize_slp (vinfo=0x3e291c0)
    at /opt/buildAgent/work/5c94c4ced6ebfcd0/gcc/tree-vect-slp.cc:6128
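
To illustrate why a missing visited set can turn into a compile-time hog, here is a minimal standalone sketch. It is not GCC code and not the actual fix: Node, Graph, build_shared_chain, walk_naive and walk_visited are hypothetical names, and the graph shape (each node referring to the same child twice) is only an assumed stand-in for nodes whose operands share a subgraph. It compares how many nodes a recursive walk processes with and without a visited set.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SLP node; this is not GCC's slp_tree.
struct Node {
  std::vector<Node *> children;
};

// Owns the nodes of the example graph and remembers its root.
struct Graph {
  std::vector<std::unique_ptr<Node>> nodes;
  Node *root = nullptr;
};

// Builds a chain of `levels` nodes in which every node refers to the node
// below it twice, so each level shares one subgraph between two operands.
Graph build_shared_chain(std::size_t levels) {
  assert(levels >= 1);
  Graph g;
  Node *below = nullptr;
  for (std::size_t i = 0; i < levels; ++i) {
    g.nodes.push_back(std::make_unique<Node>());
    Node *n = g.nodes.back().get();
    if (below)
      n->children = {below, below};
    below = n;
  }
  g.root = below;
  return g;
}

// Walk without a visited set: a shared child is walked once per edge,
// so the number of processed nodes doubles with every level.
std::size_t walk_naive(const Node *n) {
  std::size_t processed = 1;
  for (const Node *c : n->children)
    processed += walk_naive(c);
  return processed;
}

// Walk with a visited set: each node is processed at most once.
std::size_t walk_visited(const Node *n,
                         std::unordered_set<const Node *> &visited) {
  if (!visited.insert(n).second)
    return 0;  // already seen, do not descend again
  std::size_t processed = 1;
  for (const Node *c : n->children)
    processed += walk_visited(c, visited);
  return processed;
}
```

For a chain of L levels in this toy graph, walk_naive processes 2^L - 1 nodes while walk_visited processes L, which is the kind of gap that would be consistent with a deep recursion like the one in the backtrace above never finishing.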