https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109747
--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>: https://gcc.gnu.org/g:b6b8870ec585947a03a797f9037d02380316e235 commit r14-1139-gb6b8870ec585947a03a797f9037d02380316e235 Author: Richard Biener <rguent...@suse.de> Date: Tue May 23 15:03:00 2023 +0200 tree-optimization/109747 - SLP cost of CTORs The x86 backend looks at the SLP node passed to the add_stmt_cost hook when costing vec_construct, looking for elements that require a move from a GPR to a vector register and cost that. But since vect_prologue_cost_for_slp decomposes the cost for an external SLP node into individual pieces this cost gets applied N times without a chance for the backend to know it's just dealing with a part of the SLP node. Just looking at a part is also not perfect since the GPR to XMM move cost applies only once per distinct element so handling the whole SLP node one more correctly reflects cost (albeit without considering other external SLP nodes). The following addresses the issue by passing down the SLP node only for one piece and nullptr for the rest. The x86 backend is currently the only one looking at it. In the future the cost of external elements is something to deal with globally but that would require the full SLP tree be available to costing. It's difficult to write a testcase, at the tipping point not vectorizing is better so I'll followup with x86 specific adjustments and will see to add a testcase later. PR tree-optimization/109747 * tree-vect-slp.cc (vect_prologue_cost_for_slp): Pass down the SLP node only once to the cost hook.