The following series attempts to address the issue that SLP_TREE_SCALAR_STMTS does not provide full scalar coverage of the SLP graph, notably (but not only) when patterns, and SLP patterns in particular, are involved. This results in workarounds in live lane analysis, double-costing there, and imprecision in scalar costing.
Instead of trying to derive scalar coverage from SLP_TREE_SCALAR_STMTS, the series re-does a simple "single-lane" SLP discovery on the SSA graph. It starts from the scalar entry stmts of the SLP graph, and the external SLP nodes determine the leaves. To record coverage, the series repurposes STMT_SLP_TYPE, which can now only be pure_slp or no_vect, to mark the original scalar stmts (not pattern stmts) that are covered. With this patch, covered stmts are marked pure_slp.

I've introduced a 'slp_oprnds' class as a start to marshal the mapping between GIMPLE stmt operands and SLP node children. The idea is to re-use it for an actual single-lane SLP graph build for loop vectorization. That would ease root discovery there and would serve as a starting point for the longer-term alternate SLP discovery (merging nodes from a single SLP graph rather than greedy discovery). That class is likely going to change as that work evolves.

The series first changes STMT_SLP_TYPE to represent scalar coverage for BB vectorization (it is actually unused for loop vectorization). Then it uses it to simplify BB live stmt marking. Then it replaces the scalar coverage code in BB vectorization costing, which actually solves PR124222. Finally, and somewhat unrelated, it improves BB vectorization live lane generation. It no longer requires that we can code-generate from every SLP use of a live scalar stmt. It is enough to be able to do so from one use, only that one is costed, and code is generated from exactly that one.

This does not yet eliminate the need for the fallbacks in actual code generation. I have updated the comments to name the testcases that actually FAIL. We still do not commit to a schedule (i.e., record a gsi on each SLP node where we insert vectorized stmts). Such a schedule could be used to verify up front that the inserted vector stmts reach all original scalar uses, or in turn, to make sure the schedule is arranged to allow that.

Each part of the series, as well as the series as a whole, was bootstrapped and tested on x86_64-unknown-linux-gnu. It is now queued for pushing when stage1 opens. Feedback is of course still welcome.
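To illustrate the coverage walk, here is a toy Python sketch. It is not GCC code: the dict-based SSA graph, the function name scalar_coverage and the stmt names are hypothetical. The walk follows use-def edges of the original scalar stmts from the SLP graph's entry stmts. It stops at operands that the SLP graph treats as external. As a result, a stmt that a pattern may have hidden from SLP_TREE_SCALAR_STMTS (here the conversion _3) is still counted as covered.

```python
def scalar_coverage(defs, roots, externals):
    """Toy model: return the original scalar stmts covered by one SLP instance.

    defs      -- hypothetical map from each stmt (named by the SSA name it
                 defines) to the SSA operands it uses
    roots     -- the scalar entry stmts of the SLP graph (e.g. stores)
    externals -- operands the SLP graph treats as external nodes (the leaves)
    """
    covered = set()
    worklist = list(roots)
    while worklist:
        stmt = worklist.pop()
        # Leaves are not covered, and every stmt is visited only once.
        if stmt in externals or stmt in covered:
            continue
        covered.add(stmt)
        # Follow use-def edges; operands without a def in the region are skipped.
        worklist.extend(op for op in defs.get(stmt, []) if op in defs)
    return covered


defs = {
    "_1": [],             # _1 = a[0];
    "_3": ["_1"],         # _3 = (int) _1;  may be absorbed into a pattern stmt
    "x_7": ["y_8"],       # x_7 = y_8 + 1;  external to the SLP graph
    "_5": ["_3", "x_7"],  # _5 = _3 * x_7;
    "store": ["_5"],      # c[0] = _5;      SLP graph entry stmt
}
covered = scalar_coverage(defs, roots=["store"], externals={"x_7"})
print(sorted(covered))  # ['_1', '_3', '_5', 'store']
```

The def of x_7 and its operand y_8 stay uncovered, because the walk stops at the external leaf. That is the property the scalar costing needs.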
Thanks, Richard.
