https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116312

Richard Sandiford <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #3 from Richard Sandiford <rsandifo at gcc dot gnu.org> ---
FWIW, see the comment in aarch64_sve_adjust_stmt_cost for some of the problems
with costing LDP and STP correctly:

  /* Advanced SIMD can load and store pairs of registers using LDP and STP,
     but there are no equivalent instructions for SVE.  This means that
     (all other things being equal) 128-bit SVE needs twice as many load
     and store instructions as Advanced SIMD in order to process vector pairs.

     Also, scalar code can often use LDP and STP to access pairs of values,
     so it is too simplistic to say that one SVE load or store replaces
     VF scalar loads and stores.

     Ideally we would account for this in the scalar and Advanced SIMD
     costs by making suitable load/store pairs as cheap as a single
     load/store.  However, that would be a very invasive change and in
     practice it tends to stress other parts of the cost model too much.
     E.g. stores of scalar constants currently count just a store,
     whereas stores of vector constants count a store and a vec_init.
     This is an artificial distinction for AArch64, where stores of
     nonzero scalar constants need the same kind of register invariant
     as vector stores.

     An alternative would be to double the cost of any SVE loads and stores
     that could be paired in Advanced SIMD (and possibly also paired in
     scalar code).  But this tends to stress other parts of the cost model
     in the same way.  It also means that we can fall back to Advanced SIMD
     even if full-loop predication would have been useful.

     Here we go for a more conservative version: double the costs of SVE
     loads and stores if one iteration of the scalar loop processes enough
     elements for it to use a whole number of Advanced SIMD LDP or STP
     instructions.  This makes it very likely that the VF would be 1 for
     Advanced SIMD, and so no epilogue should be needed.  */
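
To make the "conservative version" in the last paragraph of that comment concrete, here is a rough standalone sketch of the check it describes. This is not the code in aarch64_sve_adjust_stmt_cost: the names fills_whole_ldp_stp, adjust_sve_mem_cost and kAdvSimdPairBits are made up for illustration. It also assumes that one Advanced SIMD LDP or STP moves a pair of 128-bit Q registers (256 bits), and it ignores everything else the real cost model looks at.

```cpp
#include <cstdint>

// Assumption: one Advanced SIMD LDP/STP of Q registers transfers two
// 128-bit vectors, i.e. 256 bits per instruction.
constexpr std::uint64_t kAdvSimdPairBits = 2 * 128;

// Hypothetical helper: true if the data accessed by one iteration of the
// scalar loop fills a whole (nonzero) number of LDP/STP Q-register pairs.
constexpr bool
fills_whole_ldp_stp (std::uint64_t elems_per_scalar_iter,
		     std::uint64_t elt_bits)
{
  const std::uint64_t bits = elems_per_scalar_iter * elt_bits;
  return bits != 0 && bits % kAdvSimdPairBits == 0;
}

// Hypothetical helper: double the cost of an SVE load or store when
// Advanced SIMD could have covered the same access with LDP/STP alone,
// otherwise leave the cost unchanged.
constexpr unsigned int
adjust_sve_mem_cost (unsigned int stmt_cost,
		     std::uint64_t elems_per_scalar_iter,
		     std::uint64_t elt_bits)
{
  return (fills_whole_ldp_stp (elems_per_scalar_iter, elt_bits)
	  ? stmt_cost * 2
	  : stmt_cost);
}
```

With these assumptions, a scalar iteration that touches eight 32-bit elements (256 bits) has its SVE load/store cost doubled, whereas one that touches four 32-bit elements (128 bits) or three 64-bit elements (192 bits) keeps the original cost.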