https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117733
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |riscv
          Component|middle-end                  |tree-optimization
             Blocks|                            |26163, 53947
--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
The inner loop is unrolled and we select a [2,2] VF as the group size is 5:
t.f90:12:20: note: Detected interleaving load of size 5
t.f90:12:20: note: _31 = (*q_18(D))[_30];
t.f90:12:20: note: _44 = (*q_18(D))[_43];
t.f90:12:20: note: _57 = (*q_18(D))[_56];
t.f90:12:20: note: _70 = (*q_18(D))[_69];
t.f90:12:20: note: _83 = (*q_18(D))[_82];
I think what's needed for your idea to work is basically re-rolling the loop;
I don't see how we could otherwise deal with this, absent a vector mode
with [10,2]?  Note that the re-rolling can take place "virtually" inside the
vectorizer: we'd use a fractional VF to get us down to group size 1.
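
To make the re-rolling idea concrete, here is a minimal C sketch. It is not
the testcase from this bug (that is Fortran, t.f90); the function names, the
reduction, and the assumption that the five group members are adjacent
elements of q are all hypothetical, chosen only for illustration. The first
function mirrors what the vectorizer sees after unrolling, a load group of
size 5 per iteration. The second is the re-rolled equivalent, where the
innermost loop has a single load, i.e. group size 1.

```c
#include <stddef.h>

/* Hypothetical stand-in for the unrolled form: every iteration of the
   remaining loop reads a group of 5 adjacent elements of q, so the
   access forms an interleaving load group of size 5.  */
double sum_unrolled(const double *q, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += q[5 * i + 0];
        s += q[5 * i + 1];
        s += q[5 * i + 2];
        s += q[5 * i + 3];
        s += q[5 * i + 4];
    }
    return s;
}

/* Re-rolled form: the 5 group members become 5 iterations of an inner
   loop.  Each inner iteration performs one load, so the group size
   seen by the inner loop is 1.  The additions happen in the same
   order as in sum_unrolled.  */
double sum_rerolled(const double *q, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < 5; j++)
            s += q[5 * i + j];
    return s;
}
```

Doing this "virtually" would mean the vectorizer treats the first form as if
it were the second without actually rewriting the IL, which is where a
fractional VF relative to the unrolled loop would come in.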
Referenced Bugs:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=26163
[Bug 26163] [meta-bug] missed optimization in SPEC (2k17, 2k and 2k6 and 95)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations