https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123343

            Bug ID: 123343
           Summary: Loop unrolling before vectorization produces
                    suboptimal RISC-V code
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: chenzhongyao.hit at gmail dot com
  Target Milestone: ---
            Target: riscv

The testcase is extracted from x264.
https://godbolt.org/z/doreGfo79

The inner loop gets unrolled before the vectorizer sees it, which leads
to inefficient code with vslide/vcompress permutation instructions.
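
For readers without access to the Godbolt link, here is a rough sketch of
the kind of loop that could trigger this. It is a hypothetical x264-style
kernel, not the reduced testcase itself: the name sad_4xN, the 4-wide
inner loop and the parameters are assumptions made for illustration. The
inner loop has a small constant trip count and loads uint8_t values that
are promoted to int, so it is a likely candidate for complete unrolling
before vectorization.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical x264-style kernel (illustration only, not the testcase
   from the report): sum of absolute differences over a 4-wide block. */
int sad_4xN(const uint8_t *pix1, intptr_t stride1,
            const uint8_t *pix2, intptr_t stride2, int rows)
{
    int sum = 0;
    for (int y = 0; y < rows; y++) {
        /* Small constant trip count; uint8_t operands are promoted to
           int before the subtraction. */
        for (int x = 0; x < 4; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

If the inner loop is fully unrolled first, the vectorizer is left with
four scalar loads per row at a stride, rather than one contiguous 4-element
access it could widen directly.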


Potential approaches (seeking feedback):
1. Skip unrolling - Don't unroll innermost loops with memory loads + type
promotion when vectorization is enabled.
2. SLP reorganization - Improve SLP vectorization to detect and reorganize
interleaved loads back into contiguous accesses after unrolling.

Or are there other, better approaches?
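
As a source-level way to experiment with approach 1, the inner loop can be
annotated with GCC's documented "#pragma GCC unroll 1", which requests that
the loop not be unrolled. The kernel below is again a hypothetical sketch
(sad_4xN_nounroll is an assumed name, not code from the testcase), and
whether the pragma actually changes the generated RISC-V code for the
reported case has not been verified here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kernel (illustration only): same shape as an x264-style
   SAD loop, with unrolling of the inner loop disabled by pragma. */
int sad_4xN_nounroll(const uint8_t *pix1, intptr_t stride1,
                     const uint8_t *pix2, intptr_t stride2, int rows)
{
    int sum = 0;
    for (int y = 0; y < rows; y++) {
        /* A value of 1 asks GCC not to unroll the following loop;
           other compilers may ignore or warn about this pragma. */
#pragma GCC unroll 1
        for (int x = 0; x < 4; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

This is only a per-loop workaround for comparing code generation; the
approaches listed above concern what the compiler should do by default.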
