https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-16
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #10 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
FWIW, I agree that pure unrolling doesn't feel like a gimple-level
optimisation.  Whether it's a win or not depends on whether the unrolled
loop will make better use of the microarchitecture.  The problem isn't
just that that's hard to decide at the gimple level, but that the result
can't be represented directly in gimple.  AIUI there's no real
significance to the schedule of gimple statements (beyond ensuring valid
SSA and functional correctness).  This is different from vectorisation
and ivopts, which can represent the benefit of the transformation
directly in gimple (using vector ops and TARGET_MEM_REFs respectively).

As Kyrill pointed out off-list, LLVM does the unrolling in the
vectoriser rather than a separate unrolling pass.  (Use -mllvm
-print-after-all to see this.)

I think for AArch64 we can view LDP and STP as 2-element vector loads
and stores that have zero-cost insertion and extraction.  So converting:

    ldr x0, [...]
    add x0, x0, 1
    str x0, [...]

into:

    ldp x0, x1, [...]
    add x0, x0, 1
    add x1, x1, 1
    stp x0, x1, [...]

is IMO genuine vectorisation.  The LDPs and STPs are effectively scalar
IFN_LOAD_LANES and IFN_STORE_LANES, although we could also represent
them as single-element (V1) vector ops instead if that seems more
consistent.  Vectorising operations other than loads and stores would
simply involve duplicating the statements VF times.