[Bug tree-optimization/81558] Loop not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2023-07-21
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #4 from Andrew Pinski ---
Confirmed.
[Bug tree-optimization/81558] Loop not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement
[Bug tree-optimization/81558] Loop not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

--- Comment #3 from rguenther at suse dot de ---
On Thu, 27 Jul 2017, kugan at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558
>
> --- Comment #2 from kugan at gcc dot gnu.org ---
> > Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime
> > alias check (BB vectorization cannot version for aliasing).
>
> Yes. LLVM does not seem to be unrolling the inner loop. As you said, when
> disabling cunrolli it works. cunroll pass will unroll after loop
> vectorisation.
> Can anything be done with the heuristics for this case? Thanks.

cunrolli sees

Loop 2 iterates 16 times.
...
size: 1 imgY_org.6_2 = imgY_org;
size: 0 _3 = (long unsigned int) y_15;
size: 1 _4 = _3 * 8;
size: 1 _5 = imgY_org.6_2 + _4;
size: 1 _6 = *_5;
size: 0 _7 = (long unsigned int) x_14;
size: 1 _8 = _7 * 2;
size: 1 _9 = _6 + _8;
size: 1 orgptr_24 = orgptr_16 + 2;
size: 1 _10 = *_9;
size: 1 *orgptr_16 = _10;
size: 1 x_26 = x_14 + 1;

A quick shot at a heuristic would see that we'd vectorize this with
V8HI/V16HImode, and with a statically determined 16 iterations that
should be profitable.

So yes, a heuristic is possible, but it would be only a heuristic, which
means there's likely a testcase that will regress in one way or another
(like missing simplifications exposed by unrolling).

Another thing is that IMHO cunrolli has too high a limit on the maximum
number of iterations it'll unroll. Adding another param might help
here, or making it less aggressive. Of course calculix relies heavily
on cunrolli aggressively unrolling ...
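
For illustration, the loop body in the cunrolli dump above corresponds
roughly to a C loop of the following shape. This is a sketch
reconstructed from the dump, not the testcase attached to the bug: the
function name copy_row16, its signature, and the element type
unsigned short (inferred from the 2-byte stride and the 8-byte row
pointers) are assumptions.

```c
/* Hypothetical reconstruction from the GIMPLE dump; not the original
   testcase.  imgY_org is assumed to be a global array of row pointers. */
unsigned short **imgY_org;

/* Copies 16 elements of row y into the buffer at orgptr.  The trip
   count is a compile-time constant, which is what lets an early
   complete-unrolling pass turn the loop into straight-line code
   before the loop vectorizer gets to see it. */
void copy_row16(unsigned short *orgptr, int y)
{
    for (int x = 0; x < 16; x++)
        *orgptr++ = imgY_org[y][x];  /* 2-byte load, 2-byte store */
}
```

One way to experiment with a loop like this, assuming a GCC that
accepts -fdisable-tree-cunrolli and -fopt-info-vec, is to compile it
with and without the former and compare what the vectorizer reports.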
[Bug tree-optimization/81558] Loop not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

--- Comment #2 from kugan at gcc dot gnu.org ---
> Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime
> alias check (BB vectorization cannot version for aliasing).

Yes. LLVM does not seem to be unrolling the inner loop. As you said, when
disabling cunrolli it works. cunroll pass will unroll after loop
vectorisation.
Can anything be done with the heuristics for this case? Thanks.
[Bug tree-optimization/81558] Loop not vectorized
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81558

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #1 from Richard Biener ---
The inner loop in foo2 is completely unrolled by GCC and imgY_org[y][x] is

  _32 = (long unsigned int) y_103;
  _33 = _32 * 8;
  _34 = imgY_org.8_31 + _33;
  _35 = *_34;

where *_34 aliases *orgptr.  Thus it's not possible to vectorize this
without a runtime alias check.  The innermost loop in foo1 is vectorized,
the unrolled loop in foo2 is not basic-block vectorized because basic-block
vectorization runs into the very same dependence issue.

Does LLVM do a runtime alias check here?  For foo1 GCC adds a runtime alias
check (BB vectorization cannot version for aliasing).

Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
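
As a rough picture of what "versioning for aliasing" means, the sketch
below writes the runtime alias check by hand for a simple 16-element
copy. It only illustrates the general idea and is not what GCC emits:
the names ranges_disjoint and copy16_versioned are hypothetical, and the
overlap test relies on comparing addresses converted to uintptr_t, which
is assumed to behave as on a flat address space.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when the byte ranges [a, a+n) and [b, b+n) do not
   overlap.  Hypothetical helper; assumes a flat address space. */
static int ranges_disjoint(const void *a, const void *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa + n <= pb || pb + n <= pa;
}

/* Copies 16 elements from src to dst, choosing between two versions
   of the same loop based on a runtime overlap test. */
void copy16_versioned(unsigned short *dst, const unsigned short *src)
{
    if (ranges_disjoint(dst, src, 16 * sizeof *dst)) {
        /* Version 1: no overlap, so the iterations are independent.
           restrict passes that fact on to the compiler. */
        unsigned short *restrict d = dst;
        const unsigned short *restrict s = src;
        for (int x = 0; x < 16; x++)
            d[x] = s[x];
    } else {
        /* Version 2: possible overlap, keep strict in-order
           element-by-element semantics. */
        for (int x = 0; x < 16; x++)
            dst[x] = src[x];
    }
}
```

Both versions compute the same result as the plain loop; the check only
decides which one runs. A loop can be guarded this way because it has a
single entry where the test can be placed, which is the step that,
per the comment above, basic-block vectorization cannot perform.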