https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125293
Bug ID: 125293
Summary: equivalent loops, different vectorization
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: manu at gcc dot gnu.org
Target Milestone: ---
GCC vectorizes both for loops in the first function but not the single for
loop in the second, even though the two functions are equivalent: the first
merely peels the first iteration of the do-while in the second. If
vectorization is profitable at all, the single for loop is surely more
profitable.
```
#include <stddef.h>

void
matrix_transpose_double(double * restrict dst, const double * restrict src,
                        const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    for (; j <= len_1; i++, j += nrows)
        dst[j] = src[i];
    j -= len_1;
    while (i <= len_1) {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    }
}

void
matrix_transpose_double_2(double * restrict dst, const double * restrict src,
                          const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;

    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    do {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    } while (i <= len_1);
}
```
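For anyone who wants to double-check the equivalence claim, a minimal harness along the following lines could be used. It is only a sketch: `transpose_reference`, `check_equivalent` and `MAX_ELEMS` are hypothetical names that are not part of the testcase, and the reference assumes `src` is an nrows x ncols row-major matrix and `dst` its ncols x nrows transpose. The two functions under test are repeated (as `static`) so that the block compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ELEMS 256 /* arbitrary capacity chosen for this sketch */

/* First variant from the report: first pass peeled out of the while loop. */
static void
matrix_transpose_double(double * restrict dst, const double * restrict src,
                        const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;
    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    for (; j <= len_1; i++, j += nrows)
        dst[j] = src[i];
    j -= len_1;
    while (i <= len_1) {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    }
}

/* Second variant from the report: a single for loop inside a do-while. */
static void
matrix_transpose_double_2(double * restrict dst, const double * restrict src,
                          const size_t nrows, const size_t ncols)
{
    if (nrows == 0 || ncols == 0)
        return;
    const size_t len_1 = (nrows * ncols) - 1;
    size_t i = 0, j = 0;
    do {
        for (; j <= len_1; i++, j += nrows)
            dst[j] = src[i];
        j -= len_1;
    } while (i <= len_1);
}

/* Hypothetical reference: plain nested-loop transpose (assumed layout:
 * src is nrows x ncols row-major, dst is ncols x nrows row-major). */
static void
transpose_reference(double *dst, const double *src,
                    size_t nrows, size_t ncols)
{
    for (size_t r = 0; r < nrows; r++)
        for (size_t c = 0; c < ncols; c++)
            dst[c * nrows + r] = src[r * ncols + c];
}

/* Returns 1 if both variants match the reference for this shape, else 0. */
int
check_equivalent(size_t nrows, size_t ncols)
{
    static double src[MAX_ELEMS], a[MAX_ELEMS], b[MAX_ELEMS], ref[MAX_ELEMS];
    const size_t n = nrows * ncols;
    assert(n <= MAX_ELEMS);

    /* Distinct small integers, exactly representable as double. */
    for (size_t k = 0; k < n; k++)
        src[k] = (double) k;
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    memset(ref, 0, sizeof ref);

    matrix_transpose_double(a, src, nrows, ncols);
    matrix_transpose_double_2(b, src, nrows, ncols);
    transpose_reference(ref, src, nrows, ncols);

    return memcmp(a, ref, n * sizeof *a) == 0
        && memcmp(b, ref, n * sizeof *b) == 0;
}
```

Calling `check_equivalent(3, 4)` from a `main` and checking that it returns 1 exercises both variants on a 3 x 4 input. Degenerate shapes such as 1 x 7 or 5 x 1 are worth including as well, since they stress the `j -= len_1` wrap-around.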
compiled with
gcc -o test test.c -O3 -fopt-info-vec-all -march=x86-64-v2
gives:
<source>:12:14: optimized: loop vectorized using 16 byte vectors and unroll factor 2
<source>:15:14: missed: couldn't vectorize loop
<source>:15:14: missed: not vectorized: unsupported control flow in loop.
<source>:16:18: optimized: loop vectorized using 16 byte vectors and unroll factor 2
<source>:15:14: missed: couldn't vectorize loop
<source>:15:14: missed: not vectorized: unsupported control flow in loop.
<source>:16:18: missed: couldn't vectorize loop
<source>:17:25: missed: not vectorized: no vectype for stmt: _14 = *_11;
scalar_type: const double
<source>:12:14: missed: couldn't vectorize loop
<source>:13:21: missed: not vectorized: no vectype for stmt: _9 = *_6;
scalar_type: const double
<source>:4:1: note: vectorized 2 loops in function.
<source>:7:20: note: ***** Analysis failed with vector mode V2DF
<source>:7:20: note: ***** The result for vector mode V16QI would be the same
<source>:7:20: note: ***** Re-trying analysis with vector mode V8QI
<source>:7:20: note: ***** Analysis failed with vector mode V8QI
<source>:7:20: note: ***** Re-trying analysis with vector mode V4QI
<source>:7:20: note: ***** Analysis failed with vector mode V4QI
<source>:36:13: missed: couldn't vectorize loop
<source>:36:13: missed: not vectorized: unsupported control flow in loop.
<source>:23:1: note: vectorized 0 loops in function.
<source>:26:20: note: ***** Analysis failed with vector mode V2DF
<source>:26:20: note: ***** The result for vector mode V16QI would be the same
<source>:26:20: note: ***** Re-trying analysis with vector mode V8QI
<source>:26:20: note: ***** Analysis failed with vector mode V8QI
<source>:26:20: note: ***** Re-trying analysis with vector mode V4QI
<source>:26:20: note: ***** Analysis failed with vector mode V4QI