https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475

            Bug ID: 106475
           Summary: Loop vectorizer prevents vectorization
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: christophm30 at gmail dot com
  Target Milestone: ---

Inspired by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106352, I tested
GCC's behaviour after adding the restrict keyword as advised there.
This results in the following code:

```
#include <inttypes.h>
void
foo (uint8_t *restrict dst, int i_dst_stride,
     uint8_t *src1, int i_src1_stride,
     uint8_t *src2, int i_src2_stride,
     int i_height)
{
    for (int y = 0; y < i_height; y++)
      {
        for( int x = 0; x < 8; x++ )
          dst[x] = (src1[x] + src2[x] + 1);
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
      }
}
```

The issue is that this code now only gets vectorized if we pass
`-O3 -fno-tree-loop-vectorize`, i.e. if we disable the loop vectorizer.

Obviously, what helps this function is not necessarily beneficial
for the rest of the program, so a solution that generates the faster code
without disabling loop vectorization globally would be preferred.
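
As a stopgap, the flag can in principle be scoped to this one function
instead of the whole translation unit. The following is only a sketch,
assuming GCC's `optimize` function attribute (which, per the GCC manual,
treats strings that do not start with `O` as `-f` options). The names
`NO_LOOP_VECTORIZE` and `foo_no_loop_vect` are made up for illustration,
and I have not checked that this produces the same code as passing the
flag on the command line.

```c
#include <inttypes.h>

/* Assumption: GCC only.  The optimize attribute is meant to apply
   -fno-tree-loop-vectorize to the annotated function alone.  Other
   compilers get an empty macro and compile the function normally.  */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_VECTORIZE __attribute__ ((optimize ("no-tree-loop-vectorize")))
#else
# define NO_LOOP_VECTORIZE
#endif

/* Same body as foo above; only the attribute and the name differ.  */
NO_LOOP_VECTORIZE void
foo_no_loop_vect (uint8_t *restrict dst, int i_dst_stride,
                  uint8_t *src1, int i_src1_stride,
                  uint8_t *src2, int i_src2_stride,
                  int i_height)
{
    for (int y = 0; y < i_height; y++)
      {
        for (int x = 0; x < 8; x++)
          dst[x] = (src1[x] + src2[x] + 1);
        dst  += i_dst_stride;
        src1 += i_src1_stride;
        src2 += i_src2_stride;
      }
}
```

The GCC manual describes the `optimize` attribute as intended for
debugging rather than production code, so this is at best a way to
narrow down the problem, not the fix asked for here. Compiling with
`-O3 -fopt-info-vec` should show whether the function got vectorized.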

I have not found a GCC version that can do this, so this is not a
regression but a limitation. I also have not found a similar ticket,
though I suspect this is a known issue.

Are there any ideas to improve this?
