https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106475
Bug ID: 106475
Summary: Loop vectorizer prevents vectorization
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: christophm30 at gmail dot com
Target Milestone: ---

Inspired by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106352, I've tested GCC's behaviour after adding the restrict keyword as advised there. This results in the following code:

```
#include <inttypes.h>

void foo (uint8_t *restrict dst, int i_dst_stride,
          uint8_t *src1, int i_src1_stride,
          uint8_t *src2, int i_src2_stride,
          int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}
```

The issue now is that this only gets vectorized if we pass `-O3 -fno-tree-loop-vectorize`, i.e. if we disable the loop vectorizer.

Obviously, what helps for this function is not necessarily beneficial for the rest of the program. So a solution that generates the faster code without disabling loop vectorization would be preferred.

I have not found a GCC version that can do this, so this is not a regression but a limitation. I also have not found a similar ticket, but I suspect this is a known issue.

Are there any ideas to improve this?
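As a possible stopgap, the flag could be scoped to this one function instead of the whole translation unit. The following is only a sketch: it relies on GCC's `optimize` function attribute, which GCC documents as intended for debugging rather than production use, and it has not been verified to produce the same code as passing `-fno-tree-loop-vectorize` on the command line. The name `foo_no_loop_vect` and the `NO_LOOP_VECT` macro are hypothetical and not part of the report.

```c
#include <inttypes.h>

/* Assumption: GCC's `optimize` attribute treats a string that does not
   start with "O" as an -f option, so this should correspond to
   -fno-tree-loop-vectorize for this function only.  Other compilers
   get an empty macro and compile the function with default settings.  */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_LOOP_VECT __attribute__((optimize("no-tree-loop-vectorize")))
#else
# define NO_LOOP_VECT
#endif

/* Same body as foo() from the report; only the attribute is new.  */
NO_LOOP_VECT
void foo_no_loop_vect (uint8_t *restrict dst, int i_dst_stride,
                       uint8_t *src1, int i_src1_stride,
                       uint8_t *src2, int i_src2_stride,
                       int i_height)
{
  for (int y = 0; y < i_height; y++)
    {
      for (int x = 0; x < 8; x++)
        dst[x] = (src1[x] + src2[x] + 1);
      dst += i_dst_stride;
      src1 += i_src1_stride;
      src2 += i_src2_stride;
    }
}
```

Compiling this with plain `-O3` and inspecting the assembly (or the `-fopt-info-vec` output) would show whether the attribute has the same effect as the command-line flag. Either way it only works around the limitation for one function and does not address the underlying vectorizer behaviour.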