https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88281
Bug ID: 88281
Summary: SLP permutation check fails to fall back to strided loads
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: rguenth at gcc dot gnu.org
Target Milestone: ---

The following is not vectorized due to a group size of 17 and unsupported load
permutation:

typedef unsigned char uint8_t;

static int x264_pixel_sad_8x8( uint8_t *pix1, int i_stride_pix1,
                               uint8_t *pix2, int i_stride_pix2 )
{
  int i_sum = 0;
  for( int y = 0; y < 8; y++ )
    {
      for( int x = 0; x < 8; x++ )
        i_sum += __builtin_abs( pix1[x] - pix2[x] );
      pix1 += 17;
      pix2 += i_stride_pix2;
    }
  return i_sum;
}

void x264_pixel_sad_x4_8x8( uint8_t *fenc, uint8_t *pix0, uint8_t *pix1,
                            uint8_t *pix2, uint8_t *pix3, int i_stride,
                            int scores[4] )
{
  *scores = x264_pixel_sad_8x8( fenc, 16, pix0, i_stride );
}
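
For illustration only, here is a source-level sketch of the access pattern a strided-load fallback would have to cover. It is not GCC's implementation and not a proposed patch. Each row reads 8 contiguous bytes and then steps pix1 by 17, so only 8 of every 17 bytes are used, which is presumably where the group size of 17 comes from. The names sad_8x8_scalar, sad_8x8_rowwise, ROWS, COLS and PIX1_STEP are hypothetical and do not appear in the testcase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constants mirroring the testcase: 8x8 block, pix1 step of 17. */
enum { ROWS = 8, COLS = 8, PIX1_STEP = 17 };

/* Scalar loop nest with the same shape as the testcase in the report. */
int sad_8x8_scalar(const uint8_t *pix1, const uint8_t *pix2, int stride2)
{
    int sum = 0;
    for (int y = 0; y < ROWS; y++) {
        for (int x = 0; x < COLS; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += PIX1_STEP;   /* 8 bytes used, 9 bytes skipped per row */
        pix2 += stride2;
    }
    return sum;
}

/* Hand-written equivalent (an assumption about what a fallback amounts to
   at source level): one contiguous 8-byte load per row at a fixed stride,
   with no permutation across the 17-byte group. */
int sad_8x8_rowwise(const uint8_t *pix1, const uint8_t *pix2, int stride2)
{
    assert(pix1 != NULL && pix2 != NULL);
    int sum = 0;
    for (int y = 0; y < ROWS; y++) {
        uint8_t a[COLS], b[COLS];
        /* Strided row loads: the row base advances by the stride each time. */
        memcpy(a, pix1 + (ptrdiff_t)y * PIX1_STEP, COLS);
        memcpy(b, pix2 + (ptrdiff_t)y * stride2, COLS);
        for (int x = 0; x < COLS; x++)
            sum += abs(a[x] - b[x]);
    }
    return sum;
}
```

Both functions should return the same sum for any valid buffers. Compiling the scalar version with -O3 -fopt-info-vec-missed should make GCC print its reasons for not vectorizing, though the exact wording depends on the GCC version.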