https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88281

            Bug ID: 88281
           Summary: SLP permutation check fails to fall back to strided
                    loads
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

The following is not vectorized: the load group has a size of 17, its load
permutation is unsupported, and the SLP permutation check does not fall back
to strided loads:

typedef unsigned char uint8_t;
static int x264_pixel_sad_8x8( uint8_t *pix1, int i_stride_pix1, uint8_t *pix2,
                               int i_stride_pix2 )
{
  int i_sum = 0;
  for( int y = 0; y < 8; y++ )
    {
      for( int x = 0; x < 8; x++ )
        i_sum += __builtin_abs( pix1[x] - pix2[x] );
      pix1 += 17;
      pix2 += i_stride_pix2;
    }
  return i_sum;
}
void x264_pixel_sad_x4_8x8( uint8_t *fenc, uint8_t *pix0, uint8_t *pix1,
                            uint8_t *pix2, uint8_t *pix3, int i_stride,
                            int scores[4] )
{
  *scores = x264_pixel_sad_8x8( fenc, 16, pix0, i_stride );
}
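
For illustration only: the access pattern in the testcase can be written by
hand as one contiguous 8-byte load per row, with consecutive rows a fixed
number of bytes apart. The sketch below is an assumption about the kind of
access a strided-load fallback would cover, not output of the vectorizer.
sad_8x8_strided_rows and its parameter names are hypothetical and do not
appear in the testcase.

```c
#include <string.h>

typedef unsigned char uint8_t;

/* Hypothetical helper, not part of the report: SAD over an 8x8 block where
   each row is fetched as one contiguous 8-byte load and consecutive rows
   are stride1 / stride2 bytes apart.  */
static int sad_8x8_strided_rows (const uint8_t *pix1, int stride1,
                                 const uint8_t *pix2, int stride2)
{
  int sum = 0;
  for (int y = 0; y < 8; y++)
    {
      uint8_t row1[8], row2[8];
      /* One 8-byte load per row; bytes between the end of a row and the
         start of the next one are never read.  */
      memcpy (row1, pix1 + y * stride1, sizeof row1);
      memcpy (row2, pix2 + y * stride2, sizeof row2);
      for (int x = 0; x < 8; x++)
        sum += __builtin_abs (row1[x] - row2[x]);
    }
  return sum;
}
```

Called as sad_8x8_strided_rows (fenc, 17, pix0, i_stride), this should return
the same value as x264_pixel_sad_8x8 above, since that function also reads
only bytes 0..7 of each row and advances pix1 by 17.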
