https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
>
> --- Comment #15 from Kewen Lin <linkw at gcc dot gnu.org> ---
> It looks like r15-2820-gab18785840d7b8 has made the case in #c1 vectorized,
> nice!
>
> But CPUBench has unsigned type in HADAMARD4:
>
> #if BIT_DEPTH > 8
>   typedef uint32_t sum_t;
>   typedef uint64_t sum2_t;
> #else
>   typedef uint16_t sum_t;
>   typedef uint32_t sum2_t;
> #endif
> #define BITS_PER_SUM (8 * sizeof(sum_t))
>
> #define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
>     sum2_t t0 = s0 + s1;\
>     sum2_t t1 = s0 - s1;\
>     sum2_t t2 = s2 + s3;\
>     sum2_t t3 = s2 - s3;\
>     d0 = t0 + t2;\
>     d2 = t0 - t2;\
>     d1 = t1 + t3;\
>     d3 = t1 - t3;\
> }
>
> GCC still fails to vectorize it if we change the type of t0, t1, t2, t3 to
> unsigned int.

Likely re-association makes the "pattern" not match.
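
For anyone who wants to poke at this, here is a minimal sketch of the unsigned variant described in the quoted comment. It is not the testcase from the bug or the CPUBench source: the driver function hadamard4_rows and its flat array layout are made up for illustration, and only the macro body follows comment #15, with the temporaries changed to unsigned int.

```c
/* HADAMARD4 with the temporaries typed as unsigned int, as described in
   comment #15.  Unsigned arithmetic wraps, so "s0 - s1" with s0 < s1 is
   well defined and yields a large value rather than a negative one.  */
#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
    unsigned int t0 = s0 + s1;\
    unsigned int t1 = s0 - s1;\
    unsigned int t2 = s2 + s3;\
    unsigned int t3 = s2 - s3;\
    d0 = t0 + t2;\
    d2 = t0 - t2;\
    d1 = t1 + t3;\
    d3 = t1 - t3;\
}

/* Hypothetical driver (an assumption, not from the bug report): applies
   the butterfly to each group of four consecutive elements.  */
void hadamard4_rows(unsigned int *out, const unsigned int *in, int rows)
{
    for (int i = 0; i < rows; i++) {
        const unsigned int *s = in + 4 * i;
        unsigned int *d = out + 4 * i;
        HADAMARD4(d[0], d[1], d[2], d[3], s[0], s[1], s[2], s[3]);
    }
}
```

Compiling this with something like "gcc -O3 -fopt-info-vec-missed" should report whether the butterfly was vectorized. Whether this reduced form actually reproduces the missed vectorization has not been checked here.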