https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

--- Comment #15 from Kewen Lin <linkw at gcc dot gnu.org> ---
It looks like r15-2820-gab18785840d7b8 has made the case in #c1 vectorize, nice!

But CPUBench uses unsigned types in HADAMARD4:

#if BIT_DEPTH > 8
    typedef uint32_t sum_t;
    typedef uint64_t sum2_t;
#else
    typedef uint16_t sum_t;
    typedef uint32_t sum2_t;
#endif
#define BITS_PER_SUM (8 * sizeof(sum_t))

#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
    sum2_t t0 = s0 + s1;\
    sum2_t t1 = s0 - s1;\
    sum2_t t2 = s2 + s3;\
    sum2_t t3 = s2 - s3;\
    d0 = t0 + t2;\
    d2 = t0 - t2;\
    d1 = t1 + t3;\
    d3 = t1 - t3;\
}

GCC still fails to vectorize it if we change the type of t0, t1, t2 and t3 to
unsigned int.
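
For illustration only, here is a minimal sketch of a kernel that uses the
unsigned variant of the macro (the BIT_DEPTH == 8 typedefs). The function name
hadamard4_rows and the 4x4 row layout are hypothetical; this is not the test
case from #c1, and I have not verified that this exact loop shows the missed
vectorization. Something like -O3 -fopt-info-vec-missed should tell.

```c
#include <stdint.h>

/* Typedefs for the BIT_DEPTH <= 8 branch quoted above. */
typedef uint16_t sum_t;
typedef uint32_t sum2_t;

#define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
    sum2_t t0 = s0 + s1;\
    sum2_t t1 = s0 - s1;\
    sum2_t t2 = s2 + s3;\
    sum2_t t3 = s2 - s3;\
    d0 = t0 + t2;\
    d2 = t0 - t2;\
    d1 = t1 + t3;\
    d3 = t1 - t3;\
}

/* Hypothetical kernel: apply the butterfly to each row of a 4x4 block.
   All arithmetic is unsigned, so the subtractions wrap modulo 2^32. */
void hadamard4_rows(sum2_t out[4][4], const sum2_t in[4][4])
{
    for (int i = 0; i < 4; i++) {
        HADAMARD4(out[i][0], out[i][1], out[i][2], out[i][3],
                  in[i][0], in[i][1], in[i][2], in[i][3]);
    }
}
```

Because the temporaries are unsigned, t0 - t2 and friends have defined
wraparound semantics, unlike the signed int version of the same code.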
