https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138

--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Fri, 10 Jan 2025, linkw at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98138
> 
> --- Comment #15 from Kewen Lin <linkw at gcc dot gnu.org> ---
> It looks like r15-2820-gab18785840d7b8 has made the case in #c1 vectorized, nice!
> 
> But CPUBench uses unsigned types in HADAMARD4:
> 
> #if BIT_DEPTH > 8
>     typedef uint32_t sum_t;
>     typedef uint64_t sum2_t;
> #else
>     typedef uint16_t sum_t;
>     typedef uint32_t sum2_t;
> #endif
> #define BITS_PER_SUM (8 * sizeof(sum_t))
> 
> #define HADAMARD4(d0, d1, d2, d3, s0, s1, s2, s3) {\
>     sum2_t t0 = s0 + s1;\
>     sum2_t t1 = s0 - s1;\
>     sum2_t t2 = s2 + s3;\
>     sum2_t t3 = s2 - s3;\
>     d0 = t0 + t2;\
>     d2 = t0 - t2;\
>     d1 = t1 + t3;\
>     d3 = t1 - t3;\
> }
> 
> GCC still fails to vectorize it if we change the type of t0, t1, t2 and t3
> to unsigned int.

Likely re-association makes the "pattern" not match.
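
For anyone who wants to check this, here is a minimal sketch of the two variants side by side. It is a hypothetical reduction, not the testcase attached to this PR: the extra type parameter T on the macro and the function names are made up for illustration. The assumption behind it is that unsigned arithmetic wraps, so the compiler may re-associate the add/sub chain, while the signed form is more constrained. Both variants compute the same bit patterns for inputs that do not overflow.

```c
#include <assert.h>

/* Hypothetical reduction: T selects the type of the temporaries.
   The macro body otherwise follows the HADAMARD4 quoted above. */
#define HADAMARD4(T, d0, d1, d2, d3, s0, s1, s2, s3) do { \
    T t0 = (s0) + (s1);                                   \
    T t1 = (s0) - (s1);                                   \
    T t2 = (s2) + (s3);                                   \
    T t3 = (s2) - (s3);                                   \
    (d0) = t0 + t2;                                       \
    (d2) = t0 - t2;                                       \
    (d1) = t1 + t3;                                       \
    (d3) = t1 - t3;                                       \
} while (0)

/* Signed temporaries (assumed to correspond to the form in #c1). */
void hadamard4_int(int d[4], const int s[4])
{
    HADAMARD4(int, d[0], d[1], d[2], d[3], s[0], s[1], s[2], s[3]);
}

/* Unsigned temporaries, the form comment #15 says is not vectorized. */
void hadamard4_uint(unsigned d[4], const unsigned s[4])
{
    HADAMARD4(unsigned, d[0], d[1], d[2], d[3], s[0], s[1], s[2], s[3]);
}

/* Asserts that both variants yield identical bit patterns.
   Inputs must be small enough that the signed variant cannot overflow. */
void check_variants_agree(const int s[4])
{
    unsigned us[4], ud[4];
    int sd[4];

    for (int i = 0; i < 4; i++)
        us[i] = (unsigned)s[i];

    hadamard4_int(sd, s);
    hadamard4_uint(ud, us);

    for (int i = 0; i < 4; i++)
        assert((unsigned)sd[i] == ud[i]);
}
```

Compiling this with -O3 -fopt-info-vec-missed should show whether the two functions are treated differently. Adding -fno-tree-reassoc for the unsigned variant would be one way to test the re-association theory, though that is an experiment and not a proposed fix.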
