https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111793
--- Comment #5 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #4)
> So, shouldn't we match.pd (or something else) pattern match
>   vect_cst__50 = {mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
>                   mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
>                   mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D),
>                   mask.48_7(D), mask.48_7(D), mask.48_7(D), mask.48_7(D)};
>   vect__8.132_51 = vect_cst__50 >> { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 };
>   vect__9.133_53 = vect__8.132_51 & { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 };
>   mask__39.139_60 = vect__9.133_53 != { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
> back into
>   mask__39.139_60 = mask.48_7(D);
> ?

Yes, that's a possibility.  I wonder whether it's possible to arrange things in the vectorizer itself so that costing becomes more accurate (probably not that important for OMP SIMD though).

Maybe it would work a bit better if we emitted mask & (1 << iv), but I guess we canonicalize that back.

I've opened this for tracking for now, working on PR111795 first.
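For context, a minimal sketch of the kind of scalar source that produces the quoted vector sequence; this is a hypothetical reduced example, not the PR's actual testcase, and the names (foo, a, b, mask) are made up for illustration:

void
foo (float *restrict a, const float *restrict b, unsigned short mask)
{
  /* Hypothetical OMP SIMD loop: each lane tests one bit of a scalar mask.
     The vectorizer broadcasts mask into a 16-element vector, shifts it
     right by { 0, 1, ..., 15 }, ands with 1 and compares against zero --
     the sequence quoted above -- even though the result is just the mask
     reinterpreted as a vector mask.  */
  #pragma omp simd
  for (int i = 0; i < 16; i++)
    if ((mask >> i) & 1)   /* per-lane bit test of the scalar mask */
      a[i] = b[i];
}

The alternative form mentioned above would write the bit test as mask & (1 << i) instead, though as noted that is presumably canonicalized back to the shift-right form.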