[Bug tree-optimization/91201] [7/8/9/10 Regression] SIMD not generated for horizontal sum of bytes in array

glisse at gcc dot gnu.org Tue, 30 Jul 2019 07:18:25 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91201


--- Comment #12 from Marc Glisse <glisse at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #11)
> I'm not aware of vcompressb insn, only vcompressps and vcompresspd.

Intel lists it (as VPCOMPRESSB) under AVX512_VBMI2, so it needs Ice Lake or later.

> Sure,
> one could just emit whatever we emit for __builtin_shuffle with (__v64qi) {
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
> 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24,
> 32, 40, 48, 56 } or similar perm, the question is if it will be faster that
> way or not.

Exactly.
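A minimal sketch of the two options being compared, assuming GCC's generic vector extension (`vector_size`, `__builtin_shuffle`). The type `v64u8` and the functions `gather_by_shuffle` and `compress_bytes_model` are hypothetical names for illustration. The first applies the permutation quoted above. The second is only a scalar model of a zero-masking byte compress using the assumed mask `0x0101010101010101` (bit 0 of every byte); it is not the real instruction or its intrinsic. Both leave bytes 0, 8, ..., 56 of the input in the first eight result bytes, so the open question is speed, not correctness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64 x 8-bit vector via GCC's generic vector extension;
   same shape as __v64qi, but with unsigned elements. */
typedef uint8_t v64u8 __attribute__((vector_size(64)));
static_assert(sizeof(v64u8) == 64, "expected a 64-byte vector");

/* Shuffle route: mask[i] = 8 * (i % 8), the permutation from the thread.
   Each group of 8 result bytes holds byte 0 of every 64-bit lane. */
static void gather_by_shuffle(const uint8_t in[64], uint8_t out[8])
{
    const v64u8 mask = {
        0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56,
        0, 8, 16, 24, 32, 40, 48, 56, 0, 8, 16, 24, 32, 40, 48, 56
    };
    v64u8 v, r;
    memcpy(&v, in, sizeof v);
    r = __builtin_shuffle(v, mask);   /* r[i] = v[mask[i]] */
    memcpy(out, &r, 8);               /* keep only the first 8 result bytes */
}

/* Compress route: scalar model of a zero-masking byte compress. Bytes whose
   bit is set in k are packed, in order, into the low end of dst; the rest
   of dst is zeroed. Returns the number of bytes selected. */
static size_t compress_bytes_model(const uint8_t src[64], uint64_t k,
                                   uint8_t dst[64])
{
    size_t n = 0;
    for (size_t i = 0; i < 64; i++)
        if ((k >> i) & 1u)
            dst[n++] = src[i];
    size_t selected = n;
    while (n < 64)
        dst[n++] = 0;
    return selected;
}
```

With the mask above, `compress_bytes_model` selects eight bytes and zeroes the other 56, while the shuffle repeats the same eight bytes across the whole vector; only the low eight bytes are expected to agree.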

