https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109011

--- Comment #5 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
And to answer myself: x86 has vplzcnt* only for 32-bit and 64-bit elements, with
-mavx512cd (perhaps also -mavx512vl, depending on the vector size), whereas
vector popcount is also available for 8-bit and 16-bit elements (guarded by
different options).
And with popcount it would be 3 instructions instead of 4, though I dunno about
their latencies etc.
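
For illustration only, here is a scalar sketch of where a 3-versus-4 operation count could come from. It assumes the operation being discussed is count-trailing-zeros, which the comment itself does not state. The helper names ctz_via_clz and ctz_via_popcount are made up for this sketch, and the code relies on the GCC builtins __builtin_clz and __builtin_popcount. A vectorized form would presumably apply the same identities per element.

```c
#include <stdint.h>

/* Hypothetical helper: ctz expressed through clz.
   Four operations: negate, and, clz, subtract.
   Only valid for x != 0, since __builtin_clz (0) is undefined.  */
static unsigned
ctz_via_clz (uint32_t x)
{
  uint32_t lowest = x & (0u - x);   /* isolate the lowest set bit */
  return 31u - (unsigned) __builtin_clz (lowest);
}

/* Hypothetical helper: ctz expressed through popcount.
   Three operations: add -1, and-not, popcount.
   (x - 1) & ~x sets exactly the bits below the lowest set bit.  */
static unsigned
ctz_via_popcount (uint32_t x)
{
  return (unsigned) __builtin_popcount ((x - 1u) & ~x);
}
```

Both helpers agree for any nonzero input; the popcount form additionally yields 32 for x == 0, whereas the clz form is undefined there.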
