On Fri, Jul 24, 2020 at 12:19:59AM -0700, Ian Rogers wrote:
> for_each_set_bit, or similar functions like for_each_cpu, may be hot
> within the kernel. If many bits were set then one could imagine on
> Intel a "bt" instruction with every bit may be faster than the function
> call and word length find_next_bit logic. Add a benchmark to measure
> this.
 
> This benchmark on AMD rome and Intel skylakex shows "bt" is not a good
> option except for very small bitmaps.

Small bitmaps are a common case in the kernel (e.g. cpu bitmaps).

But the current code isn't that great for small bitmaps. It always looks
horrific when I look at PT traces or brstackinsn, especially since it was
optimized purely for code size at some point.

Probably would be better to have different implementations for
different sizes.
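
To illustrate what a size split could look like, here is a minimal
user-space sketch, not the kernel's code. All names (collect_set_bits,
collect_small, collect_large, BITS_PER_WORD) are hypothetical, and it
assumes a GCC/Clang compiler for __builtin_ctzl. A bitmap that fits in one
word takes a short loop with no per-word bookkeeping; anything larger
takes the generic word-at-a-time walk.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical name; bits in one unsigned long on this platform. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Small path: the whole bitmap lives in a single word. */
static size_t collect_small(unsigned long word, size_t nbits, size_t *out)
{
	size_t n = 0;

	if (nbits < BITS_PER_WORD)
		word &= (1UL << nbits) - 1;	/* drop bits past nbits */
	while (word) {
		out[n++] = (size_t)__builtin_ctzl(word); /* lowest set bit */
		word &= word - 1;		/* clear that bit */
	}
	return n;
}

/* Generic path: walk the bitmap one word at a time. */
static size_t collect_large(const unsigned long *map, size_t nbits,
			    size_t *out)
{
	size_t nwords = (nbits + BITS_PER_WORD - 1) / BITS_PER_WORD;
	size_t n = 0;

	for (size_t w = 0; w < nwords; w++) {
		unsigned long word = map[w];
		size_t left = nbits - w * BITS_PER_WORD;

		if (left < BITS_PER_WORD)
			word &= (1UL << left) - 1; /* trim the last word */
		while (word) {
			out[n++] = w * BITS_PER_WORD +
				   (size_t)__builtin_ctzl(word);
			word &= word - 1;
		}
	}
	return n;
}

/*
 * Store the indices of all set bits in out[] (caller provides room for
 * nbits entries) and return how many were found. Dispatches on size.
 */
size_t collect_set_bits(const unsigned long *map, size_t nbits, size_t *out)
{
	if (nbits == 0)
		return 0;
	if (nbits <= BITS_PER_WORD)
		return collect_small(map[0], nbits, out);
	return collect_large(map, nbits, out);
}
```

If nbits is a compile-time constant, as it often is for cpu bitmaps, the
compiler can resolve the dispatch and inline the small path. Whether that
actually wins over the current code would have to be measured with the
benchmark.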

-Andi
