On Fri, Feb 26, 2021 at 1:03 PM Peter Zijlstra <pet...@infradead.org> wrote:
>
> On Fri, Feb 26, 2021 at 11:52:39AM -0800, Josh Don wrote:
> > From: Clement Courbet <cour...@google.com>
> >
> > A significant portion of __calc_delta time is spent in the loop
> > shifting a u64 by 32 bits. Use a __builtin_clz instead of iterating.
> >
> > This is ~7x faster on benchmarks.
>
> Have you tried on hardware without such fancy instructions?

Unfortunately, I was not able to find any such hardware on hand. Clement did rework the patch to use fls() instead, and has benchmarks for both the generic and asm variations, all of which are faster than the loop. I'll include the updated patch inline in my next reply.
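
For readers following along, here is a rough userspace sketch of the idea, not the updated patch itself. It assumes the loop in question shifts a 64-bit factor right one bit at a time until it fits in 32 bits, decrementing a shift count as it goes. The names reduce_loop(), reduce_fls(), fls32() and self_check() are made up for illustration, and fls32() is a plain-C stand-in for a generic fls():

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a generic fls(): returns the 1-based index
 * of the highest set bit, or 0 when x == 0. Uses a branchy binary
 * search rather than any special instruction.
 */
static int fls32(uint32_t x)
{
    int r = 32;

    if (!x)
        return 0;
    if (!(x & 0xffff0000u)) { x <<= 16; r -= 16; }
    if (!(x & 0xff000000u)) { x <<= 8;  r -= 8;  }
    if (!(x & 0xf0000000u)) { x <<= 4;  r -= 4;  }
    if (!(x & 0xc0000000u)) { x <<= 2;  r -= 2;  }
    if (!(x & 0x80000000u)) { r -= 1; }
    return r;
}

/* Iterative form: one bit per iteration until the upper 32 bits are clear. */
static uint64_t reduce_loop(uint64_t fact, int *shift)
{
    while (fact >> 32) {
        fact >>= 1;
        (*shift)--;
    }
    return fact;
}

/* fls-based form: compute the whole shift amount from the upper half at once. */
static uint64_t reduce_fls(uint64_t fact, int *shift)
{
    uint32_t hi = (uint32_t)(fact >> 32);

    if (hi) {
        int fs = fls32(hi);   /* 1..32, so the 64-bit shift below is well defined */

        *shift -= fs;
        fact >>= fs;
    }
    return fact;
}

/* Returns the number of sample inputs on which the two forms disagree. */
static int self_check(void)
{
    static const uint64_t samples[] = {
        0, 1, 0xffffffffull, 0x100000000ull, 0x123456789abcdefull,
        1ull << 40, 1ull << 63, UINT64_MAX,
    };
    int mismatches = 0;

    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++) {
        int s1 = 32, s2 = 32;   /* assumed starting shift, for illustration */
        uint64_t a = reduce_loop(samples[i], &s1);
        uint64_t b = reduce_fls(samples[i], &s2);

        if (a != b || s1 != s2)
            mismatches++;
    }
    assert(mismatches == 0);
    return mismatches;
}
```

The two forms should agree because the loop runs exactly once per significant bit in the upper 32 bits, which is what fls() of the upper half reports.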