On Fri, 27 Mar 2026 at 12:57, <[email protected]> wrote:
>
> Using 'byte masking' is faster for longer strings - the break-even point
> is around 56 bytes on the same Zen-5 (there is much larger overhead, then
> it runs at 16 bytes in 3 clocks).
What byte masking approach did you actually use?
We have 'lib/strnlen_user.c', which is actually the only strlen() in
the kernel that I've really ever seen in profiles (it shows up for
execve() with lots of arguments).
That has tons of extra overhead due to the whole user access setup,
but the core loop should be pretty good with that has_zero() thing.
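For reference, here is a minimal userspace sketch of that kind of
word-at-a-time loop. It is not the code from lib/strnlen_user.c: the
names sketch_has_zero() and sketch_strnlen() are made up for
illustration, the kernel's has_zero() takes extra arguments, and this
version assumes a caller-supplied 'max' so that it never reads past
the buffer. Once a word is known to contain a zero byte it falls back
to a plain byte scan, which keeps it independent of endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONE_BYTES	UINT64_C(0x0101010101010101)
#define HIGH_BITS	UINT64_C(0x8080808080808080)

/* Nonzero if and only if at least one byte of 'a' is zero. */
static inline uint64_t sketch_has_zero(uint64_t a)
{
	return (a - ONE_BYTES) & ~a & HIGH_BITS;
}

/*
 * Hypothetical helper: length of 's', looking at no more than 'max'
 * bytes. Full 8-byte words are tested while at least 8 bytes remain;
 * the tail (or the word that contains the NUL) is scanned bytewise.
 */
static size_t sketch_strnlen(const char *s, size_t max)
{
	size_t i = 0;

	while (max - i >= sizeof(uint64_t)) {
		uint64_t word;

		/* memcpy() avoids alignment and aliasing problems. */
		memcpy(&word, s + i, sizeof(word));
		if (sketch_has_zero(word))
			break;
		i += sizeof(word);
	}

	while (i < max && s[i])
		i++;

	return i;
}
```

A real implementation would typically turn the mask into a byte index
directly instead of rescanning the last word, but that step depends on
byte order and is left out here.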
I do agree that we shouldn't use 'rep scas'. It goes back to the
*very* original linux kernel sources, though, and I've never seen it
in profiles because very few things in the kernel actually use strings
a lot.
Linus