Paul Eggert writes:
> On 3/24/26 02:31, Chris Down wrote:
>> You might wonder why there's no change in shuf. Profiling shows shuf
>> spends its time almost entirely in randperm_new() and randint_genmax(),
>> so I/O is not the bottleneck.
>
> Odd. I'm not seeing that in this benchmark, on Fedora 43 x86-64 with a
> default build:
>
>    $ yes | head -n10000 | ltrace -c src/shuf >/dev/null
>    % time     seconds  usecs/call     calls function
>    ------ ----------- ----------- --------- --------------------
>     49.73    0.425651          21     20000 rawmemchr
>     25.95    0.222125          20     10780 memcpy
>     24.10    0.206289          20     10000 fwrite_unlocked
>      0.03    0.000252          63         4 fread

Hmm, I went back and profiled shuf again, and now I can't reproduce it either
:-) My best guess is that I was accidentally looking at a perf profile filtered
to just the shuf binary's own symbols, which would hide the time spent in
libc's rawmemchr(), memcpy() and fwrite_unlocked().

So, to correct my explanation (now with more coffee): the reason the patch
shows no change for shuf is simpler than I made it sound. shuf reads its
entire input with fread_file() and splits it into lines with rawmemchr(), so
in the way the benchmark exercised it, it never goes through
readlinebuffer_delim() at all. Whoops!

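For anyone following along, here is a rough sketch of that whole-buffer approach, assuming glibc or musl for rawmemchr() (a GNU extension). It is not shuf's actual code. slurp() is a hypothetical stand-in for gnulib's fread_file(), and split_lines() is likewise a made-up name. Appending a final delimiter when the input lacks one gives rawmemchr() a guaranteed sentinel, so it needs no length bound. Counting the lines and then recording where each one starts takes two scans, i.e. two rawmemchr() calls per line. That would fit the 20000 calls for 10000 lines in the ltrace output above.

```c
#define _GNU_SOURCE /* rawmemchr() is a GNU extension (glibc, musl) */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for gnulib's fread_file(): read all of IN into one
   malloc'd buffer, always keeping at least one spare byte past the data so
   a missing final delimiter can be appended later.  Returns NULL on error. */
static char *
slurp (FILE *in, size_t *pused)
{
  size_t alloc = 4096, used = 0;
  char *buf = malloc (alloc);
  while (buf)
    {
      size_t n = fread (buf + used, 1, alloc - used - 1, in);
      used += n;
      if (n == 0)
        break;                  /* EOF or read error */
      if (alloc - used == 1)    /* only the spare byte is left: grow */
        {
          char *nbuf = realloc (buf, alloc * 2);
          if (!nbuf)
            free (buf);
          buf = nbuf;
          alloc *= 2;
        }
    }
  if (buf && ferror (in))
    {
      free (buf);
      buf = NULL;
    }
  *pused = used;
  return buf;
}

/* Hypothetical helper: split BUF[0..USED) into EOLBYTE-terminated lines.
   BUF must have one spare byte past USED.  Sets *PLINE to N+1 pointers
   (NULL on allocation failure); line i spans [line[i], line[i+1]).  */
static size_t
split_lines (char *buf, size_t used, char eolbyte, char ***pline)
{
  /* Ensure the buffer ends with EOLBYTE: this sentinel is what lets
     rawmemchr() scan without a length argument.  */
  if (used && buf[used - 1] != eolbyte)
    buf[used++] = eolbyte;
  assert (used == 0 || buf[used - 1] == eolbyte);
  char *lim = buf + used;

  /* Pass 1: count lines, one rawmemchr() call per line.  */
  size_t n_lines = 0;
  for (char *p = buf; p < lim; p = (char *) rawmemchr (p, eolbyte) + 1)
    n_lines++;

  /* Pass 2: record each line start, another call per line.  */
  char **line = malloc ((n_lines + 1) * sizeof *line);
  *pline = line;
  if (!line)
    return 0;
  char *p = buf;
  line[0] = p;
  for (size_t i = 1; i <= n_lines; i++)
    line[i] = p = (char *) rawmemchr (p, eolbyte) + 1;
  return n_lines;
}
```

A caller would do slurp(stdin, &used), then split_lines(buf, used, '\n', &line), and could then permute the pointers line[0..n-1]. Nothing in this path reads line by line, so a faster readlinebuffer_delim() would not show up in it.
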
So for v2 I'll either drop the shuf case from the benchmark or change it so
that readlinebuffer_delim() is actually exercised there.