Paul Eggert writes:
> On 3/24/26 02:31, Chris Down wrote:
>> You might wonder why there's no change in shuf. Profiling shows shuf
>> spends its time almost entirely in randperm_new() and randint_genmax(),
>> so I/O is not the bottleneck.
>
> Odd. I'm not seeing that in this benchmark, on Fedora 43 x86-64 with a default build:
>
> $ yes | head -n10000 | ltrace -c src/shuf >/dev/null
> % time     seconds  usecs/call     calls      function
> ------ ----------- ----------- --------- --------------------
>  49.73    0.425651          21     20000 rawmemchr
>  25.95    0.222125          20     10780 memcpy
>  24.10    0.206289          20     10000 fwrite_unlocked
>   0.03    0.000252          63         4 fread


Hmm, I went back and profiled shuf again and now I can't reproduce it either :-) My best guess is that I had accidentally filtered the perf profile down to just the shuf binary's own symbols, which would have hidden the time spent in libc.

So to adjust my explanation (and looking with more coffee), the reason the patch shows no change for shuf is simpler than I made it sound: shuf reads the entire input via fread_file() and scans it with rawmemchr(), so in the way the benchmark exercised it, it never goes through readlinebuffer_delim() at all. Whoops!

So for v2 I'll either remove shuf from the benchmark, or change the benchmark so that shuf actually exercises readlinebuffer_delim().
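
For anyone following along, here is a rough sketch of the "slurp the whole input, then scan with rawmemchr()" pattern described above. It is not the shuf source: split_lines() is a hypothetical helper made up for illustration, and it assumes the caller has already read the input into one buffer (which is what fread_file() does) and made sure that buffer ends with the delimiter. The two passes, one to count lines and one to record where they start, are an assumption about the shape of the code, though they would be consistent with the 20000 rawmemchr calls for 10000 lines in the ltrace output.

```c
#define _GNU_SOURCE             /* rawmemchr() is a GNU extension */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, for illustration only (not the coreutils code).
   Split BUF, which holds LEN bytes and must end with DELIM, into lines.
   Return a malloc'd array of *N_LINES + 1 pointers, where line I spans
   [lines[I], lines[I + 1]), or NULL if allocation fails.  */
static char **
split_lines (char *buf, size_t len, char delim, size_t *n_lines)
{
  char *lim = buf + len;
  size_t n = 0;

  /* rawmemchr() takes no length bound, so the scan is only safe if the
     buffer is terminated by the delimiter.  */
  assert (len == 0 || buf[len - 1] == delim);

  /* Pass 1: count the lines, one rawmemchr() call per line.  */
  for (char *p = buf; p < lim; p = (char *) rawmemchr (p, delim) + 1)
    n++;

  char **lines = malloc ((n + 1) * sizeof *lines);
  if (!lines)
    return NULL;

  /* Pass 2: record where each line starts, again one call per line.  */
  char *p = buf;
  lines[0] = p;
  for (size_t i = 1; i <= n; i++)
    lines[i] = p = (char *) rawmemchr (p, delim) + 1;

  *n_lines = n;
  return lines;
}
```

Nothing in this path reads line by line from a stream, which is why a change to readlinebuffer_delim() would not show up in a benchmark that only goes this way.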
