On 2026-03-22 16:52, Terence Kelly wrote:
> 1. Can you confirm that shuf uses the Fisher-Yates/Durstenfeld unbiased
> shuffle algorithm?
Never heard of that name for that algorithm, which I consider obvious.
As I recall, I independently invented a superset of it in 2006 and shuf
uses this superset algorithm, which is equivalent to
Fisher-Yates/Durstenfeld in the special case you're surely thinking of.
See coreutils/gl/lib/randperm.c's randperm_new.
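
For readers who don't know the name either, here is a minimal Python sketch of a Fisher-Yates/Durstenfeld shuffle. It is generalized in one plausible way: it stops after fixing the first h positions, so h == n gives the full shuffle. That generalization is an assumption made for illustration; the message above does not say what the superset is. This is not the coreutils C code, and the names partial_shuffle and rand_below are hypothetical.

```python
import random


def partial_shuffle(n, h, rand_below=random.randrange):
    """Return the first h entries of a uniformly random permutation of range(n).

    rand_below(k) must return an unbiased integer in [0, k).
    With h == n this is the classic Fisher-Yates/Durstenfeld shuffle.
    """
    if not 0 <= h <= n:
        raise ValueError("need 0 <= h <= n")
    v = list(range(n))
    for i in range(h):
        # Choose uniformly among the positions not yet fixed: i .. n-1.
        j = i + rand_below(n - i)
        v[i], v[j] = v[j], v[i]
    return v[:h]


print(partial_shuffle(10, 3))   # three distinct values drawn from 0..9
print(partial_shuffle(5, 5))    # a full permutation of 0..4
```

The shuffle is only as unbiased as rand_below: each call must be an equiprobable choice among exactly n - i values.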
> 2. Can you confirm that shuf avoids modulo bias --- the infamous and
> widespread bug, "random_number%N" --- when it makes equiprobable
> selections in its implementation of the shuffle algorithm?
Yes. See coreutils/gl/lib/randint.c's randint_genmax.
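
To illustrate the bug in question: one byte has 256 values, and 256 % 3 == 1, so "byte % 3" yields 0 for 86 byte values but 1 and 2 for only 85 each. A common fix is rejection sampling, sketched below in Python. This only illustrates the general technique and is not a transcription of randint_genmax; the function name rand_below and its read_bytes parameter are hypothetical.

```python
import os


def rand_below(n, read_bytes=os.urandom):
    """Return an unbiased integer in [0, n) by rejection sampling.

    read_bytes(k) must return k random bytes.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n == 1:
        return 0                       # only one possible answer, no bits needed
    nbytes = ((n - 1).bit_length() + 7) // 8
    span = 1 << (8 * nbytes)           # number of distinct raw values
    limit = span - span % n            # largest multiple of n not exceeding span
    while True:
        r = int.from_bytes(read_bytes(nbytes), "big")
        if r < limit:                  # discard the tail [limit, span) that causes bias
            return r % n


print(rand_below(6))   # an integer in 0..5, each equally likely
```

Since limit is a multiple of n, every residue occurs equally often among the accepted values. The cost is an occasional extra read from the random source.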
> 3. Can you confirm that when shuf's "--random-source=FILE" option is
> used, the specified file is the sole source of random bits for all of
> shuf's behavior, and that if the same random source file is used again,
> holding all other shuf options & inputs constant, then shuf will emit
> the same output?
Yes. See coreutils/gl/lib/randread.c's randread_new and its callers.
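
The property being asked about can be sketched in Python as follows. The shuffle below draws every random decision from one byte stream, so replaying the same bytes replays the same permutation. This is a toy model and not shuf's code; shuffle_from_source is a hypothetical name, and raising EOFError when the stream runs out is an assumption of the sketch.

```python
import io


def shuffle_from_source(items, source):
    """Shuffle items using only bytes read from the file-like object `source`."""

    def rand_below(n):
        # Unbiased integer in [0, n), n >= 2, via rejection sampling.
        nbytes = ((n - 1).bit_length() + 7) // 8
        span = 1 << (8 * nbytes)
        limit = span - span % n
        while True:
            chunk = source.read(nbytes)
            if len(chunk) < nbytes:
                raise EOFError("random source exhausted")
            r = int.from_bytes(chunk, "big")
            if r < limit:
                return r % n

    v = list(items)
    for i in range(len(v) - 1):
        # Fisher-Yates step: swap position i with a uniform pick from i .. end.
        j = i + rand_below(len(v) - i)
        v[i], v[j] = v[j], v[i]
    return v


seed_bytes = bytes(range(256))          # stands in for the contents of FILE
first = shuffle_from_source("abcdef", io.BytesIO(seed_bytes))
second = shuffle_from_source("abcdef", io.BytesIO(seed_bytes))
print(first == second)                  # same bytes in, same order out
```

Determinism follows because nothing else feeds the algorithm: no clock, no process ID, no hidden generator.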
> Before I can recommend shuf, however, I must confirm that it is free of several
> defects that are often found in other random permutation software.
I suggest recommending coreutils 9.6 (2025-01-17) or later, due to the
bug fixed here (a bug that's not on your list...):
https://cgit.git.savannah.gnu.org/cgit/coreutils.git/commit/?id=bfbb3ec7f798b179d7fa7b42673e068b18048899
The bug doesn't matter if you use --random-source=FILE, as that option
bypasses the bug.