Hi Paul,

Paul Eggert <[email protected]> writes:

> On 2026-03-22 16:52, Terence Kelly wrote:
>
>> 1. Can you confirm that shuf uses the Fisher-Yates/Durstenfeld
>> unbiased shuffle algorithm?
>
> Never heard of that name for that algorithm, which I consider obvious.
> As I recall, I independently invented a superset of it in 2006 and
> shuf uses this superset algorithm, which is equivalent to
> Fisher-Yates/Durstenfeld in the special case you're surely thinking
> of. See coreutils/gl/lib/randperm.c's randperm_new.

I assume many different people independently "discovered" the algorithm,
and did not know about the previous works. For example, Knuth's first
edition of "The Art of Computer Programming, Volume 2" credits L. E.
Moses and R. V. Oakford for first publishing it in 1964. The second
edition corrects it to credit R. A. Fisher and F. Yates for publishing
it in 1938.

>> Before I can recommend shuf, however, I must confirm that it is free of 
>> several defects that are often found in other random permutation software.
>
> I suggest recommending coreutils 9.6 (2025-01-17) or later, due to the
> bug fixed here (a bug that's not on your list...):
>
> https://cgit.git.savannah.gnu.org/cgit/coreutils.git/commit/?id=bfbb3ec7f798b179d7fa7b42673e068b18048899
>
> The bug doesn't matter if you use --random-source=FILE, as that option
> bypasses the bug.

Not directly related to Terence's questions, but I had a few
questions/thoughts while looking into 'shuf' recently that his mail
reminded me of. I will just share them here because he may also be
interested.

First, I was curious about this comment from lib/randread.c you wrote in
2011:

    /* FIXME: Improve performance by adding support for the RDRAND machine
       instruction if available (e.g., Ivy Bridge processors).  */

I wonder if this is still the case nowadays? As Terence's article
mentions the RDSEED and RDRAND instructions have had some quite severe
issues. E.g. random numbers being stored in a buffer that could be read
from other cores [1]. This was supposedly fixed by Intel with microcode,
however it also adds latency to the instruction. Recently, AMD has had a
severe issue with RDSEED consistently returning zero [2].

Also, I was curious about the use of ISAAC. As far as I can tell it was
chosen for it's speed and being cryptographically secure. I guess I
would have to benchmark it, but I wonder how it compares to ChaCha20 as
used by OpenBSD's and Linux's (among others) /dev/random files. It also
has the benefit of more recent cryptanalysis, and more cryptanalysis in
general since it is used in many protocols, e.g., TLS.

Collin

[1] https://download.vusec.net/papers/crosstalk_sp21.pdf
[2] https://www.amd.com/en/resources/product-security/bulletin/amd-sb-7055.html



Reply via email to