Hi Paul & Collin,
Thanks for your quick and detailed replies!
Regarding true-random sources versus PRNGs, I recommend that the shuf
documentation educate users about the qualitative difference
between the two. Any PRNG that accepts a fixed-length seed inevitably
throttles the entropy brought to bear on whatever problem the PRNG's
output is trying to solve (equiprobable selection of a random permutation
in the case of shuf). As my "Zero Tolerance" paper points out, a
20,000-bit seed isn't enough entropy to equiprobably shuffle 2,087 items.
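
The counting behind that figure can be checked with a minimal Python sketch. It assumes only that a PRNG seeded with k bits can emit at most 2**k distinct output streams, so it can reach at most 2**k of the n! permutations. The function names are illustrative and are not taken from the paper or from Coreutils.

```python
import math


def bits_needed(n: int) -> int:
    """Smallest seed length b (in bits) such that 2**b >= n!."""
    # For x >= 1, (x - 1).bit_length() equals ceil(log2(x)).
    return (math.factorial(n) - 1).bit_length()


def max_items(seed_bits: int) -> int:
    """Largest n such that n! <= 2**seed_bits."""
    limit = 1 << seed_bits
    n, fact = 1, 1
    while fact * (n + 1) <= limit:
        n += 1
        fact *= n
    return n


if __name__ == "__main__":
    # A 20,000-bit seed has fewer states than there are orderings of 2,087 items.
    print(bits_needed(2087) > 20_000)
    print(max_items(20_000))
```

Having at least n! seeds is only a necessary condition for reaching every permutation; it says nothing yet about whether the reachable ones are equally likely.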
My latest paper presents another argument ("balls into bins") showing why
*any* PRNG inevitably introduces enormous biases into a related
combinatorial selection problem:
https://spawn-queue.acm.org/doi/pdf/10.1145/3778029
Cryptographic security is irrelevant to the simple arguments that
establish that the use of *any* PRNG precludes equiprobability for both
shuffling and random subset selection (a.k.a. sampling).
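
A toy illustration of the shuffling case, as a hedged sketch: it enumerates every seed of a deliberately tiny seed space and counts which permutations come out. Python's `random.Random` stands in for "some PRNG" here; it is not what shuf uses, and the helper name `permutation_counts` is hypothetical. The conclusion depends only on counting seeds, not on the generator's quality.

```python
import math
import random
from collections import Counter


def permutation_counts(n_items: int, seed_bits: int) -> Counter:
    """Shuffle range(n_items) once per possible seed and tally the results."""
    counts = Counter()
    for seed in range(1 << seed_bits):
        rng = random.Random(seed)  # stand-in PRNG, fully determined by its seed
        items = list(range(n_items))
        rng.shuffle(items)
        counts[tuple(items)] += 1
    return counts


if __name__ == "__main__":
    n_items, seed_bits = 4, 4  # 4! = 24 permutations, only 2**4 = 16 seeds
    counts = permutation_counts(n_items, seed_bits)
    total = math.factorial(n_items)
    print(f"{len(counts)} of {total} permutations reachable")
    print(f"{total - len(counts)} permutations can never be produced")
```

With 16 seeds and 24 permutations, at least 8 permutations have probability zero. Enlarging the seed does not restore exact equiprobability under a uniformly chosen seed either: for n >= 3, n! is divisible by 3, so it never divides 2**k, and some permutations must be hit by more seeds than others.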
One way to present this fact to users would be to say that by using a
"good" PRNG, you're giving up / relaxing the equiprobability requirement
--- hopefully in ways that won't harm your application. Let users make
fully informed decisions regarding the tradeoffs of true-RNG versus PRNG.
Thanks!
-- Terence
On Mon, 23 Mar 2026, Paul Eggert wrote:
On 2026-03-22 23:30, Collin Funk wrote:
the RDSEED and RDRAND instructions have had some quite severe
issues. E.g. random numbers being stored in a buffer that could be read
from other cores [1].
Yes, we'd need assurance of reliability for the underlying RNG. As I
understand it, the hardware folks are getting there, in the sense that the
hardware RNGs are good enough for Coreutils and there are known mitigations
for the bugs we know about. Likewise for what Coreutils currently uses: by
default it relies on GNU/Linux getrandom calls despite the presence of
CVE-2025-0577 <https://bugzilla.redhat.com/show_bug.cgi?id=2338871>.
Recently, AMD has had a
severe issue with RDSEED consistently returning zero [2].
That would be an issue if we used the broken RDSEED implementations. Luckily,
there's a simple workaround: use 64-bit RDSEED, which is what we should use
anyway.
Also, I was curious about the use of ISAAC. As far as I can tell it was
chosen for its speed and being cryptographically secure. I guess I
would have to benchmark it, but I wonder how it compares to ChaCha20 as
used by OpenBSD's and Linux's (among others) /dev/random files. ChaCha20 also
has the benefit of more recent cryptanalysis, and more cryptanalysis in
general since it is used in many protocols, e.g., TLS.
The code uses ISAAC because it was written before ChaCha20 was invented. If
we had to do it over again now, we'd likely use ChaCha, though perhaps not
ChaCha20. Jean-Philippe Aumasson, who looked into ISAAC (see
coreutils/gl/lib/randread.c) without finding problems that would affect
current Coreutils, has said that ChaCha8 is both 2.5× faster than ChaCha20
and good enough for Coreutils-like applications. See:
Aumasson J-P. 6 years after too much crypto. bfSwA. 2025-11-16.
<https://bfswa.substack.com/p/6-years-after-too-much-crypto>