bug#80658: questions about "shuf" utility

Terence Kelly Mon, 23 Mar 2026 17:53:43 -0700



Hi Paul & Collin,

Thanks for your quick and detailed replies!

Regarding true-random sources versus PRNGs, I recommend that the shuf documentation should educate users about the qualitative difference between the two. Any PRNG that accepts a fixed-length seed inevitably throttles the entropy brought to bear on whatever problem the PRNG's output is trying to solve (equiprobable selection of a random permutation in the case of shuf). As my "Zero Tolerance" paper points out, a 20,000-bit seed isn't enough entropy to equiprobably shuffle 2,087 items.
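The 2,087 figure follows from a counting argument: a seeded PRNG is a deterministic function of its seed, so a k-bit seed yields at most 2^k distinct shuffles, and once n! > 2^k some permutations can never be produced. The sketch below (the function name `min_items_exceeding` is hypothetical, chosen for illustration) finds the smallest such n using exact integer arithmetic.

```python
def min_items_exceeding(seed_bits: int) -> int:
    """Return the smallest n with n! > 2**seed_bits.

    A seeded PRNG can produce at most 2**seed_bits distinct outputs, so
    once n! exceeds that, some permutations of n items are unreachable
    from every seed (pigeonhole principle).
    """
    limit = 2 ** seed_bits
    n, fact = 1, 1  # invariant: fact == n!
    while fact <= limit:
        n += 1
        fact *= n
    return n


if __name__ == "__main__":
    print(min_items_exceeding(64))      # 21
    print(min_items_exceeding(20_000))  # 2087
```

So even a 64-bit seed cannot reach every ordering of a 21-item list, and a 20,000-bit seed runs out at 2,087 items.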

My latest paper presents another argument ("balls into bins") showing why *any* PRNG inevitably introduces enormous biases into a related combinatorial selection problem:


https://spawn-queue.acm.org/doi/pdf/10.1145/3778029

Cryptographic security is irrelevant to the simple arguments that establish that the use of *any* PRNG precludes equiprobability for both shuffling and random subset selection (a.k.a. sampling).
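One such simple argument, a divisibility check that is not necessarily the exact one in the papers cited above, shows why generator quality does not matter. With 2^k equally likely seeds, exact equiprobability over n! permutations requires n! to divide 2^k. For n ≥ 3, n! has the odd factor 3, so that is impossible. The sketch below uses Python's `random.Random` purely as a stand-in seeded PRNG; any deterministic generator gives the same conclusion. The helper names are hypothetical.

```python
import math
import random
from collections import Counter


def exactly_uniform_possible(seed_bits: int, n: int) -> bool:
    """True only if 2**seed_bits seeds can be split evenly among n! permutations."""
    return (2 ** seed_bits) % math.factorial(n) == 0


def permutation_counts(seed_bits: int, n: int) -> Counter:
    """Shuffle range(n) once per possible seed and count each resulting permutation.

    random.Random is only a stand-in for "some seeded PRNG"; the counts
    cannot all be equal whenever n! does not divide 2**seed_bits.
    """
    counts: Counter = Counter()
    for seed in range(2 ** seed_bits):
        items = list(range(n))
        random.Random(seed).shuffle(items)
        counts[tuple(items)] += 1
    return counts


if __name__ == "__main__":
    counts = permutation_counts(8, 3)      # toy 8-bit seed, 3 items, 6 permutations
    print(sorted(counts.values()))         # 256 seeds cannot split evenly 6 ways
    print(exactly_uniform_possible(8, 3))  # False
```

Enlarging the seed shrinks the bias but never removes it, because no power of two is divisible by 3.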

One way to present this fact to users would be to say that by using a "good" PRNG, you're giving up / relaxing the equiprobability requirement, hopefully in ways that won't harm your application. Let users make fully informed decisions regarding the tradeoffs of true-RNG versus PRNG.


Thanks!

-- Terence


On Mon, 23 Mar 2026, Paul Eggert wrote:

On 2026-03-22 23:30, Collin Funk wrote:
the RDSEED and RDRAND instructions have had some quite severe
issues. E.g. random numbers being stored in a buffer that could be read
from other cores [1].
Yes, we'd need assurance of reliability for the underlying RNG. As I understand it, the hardware folks are getting there, in the sense that the hardware RNGs are good enough for Coreutils and there are known mitigations for the bugs we know about. Likewise for what Coreutils currently uses: by default it relies on GNU/Linux getrandom calls despite the presence of CVE-2025-0577 <https://bugzilla.redhat.com/show_bug.cgi?id=2338871>.
Recently, AMD has had a
severe issue with RDSEED consistently returning zero [2].
That would be an issue if we used the broken RDSEED implementations. Luckily, there's a simple workaround: use 64-bit RDSEED, which is what we should use anyway.
Also, I was curious about the use of ISAAC. As far as I can tell it was
chosen for its speed and being cryptographically secure. I guess I
would have to benchmark it, but I wonder how it compares to ChaCha20 as
used by OpenBSD's and Linux's (among others) /dev/random files. It also
has the benefit of more recent cryptanalysis, and more cryptanalysis in
general since it is used in many protocols, e.g., TLS.
The code uses ISAAC because it was written before ChaCha20 was invented. If we had to do it over again now, we'd likely use ChaCha, though perhaps not ChaCha20: Jean-Philippe Aumasson, who looked into ISAAC (see coreutils/gl/lib/randread.c) without finding problems that would affect current Coreutils, has said that ChaCha8 is both 2.5× faster than ChaCha20 and good enough for Coreutils-like applications. See:
Aumasson J-P. 6 years after too much crypto. bfSwA. 2025-11-16. <https://bfswa.substack.com/p/6-years-after-too-much-crypto>
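ChaCha8 and ChaCha20 are the same construction and differ only in the number of rounds (8 versus 20). Since the rounds dominate the per-block work, the 20/8 = 2.5 ratio is consistent with the quoted speed-up. The sketch below is a plain-Python ChaCha block function with a `rounds` parameter. It assumes the RFC 8439 state layout (32-bit counter, 96-bit nonce), is neither constant-time nor fast, and is not the code Coreutils uses (Coreutils uses ISAAC, per randread.c).

```python
import struct

MASK = 0xFFFFFFFF
CONSTANTS = (0x61707865, 0x3320646E, 0x79622D32, 0x6B206574)  # "expand 32-byte k"


def rotl32(v: int, c: int) -> int:
    """Rotate a 32-bit word left by c bits."""
    return ((v << c) & MASK) | (v >> (32 - c))


def quarter_round(s: list, a: int, b: int, c: int, d: int) -> None:
    """ChaCha quarter round on state words a, b, c, d (in place)."""
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & MASK
    s[d] = rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & MASK
    s[b] = rotl32(s[b] ^ s[c], 7)


def chacha_block(key: bytes, counter: int, nonce: bytes, rounds: int = 20) -> bytes:
    """Return one 64-byte keystream block; rounds=8 gives ChaCha8, 20 gives ChaCha20."""
    if len(key) != 32 or len(nonce) != 12 or rounds % 2:
        raise ValueError("need 32-byte key, 12-byte nonce, even round count")
    state = (list(CONSTANTS) + list(struct.unpack("<8I", key))
             + [counter & MASK] + list(struct.unpack("<3I", nonce)))
    w = state[:]
    for _ in range(rounds // 2):  # one column round + one diagonal round
        quarter_round(w, 0, 4, 8, 12)
        quarter_round(w, 1, 5, 9, 13)
        quarter_round(w, 2, 6, 10, 14)
        quarter_round(w, 3, 7, 11, 15)
        quarter_round(w, 0, 5, 10, 15)
        quarter_round(w, 1, 6, 11, 12)
        quarter_round(w, 2, 7, 8, 13)
        quarter_round(w, 3, 4, 9, 14)
    # Feed-forward: add the input state back so the rounds cannot be inverted.
    return struct.pack("<16I", *((x + y) & MASK for x, y in zip(w, state)))


if __name__ == "__main__":
    key, nonce = bytes(range(32)), bytes(12)
    print(chacha_block(key, 1, nonce, rounds=8)[:16].hex())   # ChaCha8
    print(chacha_block(key, 1, nonce, rounds=20)[:16].hex())  # ChaCha20
```

ChaCha20 runs 80 quarter rounds per block and ChaCha8 runs 32; everything else is identical.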
