Mr. Eggert & colleagues,
I'm preparing an invited article for a medical journal on the topic of
correct randomization for randomized controlled experiments and clinical
trials. I want to recommend the "shuf" utility for certain randomization
tasks, because it is easy to use and widely available.
Before I can recommend shuf, however, I must confirm that it is free of
several defects that are often found in other random permutation software.
I'd be grateful if you would answer the following questions, providing
pointers to relevant parts of the latest source code. (I looked at the
coreutils code myself, but I don't trust myself to unilaterally provide
high-confidence answers to these questions.)
1. Can you confirm that shuf uses the Fisher-Yates/Durstenfeld unbiased
shuffle algorithm?
2. Can you confirm that shuf avoids modulo bias --- the infamous and
widespread bug, "random_number%N" --- when it makes equiprobable
selections in its implementation of the shuffle algorithm?
3. Can you confirm that when shuf's "--random-source=FILE" option is used,
the specified file is the sole source of random bits for all of shuf's
behavior, and that if the same random source file is used again, holding
all other shuf options & inputs constant, then shuf will emit the same
output?
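
To make the three properties above concrete, here is a minimal Python sketch
of what I mean. It is an illustration only and is not taken from the coreutils
source; the names uniform_below and shuffle are hypothetical, and the way the
byte stream is consumed is my own assumption, not a description of how shuf
reads its random source.

```python
import io


def uniform_below(n, source):
    """Return an integer uniform in [0, n), drawing bytes from `source`.

    Rejection sampling: values at or above the largest multiple of n are
    discarded, so the final `% n` cannot favor any outcome (no modulo bias).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    nbytes = max(1, ((n - 1).bit_length() + 7) // 8)
    span = 256 ** nbytes
    limit = span - (span % n)  # largest multiple of n not exceeding span
    while True:
        chunk = source.read(nbytes)
        if len(chunk) < nbytes:
            raise EOFError("random source exhausted")
        value = int.from_bytes(chunk, "big")
        if value < limit:
            return value % n


def shuffle(items, source):
    """Fisher-Yates/Durstenfeld shuffle driven solely by bytes from `source`."""
    result = list(items)
    for i in range(len(result) - 1, 0, -1):
        j = uniform_below(i + 1, source)  # j is uniform in [0, i]
        result[i], result[j] = result[j], result[i]
    return result


if __name__ == "__main__":
    # Hypothetical fixed byte stream standing in for a random source file.
    data = bytes(range(256)) * 4
    first = shuffle(range(10), io.BytesIO(data))
    second = shuffle(range(10), io.BytesIO(data))
    print(first == second)  # same bytes, same inputs: same permutation
```

Because the byte stream is the only source of randomness in this sketch,
replaying the same bytes with the same input reproduces the same permutation,
which is the behavior question 3 asks about.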
Many thanks in advance for your reply, and thank you for generously giving
shuf to the world! I'll do my best to find new users for shuf.
Finally, if you'd like to include in the "shuf" documentation a
comprehensive discussion of the snafus that often arise in random
permutation software, I recommend the following article:
https://spawn-queue.acm.org/doi/pdf/10.1145/3664645
Terence Kelly
tpkelly @ { acm.org, cs.princeton.edu, eecs.umich.edu }