Mr. Eggert & colleagues,

I'm preparing an invited article for a medical journal on the topic of correct randomization for randomized controlled experiments and clinical trials. I want to recommend the "shuf" utility for certain randomization tasks, because it is easy to use and widely available.

Before I can recommend shuf, however, I must confirm that it is free of several defects that are often found in other random permutation software. I'd be grateful if you would answer the following questions, providing pointers to relevant parts of the latest source code. (I looked at the coreutils code myself, but I don't trust myself to unilaterally provide high-confidence answers to these questions.)

1. Can you confirm that shuf uses the Fisher-Yates/Durstenfeld unbiased shuffle algorithm?

2. Can you confirm that shuf avoids modulo bias --- the infamous and widespread bug, "random_number%N" --- when it makes equiprobable selections in its implementation of the shuffle algorithm?

3. Can you confirm that when shuf's "--random-source=FILE" option is used, the specified file is the sole source of random bits for all of shuf's behavior, and that if the same random source file is used again, holding all other shuf options & inputs constant, then shuf will emit the same output?
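
To make the three questions concrete, here is a minimal Python sketch of the properties I am asking about. It is not taken from shuf's source code and makes no claim about how shuf is actually implemented; the function names uniform_below and fisher_yates, and the choice to model the random source as a byte stream, are my own illustrative assumptions.

```python
import io


def uniform_below(n, stream):
    """Return an integer uniformly distributed in [0, n), using only bytes from stream.

    Illustrative only: rejection sampling discards raw values that would
    make a plain `value % n` favor small results (modulo bias).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if n == 1:
        return 0  # only one possible outcome; no random bytes consumed
    nbytes = ((n - 1).bit_length() + 7) // 8
    span = 256 ** nbytes              # number of distinct raw values
    limit = span - (span % n)         # largest multiple of n that is <= span
    while True:
        chunk = stream.read(nbytes)
        if len(chunk) < nbytes:
            raise EOFError("random source exhausted")
        value = int.from_bytes(chunk, "big")
        if value < limit:             # accept only values from the unbiased range
            return value % n


def fisher_yates(items, stream):
    """Return a shuffled copy of items (Durstenfeld's variant of Fisher-Yates)."""
    a = list(items)
    for i in range(len(a) - 1, 0, -1):
        j = uniform_below(i + 1, stream)  # j is uniform over 0..i inclusive
        a[i], a[j] = a[j], a[i]
    return a


if __name__ == "__main__":
    # Hypothetical stand-in for the contents of a --random-source file.
    source_bytes = bytes(range(256)) * 4
    first = fisher_yates(range(10), io.BytesIO(source_bytes))
    second = fisher_yates(range(10), io.BytesIO(source_bytes))
    print(first)
    print(first == second)  # same random bytes, same inputs: same permutation
```

In this sketch the byte stream is the only source of randomness, so replaying the same bytes with the same input reproduces the same permutation. That is the behavior question 3 asks about for "--random-source=FILE".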

Many thanks in advance for your reply, and thank you for generously giving shuf to the world! I'll do my best to find new users for shuf.

Finally, if you'd like to include in the "shuf" documentation a comprehensive discussion of the snafus that often arise in random permutation software, I recommend the following article:

https://spawn-queue.acm.org/doi/pdf/10.1145/3664645

Terence Kelly
tpkelly @ { acm.org, cs.princeton.edu, eecs.umich.edu }



