From: Sergei Trofimovich <siarh...@google.com> Artem Klimov noticed that current shuffle implementation suffers from probability bias due to a typical bug in the shuffling implementation.
When we consider already shuffled element we slightly bias their propability. https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm suggests shuffling each "struck" element only once. Before the change probabilities of 4-element array to land from `i` index to `j` index are: 0 1 2 3 _____ _____ _____ _____ 0 | 25.00 29.30 24.60 21.10 1 | 25.00 22.25 28.13 24.63 2 | 25.00 23.44 22.26 29.30 3 | 25.00 25.01 25.01 24.98 Note that `0->1` probability is almost 29% while `0->3` os only 21%. After the change probabilities are not as biased: 0 1 2 3 _____ _____ _____ _____ 0 | 24.99 24.99 25.01 25.01 1 | 24.99 25.04 24.99 24.99 2 | 25.01 25.00 25.00 24.99 3 | 25.01 24.98 25.00 25.01 * src/shuffle.c (random_shuffle_array): Fix biased shuffle by avoiding already "struck" elements. --- src/shuffle.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/src/shuffle.c b/src/shuffle.c index 8fbefc7b..cda28f20 100644 --- a/src/shuffle.c +++ b/src/shuffle.c @@ -104,12 +104,16 @@ static void random_shuffle_array (void **a, size_t len) { size_t i; - for (i = 0; i < len; i++) + + if (len <= 1) + return; + + for (i = len - 1; i >= 1; i--) { void *t; /* Pick random element and swap. */ - unsigned int j = make_rand () % len; + unsigned int j = make_rand () % (i + 1); if (i == j) continue; -- 2.45.1