P.S. The cited paper calls your algorithm the "OpenBSD algorithm"
and has a bunch of benchmarks comparing it to others in Fisher-Yates
shuffles of sizes 1e3..1e9.

Including all overhead (base PRNG, shuffle), it's 3x slower for
32-bit operations and 8x slower for 64-bit up to arrays of size
1e6, after which cache misses slow all algorithms, reducing the
ratio.

If you want a faster division-based agorithm, the "Java algorithm"
does 1+retries divides:

unsigned long java(unsigned long s)
{
        unsigned long x, r;

        do {
                x = random_integer();
                r = x % s;
        } while (x - r > -s);
        return r;
}

Reply via email to