I'm unsure whether I should post into the ".learn" sub-forum or some other one in such a case, but still.

I wonder what is the use case of std.random.randomCover when one has std.random.randomShuffle. I was trying to use it just to get a random permutation of integers, and found randomCover prior to randomShuffle. However, for the number of elements as low as 10,000, the delay was already rather surprising, so I searched for a faster solution and found randomShuffle does a superior job. And now I wonder: how does one correctly use randomCover? Below is a sample test program showing the difference.

-----
import std.array;
import std.random;
import std.range;
import std.stdio;

immutable int MAX_N = 10_000;

int [] fun (int n, ref Random rnd)
{
        auto t = array (iota (MAX_N));
        version (randomCover)
        {
                auto c = randomCover (t, rnd);
        }
        version (randomShuffle)
        {
                auto c = t;
                randomShuffle (c, rnd);
        }

        return array (c);
}

void main ()
{
        auto rnd = Random (123456789);
        writeln (fun (MAX_N, rnd));
        writeln (fun (MAX_N, rnd) == fun (MAX_N, rnd));
}
-----

Here is a comparison:

1. Speed.
+randomShuffle performs O(n) steps and O(n) uniform() calls.
-randomCover performs O(n^2) steps and O(n^2) uniform() calls.

The latter however can (and perhaps should?) be optimized to O(n): in the implementation, the line
            auto chooseMe = uniform(0, k, _rnd) == 0;
can be moved outside the foreach loop and store the integer
        auto toPick = uniform(0, k, _rnd);
instead of a bool. I can try and write the respective patch if needed.

2. Size.
-randomShuffle does not allocate anything extra, but modifies the range in place, and so requires allocating another range of n values if the original range has to be stored too. +randomCover only allocates an array of n bools. If that is the intended advantage, the implementation would be better off using a bit array instead of a bool array, as in this enhancement proposition: http://d.puremagic.com/issues/show_bug.cgi?id=2898

3. Laziness.
-randomShuffle just does its job once.
+randomCover produces some sort of a lazy generator instead. Still, the generator performs an O(n) computation on each step, so the profit is debatable.

4. Convenience.
+randomShuffle called multiple times with the same RNG advances the internal state of the RNG and thus produces different results. If one needs the same results, it is still achievable by knowingly saving and loading the internal state of the RNG. -randomCover called multiple times with the same RNG copies the RNG each time by value and thus produces the same result. That is not the intended behavior in the majority of use cases I can imagine (e.g., generating different random permutations in a loop). This is already the topic of an issue I found: http://d.puremagic.com/issues/show_bug.cgi?id=7067

Now, the only case I can think of where randomCover should be preferred to randomShuffle is when you have a huge range (hundreds of Mb), but you need to iterate only through the first few values in randomCover. Is there any other?

Whether the above is indeed the intended use of randomCover or not, I think that the intended use (and a reference to randomShuffle for other cases) should be mentioned in the documentation along with the time complexity.

More on the topic of optimization, the performance of the whole randomCover thing can be optimized to O(n log n) using a Fenwick tree or such to popFront in O(log n). But it will then require storing n integers, not n bools, thus losing the advantage of having smaller memory requirements than randomShuffle with copy.

-----
Ivan Kazmenko.

Reply via email to