What is the use case of RandomCover?

Ivan Kazmenko Mon, 18 Feb 2013 06:40:20 -0800

I'm unsure whether I should post into the ".learn" sub-forum or some other one in such a case, but still.

I wonder what the use case of std.random.randomCover is when one has std.random.randomShuffle. I was trying to use it just to get a random permutation of integers, and found randomCover before randomShuffle. However, for a number of elements as low as 10,000, the delay was already rather surprising, so I searched for a faster solution and found that randomShuffle does a superior job. And now I wonder: how does one correctly use randomCover? Below is a sample test program showing the difference.


-----
// Compile with -version=randomCover or -version=randomShuffle.
import std.array;
import std.random;
import std.range;
import std.stdio;

immutable int MAX_N = 10_000;

int [] fun (int n, ref Random rnd)
{
        auto t = array (iota (n));
        version (randomCover)
        {
                auto c = randomCover (t, rnd);
        }
        version (randomShuffle)
        {
                auto c = t;
                randomShuffle (c, rnd);
        }

        return array (c);
}

void main ()
{
        auto rnd = Random (123456789);
        writeln (fun (MAX_N, rnd));
        writeln (fun (MAX_N, rnd) == fun (MAX_N, rnd));
}
-----

Here is a comparison:

1. Speed.
+randomShuffle performs O(n) steps and O(n) uniform() calls.
-randomCover performs O(n^2) steps and O(n^2) uniform() calls.

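The latter, however, can (and perhaps should?) be optimized to O(n) uniform() calls in total: in the implementation, the line

            auto chooseMe = uniform(0, k, _rnd) == 0;

can be moved outside the foreach loop and changed to store an integer instead of a bool:

        auto toPick = uniform(0, k, _rnd);

The loop would then skip toPick not-yet-chosen elements and take the next one. Each step still scans the range, though. I can try to write the respective patch if needed.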
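To make the toPick idea above concrete, here is a minimal standalone sketch. It is not the Phobos code: nthUnchosen and coverOrder are hypothetical names made up for illustration, and the bool array stands in for the internal array of chosen flags. It only shows one uniform() call per step followed by a single scan.

```d
import std.random;
import std.stdio;

// Hypothetical helper (not part of Phobos): index of the toPick-th
// element, counting from zero, that has not been chosen yet.
size_t nthUnchosen(const bool[] chosen, size_t toPick)
{
    foreach (i, taken; chosen)
    {
        if (taken) continue;
        if (toPick == 0) return i;
        --toPick;
    }
    assert(false, "toPick must be less than the number of unchosen elements");
}

// Hypothetical driver: visits the indices 0 .. n in random order.
size_t[] coverOrder(size_t n, ref Random rnd)
{
    auto chosen = new bool[n];
    size_t[] order;
    for (size_t k = n; k > 0; --k)
    {
        // One uniform() call per step instead of one per scanned element.
        size_t toPick = uniform(size_t(0), k, rnd);
        size_t i = nthUnchosen(chosen, toPick);
        chosen[i] = true;
        order ~= i;
    }
    return order;
}

void main()
{
    auto rnd = Random(123456789);
    writeln(coverOrder(10, rnd));
}
```

The number of uniform() calls drops to n in total, but every step still walks the flag array, so the total number of steps stays quadratic.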


2. Size.

-randomShuffle does not allocate anything extra, but modifies the range in place, and so requires allocating another range of n values if the original range has to be stored too.
+randomCover only allocates an array of n bools. If that is the intended advantage, the implementation would be better off using a bit array instead of a bool array, as in this enhancement proposition: http://d.puremagic.com/issues/show_bug.cgi?id=2898


3. Laziness.
-randomShuffle just does its job once.

+randomCover produces some sort of a lazy generator instead. Still, the generator performs an O(n) computation on each step, so the profit is debatable.


4. Convenience.

+randomShuffle called multiple times with the same RNG advances the internal state of the RNG and thus produces different results. If one needs the same results, it is still achievable by knowingly saving and loading the internal state of the RNG.
-randomCover called multiple times with the same RNG copies the RNG each time by value and thus produces the same result. That is not the intended behavior in the majority of use cases I can imagine (e.g., generating different random permutations in a loop). This is already the topic of an issue I found: http://d.puremagic.com/issues/show_bug.cgi?id=7067
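Saving and restoring the RNG state for randomShuffle could look roughly like the sketch below. It assumes Random is the default Mersenne Twister engine, whose save property returns a copy of the generator; shuffledIota is a hypothetical helper name used only here.

```d
import std.array;
import std.random;
import std.range;
import std.stdio;

// Hypothetical helper: a shuffled copy of 0 .. n using the given RNG.
int[] shuffledIota(int n, ref Random rnd)
{
    auto t = array(iota(n));
    randomShuffle(t, rnd);   // advances rnd
    return t;
}

void main()
{
    auto rnd = Random(123456789);
    auto saved = rnd.save;               // snapshot of the RNG state

    auto a = shuffledIota(10, rnd);      // uses and advances rnd
    auto b = shuffledIota(10, rnd);      // continues from the advanced state
    auto c = shuffledIota(10, saved);    // replays from the snapshot

    assert(a == c);                      // same state, same permutation
    writeln(a);
    writeln(b);
}
```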

Now, the only case I can think of where randomCover should be preferred to randomShuffle is when you have a huge range (hundreds of Mb), but you need to iterate only through the first few values in randomCover. Is there any other?
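That "first few values" case might be written as in the sketch below. firstFewCovered is a hypothetical name, and the array size is arbitrary; the point is only that the source array is left untouched and just a handful of elements are materialized.

```d
import std.array;
import std.random;
import std.range;
import std.stdio;

// Hypothetical helper: the first `count` elements of a random cover of
// `data`, without modifying or copying `data` itself.
int[] firstFewCovered(int[] data, size_t count, ref Random rnd)
{
    return array(randomCover(data, rnd).take(count));
}

void main()
{
    auto rnd = Random(123456789);
    auto big = array(iota(1_000_000));
    writeln(firstFewCovered(big, 5, rnd));
}
```

Note that randomCover still needs its flags for all n elements up front, so the saving is in not copying the values, not in avoiding O(n) setup.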

Whether the above is indeed the intended use of randomCover or not, I think that the intended use (and a reference to randomShuffle for other cases) should be mentioned in the documentation along with the time complexity.

More on the topic of optimization, the performance of the whole randomCover thing can be improved to O(n log n) by using a Fenwick tree or such to do popFront in O(log n). But it will then require storing n integers, not n bools, thus losing the advantage of having smaller memory requirements than randomShuffle with a copy.
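A rough sketch of the Fenwick tree variant follows. This is an illustration under my own assumptions, not a proposed Phobos patch: the Fenwick struct and its methods are hypothetical names. The tree stores a 1 for every not-yet-chosen position, finds the k-th remaining position by binary descent, and clears it, each in O(log n).

```d
import std.random;
import std.stdio;

// Hypothetical structure: counts of unchosen positions in a Fenwick tree.
struct Fenwick
{
    size_t[] tree;   // 1-based; tree[0] is unused

    static size_t lowBit(size_t i) { return i & (~i + 1); }

    this(size_t n)
    {
        tree = new size_t[n + 1];
        // O(n) build with every position initially unchosen (value 1).
        foreach (i; 1 .. n + 1)
        {
            tree[i] += 1;
            size_t parent = i + lowBit(i);
            if (parent <= n) tree[parent] += tree[i];
        }
    }

    // 0-based index of the k-th (counting from zero) unchosen position.
    size_t findKth(size_t k) const
    {
        size_t pos = 0;
        size_t step = 1;
        while (step * 2 < tree.length) step *= 2;
        for (; step > 0; step /= 2)
        {
            size_t next = pos + step;
            if (next < tree.length && tree[next] <= k)
            {
                pos = next;
                k -= tree[next];
            }
        }
        return pos;
    }

    // Mark the 0-based position `index` as chosen.
    void remove(size_t index)
    {
        for (size_t i = index + 1; i < tree.length; i += lowBit(i))
            tree[i] -= 1;
    }
}

void main()
{
    enum size_t n = 10;
    auto rnd = Random(123456789);
    auto unchosen = Fenwick(n);
    size_t[] order;
    for (size_t remaining = n; remaining > 0; --remaining)
    {
        size_t k = uniform(size_t(0), remaining, rnd);
        size_t i = unchosen.findKth(k);   // O(log n) instead of an O(n) scan
        unchosen.remove(i);
        order ~= i;
    }
    writeln(order);
}
```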


-----
Ivan Kazmenko.
