On Tuesday, 19 February 2013 at 11:46:54 UTC, Ivan Kazmenko wrote:
Hi!

Thank you for the reply.

Hum... randomShuffle and randomSample actually have nothing to do with each other.
<snip>

I'd like to note that my post is about randomCover, not randomSample. I do see the difference between the purpose of randomSample and randomShuffle. But randomCover's effect is, at the first glance, just a slower version of randomSample wrapped as a lazy generator.

Hum... sorry about that.

I also want to comment on your "randomSample" vs "randomSuffle" implementation suggestion. Keep in mind that: a) randomSample doesn't allocate, whereas yours suggestion doesn't b) randomSample gives direct access to the elements, whereas your suggestion doesn't.

If you don't care about a) and b), then by all means, dup away, and get better performance!

But think about the fact that you wouldn't be able to do something like this...
<snip>
       auto sample = randomSample(arr[], 5);
       foreach(ref a; sample)
           ++a;

That stands for randomCover, too. Well, thank you, perhaps that's the difference I was seeking.

If this is the intended difference, well, my proposition to enhance randomCover's performance and usefulness transforms into:

1. Document the usage of randomCover with an example such as above, and refer to randomShuffle as a faster version for simpler use cases.

2. Optimize the performance by putting Fenwick trees to good use. Currently, randomCover'ing 10,000 elements takes time on the order of one second, and for 100,000 or more elements, it is hardly usable.

Extra documentation never hurts, but keep in mind that a ton of algorithms in phobos are lazy and operate this way. Usually, only the lazy version is implemented, as the aggressive version is trivial (as you suggested).

AFAIK, most of the ranges in std.range are lazy (though not obviously) in one way or another.

Last but not least, be warned there is an old-standing bug with anything in phobos that takes a PRNG by value. Basically, the PRNG will be duplicated, and generate the same sequence over and over. Ergo, do NOT pass a specific random generator to things like randomSample or randomSuffle.

This problem is one of the major reason we are currently (and slowly) re-designing random into random2.

So, there is a general agreement that in random2, RNG should by default get passed by reference everywhere? That's nice to hear.

-----
Ivan Kazmenko.

The agreement is rather to make them reference types, so even when passed by value, you won't accidentally duplicate them. This is important as you sometimes pass random ranges to algorithms that have nothing to do with randomness.

I'd say the "textbook" example would be:

//----
import std.random;
import std.algorithm;
import std.stdio;

void main()
{
    uint[5] arr1;
    arr1[].fill(rndGen);
    uint[5] arr2;
    arr2[].fill(rndGen);
    writeln("arr1: ", arr1[]);
    writeln("arr1: ", arr2[]);
}
//----
arr1: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
arr2: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
//----
Oops!

Reply via email to