On Tuesday, 19 February 2013 at 11:46:54 UTC, Ivan Kazmenko wrote:
Hi!
Thank you for the reply.
Hum... randomShuffle and randomSample actually have nothing to
do with each other.
<snip>
I'd like to note that my post is about randomCover, not
randomSample. I do see the difference between the purpose of
randomSample and randomShuffle. But randomCover's effect is,
at first glance, just a slower version of randomSample
wrapped as a lazy generator.
Hum... sorry about that.
I also want to comment on your "randomSample" vs
"randomShuffle" implementation suggestion. Keep in mind that:
a) randomSample doesn't allocate, whereas your suggestion
does
b) randomSample gives direct access to the elements, whereas
your suggestion doesn't.
If you don't care about a) and b), then by all means, dup
away, and get better performance!
But think about the fact that you wouldn't be able to do
something like this...
<snip>
auto sample = randomSample(arr[], 5);
foreach (ref a; sample)
    ++a;
That stands for randomCover, too. Well, thank you, perhaps
that's the difference I was seeking.
If this is the intended difference, then my proposal to
improve randomCover's performance and usefulness becomes:
1. Document the usage of randomCover with an example such as
above, and refer to randomShuffle as a faster version for
simpler use cases.
2. Optimize the performance by putting Fenwick trees to good
use.
Currently, randomCover'ing 10,000 elements takes time on the
order of one second, and for 100,000 or more elements, it is
hardly usable.
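To make the Fenwick tree suggestion concrete, here is a minimal sketch of the idea, not the Phobos implementation. The tree counts which indices are still uncovered; each step draws k uniformly among the remaining ones and finds the k-th uncovered index in O(log n). The names UncoveredSet, removeKth and coverOrder are made up for this illustration.

```d
import std.random : Random, uniform, unpredictableSeed;

// Hypothetical helper, not part of Phobos: a Fenwick (binary indexed) tree
// holding a count of 1 for every index that has not been visited yet.
struct UncoveredSet
{
    private size_t[] tree;     // 1-based Fenwick tree of counts
    private size_t remaining;  // number of indices still uncovered

    this(size_t n)
    {
        tree = new size_t[n + 1];
        // O(n) build: every slot starts at 1 and pushes its total to its parent.
        foreach (i; 1 .. n + 1)
        {
            tree[i] += 1;
            immutable parent = i + (i & (~i + 1)); // i plus its lowest set bit
            if (parent <= n)
                tree[parent] += tree[i];
        }
        remaining = n;
    }

    @property size_t length() const { return remaining; }

    // Removes the k-th (0-based) uncovered index and returns it (0-based).
    size_t removeKth(size_t k)
    {
        assert(k < remaining);
        immutable n = tree.length - 1;
        size_t step = 1;
        while (step * 2 <= n)
            step *= 2;
        // Binary descent: find the longest prefix holding at most k uncovered slots.
        size_t pos = 0;
        for (; step > 0; step /= 2)
        {
            immutable next = pos + step;
            if (next <= n && tree[next] <= k)
            {
                pos = next;
                k -= tree[next];
            }
        }
        // The wanted slot is 1-based pos + 1; mark it as covered.
        for (size_t i = pos + 1; i <= n; i += i & (~i + 1))
            tree[i] -= 1;
        --remaining;
        return pos;
    }
}

// Returns every index in 0 .. n exactly once, in random order.
size_t[] coverOrder(size_t n, ref Random gen)
{
    auto uncovered = UncoveredSet(n);
    auto order = new size_t[n];
    foreach (ref slot; order)
        slot = uncovered.removeKth(uniform(cast(size_t) 0, uncovered.length, gen));
    return order;
}

void main()
{
    auto gen = Random(unpredictableSeed);
    auto arr = [10, 20, 30, 40, 50];
    foreach (i; coverOrder(arr.length, gen))
        ++arr[i]; // direct access to the original elements, no shuffled copy
    assert(arr == [11, 21, 31, 41, 51]);
}
```

This sketch builds the whole order eagerly to keep it short; a lazy range would call removeKth once per popFront instead. Either way the cost is O(log n) per element and O(n) extra memory for the tree.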
Extra documentation never hurts, but keep in mind that a ton of
algorithms in phobos are lazy and operate this way. Usually, only
the lazy version is implemented, as the aggressive version is
trivial (as you suggested).
AFAIK, most of the ranges in std.range are lazy (though not
obviously) in one way or another.
Last but not least, be warned there is a long-standing bug
with anything in Phobos that takes a PRNG by value. Basically,
the PRNG will be duplicated, and generate the same sequence
over and over. Ergo, do NOT pass a specific random generator
to things like randomSample or randomShuffle.
This problem is one of the major reasons we are currently (and
slowly) re-designing random into random2.
So, there is a general agreement that in random2, RNGs should by
default get passed by reference everywhere? That's nice to
hear.
-----
Ivan Kazmenko.
The agreement is rather to make them reference types, so even
when passed by value, you won't accidentally duplicate them. This
is important as you sometimes pass random ranges to algorithms
that have nothing to do with randomness.
I'd say the "textbook" example would be:
//----
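import std.random;
import std.algorithm;
import std.stdio;

void main()
{
    uint[5] arr1;
    arr1[].fill(rndGen); // fill takes rndGen by value, so it advances a copy
    uint[5] arr2;
    arr2[].fill(rndGen); // rndGen itself never moved: same numbers again
    writeln("arr1: ", arr1[]);
    writeln("arr2: ", arr2[]);
}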
//----
arr1: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
arr2: [3622200385, 2579361262, 3208046123, 1753759120, 133131992]
//----
Oops!
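For contrast, here is a rough sketch of what "reference type" buys you. It is not the random2 design, just an illustration: SharedRng is a made-up wrapper class, and since classes are reference types in D, every copy of the handle advances the same underlying generator.

```d
import std.algorithm : fill;
import std.random : Mt19937;

// Hypothetical wrapper, not part of Phobos: the generator lives inside a
// class instance, so passing the handle by value still shares one state.
final class SharedRng(Gen)
{
    private Gen gen;

    this(uint seed) { gen = Gen(seed); }

    enum bool empty = false; // infinite range, like the wrapped PRNG
    @property auto front() { return gen.front; }
    void popFront() { gen.popFront(); }
}

// Fills two arrays from the same shared generator and reports whether
// the second array received different numbers from the first.
bool fillsDiffer(uint seed)
{
    auto rng = new SharedRng!Mt19937(seed);
    uint[5] arr1, arr2;
    arr1[].fill(rng); // fill copies the class reference, not the state
    arr2[].fill(rng); // continues where the first call stopped
    return arr1 != arr2;
}

void main()
{
    assert(fillsDiffer(42));
}
```

With this wrapper, fill has no idea it is dealing with a PRNG, yet the second call no longer repeats the first sequence.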