On Wednesday, 18 April 2012 at 01:17:18 UTC, Joseph Rushton
Wakeling wrote:
On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
On 17/04/12 17:31, Andrei Alexandrescu wrote:
Actually that's not correct. RandomSample works fine with an
input range and
does not keep it in memory.
Ahh, OK. I should have anticipated this as the output also
works as a range.
A query on this point, as it looks like this can have some
unfortunate side-effects.
If I write e.g.
auto s = randomSample(iota(0,100), 5);
foreach(uint i; s)
writeln(i);
writeln();
foreach(uint i; s)
writeln(i);
... then it's noticeable that the _first_ element of the two
sample lists is identical, while the others change. If I put
in place a specified source of randomness, e.g.
auto s = randomSample(iota(0,100), 5, rndGen);
... then the numbers stay the same for both lists.
I'm presuming that the first element remains identical because
it's defined in the constructor rather than elsewhere, but it's
not clear why passing a defined RNG produces identical output
on both occasions.
The effect can be worse with Vitter's Algorithm D because
often, having defined the first element, the second may be
derived deterministically -- meaning the first 2 elements of
the sample are identical.
I'm sure that the above use of the RandomSample struct is not
the advised use, but it's still disconcerting to see this.
I've run into this trap more than once. :) You have to pass the
random number generator by ref, otherwise you are just generating
two identical sequences of random numbers. Just change the
randomSample signature to:
auto randomSampleVitter(R, Random)(R r, size_t n, ref Random
gen)
-Lars