On Wednesday, 18 April 2012 at 01:17:18 UTC, Joseph Rushton Wakeling wrote:
On 17/04/12 18:25, Joseph Rushton Wakeling wrote:
On 17/04/12 17:31, Andrei Alexandrescu wrote:
Actually that's not correct. RandomSample works fine with an input range and
does not keep it in memory.

Ahh, OK. I should have anticipated this as the output also works as a range.

A query on this point, as it looks like this can have some unfortunate side-effects.

If I write e.g.

    auto s = randomSample(iota(0,100), 5);

    foreach(uint i; s)
        writeln(i);

    writeln();

    foreach(uint i; s)
        writeln(i);

... then it's noticeable that the _first_ element of the two sample lists is identical, while the others change. If I put in place a specified source of randomness, e.g.

    auto s = randomSample(iota(0,100), 5, rndGen);

... then the numbers stay the same for both lists.

I'm presuming that the first element remains identical because it's defined in the constructor rather than elsewhere, but it's not clear why passing a defined RNG produces identical output on both occasions.

The effect can be worse with Vitter's Algorithm D because often, having defined the first element, the second may be derived deterministically -- meaning the first 2 elements of the sample are identical.

I'm sure that the above use of the RandomSample struct is not the advised use, but it's still disconcerting to see this.

I've run into this trap more than once. :) You have to pass the random number generator by ref, otherwise you are just generating two identical sequences of random numbers. Just change the randomSample signature to:

auto randomSampleVitter(R, Random)(R r, size_t n, ref Random gen)

-Lars

Reply via email to