randomSample with unknown length

Magnus Lie Hetland Wed, 02 Feb 2011 04:06:10 -0800

Reading the doc for std.random.randomSample, I saw that "The total length of r must be known". There are rather straightforward algorithms for drawing random samples *without* knowing this. This might be useful if one wants to support input ranges, I guess?

Take, for example, the method described by Knuth (TAoCP 2), for selecting n elements uniformly at random from an input range:


- Select the first n elements as the current sample.

- Each subsequent element is rejected with a probability of 1 - n/t, where t is the number of elements seen so far (including the current one).

- If a new item is selected, it replaces a random item in the current sample.
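
The steps above can be sketched roughly as follows. This is Python rather than D, just to keep it short; the name reservoir_sample and the rng parameter are made up for illustration and are not part of std.random or any other library.

```python
import random
from itertools import islice

def reservoir_sample(items, n, rng=random):
    """Draw n items uniformly, without replacement, from an iterable
    whose length is not known in advance (hypothetical helper)."""
    it = iter(items)
    # Step 1: the first n elements form the initial sample.
    sample = list(islice(it, n))
    # t is the number of elements seen so far, including the current one.
    for t, item in enumerate(it, start=n + 1):
        # Step 2: keep the element with probability n/t,
        # i.e. reject it with probability 1 - n/t.
        if rng.random() < n / t:
            # Step 3: it replaces a random item in the current sample.
            sample[rng.randrange(n)] = item
    return sample
```

Something like reservoir_sample(open("words.txt"), 10) would then pick 10 lines from a file in a single pass (the file name is only an example). If the input has fewer than n elements, this sketch simply returns all of them.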

A cool property of this is that at any time, the current sample is one drawn randomly (i.e., uniformly, without replacement) from the items you've seen so far, so you could really stop at any point. That is, stop iterating over the input; you can't really give the output as a small-footprint range here (as far as I can see). Seems you have to allocate room for n pointers. (Or you *could* just keep track of which objects were swapped in -- might be worth the overhead if n is large compared to the input size.)
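
A quick sketch of why the property holds, at least for the per-item inclusion probability (the notation is mine, not Knuth's): after the first n items, each is in the sample with probability n/n = 1. Now suppose each of the first t-1 items is in the sample with probability n/(t-1). Item t is selected with probability n/t, and when it is, it evicts any given sample member with probability 1/n. So for an earlier item i:

```latex
\Pr[\text{item } i \text{ in sample after } t \text{ items}]
  = \frac{n}{t-1}\left(1 - \frac{n}{t}\cdot\frac{1}{n}\right)
  = \frac{n}{t-1}\cdot\frac{t-1}{t}
  = \frac{n}{t},
  \qquad i < t,\; t > n.
```

Item t itself is in the sample with probability n/t by construction, so every item seen so far is equally likely to be in the current sample.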


--
Magnus Lie Hetland
http://hetland.org
