On Monday, 3 June 2013 at 13:18:30 UTC, Joseph Rushton Wakeling
wrote:
On 06/03/2013 02:30 PM, Joseph Rushton Wakeling wrote:
On 06/03/2013 01:29 PM, Diggory wrote:
For small samples from very large ranges an efficient
algorithm would be:
int[] randomGen(int N, int M) {
if (N == 0)
return [];
int[] result = randomGen(N-1, M-1);
int num = rand(M);
foreach (ref int i; result)
if (i >= num)
++i;
result ~= num;
return result;
}
Are you sure about the efficiency of that algorithm? Looks
like it's be O(M) to me.
Apologies, I misread the algorithm slightly. Your qualifier is
quite correct --
when the number of sample points is very small (e.g. 5 out of
some arbitrary
very large number), that algorithm is very quick indeed (I ran
some tests as I
was curious:-), and can outperform D's randomSample.
It doesn't scale with the size of the input (your value M), but
with the number
of sample points n, but that scaling is very bad -- so as the
sample size grows
it becomes _much_ worse than randomSample.
Do you have a reference for the algorithm? Off the top of my
head I don't
recognize it from any of the texts I've read.
Thanks for testing before dismissing completely :P The way it
returns results can be improved a lot by pre-allocating a range
of the necessary size/using a range passed in.
The complexity is O(N²) where N is the number of samples out of
M. The algorithm is something I just thought of although I've
possibly used it before and I'm sure someone else has thought of
it too. It's quite simple, it effectively says "pick a value from
the range except for this value" recursively but instead of
dividing the range in two, it shifts the top part down first to
make a contiguous range and then shifts the results which are
past the split up one afterward.
I expect the phobos implementation to be something like O(M) in
which case my method would be faster whenever N < sqrt(M) (with
the optimisations)