On Monday, 3 June 2013 at 17:35:22 UTC, Joseph Rushton Wakeling wrote:
> On 06/03/2013 07:00 PM, Diggory wrote:
>> Thanks for testing before dismissing completely :P The way it returns results can be improved a lot by pre-allocating a range of the necessary size / using a range passed in.
>
> Surely. :-)
>> The complexity is O(N²) where N is the number of samples out of M. The algorithm is something I just thought of, although I've possibly used it before, and I'm sure someone else has thought of it too. It's quite simple: it effectively says "pick a value from the range except for this value" recursively, but instead of dividing the range in two, it shifts the top part down first to make a contiguous range, and then shifts the results which are past the split up by one afterward.
> It's an interesting little piece of invention. I'll have to look around and see if there's anything similar documented. I've not seen anything in the literature which has this particular property of O(N) random variates plus O(N²) calculation.
>
> The fact that it's not sequential might have been a bit of an issue historically, which maybe explains why I haven't come across something like it.
>> I expect the Phobos implementation to be something like O(M), in which case my method would be faster whenever N < sqrt(M) (with the optimisations).
> No, the Phobos implementation is O(N) -- I wrote it :-) See:
> http://braingam.es/2012/07/sampling-d/
>
> The API is Andrei's -- the original implementation used what Knuth refers to as Algorithm S [which is really simple and easy to implement and check, but is indeed O(M)]; I updated it to use Algorithm D (which is complicated :-).
>
> So, since randomSample takes kN time where k is constant, your algorithm will probably win where N < k.
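For the record, Algorithm S really is only a few lines. Here's a sketch of Knuth's selection-sampling idea in Python rather than D (the function name and signature are mine, purely for illustration): scan all M items in order and keep item t with probability (samples still needed) / (items still unseen), which is what makes it O(M).

```python
import random

def algorithm_s(m, n, rng=None):
    """Sketch of Knuth's Algorithm S: n distinct samples from range(m).

    Scans every item, so the cost is O(m) regardless of n, but the
    output is produced in ascending order and exactly n items are
    always selected.
    """
    rng = rng or random.Random()
    out = []
    for t in range(m):
        # Keep item t with probability (n - chosen) / (m - t).
        if rng.random() * (m - t) < n - len(out):
            out.append(t)
            if len(out) == n:
                break
    return out
```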
I found that the sampling time was about equal at N = 50 with your algorithm as it's currently written.

I'd guess that the heavy use of floating-point arithmetic to calculate the step sizes gives Algorithm D a fairly large constant overhead even though its complexity is lower.
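For comparison, the shifting scheme you describe can be sketched like this, in Python rather than D (`sample_shift` is an illustrative name and my own reconstruction from your description, not your actual code): each step draws one variate from the values not yet taken, then shifts it up past every already-chosen value at or below it, so the draw lands in one of the remaining "holes".

```python
import bisect
import random

def sample_shift(m, n, rng=None):
    """Sketch of the shifting scheme: n distinct values from range(m).

    Uses exactly one random variate per sample (O(n) variates) but
    pays O(n^2) total work scanning and inserting into the sorted
    list of previous picks.
    """
    rng = rng or random.Random()
    chosen = []                       # previous picks, ascending
    for i in range(n):
        r = rng.randrange(m - i)      # one variate per sample
        # Ascending scan: every prior pick at or below r means r
        # must skip over that already-taken slot.
        for v in chosen:
            if r >= v:
                r += 1
            else:
                break
        bisect.insort(chosen, r)      # O(n) sorted insert
    return chosen
```

This keeps the integer arithmetic trivial, which may be why its constant factor beats Algorithm D's floating-point step-size calculations up to around N = 50, even though its asymptotic complexity is worse.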