On Monday, 1 May 2017 at 04:15:35 UTC, H. S. Teoh wrote:
Given a set A of n elements (let's say it's a random-access range of size n, where n is relatively large), and a predicate P(x) that specifies some subset of A of elements that we're interested in, what's the best algorithm (in terms of big-O time complexity) for selecting a random element x satisfying P(x), such that elements that satisfy P(x) have equal probability of being chosen? (Elements that do not satisfy P(x) are disregarded.)
I'd like to note here that, if you make use of the same P(x) many
times (instead of different predicates on each call), it makes
sense to spend O(n) time and memory filtering by that predicate
and storing the result, and then answer each query in O(1).
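For example, a minimal sketch of that caching approach could look like this (the array and the predicate below are just placeholders):

    import std.algorithm : filter;
    import std.array : array;
    import std.random : uniform;
    import std.stdio : writeln;

    void main() {
        auto a = [3, 7, 12, 25, 40, 41];   // placeholder for the set A
        alias P = (int x) => x % 2 == 1;   // placeholder for the predicate P(x)

        auto good = a.filter!P.array;      // O(n) preprocessing, done once

        if (good.length > 0)               // each subsequent query is O(1)
            writeln(good[uniform(0, good.length)]);
    }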
3) Permutation walk:

    auto r = ... /* elements of A */
    foreach (i; iota(0 .. r.length).randomPermutation) {
        if (P(r[i])) return r[i];
    }
    /* no elements satisfy P(x) */

Advantages: if an element that satisfies P(x) is found early, the loop will terminate before n iterations. This seems like the best of both worlds of (1) and (2), except:

Disadvantages: AFAIK, generating a random permutation of indices from 0 .. n requires at least O(n) time, so any advantage we may have had seems to be negated.
Is there an algorithm for *incrementally* generating a random
permutation of indices? If there is, we could use that in (3)
and thus achieve the best of both worlds between early
termination if an element satisfying P(x) is found, and
guaranteeing termination after n iterations if no element
satisfying P(x) exists.
Yes, there is.
There are actually two variations of the Fisher-Yates shuffle
(https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle):
1.

    import std.algorithm, std.array, std.random, std.range;

    auto p = n.iota.array;
    foreach (pos; 0 .. n) {
        auto otherPos = uniform(0, pos + 1);  // a random position among the first pos + 1
        swap(p[pos], p[otherPos]);
    }
When we look at this after the k-th iteration, the first k elements
are randomly and uniformly permuted, and the remaining (n-k) are left
untouched.
2.

    auto p = n.iota.array;  // same imports as in variation 1
    foreach (pos; 0 .. n) {
        auto otherPos = uniform(pos, n);  // a random position in the untouched tail pos .. n
        swap(p[pos], p[otherPos]);
    }
When we look at this after the k-th iteration, the first k elements
are a random combination of all n elements, and this combination
is randomly and uniformly permuted. So, the second variation is
what we need: each new element is selected uniformly at random
from all the elements left. Once we get the first element
satisfying the predicate, we can just terminate the loop. If
there are m out of n elements satisfying the predicate, the
average number of steps is about n/m.
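To make that concrete, here is a rough sketch of variation 2 with early termination; the function name pickRandomSatisfying is made up, and P is passed as a template alias:

    import std.algorithm : swap;
    import std.array : array;
    import std.random : uniform;
    import std.range : iota;

    // Returns the index of a uniformly random element of r satisfying P,
    // or -1 if no element satisfies P.
    ptrdiff_t pickRandomSatisfying(alias P, R)(R r) {
        immutable n = r.length;
        auto p = iota(n).array;                // index permutation, O(n) to set up
        foreach (pos; 0 .. n) {
            immutable otherPos = uniform(pos, n);
            swap(p[pos], p[otherPos]);         // variation 2: take from the untouched tail
            if (P(r[p[pos]]))
                return cast(ptrdiff_t) p[pos]; // early exit, about n/m steps on average
        }
        return -1;
    }

Usage would be something like pickRandomSatisfying!(x => x > 0)(data).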
Now, the problem is that both of these allocate n size_t's of
memory to start with. And your problem does not allow shuffling
the elements of the original array in place, so we do need an
external permutation for these algorithms. However, there are at
least two ways to mitigate that:
(I)
We can allocate the permutation once, using O(n) time and memory, and
then, on every call, just reuse it in its current state in O(n/m)
average time. It does not matter that the permutation is no longer the
identity permutation: by a symmetry argument, any starting permutation
will do just fine.
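A rough sketch of mitigation (I), with the permutation kept alive between calls in a small struct (the names LazyPicker and pick are made up):

    import std.algorithm : swap;
    import std.array : array;
    import std.random : uniform;
    import std.range : iota;

    struct LazyPicker {
        size_t[] p;  // allocated once, reused by every call

        this(size_t n) { p = iota(n).array; }  // O(n) setup

        // Same loop as before, but it starts from whatever permutation the
        // previous call left behind; by the symmetry argument, that is fine.
        ptrdiff_t pick(alias P, R)(R r) {
            foreach (pos; 0 .. p.length) {
                immutable otherPos = uniform(pos, p.length);
                swap(p[pos], p[otherPos]);
                if (P(r[p[pos]]))
                    return cast(ptrdiff_t) p[pos];
            }
            return -1;
        }
    }

For example: auto picker = LazyPicker(data.length); auto i = picker.pick!(x => x > 0)(data);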
(II)
We can store the permutation p in an associative array instead of
a regular array, actually storing only the elements accessed at
least once and assuming that all other elements satisfy the identity
p[x] = x. So, if we finish in n/m steps on average, the time and
extra memory used will be O(n/m) too. I can put together an
example implementation if this best satisfies your requirements.
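As a rough sketch of what that could look like (the name pickRandomSatisfyingSparse is made up):

    import std.random : uniform;

    // Missing keys of the associative array are treated as p[x] = x, so only
    // positions actually touched by the loop consume memory.
    ptrdiff_t pickRandomSatisfyingSparse(alias P, R)(R r) {
        size_t[size_t] p;
        immutable n = r.length;
        foreach (pos; 0 .. n) {
            immutable otherPos = uniform(pos, n);
            immutable pAtPos = p.get(pos, pos);             // p[pos], defaulting to pos
            immutable pAtOther = p.get(otherPos, otherPos); // p[otherPos], defaulting to otherPos
            p[pos] = pAtOther;                              // perform the swap sparsely
            p[otherPos] = pAtPos;
            if (P(r[pAtOther]))
                return cast(ptrdiff_t) pAtOther;            // O(steps) time and extra memory so far
        }
        return -1;  // no element satisfies P
    }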
Ivan Kazmenko.