The docs on random.sample indicate that it works with iterators:
> To choose a sample from a range of integers, use a range()
> <https://docs.python.org/3/library/stdtypes.html#range> object as an
> argument. This is especially fast and space efficient for sampling from a
> large population: sample(range(10000000),k=60).
However, when I try to use iterators other than range, like so:
    random.sample(itertools.product(range(height), range(width)),
                  height * width // 2)
I get:
    TypeError: Population must be a sequence or set. For dicts, use list(d).
I don't know if Python Ideas is the right channel for this, but this seems
overly constrained. The inability to handle dictionaries is especially
puzzling.
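For this particular case there is a workaround that stays within the documented range() advice: sample flat indices and decode each one into a (row, column) pair with divmod. This is only a sketch, and it assumes the population is the Cartesian product of two ranges; the helper name sample_grid is hypothetical and not part of the random module.

```python
import random


def sample_grid(height, width, k):
    """Return k distinct (row, col) pairs without building the full grid.

    sample_grid is a hypothetical helper, not a standard library function.
    """
    # range() is a sequence, so random.sample accepts it directly.
    flat_indices = random.sample(range(height * width), k)
    # divmod(i, width) maps a flat index back to (row, col).
    return [divmod(i, width) for i in flat_indices]


if __name__ == "__main__":
    print(sample_grid(4, 5, 10))
```

This does not help with arbitrary iterators, which is the general case raised here.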
Random sampling is often done because the entire population is
impractically large, which is also a motivation for using iterators, so it
seems natural to be able to sample from an iterator. A naive
implementation could use a heap queue:
    import heapq
    import random

    def stream():
        while True:
            yield random.random()

    def sample(population, size):
        q = [tuple()] * size
        for el in zip(stream(), population):
            if el > q[0]:
                heapq.heapreplace(q, el)
        return [el[1] for el in q if el]
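For comparison, the same single-pass behaviour can be sketched without a heap using the classic reservoir sampling scheme (often called Algorithm R). This is a different technique from the heap-based version above, shown only as an illustration; the name reservoir_sample is hypothetical.

```python
import itertools
import random


def reservoir_sample(iterable, k):
    """Return up to k items chosen uniformly at random from any iterable.

    reservoir_sample is a hypothetical name. It makes one pass and keeps
    only k items in memory.
    """
    it = iter(iterable)
    # Fill the reservoir with the first k items (fewer if the input is short).
    reservoir = list(itertools.islice(it, k))
    # Items from index k onward replace a random slot with probability k/(i+1).
    for i, item in enumerate(it, start=k):
        j = random.randint(0, i)  # inclusive on both ends
        if j < k:
            reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    grid = itertools.product(range(10), range(10))
    print(reservoir_sample(grid, 5))
```

Unlike random.sample, this sketch returns fewer than k items instead of raising ValueError when the population is shorter than k.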
It would also be helpful to add a ratio version of the function:
    def sample(population, size=None, *, ratio=None):
        assert None in (size, ratio), "can't specify both sample size and ratio"
        if ratio is not None:
            return [el for el in population if random.random() < ratio]
        ...
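The ratio branch keeps each element independently with probability ratio, so the sample size varies from run to run and only averages ratio times the population size. As a rough sketch, the same idea can be written as a generator so that it also works lazily on very long or unbounded iterators; the name sample_ratio and the rng parameter are assumptions made for illustration.

```python
import random


def sample_ratio(population, ratio, rng=random):
    """Lazily yield each element of population with probability ratio.

    sample_ratio and its rng parameter are hypothetical; rng only needs a
    random() method returning a float in [0.0, 1.0).
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    for el in population:
        if rng.random() < ratio:
            yield el


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    print(list(sample_ratio(range(20), 0.5, rng)))
```

Because elements are yielded in input order, the result preserves the population's ordering, which random.sample does not.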