The docs on random.sample indicate that it works with iterators:

> To choose a sample from a range of integers, use a range()
> <https://docs.python.org/3/library/stdtypes.html#range> object as an
> argument. This is especially fast and space efficient for sampling from a
> large population: sample(range(10000000),k=60).
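For context, here is a minimal sketch of why the documented range() example works. range is a lazy Sequence: it supports len() and indexing without storing its elements, so random.sample can pick random indices directly. A plain iterator has neither, so it is rejected. The same property suggests one workaround for grid-shaped populations. You can sample flat indices from range(height * width) and map each one back to a (row, col) pair with divmod, which matches the row-major order of itertools.product(range(height), range(width)). The helper sample_grid_cells and the grid size are hypothetical and are used only for illustration.

```python
import random
from collections.abc import Sequence

# range() is a lazy Sequence (len() and indexing, no stored elements),
# which is what random.sample relies on.
population = range(10_000_000)
assert isinstance(population, Sequence)
picked = random.sample(population, k=60)

# A true iterator has no len() or indexing, so sample() raises TypeError.
try:
    random.sample(iter(range(10)), k=3)
    rejected = False
except TypeError:
    rejected = True

# Hypothetical grid size, for illustration only.
height, width = 4, 6

def sample_grid_cells(height, width, k):
    """Hypothetical helper: sample k distinct (row, col) pairs, as yielded by
    itertools.product(range(height), range(width)), without building the product."""
    flat = random.sample(range(height * width), k)
    # Flat index i corresponds to row i // width, column i % width.
    return [divmod(i, width) for i in flat]

cells = sample_grid_cells(height, width, k=(height * width) // 2)
```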
However, when I try to use iterators other than range, like so:

```python
random.sample(itertools.product(range(height), range(width)), int(0.5 * height * width))
```

I get:

```
TypeError: Population must be a sequence or set. For dicts, use list(d).
```

I don't know if Python Ideas is the right channel for this, but this seems overly constrained. The inability to handle dictionaries is especially puzzling. People often sample from a population because the entire population is impractically large. That is also a motivation for using iterators, so it seems natural to be able to sample from an iterator.

A naive implementation could use a heap queue:

```python
import heapq
import itertools
import random

def stream():
    # Endless stream of random sort keys.
    while True:
        yield random.random()

def sample(population, size):
    # Min-heap of the `size` largest (key, index, item) tuples seen so far;
    # empty tuples compare lower than any entry, so they act as placeholders.
    q = [tuple()] * size
    # itertools.count() breaks key ties so items themselves are never compared.
    for el in zip(stream(), itertools.count(), population):
        if el > q[0]:
            heapq.heapreplace(q, el)
    return [el[2] for el in q if el]
```

It would also be helpful to add a ratio version of the function:

```python
def sample(population, size=None, *, ratio=None):
    assert None in (size, ratio), "can't specify both sample size and ratio"
    if ratio is not None:
        # Keep each element independently with probability `ratio`,
        # so the result size varies around ratio * len(population).
        return [el for el in population if random.random() < ratio]
    ...
```
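For comparison, here is a minimal sketch of a different single-pass technique: reservoir sampling (Algorithm R). It is shown as an alternative to the heap-based version, not as an existing `random` API. The name `reservoir_sample` and the grid size in the usage lines are hypothetical and are used only for illustration. Like the heap approach, it reads the iterator once and keeps only k items in memory.

```python
import itertools
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(iterable: Iterable[T], k: int) -> List[T]:
    """Hypothetical helper: return k items chosen uniformly without replacement
    from one pass over `iterable` (reservoir sampling, Algorithm R)."""
    if k < 0:
        raise ValueError("sample size must be non-negative")
    reservoir: List[T] = []
    for n, item in enumerate(iterable):
        if n < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item n (0-based) enters with probability k / (n + 1),
            # replacing a uniformly chosen slot.
            j = random.randrange(n + 1)
            if j < k:
                reservoir[j] = item
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # Shuffle so the output order is random, as with random.sample.
    random.shuffle(reservoir)
    return reservoir

# Usage with the iterator from the example (hypothetical grid size).
height, width = 4, 6
cells = reservoir_sample(itertools.product(range(height), range(width)),
                         k=(height * width) // 2)
```

The trade-off is as follows. The heap version may do O(log k) heap work per element and compares tuples. Algorithm R draws one random integer per element after the first k and never compares population items, so it also handles unorderable elements such as dicts. Neither approach works on an infinite iterator, because every element must be seen before the sample is final.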
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/