The docs on random.sample indicate that it works with iterators:

> To choose a sample from a range of integers, use a range() 
> <https://docs.python.org/3/library/stdtypes.html#range> object as an 
> argument. This is especially fast and space efficient for sampling from a 
> large population: sample(range(10000000),k=60).


However, when I try to use iterators other than range, like so:

random.sample(itertools.product(range(height), range(width)),
              height*width//2)

I get:

TypeError: Population must be a sequence or set. For dicts, use list(d).
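
For this particular grid case there is a workaround that stays within the documented range() usage: sample flat indices and map them back to coordinates. This is only a sketch; the helper name sample_grid_cells is made up for illustration and is not part of the random module.

```python
import random

def sample_grid_cells(height, width, k):
    """Pick k distinct (row, col) cells without building the full product."""
    # range() is a sequence, so random.sample can index into it directly
    # instead of materializing every cell.
    flat_indices = random.sample(range(height * width), k)
    # divmod maps a flat index back to (row, col), matching the order in
    # which itertools.product(range(height), range(width)) enumerates cells.
    return [divmod(i, width) for i in flat_indices]

height, width = 4, 6
cells = sample_grid_cells(height, width, height * width // 2)
```

This only helps when the population can be indexed arithmetically, which is not true of iterators in general.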

I don't know if Python Ideas is the right channel for this, but this seems 
overly constrained. The inability to handle dictionaries is especially 
puzzling.

Randomly sampling from some population is often done because the entire 
population is impractically large, which is also a motivation for using 
iterators, so it seems natural that one would be able to sample from an 
iterator. A naive implementation could use a heap queue:

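import heapq
import random

def stream():
    # Endless supply of random sort keys in [0.0, 1.0).
    while True:
        yield random.random()

def sample(population, size):
    # Min-heap of (key, element) pairs; the empty tuples are placeholders
    # that compare smaller than any real pair.
    q = [tuple()]*size
    for el in zip(stream(), population):
        # Keep the pairs with the `size` largest random keys seen so far.
        if q and el > q[0]:
            heapq.heapreplace(q, el)
    # Drop placeholders left over when the population is shorter than size.
    return [el[1] for el in q if el]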
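
A different single-pass technique that avoids the heap is reservoir sampling (often called Algorithm R). The sketch below is an illustration under that swapped-in technique, not the approach above and not how random.sample is implemented; the name reservoir_sample is hypothetical.

```python
import itertools
import random

def reservoir_sample(iterable, k):
    """Return k items chosen uniformly at random from an arbitrary iterable."""
    iterator = iter(iterable)
    # Fill the reservoir with the first k items.
    reservoir = list(itertools.islice(iterator, k))
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    # The n-th item (1-based) lands in the reservoir with probability k/n,
    # overwriting a uniformly chosen slot.
    for n, item in enumerate(iterator, start=k + 1):
        j = random.randrange(n)
        if j < k:
            reservoir[j] = item
    return reservoir

height, width = 5, 7
picked = reservoir_sample(itertools.product(range(height), range(width)), 10)
```

It uses O(k) memory and one pass over the input. Unlike random.sample, the order of the returned items is not itself random, so a random.shuffle on the result would be needed if order matters.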

It would also be helpful to add a ratio version of the function: 

def sample(population, size=None, *, ratio=None):
    assert None in (size, ratio), "can't specify both sample size and ratio"
    if ratio is not None:
        # Keep each element independently with probability `ratio`.
        return [el for el in population if random.random() < ratio]
    ...
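
To make the ratio idea concrete on its own, here is a lazy variant written as a separate generator. It is a sketch only: the name sample_by_ratio and the range check are assumptions for illustration, not a proposed signature.

```python
import random

def sample_by_ratio(population, ratio):
    """Yield each item independently with probability `ratio`."""
    # The check runs when iteration starts, because this is a generator.
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    for item in population:
        # random.random() is in [0.0, 1.0), so ratio=1.0 keeps everything
        # and ratio=0.0 keeps nothing.
        if random.random() < ratio:
            yield item

kept = list(sample_by_ratio(range(1000), 0.5))
```

Note that the number of items kept varies from run to run; it is only ratio times the population size on average, whereas a size-based sample always returns exactly size items.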


_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
