On Tue, Jun 26, 2018 at 05:36:51PM -0700, Abe Dillon wrote:

> The docs on random.sample indicate that it works with iterators:
>
> > To choose a sample from a range of integers, use a range()
> > <https://docs.python.org/3/library/stdtypes.html#range> object as an
> > argument. This is especially fast and space efficient for sampling from a
> > large population: sample(range(10000000),k=60).

That doesn't mention anything about iterators.

> However, when I try to use iterators other than range, like so:

range is not an iterator. Thinking that it is one is a very common
error, but it certainly is not. It is a lazily-generated *sequence*,
not an iterator.

The definition of an iterator is that the object must have an __iter__
method returning *itself*, and a __next__ method (the "iterator
protocol"):

py> obj = range(100)
py> hasattr(obj, '__next__')
False
py> obj.__iter__() is obj
False

However, it is a sequence:

py> import collections.abc
py> isinstance(obj, collections.abc.Sequence)
True

(Aside: I'm surprised there's no inspect.isiterator and .isiterable
functions.)

> random.sample(itertools.product(range(height), range(width)),
>               0.5*height*width)
>
> I get:
>
> TypeError: Population must be a sequence or set. For dicts, use list(d).
>
> I don't know if Python Ideas is the right channel for this, but this seems
> overly constrained. The inability to handle dictionaries is especially
> puzzling.

Puzzling in what way?

If sample() supported dicts, should it return the keys or the values or
both? Also consider this:

https://bugs.python.org/issue33098

> Randomly sampling from some population is often done because the entire
> population is impractically large which is also a motivation for using
> iterators, so it seems natural that one would be able to sample from an
> iterator. A naive implementation could use a heap queue:
>
> import heapq
> import random
>
> def stream():
>     while True: yield random.random()
>
> def sample(population, size):
>     q = [tuple()]*size
>     for el in zip(stream(), population):
>         if el > q[0]: heapq.heapreplace(q, el)
>     return [el[1] for el in q if el]

Is that an improvement over:

sample(list(itertools.islice(population, size)))

and if so, please explain.

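For concreteness, here is a minimal sketch of the iterator-versus-sequence
distinction discussed above, using the ABCs in collections.abc. The helper
names is_iterable and is_iterator are hypothetical (nothing by those names
exists in inspect), and the height and width values are made up for
illustration. It also shows one way the failing call could be made to work
today, assuming the population fits in memory: materialize the iterator
with list() and pass an integer sample size.

```python
import collections.abc
import itertools
import random


def is_iterable(obj):
    """Hypothetical helper: True if obj defines __iter__.

    Objects that are only iterable through __getitem__ are not detected.
    """
    return isinstance(obj, collections.abc.Iterable)


def is_iterator(obj):
    """Hypothetical helper: True if obj defines both __iter__ and __next__."""
    return isinstance(obj, collections.abc.Iterator)


height, width = 4, 6  # illustrative values only

r = range(100)
pairs = itertools.product(range(height), range(width))

# range is iterable and a sequence, but it is not an iterator.
range_checks = (
    is_iterable(r),
    is_iterator(r),
    isinstance(r, collections.abc.Sequence),
)

# product() returns an iterator, which is not a sequence.
product_checks = (
    is_iterable(pairs),
    is_iterator(pairs),
    isinstance(pairs, collections.abc.Sequence),
)

# Materializing the iterator gives sample() the sequence it requires.
# The sample size must be an int, hence the integer division.
chosen = random.sample(list(pairs), (height * width) // 2)
```

The cost of this workaround is that the whole population is held in
memory at once, which is exactly what the original post wants to avoid
for very large populations.
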
> It would also be helpful to add a ratio version of the function:
>
> def sample(population, size=None, *, ratio=None):
>     assert None in (size, ratio), "can't specify both sample size and ratio"
>     if ratio:
>         return [el for el in population if random.random() < ratio]
>     ...

Helpful under what circumstances?

Don't let the source speak for itself. Explain what it means. I
understand what sample(population, size=100) does. What would
sample(population, ratio=0.25) do?

(That's not a rhetorical question, I genuinely don't understand the
semantics of this proposed ratio argument.)


-- 
Steve
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/