Serhiy Storchaka <storchaka+cpyt...@gmail.com> added the comment:
> ISTM that if a generator produces so much data that it is infeasible to fit
> in memory, then it will also take a long time to loop over it and generate
> a random value for each entry.

Good point!

$ ./python -m timeit -s 'from random import sample as s' 's(range(10**6), 50)'
10000 loops, best of 5: 25.6 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**6)), 50)'
10 loops, best of 5: 31.5 msec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**6), 50)'
1 loop, best of 5: 328 msec per loop

$ ./python -m timeit -s 'from random import sample as s' 's(range(10**8), 50)'
10000 loops, best of 5: 26.9 usec per loop
$ ./python -m timeit -s 'from random import sample as s' 's(list(range(10**8)), 50)'
1 loop, best of 5: 3.41 sec per loop
$ ./python -m timeit -s 'from random import reservoir_sample as s' 's(range(10**8), 50)'
1 loop, best of 5: 36.5 sec per loop

It is possible that a generator produces not so much data, but that every item takes a lot of memory, so the total size still does not fit in memory. But I suppose that the generation time of larger items will be proportionally larger, so reservoir_sample() will be just as slow.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37682>
_______________________________________
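The reservoir_sample() benchmarked above is a proposed addition, not an existing stdlib function. A minimal sketch of what it would do, assuming the classic reservoir sampling Algorithm R (note the one randrange() call per consumed item, which is exactly the per-entry cost the quoted comment points out):

```python
import random

def reservoir_sample(iterable, k):
    """Return k items chosen uniformly from an iterable of unknown length.

    Algorithm R: keep the first k items, then replace a random slot of the
    reservoir with the i-th item with probability k/i.
    """
    it = iter(iterable)
    reservoir = []
    for _ in range(k):
        try:
            reservoir.append(next(it))
        except StopIteration:
            raise ValueError("sample larger than population")
    # The first remaining item is the (k+1)-th overall.
    for i, item in enumerate(it, start=k + 1):
        j = random.randrange(i)  # one random number per item: O(n) total
        if j < k:
            reservoir[j] = item
    return reservoir
```

This runs in a single pass with O(k) memory, which is why it was proposed for arbitrary iterators, but the per-item randrange() call is what makes it so much slower than random.sample() on a sequence with known length.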