Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

> This comment suggests that you have missed the general
> motivation for reservoir sampling.

Please don't get personal.  I've devoted a good deal of time thinking about 
your proposal.  Tim is also giving it an honest look. Please devote some time 
to honestly thinking about what we have to say.

FWIW, this is an area of expertise for me.  I too have been fascinated with 
reservoir sampling for several decades, have done a good deal of reading on the 
topic, and routinely performed statistical sampling as part of my job (where it 
needed to be done in a legally defensible manner).


> The idea of reservoir sampling is that you want to sample from
> an iterator, you only get one chance to iterate over it, and 
> you don't know a priori how many items it will yield.

Several thoughts:

* The need for sampling a generator or one-time stream of data is in the 
"almost never" category.  Presumably, that is why you don't find it in numpy or 
Julia.

* The examples you gave involved dicts or sets.  These aren't one-chance 
examples and we do know the length in advance.

* Whether talking about sets, dicts, generators, or arbitrary iterators, 
"sample(list(it), k)" would still work.  Both ways still have to consume the 
entire input before returning.  So really this is just an optimization, one 
that under some circumstances runs a bit faster, but one that forgoes a number 
of desirable characteristics of the existing tool.  

* IMO, sample_iter() is hard to use correctly.  In most cases, the users would 
be worse off than they are now and it would be challenging to communicate 
clearly under what circumstances they would be marginally better off.
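
For illustration only, here is a minimal sketch of the two approaches compared 
above.  The function names are hypothetical, and the one-pass version is the 
textbook "Algorithm R", not necessarily the sample_iter() implementation 
proposed in this issue.

```python
import random
from itertools import islice


def sample_via_list(iterable, k):
    """Materialize the input, then reuse the existing random.sample()."""
    return random.sample(list(iterable), k)


def reservoir_sample(iterable, k):
    """Hypothetical one-pass sampler (textbook Algorithm R).

    Uses O(k) memory, but still has to consume the entire input.
    """
    it = iter(iterable)
    reservoir = list(islice(it, k))      # start with the first k items
    if len(reservoir) < k:
        raise ValueError("sample larger than population")
    for i, item in enumerate(it, start=k):
        # item is the (i + 1)-th element seen so far;
        # it replaces a random slot with probability k / (i + 1).
        j = random.randrange(i + 1)
        if j < k:
            reservoir[j] = item
    # Unlike random.sample(), the result is not in random order:
    # untouched slots keep their original positions.
    return reservoir


if __name__ == "__main__":
    stream = (n * n for n in range(1000))    # a one-shot generator
    print(sample_via_list(stream, 5))
    stream = (n * n for n in range(1000))    # must be recreated to reuse
    print(reservoir_sample(stream, 5))
```

Both functions read every item before returning.  The difference is peak 
memory (the whole input versus k items) and the ordering of the result.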

At any rate, my recommendation stands.  This should not be part of the standard 
library's random module API.  Perhaps it could be a recipe or a see-also link.  
We really don't have to do this.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41311>
_______________________________________