On 8 September 2016 at 15:05, Danilo J. S. Bellini <danilo.bell...@gmail.com> wrote:
>> 1. The cognitive leap between shuffling and sampling isn't small
>
> I don't think so, actually the Fisher-Yates shuffle algorithm is
> an iterative sampling algorithm:
> https://gist.github.com/danilobellini/6384872

I'm not talking about mathematical equivalence, I'm talking about making an unaided leap from "shuffle this deck of cards" (random.shuffle) and "pick a card, any card" (random.choice) to "choose a random sample from this population" (random.sample). I can see people following that logic given suitable instruction (since they really are closely related operations), but it's a tough connection to see on your own.

>> 4. With a default, random.sample becomes more easily confused with
>> random.choice
>
> I don't think k=1 would be a good default sample size from a statistics
> point of view, but I get the point (I'm from a DSP background, where "a
> sample" means one single "value").

Likewise - it isn't that I think "1" would be a reasonable default, it's that without the second argument being there, I lapse back into DSP terminology rather than statistical terminology. Requiring the second argument as random.sample() does today keeps everything nicely unambiguous.

> Controlling the random function is required for the function to be really
> pure, else its output won't depend only on the inputs (and there would be
> some "state" in that "implicit input"). That would also be a great feature
> when non-uniform (or external) random number generators are to be used. This
> seems to be something that only shuffle gives some control (among the
> functions we're talking about), or am I missing something?

The module level "functions" in random are just bound methods of a default global random.Random() instance, so they're not truly pure - they depend on each other via the shared PRNG state. However, by creating your *own* Random instance, or another object that provides the same API, you can get a lot more control over things, including reproducible behaviour for a given seed.

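As a rough illustration of that last point (a minimal sketch, not anything taken from the docs): the helper name draw() and the seed value below are made up for the example, and only the standard random module is assumed. It checks that two of the module-level functions are bound to the same hidden instance, then runs shuffle, sample and choice through a private, explicitly seeded Random instance.

```python
import random

# The module-level functions are bound methods of one hidden Random
# instance, so they all read and advance the same PRNG state.
shared = random.shuffle.__self__
assert isinstance(shared, random.Random)
assert random.sample.__self__ is shared


def draw(seed):
    """Shuffle, sample and choose using a private, explicitly seeded PRNG.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)   # independent of the module-level state
    deck = list(range(10))
    rng.shuffle(deck)           # "shuffle this deck of cards"
    hand = rng.sample(deck, 3)  # "choose a random sample from this population"
    card = rng.choice(deck)     # "pick a card, any card"
    return deck, hand, card


first = draw(2016)
random.random()                 # disturbing the global PRNG changes nothing here
second = draw(2016)
assert first == second          # same seed, same results
```

With the seed passed in explicitly, the result depends only on the inputs, which is about as close to "pure" as these operations get.
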
I'm not familiar with how shuffle works internally, but presumably passing a non-uniform distribution is a way to let you bias the shuffle (the docs don't actually explain *why* you'd want to use a randomiser other than the default).

Cheers,
Nick.

-- 
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia