On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El <[email protected]> wrote:
> On Wed, Jan 18, 2017 at 11:00 AM, [email protected] <[email protected]>
> wrote:
>
>> Let's look at what the user asked this function, and what it returns:
>>
>>> User asks: please give me random pairs of the three items, where item 1
>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>
>>> Function returns: random pairs, where if you make many random returned
>>> results (as in the law of large numbers) and look at the items they
>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>> 0.38333.
>>> These are not (quite) the probabilities the user asked for...
>>>
>>> Can you explain a sense where the user's requested probabilities (0.2,
>>> 0.4, 0.4) are actually adhered to in the results which random.choice
>>> returns?
>>
>> I think that the question the user is asking by specifying p is a
>> slightly different one:
>> "please give me random pairs of the three items extracted from a
>> population of 3 items where item 1 has probability of being extracted of
>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>> extracted."
>
> You are right, if that is what the user wants, numpy.random.choice does
> the right thing.
>
> I'm just wondering whether this is actually what users want, and whether
> they understand this is what they are getting.
>
> As I said, I expected it to generate pairs with, empirically, the desired
> distribution of individual items. The documentation of numpy.random.choice
> seemed to me (wrongly) to imply that that's what it does. So I was
> surprised to realize that it does not.

As Alessandro and you showed, the function returns something that makes
sense. If the user wants something different, then they need to look for a
different function, which is however difficult if it doesn't have a
solution in general.

Sounds to me a bit like a Monty Hall problem. Whether we like it or not, or
find it counterintuitive, it is what it is given the sampling scheme.
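To make the numbers concrete, here is a small sketch in plain Python (no
numpy needed). It assumes the scheme Alessandro describes: items are drawn
one at a time, each drawn item is removed, and the remaining weights are
renormalized. The function names item_shares and pair_probs_for_target are
made up for this illustration and are not part of numpy. The second function
only covers the special case of unordered pairs from 3 items, and shows when
the "each item is a share p[i] of all drawn items" reading has a solution at
all.

```python
from fractions import Fraction
from itertools import permutations


def item_shares(p, k):
    """Exact long-run share of each item among all drawn items.

    Assumes k items are drawn sequentially without replacement, with the
    remaining weights renormalized after every draw (hypothetical helper).
    """
    n = len(p)
    shares = [Fraction(0)] * n
    for seq in permutations(range(n), k):
        prob = Fraction(1)
        remaining = Fraction(1)  # total weight still in the population
        for i in seq:
            prob *= p[i] / remaining
            remaining -= p[i]
        for i in seq:
            shares[i] += prob / k  # each draw yields k items
    return shares


def pair_probs_for_target(p):
    """Pair probabilities that make item i exactly a share p[i] of all
    drawn items, for unordered pairs out of 3 items; None if impossible.
    """
    # Item i must be included with probability 2*p[i], so the single pair
    # that omits item i must have probability 1 - 2*p[i].
    probs = {}
    for omitted in range(3):
        pair = tuple(i for i in range(3) if i != omitted)
        probs[pair] = 1 - 2 * p[omitted]
    if any(q < 0 for q in probs.values()):
        return None  # some p[i] > 1/2: no such sampling scheme exists
    return probs


p = [Fraction(2, 10), Fraction(4, 10), Fraction(4, 10)]
print([float(s) for s in item_shares(p, 2)])
print(pair_probs_for_target(p))
print(pair_probs_for_target([Fraction(6, 10), Fraction(2, 10), Fraction(2, 10)]))
```

Under that assumption, the first call gives shares 7/30, 23/60 and 23/60,
i.e. the 0.2333 and 0.38333 quoted above. For p = (0.2, 0.4, 0.4) a pair
distribution with the expected item shares does exist, but for
p = (0.6, 0.2, 0.2) it cannot, since item 1 would have to appear in more
than every pair.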
Having more sampling schemes would be useful, but it's not possible to
implement sampling schemes with impossible properties.

Josef

>
> Nadav.
_______________________________________________
NumPy-Discussion mailing list
[email protected]
https://mail.scipy.org/mailman/listinfo/numpy-discussion
