On Tue, Dec 11, 2018 at 10:32 AM Ralf Gommers <ralf.gomm...@gmail.com> wrote:

>
> On Mon, Dec 10, 2018 at 10:27 AM Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
>>
>> On 12/10/18, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
>> > On Sun, Dec 9, 2018 at 2:00 PM Alan Isaac <alan.is...@gmail.com> wrote:
>> >
>> >> I believe this was proposed in the past to little enthusiasm,
>> >> with the response, "you're using a library; learn its functions".
>> >>
>> >
>> > Not only that, NumPy and the core libraries around it are the standard for
>> > numerical/statistical computing. If core Python devs want to replicate a
>> > small subset of that functionality in a new Python version like 3.6, it
>> > would be sensible for them to choose compatible names. I don't think
>> > there's any justification for us to bother our users based on new things
>> > that get added to the stdlib.
>> >
>> >
>> >> Nevertheless, given the addition of `choices` to the Python
>> >> random module in 3.6, it would be nice to have the *same name*
>> >> for parallel functionality in numpy.random.
>> >>
>> >> And given the redundancy of numpy.random.sample, it would be
>> >> nice to deprecate it with the intent to reintroduce
>> >> the name later, better aligned with Python's usage.
>> >>
>> >
>> > No, there is nothing wrong with the current API, so I'm -10 on deprecating
>> > it.
>>
>> Actually, the `numpy.random.choice` API has one major weakness. When
>> `replace` is False and `size` is greater than 1, the function is actually
>> drawing *one* sample from a multivariate distribution. For the other
>> multivariate distributions (multinomial, multivariate_normal and
>> dirichlet), `size` sets the number of samples to draw from the
>> distribution. With `replace=False` in `choice`, size becomes a *parameter*
>> of the distribution, and it is only possible to draw one (multivariate)
>> sample.
>>
>
> I'm not sure I follow.
> `choice` draws samples from a given 1-D array, more than 1:
>
> In [12]: np.random.choice(np.arange(5), size=2, replace=True)
> Out[12]: array([2, 2])
>
> In [13]: np.random.choice(np.arange(5), size=2, replace=False)
> Out[13]: array([3, 0])
>
> The multivariate distribution you're talking about is for generating the
> indices I assume. Does the current implementation actually give a result
> for size>1 that has different statistical properties from calling the
> function N times with size=1? If so, that's definitely worth a bug report
> at least (I don't think there is one for this).
>

There is no bug, just a limitation in the API.

When I draw without replacement, say, three values from a collection of length five, the three values that I get are not independent. So really, this is *one* sample from a three-dimensional (discrete-valued) distribution. The problem with the current API is that I can't get multiple samples from this three-dimensional distribution in one call. If I need to repeat the process six times, I have to use a loop, e.g.:

    >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False, size=3) for _ in range(6)]

With the `select` function I described in my previous email, which I'll call `random_select` here, the parameter that determines the number of items per sample, `nsample`, is separate from the parameter that determines the number of samples, `size`:

    >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
    >>> samples
    array([[30, 40, 50],
           [40, 50, 30],
           [10, 20, 40],
           [20, 30, 50],
           [40, 20, 50],
           [20, 10, 30]])

(`select` is a really bad name, since `numpy.select` already exists and is something completely different. I had the longer name `random.select` in mind when I started using it. "There are only two hard problems..." etc.)
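To make the separation of `nsample` and `size` concrete, here is a minimal sketch of how a function with this behaviour could be written. It is not the prototype from the gist: it handles only the equal-probability case (no `p` argument), the `rng` parameter is an assumption added for reproducibility, and it relies on `np.random.default_rng`, which only exists in newer NumPy releases. The idea is to draw one row of uniform random keys per sample and take the argsort of each row, which gives an independent random permutation of the indices per sample; the first `nsample` entries of each permutation are then a draw without replacement.

```python
import numpy as np


def random_select(items, nsample=None, size=None, rng=None):
    """Sketch: draw `size` independent samples from 1-D `items`,
    each holding `nsample` items chosen without replacement.

    Uniform probabilities only; `rng` is a hypothetical extra argument.
    """
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    n = len(items)

    # Normalise `size` to a tuple: None -> (), int -> (size,).
    if size is None:
        shape = ()
    elif isinstance(size, (int, np.integer)):
        shape = (int(size),)
    else:
        shape = tuple(size)

    # nsample=None behaves like nsample=1 with the trailing axis dropped.
    k = 1 if nsample is None else nsample
    if not 0 <= k <= n:
        raise ValueError("nsample must satisfy 0 <= nsample <= len(items)")

    # One row of i.i.d. uniform keys per sample; argsort along the last
    # axis turns each row into a random permutation of range(n).
    keys = rng.random(shape + (n,))
    idx = np.argsort(keys, axis=-1)[..., :k]

    result = items[idx]  # shape is `shape + (k,)`
    if nsample is None:
        result = result[..., 0]  # drop the trivial axis
    return result


samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
print(samples.shape)  # (6, 3)
```

Sorting a full row of keys costs O(n log n) per sample, so this is only meant to illustrate the shape semantics (`size + (nsample,)`), not to be an efficient implementation.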
Warren


> Cheers,
> Ralf
>
>
>> I thought about this some time ago, and came up with an API that
>> eliminates the boolean flag, and separates the `size` argument from the
>> number of items drawn in one sample, which I'll call `nsample`. To avoid
>> creating a "false friend" with the standard library and with numpy's
>> `choice`, I'll call this function `select`.
>>
>> Here's the proposed signature and docstring. (A prototype implementation
>> is in a gist at
>> https://gist.github.com/WarrenWeckesser/2e5905d116e710914af383ee47adc2bf.)
>> The key feature is the `nsample` argument, which sets how many items to
>> select from the given collection. Also note that this function is *always*
>> drawing *without replacement*. It covers the `replace=True` case because
>> drawing one item without replacement is the same as drawing one item with
>> replacement.
>>
>> Whether or not an API like the following is used, it would be nice if
>> there was some way to get multiple samples in the `replace=False` case in
>> one function call.
>>
>> def select(items, nsample=None, p=None, size=None):
>>     """
>>     Select random samples from `items`.
>>
>>     The function randomly selects `nsample` items from `items` without
>>     replacement.
>>
>>     Parameters
>>     ----------
>>     items : sequence
>>         The collection of items from which the selection is made.
>>     nsample : int, optional
>>         Number of items to select without replacement in each draw.
>>         It must be between 0 and len(items), inclusive.
>>     p : array-like of floats, same length as `items`, optional
>>         Probabilities of the items. If this argument is not given,
>>         the elements in `items` are assumed to have equal probability.
>>     size : int, optional
>>         Number of variates to draw.
>>
>>     Notes
>>     -----
>>     `size=None` means "generate a single selection".
>>
>>     If `size` is None, the result is equivalent to
>>         numpy.random.choice(items, size=nsample, replace=False)
>>
>>     `nsample=None` means draw one (scalar) sample.
>>     If `nsample` is None, the function acts (almost) like nsample=1 (see
>>     below for more information), and the result is equivalent to
>>         numpy.random.choice(items, size=size)
>>     In effect, it does choice with replacement. The case `nsample=None`
>>     can be interpreted as each sample is a scalar, and `nsample=k`
>>     means each sample is a sequence with length k.
>>
>>     If `nsample` is not None, it must be a nonnegative integer with
>>     0 <= nsample <= len(items).
>>
>>     If `size` is not None, it must be an integer or a tuple of integers.
>>     When `size` is an integer, it is treated as the tuple ``(size,)``.
>>
>>     When both `nsample` and `size` are not None, the result
>>     has shape ``size + (nsample,)``.
>>
>>     Examples
>>     --------
>>     Make 6 choices with replacement from [10, 20, 30, 40]. (This is
>>     equivalent to "Make 1 choice without replacement from [10, 20, 30, 40];
>>     do it six times.")
>>
>>     >>> select([10, 20, 30, 40], size=6)
>>     array([20, 20, 40, 10, 40, 30])
>>
>>     Choose two items from [10, 20, 30, 40] without replacement. Do it six
>>     times.
>>
>>     >>> select([10, 20, 30, 40], nsample=2, size=6)
>>     array([[40, 10],
>>            [20, 30],
>>            [10, 40],
>>            [30, 10],
>>            [10, 30],
>>            [10, 20]])
>>
>>     When `nsample` is an integer, there is always an axis at the end of the
>>     result with length `nsample`, even when `nsample=1`. For example, the
>>     shape of the array returned in the following call is (2, 3, 1)
>>
>>     >>> select([10, 20, 30, 40], nsample=1, size=(2, 3))
>>     array([[[10],
>>             [30],
>>             [20]],
>>
>>            [[10],
>>             [40],
>>             [20]]])
>>
>>     When `nsample` is None, it acts like `nsample=1`, but the trivial
>>     dimension is not included. The shape of the array returned in the
>>     following call is (2, 3).
>>
>>     >>> select([10, 20, 30, 40], size=(2, 3))
>>     array([[20, 40, 30],
>>            [30, 20, 40]])
>>
>>     """
>>
>>
>> Warren
>>
>> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion
>