On Mon, Dec 10, 2018 at 10:27 AM Warren Weckesser <warren.weckes...@gmail.com> wrote:
> On 12/10/18, Ralf Gommers <ralf.gomm...@gmail.com> wrote:
> > On Sun, Dec 9, 2018 at 2:00 PM Alan Isaac <alan.is...@gmail.com> wrote:
> >
> >> I believe this was proposed in the past to little enthusiasm,
> >> with the response, "you're using a library; learn its functions".
> >>
> >
> > Not only that, NumPy and the core libraries around it are the standard
> > for numerical/statistical computing. If core Python devs want to
> > replicate a small subset of that functionality in a new Python version
> > like 3.6, it would be sensible for them to choose compatible names. I
> > don't think there's any justification for us to bother our users based
> > on new things that get added to the stdlib.
> >
> >
> >> Nevertheless, given the addition of `choices` to the Python
> >> random module in 3.6, it would be nice to have the *same name*
> >> for parallel functionality in numpy.random.
> >>
> >> And given the redundancy of numpy.random.sample, it would be
> >> nice to deprecate it with the intent to reintroduce
> >> the name later, better aligned with Python's usage.
> >>
> >
> > No, there is nothing wrong with the current API, so I'm -10 on
> > deprecating it.
>
> Actually, the `numpy.random.choice` API has one major weakness. When
> `replace` is False and `size` is greater than 1, the function is actually
> drawing *one* sample from a multivariate distribution. For the other
> multivariate distributions (multinomial, multivariate_normal and
> dirichlet), `size` sets the number of samples to draw from the
> distribution. With `replace=False` in `choice`, size becomes a *parameter*
> of the distribution, and it is only possible to draw one (multivariate)
> sample.
>

I'm not sure I follow.
`choice` draws samples from a given 1-D array, more than 1:

In [12]: np.random.choice(np.arange(5), size=2, replace=True)
Out[12]: array([2, 2])

In [13]: np.random.choice(np.arange(5), size=2, replace=False)
Out[13]: array([3, 0])

The multivariate distribution you're talking about is for generating the
indices, I assume. Does the current implementation actually give a result
for size>1 that has different statistical properties from calling the
function N times with size=1? If so, that's definitely worth a bug report
at least (I don't think there is one for this).

Cheers,
Ralf


> I thought about this some time ago, and came up with an API that
> eliminates the boolean flag, and separates the `size` argument from the
> number of items drawn in one sample, which I'll call `nsample`. To avoid
> creating a "false friend" with the standard library and with numpy's
> `choice`, I'll call this function `select`.
>
> Here's the proposed signature and docstring. (A prototype implementation
> is in a gist at
> https://gist.github.com/WarrenWeckesser/2e5905d116e710914af383ee47adc2bf.)
> The key feature is the `nsample` argument, which sets how many items to
> select from the given collection. Also note that this function is *always*
> drawing *without replacement*. It covers the `replace=True` case because
> drawing one item without replacement is the same as drawing one item with
> replacement.
>
> Whether or not an API like the following is used, it would be nice if
> there was some way to get multiple samples in the `replace=False` case in
> one function call.
>
> def select(items, nsample=None, p=None, size=None):
>     """
>     Select random samples from `items`.
>
>     The function randomly selects `nsample` items from `items` without
>     replacement.
>
>     Parameters
>     ----------
>     items : sequence
>         The collection of items from which the selection is made.
>     nsample : int, optional
>         Number of items to select without replacement in each draw.
>         It must be between 0 and len(items), inclusive.
>     p : array-like of floats, same length as `items`, optional
>         Probabilities of the items. If this argument is not given,
>         the elements in `items` are assumed to have equal probability.
>     size : int or tuple of ints, optional
>         Number of variates to draw.
>
>     Notes
>     -----
>     `size=None` means "generate a single selection".
>
>     If `size` is None, the result is equivalent to
>         numpy.random.choice(items, size=nsample, replace=False)
>
>     `nsample=None` means draw one (scalar) sample.
>     If `nsample` is None, the function acts (almost) like nsample=1 (see
>     below for more information), and the result is equivalent to
>         numpy.random.choice(items, size=size)
>     In effect, it does choice with replacement. The case `nsample=None`
>     can be interpreted as each sample is a scalar, and `nsample=k`
>     means each sample is a sequence with length k.
>
>     If `nsample` is not None, it must be a nonnegative integer with
>     0 <= nsample <= len(items).
>
>     If `size` is not None, it must be an integer or a tuple of integers.
>     When `size` is an integer, it is treated as the tuple ``(size,)``.
>
>     When both `nsample` and `size` are not None, the result
>     has shape ``size + (nsample,)``.
>
>     Examples
>     --------
>     Make 6 choices with replacement from [10, 20, 30, 40]. (This is
>     equivalent to "Make 1 choice without replacement from [10, 20, 30, 40];
>     do it six times.")
>
>     >>> select([10, 20, 30, 40], size=6)
>     array([20, 20, 40, 10, 40, 30])
>
>     Choose two items from [10, 20, 30, 40] without replacement. Do it six
>     times.
>
>     >>> select([10, 20, 30, 40], nsample=2, size=6)
>     array([[40, 10],
>            [20, 30],
>            [10, 40],
>            [30, 10],
>            [10, 30],
>            [10, 20]])
>
>     When `nsample` is an integer, there is always an axis at the end of the
>     result with length `nsample`, even when `nsample=1`.
>     For example, the shape of the array returned in the following call
>     is (2, 3, 1).
>
>     >>> select([10, 20, 30, 40], nsample=1, size=(2, 3))
>     array([[[10],
>             [30],
>             [20]],
>
>            [[10],
>             [40],
>             [20]]])
>
>     When `nsample` is None, it acts like `nsample=1`, but the trivial
>     dimension is not included. The shape of the array returned in the
>     following call is (2, 3).
>
>     >>> select([10, 20, 30, 40], size=(2, 3))
>     array([[20, 40, 30],
>            [30, 20, 40]])
>
>     """
>
>
> Warren
>
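
For concreteness, here is a minimal sketch of how the `select` semantics
quoted above could be emulated on top of the existing `choice` method. It is
not the code from the gist: the name `select_sketch` and the extra `rng`
argument are assumptions made for illustration, and it simply loops over the
draws, so it is not efficient.

```python
import numpy as np


def select_sketch(items, nsample=None, p=None, size=None, rng=None):
    """Hypothetical emulation of the proposed `select` (illustration only)."""
    # `rng` is an assumed extra argument; any object with a
    # RandomState-style `choice` method works.
    rng = np.random.RandomState() if rng is None else rng
    items = np.asarray(items)

    if nsample is None:
        # Scalar samples: one item per draw, so replacement is irrelevant
        # and the existing `choice` (with replacement) already does this.
        return rng.choice(items, size=size, p=p)

    if not 0 <= nsample <= len(items):
        raise ValueError("nsample must satisfy 0 <= nsample <= len(items)")

    # Normalize `size` to a tuple; None means a single selection.
    if size is None:
        shape = ()
    elif isinstance(size, (int, np.integer)):
        shape = (int(size),)
    else:
        shape = tuple(size)

    n_draws = int(np.prod(shape))  # the product of an empty shape is 1
    out = np.empty((n_draws, nsample), dtype=items.dtype)
    for i in range(n_draws):
        # Each row is one independent draw of `nsample` distinct items.
        out[i] = rng.choice(items, size=nsample, replace=False, p=p)
    return out.reshape(shape + (nsample,))


rng = np.random.RandomState(0)
pairs = select_sketch([10, 20, 30, 40], nsample=2, size=6, rng=rng)
print(pairs.shape)  # (6, 2)
print(select_sketch([10, 20, 30, 40], nsample=1, size=(2, 3), rng=rng).shape)  # (2, 3, 1)
print(select_sketch([10, 20, 30, 40], size=(2, 3), rng=rng).shape)  # (2, 3)
```

Each row of `pairs` holds two distinct items. That is not the same as
stacking calls with `size=1`, which can return the same item twice.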
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion