On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:

>
>
> On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser <
> warren.weckes...@gmail.com> wrote:
>
>> I created an issue on github for an enhancement
>> to numpy.random.shuffle:
>>     https://github.com/numpy/numpy/issues/5173
>> I'd like to get some feedback on the idea.
>>
>> Currently, `shuffle` shuffles the first dimension of an array
>> in-place.  For example, shuffling a 2D array shuffles the rows:
>>
>> In [227]: a
>> Out[227]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8],
>>        [ 9, 10, 11]])
>>
>> In [228]: np.random.shuffle(a)
>>
>> In [229]: a
>> Out[229]:
>> array([[ 0,  1,  2],
>>        [ 9, 10, 11],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8]])
>>
>>
>> To add an axis keyword, we could (in effect) apply `shuffle` to
>> `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
>> the columns:
>>
>> In [232]: a = np.arange(15).reshape(3,5)
>>
>> In [233]: a
>> Out[233]:
>> array([[ 0,  1,  2,  3,  4],
>>        [ 5,  6,  7,  8,  9],
>>        [10, 11, 12, 13, 14]])
>>
>> In [234]: axis = 1
>>
>> In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>>
>> In [236]: a
>> Out[236]:
>> array([[ 3,  2,  4,  0,  1],
>>        [ 8,  7,  9,  5,  6],
>>        [13, 12, 14, 10, 11]])
>>
>> So that's the first part--adding an `axis` keyword.
>>
>> The other part of the enhancement request is to add a shuffle
>> behavior that shuffles the 1-d slices *independently*.  That is,
>> for a 2-d array, shuffling with `axis=0` would apply a different
>> shuffle to each column.  In the github issue, I defined a
>> function called `disarrange` that implements this behavior:
>>
>> In [240]: a
>> Out[240]:
>> array([[ 0,  1,  2],
>>        [ 3,  4,  5],
>>        [ 6,  7,  8],
>>        [ 9, 10, 11],
>>        [12, 13, 14]])
>>
>> In [241]: disarrange(a, axis=0)
>>
>> In [242]: a
>> Out[242]:
>> array([[ 6,  1,  2],
>>        [ 3, 13, 14],
>>        [ 9, 10,  5],
>>        [12,  7,  8],
>>        [ 0,  4, 11]])
>>
>> Note that each column has been shuffled independently.
>>
>> This behavior is analogous to how `sort` handles the `axis`
>> keyword.  `sort` sorts the 1-d slices along the given axis
>> independently.
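The body of `disarrange` is not part of this message. As a hedged sketch (not the implementation posted in the github issue), the independent per-slice shuffle can be written with NumPy primitives: draw one uniform random key per element and argsort the keys along `axis`, which gives every 1-d slice its own random permutation. This is the random-sort-key technique, swapped in here instead of running Fisher-Yates on each slice separately. It assumes NumPy 1.17 or later for `np.random.default_rng`; the `rng` parameter is a hypothetical addition.

```python
import numpy as np


def disarrange(a, axis, rng=None):
    """Shuffle every 1-d slice of `a` along `axis` independently, in place.

    Hypothetical sketch: one uniform random key per element; argsort of the
    keys along `axis` yields a separate random permutation for each slice.
    """
    rng = np.random.default_rng() if rng is None else rng
    keys = rng.random(a.shape)             # one random key per element
    order = np.argsort(keys, axis=axis)    # per-slice random permutation
    # take_along_axis returns a new array, so writing it back is safe.
    a[...] = np.take_along_axis(a, order, axis=axis)


a = np.arange(15).reshape(5, 3)
disarrange(a, axis=0, rng=np.random.default_rng(42))
print(a)  # each column is a permutation of its original values
```

Like `sort`, this handles all slices in one vectorized pass, and `axis=0` matches the `disarrange(a, axis=0)` call in the session.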
>>
>> In the github issue, I suggested the following signature
>> for `shuffle` (but I'm not too fond of the name `independent`):
>>
>>     def shuffle(a, independent=False, axis=0)
>>
>> If `independent` is False, the current behavior of `shuffle`
>> is used.  If `independent` is True, each 1-d slice is shuffled
>> independently (in the same way that `sort` sorts each 1-d
>> slice).
>>
>> Like most functions that take an `axis` argument, `axis=None`
>> means to shuffle the flattened array.  With `independent=True`,
>> it would act like `np.random.shuffle(a.flat)`, e.g.
>>
>> In [247]: a
>> Out[247]:
>> array([[ 0,  1,  2,  3,  4],
>>        [ 5,  6,  7,  8,  9],
>>        [10, 11, 12, 13, 14]])
>>
>> In [248]: np.random.shuffle(a.flat)
>>
>> In [249]: a
>> Out[249]:
>> array([[ 0, 14,  9,  1, 13],
>>        [ 2,  8,  5,  3,  4],
>>        [ 6, 10,  7, 12, 11]])
>>
>>
>> A small wart in this API is the meaning of
>>
>>     shuffle(a, independent=False, axis=None)
>>
>> It could be argued that the correct behavior is to leave the
>> array unchanged.  (The current behavior can be interpreted as
>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>> specifies which axis of the array corresponds to the
>> sequence index.  Then `axis=None` means the argument is
>> a single monolithic blob, so there is nothing to shuffle.)
>> Or an error could be raised.
>>
>> What do you think?
>>
>> Warren
>>
>>
>
>
> It is clear from the comments so far that, when `axis` is None, the result
> should be a shuffle of all the elements in the array, for both methods of
> shuffling (whether implemented as a new method or with a boolean argument
> to `shuffle`).  Forget I ever suggested doing nothing or raising an error.
> :)
>
> Josef's comment reminded me that `numpy.random.permutation` returns a
> shuffled copy of the array (when its argument is an array).  This function
> should also get an `axis` argument.  `permutation` shuffles the same way
> `shuffle` does--it simply makes a copy and then calls `shuffle` on the
> copy.
> If a new method is added for the new shuffling style, then it would
> be consistent to also add a new method that uses the new shuffling style
> and returns a copy of the shuffled array.  Then we would have four
> methods:
>
>                         In-place      Copy
> Current shuffle style   shuffle       permutation
> New shuffle style       (name TBD)    (name TBD)
>
> (All of them will have an `axis` argument.)
>
>

That table makes me think that, *if* we go with new methods, the names
should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common
suffix that is to be determined.  That will ensure that the names appear
together in alphabetical lists, and should show up together as options in
tab-completion or code-completion.

Warren


> I suspect this will make some folks prefer the approach of adding a
> boolean argument to `shuffle` and `permutation`.
>
> Warren
>
>
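To compare the two API options, here is a minimal sketch of the boolean-argument approach built from plain NumPy primitives (assuming NumPy 1.17 or later), not from `numpy.random` internals. `shuffle_sketch` and `permutation_sketch` are hypothetical names, not NumPy API. They follow the consensus that `axis=None` shuffles every element for both styles, and the copying variant wraps the in-place one the same way `permutation` wraps `shuffle`. The independent style again uses random sort keys rather than Fisher-Yates. The new-methods alternative then reduces to thin wrappers; `shuffleXXX` and `permutationXXX` keep the message's undecided placeholder suffix literally.

```python
from functools import partial

import numpy as np


def shuffle_sketch(a, independent=False, axis=0, rng=None):
    """Hypothetical in-place shuffle with the signature proposed in the thread.

    independent=False: one permutation moves whole slices along `axis`
                       (the current `shuffle` behavior, for any axis).
    independent=True:  every 1-d slice along `axis` gets its own permutation.
    axis=None:         all elements are shuffled, whichever style is chosen.
    """
    rng = np.random.default_rng() if rng is None else rng
    if axis is None:
        # Flatten (a copy if `a` is not contiguous), permute, write back.
        flat = a.ravel()
        a[...] = flat[rng.permutation(flat.size)].reshape(a.shape)
    elif not independent:
        # A single permutation of the indices along `axis`.
        order = rng.permutation(a.shape[axis])
        a[...] = np.take(a, order, axis=axis)
    else:
        # Argsort random keys along `axis`: one permutation per 1-d slice.
        order = np.argsort(rng.random(a.shape), axis=axis)
        a[...] = np.take_along_axis(a, order, axis=axis)


def permutation_sketch(a, independent=False, axis=0, rng=None):
    """Copying variant: copy `a`, then shuffle the copy in place."""
    out = np.copy(a)
    shuffle_sketch(out, independent=independent, axis=axis, rng=rng)
    return out


# The "new methods" option as thin wrappers over the same code.
# Pass `axis` and `rng` as keywords when calling these.
shuffleXXX = partial(shuffle_sketch, independent=True)
permutationXXX = partial(permutation_sketch, independent=True)


a = np.arange(15).reshape(3, 5)
rng = np.random.default_rng(2014)
print(permutation_sketch(a, axis=1, rng=rng))     # columns moved as a unit
print(permutationXXX(a, axis=1, rng=rng))         # each row shuffled separately
print(permutation_sketch(a, axis=None, rng=rng))  # all 15 elements shuffled
```

With one function and a flag, the copy/in-place split stays at two names; with separate methods, the same two code paths are simply exposed under four names.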
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion