On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser < warren.weckes...@gmail.com> wrote:
> I created an issue on github for an enhancement
> to numpy.random.shuffle:
>     https://github.com/numpy/numpy/issues/5173
> I'd like to get some feedback on the idea.
>
> Currently, `shuffle` shuffles the first dimension of an array
> in-place.  For example, shuffling a 2D array shuffles the rows:
>
>     In [227]: a
>     Out[227]:
>     array([[ 0,  1,  2],
>            [ 3,  4,  5],
>            [ 6,  7,  8],
>            [ 9, 10, 11]])
>
>     In [228]: np.random.shuffle(a)
>
>     In [229]: a
>     Out[229]:
>     array([[ 0,  1,  2],
>            [ 9, 10, 11],
>            [ 3,  4,  5],
>            [ 6,  7,  8]])
>
>
> To add an axis keyword, we could (in effect) apply `shuffle` to
> `a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
> the columns:
>
>     In [232]: a = np.arange(15).reshape(3,5)
>
>     In [233]: a
>     Out[233]:
>     array([[ 0,  1,  2,  3,  4],
>            [ 5,  6,  7,  8,  9],
>            [10, 11, 12, 13, 14]])
>
>     In [234]: axis = 1
>
>     In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>
>     In [236]: a
>     Out[236]:
>     array([[ 3,  2,  4,  0,  1],
>            [ 8,  7,  9,  5,  6],
>            [13, 12, 14, 10, 11]])
>
> So that's the first part--adding an `axis` keyword.
>
> The other part of the enhancement request is to add a shuffle
> behavior that shuffles the 1-d slices *independently*.  That is,
> for a 2-d array, shuffling with `axis=0` would apply a different
> shuffle to each column.  In the github issue, I defined a
> function called `disarrange` that implements this behavior:
>
>     In [240]: a
>     Out[240]:
>     array([[ 0,  1,  2],
>            [ 3,  4,  5],
>            [ 6,  7,  8],
>            [ 9, 10, 11],
>            [12, 13, 14]])
>
>     In [241]: disarrange(a, axis=0)
>
>     In [242]: a
>     Out[242]:
>     array([[ 6,  1,  2],
>            [ 3, 13, 14],
>            [ 9, 10,  5],
>            [12,  7,  8],
>            [ 0,  4, 11]])
>
> Note that each column has been shuffled independently.
>
> This behavior is analogous to how `sort` handles the `axis`
> keyword.  `sort` sorts the 1-d slices along the given axis
> independently.
>
> In the github issue, I suggested the following signature
> for `shuffle` (but I'm not too fond of the name `independent`):
>
>     def shuffle(a, independent=False, axis=0)
>
> If `independent` is False, the current behavior of `shuffle`
> is used.  If `independent` is True, each 1-d slice is shuffled
> independently (in the same way that `sort` sorts each 1-d
> slice).
>
> Like most functions that take an `axis` argument, `axis=None`
> means to shuffle the flattened array.  With `independent=True`,
> it would act like `np.random.shuffle(a.flat)`, e.g.
>
>     In [247]: a
>     Out[247]:
>     array([[ 0,  1,  2,  3,  4],
>            [ 5,  6,  7,  8,  9],
>            [10, 11, 12, 13, 14]])
>
>     In [248]: np.random.shuffle(a.flat)
>
>     In [249]: a
>     Out[249]:
>     array([[ 0, 14,  9,  1, 13],
>            [ 2,  8,  5,  3,  4],
>            [ 6, 10,  7, 12, 11]])
>
>
> A small wart in this API is the meaning of
>
>     shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged.  (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.
>
> What do you think?
>
> Warren
>
>

It is clear from the comments so far that, when `axis` is None, the
result should be a shuffle of all the elements in the array, for both
methods of shuffling (whether implemented as a new method or with a
boolean argument to `shuffle`).  Forget I ever suggested doing nothing
or raising an error. :)

Josef's comment reminded me that `numpy.random.permutation` returns a
shuffled copy of the array (when its argument is an array).  This
function should also get an `axis` argument.  `permutation` shuffles
the same way `shuffle` does--it simply makes a copy and then calls
`shuffle` on the copy.
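
For concreteness, here is a rough sketch of how the two shuffling styles and their copying counterparts might behave. The names `shuffle_axis` and `permutation_axis` are made up for illustration and are not NumPy functions; the `independent` flag follows the signature suggested in the quoted message, and `axis=None` shuffles all the elements in both styles. It is only meant to pin down the semantics, not to suggest an implementation.

```python
import numpy as np


def shuffle_axis(a, axis=0, independent=False):
    """Shuffle `a` in place (hypothetical helper, not part of NumPy)."""
    if axis is None:
        # Shuffle every element, whichever style was requested.
        flat = a.ravel()              # a view when possible, else a copy
        np.random.shuffle(flat)
        a[...] = flat.reshape(a.shape)
        return
    if not independent:
        # Current style: slices along `axis` move as whole blocks.
        np.random.shuffle(a.swapaxes(axis, 0))
        return
    # New style: each 1-d slice along `axis` gets its own permutation.
    b = a.swapaxes(axis, -1)          # a view, so writes reach `a`
    for idx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[idx])


def permutation_axis(a, axis=0, independent=False):
    """Return a shuffled copy and leave `a` untouched (hypothetical helper)."""
    out = np.array(a, copy=True)
    shuffle_axis(out, axis=axis, independent=independent)
    return out


a = np.arange(15).reshape(3, 5)
print(permutation_axis(a, axis=1))                    # columns move as blocks
print(permutation_axis(a, axis=0, independent=True))  # each column on its own
print(permutation_axis(a, axis=None))                 # all elements shuffled
```

With `independent=False` the columns of the result are the columns of `a` in a new order; with `independent=True` each column keeps its own values but in a different order per column.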

If a new method is added for the new shuffling style, then it would be
consistent to also add a new method that uses the new shuffling style
and returns a copy of the shuffled array.  We would then have four
methods:

                           In-place       Copy
    Current shuffle style  shuffle        permutation
    New shuffle style      (name TBD)     (name TBD)

(All of them will have an `axis` argument.)

I suspect this will make some folks prefer the approach of adding a
boolean argument to `shuffle` and `permutation`.

Warren
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion