I created an issue on GitHub for an enhancement to numpy.random.shuffle:

    https://github.com/numpy/numpy/issues/5173

I'd like to get some feedback on the idea.
Currently, `shuffle` shuffles the first dimension of an array in-place. For example, shuffling a 2-d array shuffles the rows:

    In [227]: a
    Out[227]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11]])

    In [228]: np.random.shuffle(a)

    In [229]: a
    Out[229]:
    array([[ 0,  1,  2],
           [ 9, 10, 11],
           [ 3,  4,  5],
           [ 6,  7,  8]])

To add an `axis` keyword, we could (in effect) apply `shuffle` to `a.swapaxes(axis, 0)`. For a 2-d array, `axis=1` would shuffle the columns:

    In [232]: a = np.arange(15).reshape(3,5)

    In [233]: a
    Out[233]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [234]: axis = 1

    In [235]: np.random.shuffle(a.swapaxes(axis, 0))

    In [236]: a
    Out[236]:
    array([[ 3,  2,  4,  0,  1],
           [ 8,  7,  9,  5,  6],
           [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.

The other part of the enhancement request is to add a shuffle behavior that shuffles the 1-d slices *independently*. That is, for a 2-d array, shuffling with `axis=0` would apply a different shuffle to each column. In the GitHub issue, I defined a function called `disarrange` that implements this behavior:

    In [240]: a
    Out[240]:
    array([[ 0,  1,  2],
           [ 3,  4,  5],
           [ 6,  7,  8],
           [ 9, 10, 11],
           [12, 13, 14]])

    In [241]: disarrange(a, axis=0)

    In [242]: a
    Out[242]:
    array([[ 6,  1,  2],
           [ 3, 13, 14],
           [ 9, 10,  5],
           [12,  7,  8],
           [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis` keyword: `sort` sorts the 1-d slices along the given axis independently.

In the GitHub issue, I suggested the following signature for `shuffle` (but I'm not too fond of the name `independent`):

    def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle` is used. If `independent` is True, each 1-d slice is shuffled independently (in the same way that `sort` sorts each 1-d slice).

As with most functions that take an `axis` argument, `axis=None` means to shuffle the flattened array.
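To make the proposed semantics concrete, here is a rough pure-Python sketch. It is not the `disarrange` code from the GitHub issue and not an existing NumPy function; the name `shuffle_sketch` is made up for illustration. Raising an error for `independent=False, axis=None` is just one arbitrary choice for a case the proposal leaves open.

```python
import numpy as np


def shuffle_sketch(a, independent=False, axis=0):
    """Shuffle `a` in place. Hypothetical sketch, not NumPy's API."""
    if axis is None:
        if not independent:
            # Open question in the proposal (no-op or error?); this sketch
            # arbitrarily chooses to raise.
            raise ValueError("axis=None requires independent=True in this sketch")
        # Shuffle all elements as one flattened sequence, then write back.
        a[...] = np.random.permutation(a.ravel()).reshape(a.shape)
        return
    if not independent:
        # One permutation, applied to whole slices along `axis`.
        np.random.shuffle(a.swapaxes(axis, 0))
        return
    # A different permutation for each 1-d slice along `axis`.
    b = np.moveaxis(a, axis, -1)  # view: the 1-d slices are now b[idx]
    for idx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[idx])
```

For example, `shuffle_sketch(a, axis=1)` on a 2-d array permutes whole columns, while `shuffle_sketch(a, independent=True, axis=0)` shuffles each column separately. The explicit Python loop in the independent case is only for clarity; a real implementation would presumably do this in compiled code.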
With `independent=True`, `axis=None` would act like `np.random.shuffle(a.flat)`, e.g.

    In [247]: a
    Out[247]:
    array([[ 0,  1,  2,  3,  4],
           [ 5,  6,  7,  8,  9],
           [10, 11, 12, 13, 14]])

    In [248]: np.random.shuffle(a.flat)

    In [249]: a
    Out[249]:
    array([[ 0, 14,  9,  1, 13],
           [ 2,  8,  5,  3,  4],
           [ 6, 10,  7, 12, 11]])

A small wart in this API is the meaning of

    shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the `axis` argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised.

What do you think?

Warren