[Numpy-discussion] Request for enhancement to numpy.random.shuffle

Warren Weckesser Sat, 11 Oct 2014 15:52:58 -0700

I created an issue on github for an enhancement
to numpy.random.shuffle:
    https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.


Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

In [227]: a
Out[227]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [228]: np.random.shuffle(a)

In [229]: a
Out[229]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])


To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffles
the columns:

In [232]: a = np.arange(15).reshape(3,5)

In [233]: a
Out[233]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [234]: axis = 1

In [235]: np.random.shuffle(a.swapaxes(axis, 0))

In [236]: a
Out[236]:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

In [240]: a
Out[240]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [241]: disarrange(a, axis=0)

In [242]: a
Out[242]:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])

Note that each column has been shuffled independently.

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

  def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

Like most functions that take an `axis` argument, `axis=None`
means to shuffle the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

In [247]: a
Out[247]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [248]: np.random.shuffle(a.flat)

In [249]: a
Out[249]:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])


A small wart in this API is the meaning of

  shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

[Numpy-discussion] Request for enhancement to numpy.random.shuffle

Reply via email to