Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-11 Thread John Zwinck
On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser wrote:
> I created an issue on github for an enhancement
> to numpy.random.shuffle:
> https://github.com/numpy/numpy/issues/5173

I like this idea.  I was a bit surprised there wasn't something like
this already.

> A small wart in this API is the meaning of
>
>   shuffle(a, independent=False, axis=None)
>
> It could be argued that the correct behavior is to leave the
> array unchanged. (The current behavior can be interpreted as
> shuffling a 1-d sequence of monolithic blobs; the axis argument
> specifies which axis of the array corresponds to the
> sequence index.  Then `axis=None` means the argument is
> a single monolithic blob, so there is nothing to shuffle.)
> Or an error could be raised.

Let's think about it from the other direction: if a user wants to
shuffle all the elements as if the array were 1-d, then as you point
out they could do this:

  shuffle(a, axis=None, independent=True)

But that's a lot of typing.  Maybe we should just let this do the same thing:

  shuffle(a, axis=None)

That seems to be in keeping with the other APIs that take an axis
argument, as you mentioned.  To me, "independent" has no relevance
when the array is treated as 1-d, so it can simply be ignored.
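
To make the suggestion concrete, here is a rough sketch of how the
proposed keywords could fit together.  The name `shuffle_sketch` is
hypothetical and is not part of NumPy; the sketch assumes `a` is an
ndarray, and it copies the data to flatten it rather than shuffling
`a.flat` directly.

```python
import numpy as np

def shuffle_sketch(a, independent=False, axis=0):
    """Hypothetical stand-in for the proposed API; not NumPy's shuffle."""
    if axis is None:
        # Suggested behavior: treat the array as 1-d and ignore
        # `independent`.  Shuffle a flattened copy and write it back.
        flat = a.flatten()
        np.random.shuffle(flat)
        a[...] = flat.reshape(a.shape)
    elif independent:
        # Shuffle every 1-d slice along `axis` on its own.
        b = a.swapaxes(axis, -1)
        for idx in np.ndindex(b.shape[:-1]):
            np.random.shuffle(b[idx])
    else:
        # Current behavior, generalized: move whole slices along `axis`.
        np.random.shuffle(a.swapaxes(axis, 0))

a = np.arange(12).reshape(3, 4)
shuffle_sketch(a, axis=None)   # same effect with or without independent=True
print(a)
```

With this reading, `axis=None` never needs `independent` at all, which
is the point of the suggestion.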

John Zwinck
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-11 Thread Warren Weckesser
I created an issue on github for an enhancement
to numpy.random.shuffle:
https://github.com/numpy/numpy/issues/5173
I'd like to get some feedback on the idea.

Currently, `shuffle` shuffles the first dimension of an array
in-place.  For example, shuffling a 2D array shuffles the rows:

In [227]: a
Out[227]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [228]: np.random.shuffle(a)

In [229]: a
Out[229]:
array([[ 0,  1,  2],
       [ 9, 10, 11],
       [ 3,  4,  5],
       [ 6,  7,  8]])


To add an axis keyword, we could (in effect) apply `shuffle` to
`a.swapaxes(axis, 0)`.  For a 2-D array, `axis=1` would shuffle
the columns:

In [232]: a = np.arange(15).reshape(3,5)

In [233]: a
Out[233]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [234]: axis = 1

In [235]: np.random.shuffle(a.swapaxes(axis, 0))

In [236]: a
Out[236]:
array([[ 3,  2,  4,  0,  1],
       [ 8,  7,  9,  5,  6],
       [13, 12, 14, 10, 11]])

So that's the first part--adding an `axis` keyword.
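
Wrapped up as a function, the swapaxes trick might look like the
sketch below.  The name `shuffle_along_axis` is only illustrative
(it is not an existing NumPy function), and the sketch relies on
`swapaxes` returning a view so that the shuffle happens in place.

```python
import numpy as np

def shuffle_along_axis(a, axis=0):
    """Illustrative helper: shuffle whole slices of `a` along `axis`, in place."""
    # swapaxes returns a view, so shuffling the view's first dimension
    # reorders `a` itself along the requested axis.
    np.random.shuffle(a.swapaxes(axis, 0))

a = np.arange(15).reshape(3, 5)
shuffle_along_axis(a, axis=1)
# The columns move as whole units: every row sees the same permutation.
print(a)
```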

The other part of the enhancement request is to add a shuffle
behavior that shuffles the 1-d slices *independently*.  That is,
for a 2-d array, shuffling with `axis=0` would apply a different
shuffle to each column.  In the github issue, I defined a
function called `disarrange` that implements this behavior:

In [240]: a
Out[240]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [241]: disarrange(a, axis=0)

In [242]: a
Out[242]:
array([[ 6,  1,  2],
       [ 3, 13, 14],
       [ 9, 10,  5],
       [12,  7,  8],
       [ 0,  4, 11]])

Note that each column has been shuffled independently.
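
For readers without the issue at hand, one way such a function could
be written is sketched here.  It is not necessarily the code posted
in the github issue, and the name `shuffle_independent` is made up
for this example.

```python
import numpy as np

def shuffle_independent(a, axis=0):
    """Illustrative helper: shuffle each 1-d slice along `axis` separately, in place."""
    # Move the target axis to the end; `b` is a view of `a`.
    b = a.swapaxes(axis, -1)
    # Visit every index over the remaining axes; b[idx] is a 1-d view
    # into `a`, so shuffling it modifies `a` directly.
    for idx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[idx])

a = np.arange(15).reshape(5, 3)
shuffle_independent(a, axis=0)
# Each column keeps its own values but gets its own random order.
print(a)
```

The Python-level loop makes one `shuffle` call per slice, so this is
meant to pin down the behavior, not to be fast.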

This behavior is analogous to how `sort` handles the `axis`
keyword.  `sort` sorts the 1-d slices along the given axis
independently.
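
As a quick check of the analogy, sorting the disarranged array from
the session above along `axis=0` puts each column back in order
without mixing values between columns:

```python
import numpy as np

# The array produced by disarrange(a, axis=0) in the session above.
a = np.array([[ 6,  1,  2],
              [ 3, 13, 14],
              [ 9, 10,  5],
              [12,  7,  8],
              [ 0,  4, 11]])

# np.sort with axis=0 sorts every column as its own 1-d slice.
s = np.sort(a, axis=0)
print(s)
# [[ 0  1  2]
#  [ 3  4  5]
#  [ 6  7  8]
#  [ 9 10 11]
#  [12 13 14]]
```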

In the github issue, I suggested the following signature
for `shuffle` (but I'm not too fond of the name `independent`):

  def shuffle(a, independent=False, axis=0)

If `independent` is False, the current behavior of `shuffle`
is used.  If `independent` is True, each 1-d slice is shuffled
independently (in the same way that `sort` sorts each 1-d
slice).

As with most functions that take an `axis` argument, `axis=None`
would mean shuffling the flattened array.  With `independent=True`,
it would act like `np.random.shuffle(a.flat)`, e.g.

In [247]: a
Out[247]:
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [248]: np.random.shuffle(a.flat)

In [249]: a
Out[249]:
array([[ 0, 14,  9,  1, 13],
       [ 2,  8,  5,  3,  4],
       [ 6, 10,  7, 12, 11]])


A small wart in this API is the meaning of

  shuffle(a, independent=False, axis=None)

It could be argued that the correct behavior is to leave the
array unchanged. (The current behavior can be interpreted as
shuffling a 1-d sequence of monolithic blobs; the axis argument
specifies which axis of the array corresponds to the
sequence index.  Then `axis=None` means the argument is
a single monolithic blob, so there is nothing to shuffle.)
Or an error could be raised.

What do you think?

Warren