On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
> On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>
>>
>> On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser <warren.weckes...@gmail.com> wrote:
>>
>>> I created an issue on GitHub for an enhancement
>>> to numpy.random.shuffle:
>>> https://github.com/numpy/numpy/issues/5173
>>> I'd like to get some feedback on the idea.
>>>
>>> Currently, `shuffle` shuffles the first dimension of an array
>>> in place. For example, shuffling a 2-D array shuffles the rows:
>>>
>>> In [227]: a
>>> Out[227]:
>>> array([[ 0,  1,  2],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8],
>>>        [ 9, 10, 11]])
>>>
>>> In [228]: np.random.shuffle(a)
>>>
>>> In [229]: a
>>> Out[229]:
>>> array([[ 0,  1,  2],
>>>        [ 9, 10, 11],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8]])
>>>
>>> To add an axis keyword, we could (in effect) apply `shuffle` to
>>> `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffle
>>> the columns:
>>>
>>> In [232]: a = np.arange(15).reshape(3,5)
>>>
>>> In [233]: a
>>> Out[233]:
>>> array([[ 0,  1,  2,  3,  4],
>>>        [ 5,  6,  7,  8,  9],
>>>        [10, 11, 12, 13, 14]])
>>>
>>> In [234]: axis = 1
>>>
>>> In [235]: np.random.shuffle(a.swapaxes(axis, 0))
>>>
>>> In [236]: a
>>> Out[236]:
>>> array([[ 3,  2,  4,  0,  1],
>>>        [ 8,  7,  9,  5,  6],
>>>        [13, 12, 14, 10, 11]])
>>>
>>> So that's the first part: adding an `axis` keyword.
>>>
>>> The other part of the enhancement request is to add a shuffle
>>> behavior that shuffles the 1-d slices *independently*. That is,
>>> for a 2-d array, shuffling with `axis=0` would apply a different
>>> shuffle to each column. In the GitHub issue, I defined a
>>> function called `disarrange` that implements this behavior:
>>>
>>> In [240]: a
>>> Out[240]:
>>> array([[ 0,  1,  2],
>>>        [ 3,  4,  5],
>>>        [ 6,  7,  8],
>>>        [ 9, 10, 11],
>>>        [12, 13, 14]])
>>>
>>> In [241]: disarrange(a, axis=0)
>>>
>>> In [242]: a
>>> Out[242]:
>>> array([[ 6,  1,  2],
>>>        [ 3, 13, 14],
>>>        [ 9, 10,  5],
>>>        [12,  7,  8],
>>>        [ 0,  4, 11]])
>>>
>>> Note that each column has been shuffled independently.
>>>
>>> This behavior is analogous to how `sort` handles the `axis`
>>> keyword: `sort` sorts the 1-d slices along the given axis
>>> independently.
>>>
>>> In the GitHub issue, I suggested the following signature
>>> for `shuffle` (but I'm not too fond of the name `independent`):
>>>
>>>     def shuffle(a, independent=False, axis=0)
>>>
>>> If `independent` is False, the current behavior of `shuffle`
>>> is used. If `independent` is True, each 1-d slice is shuffled
>>> independently (in the same way that `sort` sorts each 1-d
>>> slice).
>>>
>>> Like most functions that take an `axis` argument, `axis=None`
>>> means to shuffle the flattened array. With `independent=True`,
>>> it would act like `np.random.shuffle(a.flat)`, e.g.:
>>>
>>> In [247]: a
>>> Out[247]:
>>> array([[ 0,  1,  2,  3,  4],
>>>        [ 5,  6,  7,  8,  9],
>>>        [10, 11, 12, 13, 14]])
>>>
>>> In [248]: np.random.shuffle(a.flat)
>>>
>>> In [249]: a
>>> Out[249]:
>>> array([[ 0, 14,  9,  1, 13],
>>>        [ 2,  8,  5,  3,  4],
>>>        [ 6, 10,  7, 12, 11]])
>>>
>>> A small wart in this API is the meaning of
>>>
>>>     shuffle(a, independent=False, axis=None)
>>>
>>> It could be argued that the correct behavior is to leave the
>>> array unchanged. (The current behavior can be interpreted as
>>> shuffling a 1-d sequence of monolithic blobs; the axis argument
>>> specifies which axis of the array corresponds to the
>>> sequence index. Then `axis=None` means the argument is
>>> a single monolithic blob, so there is nothing to shuffle.)
>>> Or an error could be raised.
>>>
>>> What do you think?
>>>
>>> Warren
>>>
>>
>> It is clear from the comments so far that, when `axis` is None, the
>> result should be a shuffle of all the elements in the array, for both
>> methods of shuffling (whether implemented as a new method or with a boolean
>> argument to `shuffle`). Forget I ever suggested doing nothing or raising
>> an error. :)
>>
>> Josef's comment reminded me that `numpy.random.permutation` returns a
>> shuffled copy of the array (when its argument is an array). This function
>> should also get an `axis` argument. `permutation` shuffles the same way
>> `shuffle` does; it simply makes a copy and then calls `shuffle` on the
>> copy. If a new method is added for the new shuffling style, then it would
>> be consistent to also add a new method that uses the new shuffling style
>> and returns a copy of the shuffled array. We would then have four
>> methods:
>>
>>                              In-place      Copy
>>     Current shuffle style    shuffle       permutation
>>     New shuffle style        (name TBD)    (name TBD)
>>
>> (All of them will have an `axis` argument.)
>>
>
> That table makes me think that, *if* we go with new methods, the names
> should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix
> that is to be determined. That will ensure that the names appear together
> in alphabetical lists, and should show up together as options in
> tab-completion or code-completion.
>

Just to add some noise to a productive conversation: if you add a 'copy'
flag to shuffle, then all the functionality is in one place, and
'permutation' can either be deprecated or trivially implemented in terms
of the new 'shuffle'.

Jaime

>
> Warren
>
>> I suspect this will make some folks prefer the approach of adding a
>> boolean argument to `shuffle` and `permutation`.
>>
>> Warren
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
(\__/)
( O.o)
( > <) This is Conejo.
Copy Conejo into your signature and help him with his plans for world domination.
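A minimal sketch of the `shuffle(a, independent=False, axis=0)` signature proposed in the thread, written against the legacy `np.random.RandomState` API. The name `shuffle_sketch` and the `rng` parameter are hypothetical, and this is not the `disarrange` code from the GitHub issue: the independent case here uses a swapped-in technique, argsorting i.i.d. uniform random keys along the axis, chosen because it mirrors the `sort` analogy. Following the thread's consensus, `axis=None` shuffles every element regardless of `independent`. It assumes NumPy 1.15 or later for `np.take_along_axis`.

```python
import numpy as np


def shuffle_sketch(a, independent=False, axis=0, rng=None):
    """Hypothetical in-place shuffle with the signature proposed in the thread.

    independent=False: one permutation of the slices along `axis`
                       (the current `shuffle` style, generalized).
    independent=True:  each 1-d slice along `axis` is shuffled on its own,
                       the way `sort` sorts each 1-d slice.
    axis=None:         all elements are shuffled, for either style.
    """
    rng = np.random.RandomState() if rng is None else rng
    if axis is None:
        # Shuffle the flattened data, then write it back into `a`.
        a[...] = rng.permutation(a.ravel()).reshape(a.shape)
    elif not independent:
        # swapaxes returns a view, so shuffling its first axis
        # rearranges the slices of `a` itself.
        rng.shuffle(a.swapaxes(axis, 0))
    else:
        # Give every element an i.i.d. uniform key; argsort along `axis`
        # then yields an independent, uniformly random permutation for
        # each 1-d slice (ties have negligible probability).
        keys = rng.random_sample(a.shape)
        order = np.argsort(keys, axis=axis)
        a[...] = np.take_along_axis(a, order, axis=axis)
```

For example, `shuffle_sketch(a, independent=True, axis=0)` on a 5x3 array reorders each column separately, like the `disarrange(a, axis=0)` session above, while `shuffle_sketch(a, axis=1)` applies one column permutation to every row, like `np.random.shuffle(a.swapaxes(1, 0))`.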
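Jaime's `copy`-flag suggestion can also be sketched: one function covers both the "In-place" and "Copy" columns of the table, and `permutation` becomes a trivial wrapper. This is a hypothetical illustration using the legacy `np.random.RandomState` API; `shuffle_copy_sketch`, `permutation_sketch`, and the `rng` parameter are names invented here, and only the current (non-independent) shuffle style is shown.

```python
import numpy as np


def shuffle_copy_sketch(a, axis=0, copy=False, rng=None):
    """Hypothetical `shuffle` with the `copy` flag suggested in the thread.

    copy=False: shuffle `a` in place along `axis` and return None,
                like the current `shuffle`.
    copy=True:  leave `a` untouched and return a shuffled copy,
                like `permutation`.
    axis=None shuffles all elements.
    """
    rng = np.random.RandomState() if rng is None else rng
    target = np.array(a, copy=True) if copy else a
    if axis is None:
        # Shuffle the flattened data and write it back with the same shape.
        target[...] = rng.permutation(target.ravel()).reshape(target.shape)
    else:
        # Shuffle whole slices along `axis` through a swapped-axes view.
        rng.shuffle(target.swapaxes(axis, 0))
    return target if copy else None


def permutation_sketch(a, axis=0, rng=None):
    """`permutation` expressed in terms of the copying shuffle."""
    return shuffle_copy_sketch(a, axis=axis, copy=True, rng=rng)
```

Returning `None` in the in-place case keeps the existing `shuffle` convention, so callers cannot mistake an in-place call for one that produced a new array.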