Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
> On Sun, Oct 12, 2014 at 5:14 PM, Sebastian se...@sebix.at wrote:
>> On 2014-10-12 16:54, Warren Weckesser wrote:
>>> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:
>>>> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser warren.weckes...@gmail.com wrote:
>>>>> A small wart in this API is the meaning of
>>>>>
>>>>>     shuffle(a, independent=False, axis=None)
>>>>>
>>>>> It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the axis argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised. What do you think?
>>>>
>>>> It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods.
>>>
>>> I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: disarrange, scramble, disorder, randomize, ashuffle, some other variation of the word shuffle, ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that. What do other folks think of making a separate method?
>>
>> I'm not a fan of more methods with similar functionality in Numpy. It's already hard to keep track of the existing functions and all their possible applications and variants. The axis=None proposal for shuffling all items is very intuitive. I think we don't want to take the path of matlab: a huge number of powerful functions, but few people know of their powerful possibilities.
>
> I totally agree with this principle, but I think this is an exception to the rule, b/c unfortunately in this case the function that we *do* have is weird and inconsistent with how most other functions in numpy work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc (k,)->(k,) would work. Also, it's easy to implement the current 'shuffle' in terms of any 1d shuffle function, with no explicit loops; Warren's disarrange requires an explicit loop. So, we really implemented the wrong one, oops. What this means going forward, though, is that our only options are either to implement both behaviours with two functions, or else to give up on having the more natural behaviour altogether. I think the former is the lesser of two evils.
>
> Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc. So, how about:
>
> scramble + scrambled
>     shuffle individual entries within each row/column/..., as in Warren's suggestion.
>
> shuffle + shuffled
>     to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d cards, and then shuffle those cards).
>
> permuted remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."
>
> -n
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org

That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

Warren

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
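To make the two behaviours in the quoted proposal concrete, here is a minimal pure-Python sketch in which nested lists stand in for a 2-D array. The names `shuffled_cards` and `scrambled_rows` are made up for this illustration and are not NumPy functions. The first reorders whole rows with a single 1-D shuffle of the row indices (the "cards" mnemonic); the second needs one 1-D shuffle per row, which is the explicit loop mentioned in the thread.

```python
import random


def shuffled_cards(rows, rng):
    """Return the rows in random order; each row stays intact."""
    order = list(range(len(rows)))
    rng.shuffle(order)  # one 1-D shuffle of the row indices
    return [list(rows[i]) for i in order]


def scrambled_rows(rows, rng):
    """Return a copy in which every row is shuffled independently."""
    out = []
    for row in rows:  # one 1-D shuffle per row
        row = list(row)
        rng.shuffle(row)
        out.append(row)
    return out


rng = random.Random(0)  # seeded only so that runs are repeatable
a = [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
cards = shuffled_cards(a, rng)
mixed = scrambled_rows(a, rng)
```

In `cards` every original row survives as a unit, only its position changes. In `mixed` every row keeps its own entries, but their order inside the row is random and differs from row to row. With an array library, the reordering step in `shuffled_cards` would presumably be a single indexing operation rather than a list comprehension.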
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser warren.weckes...@gmail.com wrote:
> On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
>> [...]
>> So, how about:
>>
>> scramble + scrambled
>>     shuffle individual entries within each row/column/..., as in Warren's suggestion.
>>
>> shuffle + shuffled
>>     to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d cards, and then shuffle those cards).
>>
>> permuted remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'."
>
> That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

So the only little detail left is someone actually rolling up his/her sleeves and creating a PR... ;-)

The current shuffle and permutation are implemented here:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4551

It's in Cython, so it is a good candidate for anyone wanting to contribute to numpy, but wary of C code.

Jaime

--
(\__/)
( O.o)
( )
This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser warren.weckes...@gmail.com wrote:
> On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
>> Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. [...]
>
> That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

I hesitate to use names like randomize because they're less informative than they seem -- if asked what this operation does to an array, then it would be natural to say "it randomizes the array". But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [Numpy-discussion] Changed behavior of np.gradient
On Tue, Oct 14, 2014 at 10:33 PM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith n...@pobox.com wrote:
>> On 14 Oct 2014 18:29, Charles R Harris charlesr.har...@gmail.com wrote:
>>> On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith n...@pobox.com wrote:
>>>> On 4 Oct 2014 22:17, Stéfan van der Walt ste...@sun.ac.za wrote:
>>>>> On Oct 4, 2014 10:14 PM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote:
>>>>>> +1 for an order=2 or maxorder=2 flag
>>>>>
>>>>> If you parameterize that flag, users will want to change its value (above two). Perhaps rather use a boolean flag such as second_order or high_order, unless it seems feasible to include additional orders in the future.
>>>>
>>>> Predicting the future is hard :-). And in particular high_order= would create all kinds of confusion if in the future we added 3rd order approximations but high_order=True continued to mean 2nd order because of compatibility. I like maxorder (or max_order would be more pep8ish I guess) because it leaves our options open. (Similar to how it's often better to have a kwarg that can take two possible string values than to have a boolean kwarg. It makes current code more explicit and makes future enhancements easier.)
>>>
>>> I think maxorder is a bit misleading. Both versions are second order in the interior, while at the ends the old is first order and the new is second order. Maybe edge_order?
>>
>> Ah, that makes sense. edge_order makes more sense to me too then - and we can always add interior_order to complement it later, if appropriate.
>>
>> The other thing to decide on is the default. Is the 2nd order version generally preferred (modulo compatibility)? If so then it might make sense to keep it the default, given that there are already numpy versions in the wild with that behaviour, so we can't fully guarantee compatibility even if we wanted to. But what do others think?
>
> I'd be inclined to keep the older as the default and regard adding the keyword as a bugfix. I should have caught the incompatibility in review.

I don't have any code that uses gradient, so I don't have a great sense of the trade-offs here.

- Usually if we have a change that produces increased accuracy, we make the increased accuracy the default. Otherwise no-one ever uses it, and everyone gets less accurate results than they would otherwise. (I don't have a great sense of how much this change affects accuracy though.)

- If the change in output per se is a serious problem for people, then it's not one we can fix at this point -- 1.9.0 is out there and people are using it anyway, so those who have the problem already need to take some affirmative action to fix it. (Also, it's kinda weird to change a function's behaviour and add a new argument in a point release!)

So I'd like to hear from people affected by this -- would you prefer to have the 2nd order boundary calculations by default, and you just need some way to work around the immediate problems in existing code? Or do you prefer the old default remain the default, with 2nd order boundary calculations something that must be requested by hand every time?

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
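For readers who want to see what "first order at the ends" versus "second order at the ends" means numerically, here is a small pure-Python sketch for uniformly spaced samples. The function name `gradient_1d` is hypothetical, and the edge formulas are the textbook one-sided differences, assumed here to correspond to the old and new boundary treatments discussed above. It is not a copy of the NumPy implementation.

```python
def gradient_1d(f, h=1.0, edge_order=1):
    """Finite-difference derivative of samples f with uniform spacing h.

    Interior points always use the second-order central difference;
    only the two end points depend on edge_order (1 or 2).
    """
    if edge_order not in (1, 2):
        raise ValueError("edge_order must be 1 or 2")
    n = len(f)
    if n < edge_order + 1:
        raise ValueError("need at least edge_order + 1 samples")
    g = [0.0] * n
    for i in range(1, n - 1):
        g[i] = (f[i + 1] - f[i - 1]) / (2 * h)
    if edge_order == 1:
        # first-order one-sided differences at both ends
        g[0] = (f[1] - f[0]) / h
        g[-1] = (f[-1] - f[-2]) / h
    else:
        # second-order one-sided differences at both ends
        g[0] = (-3 * f[0] + 4 * f[1] - f[2]) / (2 * h)
        g[-1] = (3 * f[-1] - 4 * f[-2] + f[-3]) / (2 * h)
    return g


samples = [x * x for x in range(5)]  # f(x) = x**2 sampled at x = 0..4
old_edges = gradient_1d(samples, edge_order=1)
new_edges = gradient_1d(samples, edge_order=2)
```

For this quadratic the interior values (2, 4, 6) are identical in both results. Only the end points differ: 1 and 7 with first-order edges, against the exact derivative values 0 and 8 with second-order edges. This is the kind of change in output that can break an existing test without anything being "wrong".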
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:
> [...]
> I hesitate to use names like randomize because they're less informative than they seem -- if asked what this operation does to an array, then it would be natural to say "it randomizes the array". But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow.

I had some similar concerns (hence my original disarrange), but randomize seemed more likely to be found when searching or browsing the docs, and while it might be a bit too generic-sounding, it does feel like a natural verb for the process.

On the other hand, permute and permuted are even more natural and unambiguous. Any objections to those? (The existing function is permutation.)

Whatever the names, the docstrings for the four functions should be cross-referenced in their "See Also" sections to help users find the appropriate function.

By the way, permutation has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's randperm function. Unless we replicate that in the new function, we shouldn't deprecate permutation.

Warren
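The integer-argument behaviour Warren describes can be sketched in a few lines of plain Python. This is only an illustration of the dispatch on the argument type for the 1-D case; the function below is a stand-in written for this note, not the NumPy implementation, and the `rng` parameter is an assumption added so the example is repeatable.

```python
import random


def permutation(x, rng):
    """Return a randomly permuted copy of x; an integer n stands for range(n)."""
    items = list(range(x)) if isinstance(x, int) else list(x)
    rng.shuffle(items)  # shuffles the copy, never the caller's data
    return items


rng = random.Random(1)
from_int = permutation(5, rng)       # some ordering of 0, 1, 2, 3, 4
data = ["a", "b", "c"]
from_seq = permutation(data, rng)    # some ordering of the three letters
```

The point relevant to the deprecation question is that the integer form has no counterpart in a function that only accepts arrays, so dropping `permutation` would also drop this shortcut unless the replacement handles integers as well.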
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser warren.weckes...@gmail.com wrote:
> [...]
> On the other hand, permute and permuted are even more natural and unambiguous. Any objections to those? (The existing function is permutation.)
> [...]
> By the way, permutation has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's randperm function. Unless we replicate that in the new function, we shouldn't deprecate permutation.

I guess we could do something like:

    permutation(n):

    Return a random permutation on n items. Equivalent to permuted(arange(n)).

    Note: for backwards compatibility, a call like permutation(an_array) currently returns the same as shuffled(an_array). (This is *not* equivalent to permuted(an_array).) This functionality is deprecated.

OTOH np.random.permute as a name does have a downside: someday we'll probably add a function called np.permute (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
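As an aside on "applying a given permutation in place": one common way to do it in linear time is to move elements along the cycles of the permutation. The sketch below is a plain-Python illustration under stated assumptions, not a proposal for what a future np.permute would look like. The name `permute_in_place` and the convention (new `a[i]` equals old `a[perm[i]]`) are choices made for this example, and the sketch uses an O(n) list of flags, which a stricter in-place variant would avoid.

```python
def permute_in_place(a, perm):
    """Rearrange list a so that new a[i] == old a[perm[i]], in O(n) time."""
    n = len(a)
    if len(perm) != n:
        raise ValueError("perm must have the same length as a")
    # O(n) check that perm really is a permutation of 0..n-1.
    seen = [False] * n
    for p in perm:
        if not 0 <= p < n or seen[p]:
            raise ValueError("perm is not a permutation of range(len(a))")
        seen[p] = True

    done = [False] * n
    for start in range(n):
        if done[start]:
            continue
        # Walk the cycle containing `start`, pulling each item into place.
        first = a[start]
        i = start
        while True:
            done[i] = True
            src = perm[i]
            if src == start:
                a[i] = first  # close the cycle with the saved item
                break
            a[i] = a[src]
            i = src


letters = ["a", "b", "c", "d"]
permute_in_place(letters, [2, 0, 3, 1])
```

After the call, `letters` is `['c', 'a', 'd', 'b']`, the same result as building `[old[p] for p in perm]` from a copy. Each element is written exactly once, which is where the O(n) bound comes from.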
Re: [Numpy-discussion] Changed behavior of np.gradient
On Thu, Oct 16, 2014 at 10:22 AM, Nathaniel Smith n...@pobox.com wrote:
> [...]
> So I'd like to hear from people affected by this -- would you prefer to have the 2nd order boundary calculations by default, and you just need some way to work around the immediate problems in existing code? Or do you prefer the old default remain the default, with 2nd order boundary calculations something that must be requested by hand every time?

Since I started this discussion, I'll chime in. I don't have a strong preference for either mode that stems from a computational/scientific principle. As Nathaniel suggested - I have resorted to simply copying the 1.8 version of the function into my algorithm implementation, with the hope of removing that down the line. In that respect, I have a very weak preference for preserving the (1.8) status quo by default.

Thanks!
Re: [Numpy-discussion] Changed behavior of np.gradient
It isn't really a question of accuracy. It breaks unit tests and reproducibility elsewhere. My vote is to revert to the old behavior in 1.9.1.

Ben Root

On Thu, Oct 16, 2014 at 6:10 PM, Ariel Rokem aro...@gmail.com wrote:
> [...]
> Since I started this discussion, I'll chime in. I don't have a strong preference for either mode that stems from a computational/scientific principle. As Nathaniel suggested - I have resorted to simply copying the 1.8 version of the function into my algorithm implementation, with the hope of removing that down the line. In that respect, I have a very weak preference for preserving the (1.8) status quo per default.
Re: [Numpy-discussion] Changed behavior of np.gradient
On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root ben.r...@ou.edu wrote:
> It isn't really a question of accuracy. It breaks unit tests and reproducibility elsewhere. My vote is to revert to the old behavior in 1.9.1.

Why would one want the 2nd order differences at all, if they're not more accurate? Should we just revert the patch entirely? I assumed the change had some benefit...

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
> [...]
> OTOH np.random.permute as a name does have a downside: someday we'll probably add a function called np.permute (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing.

I like `permute`. That's the one term I'm looking for first.

If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does random permutation.

(I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.)

Josef
Re: [Numpy-discussion] Changed behavior of np.gradient
That isn't what I meant. Higher order doesn't necessarily mean more accurate. The results simply have different properties. The user needs to choose the differentiation order that they need. One interesting effect in data assimilation/modeling is that even-order differentiation can often have detrimental effects while higher odd-order differentiation is better, but it is highly dependent upon the model.

This change in gradient broke a unit test in matplotlib (for a new feature, so it isn't *that* critical). We didn't notice it at first because we weren't testing numpy 1.9 at the time. I want the feature (I have need for it elsewhere), but I don't want the change in default behavior.

Cheers!
Ben Root

On Thu, Oct 16, 2014 at 9:31 PM, Nathaniel Smith n...@pobox.com wrote:
> On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root ben.r...@ou.edu wrote:
>> It isn't really a question of accuracy. It breaks unit tests and reproducibility elsewhere. My vote is to revert to the old behavior in 1.9.1.
>
> Why would one want the 2nd order differences at all, if they're not more accurate? Should we just revert the patch entirely? I assumed the change had some benefit...
Re: [Numpy-discussion] Changed behavior of np.gradient
Hi,

On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root ben.r...@ou.edu wrote:

That isn't what I meant. Higher order doesn't necessarily mean more accurate. The results simply have different properties. The user needs to choose the differentiation order that they need. One interesting effect in data assimilation/modeling is that even-order differentiation can often have detrimental effects while higher odd-order differentiation is better, but it is highly dependent upon the model.

This change in gradient broke a unit test in matplotlib (for a new feature, so it isn't *that* critical). We didn't notice it at first because we weren't testing numpy 1.9 at the time. I want the feature (I have need for it elsewhere), but I don't want the change in default behavior.

I think it would be a bad idea to revert now. I suspect, if you revert, then a lot of other code will assume the < 1.9.0, >= 1.9.1 behavior. In that case, the code will work as expected most of the time, except when combined with 1.9.0, which could be seriously surprising, and often missed. If you keep the new behavior, then it will be clearer that other code will have to adapt to this change >= 1.9.0 - surprise, but predictable surprise, if you see what I mean...

Matthew
Re: [Numpy-discussion] Changed behavior of np.gradient
On Thu, Oct 16, 2014 at 8:25 PM, Matthew Brett matthew.br...@gmail.com wrote:

Hi,

On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root ben.r...@ou.edu wrote:

That isn't what I meant. Higher order doesn't necessarily mean more accurate. The results simply have different properties. The user needs to choose the differentiation order that they need. One interesting effect in data assimilation/modeling is that even-order differentiation can often have detrimental effects while higher odd-order differentiation is better, but it is highly dependent upon the model.

This change in gradient broke a unit test in matplotlib (for a new feature, so it isn't *that* critical). We didn't notice it at first because we weren't testing numpy 1.9 at the time. I want the feature (I have need for it elsewhere), but I don't want the change in default behavior.

I think it would be a bad idea to revert now. I suspect, if you revert, then a lot of other code will assume the < 1.9.0, >= 1.9.1 behavior. In that case, the code will work as expected most of the time, except when combined with 1.9.0, which could be seriously surprising, and often missed. If you keep the new behavior, then it will be clearer that other code will have to adapt to this change >= 1.9.0 - surprise, but predictable surprise, if you see what I mean...

1.9.1 will be out in a week or so. To be honest, these days I regard the 1.x.0 releases as sort of an advanced release candidate. I think there are just a lot more changes going in between releases and the release gets a lot more testing than the official release candidates.

Chuck
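For readers following along, a minimal sketch of what "changed behavior" can look like numerically. It assumes the change under discussion is how np.gradient treats the two end points (first-order versus second-order one-sided differences); the messages above do not spell that out. The `edge_order` keyword used here comes from later NumPy releases, not from this thread, and the sample data is made up.

```python
import numpy as np

# Samples of f(x) = x**2 on unit spacing; the exact derivative is 2*x.
x = np.arange(5, dtype=float)
f = x ** 2

# First-order one-sided differences at the two end points.
g1 = np.gradient(f, edge_order=1)   # [1., 2., 4., 6., 7.]

# Second-order one-sided differences at the two end points.
g2 = np.gradient(f, edge_order=2)   # [0., 2., 4., 6., 8.]

# Interior points use central differences either way, so only the
# first and last entries differ between the two results.
print(g1)
print(g2)
```

Only the first and last values differ, which is enough to break a unit test that compares against hard-coded numbers, whichever result one considers better.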
Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle
On Fri, Oct 17, 2014 at 2:35 AM, josef.p...@gmail.com wrote:
On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser warren.weckes...@gmail.com wrote:
On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:
On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser warren.weckes...@gmail.com wrote:
On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc. So, how about:

scramble + scrambled shuffle individual entries within each row/column/..., as in Warren's suggestion.

shuffle + shuffled to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d cards, and then shuffle those cards).

permuted remains indefinitely, with the docstring: Deprecated alias for 'shuffled'.

That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.)

I hesitate to use names like randomize because they're less informative than they seem -- if asked what this operation does to an array, then it would be natural to say it randomizes the array. But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow.

I had some similar concerns (hence my original disarrange), but randomize seemed more likely to be found when searching or browsing the docs, and while it might be a bit too generic-sounding, it does feel like a natural verb for the process. On the other hand, permute and permuted are even more natural and unambiguous. Any objections to those? (The existing function is permutation.)

[...]
By the way, permutation has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's randperm function. Unless we replicate that in the new function, we shouldn't deprecate permutation.

I guess we could do something like:

permutation(n): Return a random permutation on n items. Equivalent to permuted(arange(n)).

Note: for backwards compatibility, a call like permutation(an_array) currently returns the same as shuffled(an_array). (This is *not* equivalent to permuted(an_array).) This functionality is deprecated.

OTOH np.random.permute as a name does have a downside: someday we'll probably add a function called np.permute (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing.

I like `permute`. That's the one term I'm looking for first. If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does random permutation.

Yeah, but:

    from ... import permute
    # 500 lines later
    def foo(...):
        permute(...)  # what the heck is this

It definitely *can* be confusing; basically everything else in np.random has a name that suggests randomness even without seeing the full path. It's not a huge deal, though.

(I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.)

I vote that in this kind of bikeshed we try to restrict ourselves to arguments that we can at least pretend are motivated by some technical/UX concern ;-). (I guess unscrambling eggs would be technically impressive tho ;-))

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
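To make the two behaviours being named above concrete, here is a rough sketch. The names `shuffled` and `permuted` are the ones proposed in this thread; the bodies, the `rng` parameter and the `axis` default are assumptions made for illustration only, not an implementation anyone here has agreed on. The loop over 1-d slices reflects the remark earlier in the thread that the per-slice variant needs an explicit loop.

```python
import numpy as np


def shuffled(a, rng):
    """Hypothetical 'shuffled': reorder whole rows, like cards in a deck."""
    a = np.asarray(a)
    # Integer argument: a random permutation of arange(n), as noted above.
    order = rng.permutation(a.shape[0])
    return a[order]


def permuted(a, rng, axis=-1):
    """Hypothetical 'permuted': shuffle each 1-d slice along `axis` independently."""
    out = np.array(a, copy=True)
    moved = np.moveaxis(out, axis, -1)      # a view into `out`
    for idx in np.ndindex(moved.shape[:-1]):
        rng.shuffle(moved[idx])             # in-place shuffle of one 1-d slice
    return out


rng = np.random.RandomState(0)
a = np.arange(12).reshape(3, 4)
print(shuffled(a, rng))          # rows stay intact, their order changes
print(permuted(a, rng, axis=1))  # each row keeps its own entries, reordered
```

With `shuffled` every row survives as a unit; with `permuted` the rows stay in place and only the entries inside each row move.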
[Numpy-discussion] Passing multiple output arguments to ufunc
There is an oldish feature request on GitHub (https://github.com/numpy/numpy/issues/4752), complaining about it not being possible to pass multiple output arguments to a ufunc using keyword arguments. You can pass them all as positional arguments:

    >>> out1 = np.empty(1)
    >>> out2 = np.empty(1)
    >>> np.modf([1.333], out1, out2)
    (array([ 0.333]), array([ 1.]))

You can also pass the first as a kwarg if you leave the others unspecified:

    >>> np.modf([1.333], out=out1)
    (array([ 0.333]), array([ 1.]))

You can also use None in a positional argument to leave some of the output arguments unspecified:

    >>> np.modf([1.], None, out2)
    (array([ 0.]), array([ 1.]))

But you cannot do something like

    >>> np.modf([1.333], out=(out1, out2))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: return arrays must be of ArrayType

Would this behavior make sense? The idea would be to allow a tuple as a valid input for the 'out=' kwarg. It would have to have a length exactly matching the number of output arguments, and its items would have to be either arrays or None. For backwards compatibility we probably should still allow a single array to mean the first output argument, even if the ufunc has multiple outputs.

Any other thoughts?

Jaime

--
(\__/)
( O.o)
( ) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.
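A short sketch of the proposal next to the positional form that already works. The tuple form only runs on a NumPy release that accepts a tuple for `out=` (as far as I know, 1.10 and later); at the time of this message it raised the TypeError shown above. The array names and input values are made up for illustration.

```python
import numpy as np

x = np.array([1.333, -2.5])

# Positional output arrays: the form the message says already works.
frac_pos = np.empty_like(x)
whole_pos = np.empty_like(x)
np.modf(x, frac_pos, whole_pos)

# Tuple passed to out=: the form being proposed, one entry per output.
frac_kw = np.empty_like(x)
whole_kw = np.empty_like(x)
np.modf(x, out=(frac_kw, whole_kw))

# A None entry leaves that output for the ufunc to allocate itself.
frac_new, whole_ret = np.modf(x, out=(None, whole_kw))

print(frac_kw, whole_kw)
```

The tuple length has to match the number of outputs (two for modf), which is the constraint described in the message.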