Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Warren Weckesser
On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 On Sun, Oct 12, 2014 at 5:14 PM, Sebastian se...@sebix.at wrote:
 
  On 2014-10-12 16:54, Warren Weckesser wrote:
 
 
  On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern robert.k...@gmail.com wrote:
 
  On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser
  warren.weckes...@gmail.com wrote:
 
   A small wart in this API is the meaning of
  
 shuffle(a, independent=False, axis=None)
  
   It could be argued that the correct behavior is to leave the
   array unchanged. (The current behavior can be interpreted as
   shuffling a 1-d sequence of monolithic blobs; the axis argument
   specifies which axis of the array corresponds to the
   sequence index.  Then `axis=None` means the argument is
   a single monolithic blob, so there is nothing to shuffle.)
   Or an error could be raised.
  
   What do you think?
 
  It seems to me a perfectly good reason to have two methods instead
 of
  one. I can't imagine when I wouldn't be using a literal True or
 False
  for this, so it really should be two different methods.
 
 
 
  I agree, and my first inclination was to propose a different method
  (and I had the bikeshedding conversation with myself about the name:
  "disarrange", "scramble", "disorder", "randomize", "ashuffle", some
  other variation of the word "shuffle", ...), but I figured the first
  thing folks would say is "Why not just add options to shuffle?"  So,
  choose your battles and all that.
 
  What do other folks think of making a separate method?
  I'm not a fan of more methods with similar functionality in NumPy. It's
  already hard to keep track of the existing functions and all their
  possible applications and variants. The axis=None proposal for shuffling
  all items is very intuitive.
 
  I think we don't want to take the path of Matlab: a huge number of
  powerful functions, but few people who know of their possibilities.

 I totally agree with this principle, but I think this is an exception
 to the rule, b/c unfortunately in this case the function that we *do*
 have is weird and inconsistent with how most other functions in numpy
 work. It doesn't vectorize! Cf. 'sort', or how a 'shuffle' gufunc with
 signature (k,)->(k,) would work. Also, it's easy to implement the
 current 'shuffle' in terms of any 1d shuffle function, with no explicit
 loops, while Warren's 'disarrange' requires an explicit loop. So we
 really implemented the wrong one, oops. What this means going forward,
 though, is that our only options are either to implement both
 behaviours with two functions, or else to give up on having the more
 natural behaviour altogether. I think the former is the lesser of two
 evils.
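 To make the contrast concrete, here is a rough sketch in modern NumPy
 (np.take_along_axis and the Generator API postdate this thread, so treat
 the spelling as illustrative rather than the API being proposed):

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.arange(12).reshape(3, 4)

# "Card shuffle" (what shuffle/permutation do today): permute whole rows.
# This is a single fancy-indexing call -- no explicit loop needed.
cards = a[rng.permutation(a.shape[0])]

# Independent per-column scramble (Warren's "disarrange"): each column is
# permuted separately. One way to vectorize it is to argsort an array of
# random keys along the axis, at O(n log n) cost; the O(n) version needs
# the explicit per-column loop the thread mentions.
keys = rng.random(a.shape)
scrambled = np.take_along_axis(a, np.argsort(keys, axis=0), axis=0)

# The card shuffle keeps each row intact...
assert sorted(map(tuple, cards)) == sorted(map(tuple, a))
# ...while the scramble only preserves each column's multiset of values.
assert (np.sort(scrambled, axis=0) == np.sort(a, axis=0)).all()
```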

 Regarding names: shuffle/permutation is a terrible naming convention
 IMHO and shouldn't be propagated further. We already have a good
 naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
 reversed, etc.

 So, how about:

 scramble + scrambled: shuffle individual entries within each
 row/column/..., as in Warren's suggestion.

 shuffle + shuffled: do what shuffle/permutation do now (mnemonic:
 these break a 2d array into a bunch of 1d "cards", and then shuffle
 those cards).

 permutation remains indefinitely, with the docstring: "Deprecated alias
 for 'shuffled'."



That sounds good to me.  (I might go with 'randomize' instead of
'scramble', but that's a second-order decision for the API.)

Warren


-n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion



Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Jaime Fernández del Río
On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser 
warren.weckes...@gmail.com wrote:



 On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 [...]

 So, how about:

 scramble + scrambled: shuffle individual entries within each
 row/column/..., as in Warren's suggestion.

 shuffle + shuffled: do what shuffle/permutation do now (mnemonic:
 these break a 2d array into a bunch of 1d "cards", and then shuffle
 those cards).

 permutation remains indefinitely, with the docstring: "Deprecated alias
 for 'shuffled'."



 That sounds good to me.  (I might go with 'randomize' instead of
 'scramble', but that's a second-order decision for the API.)


So the only little detail left is someone actually rolling up his/her
sleeves and creating a PR... ;-)

The current shuffle and permutation are implemented here:

https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4551

It's in Cython, so it is a good candidate for anyone who wants to contribute
to numpy but is wary of C code.

Jaime









-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:

 On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:

 [...]

 That sounds good to me.  (I might go with 'randomize' instead of 'scramble',
 but that's a second-order decision for the API.)

I hesitate to use names like "randomize" because they're less
informative than they seem -- if asked what this operation does
to an array, then it would be natural to say "it randomizes the
array". But if told that the random module has a function called
"randomize", then that's not very informative -- everything in random
randomizes something somehow.

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Nathaniel Smith
On Tue, Oct 14, 2014 at 10:33 PM, Charles R Harris
charlesr.har...@gmail.com wrote:

 On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith n...@pobox.com wrote:

 On 14 Oct 2014 18:29, Charles R Harris charlesr.har...@gmail.com
 wrote:
 
 
 
  On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith n...@pobox.com wrote:
 
   On 4 Oct 2014 22:17, Stéfan van der Walt ste...@sun.ac.za wrote:
   
    On Oct 4, 2014 10:14 PM, Derek Homeier
    de...@astro.physik.uni-goettingen.de wrote:
    
 +1 for an order=2 or maxorder=2 flag
   
    If you parameterize that flag, users will want to change its value
    (above two). Perhaps rather use a boolean flag such as second_order or
    high_order, unless it seems feasible to include additional orders in
    the future.
  
   Predicting the future is hard :-). And in particular high_order= would
   create all kinds of confusion if in the future we added 3rd-order
   approximations but high_order=True continued to mean 2nd order because of
   compatibility. I like maxorder (or max_order, which would be more pep8ish,
   I guess) because it leaves our options open. (Similar to how it's often
   better to have a kwarg that can take two possible string values than to
   have a boolean kwarg. It makes current code more explicit and makes
   future enhancements easier.)
  
  
   I think maxorder is a bit misleading. Both versions are second order
   in the interior, while at the ends the old is first order and the new is
   second order. Maybe edge_order?

 Ah, that makes sense. edge_order makes more sense to me too then - and we
 can always add interior_order to complement it later, if appropriate.

 The other thing to decide on is the default. Is the 2nd order version
 generally preferred (modulo compatibility)? If so then it might make sense
 to keep it the default, given that there are already NumPy releases in the
 wild with that version, so we can't fully guarantee compatibility even if
 we wanted to. But what do others think?

 I'd be inclined to keep the older as the default and regard adding the
 keyword as a bugfix. I should have caught the incompatibility in review.

I don't have any code that uses gradient, so I don't have a great
sense of the trade-offs here.

- Usually if we have a change that produces increased accuracy, we
make the increased accuracy the default. Otherwise no-one ever uses
it, and everyone gets less accurate results than they would otherwise.
(I don't have a great sense of how much this change affects accuracy
though.)

- If the change in output per se is a serious problem for people, then
it's not one we can fix at this point -- 1.9.0 is out there and people
are using it anyway, so those who have the problem already need to
take some affirmative action to fix it. (Also, it's kinda weird to
change a function's behaviour and add a new argument in a point
release!)

So I'd like to hear from people affected by this -- would you prefer
to have the 2nd order boundary calculations by default, with some way
to work around the immediate problems in existing code? Or would you
prefer the old default remain the default, with 2nd order boundary
calculations something that must be requested by hand every time?
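For concreteness, here is how the two boundary treatments differ, written
with the edge_order keyword under discussion (the spelling is an assumption
at this point in the thread, matching what eventually shipped):

```python
import numpy as np

# f(x) = x**2 sampled on a uniform grid; the exact derivative is 2*x.
x = np.arange(5.0)
y = x ** 2

g1 = np.gradient(y, edge_order=1)  # first-order one-sided differences at the ends
g2 = np.gradient(y, edge_order=2)  # second-order one-sided differences at the ends

# Interior points use second-order central differences either way and are
# exact for a quadratic; only the two boundary values differ.
print(g1)  # [1. 2. 4. 6. 7.]
print(g2)  # [0. 2. 4. 6. 8.]  -- matches 2*x exactly
```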

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Warren Weckesser
On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  [...]

 I hesitate to use names like "randomize" because they're less
 informative than they seem -- if asked what this operation does
 to an array, then it would be natural to say "it randomizes the
 array". But if told that the random module has a function called
 "randomize", then that's not very informative -- everything in random
 randomizes something somehow.



I had some similar concerns (hence my original "disarrange"), but
"randomize" seemed more likely to be found when searching or browsing the
docs, and while it might be a bit too generic-sounding, it does feel like a
natural verb for the process.  On the other hand, "permute" and "permuted"
are even more natural and unambiguous.  Any objections to those?  (The
existing function is "permutation".)

Whatever the names, the docstrings of the four functions should be
cross-referenced in their "See Also" sections to help users find the
appropriate function.

By the way, permutation has a feature not yet mentioned here: if the
argument is an integer 'n', it generates a permutation of arange(n).  In
this case, it acts like Matlab's randperm function.  Unless we replicate
that in the new function, we shouldn't deprecate permutation.
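A quick illustration of the two call styles (a sketch of np.random.permutation's
documented behaviour at the time):

```python
import numpy as np

np.random.seed(1234)

# Integer argument: like Matlab's randperm -- a random ordering of arange(n).
p = np.random.permutation(5)
assert sorted(p) == [0, 1, 2, 3, 4]

# Array argument: returns a shuffled *copy* (rows shuffled for 2-d input),
# leaving the original untouched.
a = np.arange(10).reshape(2, 5)
b = np.random.permutation(a)
assert (a == np.arange(10).reshape(2, 5)).all()
assert b.shape == a.shape
```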

Warren





Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
warren.weckes...@gmail.com wrote:


 [...]

 I had some similar concerns (hence my original "disarrange"), but
 "randomize" seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.  On the other hand, "permute" and "permuted"
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is "permutation".)
[...]
 By the way, permutation has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like Matlab's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate permutation.

I guess we could do something like:

permutation(n):

"Return a random permutation of n items. Equivalent to permuted(arange(n)).

Note: for backwards compatibility, a call like permutation(an_array)
currently returns the same as shuffled(an_array). (This is *not*
equivalent to permuted(an_array).) This functionality is deprecated."

OTOH np.random.permute as a name does have a downside: someday we'll
probably add a function called np.permute (for applying a given
permutation in place -- the O(n) algorithm for this is useful and
tricky), and having two functions with the same name and very
different semantics would be pretty confusing.
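The O(n) in-place trick alluded to is the classic cycle-leader algorithm; a
hypothetical pure-Python sketch (the function name and the sign-complement
marking scheme are illustrative only, not any NumPy API):

```python
def apply_permutation_inplace(a, perm):
    # Rearrange `a` so that afterwards a[i] == a_old[perm[i]], using O(1)
    # extra space by walking each cycle of the permutation exactly once.
    # Visited slots are marked by storing the bitwise complement in `perm`
    # (nonnegative entries become negative), then restored at the end.
    n = len(a)
    for start in range(n):
        if perm[start] < 0:        # already handled as part of an earlier cycle
            continue
        i, saved = start, a[start]
        while True:
            j = perm[i]
            perm[i] = ~perm[i]     # mark visited
            if j == start:
                a[i] = saved       # close the cycle
                break
            a[i] = a[j]
            i = j
    for i in range(n):             # undo the visited marks
        perm[i] = ~perm[i]

vals = [10, 20, 30, 40]
perm = [2, 0, 3, 1]
apply_permutation_inplace(vals, perm)
assert vals == [30, 10, 40, 20]    # vals[i] == old vals[perm[i]]
assert perm == [2, 0, 3, 1]        # perm restored
```

The "tricky" part is exactly this bookkeeping: without the marking pass you
would revisit cycles, and without following cycles you would need a full
scratch copy of the array.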

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Ariel Rokem
On Thu, Oct 16, 2014 at 10:22 AM, Nathaniel Smith n...@pobox.com wrote:

 [...]


Since I started this discussion, I'll chime in. I don't have a strong
preference for either mode that stems from a computational/scientific
principle. As Nathaniel suggested, I have resorted to simply copying the
1.8 version of the function into my algorithm implementation, with the hope
of removing that down the line. In that respect, I have a very weak
preference for preserving the (1.8) status quo as the default.

Thanks!


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Benjamin Root
It isn't really a question of accuracy. It breaks unit tests and
reproducibility elsewhere. My vote is to revert to the old behavior in
1.9.1.

Ben Root

On Thu, Oct 16, 2014 at 6:10 PM, Ariel Rokem aro...@gmail.com wrote:


 [...]


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Nathaniel Smith
On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root ben.r...@ou.edu wrote:
 It isn't really a question of accuracy. It breaks unit tests and
 reproducibility elsewhere. My vote is to revert to the old behavior in
 1.9.1.

Why would one want the 2nd order differences at all, if they're not
more accurate? Should we just revert the patch entirely? I assumed the
change had some benefit...

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread josef.pktd
On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 [...]

 I guess we could do something like:

 permutation(n):

 "Return a random permutation of n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated."

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

I like `permute`. That's the term I'd look for first.

If np.permute does some kind of deterministic permutation or pivoting,
then I wouldn't find it confusing if np.random.permute does random
permutation.

(I definitely don't like "scrambled" -- it sounds like eggs, or cable TV
that needs to be unscrambled.)

Josef



 -n

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Benjamin Root
That isn't what I meant. Higher order doesn't necessarily mean more
accurate. The results simply have different properties. The user needs to
choose the differentiation order that they need. One interesting effect in
data assimilation/modeling is that even-order differentiation can often
have detrimental effects while higher odd-order differentiation is better,
but it is highly dependent upon the model.

This change in gradient broke a unit test in matplotlib (for a new feature,
so it isn't *that* critical). We didn't notice it at first because we
weren't testing numpy 1.9 at the time. I want the feature (I have need for
it elsewhere), but I don't want the change in default behavior.
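
For concreteness, the behavioral difference at the boundaries can be
sketched like this (assuming a NumPy version that exposes the edge_order
keyword, which is how this was eventually made selectable):

```python
import numpy as np

x = np.arange(5.0)
y = x ** 2                         # true derivative: 2*x

g2 = np.gradient(y, edge_order=2)  # second-order accurate boundaries
g1 = np.gradient(y, edge_order=1)  # first-order one-sided boundaries

assert np.allclose(g2, 2 * x)              # exact for a quadratic
assert not np.allclose(g1, 2 * x)          # boundary points differ
assert np.allclose(g1[1:-1], 2 * x[1:-1])  # interior (central diffs) agrees
```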

Cheers!
Ben Root


On Thu, Oct 16, 2014 at 9:31 PM, Nathaniel Smith n...@pobox.com wrote:

 On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root ben.r...@ou.edu wrote:
  It isn't really a question of accuracy. It breaks unit tests and
  reproducibility elsewhere. My vote is to revert to the old behavior in
  1.9.1.

 Why would one want the 2nd order differences at all, if they're not
 more accurate? Should we just revert the patch entirely? I assumed the
 change had some benefit...

 --
 Nathaniel J. Smith
 Postdoctoral researcher - Informatics - University of Edinburgh
 http://vorpus.org



Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Matthew Brett
Hi,

On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root ben.r...@ou.edu wrote:
 That isn't what I meant. Higher order doesn't necessarily mean more
 accurate. The results simply have different properties. The user needs to
 choose the differentiation order that they need. One interesting effect in
 data assimilation/modeling is that even-order differentiation can often have
 detrimental effects while higher odd-order differentiation is better, but
 it is highly dependent upon the model.

 This change in gradient broke a unit test in matplotlib (for a new feature,
 so it isn't *that* critical). We didn't notice it at first because we
 weren't testing numpy 1.9 at the time. I want the feature (I have need for
 it elsewhere), but I don't want the change in default behavior.

I think it would be a bad idea to revert now.

I suspect, if you revert, then a lot of other code will assume the
< 1.9.0, >= 1.9.1 behavior.  In that case, the code will work as
expected most of the time, except when combined with 1.9.0, which
could be seriously surprising, and often missed.   If you keep the new
behavior, then it will be clearer that other code will have to adapt
to this change in >= 1.9.0 - surprise, but predictable surprise, if you
see what I mean...

Matthew


Re: [Numpy-discussion] Changed behavior of np.gradient

2014-10-16 Thread Charles R Harris
On Thu, Oct 16, 2014 at 8:25 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root ben.r...@ou.edu wrote:
  That isn't what I meant. Higher order doesn't necessarily mean more
  accurate. The results simply have different properties. The user needs to
  choose the differentiation order that they need. One interesting effect
 in
  data assimilation/modeling is that even-order differentiation can often
 have
  detrimental effects while higher odd-order differentiation is better,
 but
  it is highly dependent upon the model.
 
  This change in gradient broke a unit test in matplotlib (for a new
 feature,
  so it isn't *that* critical). We didn't notice it at first because we
  weren't testing numpy 1.9 at the time. I want the feature (I have need
 for
  it elsewhere), but I don't want the change in default behavior.

 I think it would be a bad idea to revert now.

 I suspect, if you revert, then a lot of other code will assume the
 < 1.9.0, >= 1.9.1 behavior.  In that case, the code will work as
 expected most of the time, except when combined with 1.9.0, which
 could be seriously surprising, and often missed.   If you keep the new
 behavior, then it will be clearer that other code will have to adapt
 to this change in >= 1.9.0 - surprise, but predictable surprise, if you
 see what I mean...


1.9.1 will be out in a week or so. To be honest, these days I regard the
1.x.0 releases as sort of advanced release candidates. There are just a lot
more changes going in between releases, and the .0 release ends up getting a
lot more testing than the official release candidates do.

Chuck


Re: [Numpy-discussion] Request for enhancement to numpy.random.shuffle

2014-10-16 Thread Nathaniel Smith
On Fri, Oct 17, 2014 at 2:35 AM,  josef.p...@gmail.com wrote:
 On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:


 On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith n...@pobox.com wrote:

 On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser
 warren.weckes...@gmail.com wrote:
 
  On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith n...@pobox.com wrote:
 
  Regarding names: shuffle/permutation is a terrible naming convention
  IMHO and shouldn't be propagated further. We already have a good
  naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs.
  reversed, etc.
 
  So, how about:
 
  "scramble" + "scrambled": shuffle individual entries within each
  row/column/..., as in Warren's suggestion.
 
  "shuffle" + "shuffled": do what shuffle and permutation do now (mnemonic:
  these break a 2d array into a bunch of 1d "cards", and then shuffle
  those cards).
 
  "permutation" remains indefinitely, with the docstring: "Deprecated alias
  for 'shuffled'."
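
 The proposed split can be illustrated with current NumPy (the new names
 are hypothetical; the "scramble" behavior is emulated here with
 per-column shuffles):

 ```python
 import numpy as np

 rng = np.random.RandomState(12345)
 a = np.arange(12).reshape(4, 3)

 # "shuffle"/"shuffled": rows move as intact 1-d cards
 # (this is what np.random.shuffle does today).
 cards = a.copy()
 rng.shuffle(cards)
 assert {tuple(r) for r in cards} == {tuple(r) for r in a}

 # "scramble"/"scrambled": entries shuffled independently within each
 # column, so each column keeps its multiset of values but rows are
 # generally broken up.
 scrambled = a.copy()
 for j in range(scrambled.shape[1]):
     rng.shuffle(scrambled[:, j])   # shuffling a view mutates `scrambled`
 assert all(sorted(scrambled[:, j]) == sorted(a[:, j])
            for j in range(a.shape[1]))
 ```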
 
  That sounds good to me.  (I might go with 'randomize' instead of
  'scramble',
  but that's a second-order decision for the API.)

 I hesitate to use names like "randomize" because they're less
 informative than they seem -- if asked what this operation does
 to an array, it would be natural to say it "randomizes" the
 array. But if told that the random module has a function called
 "randomize", that's not very informative -- everything in random
 randomizes something somehow.

 I had some similar concerns (hence my original "disarrange"), but
 "randomize" seemed more likely to be found when searching or browsing the
 docs, and while it might be a bit too generic-sounding, it does feel like a
 natural verb for the process.   On the other hand, "permute" and "permuted"
 are even more natural and unambiguous.  Any objections to those?  (The
 existing function is "permutation".)
 [...]
 By the way, "permutation" has a feature not yet mentioned here: if the
 argument is an integer 'n', it generates a permutation of arange(n).  In
 this case, it acts like MATLAB's randperm function.  Unless we replicate
 that in the new function, we shouldn't deprecate "permutation".

 I guess we could do something like:

 permutation(n):

 Return a random permutation on n items. Equivalent to permuted(arange(n)).

 Note: for backwards compatibility, a call like permutation(an_array)
 currently returns the same as shuffled(an_array). (This is *not*
 equivalent to permuted(an_array).) This functionality is deprecated.

 OTOH np.random.permute as a name does have a downside: someday we'll
 probably add a function called np.permute (for applying a given
 permutation in place -- the O(n) algorithm for this is useful and
 tricky), and having two functions with the same name and very
 different semantics would be pretty confusing.

 I like `permute`. That's the term I would look for first.

 If np.permute does some kind of deterministic permutation or pivoting,
 then I wouldn't find it confusing if np.random.permute does random
 permutation.

Yeah, but:

from ... import permute
# 500 lines later
def foo(...):
permute(...)  # what the heck is this

It definitely *can* be confusing; basically everything else in
np.random has a name that suggests randomness even without seeing the
full path.

It's not a huge deal, though.

 (I definitely don't like "scrambled"; it sounds like eggs, or cable TV
 that needs to be unscrambled.)

I vote that in this kind of bikeshed we try to restrict ourselves to
arguments that we can at least pretend are motivated by some
technical/UX concern ;-). (I guess unscrambling eggs would be
technically impressive tho ;-))

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org


[Numpy-discussion] Passing multiple output arguments to ufunc

2014-10-16 Thread Jaime Fernández del Río
There is an oldish feature request in github
(https://github.com/numpy/numpy/issues/4752), complaining about it not
being possible to pass multiple output arguments to a ufunc using
keyword arguments.

You can pass them all as positional arguments:

>>> out1 = np.empty(1)
>>> out2 = np.empty(1)
>>> np.modf([1.333], out1, out2)
(array([ 0.333]), array([ 1.]))

You can also pass the first as a kwarg if you leave the others unspecified:

>>> np.modf([1.333], out=out1)
(array([ 0.333]), array([ 1.]))

You can also use None in a positional argument to leave some of the
output arguments unspecified:

>>> np.modf([1.], None, out2)
(array([ 0.]), array([ 1.]))

But you cannot do something like

>>> np.modf([1.333], out=(out1, out2))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: return arrays must be of ArrayType

Would this behavior make sense? The idea would be to allow a tuple as
a valid input for the 'out=' kwarg. It would have to have a length
exactly matching the number of output arguments, and its items would
have to be either arrays or None.

For backwards compatibility we probably should still allow a single
array to mean the first output argument, even if the ufunc has
multiple outputs.
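
The proposed semantics could be prototyped today with a small wrapper
(call_with_out_tuple is a hypothetical helper, not an existing API; it
just translates the tuple into the positional outputs ufuncs already
accept):

```python
import numpy as np

def call_with_out_tuple(ufunc, args, out=None):
    """Sketch of the proposed 'out=' semantics for multi-output ufuncs.

    A tuple `out` must have length ufunc.nout exactly; entries may be
    arrays or None.  A single array keeps its old meaning: the first
    output only.
    """
    if out is None:
        return ufunc(*args)
    if not isinstance(out, tuple):
        out = (out,) + (None,) * (ufunc.nout - 1)  # backwards compatibility
    if len(out) != ufunc.nout:
        raise ValueError("'out' tuple must have length %d" % ufunc.nout)
    return ufunc(*(tuple(args) + out))

out1, out2 = np.empty(1), np.empty(1)
frac, whole = call_with_out_tuple(np.modf, ([1.333],), out=(out1, out2))
assert frac is out1 and whole is out2   # results land in the given arrays
assert np.allclose(whole, 1.0)
```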

Any other thoughts?

Jaime

-- 
(\__/)
( O.o)
(  ) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.