Re: [Numpy-discussion] padding options for diff

2016-10-28 Thread Marten van Kerkwijk
Matthew has made what looks like a very nice implementation of padding
in np.diff in https://github.com/numpy/numpy/pull/8206. I raised two
general questions about desired behaviour there that Matthew thought
we should put out on the mailiing list as well. This indeed seemed a
good opportunity to get feedback, so herewith a copy of
https://github.com/numpy/numpy/pull/8206#issuecomment-256909027

-- Marten

1. I'm not sure that treating a 1-d array as something that will just
extend the result along `axis` is a good idea, as it breaks standard
broadcasting rules. E.g., consider
```
np.diff([[1, 2], [4, 8]], to_begin=[1, 4])
# with your PR:
array([[1, 4, 1],
   [1, 4, 4]])
# but from regular broadcasting I would expect
array([[1, 1],
   [4, 4]])
# i.e., the same as if I did to_begin=[[1, 4]]
```
I think it is slightly odd to break the broadcasting expectation here,
especially since the regular use case surely is just to add a single
element so that one keeps the original shape. The advantage of
assuming this is that you do not have to do *any* array shaping of
`to_begin` and `to_end` (which perhaps also suggests it is the right
thing to do).

2. As I mentioned above, I think it may be worth thinking through a
little what to do with higher order differences, at least for
`to_begin='first'`. If the goal is to ensure that with that option, it
becomes the inverse of `cumsum`, then I think for higher order one
should add multiple elements in front, i.e., for that case, the
recursive call should be
```
return np.diff(np.diff(a, to_begin='first'), n-1, to_begin='first')
```
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Peter Creasey
> Date: Wed, 26 Oct 2016 16:18:05 -0400
> From: Matthew Harrigan 
>
> Would it be preferable to have to_begin='first' as an option under the
> existing kwarg to avoid overlapping?
>
>> if keep_left:
>> if to_begin is None:
>> to_begin = np.take(a, [0], axis=axis)
>> else:
>> raise ValueError(?np.diff(a, keep_left=False, to_begin=None)
>> can be used with either keep_left or to_begin, but not both.?)
>>
>> Generally I try to avoid optional keyword argument overlap, but in
>> this case it is probably justified.
>>

It works for me. I can't *think* of a case where you could have a
np.diff on a string array and 'first' could be confused with an
element, since you're not allowed diff on strings in the present numpy
anyway (unless wiser heads than me know something!). Feel free to move
the conversation to github btw.

Peter
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
Would it be preferable to have to_begin='first' as an option under the
existing kwarg to avoid overlapping?

On Wed, Oct 26, 2016 at 3:35 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Wed, 26 Oct 2016 09:05:41 -0400
> > From: Matthew Harrigan 
> >
> > np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis),
> axis=axis)
> >
> > That's certainly not going to win any beauty contests.  The 1d case is
> > clean though:
> >
> > np.cumsum(np.diff(x, to_begin=x[0]))
> >
> > I'm not sure if this means the API should change, and if so how.  Higher
> > dimensional arrays seem to just have extra complexity.
> >
> >>
> >> I like the proposal, though I suspect that making it general has
> >> obscured that the most common use-case for padding is to make the
> >> inverse of np.cumsum (at least that?s what I frequently need), and now
> >> in the multidimensional case you have the somewhat unwieldy:
> >>
> >> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
> >>
> >> rather than
> >>
> >> >>> np.diff(a, axis=axis, keep_left=True)
> >>
> >> which of course could just be an option upon what you already have.
> >>
>
> So my suggestion was intended that you might want an additional
> keyword argument (keep_left=False) to make the inverse np.cumsum
> use-case easier, i.e. you would have something in your np.diff like:
>
> if keep_left:
> if to_begin is None:
> to_begin = np.take(a, [0], axis=axis)
> else:
> raise ValueError(‘np.diff(a, keep_left=False, to_begin=None)
> can be used with either keep_left or to_begin, but not both.’)
>
> Generally I try to avoid optional keyword argument overlap, but in
> this case it is probably justified.
>
> Peter
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Peter Creasey
> Date: Wed, 26 Oct 2016 09:05:41 -0400
> From: Matthew Harrigan 
>
> np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis), axis=axis)
>
> That's certainly not going to win any beauty contests.  The 1d case is
> clean though:
>
> np.cumsum(np.diff(x, to_begin=x[0]))
>
> I'm not sure if this means the API should change, and if so how.  Higher
> dimensional arrays seem to just have extra complexity.
>
>>
>> I like the proposal, though I suspect that making it general has
>> obscured that the most common use-case for padding is to make the
>> inverse of np.cumsum (at least that?s what I frequently need), and now
>> in the multidimensional case you have the somewhat unwieldy:
>>
>> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
>>
>> rather than
>>
>> >>> np.diff(a, axis=axis, keep_left=True)
>>
>> which of course could just be an option upon what you already have.
>>

So my suggestion was intended that you might want an additional
keyword argument (keep_left=False) to make the inverse np.cumsum
use-case easier, i.e. you would have something in your np.diff like:

if keep_left:
if to_begin is None:
to_begin = np.take(a, [0], axis=axis)
else:
raise ValueError(‘np.diff(a, keep_left=False, to_begin=None)
can be used with either keep_left or to_begin, but not both.’)

Generally I try to avoid optional keyword argument overlap, but in
this case it is probably justified.

Peter
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-26 Thread Matthew Harrigan
The inverse of cumsum is actually a little more unweildy since you can't
drop a dimension with take.  This returns the original array (numerical
caveats aside):

np.cumsum(np.diff(x, to_begin=x.take([0], axis=axis), axis=axis), axis=axis)

That's certainly not going to win any beauty contests.  The 1d case is
clean though:

np.cumsum(np.diff(x, to_begin=x[0]))

I'm not sure if this means the API should change, and if so how.  Higher
dimensional arrays seem to just have extra complexity.

On Tue, Oct 25, 2016 at 1:26 PM, Peter Creasey <
p.e.creasey...@googlemail.com> wrote:

> > Date: Mon, 24 Oct 2016 08:44:46 -0400
> > From: Matthew Harrigan 
> >
> > I posted a pull request  which
> > adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> > options are based on what's available in ediff1d.  It closes this issue
> > 
>
> I like the proposal, though I suspect that making it general has
> obscured that the most common use-case for padding is to make the
> inverse of np.cumsum (at least that’s what I frequently need), and now
> in the multidimensional case you have the somewhat unwieldy:
>
> >>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))
>
> rather than
>
> >>> np.diff(a, axis=axis, keep_left=True)
>
> which of course could just be an option upon what you already have.
>
> Best,
> Peter
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-25 Thread Peter Creasey
> Date: Mon, 24 Oct 2016 08:44:46 -0400
> From: Matthew Harrigan 
>
> I posted a pull request  which
> adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> options are based on what's available in ediff1d.  It closes this issue
> 

I like the proposal, though I suspect that making it general has
obscured that the most common use-case for padding is to make the
inverse of np.cumsum (at least that’s what I frequently need), and now
in the multidimensional case you have the somewhat unwieldy:

>>> np.diff(a, axis=axis, to_begin=np.take(a, 0, axis=axis))

rather than

>>> np.diff(a, axis=axis, keep_left=True)

which of course could just be an option upon what you already have.

Best,
Peter
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] padding options for diff

2016-10-24 Thread Stephan Hoyer
This looks like a welcome addition in functionality! It will be nice to be
able to finally (soft) deprecate ediff1d.

On Mon, Oct 24, 2016 at 5:44 AM, Matthew Harrigan <
harrigan.matt...@gmail.com> wrote:

> I posted a pull request  which
> adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
> options are based on what's available in ediff1d.  It closes this issue
> 
>
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] padding options for diff

2016-10-24 Thread Matthew Harrigan
I posted a pull request  which
adds optional padding kwargs "to_begin" and "to_end" to diff.  Those
options are based on what's available in ediff1d.  It closes this issue

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion