[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Dom Grigonis Tue, 15 Aug 2023 10:49:59 -0700


> On 14 Aug 2023, at 15:22, john.daw...@camlingroup.com wrote:
> 
>> From my point of view, such function is a bit of a corner-case to be added 
>> to numpy. And it doesn’t justify it’s naming anymore. It is not one 
>> operation anymore. It is a cumsum and prepending 0. And it is very difficult 
>> to argue why prepending 0 to cumsum is a part of cumsum.
> 
> That is backwards. Consider the array [x0, x1, x2].
> 
> The sum of the first 0 elements is 0.
> The sum of the first 1 elements is x0.
> The sum of the first 2 elements is x0+x1.
> The sum of the first 3 elements is x0+x1+x2.
> 
> Hence, the array of partial sums is [0, x0, x0+x1, x0+x1+x2].
> 
> Thus, the operation [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] is a natural and 
> primitive one.
> 
> The current behaviour of numpy.cumsum is the composition of two basic 
> operations, computing the partial sums and omitting the initial value:
> 
> [x0, x1, x2] -> [0, x0, x0+x1, x0+x1+x2] -> [x0, x0+x1, x0+x1+x2].
In reality both of these functions do exactly what they need to do. But the 
issue, as I understand it, is to have one of these in such way, so that they 
are inverses of each other. The only question is which one is better suitable 
for it and provides most benefits.


Arguments for np.diff0:
1. Dimension length stays constant, while cumusm0 extends length to n+1, then 
np.diff, truncates it back. This adds extra complexity, while things are very 
convenient to work with when dimension length stays constant throughout the 
code.
2. Although I see your argument about element 0, but the fact is that it 
doesn’t exist at all. in np.diff0 case at least half of it exists and the other 
half has a half decent rationale. In cumsum0 case it just appeared out of 
nowhere and in your example above you are providing very different logic to 
what np.cumsum is intrinsically. Ilhan has accurately pointed it out in his 
e-mail.

For now, I only see my point of view and I can list a number of cases from data 
analysis and modelling, where I found np.diff0 to be a fairly optimal choice to 
use and it made things smoother. While I haven’t seen any real-life examples 
where np.cumsum0 would be useful so I am naturally biased. I would appreciate 
If anyone provided some examples that justify np.cumsum0 - for now I just can’t 
think of any case where this could actually be useful or why it would be more 
convenient/sensible than np.diff0.

>> What I would rather vouch for is adding an argument to `np.diff` so that it 
>> leaves first row unmodified.
>> def diff0(a, axis=-1):
>>    """Differencing which appends first item along the axis"""
>>    a0 = np.take(a, [0], axis=axis)
>>    return np.concatenate([a0, np.diff(a, n=1, axis=axis)], axis=axis)
>> This would be more sensible from conceptual point of view. As difference can 
>> not be made, the result is the difference from absolute origin. With 
>> recognition that first non-origin value in a sequence is the one after it. 
>> And if the first row is the origin in a specific case, then that origin is 
>> correctly defined in relation to absolute origin.
>> Then, if origin row is needed, then it can be prepended in the beginning of 
>> a procedure. And np.diff and np.cumsum are inverses throughout the 
>> sequential code.
>> np.diff0 was one the first functions I had added to my numpy utils and been 
>> using it instead of np.diff quite a lot.
> 
> This suggestion is bad: diff0 is conceptually confused. numpy.diff changes an 
> array of numpy.datetime64s to an array of numpy.timedelta64s, but numpy.diff0 
> changes an array of numpy.datetime64s to a heterogeneous array where one 
> element is a numpy.datetime64 and the rest are numpy.timedelta64s. In 
> general, whereas numpy.diff changes an array of positions to an array of 
> displacements, diff0 changes an array of positions to a heterogeneous array 
> where one element is a position and the rest are displacements.


This isn’t really argument against np.diff0, just one aspect of it which would 
have to be dealt with. If instead of just prepending, the difference from 0 was 
made, it would result in numpy.timedelta64s. So not a big issue.

_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Reply via email to