This has come up before; see https://github.com/numpy/numpy/issues/6044 for
the first discussion. Several subsequent discussions are
linked there.

In the meantime, the data APIs consortium has been actively working on
adding a `cumulative_sum` function to the array API standard, see
https://github.com/data-apis/array-api/issues/597 and
https://github.com/data-apis/array-api/pull/653. The proposed
`cumulative_sum` function includes an `include_initial` keyword argument
that gets the OP's desired behavior.
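
For concreteness, here is a rough sketch of what `include_initial=True`
would compute, emulated with existing NumPy calls. The helper name
`cumulative_sum_with_initial` is made up for illustration and is not a NumPy
or array API function. The `axis=None` handling mirrors `np.cumsum`
(flatten first), which is my assumption and not necessarily what the
standard will specify.

```python
import numpy as np


def cumulative_sum_with_initial(x, axis=None, dtype=None):
    """Hypothetical helper: cumulative sum with a leading zero along `axis`."""
    x = np.asanyarray(x)
    if axis is None:
        # Assumption: mirror np.cumsum's default of flattening the input.
        x = x.ravel()
        axis = 0
    sums = np.cumsum(x, axis=axis, dtype=dtype)
    # Build a block of zeros with length 1 along `axis` and the result dtype.
    shape = list(sums.shape)
    shape[axis] = 1
    zeros = np.zeros(shape, dtype=sums.dtype)
    return np.concatenate([zeros, sums], axis=axis)


a = np.array([[1, 2, 3], [4, 5, 6]])
flat = cumulative_sum_with_initial(a)          # values 0, 1, 3, 6, 10, 15, 21
rows = cumulative_sum_with_initial(a, axis=1)  # shape (2, 4), each row starts at 0
```

The result is one element longer along `axis` than the input, which is the
property the OP is asking for.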

I think we should probably eventually deprecate `cumsum` and `cumprod` in
favor of the array API standard's `cumulative_sum` and `cumulative_product`
if only because of the embarrassing naming issue. Once the array API
standard has finalized the name for the keyword argument, I think it makes
sense to add the keyword argument to `np.cumsum`, even if we don't deprecate
it yet. I don't think it makes sense to add a new function just for this.
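
To make the OP's fencepost point concrete, here is a small check, using only
existing NumPy functions, of why the leading zero is what lets a cumulative
sum undo `diff`. The array values are arbitrary and chosen just for
illustration.

```python
import numpy as np

x = np.array([3, 5, 4, 9])
d = np.diff(x)  # successive differences; one element shorter than x

# Plain cumsum returns len(d) values, so it cannot line up with x.
partial = np.cumsum(d)

# Prepending a zero gives len(d) + 1 values; adding x[0] recovers x.
with_zero = np.concatenate([[0], np.cumsum(d)])
recovered = x[0] + with_zero

print(np.array_equal(recovered, x))  # True
```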

On Fri, Aug 11, 2023 at 6:34 AM <john.daw...@camlingroup.com> wrote:

> `cumsum` computes the sum of the first k summands for every k from 1.
> Judging by my experience, it is more often useful to compute the sum of the
> first k summands for every k from 0, as `cumsum`'s behaviour leads to
> fencepost-like problems.
> https://en.wikipedia.org/wiki/Off-by-one_error#Fencepost_error
> For example, `cumsum` is not the inverse of `diff`. I propose adding a
> function to NumPy to compute cumulative sums beginning with 0, that is, an
> inverse of `diff`. It might be called `cumsum0`. The following code is
> probably not the best way to implement it, but it illustrates the desired
> behaviour.
>
> ```
> def cumsum0(a, axis=None, dtype=None, out=None):
>     """
>     Return the cumulative sum of the elements along a given axis,
>     beginning with 0.
>
>     cumsum0 does the same as cumsum except that cumsum computes the sum
>     of the first k summands for every k from 1 and cumsum0, from 0.
>
>     Parameters
>     ----------
>     a : array_like
>         Input array.
>     axis : int, optional
>         Axis along which the cumulative sum is computed. The default
>         (None) is to compute the cumulative sum over the flattened
>         array.
>     dtype : dtype, optional
>         Type of the returned array and of the accumulator in which the
>         elements are summed. If `dtype` is not specified, it defaults to
>         the dtype of `a`, unless `a` has an integer dtype with a
>         precision less than that of the default platform integer. In
>         that case, the default platform integer is used.
>     out : ndarray, optional
>         Alternative output array in which to place the result. It must
>         have the same shape and buffer length as the expected output but
>         the type will be cast if necessary. See
>         :ref:`ufuncs-output-type` for more details.
>
>     Returns
>     -------
>     cumsum0_along_axis : ndarray.
>         A new array holding the result is returned unless `out` is
>         specified, in which case a reference to `out` is returned. If
>         `axis` is not None the result has the same shape as `a` except
>         along `axis`, where the dimension is larger by 1.
>
>     See Also
>     --------
>     cumsum : Cumulatively sum array elements, beginning with the first.
>     sum : Sum array elements.
>     trapz : Integration of array values using the composite trapezoidal rule.
>     diff : Calculate the n-th discrete difference along given axis.
>
>     Notes
>     -----
>     Arithmetic is modular when using integer types, and no error is
>     raised on overflow.
>
>     ``cumsum0(a)[-1]`` may not be equal to ``sum(a)`` for floating-point
>     values since ``sum`` may use a pairwise summation routine, reducing
>     the roundoff-error. See `sum` for more information.
>
>     Examples
>     --------
>     >>> a = np.array([[1, 2, 3], [4, 5, 6]])
>     >>> a
>     array([[1, 2, 3],
>            [4, 5, 6]])
>     >>> np.cumsum0(a)
>     array([ 0,  1,  3,  6, 10, 15, 21])
>     >>> np.cumsum0(a, dtype=float)  # specifies type of output value(s)
>     array([ 0.,  1.,  3.,  6., 10., 15., 21.])
>
>     >>> np.cumsum0(a, axis=0)  # sum over rows for each of the 3 columns
>     array([[0, 0, 0],
>            [1, 2, 3],
>            [5, 7, 9]])
>     >>> np.cumsum0(a, axis=1)  # sum over columns for each of the 2 rows
>     array([[ 0,  1,  3,  6],
>            [ 0,  4,  9, 15]])
>
>     ``cumsum0(b)[-1]`` may not be equal to ``sum(b)``
>
>     >>> b = np.array([1, 2e-9, 3e-9] * 1000000)
>     >>> np.cumsum0(b)[-1]
>     1000000.0050045159
>     >>> b.sum()
>     1000000.0050000029
>
>     """
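>     a = np.asanyarray(a)  # accept any array_like, as documented
>     # Summing an empty slice yields a zero of the right shape and dtype.
>     empty = a.take([], axis=axis)
>     zero = empty.sum(axis, dtype=dtype, keepdims=True)
>     later_cumsum = a.cumsum(axis, dtype=dtype)
>     # Both pieces already have the result dtype; np.concatenate rejects
>     # `dtype` and `out` being given together.
>     return np.concatenate([zero, later_cumsum], axis=axis, out=out)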
> ```
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: nathan12...@gmail.com
>