[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.

Michael Siebert Sun, 20 Aug 2023 05:25:14 -0700

Dear all,

Another aspect to think about is that cumsum is not the only cumulative aggregation. There are others as well, whether or not they have top-level np functions (for example, cummax is available as np.maximum.accumulate):

1. cumprod: instead of starting with zero, one would need to start with one

2. cummax: start with -np.inf

3. cummin: start with np.inf

4. Maybe more? Those are the ones that came to mind so far.

So introducing a parameter for cumsum and not for the others would be somewhat inconsistent. For cumsum and cumprod, all data types (int8, int16, …, float, double) support 0 and 1, but for cummax and cummin one would need types that support infinity and negative numbers (to make the result meaningfully convertible to other types).
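To make the identity-element problem concrete, here is a minimal sketch, assuming a hypothetical helper `accumulate_with_initial` (not an existing NumPy function) that prepends a start value and then runs `ufunc.accumulate`. The explicit concatenation is needed because `ufunc.reduce` accepts an `initial` argument while `ufunc.accumulate` does not.

```python
import numpy as np

def accumulate_with_initial(ufunc, a, initial, axis=0):
    """Hypothetical helper: run ufunc.accumulate along `axis` after
    prepending `initial`, so the output is one element longer on that axis."""
    a = np.asarray(a)
    shape = list(a.shape)
    shape[axis] = 1
    # The start slab's dtype must be able to hold `initial` (e.g. -inf needs a float)
    start = np.full(shape, initial, dtype=np.result_type(a, initial))
    return ufunc.accumulate(np.concatenate([start, a], axis=axis), axis=axis)

x = np.array([3, 1, 4])
print(accumulate_with_initial(np.add, x, 0))            # cumsum starting from 0
print(accumulate_with_initial(np.multiply, x, 1))       # cumprod starting from 1
print(accumulate_with_initial(np.maximum, x, -np.inf))  # cummax starting from -inf
print(accumulate_with_initial(np.minimum, x, np.inf))   # cummin starting from +inf
```

For integer input, the `-np.inf` and `np.inf` start values silently promote the result to a floating dtype, which is exactly the type issue raised above; the exact float width chosen depends on the NumPy version's promotion rules.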

And what would such a cumprod be called if one wanted to give it a new name? cumprod0 or cumprod1?

Just some thoughts.

Best, Michael

On 19. Aug 2023, at 19:02, Dom Grigonis <[email protected]> wrote:

Unfortunately, I don’t have a good answer.

For now, I can only tell you what I think might benefit from improvement.

1. Verbosity. I appreciate that a bracket syntax such as Julia's or MATLAB's `[A B C ...]` is not possible, so a function-based API is the only option. For example, Julia has functions named `cat`, `vcat`, `hcat`, and `hvcat`. I myself have recently aliased np.concatenate as `np_c`. For simple operations, it would surely be nice to have methods, e.g. `arr.append(axis)` / `arr.prepend(axis)`.

2. Excessive number of functions. There seem to be very many functions for concatenating and stacking. Many operations can be done with different functions and approaches, and usually one of them is several times faster than the rest. For example, stacking two 1-D vectors as the columns of a 2-D array:
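import timeit
import numpy as np
arr = np.arange(100)
candidates = [
    lambda: np.array([arr, arr]).T,
    lambda: np.vstack([arr, arr]).T,
    lambda: np.stack([arr, arr]).T,
    lambda: np.c_[arr, arr],
    lambda: np.column_stack((arr, arr)),
    lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1),
]
# timeit stands in for the TIMER helper: mean seconds per 1000 calls over 5 repeats
for f in candidates:
    print(f"{sum(timeit.repeat(f, number=1000, repeat=5)) / 5:.3f}")
# Means originally reported with the TIMER helper (units not stated):
# mean [[0.012 0.044 0.052 0.13  0.032 0.024]]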
Instead, having fewer but more intuitive, flexible, and well-optimised functions would be more convenient.

3. The flattening and reshaping API is not very intuitive. For example, `torch.flatten` has the desired level of flexibility (its `start_dim` and `end_dim` arguments select which axes to merge), in contrast to `ndarray.flatten`: https://pytorch.org/docs/stable/generated/torch.flatten.html. I had similar issues with multi-dimensional searching, sorting, multi-dimensional overlaps, and custom unique functions. In other words, all the functionality is already there, but for more custom multi-dimensional cases (even when the requirement looks very simple in my mind) there is no easy API, and I end up writing my own NumPy functions and benchmarking numerous ways to achieve the same thing. By now, I have my own multi-dimensional unique, sort, search, and flatten, plus a more flexible `ix_`. They are not well tested, but they are already more convenient, more flexible, and often several times faster than the NumPy ones (although all they do is reuse existing NumPy functionality).
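A torch-style flatten can be approximated with a plain `reshape`. Here is a minimal sketch; the name `flatten_dims` and its `start_dim`/`end_dim` parameters are hypothetical, borrowed from `torch.flatten`, and are not part of NumPy. It merges the axes from `start_dim` to `end_dim` (inclusive) and keeps the others unchanged.

```python
import math
import numpy as np

def flatten_dims(a, start_dim=0, end_dim=-1):
    """Hypothetical torch.flatten-style helper: merge axes start_dim..end_dim
    (inclusive, negative indices allowed) into a single axis."""
    a = np.asarray(a)
    if a.ndim == 0:
        return a.reshape(1)  # treat a 0-d input as a single element
    start = start_dim % a.ndim
    end = end_dim % a.ndim
    if start > end:
        raise ValueError("start_dim must not come after end_dim")
    # Explicit merged size (instead of -1) so zero-size axes are handled too
    merged = math.prod(a.shape[start:end + 1])
    return a.reshape(a.shape[:start] + (merged,) + a.shape[end + 1:])

x = np.arange(24).reshape(2, 3, 4)
print(flatten_dims(x).shape)        # (24,): all axes merged
print(flatten_dims(x, 1).shape)     # (2, 12): keep axis 0, merge axes 1 and 2
print(flatten_dims(x, 0, 1).shape)  # (6, 4): merge axes 0 and 1, keep axis 2
```

Because it relies on `reshape`, the result is a view whenever the memory layout allows, whereas `ndarray.flatten` always returns a copy.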

I think these points belong to something like NumPy 2.0 rather than to a simple extension. It feels like the API could generally be more flexible and intuitive, and there is enough existing NumPy material, as well as external examples, to draw from to make a next-level API happen, although I appreciate the effort and difficulties involved.

Having said all that, implementing equivalents of Julia's `cat`, `vcat`, `hcat`, and `hvcat`, together with `arr.append(others, axis)` and `arr.prepend(others, axis)`, while ensuring that they use the most optimised approaches, could make life easier for the time being.
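A minimal sketch of the suggested helpers, written as free functions because user code cannot add methods to `np.ndarray`. The names `prepend`, `append`, `vcat`, and `hcat` here are hypothetical and only approximate Julia's semantics; everything is built on `np.concatenate` and `np.column_stack`, so the sketch does not by itself settle which approach is fastest.

```python
import numpy as np

def _slab(arr, values, axis):
    # Broadcast a scalar or lower-dimensional `values` to a slab one element thick along `axis`
    values = np.asarray(values)
    if values.ndim < arr.ndim:
        shape = list(arr.shape)
        shape[axis] = 1
        values = np.broadcast_to(values, shape)
    return values

def prepend(arr, values, axis=0):
    """Hypothetical free-function form of the proposed arr.prepend(others, axis)."""
    arr = np.asarray(arr)
    return np.concatenate([_slab(arr, values, axis), arr], axis=axis)

def append(arr, values, axis=0):
    """Hypothetical free-function form of the proposed arr.append(others, axis)."""
    arr = np.asarray(arr)
    return np.concatenate([arr, _slab(arr, values, axis)], axis=axis)

def vcat(*arrays):
    """Hypothetical Julia-style vcat: join along the first axis
    (1-D inputs are joined end to end)."""
    return np.concatenate([np.atleast_1d(a) for a in arrays], axis=0)

def hcat(*arrays):
    """Hypothetical Julia-style hcat: 1-D inputs become columns,
    2-D inputs are joined along the second axis."""
    return np.column_stack(arrays)

# The thread's original use case: a cumulative sum that starts from 0
print(prepend(np.cumsum([1, 2, 3]), 0))
print(hcat([1, 2, 3], [4, 5, 6]))
```

Unlike the existing `np.append`, which flattens both inputs when no `axis` is given, this `append` defaults to axis 0 and broadcasts scalars. Likewise, `vcat` uses `np.concatenate` rather than `np.vstack`, because `np.vstack` turns two 1-D inputs into a two-row matrix, whereas Julia's `vcat` joins vectors end to end.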

—Nothing ever dies, just enters the state of deferred evaluation—
Dg

On 19 Aug 2023, at 17:39, Ronald van Elburg <[email protected]> wrote:

I think ultimately the copy is unnecessary.

That being said, introducing prepend and append functions concentrates the complexity of the mapping in one place. Trying to avoid the extra copy would probably lead to a more complex implementation of accumulate.

In your view, how would the prepend interface differ from concatenation or stacking?
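The point that the copy is ultimately unnecessary can be sketched as follows, assuming a hypothetical helper `cumsum0` (not an existing NumPy function): allocate the output with one extra slot along `axis`, write the leading zero, and let `np.cumsum` fill the remainder through its `out` argument, so no intermediate cumsum result has to be concatenated.

```python
import numpy as np

def cumsum0(a, axis=0, dtype=None):
    """Hypothetical: cumulative sum along `axis` with a leading zero slice,
    written straight into a preallocated output (no concatenate copy)."""
    a = np.asarray(a)
    if dtype is None:
        # Same accumulator dtype np.cumsum would pick (e.g. bool or small ints are upcast)
        dtype = np.cumsum(np.zeros(1, dtype=a.dtype)).dtype
    shape = list(a.shape)
    shape[axis] += 1
    out = np.empty(shape, dtype=dtype)
    index = [slice(None)] * a.ndim
    index[axis] = slice(0, 1)
    out[tuple(index)] = 0  # the leading zero slice
    index[axis] = slice(1, None)
    # np.cumsum writes directly into the view covering the remaining slots
    np.cumsum(a, axis=axis, dtype=dtype, out=out[tuple(index)])
    return out

print(cumsum0([1, 2, 3]))
print(cumsum0(np.arange(6).reshape(2, 3), axis=1))
```

The extra bookkeeping (computing the output shape and the two index tuples) is the added complexity mentioned above; here it lives in one helper rather than in `accumulate` itself.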
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: [email protected]
