[Numpy-discussion] Generalized UFunc without output dimension specified as argument
First of all, I really love the docs of the C API :) It's way above what I would expect!

I was reviewing the signature possibilities for generalized UFuncs, and had a question:

https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html

I am playing with a UFunc that scores and returns some top N, where N could be specified by the user. I.e. the user might do

    get_most_similar(X, y, n=10)

You can imagine situations where this could happen in similarity functions, where we want to get some Top N rows of X most similar to y. But sometimes users will want 10, or 100, or need to page through results etc. For performance reasons, I wouldn't want to maintain an index of every row of X; I'd prefer to only have to care about the top 10 or so.

I wonder what the best way to do this is?

One thought I had was to always set the output dimension to 10 for now, and handle paging on the Python side, perhaps by also having an offset parameter for my function, to window into the similar results.

The second thought I had was to just get 100 instead of 10, as that is probably enough for most use cases. Users can then slice out what they need. It's a little annoying in terms of perf cost, but probably not a big deal.

But it would be convenient to just let the user specify the N they want.

Thanks for any insights!
-Doug
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com
[Numpy-discussion] Re: Generalized UFunc without output dimension specified as argument
On Sun, Aug 20, 2023 at 7:33 AM Doug Turnbull wrote:

> First of all, I really love the docs of the C API :) It's way above what I
> would expect!
>
> I was reviewing the signature possibilities for generalized UFuncs, and
> had a question
>
> https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html
>
> I am playing with a UFunc that scores and returns some top N, where N
> could be specified by the user. I.e. the user might do
>
> get_most_similar(X, y, n=10)
>
> You can imagine situations where this could happen in similarity
> functions, where we want to get some Top N rows of X most similar to y. But
> sometimes users will want 10, or 100, or need to page through results etc.
> For performance reasons, I wouldn't want to maintain an index of every row
> of X, I'd prefer to only have to care about the top 10 or so.
>
> I wonder what the best way to do this?
>
> One thought I had was always set the output dimension to 10 for now, and
> handle paging on the python side by perhaps also having an offset parameter
> for my function, to window into the similar results.
>
> The second thought I had was to just get 100 instead of 10, as that
> probably is enough for most use cases. And users can slice out what they
> need. It's a little annoying in terms of perf cost, but probably not a big
> deal.
>
> But it would be convenient to just let the user specify the N they want.
>

Thanks for the suggestion, Doug. This is something I've thought about too. In fact, I've drafted a proposal at

https://github.com/WarrenWeckesser/numpy-notes/blob/main/enhancements/gufunc-shape-only-params.md

for allowing "shape only" parameters of a gufunc. This is the first time that I've announced that proposal on the mailing list. Any comments from NumPy devs would be appreciated.

Warren

> Thanks for any insights!
> -Doug
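For readers following the thread, here is a minimal Python-side sketch of the user-facing behaviour Doug describes, where the caller picks N. It is not a gufunc and it is not the mechanism in Warren's proposal. It assumes a dot product as the similarity score and a 2-d X with at least one row; the name get_most_similar comes from Doug's message, but the body is purely illustrative.

```python
import numpy as np


def get_most_similar(X, y, n=10):
    """Return (row indices, scores) of the n rows of X most similar to y.

    Assumption for illustration: similarity is the plain dot product X @ y,
    and X has at least one row.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    scores = X @ y

    # Clamp n to the number of available rows (and to at least 1).
    n = max(1, min(n, scores.shape[0]))

    # argpartition moves the n largest scores into the last n slots
    # without fully sorting the whole array.
    top = np.argpartition(scores, -n)[-n:]

    # Only those n candidates are sorted, in descending score order.
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]
```

Because only the n selected candidates are sorted, the cost of the final ordering depends on n rather than on the number of rows in X, which is in the spirit of Doug's wish to track only the top few results.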
[Numpy-discussion] next NumPy triage meeting
The next NumPy triage meeting will be held this Wednesday, August 23rd at 5pm UTC. This is a meeting where we synchronously triage prioritized PRs and issues.

Join us via Zoom: https://numfocus-org.zoom.us/j/82096749952?pwd=MW9oUmtKQ1c3a2gydGk1RTdYUUVXZz09

Everyone is welcome to attend and contribute to the conversation. Please notify us of issues or PRs that you'd like to have reviewed by adding a GitHub link to them in the meeting agenda: https://hackmd.io/68i_JvOYQfy9ERiHgXMPvg

--
Cheers,
Inessa

Inessa Pawson
Contributor Experience Lead | NumPy
https://numpy.org/
GitHub: inessapawson
[Numpy-discussion] Re: Add to NumPy a function to compute cumulative sums from 0.
Dear all,

another aspect to think about is that there is not only cumsum. There are other cumulative aggregations as well (whether or not they have top-level np functions; cummax, for example, is represented by np.maximum.accumulate):

1. cumprod: there, instead of starting with zero, one would need to start with one
2. cummax: start with -np.inf
3. cummin: start with np.inf
4. Maybe more? Those are the ones that came to my mind so far.

So introducing a parameter for cumsum and not one for the others would be some sort of inconsistency. For cumsum and cumprod, all data types (int8, int16, …, float, double) support 0 and 1; for cummax and cummin one would need types that support infinity and negative numbers (to make it meaningfully convertible to other types).

And what would such a cumprod be called if one wanted to give it a new name? cumprod0 or cumprod1?

Just some thoughts.

Best, Michael

On 19. Aug 2023, at 19:02, Dom Grigonis wrote:

Unfortunately, I don't have a good answer.

For now, I can only tell you what I think might benefit from improvement.

1. Verbosity. I appreciate that bracket syntax such as the one in Julia or MATLAB, `[A B C ...]`, is not possible, so functional is the only option. E.g. Julia has functions named ‘cat’, ‘vcat’, ‘hcat’, ‘vhcat’. I myself have recently redefined np.concatenate to `np_c`. For simple operations, it would surely be nice to have methods. E.g. `arr.append(axis)/arr.prepend(axis)`.

2. Excessive number of functions. There seem to be very many functions for concatenating and stacking. Many operations can be done using different functions and approaches, and usually one of them is several times faster than the rest. I will give an example.
Stacking two 1d vectors as columns of a 2d array:

    arr = np.arange(100)
    TIMER.repeat([
        lambda: np.array([arr, arr]).T,
        lambda: np.vstack([arr, arr]).T,
        lambda: np.stack([arr, arr]).T,
        lambda: np.c_[arr, arr],
        lambda: np.column_stack((arr, arr)),
        lambda: np.concatenate([arr[:, None], arr[:, None]], axis=1)
    ]).print(3)
    # mean [[0.012 0.044 0.052 0.13 0.032 0.024]]

Instead, having fewer, but more intuitive/flexible and well optimised functions would be a bit more convenient.

3. Flattening and reshaping API is not very intuitive. E.g. torch flatten is an example of a function which has the desired level of flexibility, in contrast to `np.flatten`. https://pytorch.org/docs/stable/generated/torch.flatten.html

I had similar issues with multidimensional searching, sorting, multi-dimensional overlaps and custom unique functions. In other words, all the functionality is there already, but in more custom multi-dimensional cases (although the requirement is often very simple from the perspective of how it looks in my mind) there is no easy API, and I end up writing my own numpy functions and benchmarking numerous ways to achieve the same thing. By now, I have my own multi-dimensional unique, sort, search, flatten, and a more flexible ix_, which are not well tested, but already more convenient, flexible and often several times faster than the numpy ones (although all they do is reuse existing numpy functionality).

I think these are more along the lines of numpy 2.0, rather than a simple extension. It feels that the API can generally be more flexible and intuitive, and there is enough existing numpy material and external examples to draw from to make a next-level API happen. Although I appreciate the required effort and difficulties.

Having said all that, implementing Julia's equivalents ‘cat’, ‘vcat’, ‘hcat’, ‘vhcat’ together with `arr.append(others, axis), arr.prepend(others, axis)`, while ensuring that they use the most optimised approaches, could potentially make life easier for the time being.
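The column-stacking comparison above relies on a TIMER helper that is not shown in the message. As a rough sketch, the same six candidates can be timed with the standard-library timeit module. The dictionary labels, repeat counts and output format here are assumptions made for illustration, and timings will differ from machine to machine, so no numbers are quoted.

```python
import timeit

import numpy as np

arr = np.arange(100)

# The six approaches from the message, keyed by a descriptive label.
candidates = {
    "np.array([arr, arr]).T": lambda: np.array([arr, arr]).T,
    "np.vstack([arr, arr]).T": lambda: np.vstack([arr, arr]).T,
    "np.stack([arr, arr]).T": lambda: np.stack([arr, arr]).T,
    "np.c_[arr, arr]": lambda: np.c_[arr, arr],
    "np.column_stack((arr, arr))": lambda: np.column_stack((arr, arr)),
    "np.concatenate(..., axis=1)": lambda: np.concatenate(
        [arr[:, None], arr[:, None]], axis=1
    ),
}

reference = np.column_stack((arr, arr))

for name, fn in candidates.items():
    # Every approach should build the same (100, 2) array.
    assert np.array_equal(fn(), reference)
    # Best of 3 runs of 1000 calls each, reported per call.
    best = min(timeit.repeat(fn, number=1000, repeat=3))
    print(f"{name:30s} {best / 1000 * 1e6:8.2f} us per call")
```

The equality check before timing confirms that the candidates really are interchangeable, which is the premise of the complaint about having many overlapping functions.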
—Nothing ever dies, just enters the state of deferred evaluation—
Dg

On 19 Aug 2023, at 17:39, Ronald van Elburg wrote:

I think ultimately the copy is unnecessary.

That being said, introducing prepend and append functions concentrates the complexity of the mapping in one place. Trying to avoid the extra copy would probably lead to a more complex implementation of accumulate.

How, in your view, would the prepend interface differ from concatenation or stacking?
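To make the point about starting values concrete, here is a small sketch that prepends a starting value and then calls the existing ufunc.accumulate, using the values listed at the top of this message (0 for sum, 1 for product, -np.inf for max, np.inf for min). The helper name accumulate_from is hypothetical and not a NumPy function, it handles 1-d input only, and it makes the extra copy that Ronald mentions.

```python
import numpy as np


def accumulate_from(ufunc, values, initial):
    """Hypothetical helper: cumulative `ufunc` over 1-d `values`,
    with `initial` prepended as the first element of the result."""
    values = np.asarray(values)
    # Pick a dtype able to hold both the data and the starting value,
    # e.g. integer input combined with -np.inf becomes floating point.
    dtype = np.result_type(values, initial)
    padded = np.empty(values.size + 1, dtype=dtype)
    padded[0] = initial
    padded[1:] = values
    return ufunc.accumulate(padded)


cumsum0 = accumulate_from(np.add, [1, 2, 3], 0)
cumprod1 = accumulate_from(np.multiply, [2, 3, 4], 1)
cummax = accumulate_from(np.maximum, [3, 1, 5], -np.inf)
cummin = accumulate_from(np.minimum, [3, 1, 5], np.inf)
```

Note that the max and min variants force integer input to a floating-point dtype, which illustrates the concern that not every data type can represent the required starting value.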