First of all, I really love the docs of the C API :) It's way above what I
would expect!

I was reviewing the signature possibilities for generalized UFuncs, and had
a question.

https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html

I am playing with a UFunc that scores and returns some top N, where N could
be specified by the user, i.e. the user might do

get_most_similar(X, y, n=10)

You can imagine situations where this could happen in similarity functions,
where we want to get some Top N rows of X most similar to y. But
sometimes users will want 10, or 100, or need to page through results etc.
For performance reasons, I wouldn't want to maintain an index of every row
of X; I'd prefer to only have to care about the top 10 or so.
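
To make the intended behaviour concrete, here is a rough pure-NumPy sketch of
the call above. It is not a generalized UFunc, and the dot-product score is
only an assumed stand-in for whatever similarity is actually used; the point
is just that np.argpartition can pick out the top n without fully sorting
every row.

```python
import numpy as np


def get_most_similar(X, y, n=10):
    """Return (indices, scores) of the n rows of X most similar to y, best first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    scores = X @ y  # assumption: similarity is a plain dot product
    n = min(n, scores.shape[0])
    if n == 0:
        # X has no rows, so there is nothing to rank
        return np.empty(0, dtype=np.intp), scores[:0]
    # argpartition moves the n largest scores into the last n slots, unordered
    top = np.argpartition(scores, -n)[-n:]
    # sort only those n candidates, highest score first
    top = top[np.argsort(scores[top])[::-1]]
    return top, scores[top]
```

Only the n selected candidates are sorted, so the cost of ordering stays small
even when X has many rows.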

I wonder what the best way to do this is?

One thought I had was to always set the output dimension to 10 for now, and
handle paging on the python side by perhaps also having an offset parameter
for my function, to window into the similar results.
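
As a sketch of what that first idea could look like from the Python side
(plain NumPy rather than a real UFunc; PAGE_SIZE, get_most_similar_page and
the dot-product score are all hypothetical names and assumptions):

```python
import numpy as np

PAGE_SIZE = 10  # assumed fixed output length per call


def get_most_similar_page(X, y, offset=0):
    """Return (indices, scores) for ranks offset .. offset + PAGE_SIZE - 1."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    scores = X @ y  # assumption: similarity is a plain dot product
    n_rows = scores.shape[0]
    if offset >= n_rows:
        # paged past the end (or X is empty): nothing left to return
        return np.empty(0, dtype=np.intp), scores[:0]
    # only the best offset + PAGE_SIZE rows need to be tracked
    k = min(offset + PAGE_SIZE, n_rows)
    top = np.argpartition(scores, -k)[-k:]
    top = top[np.argsort(scores[top])[::-1]]  # best first
    page = top[offset:offset + PAGE_SIZE]
    return page, scores[page]
```

One caveat with this windowing approach: each later page has to track
offset + PAGE_SIZE candidates, so deep paging gets gradually more expensive.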

The second thought I had was to just get 100 instead of 10, as that
probably is enough for most use cases. And users can slice out what they
need. It's a little annoying in terms of perf cost, but probably not a big
deal.

But it would be convenient to just let the user specify the N they want.

Thanks for any insights!
-Doug