[Numpy-discussion] Proposal: np.search() to complement np.searchsorted()

2017-05-09 Thread Martin Spacek

Hello,

I've opened up a pull request to add a function called np.search(), or something 
like it, to complement np.searchsorted():


https://github.com/numpy/numpy/pull/9055

There's also this issue I opened before starting the PR:

https://github.com/numpy/numpy/issues/9052

Proposed API changes require discussion on the list, so here I am!

The proposed function (and perhaps array method?) returns, like
np.searchsorted(a, v), indices into `a` for the values in `v`, but it doesn't
require `a` to be sorted, and it explicitly checks that every value in `v` is
present in `a`. If any value is missing, it currently raises an error, but that
could be controlled via a kwarg.
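
As a rough illustration of the behaviour described above (not the code in the
PR), here is a minimal sketch built from existing NumPy calls. The name
`search_sketch` is hypothetical, and the sketch assumes a non-empty 1-D `a`
and returns the index of the first match for each value:

```python
import numpy as np

def search_sketch(a, v):
    """Hypothetical stand-in: indices into unsorted `a` of the values in `v`."""
    a = np.asarray(a)
    v = np.asarray(v)
    # A stable argsort keeps equal values in their original order,
    # so the first occurrence in `a` comes first in the sorted view.
    order = np.argsort(a, kind="stable")
    # Insertion points into the sorted view of `a`.
    pos = np.searchsorted(a, v, sorter=order)
    # Values larger than max(a) give pos == a.size; clip so indexing is safe.
    idx = order[np.minimum(pos, a.size - 1)]
    # Explicit membership check: every value in `v` must occur in `a`.
    if not np.all(a[idx] == v):
        raise ValueError("some values in v are not present in a")
    return idx

a = np.array([30, 10, 20, 10])
hits = search_sketch(a, [10, 30])   # indices into the original, unsorted `a`
```

The argsort makes this O(n log n) in the size of `a`, which is the price of
dropping the sortedness requirement in this particular sketch.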


As I mentioned in the PR, I often find myself abusing np.searchsorted() by not
explicitly checking its two assumptions: that `a` is sorted, and that every
value in `v` occurs in `a`. The temptation to use it is great, because it's such
a fast and convenient function, and most of the time that I use it, the
assumptions are indeed valid. Explicitly checking those assumptions each and
every time before I use np.searchsorted() is tedious, and easy to forget. I
wouldn't be surprised if many others abuse np.searchsorted() in the same way.
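
To make the pitfall concrete, the following sketch shows np.searchsorted()
quietly returning an insertion point for a value that is not in `a`, followed
by the kind of manual checks that are easy to skip. The helper name
`checked_searchsorted` is hypothetical and only for illustration:

```python
import numpy as np

a = np.array([10, 20, 30])   # sorted, as np.searchsorted() expects
v = np.array([20, 25])       # 25 does not occur in `a`

# No error is raised: the result holds insertion points, not match indices.
idx = np.searchsorted(a, v)
# The entry for 25 points at 30, so treating idx as "where v lives in a" is wrong.
silent_mismatch = a[idx] != v

def checked_searchsorted(a, v):
    """Hypothetical wrapper that verifies both assumptions before searching."""
    a = np.asarray(a)
    v = np.asarray(v)
    if np.any(a[1:] < a[:-1]):
        raise ValueError("a is not sorted")
    if not np.isin(v, a).all():
        raise ValueError("some values in v are not present in a")
    return np.searchsorted(a, v)
```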


Looking at my own habits and uses, it seems to me that finding the indices of 
matching values of one array in another is a more common use case than finding 
insertion indices of one array into another sorted array. So, I propose that 
np.search(), or something like it, could be even more useful than np.searchsorted().


Thoughts?

Martin
___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Proposal: np.search() to complement np.searchsorted()

2017-05-15 Thread Martin Spacek

On 2017-05-09 07:39 PM, Stephan Hoyer wrote:
> On Tue, May 9, 2017 at 9:46 AM, Martin Spacek <nu...@mspacek.mm.st> wrote:
>
>> Looking at my own habits and uses, it seems to me that finding the indices
>> of matching values of one array in another is a more common use case than
>> finding insertion indices of one array into another sorted array. So, I
>> propose that np.search(), or something like it, could be even more useful
>> than np.searchsorted().
>
> The current version of this PR only returns the indices of the /first/ match
> (rather than all matches), which is an important detail. I would strongly
> consider including that detail in the name (e.g., by calling this "find_first"
> rather than "search"), because my naive expectation for a method called "search"
> is to find all matches.
>
> In any case, I agree that this functionality would be welcome. Getting the
> details right for a high performance solution is tricky, and there is strong
> evidence of interest given the 200+ upvotes on this StackOverflow question:
> http://stackoverflow.com/questions/432112/is-there-a-numpy-function-to-return-the-first-index-of-something-in-an-array



Good point about it only finding the first hit. However, `np.find_first` sounds 
a bit awkward to me. I've updated the PR to have a `which` kwarg that specifies 
whether it should return the first or last hit for values in `v` that have 
multiple hits in `a`.
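
A minimal sketch of how such a `which` option could behave, using only
existing NumPy calls. This is not the PR's implementation: the function name
`search_which` is hypothetical, and the accepted values "first" and "last" are
an assumption made for the example. It assumes a non-empty 1-D `a`:

```python
import numpy as np

def search_which(a, v, which="first"):
    """Hypothetical sketch: first or last index in `a` of each value in `v`."""
    a = np.asarray(a)
    v = np.asarray(v)
    # Stable sort: duplicates keep their original relative order.
    order = np.argsort(a, kind="stable")
    if which == "first":
        # Left insertion point lands on the first duplicate.
        pos = np.searchsorted(a, v, side="left", sorter=order)
    elif which == "last":
        # Right insertion point is one past the last duplicate.
        pos = np.searchsorted(a, v, side="right", sorter=order) - 1
    else:
        raise ValueError("which must be 'first' or 'last'")
    # Clip so that out-of-range values still index safely before the check.
    idx = order[np.clip(pos, 0, a.size - 1)]
    if not np.all(a[idx] == v):
        raise ValueError("some values in v are not present in a")
    return idx

a = np.array([30, 10, 20, 10])
first_hits = search_which(a, [10, 20], which="first")
last_hits = search_which(a, [10, 20], which="last")
```

Values with a single hit give the same index either way; only duplicated
values such as 10 differ between the two modes.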


https://github.com/numpy/numpy/pull/9055

Not sure if I already mentioned this somewhere, but we might also consider 
naming this `np.index` due to its analogous behaviour to the Python list method 
`.index()`.
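
For reference, this is the list behaviour the analogy refers to: `.index()`
returns the position of the first match and raises ValueError when the value
is absent. The variable names below are only for illustration:

```python
values = [30, 10, 20, 10]

# Position of the first occurrence of 10, even though it appears twice.
first_hit = values.index(10)

# A value that is not in the list raises ValueError rather than returning -1.
try:
    values.index(25)
    missing_raises = False
except ValueError:
    missing_raises = True
```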


Martin