[Numpy-discussion] Proposal: np.search() to complement np.searchsorted()
Hello,

I've opened a pull request to add a function called np.search(), or something like it, to complement np.searchsorted():

https://github.com/numpy/numpy/pull/9055

There's also this issue I opened before starting the PR:

https://github.com/numpy/numpy/issues/9052

Proposed API changes require discussion on the list, so here I am!

This proposed function (and perhaps array method?) does the same as np.searchsorted(a, v), but doesn't require `a` to be sorted, and explicitly checks that every value in `v` is also present in `a`. If that is not the case, it currently raises an error, but that could be controlled via a kwarg.

As I mentioned in the PR, I often find myself abusing np.searchsorted() by not explicitly checking these assumptions. The temptation to use it is great, because it's such a fast and convenient function, and most of the time that I use it, the assumptions are indeed valid. Explicitly checking those assumptions every time before I use np.searchsorted() is tedious and easy to forget. I wouldn't be surprised if many others abuse np.searchsorted() in the same way.

Looking at my own habits and uses, it seems to me that finding the indices of matching values of one array in another is a more common use case than finding insertion indices of one array into another sorted array. So, I propose that np.search(), or something like it, could be even more useful than np.searchsorted().

Thoughts?

Martin

___
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
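For illustration, the behaviour described in the proposal can be approximated with existing NumPy calls. This is only a minimal sketch, not the code from the PR: the name `search_sketch` is hypothetical, and raising `ValueError` for missing values is an assumption (the message above only says "an error"). It sorts `a` indirectly with `np.argsort`, looks up `v` with `np.searchsorted(..., sorter=...)`, and then verifies that each hit really matches.

```python
import numpy as np

def search_sketch(a, v):
    """Hypothetical stand-in for the proposed np.search(); not the PR's code."""
    a = np.asarray(a)
    v = np.asarray(v)
    if a.size == 0:
        # Nothing can match in an empty array.
        if v.size:
            raise ValueError("some values in v are not present in a")
        return np.empty(v.shape, dtype=np.intp)
    # Sort indirectly so the original positions in `a` are remembered.
    order = np.argsort(a, kind="stable")
    # Insertion points into the sorted view of `a`.
    pos = np.searchsorted(a, v, sorter=order)
    # Clamp so the lookup below stays in bounds for values larger than max(a).
    pos = np.minimum(pos, a.size - 1)
    # Map back to indices into the unsorted `a`.
    idx = order[pos]
    # The explicit check that plain np.searchsorted() never performs.
    if not np.array_equal(a[idx], v):
        raise ValueError("some values in v are not present in a")
    return idx

a = np.array([30, 10, 20])         # deliberately unsorted
print(search_sketch(a, [20, 30]))  # indices into `a` where 20 and 30 are found
```

Calling `np.searchsorted(a, [20, 30])` directly on the same unsorted `a` raises no error, but its result is not meaningful, which is the kind of silent misuse the proposal is meant to prevent.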
Re: [Numpy-discussion] Proposal: np.search() to complement np.searchsorted()
On 2017-05-09 07:39 PM, Stephan Hoyer wrote:
> On Tue, May 9, 2017 at 9:46 AM, Martin Spacek <nu...@mspacek.mm.st> wrote:
>
>> Looking at my own habits and uses, it seems to me that finding the indices of matching values of one array in another is a more common use case than finding insertion indices of one array into another sorted array. So, I propose that np.search(), or something like it, could be even more useful than np.searchsorted().
>
> The current version of this PR only returns the indices of the /first/ match (rather than all matches), which is an important detail. I would strongly consider including that detail in the name (e.g., by calling this "find_first" rather than "search"), because my naive expectation for a method called "search" is to find all matches.
>
> In any case, I agree that this functionality would be welcome. Getting the details right for a high performance solution is tricky, and there is strong evidence of interest given the 200+ upvotes on this StackOverflow question:
> http://stackoverflow.com/questions/432112/is-there-a-numpy-function-to-return-the-first-index-of-something-in-an-array

Good point about it only finding the first hit. However, `np.find_first` sounds a bit awkward to me. I've updated the PR to have a `which` kwarg that specifies whether it should return the first or last hit for values in `v` that have multiple hits in `a`.

https://github.com/numpy/numpy/pull/9055

Not sure if I already mentioned this somewhere, but we might also consider naming this `np.index` due to its analogous behaviour to the Python list method `.index()`.

Martin
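To make the first-versus-last distinction concrete, here is a rough sketch of how a `which` option could behave when `a` contains duplicates. It is not taken from the PR: the function name `search_which` is hypothetical, and the accepted values `"first"` and `"last"` are assumed spellings, since the thread only says the kwarg selects the first or last hit. The sketch relies on a stable indirect sort, so that equal values keep their original order, and on the `side` argument of `np.searchsorted`.

```python
import numpy as np

def search_which(a, v, which="first"):
    """Hypothetical sketch of a first/last lookup; not the PR's implementation."""
    if which not in ("first", "last"):
        raise ValueError("which must be 'first' or 'last'")
    a = np.asarray(a)
    v = np.asarray(v)
    if a.size == 0:
        if v.size:
            raise ValueError("some values in v are not present in a")
        return np.empty(v.shape, dtype=np.intp)
    # A stable sort keeps duplicates in their original relative order.
    order = np.argsort(a, kind="stable")
    if which == "first":
        # Leftmost slot of each run of equal values.
        pos = np.searchsorted(a, v, side="left", sorter=order)
    else:
        # One past the rightmost slot, so step back by one.
        pos = np.searchsorted(a, v, side="right", sorter=order) - 1
    # Keep indices in bounds for values outside the range of `a`.
    pos = np.clip(pos, 0, a.size - 1)
    idx = order[pos]
    if not np.array_equal(a[idx], v):
        raise ValueError("some values in v are not present in a")
    return idx

a = np.array([5, 7, 5, 9, 7])
print(search_which(a, [7, 5], which="first"))  # indices of the first 7 and first 5
print(search_which(a, [7, 5], which="last"))   # indices of the last 7 and last 5
```

For a single value, `which="first"` matches what the list method gives: `a.tolist().index(7)` also returns the position of the first 7.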