An example with a 1-D array (where it is easiest to see what I mean) is the
following. I will follow Dom Grigonis's suggestion that the range not be
provided as a separate argument, as it can be just as easily "folded into"
the array by passing a slice. So it becomes just:
idx = first_true(arr, cond)

As Dom also points out, the "cond" would likely need to be a "function
pointer" (i.e., the name of a function defined elsewhere, turning
first_true into a higher-order function), unless there's some way to pass a
parseable expression for simple cases. A few special cases like the first
zero/nonzero element could be handled with dedicated options (sort of like
matplotlib colors), but for anything beyond that it gets unwieldy fast.
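
Just to pin down the semantics I have in mind, here is a rough pure-Python
sketch (a slow reference only--the whole point of the proposal is that the
real loop would run at the C level and stop early):

******************
def first_true(arr, cond):
    # Reference semantics for the proposed function: return the index of the
    # first element of a 1-D array for which cond() is True, or -1 if none is.
    for i, x in enumerate(arr):
        if cond(x):
            return i   # stop immediately; later elements are never tested
    return -1
******************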

So let's say we have this:
******************
import numpy as np

def cond(x):
    return x > 50

# np.exp overflows to inf for large arguments here, but inf > 50 is still
# True, so the comparison works (NumPy just emits an overflow warning).
search_arr = np.exp(np.arange(0, 1000))

print(np.first_true(search_arr, cond))  # proposed function, not yet in NumPy
******************

This should print 4, because the element of search_arr at index 4 (i.e. the
5th element) is e^4, which is slightly greater than 50 (while e^3 is less
than 50). It should return this *without testing the 6th through 1000th
elements of the array at all* to see whether they exceed 50 or not. This
example is rather contrived: simply taking the natural log of 50 and rounding
up would be far superior, without even *evaluating the array of exponentials*
(which my example clearly still does). But in the use cases I've had for such
a function, I can't predict the array elements like this--they come from
loaded data, the output of a simulation, etc., and are already sitting in a
numpy array. And in this case, since the values are strictly increasing,
np.searchsorted() would work as well. But it illustrates the idea.
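
(For the record, the searchsorted route for this particular case would be
something like the following--it only works because search_arr happens to be
sorted:)

******************
# Only valid because search_arr is strictly increasing: the insertion point
# to the right of 50 is the index of the first element greater than 50.
print(np.searchsorted(search_arr, 50, side='right'))   # prints 4
******************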





On Thu, Oct 26, 2023 at 5:54 AM Dom Grigonis <dom.grigo...@gmail.com> wrote:

> Could you please give a concise example? I know you have provided one, but
> it is engrained deep in verbose text and has some typos in it, which makes
> it hard to understand exactly what inputs should result in what output.
>
> Regards,
> DG
>
> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com> wrote:
> >
> > I know this question has been asked before, both on this list and in
> several threads on Stack Overflow, etc. It's a common issue. I'm NOT asking
> for how to do this using existing Numpy functions (as that information can
> be found in any of those sources)--what I'm asking is whether Numpy would
> accept inclusion of a function that does this, or whether (possibly more
> likely) such a proposal has already been considered and rejected for some
> reason.
> >
> > The task is this--there's a large array and you want to find the next
> element after some index that satisfies some condition. Such elements are
> common, and the typical number of elements to be searched through is small
> relative to the size of the array. Therefore, it would greatly improve
> performance to avoid testing ALL elements against the conditional once one
> is found that returns True. However, all built-in functions that I know of
> test the entire array.
> >
> > One can obviously jury-rig some ways, like for instance create a "for"
> loop over non-overlapping slices of length slice_length and call something
> like np.where(cond) on each--that outer "for" loop is much faster than a
> loop over individual elements, and the inner loop at most will go
> slice_length-1 elements past the first "hit". However, needing to use such
> a convoluted piece of code for such a simple task seems to go against the
> Numpy spirit of having one operation be one function of the form
> func(arr).
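
For concreteness, that chunked workaround might look roughly like this (the
helper name, the chunk size, and the use of np.argmax are arbitrary choices
on my part, not anything NumPy prescribes):

******************
import numpy as np

def first_true_chunked(arr, cond, start=0, chunk=4096):
    # Scan fixed-size chunks and stop at the first chunk containing a hit;
    # at worst this tests chunk-1 elements past the first True.
    for lo in range(start, len(arr), chunk):
        hits = cond(arr[lo:lo + chunk])       # cond must be vectorized here
        if hits.any():
            return lo + int(np.argmax(hits))  # first True within the chunk
    return -1
******************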
> >
> > A proposed function for this, let's call it "np.first_true(arr,
> start_idx, [stop_idx])" would be best implemented at the C code level,
> possibly in the same code file that defines np.where. I'm wondering whether,
> if I or someone else were to write such a function, the Numpy developers
> would consider merging it as a standard part of the codebase. It's possible
> that the idea of such a function is bad because it would violate some
> existing broadcasting or fancy indexing rules. Clearly one could make it
> possible to pass an "axis" argument to np.first_true() that would select an
> axis to search over in the case of multi-dimensional arrays, and then the
> result would be an array of indices of one fewer dimension than the
> original array. So np.first_true(np.array([[1,5],[2,7],[9,10]]), cond) would
> return [1,1,0] for a cond of x > 4. The case where no elements satisfy the
> condition would need to return a "signal value" like -1. But maybe there
> are some weird cases where there isn't a sensible return value, hence why
> such a function has not been added.
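
As a point of comparison, np.argmax already gives that [1, 1, 0] answer for
the full-scan version of this, though unlike the proposal it evaluates every
element and returns 0 even for a row with no True at all:

******************
import numpy as np

a = np.array([[1, 5], [2, 7], [9, 10]])
print(np.argmax(a > 4, axis=1))   # [1 1 0], but the whole array is tested
******************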
> >
> > -Andrew Rosko