[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

Dom Grigonis Thu, 26 Oct 2023 07:02:04 -0700

If the issue is at the pure-Python level,
e.g. an xor-like function that tests whether the number of truthy values equals n:
xor([1, 1, 0], 2) == True
xor([1, 0, 0], 2) == False


I try to use built-in iterator functions for efficiency, such as a combination 
of filter + next.
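
As a rough illustration of the filter + next idea, here is a minimal sketch. 
The names first_match and truth_count_equals are hypothetical (the latter 
stands in for the xor example above), and the use of itertools.islice to stop 
early is my own assumption about how one might bound the work.

```python
from itertools import islice


def first_match(iterable, cond, default=None):
    # filter() is lazy, so next() stops at the first element satisfying cond.
    return next(filter(cond, iterable), default)


def truth_count_equals(iterable, n):
    # Hypothetical stand-in for the xor example: True when exactly n
    # elements are truthy. islice stops after n + 1 truthy elements,
    # so the rest of the input is never examined.
    truthy = filter(None, iterable)
    return sum(1 for _ in islice(truthy, n + 1)) == n


print(first_match([1, 5, 2, 7], lambda x: x > 4))
print(truth_count_equals([1, 1, 0], 2))
print(truth_count_equals([1, 0, 0], 2))
```

Both helpers consume the input lazily, so they also work on generators and 
stop as soon as the answer is known.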


If, however, the problem is at the numpy level, I find `numba` does a pretty 
good job. I had a similar issue and I couldn’t beat numba’s performance with 
Cython, most likely because I don’t know how to use Cython optimally, but in 
my experience numba is good enough.

import numba as nb
import numpy as np

@nb.njit
def inner(x, func):
    result = np.full(x.shape[0], -1, dtype=np.int32)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            if func(x[i, j]):
                result[i] = j
                break
    return result

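def first_true_nb_func(arr, cond):
    # Jitting a fresh lambda on every call triggers recompilation of both
    # `cond` and `inner`, which likely accounts for the ~150 ms timings below.
    func = nb.njit(cond)
    return inner(arr, func)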


@nb.njit
def first_true_nb(arr):
    result = np.full(arr.shape[0], -1, dtype=np.int32)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if arr[i, j] > 4:
                result[i] = j
                break
    return result


def first_true(arr, cond):
    result = np.full(arr.shape[0], -1, dtype=np.int32)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if cond(arr[i, j]):
                result[i] = j
                break
    return result


arr = np.array([[1,5],[2,7],[9,10]])
print(first_true_nb_func(arr, lambda x: x > 4)) # [1, 1, 0] 163 ms
print(first_true(arr, lambda x: x > 4))         # [1, 1, 0] 4.48 µs
print(first_true_nb(arr))                       # [1, 1, 0] 1.02 µs

# LARGER ARRAY
arr = 4 + np.random.normal(0, 1, (100, 5))
print(first_true_nb_func(arr, lambda x: x > 4)) # 152 ms
print(first_true(arr, lambda x: x > 4))         # 69.7 µs
print(first_true_nb(arr))                       # 1.02 µs

So numba is a very good option when the condition does not need to be passed 
in as a callable. With a large enough array, though, I think numba with a 
callable would outperform the pure-Python solution.


That said, I completely support the idea of an optimised mechanism for such 
situations being part of numpy. Maybe np.where_first_n(arr, op, value, n=1, 
axis=None), where op is one of a set of standard comparison operators.

Args:
* Obviously having `cond` be a callable would be the most flexible, but I'm 
not sure it would be easy to achieve good performance with it, as the example 
above shows.
* `first`, `last` args are not needed, as the input can be a slice view.
* where_last_n is not needed, as the input can be a reversed view.
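
To make the proposed semantics concrete, here is a rough pure-NumPy sketch. 
It is not an existing NumPy function: the name where_first_n comes from the 
suggestion above, while the string-to-operator table, the chunk parameter and 
the restriction to flattened input (no axis support) are assumptions made for 
illustration. It exits early by scanning fixed-size chunks, so it only 
approximates what a C-level implementation could do.

```python
import operator

import numpy as np

# Assumed mapping from operator strings to comparison functions.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
}


def where_first_n(arr, op, value, n=1, chunk=1024):
    """Flat indices of the first n elements where `arr op value` holds."""
    flat = np.asarray(arr).ravel()
    compare = _OPS[op]
    found = []
    # Scan chunk by chunk and stop as soon as n matches are collected.
    for start in range(0, flat.size, chunk):
        hits = np.flatnonzero(compare(flat[start:start + chunk], value))
        found.extend((hits[: n - len(found)] + start).tolist())
        if len(found) == n:
            break
    return np.array(found, dtype=np.intp)


arr = np.array([1, 5, 2, 7, 9, 10])
print(where_first_n(arr, ">", 4, n=3))

# "Last match" via a reversed view, mapped back to original indices.
rev = where_first_n(arr[::-1], ">", 4)
print(arr.size - 1 - rev)
```

Fewer than n indices are returned when fewer than n elements match, which 
avoids the need for a sentinel such as -1 in this flattened form.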

Regards,
DG


> On 26 Oct 2023, at 16:07, Ilhan Polat <ilhanpo...@gmail.com> wrote:
> 
> It's typically called short-circuiting or quick exit when the target 
> condition is met. 
> 
> If you have an array a = np.array([-1, 2, 3, 4, ...., 10000]) and you are 
> looking for a true/false result whether anything is negative or not, (a < 
> 0).any() will generate a bool array of equal size and then check all entries 
> of that bool array just to reach the conclusion true, which was already known 
> at the first entry. Instead it spends 10000 units of time for all entries.
> 
> We did similar things on the SciPy side at the Cython level, but they are not 
> really competitive, more of a convenience. A more general discussion I opened 
> is at https://github.com/data-apis/array-api/issues/675
> 
> 
> 
> 
> 
> On Thu, Oct 26, 2023 at 2:52 PM Dom Grigonis <dom.grigo...@gmail.com> wrote:
> Could you please give a concise example? I know you have provided one, but it 
> is buried deep in verbose text and has some typos in it, which makes it hard 
> to understand exactly what inputs should result in what output.
> 
> Regards,
> DG
> 
> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com> wrote:
> > 
> > I know this question has been asked before, both on this list as well as 
> > several threads on Stack Overflow, etc. It's a common issue. I'm NOT asking 
> > for how to do this using existing Numpy functions (as that information can 
> > be found in any of those sources)--what I'm asking is whether Numpy would 
> > accept inclusion of a function that does this, or whether (possibly more 
> > likely) such a proposal has already been considered and rejected for some 
> > reason.
> > 
> > The task is this--there's a large array and you want to find the next 
> > element after some index that satisfies some condition. Such elements are 
> > common, and the typical number of elements to be searched through is small 
> > relative to the size of the array. Therefore, it would greatly improve 
> > performance to avoid testing ALL elements against the conditional once one 
> > is found that returns True. However, all built-in functions that I know of 
> > test the entire array. 
> > 
> > One can obviously jury-rig some ways, like for instance create a "for" loop 
> > over non-overlapping slices of length slice_length and call something like 
> > np.where(cond) on each--that outer "for" loop is much faster than a loop 
> > over individual elements, and the inner loop at most will go slice_length-1 
> > elements past the first "hit". However, needing to use such a convoluted 
> > piece of code for such a simple task seems to go against the Numpy spirit 
> > of having one operation being one function of the form "func(arr)".
> > 
> > A proposed function for this, let's call it "np.first_true(arr, start_idx, 
> > [stop_idx])" would be best implemented at the C code level, possibly in the 
> > same code file that defines np.where. I'm wondering if I, or someone else, 
> > were to write such a function, if the Numpy developers would consider 
> > merging it as a standard part of the codebase. It's possible that the idea 
> > of such a function is bad because it would violate some existing 
> > broadcasting or fancy indexing rules. Clearly one could make it possible to 
> > pass an "axis" argument to np.first_true() that would select an axis to 
> > search over in the case of multi-dimensional arrays, and then the result 
> > would be an array of indices of one fewer dimension than the original 
> > array. So np.first_true(np.array([[1,5],[2,7],[9,10]]), cond) would return 
> > [1,1,0] for cond(x): x>4. The case where no elements satisfy the condition 
> > would need to return a "signal value" like -1. But maybe there are some 
> > weird cases where there isn't a sensible return value, hence why such a 
> > function has not been added.
> > 
> > -Andrew Rosko
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org 
> > <mailto:numpy-discussion@python.org>
> > To unsubscribe send an email to numpy-discussion-le...@python.org 
> > <mailto:numpy-discussion-le...@python.org>
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ 
> > <https://mail.python.org/mailman3/lists/numpy-discussion.python.org/>
> > Member address: dom.grigo...@gmail.com <mailto:dom.grigo...@gmail.com>
> 
