If such an issue is at pure-Python level, e.g. an `xor` that tests whether the
number of truthy values equals n:
xor([1, 1, 0], 2) == True
xor([1, 0, 0], 2) == False
then I try to use built-in iterator functions for efficiency, such as a
combination of `filter` + `next`.
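For illustration, a minimal sketch of my own (not part of the original
discussion): `next` over a generator exits as soon as the condition is met, and
`itertools.islice` lets the count-based `xor` check stop early as well:

import itertools

def first_true_py(iterable, cond, default=-1):
    # Index of the first element satisfying cond; stops scanning
    # as soon as a match is found.
    return next((i for i, v in enumerate(iterable) if cond(v)), default)

def xor_n(iterable, n):
    # True if exactly n elements are truthy; islice stops as soon
    # as n + 1 truthy elements have been seen.
    return len(list(itertools.islice(filter(None, iterable), n + 1))) == n

print(first_true_py([1, 5, 2, 7], lambda x: x > 4))  # 1
print(xor_n([1, 1, 0], 2))  # True
print(xor_n([1, 0, 0], 2))  # False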
If, however, the problem is at numpy level, I find `numba` does a pretty good
job. I had a similar issue and couldn’t beat numba’s performance with Cython.
Most likely that is because I don’t know how to use Cython optimally, but in my
experience numba is good enough.
import numba as nb
import numpy as np

@nb.njit
def inner(x, func):
    # Scan each row and record the first column index where func is True.
    result = np.full(x.shape[0], -1, dtype=np.int32)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            if func(x[i, j]):
                result[i] = j
                break
    return result

def first_true_nb_func(arr, cond):
    # numba version taking an arbitrary Python callable; the callable
    # is jitted on every call.
    func = nb.njit(cond)
    return inner(arr, func)

@nb.njit
def first_true_nb(arr):
    # numba version with the condition hard-coded (x > 4).
    result = np.full(arr.shape[0], -1, dtype=np.int32)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if arr[i, j] > 4:
                result[i] = j
                break
    return result

def first_true(arr, cond):
    # Pure-Python reference implementation (same loops, no jit).
    result = np.full(arr.shape[0], -1, dtype=np.int32)
    for i in range(arr.shape[0]):
        for j in range(arr.shape[1]):
            if cond(arr[i, j]):
                result[i] = j
                break
    return result
arr = np.array([[1,5],[2,7],[9,10]])
print(first_true_nb_func(arr, lambda x: x > 4)) # [1, 1, 0] 163 ms
print(first_true(arr, lambda x: x > 4)) # [1, 1, 0] 4.48 µs
print(first_true_nb(arr)) # [1, 1, 0] 1.02 µs
# LARGER ARRAY
arr = 4 + np.random.normal(0, 1, (100, 5))
print(first_true_nb_func(arr, lambda x: x > 4)) # 152 ms
print(first_true(arr, lambda x: x > 4)) # 69.7 µs
print(first_true_nb(arr)) # 1.02 µs
So numba is a very good option if you don’t need to pass in an arbitrary
callable. Although I think that at a certain size, numba with a callable would
outperform the pure-Python solution, since most of the ~150 ms in the callable
version is presumably spent jitting the lambda on every call rather than in the
search itself (a sketch of reusing the compiled callable follows below).
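A minimal sketch of what I mean, reusing `inner` and `arr` from above (untimed,
and `first_true_nb_cached` is just an illustrative name):

# Compile the condition once and reuse the jitted dispatcher,
# instead of re-jitting a fresh lambda on every call.
jitted_cond = nb.njit(lambda x: x > 4)

def first_true_nb_cached(arr):
    # Only the first call pays the compilation cost for this
    # particular array dtype / condition combination.
    return inner(arr, jitted_cond)

print(first_true_nb_cached(arr))  # same output as first_true_nb(arr)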
Having said that, I completely support the idea of an optimised mechanism for
such situations being part of numpy. Maybe np.where_first_n(arr, op, value, n=1,
axis=None), where op is a selection of standard comparison operators (a rough
emulation is sketched after the notes below).
Args:
* Obviously, having `cond` be a callable would be most flexible, but I’m not
sure it would be easy to achieve good performance with it, same as in the
example above.
* `first` / `last` args are not needed, as the input can be a sliced view.
* `where_last_n` is not needed, as the input can be a reversed view.
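To be clear about what I have in mind, here is a rough pure-NumPy emulation (my
own sketch, 1-D and n=1 only, with a hypothetical name `where_first`), along
the lines of the chunked-slice workaround described in the quoted message
below:

import operator
import numpy as np

def where_first(arr, op, value, chunk=1024):
    # 1-D emulation of the proposed np.where_first_n with n=1:
    # scan fixed-size chunks, so at most chunk - 1 extra elements are
    # tested after the first match; return -1 if nothing matches.
    for start in range(0, arr.shape[0], chunk):
        mask = op(arr[start:start + chunk], value)
        if mask.any():
            return start + int(np.argmax(mask))
    return -1

a = np.array([-1, 2, 3, 4, 5])
print(where_first(a, operator.gt, 3))   # 3
print(where_first(a, operator.gt, 99))  # -1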
Regards,
DG
> On 26 Oct 2023, at 16:07, Ilhan Polat <[email protected]> wrote:
>
> It's typically called short-circuiting or quick exit when the target
> condition is met.
>
> If you have an array a = np.array([-1, 2, 3, 4, ...., 10000]) and you are
> looking for a true/false result for whether anything is negative, (a <
> 0).any() will generate a bool array of the same size and then check all
> entries of that bool array just to reach the conclusion True, which was
> already decided at the first entry. Instead it spends 10000 units of time on
> all entries.
>
> We did similar things on the SciPy side at the Cython level, but they are not
> really competitive, more of a convenience. The more general discussion I
> opened is at https://github.com/data-apis/array-api/issues/675
>
> On Thu, Oct 26, 2023 at 2:52 PM Dom Grigonis <[email protected]> wrote:
> Could you please give a concise example? I know you have provided one, but it
> is ingrained deep in verbose text and has some typos in it, which makes it
> hard to understand exactly which inputs should result in which output.
>
> Regards,
> DG
>
> > On 25 Oct 2023, at 22:59, rosko37 <[email protected]> wrote:
> >
> > I know this question has been asked before, both on this list as well as
> > several threads on Stack Overflow, etc. It's a common issue. I'm NOT asking
> > for how to do this using existing Numpy functions (as that information can
> > be found in any of those sources)--what I'm asking is whether Numpy would
> > accept inclusion of a function that does this, or whether (possibly more
> > likely) such a proposal has already been considered and rejected for some
> > reason.
> >
> > The task is this--there's a large array and you want to find the next
> > element after some index that satisfies some condition. Such elements are
> > common, and the typical number of elements to be searched through is small
> > relative to the size of the array. Therefore, it would greatly improve
> > performance to avoid testing ALL elements against the conditional once one
> > is found that returns True. However, all built-in functions that I know of
> > test the entire array.
> >
> > One can obviously jury-rig some ways, like for instance create a "for" loop
> > over non-overlapping slices of length slice_length and call something like
> > np.where(cond) on each--that outer "for" loop is much faster than a loop
> > over individual elements, and the inner loop at most will go slice_length-1
> > elements past the first "hit". However, needing to use such a convoluted
> > piece of code for such a simple task seems to go against the Numpy spirit
> > of having one operation be one function of the form func(arr).
> >
> > A proposed function for this, let's call it "np.first_true(arr, start_idx,
> > [stop_idx])" would be best implemented at the C code level, possibly in the
> > same code file that defines np.where. I'm wondering if I, or someone else,
> > were to write such a function, if the Numpy developers would consider
> > merging it as a standard part of the codebase. It's possible that the idea
> > of such a function is bad because it would violate some existing
> > broadcasting or fancy indexing rules. Clearly one could make it possible to
> > pass an "axis" argument to np.first_true() that would select an axis to
> > search over in the case of multi-dimensional arrays, and then the result
> > would be an array of indices of one fewer dimension than the original
> > array. So np.first_true(np.array([[1,5],[2,7],[9,10]]), cond) would return
> > [1, 1, 0] for cond(x): x > 4. The case where no elements satisfy the
> > condition would need to return a "signal value" like -1. But maybe there
> > are some weird cases where there isn't a sensible return value, hence why
> > such a function has not been added.
> >
> > -Andrew Rosko
>
_______________________________________________
NumPy-Discussion mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: [email protected]