If such an issue is at the Python level, e.g. xor, which tests whether the number of true values equals n:

    xor([1, 1, 0], 2) == True
    xor([1, 0, 0], 2) == False

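For illustration, a check like the xor above can short-circuit using only built-in iterators. This is a minimal sketch of one possible reading of it: the helper name `first_match` and the use of `itertools.islice` are assumptions for the example, not code from this thread.

```python
from itertools import islice


def xor(values, n):
    """Return True if exactly n of the values are truthy."""
    truthy = filter(None, values)  # lazy iterator over the truthy items
    # Pull at most n + 1 truthy items: one more than n already proves failure,
    # so the rest of the input is never examined.
    return sum(1 for _ in islice(truthy, n + 1)) == n


def first_match(values, cond, default=None):
    """Return the first value satisfying cond, or default (hypothetical helper)."""
    return next(filter(cond, values), default)


print(xor([1, 1, 0], 2))
print(xor([1, 0, 0], 2))
print(first_match([1, 5, 2, 7], lambda x: x > 4))
```

Both helpers stop consuming the input as soon as the answer is known, which is the same quick-exit behaviour being asked for at the numpy level.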
I try to use builtin iterator functions for efficiency, such as a combination of filter + next.

If, however, the problem is at the numpy level, I find `numba` does a pretty good job. I had a similar issue and couldn't beat numba's performance with Cython. Most likely that is because I don't know how to use Cython optimally, but in my experience numba is good enough.

    import numba as nb
    import numpy as np


    @nb.njit
    def inner(x, func):
        # Index of the first element per row satisfying func, -1 if none.
        result = np.full(x.shape[0], -1, dtype=np.int32)
        for i in range(x.shape[0]):
            for j in range(x.shape[1]):
                if func(x[i, j]):
                    result[i] = j
                    break
        return result


    def first_true_nb_func(arr, cond):
        # The callable is jit-compiled on every call, which is presumably
        # what dominates the ~150 ms timings below.
        func = nb.njit(cond)
        return inner(arr, func)


    @nb.njit
    def first_true_nb(arr):
        # Same loop with the condition (x > 4) hard-coded.
        result = np.full(arr.shape[0], -1, dtype=np.int32)
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                if arr[i, j] > 4:
                    result[i] = j
                    break
        return result


    def first_true(arr, cond):
        # Pure-Python reference version taking an arbitrary callable.
        result = np.full(arr.shape[0], -1, dtype=np.int32)
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                if cond(arr[i, j]):
                    result[i] = j
                    break
        return result


    arr = np.array([[1,5],[2,7],[9,10]])
    print(first_true_nb_func(arr, lambda x: x > 4))     # [1, 1, 0] 163 ms
    print(first_true(arr, lambda x: x > 4))             # [1, 1, 0] 4.48 µs
    print(first_true_nb(arr))                           # [1, 1, 0] 1.02 µs

    # LARGER ARRAY
    arr = 4 + np.random.normal(0, 1, (100, 5))
    print(first_true_nb_func(arr, lambda x: x > 4))     # 152 ms
    print(first_true(arr, lambda x: x > 4))             # 69.7 µs
    print(first_true_nb(arr))                           # 1.02 µs

So numba is a very good option if you don't need to pass in a callable. Although I think that beyond a certain array size, numba with a callable would outperform the pure-Python solution.

That said, I completely support the idea of an optimised mechanism for such situations being part of numpy.

Maybe np.where_first_n(arr, op, value, n=1, axis=None), where op is one of a selection of standard comparison operators.

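To make the suggested signature concrete, here is a rough sketch of how it could behave. `where_first_n` is hypothetical and not part of NumPy; this version handles 1-D input only (no `axis`), takes `op` as a string, and adds a `chunk` parameter, all of which are assumptions made for the example. It scans the array in blocks and stops as soon as n matches are found.

```python
import operator

import numpy as np

# Hypothetical mapping from operator strings to comparison functions.
_OPS = {"<": operator.lt, "<=": operator.le, "==": operator.eq,
        "!=": operator.ne, ">=": operator.ge, ">": operator.gt}


def where_first_n(arr, op, value, n=1, chunk=1024):
    """Indices of the first n elements of a 1-D array with `arr op value`.

    Not a NumPy function: a sketch only. Returns an integer array of
    length <= n, empty if nothing matches.
    """
    arr = np.asarray(arr)
    if arr.ndim != 1:
        raise ValueError("this sketch handles 1-D input only")
    compare = _OPS[op]
    found = []
    remaining = n
    for start in range(0, arr.shape[0], chunk):
        # Evaluate the comparison on one block only.
        hits = np.flatnonzero(compare(arr[start:start + chunk], value))
        if hits.size:
            found.append(hits[:remaining] + start)
            remaining -= min(remaining, hits.size)
            if remaining == 0:
                break  # quick exit: later blocks are never compared
    if not found:
        return np.empty(0, dtype=np.intp)
    return np.concatenate(found)


a = np.array([1, 5, 2, 7, 9, 10])
print(where_first_n(a, ">", 4))
print(where_first_n(a, ">", 4, n=3, chunk=2))
```

A C-level implementation could exit on the exact element rather than at block granularity; the block size here only trades Python loop overhead against wasted comparisons.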
Args:
* Obviously having `cond` be a callable would be most flexible, but I am not sure it would be easy to achieve good performance with it, same as in the example above.
* `first`, `last` args are not needed, as the input can be a slice view.
* where_last_n is not needed, as the input can be a reversed view.

Regards,
DG

> On 26 Oct 2023, at 16:07, Ilhan Polat <ilhanpo...@gmail.com> wrote:
>
> It's typically called short-circuiting or quick exit when the target
> condition is met.
>
> If you have an array a = np.array([-1, 2, 3, 4, ...., 10000]) and you are
> looking for a true/false result whether anything is negative or not,
> (a < 0).any() will generate a bool array of equal size and then check all
> entries of that bool array just to reach the conclusion true, which was
> already true at the first entry. Instead it spends 10000 units of time for
> all entries.
>
> We did similar things on the SciPy side at the Cython level, but they are
> not really competitive, instead more of a convenience. The more general
> discussion I opened is in
> https://github.com/data-apis/array-api/issues/675
>
> On Thu, Oct 26, 2023 at 2:52 PM Dom Grigonis <dom.grigo...@gmail.com> wrote:
> Could you please give a concise example? I know you have provided one, but
> it is ingrained deep in verbose text and has some typos in it, which makes
> it hard to understand exactly what inputs should result in what output.
>
> Regards,
> DG
>
> > On 25 Oct 2023, at 22:59, rosko37 <rosk...@gmail.com> wrote:
> >
> > I know this question has been asked before, both on this list as well as
> > several threads on Stack Overflow, etc. It's a common issue.
> > I'm NOT asking for how to do this using existing Numpy functions (as that
> > information can be found in any of those sources)--what I'm asking is
> > whether Numpy would accept inclusion of a function that does this, or
> > whether (possibly more likely) such a proposal has already been
> > considered and rejected for some reason.
> >
> > The task is this--there's a large array and you want to find the next
> > element after some index that satisfies some condition. Such elements are
> > common, and the typical number of elements to be searched through is
> > small relative to the size of the array. Therefore, it would greatly
> > improve performance to avoid testing ALL elements against the conditional
> > once one is found that returns True. However, all built-in functions that
> > I know of test the entire array.
> >
> > One can obviously jury-rig some ways, like for instance create a "for"
> > loop over non-overlapping slices of length slice_length and call
> > something like np.where(cond) on each--that outer "for" loop is much
> > faster than a loop over individual elements, and the inner loop at most
> > will go slice_length-1 elements past the first "hit". However, needing to
> > use such a convoluted piece of code for such a simple task seems to go
> > against the Numpy spirit of having one operation being one function of
> > the form "func(arr)".
> >
> > A proposed function for this, let's call it "np.first_true(arr,
> > start_idx, [stop_idx])" would be best implemented at the C code level,
> > possibly in the same code file that defines np.where. I'm wondering if I,
> > or someone else, were to write such a function, if the Numpy developers
> > would consider merging it as a standard part of the codebase. It's
> > possible that the idea of such a function is bad because it would violate
> > some existing broadcasting or fancy indexing rules.
> > Clearly one could make it possible to pass an "axis" argument to
> > np.first_true() that would select an axis to search over in the case of
> > multi-dimensional arrays, and then the result would be an array of
> > indices of one fewer dimension than the original array. So
> > np.first_true(np.array([[1,5],[2,7],[9,10]]), cond) would return [1,1,0]
> > for cond(x): x>4. The case where no elements satisfy the condition would
> > need to return a "signal value" like -1. But maybe there are some weird
> > cases where there isn't a sensible return value, hence why such a
> > function has not been added.
> >
> > -Andrew Rosko
> > _______________________________________________
> > NumPy-Discussion mailing list -- numpy-discussion@python.org
> > To unsubscribe send an email to numpy-discussion-le...@python.org
> > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> > Member address: dom.grigo...@gmail.com