[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-11-01 Thread Juan Nunez-Iglesias
Have you tried timing things? Thankfully this is easy to test because the 
Python source of numba-jitted functions is available at jitted_func.py_func.

In [23]: @numba.njit
...: def _first(arr, pred):
...: for i, elem in enumerate(arr):
...: if pred(elem):
...: return i
...:
...: def first(arr, pred):
...: _pred = numba.njit(pred)
...: return _first(arr, _pred)
...:

In [24]: arr = np.random.random(100_000_000)

In [25]: %timeit first(arr, lambda x: x > 5)
72 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [26]: %timeit arr + 5
90.3 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [27]: %timeit _first.py_func(arr, lambda x: x > 5)
7.8 s ± 46.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

So numba gives a >100x speedup. It's still not as fast as a NumPy function call 
that doesn't have an allocation overhead:

In [30]: arr2 = np.empty_like(arr, dtype=bool)

In [32]: %timeit np.greater(arr, 5, out=arr2)
13.9 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But it's certainly much better than pure Python! And it's not a huge cost for 
the flexibility.

Juan.

On Wed, 1 Nov 2023, at 10:42 AM, Dom Grigonis wrote:
> This results in a very slow code. The function calls of 
> 
> 
> _pred = numba.njit(pred)
> 
> are expensive and this sort of approach will be comparable to pure python 
> functions.
> 
> This is only recommended for sourcing functions that are not called 
> frequently, but rather have a large computational content within them. In 
> other words not suitable for predicates.
> 
> Regards,
> DG
> 
>> On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias  wrote:
>> 
>> If you add a layer of indirection with Numba you can get a *very* nice API:
>> 
>> @numba.njit
>> def _first(arr, pred):
>> for i, elem in enumerate(arr):
>> if pred(elem):
>> return i
>> 
>> def first(arr, pred):
>> _pred = numba.njit(pred)
>> return _first(arr, _pred)
>> 
>> This even works with lambdas! (TIL, thanks Numba devs!)
>> 
>> >>> first(np.random.random(10_000_000), lambda x: x > 0.99)
>> 215
>> 
>> Since Numba has ufunc support I don't suppose it would be hard to make it 
>> work with an axis= argument, but I've never played with that API myself.
>> 
>> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
>>> I've implemented such functions in Cython and packaged them into a library 
>>> called numpy_illustrated 
>>> 
>>> It exposes the following functions:
>>> 
>>> find(a, v)  # returns the index of the first occurrence of v in a
>>> first_above(a, v)   # returns the index of the first element in a that is 
>>> strictly above v
>>> first_nonzero(a)   # returns the index of the first nonzero element
>>> 
>>> They scan the array and bail out immediately once the match is found. Have 
>>> a significant performance gain if the element to be
>>> found is closer to the beginning of the array. Have roughly the same speed 
>>> as alternative methods if the value is missing.
>>> 
>>> The complete signatures of the functions look like this:
>>> 
>>> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
>>> first_above(a, v, sorted=False, missing=-1, raises=False)
>>> first_nonzero(a, missing=-1, raises=False)
>>> 
>>> This covers the most common use cases and does not accept Python callbacks 
>>> because accepting them would nullify any speed gain
>>> one would expect from such a function. A Python callback can be implemented 
>>> with Numba, but anyone who can write the callback
>>> in Numba has no need for a library that wraps it into a dedicated function.
>>> 
>>> The library has a 100% test coverage. Code style 'black'. It should be easy 
>>> to add functions like 'first_below' if necessary.
>>> 
>>> A more detailed description of these functions can be found here 
>>> .
>>> 
>>> Best regards,
>>>   Lev Maximov
>>> 
>>> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis  wrote:
 I juggled a bit and found pretty nice solution using numba. Which is 
 probably not very robust, but proves that such thing can be optimised 
 while retaining flexibility. Check if it works for your use cases and let 
 me know if anything fails or if it is slow compared to what you used.
 
 
 
 first_true_str = """
 def first_true(arr, n):
 result = np.full((n, arr.shape[1]), -1, dtype=np.int32)
 for j in range(arr.shape[1]):
 k = 0
 for i in range(arr.shape[0]):
 x = arr[i:i + 1, j]
 if cond(x):
 result[k, j] = i
 k += 1
 if k >= n:
 break
 return result
 """
 
 
 *class* *FirstTrue*:
 CONTEXT = {'np': np}
 

[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-11-01 Thread Neal Becker
Thanks Juan, this is really great!  I plan to make use of this right away.

On Wed, Nov 1, 2023 at 8:13 AM Juan Nunez-Iglesias  wrote:

> Have you tried timing things? Thankfully this is easy to test because the
> Python source of numba-jitted functions is available at jitted_func.py_func.
>
> In [23]: @numba.njit
> ...: def _first(arr, pred):
> ...: for i, elem in enumerate(arr):
> ...: if pred(elem):
> ...: return i
> ...:
> ...: def first(arr, pred):
> ...: _pred = numba.njit(pred)
> ...: return _first(arr, _pred)
> ...:
>
> In [24]: arr = np.random.random(100_000_000)
>
> In [25]: %timeit first(arr, lambda x: x > 5)
> 72 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> In [26]: %timeit arr + 5
> 90.3 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>
> In [27]: %timeit _first.py_func(arr, lambda x: x > 5)
> 7.8 s ± 46.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>
> So numba gives a >100x speedup. It's still not as fast as a NumPy function
> call that doesn't have an allocation overhead:
>
> In [30]: arr2 = np.empty_like(arr, dtype=bool)
>
> In [32]: %timeit np.greater(arr, 5, out=arr2)
> 13.9 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>
> But it's certainly much better than pure Python! And it's not a huge cost
> for the flexibility.
>
> Juan.
>
> On Wed, 1 Nov 2023, at 10:42 AM, Dom Grigonis wrote:
>
> This results in a very slow code. The function calls of
>
> _pred = numba.njit(pred)
>
> are expensive and this sort of approach will be comparable to pure python
> functions.
>
> This is only recommended for sourcing functions that are not called
> frequently, but rather have a large computational content within them. In
> other words not suitable for predicates.
>
> Regards,
> DG
>
> On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias  wrote:
>
> If you add a layer of indirection with Numba you can get a *very* nice API:
>
> @numba.njit
> def _first(arr, pred):
> for i, elem in enumerate(arr):
> if pred(elem):
> return i
>
> def first(arr, pred):
> _pred = numba.njit(pred)
> return _first(arr, _pred)
>
> This even works with lambdas! (TIL, thanks Numba devs!)
>
> >>> first(np.random.random(10_000_000), lambda x: x > 0.99)
> 215
>
> Since Numba has ufunc support I don't suppose it would be hard to make it
> work with an axis= argument, but I've never played with that API myself.
>
> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
>
> I've implemented such functions in Cython and packaged them into a library
> called numpy_illustrated 
>
> It exposes the following functions:
>
> find(a, v)  # returns the index of the first occurrence of v in a
> first_above(a, v)   # returns the index of the first element in a that is
> strictly above v
> first_nonzero(a)   # returns the index of the first nonzero element
>
> They scan the array and bail out immediately once the match is found. Have
> a significant performance gain if the element to be
> found is closer to the beginning of the array. Have roughly the same speed
> as alternative methods if the value is missing.
>
> The complete signatures of the functions look like this:
>
> find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
> first_above(a, v, sorted=False, missing=-1, raises=False)
> first_nonzero(a, missing=-1, raises=False)
>
> This covers the most common use cases and does not accept Python callbacks
> because accepting them would nullify any speed gain
> one would expect from such a function. A Python callback can be
> implemented with Numba, but anyone who can write the callback
> in Numba has no need for a library that wraps it into a dedicated function.
>
> The library has a 100% test coverage. Code style 'black'. It should be
> easy to add functions like 'first_below' if necessary.
>
> A more detailed description of these functions can be found here
> 
> .
>
> Best regards,
>   Lev Maximov
>
> On Tue, Oct 31, 2023 at 3:50 AM Dom Grigonis 
> wrote:
>
> I juggled a bit and found pretty nice solution using numba. Which is
> probably not very robust, but proves that such thing can be optimised while
> retaining flexibility. Check if it works for your use cases and let me know
> if anything fails or if it is slow compared to what you used.
>
>
> first_true_str = """def first_true(arr, n):result = np.full((n, 
> arr.shape[1]), -1, dtype=np.int32)for j in range(arr.shape[1]):k 
> = 0for i in range(arr.shape[0]):x = arr[i:i + 1, j]   
>  if cond(x):result[k, j] = ik += 1
> if k >= n:breakreturn result"""
>
> *class* *FirstTrue*:
> CONTEXT = {'np': np}
>
> *def* __init__(self, expr):
>  

[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-11-01 Thread Dom Grigonis

Your comparisons do not paint correct picture.
a) Most of time here is spent for array allocation and the actual time of the 
loop gets swallowed in the noise
a) You do not test early stop - your benchmarks simply test full loop as 
condition is almost never hit - 5 standard deviations...

Here is comparison with Cython:

import npi
arr = np.random.random(100_000)
%timeit npi.first_above(arr, 5) # 66.7 µs
%timeit first(arr, lambda x: x > 5) # 83.8 ms
arr[100] += 10
%timeit npi.first_above(arr, 5) # 16.2 µs
%timeit first(arr, lambda x: x > 5) # 86.9 ms

It is in the magnitude of 1000 x slower.

Regards,
DG

> On 1 Nov 2023, at 14:07, Juan Nunez-Iglesias  wrote:
> 
> Have you tried timing things? Thankfully this is easy to test because the 
> Python source of numba-jitted functions is available at jitted_func.py_func.
> 
> In [23]: @numba.njit
> ...: def _first(arr, pred):
> ...: for i, elem in enumerate(arr):
> ...: if pred(elem):
> ...: return i
> ...:
> ...: def first(arr, pred):
> ...: _pred = numba.njit(pred)
> ...: return _first(arr, _pred)
> ...:
> 
> In [24]: arr = np.random.random(100_000_000)
> 
> In [25]: %timeit first(arr, lambda x: x > 5)
> 72 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> 
> In [26]: %timeit arr + 5
> 90.3 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
> 
> In [27]: %timeit _first.py_func(arr, lambda x: x > 5)
> 7.8 s ± 46.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
> 
> So numba gives a >100x speedup. It's still not as fast as a NumPy function 
> call that doesn't have an allocation overhead:
> 
> In [30]: arr2 = np.empty_like(arr, dtype=bool)
> 
> In [32]: %timeit np.greater(arr, 5, out=arr2)
> 13.9 ms ± 69.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
> 
> But it's certainly much better than pure Python! And it's not a huge cost for 
> the flexibility.
> 
> Juan.
> 
> On Wed, 1 Nov 2023, at 10:42 AM, Dom Grigonis wrote:
>> This results in a very slow code. The function calls of 
>> 
>> 
>> _pred = numba.njit(pred)
>> 
>> are expensive and this sort of approach will be comparable to pure python 
>> functions.
>> 
>> This is only recommended for sourcing functions that are not called 
>> frequently, but rather have a large computational content within them. In 
>> other words not suitable for predicates.
>> 
>> Regards,
>> DG
>> 
>>> On 1 Nov 2023, at 01:05, Juan Nunez-Iglesias >> > wrote:
>>> 
>>> If you add a layer of indirection with Numba you can get a *very* nice API:
>>> 
>>> @numba.njit
>>> def _first(arr, pred):
>>> for i, elem in enumerate(arr):
>>> if pred(elem):
>>> return i
>>> 
>>> def first(arr, pred):
>>> _pred = numba.njit(pred)
>>> return _first(arr, _pred)
>>> 
>>> This even works with lambdas! (TIL, thanks Numba devs!)
>>> 
>>> >>> first(np.random.random(10_000_000), lambda x: x > 0.99)
>>> 215
>>> 
>>> Since Numba has ufunc support I don't suppose it would be hard to make it 
>>> work with an axis= argument, but I've never played with that API myself.
>>> 
>>> On Tue, 31 Oct 2023, at 6:49 PM, Lev Maximov wrote:
 I've implemented such functions in Cython and packaged them into a library 
 called numpy_illustrated 
 
 It exposes the following functions:
 
 find(a, v)  # returns the index of the first occurrence of v in a
 first_above(a, v)   # returns the index of the first element in a that is 
 strictly above v
 first_nonzero(a)   # returns the index of the first nonzero element
 
 They scan the array and bail out immediately once the match is found. Have 
 a significant performance gain if the element to be
 found is closer to the beginning of the array. Have roughly the same speed 
 as alternative methods if the value is missing.
 
 The complete signatures of the functions look like this:
 
 find(a, v, rtol=1e-05, atol=1e-08, sorted=False, default=-1, raises=False)
 first_above(a, v, sorted=False, missing=-1, raises=False)
 first_nonzero(a, missing=-1, raises=False)
 
 This covers the most common use cases and does not accept Python callbacks 
 because accepting them would nullify any speed gain
 one would expect from such a function. A Python callback can be 
 implemented with Numba, but anyone who can write the callback
 in Numba has no need for a library that wraps it into a dedicated function.
 
 The library has a 100% test coverage. Code style 'black'. It should be 
 easy to add functions like 'first_below' if necessary.
 
 A more detailed description of these functions can be found here 
 .
 
 Best regards,
   Lev Maximov
 
 On Tue, Oct 31, 2023 at 3:50 AM Do

[Numpy-discussion] Re: Function that searches arrays for the first element that satisfies a condition

2023-11-01 Thread Dom Grigonis
Ok, I think I didn’t do a full justice.

So actually, numba is fairly fast, given everything is precompiled. Predicates 
still make things slower, but not as much as I initially thought.

However, the way your code is structured, it has to re-compile both predicate 
and target function (with new predicate) on every call.

If you source the same already compiled predicate, it does not need to 
recompile the target function. See timings below.

Full scan results:
* Pre-compiled numba function (hardcoded predicate) is fastest
* Pre-compiled numba function (with flexible predicate) is second in speed and 
in line with Cython
* Pure python function is significantly slower, but 3rd place
* Any need to compile numba for one-off call makes it the slowest option

Early stopping results:
* Pre-compiled numba function (hardcoded predicate) is significantly faster 
than anything else
* Pre-compiled numba function (with flexible predicate), cython and pure python 
perform roughly the same

Conclusions:
* Use pre-compiled hardcoded numba when it is re-used many times
* Use pre-compiled numba with predicates when it is re-used many times with 
several variations (so flexibility justifies decrease in performance)
* Use Cython for one-off cases whenever it can handle it
* Use pure python for one-off cases, when Cython isn’t flexible enough.

Finally, just use numpy when prototyping and optimise later.

These tests are done on smaller arrays (100K size) than you tested. With large 
arrays such as yours there is an extra dimension to take into account, which I 
don’t think is relevant for majority of use cases, but obviously needs to be 
explored when working with arrays of such size.


Regards,
DG

import numpy as np
import numba as nb
import npi


def first_pure_py(arr, pred):
for i, elem in enumerate(arr):
if pred(elem):
return i

_first = nb.njit(first_pure_py)


def first_nb(arr, pred):
return _first(arr, nb.njit(pred))

@nb.njit
def first_pure_nb(arr):
for i, elem in enumerate(arr):
if elem > 5:
return i


pred = lambda x: x > 5
pred_cmp = nb.njit(pred)

# FULL SCAN
arr = np.random.random(100_000)
TIMER.repeat([
lambda: first_pure_py(arr, pred),   # 0.15640
lambda: nb.njit(first_pure_py)(arr, pred_cmp),  # 0.54549 cmp target
lambda: nb.njit(pred)(1),   # 0.32500 cmp predicate
lambda: first_nb(arr, pred),# 0.84759 = sum of previous 
2
lambda: _first(arr, pred_cmp),  # 0.00069 pre-compiled
lambda: first_pure_nb(arr), # 0.00052
lambda: npi.first_above(arr, 5),# 0.00071
]).print(5, idx=1, t=True)  # {'reps': 5, 'n': 10}

# EARLY STOPPING
arr2 = np.random.random(100_000)
arr2[100] += 10
TIMER.repeat([
lambda: first_pure_py(arr2, pred),  # 0.00014
lambda: nb.njit(first_pure_py)(arr, pred_cmp),  # 0.55114 cmp target
lambda: nb.njit(pred)(1),   # 0.31801 cmp predicate
lambda: first_nb(arr2, pred),   # 0.83330 = sum of previous 
2
lambda: _first(arr2, pred_cmp), # 0.00013 pre-compiled
lambda: first_pure_nb(arr2),# 0.1
lambda: npi.first_above(arr2, 5),   # 0.00021
]).print(5, idx=1, t=True)  # {'reps': 5, 'n': 10}


> On 1 Nov 2023, at 16:19, Dom Grigonis  wrote:
> 
> 
> Your comparisons do not paint correct picture.
> a) Most of time here is spent for array allocation and the actual time of the 
> loop gets swallowed in the noise
> a) You do not test early stop - your benchmarks simply test full loop as 
> condition is almost never hit - 5 standard deviations...
> 
> Here is comparison with Cython:
> 
> import npi
> arr = np.random.random(100_000)
> %timeit npi.first_above(arr, 5) # 66.7 µs
> %timeit first(arr, lambda x: x > 5) # 83.8 ms
> arr[100] += 10
> %timeit npi.first_above(arr, 5) # 16.2 µs
> %timeit first(arr, lambda x: x > 5) # 86.9 ms
> 
> It is in the magnitude of 1000 x slower.
> 
> Regards,
> DG
> 
>> On 1 Nov 2023, at 14:07, Juan Nunez-Iglesias > > wrote:
>> 
>> Have you tried timing things? Thankfully this is easy to test because the 
>> Python source of numba-jitted functions is available at jitted_func.py_func.
>> 
>> In [23]: @numba.njit
>> ...: def _first(arr, pred):
>> ...: for i, elem in enumerate(arr):
>> ...: if pred(elem):
>> ...: return i
>> ...:
>> ...: def first(arr, pred):
>> ...: _pred = numba.njit(pred)
>> ...: return _first(arr, _pred)
>> ...:
>> 
>> In [24]: arr = np.random.random(100_000_000)
>> 
>> In [25]: %timeit first(arr, lambda x: x > 5)
>> 72 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>> 
>> In [26]: %timeit arr + 5
>> 90.3 ms ± 762 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>> 
>> In [27]: %timeit _fi