On Thu, 2018-04-26 at 19:26 +0200, Sebastian Berg wrote:
> On Thu, 2018-04-26 at 09:51 -0700, Hameer Abbasi wrote:
> > Hi Nathan,
> > 
> > np.any and np.all call np.logical_or.reduce and np.logical_and.reduce
> > respectively, and unfortunately the underlying function (ufunc.reduce)
> > has no way of detecting that the value isn’t going to change anymore.
> > It’s also used for (for example) np.sum (np.add.reduce), np.prod
> > (np.multiply.reduce), np.min (np.minimum.reduce), and
> > np.max (np.maximum.reduce).
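
To make the correspondence concrete, here is a minimal sketch comparing each convenience function with the matching ufunc reduction. The array `data` and the `*_via_reduce` names are only illustrative.

```python
import numpy as np

data = np.array([0.0, 0.0, 3.0, 0.0])

# np.any / np.all correspond to reductions with the logical ufuncs.
any_via_reduce = np.logical_or.reduce(data)
all_via_reduce = np.logical_and.reduce(data)

# The arithmetic and min/max reductions use the same ufunc.reduce machinery.
sum_via_reduce = np.add.reduce(data)
prod_via_reduce = np.multiply.reduce(data)
min_via_reduce = np.minimum.reduce(data)
max_via_reduce = np.maximum.reduce(data)

print(any_via_reduce, np.any(data))
print(sum_via_reduce, np.sum(data))
```

Because everything goes through the same generic reduction, a stop condition specific to any/all has no obvious place to live at this level.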
> 
> 
> I would like to point out that this is almost, but not quite, true.
> The boolean versions will short-circuit on the innermost level, which
> is probably good enough for all practical purposes.
> 
> One way to get around it would be to use a chunked iteration using
> np.nditer in pure Python. I admit it is a bit tricky to get started
> with, but it is basically what numexpr uses as well (at least in the
> simplest mode), and if your arrays are relatively large, there is
> likely no real performance hit compared to a non-pure-Python version.
> 

I mean something like this:

import numpy as np


def check_any(arr, func=lambda x: x, buffersize=0):
    """
    Check if the function is true for any value in arr and stop once the
    first was found.

    Parameters
    ----------
    arr : ndarray
        Array to test.
    func : function
        Function taking a 1D array as argument and returning an array (on
        which ``np.any`` will be called).
    buffersize : int
        Size of the chunk/buffer in the iteration, zero will use the
        default numpy value.

    Notes
    -----
    The stopping does not occur immediately but in buffersize chunks.
    """
    # 'buffered' + 'external_loop' hands out 1D chunks instead of scalars.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True

    return False
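
A small usage sketch, in case it helps. It repeats check_any in compact form so the snippet runs on its own. The helper `is_negative` and the `calls` list are hypothetical names added only to record how many chunks were actually inspected.

```python
import numpy as np


def check_any(arr, func=lambda x: x, buffersize=0):
    # Compact copy of the chunked loop from the message.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False


calls = []  # sizes of the chunks that func was asked to look at


def is_negative(chunk):
    calls.append(chunk.size)
    return chunk < 0


data = np.arange(1_000_000, dtype=float)
data[10] = -1.0  # a hit very early in the array

found = check_any(data, is_negative, buffersize=1000)
not_found = check_any(np.zeros(5000), is_negative, buffersize=1000)
print(found, not_found, calls[0])
```

With the hit near the start, only the first chunk should be passed to `is_negative` before the function returns; the second call has to walk the whole array.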


I am not sure how it actually performs, but you can give it a try,
especially if you know you have large arrays or if "func" is pretty
expensive. If the input is already bool, I am sure it will be quite a
bit slower, though.
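
To try this out, a rough timing sketch along the following lines could be used. It is only an assumption about how one might measure it, not a benchmark result: the array size and the repeat count of 20 are arbitrary, and the numbers will depend on the machine and NumPy version.

```python
import timeit

import numpy as np


def check_any(arr, func=lambda x: x, buffersize=0):
    # Compact copy of the chunked loop from the message.
    iterflags = ['buffered', 'external_loop', 'refs_ok', 'zerosize_ok']
    for chunk in np.nditer((arr,), flags=iterflags, buffersize=buffersize):
        if np.any(func(chunk)):
            return True
    return False


# Worst case for early exit: a bool array with no True anywhere.
mask = np.zeros(1_000_000, dtype=bool)

repeats = 20
t_builtin = timeit.timeit(lambda: np.any(mask), number=repeats)
t_chunked = timeit.timeit(lambda: check_any(mask), number=repeats)

print(f"np.any:    {t_builtin / repeats * 1e6:.1f} us per call")
print(f"check_any: {t_chunked / repeats * 1e6:.1f} us per call")
```

Moving a True value to the front of `mask` and rerunning shows the other extreme, where the chunked loop can stop after the first chunk.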

- Sebastian



> - Sebastian
> 
> 
> 
> > 
> > You can find more information about this on the ufunc doc page. I
> > don’t think it’s worth it to break this machinery for any and all, as
> > it has numerous other advantages (such as being able to override it
> > in duck arrays, etc.).
> > 
> > Best regards,
> > Hameer Abbasi
> > 
> > > On Apr 26, 2018 at 18:45, Nathan Goldbaum <nathan12...@gmail.com>
> > > wrote:
> > > 
> > > Hi all,
> > > 
> > > I was surprised recently to discover that neither np.any nor
> > > np.all has a way to exit early:
> > > 
> > > In [1]: import numpy as np
> > > 
> > > In [2]: data = np.arange(1e6)
> > > 
> > > In [3]: print(data[:10])
> > > [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
> > > 
> > > In [4]: %timeit np.any(data)
> > > 724 us +- 42.4 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
> > > 
> > > In [5]: data = np.zeros(int(1e6))
> > > 
> > > In [6]: %timeit np.any(data)
> > > 732 us +- 52.9 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)
> > > 
> > > I don't see any discussions about this on the NumPy issue tracker
> > > but perhaps I'm missing something.
> > > 
> > > I'm curious if there's a way to get a fast early-terminating search
> > > in NumPy? Perhaps there's another package I can depend on that does
> > > this? I guess I could also write a bit of Cython code that does
> > > this, but so far this project is pure Python and I don't want to
> > > deal with the packaging headache of getting wheels built and
> > > conda-forge packages set up on all platforms.
> > > 
> > > Thanks for your help!
> > > 
> > > -Nathan
> > > 
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion@python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > 
