Is evaluate_iter basically numexpr's evaluate but using your numpy branch, or are there other changes?
On Sun, Jan 9, 2011 at 2:45 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> As a benchmark of C-based iterator usage and to make it work properly in a
> multi-threaded context, I've updated numexpr to use the new iterator. In
> addition to some performance improvements, this also made it easy to add
> optional out= and order= parameters to the evaluate function. The numexpr
> repository with this update is available here:
>
> https://github.com/m-paradox/numexpr
>
> To use it, you need the new_iterator branch of NumPy from here:
>
> https://github.com/m-paradox/numpy
>
> In all cases tested, the iterator version of numexpr's evaluate function
> matches or beats the standard version. The timing results are below, with
> some explanatory comments placed inline:
>
> -Mark
>
> In [1]: import numexpr as ne
>
> # numexpr front page example
>
> In [2]: a = np.arange(1e6)
> In [3]: b = np.arange(1e6)
>
> In [4]: timeit a**2 + b**2 + 2*a*b
> 1 loops, best of 3: 121 ms per loop
>
> In [5]: ne.set_num_threads(1)
>
> # iterator version performance matches standard version
>
> In [6]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 24.8 ms per loop
> In [7]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 24.3 ms per loop
>
> In [8]: ne.set_num_threads(2)
>
> # iterator version performance matches standard version
>
> In [9]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 21 ms per loop
> In [10]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 10 loops, best of 3: 20.5 ms per loop
>
> # numexpr front page example with a 10x bigger array
>
> In [11]: a = np.arange(1e7)
> In [12]: b = np.arange(1e7)
>
> In [13]: ne.set_num_threads(2)
>
> # the iterator version performance improvement is due to
> # a small task scheduler tweak
>
> In [14]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 282 ms per loop
> In [15]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 255 ms per loop
>
> # numexpr front page example with a Fortran contiguous array
>
> In [16]: a = np.arange(1e7).reshape(10,100,100,100).T
> In [17]: b = np.arange(1e7).reshape(10,100,100,100).T
>
> In [18]: timeit a**2 + b**2 + 2*a*b
> 1 loops, best of 3: 3.22 s per loop
>
> In [19]: ne.set_num_threads(1)
>
> # even with a C-ordered output, the iterator version performs better
>
> In [20]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 3.74 s per loop
> In [21]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 379 ms per loop
> In [22]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C')
> 1 loops, best of 3: 2.03 s per loop
>
> In [23]: ne.set_num_threads(2)
>
> # the standard version just uses 1 thread here, I believe
> # the iterator version performs the same as for the flat 1e7-sized array
>
> In [24]: timeit ne.evaluate("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 3.92 s per loop
> In [25]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b")
> 1 loops, best of 3: 254 ms per loop
> In [26]: timeit ne.evaluate_iter("a**2 + b**2 + 2*a*b", order='C')
> 1 loops, best of 3: 1.74 s per loop
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
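For readers unfamiliar with the out=/order= keywords the quoted post mentions adding to evaluate: here is a minimal numpy-only sketch of what those parameters conventionally mean for NumPy ufuncs, which the updated evaluate presumably mirrors (numexpr itself is not imported here, since it requires the linked branch):

```python
import numpy as np

a = np.arange(1e6)
b = np.arange(1e6)

# out=: write the result into a preallocated array instead of
# allocating a fresh one on every call (NumPy ufunc convention).
result = np.empty_like(a)
np.multiply(a, b, out=result)
assert result[2] == 4.0

# order=: control the memory layout of a newly created result.
c_res = np.zeros((100, 100), order='C')   # row-major (C-contiguous)
f_res = np.zeros((100, 100), order='F')   # column-major (Fortran-contiguous)
assert c_res.flags['C_CONTIGUOUS'] and f_res.flags['F_CONTIGUOUS']
```

With out=, repeated evaluations of the same expression can reuse one buffer, which matters at the 1e7-element sizes benchmarked above.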
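The big win in the Fortran-contiguous benchmarks comes from iterating in memory order rather than C index order. The branch's iterator was later merged into NumPy as np.nditer, so on a current NumPy the distinction can be demonstrated directly; this is an illustrative sketch, not the numexpr code path itself:

```python
import numpy as np

# Transposing a C-contiguous array gives a Fortran-contiguous view:
# no data are copied, only the strides change.
a = np.arange(24.).reshape(2, 3, 4).T
assert a.flags['F_CONTIGUOUS'] and not a.flags['C_CONTIGUOUS']

# order='K' ("keep") lets the iterator follow the memory layout,
# while order='C' forces C index order, striding through memory --
# the same trade-off the order='C' timings above illustrate.
mem_order = [float(x) for x in np.nditer(a, order='K')]
c_order = [float(x) for x in np.nditer(a, order='C')]
assert mem_order == [float(i) for i in range(24)]
assert c_order == [float(x) for x in a.ravel(order='C')]
```

The order='C' timings in the post are slower for the same reason c_order here must take large strides: the visit order no longer matches how the elements sit in memory.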