I applaud you on your vision. I only have one small suggestion: I suggest you put a table of contents at the beginning of your NEP so people may skip to the part that most interests them.
On Tue, Dec 21, 2010 at 4:59 PM, John Salvatier <jsalv...@u.washington.edu>wrote: > That is an amazing christmas present. > > On Tue, Dec 21, 2010 at 4:53 PM, Mark Wiebe <mwwi...@gmail.com> wrote: > >> Hello NumPy-ers, >> >> After some performance analysis, I've designed and implemented a new >> iterator designed to speed up ufuncs and allow for easier multi-dimensional >> iteration. The new code is fairly large, but works quite well already. If >> some people could read the NEP and give some feedback, that would be great! >> Here's a link: >> >> >> https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst >> >> I would also love it if someone could try building the code and play >> around with it a bit. The github branch is here: >> >> https://github.com/m-paradox/numpy/tree/new_iterator >> >> To give a taste of the iterator's functionality, below is an example from >> the NEP for how to implement a "Lambda UFunc." With just a few lines of >> code, it's possible to replicate something similar to the numexpr library >> (numexpr still gets a bigger speedup, though). In the example expression I >> chose, execution time went from 138ms to 61ms. >> >> Hopefully this is a good Christmas present for NumPy. :) >> >> Cheers, >> Mark >> >> Here is the definition of the ``luf`` function.:: >> >> def luf(lamdaexpr, *args, **kwargs): >> """Lambda UFunc >> >> e.g. >> c = luf(lambda i,j:i+j, a, b, order='K', >> casting='safe', buffersize=8192) >> >> c = np.empty(...) >> luf(lambda i,j:i+j, a, b, out=c, order='K', >> casting='safe', buffersize=8192) >> """ >> >> nargs = len(args) >> op = args + (kwargs.get('out',None),) >> it = np.newiter(op, ['buffered','no_inner_iteration'], >> [['readonly','nbo_aligned']]*nargs + >> [['writeonly','allocate','no_broadcast']], >> order=kwargs.get('order','K'), >> casting=kwargs.get('casting','safe'), >> buffersize=kwargs.get('buffersize',0)) >> while not it.finished: >> it[-1] = lamdaexpr(*it[:-1]) >> it.iternext() >> >> return it.operands[-1] >> >> Then, by using ``luf`` instead of straight Python expressions, we >> can gain some performance from better cache behavior.:: >> >> In [2]: a = np.random.random((50,50,50,10)) >> In [3]: b = np.random.random((50,50,1,10)) >> In [4]: c = np.random.random((50,50,50,1)) >> >> In [5]: timeit 3*a+b-(a/c) >> 1 loops, best of 3: 138 ms per loop >> >> In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) >> 10 loops, best of 3: 60.9 ms per loop >> >> In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) >> Out[7]: True >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion