Hi Steven, I added your version (vander3) to my benchmark and updated the IJulia notebook:
http://nbviewer.ipython.org/gist/synapticarbors/26910166ab775c04c47b

As you mentioned, it's a lot faster than the other version I wrote, and it evens out the underperformance vs. NumPy for the larger arrays on my machine. The @inbounds macro makes a small difference, but not a dramatic one.

One thing I wonder is whether there would be interest in a way to turn off bounds checking for a whole function or for a whole module/file, similar to Cython's decorators and file-level compiler directives.

Josh

On Thursday, January 8, 2015 at 4:36:52 PM UTC-5, Steven G. Johnson wrote:
>
> For comparison, the NumPy vander function
>
> https://github.com/numpy/numpy/blob/f4be1039d6fe3e4fdc157a22e8c071ac10651997/numpy/lib/twodim_base.py#L490-L577
>
> does all its work in multiply.accumulate. Here is the outer loop of
> multiply.accumulate (written in C):
>
> https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/ufunc_object.c#L2936-L3264
>
> and the inner loops (I think) are generated from this source file for
> various numeric types:
>
> https://github.com/numpy/numpy/blob/3b22d87050ab63db0dcd2d763644d924a69c5254/numpy/core/src/umath/loops.c.src
>
> A quick glance at these will tell you the price in code complexity that
> NumPy is paying for the performance they manage to get.
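For anyone following along, the idea of building a Vandermonde matrix with a running product (the role multiply.accumulate plays in the quoted message) can be sketched in plain Python. This is only an illustration under my own assumptions, not NumPy's actual code and not vander3: the function name vander_accumulate is made up for this example, it works on lists rather than arrays, and it returns powers in increasing order.

```python
from itertools import accumulate
from operator import mul


def vander_accumulate(xs, n):
    """Return one row [1, x, x**2, ..., x**(n-1)] per element x of xs.

    Hypothetical helper for illustration only; not part of NumPy or Julia.
    """
    if n < 1:
        # No columns requested: every row is empty.
        return [[] for _ in xs]
    rows = []
    for x in xs:
        # Seed the row with 1 followed by n-1 copies of x, then take the
        # running product: 1, 1*x, 1*x*x, ...
        seed = [1] + [x] * (n - 1)
        rows.append(list(accumulate(seed, mul)))
    return rows


if __name__ == "__main__":
    for row in vander_accumulate([2, 3, 5], 4):
        print(row)
```

Each entry is obtained from the previous one with a single multiplication, so no explicit exponentiation is needed. The sketch says nothing about performance; it only shows the shape of the computation.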