On Thu, Mar 7, 2013 at 11:47 AM, Francesc Alted <franc...@continuum.io> wrote:
> On 3/6/13 7:42 PM, Kurt Smith wrote:
>
> Hmm, that clearly depends on the architecture.  On my machine:
> ...
> That is, the unaligned column is 4x slower (!).  numexpr allows somewhat
> better results:
> ...
> Yes, in this case, the unaligned array is faster (by as much as 30%).  I
> think the reason is that numexpr optimizes the unaligned access by
> copying the different chunks into internal buffers that fit in L1
> cache.  Apparently this is very beneficial in this case (not sure why,
> though).
>
> On my machine:
> ...
> Again, the 4x slowdown is here.  Using numexpr:
> ...
> Again, the unaligned case is slightly better.  In this case numexpr is
> a bit slower than NumPy because sum() is not parallelized internally.
> Hmm, given that, I'm wondering if some internal copies to L1 in NumPy
> could help improve unaligned performance. Worth a try?
>

Very interesting -- thanks for sharing.
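
For anyone who wants to poke at this, here is a minimal sketch of the kind of
setup being discussed. The field names, the array size, and the blocked_sum()
helper with its 4096-element block are my own assumptions for illustration;
this is not numexpr's actual implementation, only the "copy chunks into a
small aligned buffer" idea written out in plain NumPy.

```python
import numpy as np

# Hypothetical record layout: a 1-byte field followed by a float64.
# align=False packs the fields, so "value" starts at byte offset 1 of
# each 9-byte record; align=True pads the record so "value" is aligned.
packed = np.dtype([("flag", "i1"), ("value", "f8")], align=False)
padded = np.dtype([("flag", "i1"), ("value", "f8")], align=True)

n = 1_000_000
unaligned = np.zeros(n, dtype=packed)["value"]
aligned = np.zeros(n, dtype=padded)["value"]
unaligned[:] = np.arange(n)
aligned[:] = np.arange(n)


def blocked_sum(col, block=4096):
    """Sum col by first copying fixed-size chunks into an aligned buffer."""
    buf = np.empty(block, dtype=col.dtype)  # fresh, contiguous scratch space
    total = 0.0
    for start in range(0, col.shape[0], block):
        chunk = col[start:start + block]
        m = chunk.shape[0]
        buf[:m] = chunk           # the only pass that touches unaligned memory
        total += buf[:m].sum()    # the reduction runs on the aligned copy
    return total
```

Checking unaligned.flags.aligned and aligned.flags.aligned confirms which
column is which, and timing col.sum() against blocked_sum(col) with timeit
should show whether the copy helps on a given machine. I would expect the
outcome to depend on the architecture and on the block size.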

> --
> Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
