On Thu, Mar 7, 2013 at 11:47 AM, Francesc Alted <franc...@continuum.io> wrote:
> On 3/6/13 7:42 PM, Kurt Smith wrote:
>
> Hmm, that clearly depends on the architecture. On my machine:
> ...
> That is, the unaligned column is 4x slower (!). numexpr allows somewhat
> better results:
> ...
> Yes, in this case, the unaligned array goes faster (as much as 30%). I
> think the reason is that numexpr optimizes the unaligned access by
> copying the different chunks into internal buffers that fit in L1
> cache. Apparently this is very beneficial in this case (not sure why,
> though).
>
> On my machine:
> ...
> Again, the 4x slowdown is here. Using numexpr:
> ...
> Again, the unaligned case is (slightly) better. In this case numexpr is
> a bit slower than NumPy because sum() is not parallelized internally.
> Hmm, given that, I'm wondering if some internal copies to L1 in NumPy
> could help improve unaligned performance. Worth a try?
>
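For anyone who wants to experiment with the idea, here is a minimal sketch of the chunked-copy approach as I understand it from the description above. It is not numexpr's actual code: the helper names (make_columns, chunked_sum) and the 2048-element chunk size are my own assumptions, and I make no claim about what timings it gives on any particular machine.

```python
import numpy as np

def make_columns(n):
    # Packed record: a 1-byte field followed by a float64 puts 'b' at
    # byte offset 1, so the 'b' column view is not aligned.
    packed = np.zeros(n, dtype=np.dtype([('a', 'i1'), ('b', 'f8')]))
    # align=True pads the record so that 'b' starts at an aligned offset.
    padded = np.zeros(n, dtype=np.dtype([('a', 'i1'), ('b', 'f8')], align=True))
    values = np.arange(n, dtype='f8')
    packed['b'] = values
    padded['b'] = values
    return packed['b'], padded['b']

def chunked_sum(col, chunk=2048):
    # Assumption: 2048 float64 values (16 KiB) is small enough to sit in
    # a typical L1 data cache; tune for the machine at hand.
    buf = np.empty(chunk, dtype=col.dtype)  # contiguous, aligned scratch buffer
    total = 0.0
    for start in range(0, col.shape[0], chunk):
        block = col[start:start + chunk]
        m = block.shape[0]
        buf[:m] = block          # copy the (possibly unaligned) block
        total += buf[:m].sum()   # reduce over the aligned copy
    return total

unaligned, aligned = make_columns(100000)
print(unaligned.flags.aligned, aligned.flags.aligned)
print(chunked_sum(unaligned) == unaligned.sum())
```

Timing chunked_sum(unaligned) against unaligned.sum() and aligned.sum() with timeit should show whether the extra copy pays off on a given architecture. The Python-level loop adds overhead of its own, so this only approximates what an internal copy inside NumPy would do.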
Very interesting -- thanks for sharing.

> --
> Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion