On Friday 11 June 2010 02:27:18, Sturla Molden wrote:
> >> Another thing I did when reimplementing lfilter was "copy-in copy-out"
> >> for strided arrays.
> >
> > What is copy-in copy out ? I am not familiar with this term ?
>
> Strided memory access is slow. So it often helps to make a temporary
> copy that are contiguous.
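For anyone who has not met the term, here is a minimal sketch of copy-in copy-out in NumPy terms. The helper name `copy_in_copy_out` is hypothetical and only illustrates the idea; it is not the actual lfilter or numexpr code.

```python
import numpy as np

def copy_in_copy_out(func, src, dst):
    """Apply `func` to a possibly strided `src`, writing into a possibly strided `dst`.

    Hypothetical helper, for illustration only.
    """
    # Copy-in: gather the strided input into a contiguous temporary
    # (np.ascontiguousarray does not copy if `src` is already contiguous).
    tmp_in = np.ascontiguousarray(src)
    # Compute on contiguous memory.
    tmp_out = func(tmp_in)
    # Copy-out: scatter the contiguous result back into the strided output.
    dst[...] = tmp_out
    return dst

base = np.arange(20, dtype=np.float64)
src = base[::2]            # strided view: every other element
out_base = np.zeros(20)
dst = out_base[::2]        # strided output view
copy_in_copy_out(np.exp, src, dst)
```

The extra cost is the two copies; whether that pays off depends on how often the temporaries are reused while they are still in cache.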
In my experience, this technique only works well with strided arrays if you are going to re-use the data of these temporaries while it is in cache, or if your data is unaligned. But if you are going to use the data only once (and this is very common in NumPy element-wise operations), it is rather counterproductive for strided arrays.

For example, in numexpr we ran a lot of different tests comparing "copy-in copy-out" and direct access techniques for strided arrays. The result was that operations with direct access showed significantly better performance with strided arrays. In contrast, for unaligned arrays the copy-in copy-out technique gave better results.

Look at these times, where the arrays were one-dimensional with a length of 1 million elements each, but the results can be extrapolated to larger, multidimensional arrays (the original benchmark file is bench/vml_timing.py):

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Numexpr version:   1.3.2.dev169
NumPy version:     1.4.1rc2
Python version:    2.6.1 (r261:67515, Feb 3 2009, 17:34:37)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]]
Platform:          linux2-x86_64
AMD/Intel CPU?     True
VML available?     True
VML/MKL version:   Intel(R) Math Kernel Library Version 10.1.0 Product Build 081809.14 for Intel(R) 64 architecture applications
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

To start with, times for numpy and numexpr are very similar for very simple expressions (except for unaligned arrays, where "copy-in copy-out" works pretty well for numexpr):

******************* Expression: i2 > 0
numpy:             0.0016
numpy strided:     0.0037
numpy unaligned:   0.0086
numexpr:           0.0016   Speed-up of numexpr over numpy: 0.9512
numexpr strided:   0.0039   Speed-up of numexpr over numpy: 0.964
numexpr unaligned: 0.0042   Speed-up of numexpr over numpy: 2.0598

When doing some basic operations (mind that there are no temporaries here, so numpy should not be at a great disadvantage), direct access to strided data is between 2x and 3x faster than numpy:

******************* Expression: f3+f4
numpy:             0.0060
numpy strided:     0.0176
numpy unaligned:   0.0166
numexpr:           0.0052   Speed-up of numexpr over numpy: 1.1609
numexpr strided:   0.0086   Speed-up of numexpr over numpy: 2.0584
numexpr unaligned: 0.0099   Speed-up of numexpr over numpy: 1.6785

******************* Expression: f3+i2
numpy:             0.0060
numpy strided:     0.0176
numpy unaligned:   0.0176
numexpr:           0.0031   Speed-up of numexpr over numpy: 1.9137
numexpr strided:   0.0061   Speed-up of numexpr over numpy: 2.8789
numexpr unaligned: 0.0078   Speed-up of numexpr over numpy: 2.2411

Notice that, so far, the absolute numexpr times for strided arrays (using the direct technique) are better than for the unaligned case (copy-in copy-out).
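In case it helps to reproduce the NumPy side of the comparison above, here is a rough sketch of how the three layouts (contiguous, strided, unaligned) might be built and timed for `f3+f4`. This is not the code from bench/vml_timing.py: the every-other-element stride, the one-byte offset used to break alignment, and the helper names `make_unaligned` and `best` are all my assumptions, and numexpr is left out so the sketch only needs NumPy.

```python
import timeit
import numpy as np

n = 1000000  # same length as the arrays in the benchmark above

# Contiguous, aligned operands.
f3 = np.random.rand(n)
f4 = np.random.rand(n)

# Strided operands: every other element of a buffer twice as long
# (assumed layout; the original benchmark may build them differently).
f3_s = np.random.rand(2 * n)[::2]
f4_s = np.random.rand(2 * n)[::2]

def make_unaligned(a):
    """Return a copy of `a` whose data starts one byte past the buffer start."""
    buf = np.empty(a.nbytes + 1, dtype=np.uint8)
    out = buf[1:].view(a.dtype)   # same bytes, shifted by one: not aligned
    out[...] = a
    return out

f3_u = make_unaligned(f3)
f4_u = make_unaligned(f4)

def best(func, repeat=3, number=10):
    """Best per-call time in seconds over a few repetitions."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number

timings = {
    "numpy": best(lambda: f3 + f4),
    "numpy strided": best(lambda: f3_s + f4_s),
    "numpy unaligned": best(lambda: f3_u + f4_u),
}
for name, t in timings.items():
    print("%-16s %.4f" % (name + ":", t))
```

The absolute numbers will of course differ from the ones above depending on machine and NumPy version.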
Also, when evaluating transcendental expressions (numexpr uses Intel's Vector Math Library, VML, here), direct access is again faster than NumPy:

******************* Expression: exp(f3)
numpy:             0.0150
numpy strided:     0.0155
numpy unaligned:   0.0222
numexpr:           0.0030   Speed-up of numexpr over numpy: 5.0268
numexpr strided:   0.0081   Speed-up of numexpr over numpy: 1.9086
numexpr unaligned: 0.0066   Speed-up of numexpr over numpy: 3.3454

******************* Expression: log(exp(f3)+1)/f4
numpy:             0.0486
numpy strided:     0.0563
numpy unaligned:   0.0639
numexpr:           0.0121   Speed-up of numexpr over numpy: 4.0332
numexpr strided:   0.0170   Speed-up of numexpr over numpy: 3.3067
numexpr unaligned: 0.0164   Speed-up of numexpr over numpy: 3.8833

However, looking at these last figures, I don't recall that we ever checked whether a copy-in copy-out technique would be faster in combination with VML. Given the better absolute times for unaligned arrays, I'd say chances are that performance in the strided scenario *might* benefit from using copy-in copy-out. Hmm, that's worth a try...

-- 
Francesc Alted
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion