Hi!

I wonder why simple elementwise operations like "a * 2" or "a + 1" are not performed in order of increasing memory addresses, so as to exploit CPU caches etc. As it is now, their speed drops by a factor of around 3 simply by transpose()ing the array. Similarly (but even less logically), copy() and even the constructor are affected. Yes, I understand that copy() creates contiguous arrays, but shouldn't it respect/retain the order nevertheless? The timings below show the effect for C-ordered versus Fortran-ordered arrays.
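For anyone who wants to try this outside IPython, here is a minimal sketch of the same kind of comparison as a standalone script. The helper name best_time, the array size, and the repeat counts are my own choices for illustration and are not part of the session in this post; the numbers will of course depend on the NumPy version and the machine.

```python
import timeit

import numpy as np


def best_time(func, repeat=5, number=3):
    """Return the best per-call time in seconds for a zero-argument callable.

    Hypothetical helper for this sketch; it mimics "best of N" from %timeit.
    """
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


# 128x128x128 float64 values (16 MiB). This is smaller than the 256x256x256
# arrays used in the timings in this post, so the script finishes quickly.
shape = (128, 128, 128)
c_arr = np.zeros(shape)            # C (row-major) memory layout
f_arr = np.asfortranarray(c_arr)   # same values, Fortran (column-major) layout

# Make sure we really compare two different memory layouts.
assert c_arr.flags["C_CONTIGUOUS"] and f_arr.flags["F_CONTIGUOUS"]

timings = {}
for name, arr in (("C", c_arr), ("F", f_arr)):
    # The lambdas are called inside this iteration, so they see the right arr.
    timings[name] = {
        "a + 1": best_time(lambda: arr + 1),
        "a.copy()": best_time(lambda: arr.copy()),
        "a.max()": best_time(lambda: arr.max()),
    }

for op in ("a + 1", "a.copy()", "a.max()"):
    c_ms = timings["C"][op] * 1e3
    f_ms = timings["F"][op] * 1e3
    print(f"{op:10s}  C order: {c_ms:8.3f} ms   F order: {f_ms:8.3f} ms")
```

Both arrays hold identical values, so any difference between the two columns should come only from the memory layout.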
### constructor ###
In [89]: %timeit -r 10 -n 1000000 numpy.ndarray((3,3,3))
1000000 loops, best of 10: 1.19 µs per loop
In [90]: %timeit -r 10 -n 1000000 numpy.ndarray((3,3,3), order="f")
1000000 loops, best of 10: 2.19 µs per loop

### copy 3x3x3 array ###
In [85]: a = numpy.ndarray((3,3,3))
In [86]: %timeit -r 10 a.copy()
1000000 loops, best of 10: 1.14 µs per loop
In [87]: a = numpy.ndarray((3,3,3), order="f")
In [88]: %timeit -r 10 -n 1000000 a.copy()
1000000 loops, best of 10: 3.39 µs per loop

### copy 256x256x256 array ###
In [74]: a = numpy.ndarray((256,256,256))
In [75]: %timeit -r 10 a.copy()
10 loops, best of 10: 119 ms per loop
In [76]: a = numpy.ndarray((256,256,256), order="f")
In [77]: %timeit -r 10 a.copy()
10 loops, best of 10: 274 ms per loop

### fill ###
In [79]: a = numpy.ndarray((256,256,256))
In [80]: %timeit -r 10 a.fill(0)
10 loops, best of 10: 60.2 ms per loop
In [81]: a = numpy.ndarray((256,256,256), order="f")
In [82]: %timeit -r 10 a.fill(0)
10 loops, best of 10: 60.2 ms per loop

### power ###
In [151]: a = numpy.ndarray((256,256,256))
In [152]: %timeit -r 10 a ** 2
10 loops, best of 10: 124 ms per loop
In [153]: a = numpy.asfortranarray(a)
In [154]: %timeit -r 10 a ** 2
10 loops, best of 10: 458 ms per loop

### addition ###
In [160]: a = numpy.ndarray((256,256,256))
In [161]: %timeit -r 10 a + 1
10 loops, best of 10: 139 ms per loop
In [162]: a = numpy.asfortranarray(a)
In [163]: %timeit -r 10 a + 1
10 loops, best of 10: 465 ms per loop

### fft ###
In [146]: %timeit -r 10 numpy.fft.fft(vol, axis=0)
10 loops, best of 10: 1.16 s per loop
In [148]: %timeit -r 10 numpy.fft.fft(vol0, axis=2)
10 loops, best of 10: 1.16 s per loop
In [149]: vol.flags
Out[149]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
In [150]: vol0.flags
Out[150]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
In [9]: %timeit -r 10 numpy.fft.fft(vol0, axis=0)
10 loops, best of 10: 939 ms per loop

### mean ###
In [173]: %timeit -r 10 vol.mean()
10 loops, best of 10: 272 ms per loop
In [174]: %timeit -r 10 vol0.mean()
10 loops, best of 10: 683 ms per loop

### max ###
In [175]: %timeit -r 10 vol.max()
10 loops, best of 10: 63.8 ms per loop
In [176]: %timeit -r 10 vol0.max()
10 loops, best of 10: 475 ms per loop

### min ###
In [177]: %timeit -r 10 vol.min()
10 loops, best of 10: 63.8 ms per loop
In [178]: %timeit -r 10 vol0.min()
10 loops, best of 10: 476 ms per loop

### rot90 ###
In [10]: %timeit -r 10 numpy.rot90(vol)
100000 loops, best of 10: 6.97 µs per loop
In [12]: %timeit -r 10 numpy.rot90(vol0)
100000 loops, best of 10: 6.92 µs per loop

--
Ciao, /  /
     /--/
    /  / ANS