I did some testing of read-only and restrict Py_buffer access with simple matrix multiplication. I consistently find that it does not help on large arrays, but does help for small ones.
However, multiplying small matrices is so fast that speed hardly matters there anyway.

Relative speed measurements for matrix multiplication,
average of 100 trials, matrix ranks: (100 x 100) and (100 x 100):

  NumPy, np.matrix                       1.00 x
  Cython, plain buffer                   0.25 x
  Cython, readonly buffer                0.25 x
  Cython, restrict buffer                0.24 x
  Cython, restrict and readonly buffer   0.25 x

Relative speed measurements for matrix multiplication,
average of 1000 trials, matrix ranks: (10 x 10) and (10 x 10):

  NumPy, np.matrix                       1.00 x
  Cython, plain buffer                   2.35 x
  Cython, readonly buffer                2.65 x
  Cython, restrict buffer                2.75 x
  Cython, restrict and readonly buffer   2.70 x

I also find that GCC's autovectorizer does not like the way we work with Py_buffers:

  D:\benchmark\matmul>gcc -c -std=gnu99 -O3 -fpeel-loops -funroll-loops
      -ftree-loop-linear -ffast-math -ftree-vectorizer-verbose=5 -march=core2
      -Ic:/Python26/include
      -Ic:/Python26/Lib/site-packages/numpy/core/include matmul_test.c
  [...]
  matmul_test.c:1171: note: not vectorized: data ref analysis failed
  D.9077_530 = *D.9076_529;
  matmul_test.c:854: note: vectorized 0 loops in function.

I think it is the way we access arrays that confuses GCC's autovectorizer. After some searching, I found out that it only works with power-of-two strides, and "data ref analysis failed" is the error we get when it is confused.

Sturla Molden

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
