I did some testing of read-only and restrict Py_buffer access with 
simple matrix multiplication. I consistently find that it does not help 
on large arrays, but it does help on small ones.

However, multiplying small matrices is so fast that speed does not 
matter anyway.
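For reference, the kind of kernel being compared is a plain triple-loop matrix multiply. This is only a sketch in C, not the benchmark source: the `const` and `restrict` qualifiers here are the C-level equivalent of what read-only and restrict buffer access would let Cython emit.

```c
#include <stddef.h>

/* Naive matmul over row-major n x n matrices.
 * const/restrict tell the compiler the inputs are not written
 * and the three pointers do not alias, which is the aliasing
 * information the readonly/restrict buffer options provide. */
static void matmul(size_t n,
                   const double *restrict a,
                   const double *restrict b,
                   double *restrict c)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
}
```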


Relative speed measurements for matrix multiplication
average of 100 trials
matrix shapes: (100 x 100) and (100 x 100)

NumPy,  np.matrix                         1.00 x
Cython, plain buffer                      0.25 x
Cython, readonly buffer                   0.25 x
Cython, restrict buffer                   0.24 x
Cython, restrict and readonly buffer      0.25 x



Relative speed measurements for matrix multiplication
average of 1000 trials
matrix shapes: (10 x 10) and (10 x 10)

NumPy,  np.matrix                         1.00 x
Cython, plain buffer                      2.35 x
Cython, readonly buffer                   2.65 x
Cython, restrict buffer                   2.75 x
Cython, restrict and readonly buffer      2.70 x



I also find that GCC's autovectorizer does not like the way we work with 
Py_buffers:

D:\benchmark\matmul>gcc -c -std=gnu99 -O3 -fpeel-loops -funroll-loops -ftree-loop-linear -ffast-math -ftree-vectorizer-verbose=5 -march=core2 -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include matmul_test.c

[...]

matmul_test.c:1171: note: not vectorized: data ref analysis failed
D.9077_530 = *D.9076_529;

matmul_test.c:854: note: vectorized 0 loops in function.


I think it is the way we access arrays that confuses GCC's 
autovectorizer. After some searching, I found that it only works 
with power-of-two strides, and "data ref analysis failed" is the error 
we get when it is confused.
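A hypothetical illustration of the difference (this is not Cython's actual generated code): generic Py_buffer access goes through a byte stride that is only known at run time, which is what GCC's data-ref analysis chokes on, while a contiguous loop with a compile-time unit stride is easy to vectorize.

```c
#include <stddef.h>

/* Stride-based access in the style of a generic buffer loop.
 * `stride` is a runtime value in bytes, so the compiler cannot
 * prove a regular access pattern and reports
 * "not vectorized: data ref analysis failed". */
static double sum_strided(const char *buf, size_t n, ptrdiff_t stride)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += *(const double *)(buf + (ptrdiff_t)i * stride);
    return s;
}

/* The same reduction over a contiguous double array: the unit
 * stride is known at compile time, so data-ref analysis succeeds. */
static double sum_contig(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

Both functions compute the same sum when the stride is `sizeof(double)`; the point is only what the compiler can prove about the access pattern.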


Sturla Molden
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
