When I tried this, I wasn't even able to vectorize a sum-reduce-style loop written in pure C. It seemed to me then that the auto-vectorizer only tackles parallel (element-wise) arithmetic and not reductions... (That test was hand-written C, not Cython-generated code, though.)
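To make the distinction concrete, here is a minimal C sketch of the two loop shapes mentioned above. The function names `add_arrays` and `sum_array` are hypothetical and are not taken from the original test. One common reason a floating-point reduction may be left scalar is that vectorizing it reorders the additions, which compilers generally only allow under relaxed floating-point flags such as -ffast-math.

```c
#include <stddef.h>

/* Element-wise ("parallel") arithmetic: every iteration is independent
 * of the others, which is the easy case for an auto-vectorizer.
 * "restrict" promises the compiler that the arrays do not overlap. */
void add_arrays(size_t n, const double *restrict a,
                const double *restrict b, double *restrict out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Sum reduction: each iteration depends on the running total from the
 * previous one. Vectorizing this means splitting the sum into partial
 * sums, which changes the order of the floating-point additions. */
double sum_array(size_t n, const double *a)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Compiling a file like this with gcc -O3 and a vectorizer report flag (the quoted message uses -ftree-vectorizer-verbose=5) should show which of the two loops, if any, gets vectorized by a given compiler version.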
BTW, I tried with Intel C, and it seems Cython code isn't vectorizable with it either. While I'm happy for any patches towards auto-vectorization, I don't think it is that important as a concept: SIMD instructions in Cython would allow going directly, rather than "serialize vector op to loop, then parse loop as vector op".

(The strides/power of 2 shouldn't matter if it is declared "c"?)

Dag Sverre Seljebotn

-----Original Message-----
From: Sturla Molden <[email protected]>
Date: Tuesday, Oct 20, 2009 4:49 pm
Subject: [Cython] restrict and readonly
To: [email protected]: [email protected]

>I did some testing of read-only and restrict Py_buffer access with
>simple matrix multiplication. I consistently find that it does not help
>on large arrays, but does help for small.
>
>However, multiplying small matrices is so fast that speed does not
>matter anyway.
>
>
>Relative speed measurements for matrix multiplication
>average of 100 trials
>matrix ranks: (100 x 100) and (100 x 100)
>
>NumPy, np.matrix                        1.00 x
>Cython, plain buffer                    0.25 x
>Cython, readonly buffer                 0.25 x
>Cython, restrict buffer                 0.24 x
>Cython, restrict and readonly buffer    0.25 x
>
>
>
>Relative speed measurements for matrix multiplication
>average of 1000 trials
>matrix ranks: (10 x 10) and (10 x 10)
>
>NumPy, np.matrix                        1.00 x
>Cython, plain buffer                    2.35 x
>Cython, readonly buffer                 2.65 x
>Cython, restrict buffer                 2.75 x
>Cython, restrict and readonly buffer    2.70 x
>
>
>
>I also find that gcc autovectorizer does not like the way we work with
>Py_buffers:
>
>D:\benchmark\matmul>gcc -c -std=gnu99 -O3 -fpeel-loops -funroll-loops
>-ftree-loop-linear -ffast-math -ftree-vectorizer-verbose=5 -march=core2
>-Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include
>matmul_test.c
>
>[...]
>
>matmul_test.c:1171: note: not vectorized: data ref analysis failed D.9077_530 = *D.9076_529;
>
>matmul_test.c:854: note: vectorized 0 loops in function.
>
>
>I think it is the way we access arrays that confuses GCC's
>autovectorizer. After some searching, I found out that it only works
>with power-of-two strides, and "data ref analysis failed" is the error
>we would get when it is confused.
>
>
>Sturla Molden
>_______________________________________________
>Cython-dev mailing list
>[email protected]
>http://codespeak.net/mailman/listinfo/cython-dev
