When I tried this, I wasn't even able to vectorize a sum-reduce-style loop 
written in pure C. It seemed to me then that the auto-vectorizer only tackles 
parallel arithmetic and not reductions... (Not with cython-generated code 
though).
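To make the distinction concrete, here is a minimal sketch of the two loop 
shapes I mean. The function names are made up for illustration, and this is 
not the exact code I tested. The first loop has independent iterations; the 
second carries a dependency through the accumulator. One thing that may be 
worth checking: for floating point, vectorizing the reduction means 
reordering the additions, which as far as I know GCC only does when 
-ffast-math (or -fassociative-math) is given.

```c
#include <stddef.h>

/* Elementwise ("parallel") arithmetic: every iteration is independent,
   and restrict promises the compiler that the arrays do not overlap. */
void add_arrays(const double *restrict a, const double *restrict b,
                double *restrict out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Sum reduction: each iteration depends on the running total, so the
   compiler must be allowed to reorder the additions to vectorize it. */
double sum_array(const double *a, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Compiling this with gcc -c -std=gnu99 -O3 -ftree-vectorizer-verbose=5, once 
with and once without -ffast-math, should show in the report which of the 
two loops the vectorizer accepts.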

BTW, I also tried the Intel C compiler, and it seems Cython-generated code 
isn't vectorizable there either.

While I'm happy for any patches towards auto-vectorization, I don't think it 
is that important as a concept -- SIMD instructions in Cython would allow 
going there directly, rather than "serialize vector op to loop, then parse 
loop as vector op". 

(the strides/power of 2 shouldn't matter if it is declared "c" ?)
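Roughly what I have in mind, as a hand-written sketch (hypothetical names, 
not what Cython actually emits): with generic strides both byte strides are 
run-time values, while for a "c" buffer the last stride is known to be the 
item size, so the inner loop can walk a plain double pointer.

```c
#include <stddef.h>

/* Generic strided access: both byte strides are only known at run time,
   so every element address is computed from them. */
double row_sum_strided(const char *buf, ptrdiff_t stride0, ptrdiff_t stride1,
                       ptrdiff_t row, ptrdiff_t ncols)
{
    double total = 0.0;
    for (ptrdiff_t j = 0; j < ncols; j++)
        total += *(const double *)(buf + row * stride0 + j * stride1);
    return total;
}

/* C-contiguous access: the last stride is assumed to be sizeof(double),
   so the inner loop is a unit-stride walk over a double pointer. */
double row_sum_contig(const char *buf, ptrdiff_t stride0,
                      ptrdiff_t row, ptrdiff_t ncols)
{
    const double *p = (const double *)(buf + row * stride0);
    double total = 0.0;
    for (ptrdiff_t j = 0; j < ncols; j++)
        total += p[j];
    return total;
}
```

Whether GCC's data ref analysis actually accepts the second form and rejects 
the first is something the vectorizer report would have to confirm; I have 
not verified it.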

Dag Sverre Seljebotn
-----Original Message-----
From: Sturla Molden <[email protected]>
Date: Tuesday, Oct 20, 2009 4:49 pm
Subject: [Cython] restrict and readonly
To: [email protected]: [email protected]


>I did some testing of read-only and restrict Py_buffer access with 
>simple matrix multiplication. I consistently find that it does not help 
>on large arrays, but does help on small ones.
>
>However, multiplying small matrices is so fast that speed does not 
>matter anyway.
>
>
>Relative speed measurements for matrix multiplication
>average of 100 trials
>matrix ranks: (100 x 100) and (100 x 100)
>
>NumPy,  np.matrix                         1.00 x
>Cython, plain buffer                      0.25 x
>Cython, readonly buffer                   0.25 x
>Cython, restrict buffer                   0.24 x
>Cython, restrict and readonly buffer      0.25 x
>
>
>
>Relative speed measurements for matrix multiplication
>average of 1000 trials
>matrix ranks: (10 x 10) and (10 x 10)
>
>NumPy,  np.matrix                         1.00 x
>Cython, plain buffer                      2.35 x
>Cython, readonly buffer                   2.65 x
>Cython, restrict buffer                   2.75 x
>Cython, restrict and readonly buffer      2.70 x
>
>
>
>I also find that gcc autovectorizer does not like the way we work with 
>Py_buffers:
>
>D:\benchmark\matmul>gcc -c -std=gnu99 -O3 -fpeel-loops -funroll-loops
>  -ftree-loop-linear -ffast-math -ftree-vectorizer-verbose=5 -march=core2
>  -Ic:/Python26/include -Ic:/Python26/Lib/site-packages/numpy/core/include
>  matmul_test.c
>
>[...]
>
>matmul_test.c:1171: note: not vectorized: data ref analysis failed D.9077_530 = *D.9076_529;
>
>matmul_test.c:854: note: vectorized 0 loops in function.
>
>
>I think it is the way we access arrays that confuses GCC's 
>autovectorizer. After some searching, I found out that it only works 
>with power-of-two strides, and "data ref analysis failed" is the error 
>we would get when it is confused.
>
>
>Sturla Molden
>_______________________________________________
>Cython-dev mailing list
>[email protected]
>http://codespeak.net/mailman/listinfo/cython-dev
>
