>  However, profiling revealed that hardly anything was gained because of
>  1) non-alignment of the vectors.... this _could_ be handled by
>  shuffled loading of the values though
>  2) the fact that my application used relatively large vectors that
>  wouldn't fit into the CPU cache, hence the memory transfer slowed down
>  the CPU.
I've had generally positive results from vectorizing code in the past,
admittedly on architectures with fast memory buses (Xeon 5100s). Naive
implementations of most simple vector operations (dot, +, -, etc.) were
sped up by around 20%. I also haven't found aligned accesses to make
much difference (~2-3%), but this might depend on the
architecture.
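
For anyone who wants to check the alignment effect on their own machine,
here is a rough sketch, not the benchmark behind the numbers above. The
helper names (is_aligned, aligned_and_offset) are made up for
illustration, and the 16-byte boundary is an assumption matching SSE
loads. Whether np.dot actually reaches a SIMD code path depends on how
NumPy and its BLAS were built, so treat any timings as indicative only.

```python
import timeit

import numpy as np


def is_aligned(arr, boundary=16):
    """Return True if the first byte of arr sits on a `boundary`-byte address."""
    return arr.ctypes.data % boundary == 0


def aligned_and_offset(n, boundary=16, dtype=np.float64):
    """Return two length-n arrays carved out of one byte buffer.

    The first starts on a `boundary`-byte address; the second starts one
    element later, so it is off the boundary whenever the item size is
    not a multiple of `boundary` (true for float64 with boundary=16).
    """
    itemsize = np.dtype(dtype).itemsize
    # Over-allocate so that both views fit after skipping to the boundary.
    raw = np.zeros((n + 1) * itemsize + boundary, dtype=np.uint8)
    start = (-raw.ctypes.data) % boundary
    aligned = raw[start:start + n * itemsize].view(dtype)
    offset = raw[start + itemsize:start + (n + 1) * itemsize].view(dtype)
    return aligned, offset


if __name__ == "__main__":
    a, b = aligned_and_offset(100000)
    a[:] = 1.0
    b[:] = 1.0
    for name, v in (("aligned", a), ("offset", b)):
        t = min(timeit.repeat(lambda: np.dot(v, v), number=1000, repeat=5))
        print(name, is_aligned(v), t)
```

Both arrays share one buffer so that their cache behaviour is as similar
as possible; with vectors much larger than the cache, I would expect the
memory bus to dominate either way.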

James
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
