> However, profiling revealed that hardly anything was gained because of
> 1) non-alignment of the vectors.... this _could_ be handled by
> shuffled loading of the values though
> 2) the fact that my application used relatively large vectors that
> wouldn't fit into the CPU cache, hence the memory transfer slowed down
> the CPU.

I've had generally positive results from vectorizing code in the past,
admittedly on architectures with fast memory buses (Xeon 5100s). Naive
implementations of most simple vector operations (dot, +, -, etc.) were
sped up by around 20%. I also haven't found aligned accesses to make
much difference (~2-3%), but this might depend on the architecture.
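
To make the two points in the quoted text concrete, here is a rough sketch in plain Python. It only does the arithmetic: how many leading elements would have to be handled one at a time before a buffer reaches an aligned boundary, and whether the operands of a vector operation fit in cache at all. The 16-byte boundary assumes SSE-style loads, and the function names, the example addresses and the 4 MiB cache size are made up for illustration; none of it comes from the code being discussed.

```python
def head_elements(address, itemsize=8, alignment=16):
    """Leading elements to process singly before `address` is aligned.

    Returns None when no whole number of elements reaches the boundary
    (e.g. a double that starts 4 bytes past a 16-byte boundary).
    """
    misalign = address % alignment
    if misalign == 0:
        return 0
    gap = alignment - misalign
    if gap % itemsize != 0:
        return None
    return gap // itemsize


def working_set_bytes(n_elements, itemsize=8, n_vectors=2):
    """Bytes touched by an operation reading `n_vectors` equal-length vectors."""
    return n_elements * itemsize * n_vectors


def fits_in_cache(n_elements, cache_bytes, itemsize=8, n_vectors=2):
    """True if all operands fit in a cache of `cache_bytes` bytes."""
    return working_set_bytes(n_elements, itemsize, n_vectors) <= cache_bytes


if __name__ == "__main__":
    cache = 4 * 1024 * 1024  # assumed cache size, for illustration only
    print(head_elements(0x1008))            # double, 8 bytes past a boundary
    print(fits_in_cache(1000, cache))       # small dot product
    print(fits_in_cache(1_000_000, cache))  # large dot product
```

If the second check fails, the operation is likely limited by memory bandwidth, and fixing alignment alone would not be expected to help much.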

James
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion