On 1/23/2011 10:36 AM, Vladimir Voznesensky wrote:
> My computer has 12 hyperthreaded cores.
> My application uses dot multiplication from Intel MKL, that accelerated
> it by ~ 5 times.
> After OpenMP-fication of loops.c.src, my app was accelerated by ~12-15
> times.
>
I was greatly disappointed in the parallel performance of some of my programs on a new workstation: I could not get better than about a factor of 5 on my dual Xeon with 24 threads.

Last fall, I stumbled across this example of OpenMP with f2py:

https://gist.github.com/226473

I built it on Ubuntu 10.04 x64 with a command slightly different from the one in the instructions:

    f2py -c -m deemingomp periodogram.f90 --f90flags="-fopenmp " -lgomp -lf77blas -lcblas -latlas

On my machine, for larger array sizes, I saw speed-ups of 20x over a single thread in the example program. Indeed, the example serves as an excellent way to test the thermal stability of the workstation.

I have not had a chance to follow this up yet, but if you can get a 12x improvement with normal numpy codes, I am very interested...

Cheers,
EC
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
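For anyone wanting to compare figures like the ones above, here is a minimal Python sketch of how a speed-up could be timed and put on a common scale. The helper names (`best_time`, `speedup`, `efficiency`) are hypothetical and not part of numpy or f2py. Dividing by 24 assumes every hardware thread counts as a full core, which is optimistic on a hyperthreaded machine. Under that assumption, a factor of 5 on 24 threads is a parallel efficiency of about 0.21, and 20x is about 0.83.

```python
import time


def best_time(func, repeats=3):
    """Return the smallest wall-clock time in seconds over `repeats` calls of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(t_single, t_parallel):
    """Speed-up factor: single-thread time divided by multi-thread time."""
    return t_single / t_parallel


def efficiency(speedup_factor, n_threads):
    """Fraction of ideal linear scaling achieved on n_threads (assumed divisor)."""
    return speedup_factor / n_threads


if __name__ == "__main__":
    # Speed-up factors reported in this message; 24 threads is the assumed divisor.
    for label, factor in (("own programs", 5.0), ("periodogram example", 20.0)):
        print(f"{label}: {factor:.0f}x on 24 threads, "
              f"efficiency {efficiency(factor, 24):.2f}")
```

To time a real case, one would pass the same workload to `best_time` twice, once built or run single-threaded and once with all threads, and feed the two results to `speedup`.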