Package: python-numpy
Version: 1:1.8.2-2

(This problem report was written by Yaroslav Bulatov. I've confirmed it in a Debian sid chroot. On my machine the improvement was about 30x.)

The default numpy install uses an inferior BLAS and is very slow. The matrix multiplication benchmark below gets 1.26 G items/second with the default install on my 6-core 3.2 GHz Xeon. When I install OpenBLAS, it goes up to 186 G items/second. That's a roughly 150x improvement in speed, which should be the default for numpy.

benchmark
----
import time
import numpy as np
import numpy.random as random

size = 512
iters = 500

a = random.rand(size, size).astype(np.float32)
b = random.rand(size, size).astype(np.float32)

start_time = time.time()
for i in range(iters):
    np.dot(a, b)
end_time = time.time()

# size**3 multiplies, (size - 1) * size ** 2 adds
num_ops = size ** 3 + (size - 1) * size ** 2
elapsed = end_time - start_time

print("Did %d multiplications of %d x %d matrices in %.1f seconds"
      % (iters, size, size, elapsed))
print("%.4f G items/sec" % (num_ops * float(iters) / elapsed / 10 ** 9))

fixing it with openblas:
-----
$ sudo apt-get install libopenblas-dev virtualenv python-dev
$ mkdir ~/tmp
$ cd ~/tmp
$ virtualenv env
$ source env/bin/activate
$ pip install numpy
$ which python
~/tmp/env/bin/python
$ python ...