For the first benchmark: A.dot(B) with A real and B complex is a
known performance issue; see https://github.com/numpy/numpy/issues/10468
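While that issue stands, one workaround (a minimal sketch, assuming a is
real float64 and b is complex128 as in your benchmark; real_dot_complex
is just an illustrative name) is to split the complex vector and do two
real BLAS products, avoiding the slow mixed-type fallback:

import numpy as np

def real_dot_complex(a, b):
    # Two plain float64 matrix-vector products (fast BLAS dgemv path),
    # recombined into the complex result. For real a this is
    # mathematically equivalent to np.dot(a, b).
    return np.dot(a, b.real) + 1j * np.dot(a, b.imag)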
In general, it might be worth trying different BLAS backends. For
instance, if you install numpy from conda-forge you should be able to
switch between OpenBLAS, MKL and BLIS:
https://conda-forge.org/docs/maintainer/knowledge_base.html#switching-blas-implementation
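You can check which BLAS a given numpy build actually links against
with np.show_config():

import numpy as np
np.show_config()   # prints the BLAS/LAPACK libraries this build was linked with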
Roman
On 23/02/2021 19:32, Andrea Gavana wrote:
Hi,
On Tue, 23 Feb 2021 at 19:11, Neal Becker <ndbeck...@gmail.com> wrote:
I have code that performs a dot product of a 2D matrix of size (on the
order of) [1000,16] with a vector of size [16]. The matrix is
float64 and the vector is complex128. I was using numpy.dot but it
turned out to be a bottleneck.
So I coded dot2x1 in C++ (using xtensor-python just for the
interface). No fancy SIMD was used, unless g++ did it on its own.
On a simple benchmark using timeit I find my hand-coded routine is on
the order of 1000x faster than numpy. Here is the test code:
My custom C++ code is dot2x1. I'm not copying it here because it has
some dependencies. Any idea what is going on?
I had a similar experience, albeit with an older numpy and Python 2.7,
so my comments may well be outdated and irrelevant. This was on Windows
10 64-bit, with far more than enough RAM.
It took me forever to find out that numpy.dot was the culprit, and I
ended up using Fortran + f2py. Even with the overhead of going through
the f2py bridge, the Fortran dot_product was several times faster.
Sorry if it doesn't help much.
Andrea.
import numpy as np
from dot2x1 import dot2x1

# Real float64 matrix and complex128 vector of matching inner dimension.
a = np.ones((1000, 16))
b = np.array([ 0.80311816+0.80311816j,  0.80311816-0.80311816j,
              -0.80311816+0.80311816j, -0.80311816-0.80311816j,
               1.09707981+0.29396165j,  1.09707981-0.29396165j,
              -1.09707981+0.29396165j, -1.09707981-0.29396165j,
               0.29396165+1.09707981j,  0.29396165-1.09707981j,
              -0.29396165+1.09707981j, -0.29396165-1.09707981j,
               0.25495815+0.25495815j,  0.25495815-0.25495815j,
              -0.25495815+0.25495815j, -0.25495815-0.25495815j])

def F1():
    d = dot2x1(a, b)    # hand-coded C++ routine

def F2():
    d = np.dot(a, b)    # NumPy's mixed real/complex dot

from timeit import timeit
print(timeit('F1()', globals=globals(), number=1000))
print(timeit('F2()', globals=globals(), number=1000))
0.013910860987380147    << 1st timeit (F1, dot2x1)
28.608758996007964      << 2nd timeit (F2, np.dot)
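One more experiment that should isolate the mixed-type fallback (a
sketch in the same harness, untested here; F3 and ac are illustrative
names): cast the real matrix to complex128 once, outside the timed
loop, so np.dot dispatches to the complex BLAS routine instead of the
slow mixed-type path:

ac = a.astype(np.complex128)   # one-time cast, not counted in the timing

def F3():
    d = np.dot(ac, b)          # pure complex128 dot, handled by BLAS

print(timeit('F3()', globals=globals(), number=1000))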
--
Those who don't understand recursion are doomed to repeat it