Hi all,
I ran a quick comparison between SystemML's LibMatrixMult and Breeze
matrix multiplication using native BLAS (OpenBLAS through netlib-java).
In this small test, Breeze was about 5-6 times faster for dense-dense
matrices of size 1000 x 1000 (our default block size). The code I used
can be found here:
https://github.com/fschueler/incubator-systemml/blob/model_types/src/test/scala/org/apache/sysml/api/linalg/layout/local/SystemMLLocalBackendTest.scala
Running this code with 50 iterations each gives me, for example, the
following average times (a factor of about 7.3 in this particular run):
Breeze: 49.74 ms
SystemML: 363.44 ms
I am not claiming this holds for every operation, but these results
suggest that native BLAS can give a significant speedup for certain
operations, which is worth testing with more thorough benchmarks.
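For such benchmarks it probably makes sense to keep JIT warm-up out of
the measured runs. Here is a minimal sketch of a timing harness in plain
Java. It is not the linked test code and calls neither LibMatrixMult nor
Breeze: the class MatMultBenchmark, its naive multiply kernel, and the
warm-up count of 10 are made up for illustration, and the kernel would
have to be swapped for the implementation under test.

```java
import java.util.Random;

// Hypothetical timing harness for illustration; not SystemML or Breeze code.
class MatMultBenchmark {

    // Naive dense multiply of two n x n row-major matrices.
    // Stand-in kernel only; replace with the implementation under test.
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    // Dense n x n matrix with uniform random entries in [0, 1).
    static double[] randomMatrix(int n, long seed) {
        Random rnd = new Random(seed);
        double[] m = new double[n * n];
        for (int i = 0; i < m.length; i++) {
            m[i] = rnd.nextDouble();
        }
        return m;
    }

    // Average wall-clock time per multiply in milliseconds.
    // Warm-up runs are executed first and are not timed.
    static double averageMillis(int n, int warmup, int iterations) {
        double[] a = randomMatrix(n, 1L);
        double[] b = randomMatrix(n, 2L);
        double checksum = 0.0; // consume results so the JIT cannot drop the work
        for (int i = 0; i < warmup; i++) {
            checksum += multiply(a, b, n)[0];
        }
        long totalNanos = 0L;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            double[] c = multiply(a, b, n);
            totalNanos += System.nanoTime() - start;
            checksum += c[0];
        }
        if (Double.isNaN(checksum)) {
            throw new IllegalStateException("unexpected NaN in result");
        }
        return totalNanos / 1e6 / iterations;
    }

    public static void main(String[] args) {
        // Small default so the sketch finishes quickly; pass 1000 to match
        // the block size discussed above.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 200;
        System.out.printf("n=%d: %.2f ms on average%n", n, averageMillis(n, 10, 50));
    }
}
```

Something like JMH would be the more rigorous choice for real numbers;
the sketch only shows the warm-up/measurement split.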
By the way, I am definitely not saying we should use Breeze here. I am
rather looking at native BLAS and LAPACK implementations in general (as
provided by OpenBLAS, MKL, etc.).
Let me know what you think!
Felix