Could you please make sure you're comparing the right thing? Even on old Sandy Bridge CPUs, our matrix multiply for 1k x 1k usually takes 40-50 ms. We also ran the same experiments with larger matrices, and SystemML was about 2x faster than Breeze. Please uncomment the timings in LibMatrixMult.matrixMult and double-check both the timing and that we're actually comparing dense matrix multiplies.
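
For illustration only, here is a minimal sketch of the two checks mentioned above: confirming that the inputs are really dense, and timing only after some warm-up iterations so that JIT compilation does not inflate the average. Everything in it is an assumption made for the example: the class DenseMultCheck and its methods are hypothetical, and the naive triple loop is a stand-in kernel. It does not call LibMatrixMult or Breeze, so its absolute numbers say nothing about either library.

```java
import java.util.Random;

// Hypothetical helper class, not part of SystemML or Breeze.
class DenseMultCheck {

    // Fraction of non-zero entries; a truly dense input should be close to 1.0.
    static double density(double[] m) {
        long nnz = 0;
        for (double v : m) {
            if (v != 0.0) nnz++;
        }
        return (double) nnz / m.length;
    }

    // Naive row-major dense multiply C = A * B for n x n matrices.
    // Stand-in kernel only; replace with the call you actually want to time.
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    // Fully dense n x n matrix: every entry lies in [1.0, 2.0), so none is zero.
    static double[] randomDense(int n, long seed) {
        Random r = new Random(seed);
        double[] m = new double[n * n];
        for (int i = 0; i < m.length; i++) {
            m[i] = r.nextDouble() + 1.0;
        }
        return m;
    }

    // Average milliseconds per multiply, measured after `warmup` untimed runs.
    static double averageMillis(double[] a, double[] b, int n, int warmup, int iters) {
        double sink = 0.0; // consume results so the JIT cannot drop the work
        for (int i = 0; i < warmup; i++) {
            sink += multiply(a, b, n)[0];
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink += multiply(a, b, n)[0];
        }
        long elapsed = System.nanoTime() - start;
        if (Double.isNaN(sink)) {
            throw new IllegalStateException("unexpected NaN in result");
        }
        return elapsed / 1e6 / iters;
    }

    public static void main(String[] args) {
        int n = 256; // kept small so the naive kernel finishes quickly
        double[] a = randomDense(n, 1L);
        double[] b = randomDense(n, 2L);
        System.out.printf("density(A)=%.3f density(B)=%.3f%n", density(a), density(b));
        System.out.printf("avg time: %.2f ms%n", averageMillis(a, b, n, 10, 50));
    }
}
```

If the density printed for either input is well below 1.0, or the average changes a lot when the warm-up count is raised, the two sides of the comparison are probably not measuring the same thing.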

Regards,
Matthias

On 11/30/2016 11:54 PM, fschue...@posteo.de wrote:
Hi all,

I have run a very quick comparison between SystemML's LibMatrixMult and
Breeze matrix multiplication using native BLAS (OpenBLAS through
netlib-java). In this small comparison, Breeze is about 5-6 times
faster for dense-dense matrices of size 1000 x 1000 (our default
blocksize). The code I used can be found here:
https://github.com/fschueler/incubator-systemml/blob/model_types/src/test/scala/org/apache/sysml/api/linalg/layout/local/SystemMLLocalBackendTest.scala


Running this code with 50 iterations each gives me for example average
times of:
Breeze:         49.74 ms
SystemML:   363.44 ms

I don't want to say this is true for every operation, but these results
suggest that native BLAS can give a significant speedup for certain
operations, which is worth testing with more advanced benchmarks.

Btw: I am definitely not saying we should use Breeze here. I am more
looking at native BLAS and LAPACK implementations in general (as
provided by OpenBLAS, MKL, etc.).

Let me know what you think!
Felix
