Re: Performance differences between SystemML LibMatrixMult and Breeze with native BLAS

Matthias Boehm Wed, 30 Nov 2016 15:08:49 -0800

Could you please make sure you're comparing the right thing? Even on old Sandy Bridge CPUs, our matrix mult for 1k x 1k usually takes 40-50ms. We also did the same experiments with larger matrices, and SystemML was about 2x faster than Breeze. Please uncomment the timings in LibMatrixMult.matrixMult and double-check the timing, as well as that we're actually comparing dense matrix multiply.
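
A minimal sketch of the kind of check described above: confirm that the inputs are really dense, and exclude JIT warm-up runs from the averaged timing. The class DenseMatMultTiming and all of its methods are hypothetical names made up for illustration. The naive multiply is only a stand-in for the kernel under test; it is not SystemML's LibMatrixMult or Breeze, and its timings say nothing about either.

```java
import java.util.Random;

public class DenseMatMultTiming {

    // Written to after every run so the JIT cannot discard the multiply as dead code.
    static volatile double sink;

    // Naive dense multiply c = a * b for n x n row-major matrices (i-k-j loop order).
    // Stand-in kernel only; replace with the call you actually want to measure.
    static double[] multiply(double[] a, double[] b, int n) {
        double[] c = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < n; k++) {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) {
                    c[i * n + j] += aik * b[k * n + j];
                }
            }
        }
        return c;
    }

    // Fills an n x n matrix with uniform random values from [0, 1).
    static double[] randomDense(int n, long seed) {
        Random rnd = new Random(seed);
        double[] m = new double[n * n];
        for (int i = 0; i < m.length; i++) {
            m[i] = rnd.nextDouble();
        }
        return m;
    }

    // Fraction of non-zero cells; should be close to 1.0 for a dense input.
    static double density(double[] m) {
        int nonZeros = 0;
        for (double v : m) {
            if (v != 0.0) {
                nonZeros++;
            }
        }
        return (double) nonZeros / m.length;
    }

    // Runs `warmup` untimed multiplications, then returns the average
    // wall-clock time in milliseconds over `iters` timed multiplications.
    static double averageMillis(double[] a, double[] b, int n, int warmup, int iters) {
        for (int i = 0; i < warmup; i++) {
            sink += multiply(a, b, n)[0];
        }
        long start = System.nanoTime();
        for (int i = 0; i < iters; i++) {
            sink += multiply(a, b, n)[0];
        }
        long elapsed = System.nanoTime() - start;
        return elapsed / 1e6 / iters;
    }

    public static void main(String[] args) {
        int n = 1000; // 1k x 1k, the size discussed in this thread
        double[] a = randomDense(n, 1L);
        double[] b = randomDense(n, 2L);
        System.out.println("density(a) = " + density(a));
        System.out.println("density(b) = " + density(b));
        // 10 warm-up runs and 50 timed runs are arbitrary choices for this sketch.
        System.out.println("avg ms     = " + averageMillis(a, b, n, 10, 50));
    }
}
```

Timing both libraries through the same harness, on the same inputs and after the same number of warm-up runs, makes it easier to rule out JIT compilation, one-time native library loading, or an accidental sparse code path as the source of a difference.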


Regards,
Matthias


On 11/30/2016 11:54 PM, fschue...@posteo.de wrote:

Hi all,

I have run a very quick comparison between SystemML's LibMatrixMult and
Breeze matrix multiplication using native BLAS (OpenBLAS through
netlib-java). In this small comparison I see a performance difference
for dense-dense matrices of size 1000 x 1000 (our default blocksize),
with Breeze being about 5-6 times faster. The code I used can be found
here:
https://github.com/fschueler/incubator-systemml/blob/model_types/src/test/scala/org/apache/sysml/api/linalg/layout/local/SystemMLLocalBackendTest.scala


Running this code with 50 iterations each gives me for example average
times of:
Breeze:         49.74 ms
SystemML:   363.44 ms

I don't want to say this is true for every operation, but those results
let us form the hypothesis that native BLAS operations can lead to a
significant speedup for certain operations which is worth testing with
more advanced benchmarks.

Btw: I am definitely not saying we should use Breeze here. I am more
looking at native BLAS and LAPACK implementations in general (as
provided by OpenBLAS, MKL, etc.).

Let me know what you think!
Felix
