Hello.

Has anybody succeeded in running linear algebra applications that
use lots of SSE instructions?
Or has anybody tried running applications that use the Intel MKL library?

I am trying to port my linear algebra applications to run in Marss.
Specifically, I want to run DGEMM, which is double-precision
matrix-matrix multiplication.

I use the GotoBLAS library, which is supposedly faster than Intel MKL:

http://hpc.ucla.edu/hoffman2/software/blas_benchmark.php

Since SSE2 can sustain up to 4 double-precision FLOPS/cycle (a 2-wide
multiply and a 2-wide add issued each cycle), I was expecting to get at
least close to that:

http://stackoverflow.com/questions/15655835/flops-per-cycle-for-sandy-bridge-and-haswell-sse2-avx-avx2

However, when I ran with the Xeon configuration (in the config folder),
I only got 1.2 FLOPS/cycle, which is far below what I expected.
I computed the 1.2 FLOPS/cycle figure by dividing the number of
floating-point operations in DGEMM (which is 2*m*n*k) by the number of
simulation cycles that Marss reported.

Has anybody run into the same problem before?
If someone could point out where things may have gone wrong, or share
any insight or hint, I would really appreciate it.

Also, a side question: does Marss support AVX instructions? (AVX
performs even better than SSE in terms of FLOPS.)

Any suggestion would be very much appreciated.

Thanks a lot,

_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel