Hello. Has anybody succeeded in running linear algebra applications that use lots of SSE instructions? Or has anybody tried running applications that use the Intel MKL library?
I am trying to port my linear algebra applications to run in MARSS. Specifically, I want to run DGEMM, which is double-precision matrix-matrix multiplication. I use the GotoBLAS library, which reportedly outperforms Intel MKL: http://hpc.ucla.edu/hoffman2/software/blas_benchmark.php

Since SSE can sustain 4 double-precision FLOPS/cycle (see http://stackoverflow.com/questions/15655835/flops-per-cycle-for-sandy-bridge-and-haswell-sse2-avx-avx2), I was expecting to get at least close to 4 FLOPS/cycle. However, when I ran with the Xeon configuration (in the config folder), I only got about 1.2 FLOPS/cycle, which is way below what I expected. I obtained the 1.2 FLOPS/cycle figure by dividing the number of floating-point operations required for DGEMM (2*m*n*k) by the number of simulation cycles that MARSS reported.

Has anybody had the same experience before? If someone could offer any insight or a hint as to where it may have gone wrong, I would really appreciate it.

Also, a side question: does MARSS support AVX instructions? (AVX achieves even more FLOPS/cycle than SSE.)

Any suggestion would be very much appreciated.

Thanks a lot,
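
P.S. In case it helps to see exactly what I am measuring, below is roughly how I call DGEMM and derive the FLOPS/cycle number. This is only a minimal sketch: the matrix sizes and the cycle count are placeholders (the real cycle count comes from the MARSS stats output), and I am assuming the CBLAS interface that GotoBLAS ships with.

/* Sketch: DGEMM via CBLAS plus the FLOPS/cycle arithmetic. */
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    const int m = 1024, n = 1024, k = 1024;   /* placeholder sizes */
    double *A = malloc((size_t)m * k * sizeof(double));
    double *B = malloc((size_t)k * n * sizeof(double));
    double *C = malloc((size_t)m * n * sizeof(double));
    for (int i = 0; i < m * k; i++) A[i] = 1.0;
    for (int i = 0; i < k * n; i++) B[i] = 1.0;
    for (int i = 0; i < m * n; i++) C[i] = 0.0;

    /* C = 1.0 * A * B + 0.0 * C, row-major, no transposes */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0, A, k, B, n, 0.0, C, n);

    /* DGEMM does one multiply and one add per (i,j,l) triple: 2*m*n*k FLOPs.
       Divide by the simulated cycle count reported by MARSS. */
    double flops  = 2.0 * (double)m * n * k;
    double cycles = 1.8e9;   /* placeholder: taken from the MARSS stats output */
    printf("FLOPS/cycle = %.2f\n", flops / cycles);

    free(A); free(B); free(C);
    return 0;
}

I link the program against GotoBLAS and run the resulting binary inside the simulated machine; the cycle count above is then the one MARSS reports for that run.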
