BLIS seems like a nice project as well. I like the arbitrary striding; BLAS 
lacking this has always annoyed me.

-----Original Message-----
From: "Sturla Molden" <sturla.mol...@gmail.com>
Sent: 12-4-2014 13:12
To: "numpy-discussion@scipy.org" <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

Eelco Hoogendoorn <hoogendoorn.ee...@gmail.com> wrote:

> I wonder: how hard would it be to create a more 21st-century oriented BLAS,
> relying more on code generation tools, and perhaps LLVM/JITting?
> 
> Wouldn't we get ten times the portability with one-tenth the lines of code?
> Or is there too much dark magic going on in BLAS for such an approach to
> come close enough to hand-tuned performance?

The "dark magic" in OpenBLAS is mostly about placing prefetch instructions
strategically, to make sure hierarchical memory is used optimally. This is
very hard for the compiler to get right, because it doesn't know matrix
algebra like we do. Prefetching is needed because when two matrices are
multiplied, one of them will have strided memory access. On the other
hand, the SIMD instructions other than _mm_prefetch are something a
compiler might be able to generate through auto-vectorization without a
lot of help today.
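
To make the strided access concrete, here is a minimal sketch in C, not
OpenBLAS's actual kernel. It assumes row-major storage and uses the GCC/Clang
builtin __builtin_prefetch as a stand-in for _mm_prefetch; the function name
matmul_prefetch and the prefetch distance PREFETCH_ROWS are hypothetical and
chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in rows of B. A real kernel would tune
   this per CPU and cache level; 8 is only an illustrative value. */
#define PREFETCH_ROWS 8

/* C = A * B for row-major n x n matrices (naive triple loop). */
void matmul_prefetch(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++) {
                /* A[i*n + k] is read with unit stride, but B[k*n + j]
                   walks down a column: consecutive reads are n doubles
                   apart. Hint the cache about a row we need soon. */
                if (k + PREFETCH_ROWS < n)
                    __builtin_prefetch(&B[(k + PREFETCH_ROWS) * n + j], 0, 1);
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

The prefetch is only a hint and does not change the result. Tuned libraries
typically also pack and block the operands so that the inner loops see
contiguous memory, which this sketch leaves out.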

Sturla

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion