BLIS seems like a nice project as well. I like the arbitrary striding; BLAS lacking this has always annoyed me.
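To illustrate what arbitrary striding buys: classic BLAS describes each matrix by a single leading dimension, with unit stride along the other axis. An operand whose row and column strides are both non-unit (say a NumPy view like `big[::2, ::2]`) generally has to be copied into a contiguous buffer first. BLIS's interface, as I understand it, instead takes a separate row stride and column stride for every operand. Below is a minimal naive sketch in C of that idea. `gemm_strided` and `strided_view_example` are hypothetical names used only for illustration; this is not the BLIS API and has none of its blocking or optimization.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the BLIS API: C = A * B, where every operand is a
 * base pointer plus a row stride (rs) and a column stride (cs), so element
 * (i, j) of X lives at X[i*rs_x + j*cs_x]. Neither stride has to be 1. */
void gemm_strided(ptrdiff_t m, ptrdiff_t n, ptrdiff_t k,
                  const double *a, ptrdiff_t rs_a, ptrdiff_t cs_a,
                  const double *b, ptrdiff_t rs_b, ptrdiff_t cs_b,
                  double *c, ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    for (ptrdiff_t i = 0; i < m; ++i) {
        for (ptrdiff_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (ptrdiff_t p = 0; p < k; ++p)
                sum += a[i * rs_a + p * cs_a] * b[p * rs_b + j * cs_b];
            c[i * rs_c + j * cs_c] = sum;
        }
    }
}

/* Usage: multiply the 2x2 view big[::2, ::2] of a 4x4 row-major buffer
 * (row stride 8, column stride 2) by a column-major 2x2 matrix, without
 * copying either operand into a contiguous buffer first. */
void strided_view_example(double c[4])
{
    double big[16];
    for (int i = 0; i < 16; ++i)
        big[i] = (double)i;             /* big = [[0..3], [4..7], [8..11], [12..15]] */

    /* B = [[1, 2], [3, 4]] stored column-major: rs = 1, cs = 2 */
    const double b[4] = {1.0, 3.0, 2.0, 4.0};

    /* C is written row-major: rs = 2, cs = 1 */
    gemm_strided(2, 2, 2, big, 8, 2, b, 1, 2, c, 2, 1);
}
```

Here the view is [[0, 2], [8, 10]], so the product written into `c` is [[6, 8], [38, 56]]. The same function handles row-major, column-major, and doubly strided operands just by changing the stride arguments.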
-----Original Message-----
From: "Sturla Molden" <sturla.mol...@gmail.com>
Sent: 12-4-2014 13:12
To: "numpy-discussion@scipy.org" <numpy-discussion@scipy.org>
Subject: Re: [Numpy-discussion] Wiki page for building numerical stuff on Windows

Eelco Hoogendoorn <hoogendoorn.ee...@gmail.com> wrote:
> I wonder: how hard would it be to create a more 21st-century oriented BLAS,
> relying more on code generation tools, and perhaps LLVM/JITting?
>
> Wouldn't we get ten times the portability with one-tenth the lines of code?
> Or is there too much dark magic going on in BLAS for such an approach to
> come close enough to hand-tuned performance?

The "dark magic" in OpenBLAS is mostly the strategic placement of prefetch instructions, to make sure the memory hierarchy is used optimally. This is very hard for a compiler to get right, because it doesn't know matrix algebra like we do. Prefetching is needed because when two matrices are multiplied, one of them will be accessed with a memory stride. On the other hand, SIMD instructions other than _mm_prefetch are something a compiler might be able to generate through auto-vectorization today without a lot of help.

Sturla

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
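To make Sturla's prefetching point concrete: in a naive row-major C = A*B with i-j-p loop order, A is read along a row (unit stride), but B is read down a column, so successive loads of B are n doubles apart. Once n * sizeof(double) exceeds a cache line, every iteration touches a new line. The sketch below only shows *where* a software prefetch would go. It assumes row-major storage, a hypothetical function name `matmul_prefetch`, and an arbitrary, untuned prefetch distance `PREFETCH_DIST`. It uses `_mm_prefetch` from `<xmmintrin.h>` on x86 and falls back to the GCC/Clang `__builtin_prefetch` elsewhere. Real OpenBLAS kernels are far more elaborate (packed panels, register blocking, hand-written assembly).

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* x86: the SSE intrinsic; _MM_HINT_T0 requests the line into all cache levels */
#define PREFETCH(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* GCC/Clang builtin on other targets: (address, rw = 0 for read, locality = 3) */
#define PREFETCH(p) __builtin_prefetch((p), 0, 3)
#endif

/* Hypothetical, untuned knob: how many rows of B ahead to prefetch. */
#define PREFETCH_DIST 8

/* C = A * B, all row-major; A is m x k, B is k x n, C is m x n.
 * The inner loop reads b[p*n + j], i.e. with a stride of n doubles,
 * which is the strided operand the prefetch is aimed at. */
void matmul_prefetch(size_t m, size_t n, size_t k,
                     const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p) {
                /* Ask for the element of B needed PREFETCH_DIST iterations
                 * from now, while computing with the current one. */
                if (p + PREFETCH_DIST < k)
                    PREFETCH(&b[(p + PREFETCH_DIST) * n + j]);
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}
```

A prefetch is only a hint and never changes the result, which is part of why placing it well is a tuning problem rather than a correctness problem. The usual alternative in GotoBLAS-style libraries is to copy (pack) panels of B into contiguous buffers so the hot loop becomes unit-stride.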