1. Create generalised (only type templates and maybe flags) BLAS algorithms (probably slow) with a CBLAS-like API.
See [1] (the Matmul benchmark): Julia Native is probably backed by Intel MKL or OpenBLAS. The D version was optimized by Martin Nowak [2] and is still _much_ slower.

2. Allow users to plug existing CBLAS libraries into the generalised BLAS.
I think a good interface is more important than the speed of the default implementation (at least for, e.g., large matrix multiplication). Just use existing code for speed...
Goto's papers about his BLAS: [3][4]
Having something competitive in D would be great but probably a lot of work. Without a good D interface, dstep + the OpenBLAS/ATLAS headers will not look that bad. Note I am not talking about small matrices/graphics.

3. Start to improve generalised BLAS with SIMD instructions.
Nice, but not really important. A good interface to an existing high-quality BLAS seems more important to me than a fast D linear algebra implementation with a CBLAS-like interface.

4. And then continue the discussion about the types of matrices we want...

+1

2. Then write similar code with a hardware-optimized BLAS and benchmark where the overhead between pure C/LLVM and BLAS calls balances out.
Maybe there are more important / beneficial things to work on - assuming the total time of contributors is fixed and used for other D stuff :)

[1] https://github.com/kostya/benchmarks
[2] https://github.com/kostya/benchmarks/pull/6
[3] http://www.cs.utexas.edu/users/flame/pubs/GotoTOMS2.pdf
[4] http://www.cs.utexas.edu/users/pingali/CS378/2008sp/papers/gotoPaper.pdf
