On Sunday, 14 June 2015 at 18:49:21 UTC, Ilya Yaroshenko wrote:
Yes, but it would be hard to create a SIMD-optimised version.
Then again, clang is getting better at this stuff.
What do you think about this chain of steps?
1. Create generalised BLAS algorithms (templated only on element
type and maybe flags, probably slow) with a CBLAS-like API.
2. Allow users to use existing CBLAS libraries inside
generalised BLAS.
3. Start to improve generalised BLAS with SIMD instructions.
4. And then continue the discussion about the types of matrices
we want...
Hmm… I don't know. In general I think the best thing to do is to
develop libraries alongside a concrete project and then turn them
into something more abstract.
If I had more time, I think I would assume that we can make LDC
produce whatever the next version of clang can do with
pragmas/GCC extensions, and use that assumption to build some
prototypes. So I would:
1. Prototype typical constructs in C, compile them with the next
version of llvm/clang (with e.g. 4x loop unrolling, trying
different optimization/vectorization options), then look at the
output in LLVM IR and assembly.
2. Then write similar code against a hardware-optimized BLAS and
benchmark at what sizes the overhead of pure C/LLVM versus BLAS
calls balances out to even.
Then you have a rough idea of what the limitations of the current
infrastructure look like, and can start modelling the template
types in D?
I'm not sure that you should use SIMD directly, but you should
align the memory for it. For instance, on iOS you end up using
LLVM subsets because of the new bitcode requirements. Ditto for
PNaCl.
Just a thought, but that's what I would do.