On Sunday, 14 June 2015 at 18:49:21 UTC, Ilya Yaroshenko wrote:
Yes, but it would be hard to create SIMD optimised version.

Then again, clang is getting better at this stuff.

What do you think about this chain of steps?

1. Create generalised (only type template and maybe flags) BLAS algorithms (probably slow) with a CBLAS-like API.
2. Allow users to use existing CBLAS libraries inside generalised BLAS.
3. Start to improve generalised BLAS with SIMD instructions.
4. And then continue the discussion about the types of matrices we want...
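To make step 1 concrete, a generalised, CBLAS-like entry point could start out as a plain reference kernel. This is only a sketch: the `gblas_dgemm` name, the row-major layout, and the reduced argument list are my own illustration, not an existing API — a real CBLAS front end would also take layout/transpose flags and leading dimensions, and step 2 would dispatch to an existing CBLAS implementation here instead of the naive loop.

```c
#include <stddef.h>

/* Hypothetical generalised GEMM with a CBLAS-like shape:
 *   C = alpha * A * B + beta * C
 * A is m x k, B is k x n, C is m x n, all row-major.
 * Naive reference version ("probably slow"); an existing CBLAS
 * library could be called here instead (step 2 of the plan). */
static void gblas_dgemm(size_t m, size_t n, size_t k,
                        double alpha, const double *A,
                        const double *B, double beta, double *C)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = alpha * acc + beta * C[i * n + j];
        }
}
```

The point of starting this way is that the generic version defines the semantics; SIMD work (step 3) then only has to match it.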

Hmm… I don't know. In general I think the best approach is to develop libraries alongside a concrete project and then turn them into something more abstract.

If I had more time, I would assume that LDC can be made to produce whatever the next version of clang can do with pragmas/GCC extensions, and use that assumption to build some prototypes. So I would:

1. Prototype typical constructs in C, compile them with the next version of llvm/clang (with e.g. 4x loop unrolling, trying different optimization/vectorization options), then look at the output as LLVM IR and assembly.
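A "typical construct" in this sense can be as small as a daxpy loop. The flags in the comment are real clang options for inspecting what the vectorizer did; the function itself is just an illustrative kernel:

```c
#include <stddef.h>

/* A typical construct to prototype: daxpy, i.e. y += a*x.
 * To see what the optimizer makes of it:
 *   clang -O3 -funroll-loops -S -emit-llvm daxpy.c   -> LLVM IR
 *   clang -O3 -funroll-loops -S daxpy.c              -> assembly
 *   clang -O3 -Rpass=loop-vectorize -c daxpy.c       -> vectorizer remarks
 * `restrict` promises the compiler the two arrays don't alias,
 * which is usually what unlocks auto-vectorization here. */
void daxpy(size_t n, double a, const double *restrict x,
           double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Comparing the IR with and without `restrict` (and with different unroll factors) is a cheap way to learn what the backend will and won't do for you.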

2. Then write similar code against a hardware-optimized BLAS and benchmark to find where pure C/LLVM and BLAS calls break even, i.e. at what matrix sizes the overhead of the BLAS call is amortized.
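A benchmark harness for that comparison might look like the sketch below. It is self-contained on the pure-C side only: the BLAS side is just marked in a comment, since calling `cblas_dgemm` requires linking an actual CBLAS library, and the `naive_dgemm`/`bench_seconds` names are mine:

```c
#include <stdlib.h>
#include <time.h>

/* Pure-C side of the comparison; the BLAS side would call
 * cblas_dgemm from whatever CBLAS library is linked in. */
static void naive_dgemm(int n, const double *A, const double *B,
                        double *C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int p = 0; p < n; ++p)
                acc += A[i * n + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
}

/* Wall-clock-ish timing of one multiply, in seconds. Running this
 * for n = 4, 8, 16, ... against the BLAS call shows where the
 * fixed call overhead stops dominating. */
static double bench_seconds(int n, const double *A, const double *B,
                            double *C)
{
    clock_t t0 = clock();
    naive_dgemm(n, A, B, C);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

For small n the naive loop typically wins (no call/dispatch overhead); the interesting output is the crossover size, which is exactly the "balance out" point mentioned above.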

Then you have a rough idea of what the limitations of the current infrastructure look like, and can start modelling the template types in D?

I'm not sure you should use SIMD intrinsics directly, but you should align the memory for it. For instance, on iOS you end up restricted to an LLVM subset because of the new bitcode requirements. Ditto for PNaCl.
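"Align the memory, don't hand-write the SIMD" can be done portably in C11 — a sketch, with `alloc_simd` being an illustrative name of mine, not a standard function:

```c
#include <stdlib.h>

/* Allocate an n-element double buffer aligned for SIMD loads/stores,
 * leaving the vectorization itself to the compiler. C11 aligned_alloc
 * requires the size to be a multiple of the alignment, so round up.
 * 32 bytes covers AVX; posix_memalign is the pre-C11 alternative. */
static double *alloc_simd(size_t n)
{
    const size_t align = 32;
    size_t bytes = n * sizeof(double);
    bytes = (bytes + align - 1) / align * align;  /* round up */
    return aligned_alloc(align, bytes);
}
```

The caller frees the buffer with plain `free`. This keeps the code within what bitcode-only targets accept, while still giving the auto-vectorizer aligned data to work with.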

Just a thought, but that's what I would do.
