On Sunday, 27 December 2015 at 10:28:53 UTC, Russel Winder wrote:
On Sat, 2015-12-26 at 19:57 +0000, Ilya Yaroshenko via
Digitalmars-d-announce wrote:
Hi,
I will write GEMM and GEMV families of BLAS for Phobos.
Goals:
- code without assembler
- code based on SIMD instructions
- DMD/LDC/GDC support
- kernel-based architecture like OpenBLAS
- 85-100% of the FLOPS of OpenBLAS (taken as 100%)
- tiny generic code compared with OpenBLAS
- ability to define user kernels
- allocator support: GEMM requires small internal
allocations.
- @nogc nothrow pure template functions (depends on
allocator)
- optional multithreading
- ability to work with `Slice` multidimensional arrays when the
stride between elements of a vector is greater than 1. In common
BLAS the matrix stride along rows or along columns always equals 1.
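To illustrate the last goal, here is a minimal sketch in C (not D, and not std.blas code) of a GEMV that takes independent row and column strides, so it can operate on a view that skips elements in both dimensions. The name `gemv_strided` and its parameter list are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper, not part of any existing BLAS or of std.blas.
   Computes y = alpha * A * x + beta * y for an m-by-n matrix A whose
   element (i, j) is stored at a[i * row_stride + j * col_stride].
   Classic BLAS would force one of the two matrix strides to be 1. */
static void gemv_strided(size_t m, size_t n, double alpha,
                         const double *a,
                         ptrdiff_t row_stride, ptrdiff_t col_stride,
                         const double *x, ptrdiff_t incx,
                         double beta, double *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < m; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < n; ++j)
            acc += a[(ptrdiff_t)i * row_stride + (ptrdiff_t)j * col_stride]
                 * x[(ptrdiff_t)j * incx];
        y[(ptrdiff_t)i * incy] = alpha * acc + beta * y[(ptrdiff_t)i * incy];
    }
}
```

For example, with a 4x4 row-major buffer, `row_stride = 8` and `col_stride = 2` select every other row and every other column without copying.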
Shouldn't the goal of a project like this be to be something
that OpenBLAS isn't? Given D's ability to call C and C++ code,
it is not clear to me that simply rewriting OpenBLAS in D has
any goal for the D or BLAS communities per se. Doesn't stop it
being a fun activity for the programmer, obviously, but unless
there is something that isn't in OpenBLAS, I cannot see this
ever being competition and so building a community around the
project.
It depends on what you mean by "something like this". OpenBLAS
is a _huge_ amount of assembler code. For _each_ platform, _each_
CPU generation, and _each_ floating point / complex type it has
a kernel or a few kernels. It is 30 MB of assembler code.
Not only can D code call C/C++, but C/C++ (and so any other
language) can also call D code. So std.blas may be used in C/C++
projects like Julia.
Now if the threads/OpenCL/CUDA was front and centre so that a
goal was to be Nx faster than OpenBLAS, that could be a goal
worth standing behind.
That can be a goal for a standalone project. But a standard
library should be portable to any platform without significant
problems (especially without problems caused by matrix
multiplication). So my goal is a tiny and portable project like
ATLAS, but fast like OpenBLAS. BTW, threads in std.blas would be
optional, like in OpenBLAS. Furthermore, std.blas will allow users
to write their own kernels.
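As a rough idea of what "user kernels" could mean, here is a small C sketch, assuming a driver that walks over fixed-size tiles of C and hands each one to a replaceable kernel function. The names `gemm_kernel`, `kernel_generic`, `gemm_tiled`, and the 2x2 tile size are invented for illustration and are not the std.blas design. A real implementation would also pack panels and handle edge tiles.

```c
#include <stddef.h>

enum { MR = 2, NR = 2 };  /* tile size handled by one kernel call (illustrative) */

/* A kernel updates one MR-by-NR tile: C_tile += A_rows * B_cols.
   All matrices are row-major; lda/ldb/ldc are the row lengths. */
typedef void (*gemm_kernel)(size_t k,
                            const double *a, size_t lda,
                            const double *b, size_t ldb,
                            double *c, size_t ldc);

/* Portable reference kernel; a user could pass a SIMD version instead. */
static void kernel_generic(size_t k,
                           const double *a, size_t lda,
                           const double *b, size_t ldb,
                           double *c, size_t ldc)
{
    for (size_t i = 0; i < MR; ++i)
        for (size_t j = 0; j < NR; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i * lda + p] * b[p * ldb + j];
            c[i * ldc + j] += acc;
        }
}

/* Driver: C += A * B for row-major A (m x k), B (k x n), C (m x n).
   Requires m to be a multiple of MR and n a multiple of NR.
   Returns 0 on success, -1 if the sizes do not fit the tiling. */
static int gemm_tiled(size_t m, size_t n, size_t k,
                      const double *a, const double *b, double *c,
                      gemm_kernel kernel)
{
    if (m % MR != 0 || n % NR != 0)
        return -1;
    for (size_t i = 0; i < m; i += MR)
        for (size_t j = 0; j < n; j += NR)
            kernel(k, a + i * k, k, b + j, n, c + i * n + j, n);
    return 0;
}
```

The driver stays generic and small, while all platform-specific tuning lives behind the kernel pointer.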
Not to mention full N-dimension vectors so that D could
seriously compete against Numpy in the Python world.
I am not sure how D can compete against Numpy in the Python
world, but it can compete with Python in the world of programming
languages. BTW, N-dimensional ranges/arrays/vectors are already
implemented for Phobos:
PR:
https://github.com/D-Programming-Language/phobos/pull/3397
Updated Docs:
http://dtest.thecybershadow.net/artifact/website-76234ca0eab431527327d5ce1ec0ad74c6421533-fedfc857090c1c873b17e7a1e4cf853c/web/phobos-prerelease/std_experimental_ndslice.html
Please participate in the voting (the time limit has been extended) :-)
http://forum.dlang.org/thread/nexiojzouxtawdwnl...@forum.dlang.org
Ilya