On Sat, 11 Jul 2009 06:14:56 -0400, Lutger <lutger.blijdest...@gmail.com>
wrote:
"Jérôme M. Berger" wrote:
(...)
BLADE has already shown that it is possible to do stuff like this in a
library, but I think it goes without saying that if it was built into
the language the syntax could be made considerably nicer. Compare:
auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx);
auto m = a*A*B + b*C;
If D could do this, I think it would become the next FORTRAN. :)
-Lars
Actually, this has already been done in C++:
http://flens.sourceforge.net/ It should be possible to port it to D...
Jerome
Someone correct me if I'm wrong, but I think what Blade does is a bit more
advanced than FLENS. Blade performs optimizations at the AST level and
generates (near-)optimal assembly at compile time. I couldn't find info on
what FLENS does exactly beyond inlining through expression templates, but
from the looks of it, it doesn't do any of the AST-level optimizations
Blade does. Anyone care to provide more info? Can Blade also generate
better asm than is possible with libraries such as FLENS?
FLENS (and several other libraries like it) just provides syntactic sugar
for BLAS, via expression templates that expand to BLAS calls. So something
like a = b + c + d gets transformed into (if you're lucky) a = b + c;
a = a + d; and (if you're unlucky) temp = b + c; a = temp + d; Either way
you end up looping through memory multiple times.

The next best option is expression templates (Blitz or Boost come to
mind), which encode the arguments of each operation into a struct. This
results in a lot of temporary expression objects and is a major performance
hit for small-ish vectors, but you loop through memory only once and make
no allocations, which is a win on larger vectors.

Then you have BLADE and D array ops, which don't create any temporaries
and are faster still. The counter-argument is that a BLAS library can be
tuned for each specific CPU or GPU, with the fastest implementation
selected at runtime.
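To make the contrast concrete, here is a minimal C++ sketch of both styles
(toy code of my own, not actual FLENS or Blitz source): Naive is roughly
what a thin BLAS-style wrapper amounts to — each + materializes a full
temporary — while Vec/Sum is a bare-bones expression template where + only
records its operands and the single fused loop runs at assignment time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Style 1: naive operator overloading. Each '+' builds a full temporary
// vector, so a = b + c + d makes two passes over memory and allocates.
struct Naive {
    std::vector<double> data;
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }
};

Naive operator+(const Naive& x, const Naive& y) {
    Naive t{std::vector<double>(x.size())};
    for (std::size_t i = 0; i < x.size(); ++i) t.data[i] = x[i] + y[i];
    return t;  // a full temporary, looped over again by the next '+'
}

// Style 2: expression templates. '+' only records its operands in a tiny
// struct; evaluation is deferred until assignment.
template <typename L, typename R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning any expression runs one fused loop, no temporaries.
    template <typename E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

Sum<Vec, Vec> operator+(const Vec& x, const Vec& y) { return {x, y}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& x, const Vec& y) {
    return {x, y};
}
```

With the expression-template version, a = b + c + d compiles down to a
single loop computing b[i] + c[i] + d[i] directly, while the naive version
builds an intermediate vector and reads it back — which is exactly the
multiple-passes-over-memory problem described above.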