On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using
mir.ndslice.algorithm. The vxorps instruction can be used for fabs, and
vsqrtps for sqrt. LDC's @fastmath allows the summation elements to be
re-associated.

Depending on the data cache level, this allows speeding up the iteration
8 times for single-precision floating-point numbers with AVX (16 times for
AVX-512?).

Yah, 8 times is large enough to justify an important change.
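
If I understand correctly, the kernel under discussion is essentially the
following (a plain-D sketch of mine; the actual code would go through
mir.ndslice.algorithm's iteration primitives, and the function name is
made up):

    import std.math : fabs, sqrt;

    // Sum of sqrt(|a[i]|) over a slice.
    double sumSqrtAbs(const(double)[] a)
    {
        double s = 0;
        foreach (x; a)
            s += sqrt(fabs(x));
        return s;
    }

For that loop to come out as vector instructions (a sign-bit mask for fabs,
vsqrtps/vsqrtpd for sqrt), the compiler has to be able to inline fabs and
sqrt and to reassociate the additions, which is what the points below address.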

Current std.math has the following problems:

1. Math functions are not templates -> Phobos must be linked.

This is also the case for C++ - most math functions are linked from the C standard library. How do typical linear algebra libraries similar in functionality to Mir (such as Eigen) deal with this situation?

Also, one question: how does the existence of the unused functions impede faster functions provided separately? Is it a sticking point that std.math is the exact module used?

Trying to get a good grip on the matter. Generally you'd have a very easy time convincing me that templates are a better way to go :o). But we need to have a good motivation. Do you have a brief example illustrating one proposed template and how it is better than the old ways?
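
What I have in mind is something along these lines (purely hypothetical, not
necessarily what you are proposing; the body is illustrative only):

    import std.traits : isFloatingPoint;

    // A templated fabs is instantiated in the caller's object file, so it
    // can always be inlined and never requires linking against a prebuilt
    // Phobos binary.
    T fabs(T)(T x) @safe pure nothrow @nogc
        if (isFloatingPoint!T)
    {
        return x < 0 ? -x : x;   // a real implementation would clear the sign bit
    }

With that, sqrt(fabs(x)) instantiated for float or double stays entirely in
the caller's compilation unit and the optimizer can inline and vectorize it,
whereas the current non-template functions are opaque calls into libphobos.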

1.a I have firmly decided to move forward without DRuntime. Phobos as a
source library is partially OK, but there should be no linking dependencies.
BetterC mode is what is required for Mir to replace OpenBLAS and Eigen. New
cpuid, threads, and mutexes should be provided too. The new cpuid [1] is
already implemented (I just need to replace the module constructor with an
explicit initialization function).

Do you think you can integrate the new cpuid implementation with the existing interface (most likely greatly enhancing it) without breaking the existing clients?

Same question for threads.

Same question for mutexes.
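
For my own understanding, I take the module constructor -> explicit
initialization change to mean roughly the following (a sketch only; the names
are mine, not the actual mir.cpuid interface):

    // No module constructor, so nothing here needs DRuntime's module-init
    // machinery; works in betterC and when called from other languages.
    private __gshared bool _cpuidInitialized;

    extern(C) void mir_cpuid_init() nothrow @nogc
    {
        if (_cpuidInitialized)
            return;
        // ... execute CPUID, fill in cache sizes and feature flags ...
        _cpuidInitialized = true;
    }

If so, the questions above stand: can the existing druntime-facing interfaces
keep working on top of such explicit entry points?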

My strong opinion is that a D library that serves only D is the wrong
direction. A numeric D library should be a product for other languages too,
like many C libraries are. One of my clients is thinking of investing in
nothrow @nogc async I/O for production, so that may help the move in the
betterC direction too.

Sure. A different way to frame this is to make D friendlier toward linking with other languages. The way I see it, if we get alternatives for cpuid, threads, and mutexes in Mir, that would benefit clients interested in linear algebra. If we get them in druntime, that would benefit clients interested in linear algebra and everything else. Clearly the impact would be much larger.
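
To make this concrete, what I imagine a non-D client consuming is the usual
C-ABI surface (a made-up example; mir_scale is not a real function):

    // nothrow @nogc extern(C) entry point: callable from C, C++, Fortran,
    // or Python via a plain header/FFI declaration, with no knowledge of D
    // or DRuntime on the caller's side.
    extern(C) void mir_scale(double* p, size_t n, double alpha) nothrow @nogc
    {
        foreach (ref x; p[0 .. n])
            x *= alpha;
    }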

1.b In the context of 1.a, linking multiple binaries compiled with different
DRuntime/Phobos versions may cause significant problems. DRuntime is not as
stable as the C standard library. One may say that I am doing something wrong
if I need to link libraries compiled with different DRuntimes, but this is
what will happen often with D in the real world if D starts to replace C
libraries (1.a). So, betterC without DRuntime/Phobos linking dependencies is
the direction to move forward in. nothrow @nogc generic Phobos code seems to
be OK.

Hmmm... well I seem to recall the C std lib in gcc has large interoperability issues with its own previous versions, even across minor releases. This has caused numerous headaches at Facebook because the breakages always come without warning and manifest themselves in obscure ways. On the Microsoft side things are even worse, because they virtually guarantee that a version of VS is not binary compatible with the previous ones (I'm not kidding; it's deliberate).

That sets a rather low baseline for us :o). Clearly we'd want to do better, and we probably can. But I think it would be an exaggeration to worry too much about such scenarios.

2. Math functions are not templates -> they are not inlined -> no
vectorization, plus function calls in a loop body. One day this may be
fixed, but see 1.a and 1.b.

How do the likes of Eigen do it? Do they provide their own templated implementation of <math.h>?

Have you investigated the much-hailed link-time inlining?

3. Math functions are not aliases for LDC -> LDC's @fastmath would not work
for them. To enable @fastmath for these functions, they would themselves have
to be annotated with @fastmath, which is not acceptable. If a function is an
alias for an LLVM intrinsic, then the @fastmath flag can be applied to a
function that calls it.

Not sure I understand this, but it seems to me making the math functions templates would solve it?
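
If I read point 3 right, the mechanism is roughly the following (LDC-specific
sketch; I'm assuming the ldc.attributes/ldc.intrinsics names, which may differ
in detail):

    version (LDC)
    {
        import ldc.attributes : fastmath;
        import ldc.intrinsics : llvm_fabs, llvm_sqrt;

        // Aliases, not wrappers: no extra call stands between the caller
        // and the intrinsic.
        alias fabs = llvm_fabs;
        alias sqrt = llvm_sqrt;

        // @fastmath on the caller applies to the floating-point operations
        // in its body, including reassociation of the sum.
        @fastmath double sumSqrtAbs(const(double)[] a)
        {
            double s = 0;
            foreach (x; a)
                s += sqrt(fabs(x));
            return s;
        }
    }

That works because the intrinsic calls are inlined into the caller and pick up
its fast-math flags, which an ordinary pre-compiled std.math function cannot do.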


Thanks,

Andrei
