On Friday, 7 October 2016 at 01:53:27 UTC, Andrei Alexandrescu wrote:
On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases of LLVM intrinsics (for LDC).

Why?

To enable vectorization for mir.ndslice.algorithm I created an internal math module [1] in Mir. But this is awkward, because third-party packages like DCV [2] are required to use the module too. Also, some optimisations for std.complex and the future std.experimental.color would be very ugly without the proposed change.

I'd love to understand this point better. In particular, how do you reconcile it with kinke's assertion that some of these intrinsics simply forward to C routines?

Our high-level view is that doing efficient work should not require one to fork the standard library. On the other hand, the traditional place for compiler-specific code is in the core runtime, not the standard library. (There is a tiny bit of stdlib code that depends on dmd to be fair.)

So I'd like to be reasonably confident the right rocks are put in the right places. Have you considered (per Iain) migrating these symbols to core.math and then forwarding those in stdlib to them?


Thanks,

Andrei

For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm.
The vandps instruction (masking off the sign bit) can be used for fabs.
The vsqrtps instruction can be used for sqrt.
LDC's @fastmath allows the summation elements to be re-associated.

Depending on which cache level the data is in, this can speed up iteration by up to 8 times for single-precision floating point numbers with AVX (16 times for AVX512?).

Furthermore, at least for x86, the @fastmath flag does not break any math logic. It only allows elements to be re-associated (I mean exactly this example on x86).
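
To make the example above concrete, here is a minimal sketch of that summation kernel. It assumes LDC's ldc.attributes.fastmath attribute and the llvm_sqrt / llvm_fabs templates from ldc.intrinsics. The function name sumSqrtAbs and the fallback branch for other compilers are illustrative only; this is not code taken from Mir or Phobos, and whether the loop is actually vectorized depends on the compiler version and flags.

```d
version (LDC)
{
    import ldc.attributes : fastmath;
    import ldc.intrinsics : llvm_fabs, llvm_sqrt;

    // Plain aliases: a call to sqrt/fabs is a call to the LLVM intrinsic
    // itself, so the caller's @fastmath applies to it.
    alias fabs = llvm_fabs;
    alias sqrt = llvm_sqrt;
}
else
{
    // Assumed fallback for other compilers: ordinary std.math functions
    // and a dummy attribute so that @fastmath below still compiles.
    import std.math : fabs, sqrt;

    struct fastmath {}
}

// Hypothetical kernel: sum over i of sqrt(fabs(a[i])).
// With LDC, @fastmath permits re-associating the additions.
@fastmath float sumSqrtAbs(const(float)[] a)
{
    float s = 0;
    foreach (x; a)
        s += sqrt(fabs(x));
    return s;
}
```

The point of the alias form is that no wrapper function sits between the loop body and the intrinsic, so nothing needs to be inlined first and no annotation is needed on the math functions themselves.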

The current std.math has the following problems:

1. Math functions are not templates -> Phobos must be linked.
1.a I have firmly decided to move forward without DRuntime. Phobos as a source-only library is partially OK, but there should be no linking dependencies. BetterC mode is what Mir requires in order to replace OpenBLAS and Eigen. A new cpuid, threads, and mutexes should be provided too. The new cpuid [1] is already implemented (I just need to replace the module constructor with an explicit initialization function). My strong opinion is that a D library only for D is the wrong direction. A numeric D library should be a product for other languages too, as many C libraries are. One of my clients is thinking of investing in nothrow @nogc async I/O for production, so that may help the move in the betterC direction too.
1.b In the context of 1.a, linking multiple binaries compiled with different DRuntime/Phobos versions may cause significant problems. DRuntime is not as stable as the standard C library. One may say that I am doing something wrong if I need to link libraries compiled with different DRuntimes. But this is what will often happen with D in the real world if D starts to replace C libraries (1.a). So, betterC without DRuntime / Phobos linking dependencies is the direction to move in. nothrow @nogc generic Phobos code seems to be OK.

2. Math functions are not templates -> They are not inlined -> No vectorization + function calls in a loop body. One day this may be fixed, but see 1.a and 1.b.

3. Math functions are not aliases for LDC -> LDC's @fastmath would not work for them. To enable @fastmath for these functions they would have to be annotated with @fastmath themselves, which is not acceptable. If a function is an alias for an LLVM intrinsic, then the @fastmath flag can be applied to the function that calls it.

[1] https://github.com/libmir/cpuid

Best regards,
Ilya
