Re: std.math API rework
On Friday, 7 October 2016 at 17:02:02 UTC, Andrei Alexandrescu wrote:
> On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
>> For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm. The vxorps instruction can be used for fabs, and the vsqrtps instruction for sqrt. LDC's @fastmath allows summation elements to be re-associated. Depending on the data cache level, this can speed up iteration 8 times for single-precision floating point numbers with AVX (16 times with AVX512?).
>
> Yah, 8 times is large enough to justify an important change.
>
>> Current std.math has the following problems:
>> 1. Math functions are not templates -> Phobos should be linked.
>
> This is also the case for C++ - most math functions are linked from the C standard library. How do typical linear algebra libraries similar in functionality with Mir (such as Eigen) deal with this situation?

1) A BLAS-like API requires only sqrt and fabs. The solutions used in Eigen depend on the compiler. For example, the following code can be found:

```c++
template<> EIGEN_DEVICE_FUNC inline float4 pabs(const float4& a) {
  return make_float4(fabsf(a.x), fabsf(a.y), fabsf(a.z), fabsf(a.w));
}
template<> EIGEN_DEVICE_FUNC inline double2 pabs(const double2& a) {
  return make_double2(fabs(a.x), fabs(a.y));
}
```

2) Eigen, uBLAS and others use Expression Templates [1], which compose a few multiplications, additions/subtractions and perhaps some per-element operations on matrices and vectors. At the same time, I have never seen a case where a lambda can be passed. C/C++ high-performance libraries use macros/templates for type specification, but lambdas are not used. This makes the upcoming ndslice.algorithm a unique solution, which is more flexible, fast, and universal compared with C++ Expression Templates. It still requires some rework, and an LDC based on DMD 2.072 for further optimization.

> Also, one question is how does the existence of unused functions impede the working of faster functions provided separately?
> Is it a sticky point that std.math is the exact module used?

Of course a separate module or dub package can be provided instead. In addition, std.math should be split into a package and reworked. So, instead of modifying std.math we can start a new math package.

> Trying to get a good grip on the matter. Generally you'd have a very easy time convincing me that templates are a better way to go :o). But we need to have a good motivation. Do you have a brief example illustrating one proposed template and how it is better than the old ways?

Yes, the example can be found at [2]. The first template is better for BetterC mode. The example contains a C program. The last paragraph in this post contains the second part about this example. The first part:

```c
#include <stdio.h>  /* puts, printf */
#include <stdlib.h> /* atof */

/* Implemented in the betterC library (alg/libmir-alg.a). */
float mir_alg_bar(float, float, float);

int main(int argc, char const *argv[])
{
    if (argc < 4)
    {
        puts("Usage: app number_a number_b number_c");
        return 1;
    }
    float a = atof(argv[1]);
    float b = atof(argv[2]);
    float c = atof(argv[3]);
    float d = mir_alg_bar(a, b, c);
    printf("%f\n", d);
    return 0;
}
```

This program should be linked with the BetterC library:

```sh
clang app.c alg/libmir-alg.a
```

`mir-alg` is a small betterC library, which uses a generic `mir` dummy (not the normal Mir, to keep the example simple). It can be linked as a common C library and has an extern(C) nothrow @nogc interface.

```d
module alg_bar;

pragma(LDC_no_moduleinfo);

import ldc.attributes : fastmath;
import mir.alg;

// Everything below is exported with a C interface.
extern(C) nothrow @nogc @fastmath:

float mir_alg_bar(float a, float b, float c)
{
    return alg1!bar(a, b, c);
}
```

The Mir dummy contains 3 implementations: `alg1`, `alg2`, `alg3`.
```d
module mir.alg;

import ldc.intrinsics : llvm_fabs;
import ldc.attributes : fastmath;

pragma(LDC_no_moduleinfo);

@fastmath
{
    // Calls the LLVM intrinsic directly.
    auto alg1(alias f)(float a, float b, float c)
    {
        return f(a, llvm_fabs(b), c);
    }

    // Calls the non-template `fabs` declared in this module.
    auto alg2(alias f)(float a, float b, float c)
    {
        return f(a, fabs(b), c);
    }

    // Calls the non-template `fabs` from Phobos.
    auto alg3(alias f)(float a, float b, float c)
    {
        import std.math;
        return f(a, std.math.fabs(b), c);
    }
}

@fastmath auto bar()(float a, float b, float c)
{
    return a * b + c;
}

float fabs(float x) @safe pure nothrow @nogc
{
    return llvm_fabs(x);
}
```

The `fabs` function declaration is the same as in LDC's Phobos fork. `alg1` can be linked with a C library in any optimization mode. `alg2` and `alg3` use function declarations and require linking the `libmir` dummy or `libphobos2` respectively. Making `fabs` a template solves this problem. LDC can inline `fabs` for `alg2` and `alg3`, but the `O2` flag is required.

>> 1.a I strongly decided to move forward without DRuntime. A phobos as source library is partially OK, but no linking dependencies should be. BetterC mode is what
Re: std.math API rework
On Friday, 7 October 2016 at 17:02:02 UTC, Andrei Alexandrescu wrote:
> On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote:
>> 2. Math functions are not templates -> They are not inlined -> No vectorization + function calls in a loop body. One day this may be fixed, but (1.a, 1.b).

That trivial non-template functions are not cross-module inlined by LDC is something I am working on (use `-enable-cross-module-inlining` with 1.1.0). I wouldn't use it as an argument for significant changes.

> How do the likes of Eigen do it? Do they provide their own templated implementation of ? Have you investigated the much hailed link-time inlining?

Also a work-in-progress. It would at the very least require a special build of Phobos, something we don't do yet.
Re: std.math API rework
On 10/07/2016 03:42 AM, Ilya Yaroshenko wrote: For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm. vxorps instruction can be used for fabs. vsqrtps instruction can be used for sqrt. LDC's @fastmath allows to re-associate summation elements. Depend on data cache level this allows to speed up iteration 8 times for single precision floating point number for AVX (16 times for AVX512?). Yah, 8 times is large enough to justify an important change. Current std.math has following problems: 1. Math funcitons are not templates -> Phobos should be linked. This is also the case for C++ - most math functions are linked from the C standard library. How do typical linear algebra libraries similar in functionality with Mir (such as Eigen) deal with this situation? Also, one question is how does the existence of unused functions impede the working of faster functions provided separately? Is it a sticky point that std.math is he exact module used? Trying to get a good grip on the matter. Generally you'd have a very easy time convincing me that templates are a better way to go :o). But we need to have a good motivation. Do you have a brief example illustrating one proposed template and how it is better than the old ways? 1.a I strongly decided to move forward without DRuntime. A phobos as source library is partially OK, but no linking dependencies should be. BetterC mode is what required for Mir to replace OpenBLAS and Eigen. New cpuid, threads and mutexes should be provided too. New cpuid [1] is already implemented (I just need to replace module constructor with explicit initialization function). Do you think you can integrate the new cpuid implementation with the existing interface (most likely greatly enhancing it) without breaking the existing clients? Same question for threads. Same question for mutexes. My strong opinion is that a D library for D is a wrong direction. 
A numeric D library should be a product for other languages too, like many C libraries does. One my client is thinking to invest to nothrow @nogc async I/O for production, so it may help to move to betterC direction too. Sure. A different way to frame this is to make D friendlier toward linking with other languages. The way I see it, if we get alternatives for cpuid, threads, and mutexes in Mir, that would benefit clients interested in linear algebra. If we get them in druntime, that would benefit clients interested in linear algebra and everything else. Clearly the impact would be much larger. 2.b In context of 1.a, linking multiple binaries compiled with different DRuntime/Phobos versions may cause significant problems. DRuntime is not so stable like std C lib. One may say that I am doing something wrong if I need to link libraries compiled with different DRuntimes. But this is what will happen often with D in real world if D start to replace C libraries (1.a). So, betterC without DRuntime / Phobos linking dependencies is a direction to move forward. nothrow @nogc generic Phobos code seems to be OK. Hmmm... well I seem to recall the C std lib in gcc has large interoperability issues with its own previous versions, even across minor releases. This has caused numerous headaches at Facebook because the breakages always come without warning and manifest themselves in obscure ways. On the Microsoft side things are even worse, because they virtually guarantee that a version of VS is not binary compatible with the previous ones (I'm not kidding; it's deliberate). That sets a rather low baseline for us :o). Clearly we'd want to do better, and we probably can. But I think it would be an exaggeration to worry too much about such scenarios. 2. Math funcitons are not templates -> They are not inlined -> No vectorization + function calls in a loop body. One day this may be fixed, but (1.a, 1.b). How to the likes of Eigen do it? 
Do they provide their own templated implementation of ? Have you investigated the much hailed link-time inlining?

> 3. Math functions are not aliases for LDC -> LDC's @fastmath would not work for them. To enable @fastmath for these functions they would have to be annotated with @fastmath, which is not acceptable. If a function is an alias for an LLVM intrinsic, then the @fastmath flag can be applied to the function that calls it.

Not sure I understand this, but it seems to me making the math functions templates would solve it?

Thanks,
Andrei
Re: std.math API rework
On Friday, 7 October 2016 at 01:53:27 UTC, Andrei Alexandrescu wrote: On 10/6/16 12:53 PM, Ilya Yaroshenko wrote: Effective work with std.experimental.ndslice and and mir.ndslice.array requires half of std.math be an exactly aliases to LLVM intrinsics (for LDC). Why? To enable vectorization for mir.ndslice.algorithm I created internal math module [1] in Mir. But this is weird, because third side packages like DCV [2] requires to use the module too. Also, some optimisation for std.complex and future std.exprimental.color would be very ugly without proposed change. I'd love to understand this point better. In particular, how do you reconcile it with kinke's assertion that some of these intrinsics simply format to C routines? Our high-level view is that doing efficient work should not require one to fork the standard library. On the other hand, the traditional place for compiler-specific code is in the core runtime, not the standard library. (There is a tiny bit of stdlib code that depends on dmd to be fair.) So I'd like to be reasonably confident the right rocks are put in the right places. Have you considered (per Iain) migrating these symbols to core.math and then forward those in stdlib to them? Thanks, Andrei For example, SUM_i of sqrt(fabs(a[i])) can be vectorised using mir.ndslice.algorithm. vxorps instruction can be used for fabs. vsqrtps instruction can be used for sqrt. LDC's @fastmath allows to re-associate summation elements. Depend on data cache level this allows to speed up iteration 8 times for single precision floating point number for AVX (16 times for AVX512?). Furthermore, at least for x86, @fastmath flag does not break any math logic. It allows only to re-associate elementes (i mean exactly this example for x86). Current std.math has following problems: 1. Math funcitons are not templates -> Phobos should be linked. 1.a I strongly decided to move forward without DRuntime. 
A phobos as source library is partially OK, but no linking dependencies should be. BetterC mode is what required for Mir to replace OpenBLAS and Eigen. New cpuid, threads and mutexes should be provided too. New cpuid [1] is already implemented (I just need to replace module constructor with explicit initialization function). My strong opinion is that a D library for D is a wrong direction. A numeric D library should be a product for other languages too, like many C libraries does. One my client is thinking to invest to nothrow @nogc async I/O for production, so it may help to move to betterC direction too. 2.b In context of 1.a, linking multiple binaries compiled with different DRuntime/Phobos versions may cause significant problems. DRuntime is not so stable like std C lib. One may say that I am doing something wrong if I need to link libraries compiled with different DRuntimes. But this is what will happen often with D in real world if D start to replace C libraries (1.a). So, betterC without DRuntime / Phobos linking dependencies is a direction to move forward. nothrow @nogc generic Phobos code seems to be OK. 2. Math funcitons are not templates -> They are not inlined -> No vectorization + function calls in a loop body. One day this may be fixed, but (1.a, 1.b). 3. Math funcitons are not aliases for LDC -> LDC's @fastmath would not work for them. To enable @fastmath for this functions they should be annotated with @fastmath, which is not acceptable. If a function is an alias for llvm intrinsics, than @fastmath flag can be applied to a function, which calls it. [1] https://github.com/libmir/cpuid Best regards, Ilya
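As a small illustration of what "re-associate" means for the result, the following C sketch evaluates the same three-term sum in two groupings. The function names are hypothetical and the inputs are chosen to exaggerate the effect; in a typical summation the difference is confined to the last few bits.

```c
/* Left-to-right grouping, as written in the source. */
float sum_left(float a, float b, float c)
{
    return (a + b) + c;
}

/* Re-associated grouping, which a compiler may pick only when
 * fast-math style re-association is allowed. */
float sum_right(float a, float b, float c)
{
    return a + (b + c);
}
```

With a = 1e20f, b = -1e20f and c = 1.0f, `sum_left` returns 1.0f, while `sum_right` returns 0.0f because 1.0f is absorbed when it is added to -1e20f. No NaN or infinity handling changes here; only the rounding of intermediate sums does.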
Re: std.math API rework
On 10/6/16 12:53 PM, Ilya Yaroshenko wrote:
> Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases to LLVM intrinsics (for LDC).

Why?

> To enable vectorization for mir.ndslice.algorithm I created an internal math module [1] in Mir. But this is weird, because third-party packages like DCV [2] are required to use the module too. Also, some optimisations for std.complex and the future std.experimental.color would be very ugly without the proposed change.

I'd love to understand this point better. In particular, how do you reconcile it with kinke's assertion that some of these intrinsics simply forward to C routines?

Our high-level view is that doing efficient work should not require one to fork the standard library. On the other hand, the traditional place for compiler-specific code is in the core runtime, not the standard library. (There is a tiny bit of stdlib code that depends on dmd, to be fair.) So I'd like to be reasonably confident the right rocks are put in the right places.

Have you considered (per Iain) migrating these symbols to core.math and then forwarding those in stdlib to them?

Thanks,
Andrei
Re: std.math API rework
On Thursday, 6 October 2016 at 20:55:55 UTC, Ilya Yaroshenko wrote:
> So, I don't see a reason why this change would break something, hehe

No, Iain is right. These LLVM intrinsics are most often simple forwarders to the C runtime functions; I was rather negatively surprised to find that out a while ago.
Re: std.math API rework
On 6 October 2016 at 22:55, Ilya Yaroshenko via Digitalmars-d wrote: > On Thursday, 6 October 2016 at 20:45:24 UTC, Iain Buclaw wrote: >> >> On 6 October 2016 at 22:31, Ilya Yaroshenko via Digitalmars-d >> wrote: >>> >>> On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote: On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d wrote: > > > [...] If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittest (gcc math intrinsics aren't guaranteed to due to the vagary of libm implementations and quirky cpu support that trades correctness for efficiency). [...] >>> >>> >>> >>> LLVM math functions are pure :P http://llvm.org/docs/LangRef.html >>> >> >> I picked a random example. >> >> http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic >> >> """ >> >> Semantics: >> >> This function returns the sine of the specified operand, returning the >> same values as the libm sin functions would, and handles error conditions in >> the same way. >> >> """ >> >> This would have me believe that they are infact not pure. ;-) >> >> But, I've never looked under the hood of LLVM, so I can only believe those >> who have. In any case, IMO, you should focus on getting this into >> core.math. That's where compiler intrinsics should go. The intrinsics of >> std.math are historical baggage and are probably due a deprecation - that >> is, in the sense that their symbols be converted into aliases. >> >> Iain. 
> > > Current code is (please look in LDC's fork): > > version(LDC) > { > real cos(real x) @safe pure nothrow @nogc { return llvm_cos(x); } > ///ditto > double cos(double x) @safe pure nothrow @nogc { return llvm_cos(x); } > ///ditto > float cos(float x) @safe pure nothrow @nogc { return llvm_cos(x); } > } > else > { > > real cos(real x) @safe pure nothrow @nogc { pragma(inline, true); return > core.math.cos(x); } > //FIXME > ///ditto > double cos(double x) @safe pure nothrow @nogc { return cos(cast(real)x); } > //FIXME > ///ditto > float cos(float x) @safe pure nothrow @nogc { return cos(cast(real)x); } > > } > > So, I don't see a reason why this change break something, hehe Well, sure, I could mark all gcc intrinsics as pure so you can use __builtin_print() or malloc() in pure code. Doesn't mean the compiler is honest in allowing it. ;-) Get this in core.math, there's no place for compiler-specific code in phobos. Iain.
Re: std.math API rework
On Thursday, 6 October 2016 at 20:45:24 UTC, Iain Buclaw wrote: On 6 October 2016 at 22:31, Ilya Yaroshenko via Digitalmars-d wrote: On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote: On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d wrote: [...] If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittest (gcc math intrinsics aren't guaranteed to due to the vagary of libm implementations and quirky cpu support that trades correctness for efficiency). [...] LLVM math functions are pure :P http://llvm.org/docs/LangRef.html I picked a random example. http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic """ Semantics: This function returns the sine of the specified operand, returning the same values as the libm sin functions would, and handles error conditions in the same way. """ This would have me believe that they are infact not pure. ;-) But, I've never looked under the hood of LLVM, so I can only believe those who have. In any case, IMO, you should focus on getting this into core.math. That's where compiler intrinsics should go. The intrinsics of std.math are historical baggage and are probably due a deprecation - that is, in the sense that their symbols be converted into aliases. Iain. Current code is (please look in LDC's fork): version(LDC) { real cos(real x) @safe pure nothrow @nogc { return llvm_cos(x); } ///ditto double cos(double x) @safe pure nothrow @nogc { return llvm_cos(x); } ///ditto float cos(float x) @safe pure nothrow @nogc { return llvm_cos(x); } } else { real cos(real x) @safe pure nothrow @nogc { pragma(inline, true); return core.math.cos(x); } //FIXME ///ditto double cos(double x) @safe pure nothrow @nogc { return cos(cast(real)x); } //FIXME ///ditto float cos(float x) @safe pure nothrow @nogc { return cos(cast(real)x); } } So, I don't see a reason why this change break something, hehe
Re: std.math API rework
On 6 October 2016 at 22:31, Ilya Yaroshenko via Digitalmars-d wrote: > On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote: >> >> On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d >> wrote: >>> >>> [...] >> >> >> If you can prove that llvm intrinsics are pure (gcc math intrinsics are >> not) and that llvm intrinsics pass the unittest (gcc math intrinsics aren't >> guaranteed to due to the vagary of libm implementations and quirky cpu >> support that trades correctness for efficiency). >> >> [...] > > > LLVM math functions are pure :P http://llvm.org/docs/LangRef.html > I picked a random example. http://llvm.org/docs/LangRef.html#llvm-sin-intrinsic """ Semantics: This function returns the sine of the specified operand, returning the same values as the libm sin functions would, and handles error conditions in the same way. """ This would have me believe that they are infact not pure. ;-) But, I've never looked under the hood of LLVM, so I can only believe those who have. In any case, IMO, you should focus on getting this into core.math. That's where compiler intrinsics should go. The intrinsics of std.math are historical baggage and are probably due a deprecation - that is, in the sense that their symbols be converted into aliases. Iain.
Re: std.math API rework
On Thursday, 6 October 2016 at 20:07:19 UTC, Iain Buclaw wrote:
> On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d wrote:
>> [...]
>
> If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittest (gcc math intrinsics aren't guaranteed to due to the vagary of libm implementations and quirky cpu support that trades correctness for efficiency).
>
> [...]

LLVM math functions are pure :P http://llvm.org/docs/LangRef.html

I can do a Phobos fork. But I hope I can fix it.
Re: std.math API rework
On 6 October 2016 at 18:53, Ilya Yaroshenko via Digitalmars-d wrote:
> Effective work with std.experimental.ndslice and mir.ndslice.array
> requires half of std.math to be exact aliases to LLVM intrinsics (for
> LDC).
>
> To enable vectorization for mir.ndslice.algorithm I created an internal math
> module [1] in Mir. But this is weird, because third-party packages like DCV
> [2] are required to use the module too. Also, some optimisations for std.complex
> and the future std.experimental.color would be very ugly without the proposed change.
>
> The proposed change is very simple:
> Each math function listed in [1] should be a template for DMD/GDC and an
> alias for LDC in std.math.
>
> If someone has strong arguments against it, please let me know now.
>
> [1] https://github.com/libmir/mir/blob/master/source/mir/internal/math.d
> [2] https://github.com/ljubobratovicrelja/dcv
>
> Best regards,
> Ilya

If you can prove that llvm intrinsics are pure (gcc math intrinsics are not) and that llvm intrinsics pass the unittest (gcc math intrinsics aren't guaranteed to due to the vagary of libm implementations and quirky cpu support that trades correctness for efficiency).

I have a reasonable belief that the answer is no on both parts. Even if some llvm intrinsics lower to native instructions on x86, most other platforms will just forward them to an impure libm with a mixed bag of long double support. :-)

If you need it specialized, do it yourself. Phobos seems more of a place for generalized application support, from what I gather, and how I approach it.

Iain.
Re: std.math API rework
On Thursday, 6 October 2016 at 16:53:54 UTC, Ilya Yaroshenko wrote: Effective work with std.experimental.ndslice and and mir.ndslice.array requires half of std.math be an exactly EDIT: mir.ndslice.algorithm
std.math API rework
Effective work with std.experimental.ndslice and mir.ndslice.array requires half of std.math to be exact aliases to LLVM intrinsics (for LDC).

To enable vectorization for mir.ndslice.algorithm I created an internal math module [1] in Mir. But this is weird, because third-party packages like DCV [2] are required to use the module too. Also, some optimisations for std.complex and the future std.experimental.color would be very ugly without the proposed change.

The proposed change is very simple: each math function listed in [1] should be a template for DMD/GDC and an alias for LDC in std.math.

If someone has strong arguments against it, please let me know now.

[1] https://github.com/libmir/mir/blob/master/source/mir/internal/math.d
[2] https://github.com/ljubobratovicrelja/dcv

Best regards,
Ilya