Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to allow extended precision. The LDC fastmath bothers me a lot. It throws away proper NaN and infinity handling, and throws away precision by allowing reciprocal and algebraic transformations. As I've said before, correctness should be first, not speed, and fastmath has nothing to do with this thread. I don't know what the point of fusedMath is. It's not as cut and dry. Sometime, processing faster mean you can process more data to begin with, and get a better result.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 11:45 PM, Ilya Yaroshenko wrote: So it does make sense that allowing fused operations would be equivalent to having no maximum precision. Fused operations are mul/div+add/sub only. Fused operations does not break compesator subtraction: auto t = a - x + x; So, please, make them as separate pragma. ok
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 6 August 2016 at 22:12, David Nadlinger via Digitalmars-dwrote: > On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: >> >> No pragmas tied to a specific architecture should be allowed in the >> language spec, please. > > > I wholeheartedly agree. However, it's not like FP optimisation pragmas would > be specific to any particular architecture. They just describe classes of > transformations that are allowed on top of the standard semantics. > > For example, whether transforming `a + (b * c)` into a single operation is > allowed is not a question of the target architecture at all, but rather > whether the implicit rounding after evaluating (b * c) can be skipped or > not. While this in turn of course enables the compiler to use FMA > instructions on x86/AVX, ARM/NEON, PPC, …, it is not architecture-specific > at all on a conceptual level. > Well, you get fusedMath for free when turning on -mfma or -mfused-madd - whatever is most relevant for the target. Try adding -mfma here. http://goo.gl/xsvDXM
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 22:32:08 UTC, Walter Bright wrote: On 8/6/2016 3:14 PM, David Nadlinger wrote: Of course, if floating point values are strictly defined as having only a minimum precision, then folding away the rounding after the multiplication is always legal. Yup. So it does make sense that allowing fused operations would be equivalent to having no maximum precision. Fused operations are mul/div+add/sub only. Fused operations does not break compesator subtraction: auto t = a - x + x; So, please, make them as separate pragma.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote: On 8/6/2016 1:06 PM, Ilya Yaroshenko wrote: Some applications requires exactly the same results for different architectures (probably because business requirement). So this optimization is turned off by default in LDC for example. Let me rephrase the question - how does fusing them alter the result? The result became more precise, because single rounding instead of two.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 3:14 PM, David Nadlinger wrote: On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote: Let me rephrase the question - how does fusing them alter the result? There is just one rounding operation instead of two. Makes sense. Of course, if floating point values are strictly defined as having only a minimum precision, then folding away the rounding after the multiplication is always legal. Yup. So it does make sense that allowing fused operations would be equivalent to having no maximum precision.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 21:56:06 UTC, Walter Bright wrote: Let me rephrase the question - how does fusing them alter the result? There is just one rounding operation instead of two. Of course, if floating point values are strictly defined as having only a minimum precision, then folding away the rounding after the multiplication is always legal. — David
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 2:12 PM, David Nadlinger wrote: This is true – and precisely the reason why it is actually defined (ldc.attributes) as --- alias fastmath = AliasSeq!(llvmAttr("unsafe-fp-math", "true"), llvmFastMathFlag("fast")); --- This way, users can actually combine different optimisations in a more tasteful manner as appropriate for their particular application. Experience has shown that people – even those intimately familiar with FP semantics – expect a catch-all kitchen-sink switch for all natural optimisations (natural when equating FP values with real numbers). This is why the shorthand exists. I didn't know that, thanks for the explanation. But the same can be done for pragmas, as the second argument isn't just true|false, it's an expression.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 1:06 PM, Ilya Yaroshenko wrote: Some applications requires exactly the same results for different architectures (probably because business requirement). So this optimization is turned off by default in LDC for example. Let me rephrase the question - how does fusing them alter the result?
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: The LDC fastmath bothers me a lot. It throws away proper NaN and infinity handling, and throws away precision by allowing reciprocal and algebraic transformations. This is true – and precisely the reason why it is actually defined (ldc.attributes) as --- alias fastmath = AliasSeq!(llvmAttr("unsafe-fp-math", "true"), llvmFastMathFlag("fast")); --- This way, users can actually combine different optimisations in a more tasteful manner as appropriate for their particular application. Experience has shown that people – even those intimately familiar with FP semantics – expect a catch-all kitchen-sink switch for all natural optimisations (natural when equating FP values with real numbers). This is why the shorthand exists. — David
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 12:48:26 UTC, Iain Buclaw wrote: There are compiler switches for that. Maybe there should be one pragma to tweak these compiler switches on a per-function basis, rather than separately named pragmas. This might be a solution for inherently compiler-specific settings (although for LDC we would probably go for "type-safe" UDAs/pragmas instead of parsing faux command-line strings). Floating point transformation semantics aren't compiler-specific, though. The corresponding options are used commonly enough in certain kinds of code that it doesn't seem prudent to require users to resort to compiler-specific ways of expressing them. — David
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: No pragmas tied to a specific architecture should be allowed in the language spec, please. I wholeheartedly agree. However, it's not like FP optimisation pragmas would be specific to any particular architecture. They just describe classes of transformations that are allowed on top of the standard semantics. For example, whether transforming `a + (b * c)` into a single operation is allowed is not a question of the target architecture at all, but rather whether the implicit rounding after evaluating (b * c) can be skipped or not. While this in turn of course enables the compiler to use FMA instructions on x86/AVX, ARM/NEON, PPC, …, it is not architecture-specific at all on a conceptual level. — David
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 19:51:11 UTC, Walter Bright wrote: On 8/6/2016 2:48 AM, Ilya Yaroshenko wrote: I don't know what the point of fusedMath is. It allows a compiler to replace two arithmetic operations with single composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. I understand that, I just don't understand why that wouldn't be done anyway. Some applications requires exactly the same results for different architectures (probably because business requirement). So this optimization is turned off by default in LDC for example.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 2:48 AM, Ilya Yaroshenko wrote: I don't know what the point of fusedMath is. It allows a compiler to replace two arithmetic operations with single composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. I understand that, I just don't understand why that wouldn't be done anyway.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 3:02 AM, Iain Buclaw via Digitalmars-d wrote: No pragmas tied to a specific architecture should be allowed in the language spec, please. A good point. On the other hand, a list of them would be nice so implementations don't step on each other.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 5:09 AM, Johannes Pfau wrote: I think this restriction is also quite arbitrary. You're right that there are gray areas, but the distinction is not arbitrary. For example, mangling does not affect the interface. It affects the name. Using an attribute has more downsides, as it affects the whole function rather than just part of it, like a pragma would.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 6 August 2016 at 16:11, Patrick Schluter via Digitalmars-dwrote: > On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: >> >> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d >> wrote: >>> >>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: >> >> No pragmas tied to a specific architecture should be allowed in the >> language spec, please. > > > Hmmm, that's the whole point of pragmas (at least in C) to specify > implementation specific stuff outside of the language specs. If it's in the > language specs it should be done with language specific mechanisms. https://dlang.org/spec/pragma.html#predefined-pragmas """ All implementations must support these, even if by just ignoring them. ... Vendor specific pragma Identifiers can be defined if they are prefixed by the vendor's trademarked name, in a similar manner to version identifiers. """ So all added pragmas that have no vendor prefix must be treated as part of the language in order to conform with the specs.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-dwrote: On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: No pragmas tied to a specific architecture should be allowed in the language spec, please. Hmmm, that's the whole point of pragmas (at least in C) to specify implementation specific stuff outside of the language specs. If it's in the language specs it should be done with language specific mechanisms.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 6 August 2016 at 13:30, Ilya Yaroshenko via Digitalmars-dwrote: > On Saturday, 6 August 2016 at 11:10:18 UTC, Iain Buclaw wrote: >> >> On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-d >> wrote: >>> >>> On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d wrote: > > > On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: >> >> >> [...] > > > > > OK, then we need a third pragma,`pragma(ieeeRound)`. But > `pragma(fusedMath)` > and `pragma(fastMath)` should be presented too. > >> [...] > > > > > It allows a compiler to replace two arithmetic operations with single > composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. No pragmas tied to a specific architecture should be allowed in the language spec, please. >>> >>> >>> >>> Then probably Mir will drop all compilers, but LDC >>> LLVM is tied for real world, so we can tied D for real world too. If a >>> compiler can not implement optimization pragma, then this pragma can be >>> just >>> ignored by the compiler. >> >> >> If you need a function to work with an exclusive instruction set or >> something as specific as use of composed/fused instructions, then it is >> common to use an indirect function resolver to choose the most relevant >> implementation for the system that's running the code (a la @ifunc), then >> for the targetted fusedMath implementation, do it yourself. > > > What do you mean by "do it yourself"? Write code using FMA GCC intrinsics? > Why I need to do something that can be automated by a compiler? Modern > approach is to give a hint to the compiler instead of write specialised code > for different architectures. > > It seems you have misunderstood me. I don't want to force compiler to use > explicit instruction sets. Instead, I want to give a hint to a compiler, > about what math _transformations_ are allowed. And this hints are > architecture independent. A compiler may a may not use this hints to > optimise code. There are compiler switches for that. Maybe there should be one pragma to tweak these compiler switches on a per-function basis, rather than separately named pragmas. That way you tell the compiler what you want, rather than it being part of the language logic to understand what must be turned on/off internally. First, assume the language knows nothing about what platform it's running on, then use that as a basis for suggesting new pragmas that should be supported everywhere.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Am Sat, 6 Aug 2016 02:29:50 -0700 schrieb Walter Bright: > On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: > > On Friday, 5 August 2016 at 20:53:42 UTC, Walter Bright wrote: > > > >> I agree that the typical summation algorithm suffers from double > >> rounding. But that's one algorithm. I would appreciate if you > >> would review > >> http://dlang.org/phobos/std_algorithm_iteration.html#sum to ensure > >> it doesn't have this problem, and if it does, how we can fix it. > > > > Phobos's sum is two different algorithms. Pairwise summation for > > Random Access Ranges and Kahan summation for Input Ranges. Pairwise > > summation does not require IEEE rounding, but Kahan summation > > requires it. > > > > The problem with real world example is that it depends on > > optimisation. For example, if all temporary values are rounded, > > this is not a problem, and if all temporary values are not rounded > > this is not a problem too. However if some of them rounded and > > others are not, than this will break Kahan algorithm. > > > > Kahan is the shortest and one of the slowest (comparing with KBN > > for example) summation algorithms. The true story about Kahan, that > > we may have it in Phobos, but we can use pairwise summation for > > Input Ranges without random access, and it will be faster then > > Kahan. So we don't need Kahan for current API at all. > > > > Mir has both Kahan, which works with 32-bit DMD, and pairwise, > > witch works with input ranges. > > > > Kahan, KBN, KB2, and Precise summations is always use `real` or > > `Complex!real` internal values for 32 bit X86 target. The only > > problem with Precise summation, if we need precise result in double > > and use real for internal summation, then the last bit will be > > wrong in the 50% of cases. > > > > Another good point about Mir's summation algorithms, that they are > > Output Ranges. This means they can be used effectively to sum > > multidimensional arrays for example. Also, Precise summator may be > > used to compute exact sum of distributed data. > > > > When we get a decision and solution for rounding problem, I will > > make PR for std.experimental.numeric.sum. > > > >> I hear you. I'd like to explore ways of solving it. Got any > >> ideas? > > > > We need to take the overall picture. > > > > It is very important to recognise that D core team is small and D > > community is not large enough now to involve a lot of new > > professionals. This means that time of existing one engineers is > > very important for D and the most important engineer for D is you, > > Walter. > > > > In the same time we need to move forward fast with language changes > > and druntime changes (GC-less Fibers for example). > > > > So, we need to choose tricky options for development. The most > > important option for D in the science context is to split D > > Programming Language from DMD in our minds. I am not asking to > > remove DMD as reference compiler. Instead of that, we can introduce > > changes in D that can not be optimally implemented in DMD (because > > you have a lot of more important things to do for D instead of > > optimisation) but will be awesome for our LLVM-based or GCC-based > > backends. > > > > We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: > > > > 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, > > div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This > > pragma can be used to allow extended precision. > > > > This should be 2 separate pragmas. The second one may assume the > > first one. > > > > Recent LDC beta has @fastmath attribute for functions, and it is > > already used in Phobos ndslice.algorithm PR and its Mir's mirror. > > Attributes are alternative for pragmas, but their syntax should be > > extended, see [2] > > > > The old approach is separate compilation, but it is weird, low > > level for users, and requires significant efforts for both small > > and large projects. > > > > [1] http://llvm.org/docs/LangRef.html#fast-math-flags > > [2] https://github.com/ldc-developers/ldc/issues/1669 > > Thanks for your help with this. > > Using attributes for this is a mistake. Attributes affect the > interface to a function This is not true for UDAs. LDC and GDC actually implement @attribute as an UDA. And UDAs used in serialization interfaces, the std.benchmark proposals, ... do not affect the interface either. > not its internal implementation. It's possible to reflect on the UDAs of the current function, so this is not true in general: - @(40) int foo() { mixin("alias thisFunc = " ~ __FUNCTION__ ~ ";"); return __traits(getAttributes, thisFunc)[0]; } - https://dpaste.dzfl.pl/aa0615b40adf I think this restriction is also quite arbitrary. For end users attributes provide a much nicer syntax than pragmas. Both GDC and LDC already successfully use UDAs for function specific backend options,
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 11:10:18 UTC, Iain Buclaw wrote: On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-dwrote: On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d wrote: On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: [...] OK, then we need a third pragma,`pragma(ieeeRound)`. But `pragma(fusedMath)` and `pragma(fastMath)` should be presented too. [...] It allows a compiler to replace two arithmetic operations with single composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. No pragmas tied to a specific architecture should be allowed in the language spec, please. Then probably Mir will drop all compilers, but LDC LLVM is tied for real world, so we can tied D for real world too. If a compiler can not implement optimization pragma, then this pragma can be just ignored by the compiler. If you need a function to work with an exclusive instruction set or something as specific as use of composed/fused instructions, then it is common to use an indirect function resolver to choose the most relevant implementation for the system that's running the code (a la @ifunc), then for the targetted fusedMath implementation, do it yourself. What do you mean by "do it yourself"? Write code using FMA GCC intrinsics? Why I need to do something that can be automated by a compiler? Modern approach is to give a hint to the compiler instead of write specialised code for different architectures. It seems you have misunderstood me. I don't want to force compiler to use explicit instruction sets. Instead, I want to give a hint to a compiler, about what math _transformations_ are allowed. And this hints are architecture independent. A compiler may a may not use this hints to optimise code.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 6 August 2016 at 12:07, Ilya Yaroshenko via Digitalmars-dwrote: > On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: >> >> On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-d >> wrote: >>> >>> On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: [...] >>> >>> >>> >>> OK, then we need a third pragma,`pragma(ieeeRound)`. But >>> `pragma(fusedMath)` >>> and `pragma(fastMath)` should be presented too. >>> [...] >>> >>> >>> >>> It allows a compiler to replace two arithmetic operations with single >>> composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. >> >> >> No pragmas tied to a specific architecture should be allowed in the >> language spec, please. > > > Then probably Mir will drop all compilers, but LDC > LLVM is tied for real world, so we can tied D for real world too. If a > compiler can not implement optimization pragma, then this pragma can be just > ignored by the compiler. If you need a function to work with an exclusive instruction set or something as specific as use of composed/fused instructions, then it is common to use an indirect function resolver to choose the most relevant implementation for the system that's running the code (a la @ifunc), then for the targetted fusedMath implementation, do it yourself.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 10:02:25 UTC, Iain Buclaw wrote: On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-dwrote: On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: [...] OK, then we need a third pragma,`pragma(ieeeRound)`. But `pragma(fusedMath)` and `pragma(fastMath)` should be presented too. [...] It allows a compiler to replace two arithmetic operations with single composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. No pragmas tied to a specific architecture should be allowed in the language spec, please. Then probably Mir will drop all compilers, but LDC LLVM is tied for real world, so we can tied D for real world too. If a compiler can not implement optimization pragma, then this pragma can be just ignored by the compiler.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 6 August 2016 at 11:48, Ilya Yaroshenko via Digitalmars-dwrote: > On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: >> >> On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: >>> >>> We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: >>> >>> 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub >>> operations. >>> 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to >>> allow >>> extended precision. >> >> >> >> The LDC fastmath bothers me a lot. It throws away proper NaN and infinity >> handling, and throws away precision by allowing reciprocal and algebraic >> transformations. As I've said before, correctness should be first, not >> speed, and fastmath has nothing to do with this thread. > > > OK, then we need a third pragma,`pragma(ieeeRound)`. But `pragma(fusedMath)` > and `pragma(fastMath)` should be presented too. > >> I don't know what the point of fusedMath is. > > > It allows a compiler to replace two arithmetic operations with single > composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set. No pragmas tied to a specific architecture should be allowed in the language spec, please.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Saturday, 6 August 2016 at 09:35:32 UTC, Walter Bright wrote: On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to allow extended precision. The LDC fastmath bothers me a lot. It throws away proper NaN and infinity handling, and throws away precision by allowing reciprocal and algebraic transformations. As I've said before, correctness should be first, not speed, and fastmath has nothing to do with this thread. OK, then we need a third pragma,`pragma(ieeeRound)`. But `pragma(fusedMath)` and `pragma(fastMath)` should be presented too. I don't know what the point of fusedMath is. It allows a compiler to replace two arithmetic operations with single composed one, see AVX2 (FMA3 for intel and FMA4 for AMD) instruction set.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to allow extended precision. The LDC fastmath bothers me a lot. It throws away proper NaN and infinity handling, and throws away precision by allowing reciprocal and algebraic transformations. As I've said before, correctness should be first, not speed, and fastmath has nothing to do with this thread. I don't know what the point of fusedMath is.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/6/2016 1:21 AM, Ilya Yaroshenko wrote: On Friday, 5 August 2016 at 20:53:42 UTC, Walter Bright wrote: I agree that the typical summation algorithm suffers from double rounding. But that's one algorithm. I would appreciate if you would review http://dlang.org/phobos/std_algorithm_iteration.html#sum to ensure it doesn't have this problem, and if it does, how we can fix it. Phobos's sum is two different algorithms. Pairwise summation for Random Access Ranges and Kahan summation for Input Ranges. Pairwise summation does not require IEEE rounding, but Kahan summation requires it. The problem with real world example is that it depends on optimisation. For example, if all temporary values are rounded, this is not a problem, and if all temporary values are not rounded this is not a problem too. However if some of them rounded and others are not, than this will break Kahan algorithm. Kahan is the shortest and one of the slowest (comparing with KBN for example) summation algorithms. The true story about Kahan, that we may have it in Phobos, but we can use pairwise summation for Input Ranges without random access, and it will be faster then Kahan. So we don't need Kahan for current API at all. Mir has both Kahan, which works with 32-bit DMD, and pairwise, witch works with input ranges. Kahan, KBN, KB2, and Precise summations is always use `real` or `Complex!real` internal values for 32 bit X86 target. The only problem with Precise summation, if we need precise result in double and use real for internal summation, then the last bit will be wrong in the 50% of cases. Another good point about Mir's summation algorithms, that they are Output Ranges. This means they can be used effectively to sum multidimensional arrays for example. Also, Precise summator may be used to compute exact sum of distributed data. When we get a decision and solution for rounding problem, I will make PR for std.experimental.numeric.sum. I hear you. I'd like to explore ways of solving it. Got any ideas? We need to take the overall picture. It is very important to recognise that D core team is small and D community is not large enough now to involve a lot of new professionals. This means that time of existing one engineers is very important for D and the most important engineer for D is you, Walter. In the same time we need to move forward fast with language changes and druntime changes (GC-less Fibers for example). So, we need to choose tricky options for development. The most important option for D in the science context is to split D Programming Language from DMD in our minds. I am not asking to remove DMD as reference compiler. Instead of that, we can introduce changes in D that can not be optimally implemented in DMD (because you have a lot of more important things to do for D instead of optimisation) but will be awesome for our LLVM-based or GCC-based backends. We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to allow extended precision. This should be 2 separate pragmas. The second one may assume the first one. Recent LDC beta has @fastmath attribute for functions, and it is already used in Phobos ndslice.algorithm PR and its Mir's mirror. Attributes are alternative for pragmas, but their syntax should be extended, see [2] The old approach is separate compilation, but it is weird, low level for users, and requires significant efforts for both small and large projects. [1] http://llvm.org/docs/LangRef.html#fast-math-flags [2] https://github.com/ldc-developers/ldc/issues/1669 Thanks for your help with this. Using attributes for this is a mistake. Attributes affect the interface to a function, not its internal implementation. Pragmas are suitable for internal implementation things. I also oppose using compiler flags, because they tend to be overly global, and the details of an algorithm should not be split between the source code and the makefile.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 4 August 2016 at 23:38, Seb via Digitalmars-dwrote: > On Thursday, 4 August 2016 at 21:13:23 UTC, Iain Buclaw wrote: >> >> On 4 August 2016 at 01:00, Seb via Digitalmars-d >> wrote: >>> >>> To make matters worse std.math yields different results than >>> compiler/assembly intrinsics - note that in this example import std.math.pow >>> adds about 1K instructions to the output assembler, whereas llvm_powf boils >>> down to the assembly powf. Of course the performance of powf is a lot >>> better, I measured [3] that e.g. std.math.pow takes ~1.5x as long for both >>> LDC and DMD. Of course if you need to run this very often, this cost isn't >>> acceptable. >>> >> >> This could be something specific to your architecture. I get the same >> result on from all versions of powf, and from GCC builtins too, regardless >> of optimization tunings. > > > I can reproduce this on DPaste (also x86_64). > > https://dpaste.dzfl.pl/c0ab5131b49d > > Behavior with a recent LDC build is similar (as annotated with the > comments). When testing the math functions, I chose not to compare results to what C libraries, or CPU instructions return, but rather compared the results to Wolfram, which I hope I'm correct in saying is a more reliable and proven source of scientific maths than libm. As of the time I ported all pure D (not IASM) implementations of math functions, the results returned from all unittests using 80-bit reals were identical with Wolfram given up to the last 2 digits as an acceptable error with some values. This was true for all except inputs that were just inside the domain for the function, in which case only double precision was guaranteed. Where applicable, they were also found to in some cases to be more accurate than the inline assembler or yl2x implementations version paths that are used if you compile with DMD or LDC.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 20:53:42 UTC, Walter Bright wrote: I agree that the typical summation algorithm suffers from double rounding. But that's one algorithm. I would appreciate if you would review http://dlang.org/phobos/std_algorithm_iteration.html#sum to ensure it doesn't have this problem, and if it does, how we can fix it. Phobos's sum is two different algorithms. Pairwise summation for Random Access Ranges and Kahan summation for Input Ranges. Pairwise summation does not require IEEE rounding, but Kahan summation requires it. The problem with real world example is that it depends on optimisation. For example, if all temporary values are rounded, this is not a problem, and if all temporary values are not rounded this is not a problem too. However if some of them rounded and others are not, than this will break Kahan algorithm. Kahan is the shortest and one of the slowest (comparing with KBN for example) summation algorithms. The true story about Kahan, that we may have it in Phobos, but we can use pairwise summation for Input Ranges without random access, and it will be faster then Kahan. So we don't need Kahan for current API at all. Mir has both Kahan, which works with 32-bit DMD, and pairwise, witch works with input ranges. Kahan, KBN, KB2, and Precise summations is always use `real` or `Complex!real` internal values for 32 bit X86 target. The only problem with Precise summation, if we need precise result in double and use real for internal summation, then the last bit will be wrong in the 50% of cases. Another good point about Mir's summation algorithms, that they are Output Ranges. This means they can be used effectively to sum multidimensional arrays for example. Also, Precise summator may be used to compute exact sum of distributed data. When we get a decision and solution for rounding problem, I will make PR for std.experimental.numeric.sum. I hear you. I'd like to explore ways of solving it. Got any ideas? We need to take the overall picture. It is very important to recognise that D core team is small and D community is not large enough now to involve a lot of new professionals. This means that time of existing one engineers is very important for D and the most important engineer for D is you, Walter. In the same time we need to move forward fast with language changes and druntime changes (GC-less Fibers for example). So, we need to choose tricky options for development. The most important option for D in the science context is to split D Programming Language from DMD in our minds. I am not asking to remove DMD as reference compiler. Instead of that, we can introduce changes in D that can not be optimally implemented in DMD (because you have a lot of more important things to do for D instead of optimisation) but will be awesome for our LLVM-based or GCC-based backends. We need 2 new pragmas with the same syntax as `pragma(inline, xxx)`: 1. `pragma(fusedMath)` allows fused mul-add, mul-sub, div-add, div-sub operations. 2. `pragma(fastMath)` equivalents to [1]. This pragma can be used to allow extended precision. This should be 2 separate pragmas. The second one may assume the first one. Recent LDC beta has @fastmath attribute for functions, and it is already used in Phobos ndslice.algorithm PR and its Mir's mirror. Attributes are alternative for pragmas, but their syntax should be extended, see [2] The old approach is separate compilation, but it is weird, low level for users, and requires significant efforts for both small and large projects. [1] http://llvm.org/docs/LangRef.html#fast-math-flags [2] https://github.com/ldc-developers/ldc/issues/1669 Best regards, Ilya
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/5/2016 2:40 AM, Ilya Yaroshenko wrote: No. For example std.math.log requires it! But you don't care about other compilers which not use yl2x and about making it template (real version slows down code for double and float). I'm interested in correct to the last bit results first, and performance second. std.math changes that hew to that are welcome.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/5/2016 4:27 AM, Seb wrote: 1) There are some function (exp, pow, log, round, sqrt) for which using llvm_intrinsincs significantly increases your performance. It's a simple benchmark and might be flawed, but I hope it shows the point. Speed is not the only criteria. Accuracy is as well. I've been using C math functions forever, and have constantly discovered that this or that math function on this or that platform produces bad results. This is why D's math functions are re-implemented in D rather than just forwarding to the C ones. 2) As mentioned before they can yield _different_ results https://dpaste.dzfl.pl/c0ab5131b49d Ah, but which result is the correct one? I am interested in getting the functions correct to the last bit first, and performance second.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Thanks for finding these. On 8/5/2016 3:22 AM, Ilya Yaroshenko wrote: 1. https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz mathmodule.c, math_fsum has comment: Depends on IEEE 754 arithmetic guarantees and half-even rounding. The same algorithm also available in Mir. And it does not work with 32 bit DMD. 2. sum_kbn in https://github.com/JuliaLang/julia/blob/master/base/reduce.jl requires ieee arithmetic. The same algorithm also available in Mir. I agree that the typical summation algorithm suffers from double rounding. But that's one algorithm. I would appreciate if you would review http://dlang.org/phobos/std_algorithm_iteration.html#sum to ensure it doesn't have this problem, and if it does, how we can fix it. 3. http://www.netlib.org/cephes/ See log2.c for example: z = x - 0.5; z -= 0.5; y = 0.5 * x + 0.5; This code requires IEEE. And you can found it in Phobos std.math It'd be great to have a value for x where it fails, then we can add it to the unittests and ensure it is fixed. 4. Mir has 5 types of summation, and 3 of them requires IEEE. See above for summation. 5. Tinflex requires IEEE arithmetic because extended precision may force algorithm to choose wrong computation branch. The most significant part of original code was written in R, and the scientists who create this algorithm did not care about non IEEE compilers at all. 6. Atmosphere requires IEEE for may functions such as https://github.com/9il/atmosphere/blob/master/source/atmosphere/math.d#L616 Without proper IEEE rounding the are not guarantee that this functions will stop. 7. The same true for real world implementations of algorithms presented in Numeric Recipes, which uses various series expansion such as for Modified Bessel Function. I hear you. I'd like to explore ways of solving it. Got any ideas?
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 09:21:53 UTC, Ilya Yaroshenko wrote: On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote: On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote: 1. Could you please provide an assembler example with clang or recent gcc? I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings. But try it yourself: float foo(float a, float b) { return 3.0 * a / b; } Your example is just a speculation. 3.0 force compiler to convert a and b to double. This is obvious. Ha you are right. Testing more it seems that gcc and clang are not promoting on 64 bit code, but still are on 32 bits.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 09:40:59 UTC, Ilya Yaroshenko wrote: On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote: On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit. No. For example std.math.log requires it! But you don't care about other compilers which not use yl2x and about making it template (real version slows down code for double and float). Yep. 1) There are some function (exp, pow, log, round, sqrt) for which using llvm_intrinsincs significantly increases your performance. It's a simple benchmark and might be flawed, but I hope it shows the point. Code is here: https://gist.github.com/wilzbach/2b64e10dec66a3153c51fbd1e6848f72 ldmd -inline -release -O3 -boundscheck=off test.d fun: pow std.math.pow = 15 secs, 914 ms, 102 μs, and 8 hnsecs core.stdc.pow = 11 secs, 590 ms, 702 μs, and 5 hnsecs llvm_pow = 13 secs, 570 ms, 439 μs, and 7 hnsecs fun: exp std.math.exp = 6 secs, 85 ms, 741 μs, and 7 hnsecs core.stdc.exp = 16 secs, 267 ms, 997 μs, and 4 hnsecs llvm_exp = 2 secs, 22 ms, and 876 μs fun: exp2 std.math.exp2 = 3 secs, 117 ms, 624 μs, and 2 hnsecs core.stdc.exp2 = 2 secs, 973 ms, and 243 μs llvm_exp2 = 2 secs, 451 ms, 628 μs, and 9 hnsecs fun: sin std.math.sin = 1 sec, 805 ms, 626 μs, and 7 hnsecs core.stdc.sin = 17 secs, 743 ms, 33 μs, and 5 hnsecs llvm_sin = 2 secs, 95 ms, and 178 μs fun: cos std.math.cos = 2 secs, 820 ms, 684 μs, and 5 hnsecs core.stdc.cos = 17 secs, 626 ms, 78 μs, and 1 hnsec llvm_cos = 2 secs, 814 ms, 60 μs, and 5 hnsecs fun: log std.math.log = 5 secs, 584 ms, 344 μs, and 5 hnsecs core.stdc.log = 16 secs, 443 ms, 893 μs, and 3 hnsecs llvm_log = 2 secs, 13 ms, 291 μs, and 1 hnsec fun: log2 std.math.log2 = 5 secs, 583 ms, 777 μs, and 7 hnsecs core.stdc.log2 = 2 secs, 800 ms, 848 μs, and 5 hnsecs llvm_log2 = 2 secs, 165 ms, 849 μs, and 6 hnsecs fun: sqrt std.math.sqrt = 799 ms and 917 μs core.stdc.sqrt = 864 ms, 834 μs, and 7 hnsecs llvm_sqrt = 439 ms, 469 μs, and 2 hnsecs fun: ceil std.math.ceil = 540 ms and 167 μs core.stdc.ceil = 971 ms, 533 μs, and 6 hnsecs llvm_ceil = 562 ms, 490 μs, and 2 hnsecs fun: round std.math.round = 3 secs, 52 ms, 567 μs, and 3 hnsecs core.stdc.round = 958 ms and 217 μs llvm_round = 590 ms, 742 μs, and 7 hnsecs 2) As mentioned before they can yield _different_ results https://dpaste.dzfl.pl/c0ab5131b49d
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
Here is a relevant example: https://hal.inria.fr/inria-00171497v1/document It is used in at least one real world geometric modeling system.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 09:40:23 UTC, Walter Bright wrote: On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example. 1. https://www.python.org/ftp/python/3.5.2/Python-3.5.2.tgz mathmodule.c, math_fsum has comment: Depends on IEEE 754 arithmetic guarantees and half-even rounding. The same algorithm also available in Mir. And it does not work with 32 bit DMD. 2. sum_kbn in https://github.com/JuliaLang/julia/blob/master/base/reduce.jl requires ieee arithmetic. The same algorithm also available in Mir. 3. http://www.netlib.org/cephes/ See log2.c for example: z = x - 0.5; z -= 0.5; y = 0.5 * x + 0.5; This code requires IEEE. And you can found it in Phobos std.math 4. Mir has 5 types of summation, and 3 of them requires IEEE. 5. Tinflex requires IEEE arithmetic because extended precision may force algorithm to choose wrong computation branch. The most significant part of original code was written in R, and the scientists who create this algorithm did not care about non IEEE compilers at all. 6. Atmosphere requires IEEE for may functions such as https://github.com/9il/atmosphere/blob/master/source/atmosphere/math.d#L616 Without proper IEEE rounding the are not guarantee that this functions will stop. 7. The same true for real world implementations of algorithms presented in Numeric Recipes, which uses various series expansion such as for Modified Bessel Function.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 09:24:49 UTC, Walter Bright wrote: On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit. No. For example std.math.log requires it! But you don't care about other compilers which not use yl2x and about making it template (real version slows down code for double and float). And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down. x87 FPU is deprecated. We have more significant performance issues with std.math. For example, it is x87 oriented, is slows down the code for double and float. Many functions are not inlined. This 2 are real performance problems. By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision. Yes and many of its functions requires IEEE. For example log2.c for doubles: z = x - 0.5; z -= 0.5; y = 0.5 * x + 0.5; This code requires IEEE. The same code appears in our std.math :P
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. I'd appreciate it if you could provide links to where these requirements are. I can't find anything on Tinflex, for example.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote: On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote: 1. Could you please provide an assembler example with clang or recent gcc? I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings. But try it yourself: float foo(float a, float b) { return 3.0 * a / b; } GCC 5.3 gives me foo(float, float): cvtss2sdxmm0, xmm0 cvtss2sdxmm1, xmm1 mulsd xmm0, QWORD PTR .LC0[rip] divsd xmm0, xmm1 cvtsd2ssxmm0, xmm0 ret .LC0: .long 0 .long 1074266112 Gotta be careful with those examples. See this: https://godbolt.org/g/0yNUSG float foo1(float a, float b) { return 3.42 * (a / b); } float foo2(float a, float b) { return 3.0 * (a / b); } float foo3(float a, float b) { return 3.0 * a / b; } float foo4(float a, float b) { return 3.0f * a / b; } foo1(float, float): divss xmm0, xmm1 cvtss2sdxmm0, xmm0 mulsd xmm0, QWORD PTR .LC0[rip] cvtsd2ssxmm0, xmm0 ret foo2(float, float): divss xmm0, xmm1 mulss xmm0, DWORD PTR .LC2[rip] ret foo3(float, float): cvtss2sdxmm0, xmm0 cvtss2sdxmm1, xmm1 mulsd xmm0, QWORD PTR .LC4[rip] divsd xmm0, xmm1 cvtsd2ssxmm0, xmm0 ret foo4(float, float): mulss xmm0, DWORD PTR .LC2[rip] divss xmm0, xmm1 ret .LC0: .long 4123168604 .long 1074486312 .LC2: .long 1077936128 .LC4: .long 0 .long 1074266112 It depends on the literal value, not just the type.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/5/2016 1:17 AM, Ilya Yaroshenko wrote: 2. C compilers not promote double to 80-bit reals anyway. Java originally came out with an edict that floats will all be done in float precision, and double in double. Sun had evidently never used an x87 before, because it soon became obvious that it was unworkable on the x87, and the spec was changed to allow intermediate values out to 80 bits.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/5/2016 12:43 AM, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. A library has a lot of algorithms in it, a library requiring exact IEEE arithmetic doesn't mean every algorithm in it does. None of the Phobos math library functions require it, and as far as I can tell they are correct out to the last bit. And besides, all these libraries presumably work, or used to work, on the x87, which does not provide exact IEEE arithmetic for intermediate results without a special setting, and that setting substantially slows it down. By netlib do you mean the Cephes functions? I've used them, and am not aware of any of them that require reduced precision.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 08:43:48 UTC, deadalnix wrote: On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote: 1. Could you please provide an assembler example with clang or recent gcc? I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings. But try it yourself: float foo(float a, float b) { return 3.0 * a / b; } Your example is just a speculation. 3.0 force compiler to convert a and b to double. This is obvious.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 08:17:00 UTC, Ilya Yaroshenko wrote: 1. Could you please provide an assembler example with clang or recent gcc? I have better: compile your favorite project with -Wdouble-promotion and enjoy the rain of warnings. But try it yourself: float foo(float a, float b) { return 3.0 * a / b; } GCC 5.3 gives me foo(float, float): cvtss2sdxmm0, xmm0 cvtss2sdxmm1, xmm1 mulsd xmm0, QWORD PTR .LC0[rip] divsd xmm0, xmm1 cvtsd2ssxmm0, xmm0 ret .LC0: .long 0 .long 1074266112 Which clearly uses double precision. And clang 3.8: LCPI0_0: .quad 4613937818241073152 # double 3 foo(float, float): # @foo(float, float) cvtss2sdxmm0, xmm0 mulsd xmm0, qword ptr [rip + .LCPI0_0] cvtss2sdxmm1, xmm1 divsd xmm0, xmm1 cvtsd2ssxmm0, xmm0 ret which uses double as well. 2. C compilers not promote double to 80-bit reals anyway. VC++ does it on 32 bits build, but initialize the x87 unit to double precision (on 80 bits floats - yes that's a x87 setting). VC++ will keep using float for x64 builds. Intel compiler use compiler flags to promote or not. In case you were wondering, this is not limited to X86/64 as GCC gives me on ARM: foo(float, float): fmovd2, 3.0e+0 fcvtd0, s0 fmuld0, d0, d2 fcvtd1, s1 fdivd0, d0, d1 fcvts0, d0 ret Which also promotes to double.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 07:59:15 UTC, deadalnix wrote: On Friday, 5 August 2016 at 07:43:19 UTC, Ilya Yaroshenko wrote: You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. Most C compilers always promote float to double, so I'm not sure what point you are trying to make here. 1. Could you please provide an assembler example with clang or recent gcc? 2. C compilers not promote double to 80-bit reals anyway.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 07:43:19 UTC, Ilya Yaroshenko wrote: On Friday, 5 August 2016 at 06:59:21 UTC, Walter Bright wrote: On 8/4/2016 11:05 PM, Fool wrote: I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double precision arithmetic using Veltkamp split was mentioned as a resonable example, earlier. There are cases where doing things at higher precision results in double rounding and a less accurate result. But I am pretty sure there are far fewer of those cases compared to routine computations that get a more accurate result with more precision. If that wasn't true, we wouldn't ever need double precision. You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default. Most C compilers always promote float to double, so I'm not sure what point you are trying to make here.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Friday, 5 August 2016 at 06:59:21 UTC, Walter Bright wrote: On 8/4/2016 11:05 PM, Fool wrote: I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double precision arithmetic using Veltkamp split was mentioned as a resonable example, earlier. There are cases where doing things at higher precision results in double rounding and a less accurate result. But I am pretty sure there are far fewer of those cases compared to routine computations that get a more accurate result with more precision. If that wasn't true, we wouldn't ever need double precision. You are wrong that there are far fewer of those cases. This is naive point of view. A lot of netlib math functions require exact IEEE arithmetic. Tinflex requires it. Python C backend and Mir library require exact IEEE arithmetic. Atmosphere package requires it, Atmosphere is used as reference code for my publication in JMS, Springer. And the most important case: no one top scientific laboratory will use a language without exact IEEE arithmetic by default.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 11:05 PM, Fool wrote: I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double precision arithmetic using Veltkamp split was mentioned as a resonable example, earlier. There are cases where doing things at higher precision results in double rounding and a less accurate result. But I am pretty sure there are far fewer of those cases compared to routine computations that get a more accurate result with more precision. If that wasn't true, we wouldn't ever need double precision.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Thursday, 4 August 2016 at 20:58:57 UTC, Walter Bright wrote: On 8/4/2016 1:29 PM, Fool wrote: I'm afraid, I don't understand your implementation. Isn't toFloat(x) + toFloat(y) computed in real precision (first rounding)? Why doesn't toFloat(toFloat(x) + toFloat(y)) involve another rounding? You're right, in that case, it does. But C does, too: http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/ Yes. It seems, however, as if Rick Regan is not advocating this behavior. This is important to remember when advocating for "C-like" floating point - because C simply does not behave as most programmers seem to assume it does. That's right. "C-like" might be what they say but what they want is double precision computations to be carried out in double precision. What toFloat() does is guarantee that its argument is rounded to float. The best way to approach this when designing fp algorithms is to not require them to have reduced precision. I understand your point of view. However, there are (probably rare) situations where one requires more control. I think that simulating double-double precision arithmetic using Veltkamp split was mentioned as a resonable example, earlier. It's also important to realize that on some machines, the hardware does not actually support float precision operations, or may do so at a large runtime penalty (x87). That's another story.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Thursday, 4 August 2016 at 21:13:23 UTC, Iain Buclaw wrote: On 4 August 2016 at 01:00, Seb via Digitalmars-dwrote: To make matters worse std.math yields different results than compiler/assembly intrinsics - note that in this example import std.math.pow adds about 1K instructions to the output assembler, whereas llvm_powf boils down to the assembly powf. Of course the performance of powf is a lot better, I measured [3] that e.g. std.math.pow takes ~1.5x as long for both LDC and DMD. Of course if you need to run this very often, this cost isn't acceptable. This could be something specific to your architecture. I get the same result on from all versions of powf, and from GCC builtins too, regardless of optimization tunings. I can reproduce this on DPaste (also x86_64). https://dpaste.dzfl.pl/c0ab5131b49d Behavior with a recent LDC build is similar (as annotated with the comments).
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 2:13 PM, Iain Buclaw via Digitalmars-d wrote: This could be something specific to your architecture. I get the same result on from all versions of powf, and from GCC builtins too, regardless of optimization tunings. It's important to remember that what gcc does and what the C standard allows are not necessarily the same - even if the former is standard compliant. C allows for a lot of implementation defined FP behavior.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 4 August 2016 at 01:00, Seb via Digitalmars-dwrote: > > Consider the following program, it fails on 32-bit :/ > It would be nice if explicit casts were honoured by CTFE here. toDouble(a + b) just seems to be avoiding the why CTFE ignores the cast in cast(double)(a + b). > To make matters worse std.math yields different results than > compiler/assembly intrinsics - note that in this example import std.math.pow > adds about 1K instructions to the output assembler, whereas llvm_powf boils > down to the assembly powf. Of course the performance of powf is a lot > better, I measured [3] that e.g. std.math.pow takes ~1.5x as long for both > LDC and DMD. Of course if you need to run this very often, this cost isn't > acceptable. > This could be something specific to your architecture. I get the same result on from all versions of powf, and from GCC builtins too, regardless of optimization tunings.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 1:29 PM, Fool wrote: I'm afraid, I don't understand your implementation. Isn't toFloat(x) + toFloat(y) computed in real precision (first rounding)? Why doesn't toFloat(toFloat(x) + toFloat(y)) involve another rounding? You're right, in that case, it does. But C does, too: http://www.exploringbinary.com/double-rounding-errors-in-floating-point-conversions/ This is important to remember when advocating for "C-like" floating point - because C simply does not behave as most programmers seem to assume it does. What toFloat() does is guarantee that its argument is rounded to float. The best way to approach this when designing fp algorithms is to not require them to have reduced precision. It's also important to realize that on some machines, the hardware does not actually support float precision operations, or may do so at a large runtime penalty (x87).
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
IEEE behaviour by default is required by numeric software. @fastmath (like recent LDC) or something like that can be used to allow extended precision. Ilya
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 1:03 PM, deadalnix wrote: It is actually very common for C compiler to work with double for intermediate values, which isn't far from what D does. In fact, it used to be specified that C behave that way!
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Thursday, 4 August 2016 at 20:00:14 UTC, Walter Bright wrote: On 8/4/2016 12:03 PM, Fool wrote: How can we ensure that toFloat(toFloat(x) + toFloat(y)) does not involve double-rounding? It's the whole point of it. I'm afraid, I don't understand your implementation. Isn't toFloat(x) + toFloat(y) computed in real precision (first rounding)? Why doesn't toFloat(toFloat(x) + toFloat(y)) involve another rounding?
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 12:03 PM, Fool wrote: How can we ensure that toFloat(toFloat(x) + toFloat(y)) does not involve double-rounding? It's the whole point of it.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Thursday, 4 August 2016 at 18:53:23 UTC, Walter Bright wrote: On 8/4/2016 7:08 AM, Andrew Godfrey wrote: Now, my major experience is in the context of Intel non-SIMD FP, where internal precision is 80-bit. I can see the appeal of asking for the ability to reduce internal precision to match the data type you're using, and I think I've read something written by Walter on that topic. But this would hardly be "C-like" FP support so I'm not sure that's he topic at hand. Also, carefully reading the C Standard, D's behavior is allowed by the C Standard. The idea that C requires rounding of all intermediate values to the target precision is incorrect, and is not "C-like". C floating point semantics can and do vary from platform to platform, and vary based on optimization settings, and this is all allowed by the C Standard. It is actually very common for C compiler to work with double for intermediate values, which isn't far from what D does.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 11:53 AM, Walter Bright wrote: It has been proposed many times that the solution for D is to have a function called toFloat() or something like that in core.math, which guarantees a round to float precision for its argument. But so far nobody has written such a function. https://github.com/dlang/druntime/pull/1621
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Thursday, 4 August 2016 at 18:53:23 UTC, Walter Bright wrote: It has been proposed many times that the solution for D is to have a function called toFloat() or something like that in core.math, which guarantees a round to float precision for its argument. But so far nobody has written such a function. How can we ensure that toFloat(toFloat(x) + toFloat(y)) does not involve double-rounding?
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On 8/4/2016 7:08 AM, Andrew Godfrey wrote: Now, my major experience is in the context of Intel non-SIMD FP, where internal precision is 80-bit. I can see the appeal of asking for the ability to reduce internal precision to match the data type you're using, and I think I've read something written by Walter on that topic. But this would hardly be "C-like" FP support so I'm not sure that's he topic at hand. Also, carefully reading the C Standard, D's behavior is allowed by the C Standard. The idea that C requires rounding of all intermediate values to the target precision is incorrect, and is not "C-like". C floating point semantics can and do vary from platform to platform, and vary based on optimization settings, and this is all allowed by the C Standard. It has been proposed many times that the solution for D is to have a function called toFloat() or something like that in core.math, which guarantees a round to float precision for its argument. But so far nobody has written such a function.
Re: Why don't we switch to C like floating pointed arithmetic instead of automatic expansion to reals?
On Wednesday, 3 August 2016 at 23:00:11 UTC, Seb wrote: There was a recent discussion on Phobos about D's floating point behavior [1]. I think Ilya summarized quite elegantly our problem: [...] In my experience (production-quality FP coding in C++), you are in error merely by combining floating point with exact comparison (==). Even if you have just one compiler and architecture to target, you can expect instability if you do this. Writing robust FP algorithms is an art and it's made harder if you use mathematical thinking, because FP arithmetic lacks many properties that integer or fixed-point arithmetic have. I'm not saying D gets it right (I haven't explored that at all) but I am saying you need better examples. Now, my major experience is in the context of Intel non-SIMD FP, where internal precision is 80-bit. I can see the appeal of asking for the ability to reduce internal precision to match the data type you're using, and I think I've read something written by Walter on that topic. But this would hardly be "C-like" FP support so I'm not sure that's he topic at hand.