Re: [fpc-pascal] Floating point question
My opinions about the solutions below ...

On 13.02.2024 at 12:07, Thomas Kurz via fpc-pascal wrote:
>> But, sorry, because we are talking about compile time math, performance (nanoseconds) in this case doesn't count, IMO.
> That's what I thought at first, too. But then I started thinking about how to deal with it and stumbled upon difficulties very soon:
>
> a) 8427.0 + 33.0 / 1440.0
> An easy case: all constants, so do the calculation at highest precision and reduce it afterwards, if possible.

I agree; I would say: all constants, so do the calculation at highest precision and reduce it afterwards, if required by the target.

> b) var_single + 33.0 / 1440.0
> Should also be feasible by evaluating the constant expression first, then reducing it to single (if possible) and adding the variable in the end.

Yes ... first evaluate the constant expression with maximum precision (best at compile time), then reduce the result. The reduction to single must be done in any case, because the var_single in the expression dictates it, IMO.

> c) 8427.0 + var_double / 1440.0
> Because of using the double-type variable here, constants should be treated as double even at the cost of performance due to not knowing whether the result will be assigned to a single or double.

Yes.

> d) 8427.0 + var_single / 1440.0
> And this is the one I got to struggle with. And I can imagine this is the reason for the decision about how to handle decimal constants. My first approach would have been to implicitly use single precision values throughout the expression. This would mean to lose precision if the result will be assigned to a double-precision variable. One could say: "bad luck - if the programmer intended to get better precision, he should have used a double-precision variable as in case c". But this wouldn't be any better than the current state we have now.

8427.0 + (var_single / 1440.0)

The 1440.0 can be reduced to single, because the other operand is single, and so the whole operation is done using single arithmetic. If we had an FP constant here instead of var_single, the whole operation IMO should be done with maximum precision and, in the best case, at compile time. I have no problem with this operation giving a different result with decimal constants than with explicitly typed (reduced) FP variables. This can be easily explained to the users: operations involving FP variables with reduced precision may give reduced precision results. This seems to be desirable for performance reasons and can be avoided by appropriate type casting.

___
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
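The four cases a) to d) discussed above can be written out as a small test program. This is only a sketch for experimenting: the variable values (0.5) and the program name are made up for illustration and do not come from the thread, and what it prints will depend on the FPC version, the target platform and the constant-precision settings in effect.

```pascal
program FloatCases;

var
  var_single: Single;
  var_double: Double;
  r: Double;

begin
  // Illustrative values only; they are not taken from the thread.
  var_single := 0.5;
  var_double := 0.5;

  // a) constants only: the whole expression can be folded at compile time
  r := 8427.0 + 33.0 / 1440.0;
  WriteLn('a) ', r:0:17);

  // b) the constant subexpression 33.0 / 1440.0 can be folded first
  r := var_single + 33.0 / 1440.0;
  WriteLn('b) ', r:0:17);

  // c) a double variable takes part in the division
  r := 8427.0 + var_double / 1440.0;
  WriteLn('c) ', r:0:17);

  // d) a single variable takes part in the division (the hard case)
  r := 8427.0 + var_single / 1440.0;
  WriteLn('d) ', r:0:17);
end.
```

Comparing the output of c) and d) on a given setup shows whether the type of the variable operand influences how the constants around it are treated.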
Re: [fpc-pascal] Floating point question
Ok, maybe this example will prove why it's not happening correctly:

    program Const_Vs_Var;
    Const
       A_const = Integer(8427);
       B_const = Byte(33);
       C_const = Single(1440.5);
       Win_Calc = 16854.045817424505380076362374176;
       Const_Ans = 16854.045817424505380076362374176 / (8427 + 33 / 1440.5);
    Var
       A_Var : Integer;
       B_Var : Byte;
       C_Var : Single;
       Const_Ans1, Var_Ans1 : Extended;
    Begin
       A_Var := A_Const;
       B_Var := B_Const;
       C_Var := C_Const;
       Var_Ans1 := Win_Calc / (A_Var+B_Var/C_Var);
       Const_Ans1 := Win_Calc / (A_Const+B_Const/C_Const);
       WRITELN ( ' Const_Ans = ', Const_Ans:20:20);
       WRITELN ( ' Const_Ans1 = ', Const_Ans1:20:20);
       WRITELN ( ' Var_Ans1 = ', Var_Ans1:20:20);
    End.

The result is:

    Const_Ans = 2.0010627116630224
    Const_Ans1 = 2.0010627116630224
    Var_Ans1 = 2.

Now you can see, if the math was done the same way math is done for variables, we could have stored the constants as Byte(2). But because the math is being carried out after the reduction in precision, we are left with storing this as extended.

If the result of all the math can be reduced, or if there is no math, then it's great to reduce precision, but if the reduction in precision happens before the math, you can end up with the opposite of what you intended. Sure, the compiler is working with faster math, but who cares what the compiler has to do? Now we're going to be stuck with a program using extended(2.0010627116630224) for any calculations that use Const_Ans instead of byte(2); if Const_Ans is used in some kind of iterative process, the program could be using this extended millions of times when it could have been using a byte.

Notice that when I do the EXACT same math with variables, it DOES give me a result of 2, and THAT can be reduced. If the answer after all the math can be reduced, it should be reduced; if it can't be, then it should not be. Math with constants should be the same as math with variables.

I'm trying to show there doesn't need to be a trade-off at all; the math with constants just needs to be done correctly... as in the exact same way math with variables is done. What has happened is that the math with constants was written and tested with the assumption that all constants would be full precision, because it was impossible for constants to be anything other than full precision, but now that is no longer the case and the math with constants isn't working correctly anymore. Either the math needs to happen before the reduction in precision or the math needs to be fixed so it works the same as math with variables. Either way there won't need to be a trade-off and everything will work the way everyone wants it to: performance when possible and precision when needed.

James
Re: [fpc-pascal] Floating point question
> As Jonas said, this would result in less efficient code, since all the math will then be done at full precision, which is slower.

I don't think I'm explaining it well. I'm saying that where there is an entire formula that the compiler needs to evaluate, what's happening now is that each term is being reduced in precision first, then the math happens, and the result is stored. If instead the compiler did all the math first, THEN ran the function that determines if the entire answer should be reduced in precision, then the math would work correctly. But we don't care how long it takes to do the math during the compile; the constants are only compiled once and stored in the executable. The reason to do all this is to make the executing program, which ends up using the constants over and over many times, more efficient; the speed of the compilation is irrelevant.

> As usual, it is a trade-off between size (=precision) and speed.

I agree with that, but only in the executing program, not the compiler.

James
Re: [fpc-pascal] Floating point question
> But, sorry, because we are talking about compile time math, performance (nanoseconds) in this case doesn't count, IMO.

That's what I thought at first, too. But then I started thinking about how to deal with it and stumbled upon difficulties very soon:

a) 8427.0 + 33.0 / 1440.0
An easy case: all constants, so do the calculation at highest precision and reduce it afterwards, if possible.

b) var_single + 33.0 / 1440.0
Should also be feasible by evaluating the constant expression first, then reducing it to single (if possible) and adding the variable in the end.

c) 8427.0 + var_double / 1440.0
Because of using the double-type variable here, constants should be treated as double even at the cost of performance due to not knowing whether the result will be assigned to a single or double.

d) 8427.0 + var_single / 1440.0
And this is the one I got to struggle with. And I can imagine this is the reason for the decision about how to handle decimal constants. My first approach would have been to implicitly use single precision values throughout the expression. This would mean to lose precision if the result will be assigned to a double-precision variable. One could say: "bad luck - if the programmer intended to get better precision, he should have used a double-precision variable as in case c". But this wouldn't be any better than the current state we have now.

Overall, I must admit that the choice ain't easy at all. In this situation, it might be a good choice to ask "what would other languages do here?". As far as I know about C, it treats constants as double-precision by default. You have to write "1.0f" if you explicitly want single precision. But I think it's too late for introducing yet another change.

Imho, the correct decision at FPC v2.2 would have been to keep the previous behavior and instruct those concerned about performance to use "{$MINFPCONSTPREC 32}" (or the "1.0f" notation) instead of requiring everyone to use "{$MINFPCONSTPREC 64}" to keep compatibility with previous releases.

Thomas
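For readers who want to try the directive mentioned above, a minimal sketch follows. It assumes that {$MINFPCONSTPREC 64} may be placed near the top of the program as shown; the program name and the variable d are made up for illustration. No particular output is claimed here, since that depends on the compiler version and target.

```pascal
program MinPrecDemo;

{$MINFPCONSTPREC 64}  // request at least 64-bit precision for FP constants

var
  d: Double;

begin
  // Same constant expression as in case a) of the thread
  d := 8427.0 + 33.0 / 1440.0;
  WriteLn(d:0:17);
end.
```

Compiling the same program once with the directive and once without it is a simple way to see whether the setting changes the printed value on a given installation.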
Re: [fpc-pascal] Floating point question
On 13-2-2024 at 11:39, Bernd Oppolzer via fpc-pascal wrote:
> But, sorry, because we are talking about compile time math, performance (nanoseconds) in this case doesn't count, IMO.

But compiled code is then probably upscaled to the higher type automatically too, since if one of the terms of an expression is of higher precision, then the whole expression is.
Re: [fpc-pascal] Floating point question
On 13.02.2024 at 10:54, Michael Van Canneyt via fpc-pascal wrote:
> On Tue, 13 Feb 2024, James Richters via fpc-pascal wrote:
>> Sorry for the kind of duplicate post, I submitted it yesterday morning and I thought it failed, so I re-did it and tried again.. then after that the original one showed up.
>>
>> A thought occurred to me. Since the compiler math is expecting that all the constants would be in full precision, the compiler math doesn't need to change; it's just that the reduction in precision is happening too soon. It's evaluating and reducing each term of an expression, then the math is happening, and the answer is not coming out right. If instead everything was left at full precision until after the compiler math (because this is what the compiler math expects), and then the final answer was reduced in precision where possible, then it would work flawlessly. So the reduction in precision function only needs to run once on the final answer, not on every term before the calculation.
>
> As Jonas said, this would result in less efficient code, since all the math will then be done at full precision, which is slower. As usual, it is a trade-off between size (=precision) and speed.
>
> Michael.

But, sorry, because we are talking about compile time math, performance (nanoseconds) in this case doesn't count, IMO.
Re: [fpc-pascal] Floating point question
On Tue, 13 Feb 2024, James Richters via fpc-pascal wrote:
> Sorry for the kind of duplicate post, I submitted it yesterday morning and I thought it failed, so I re-did it and tried again.. then after that the original one showed up.
>
> A thought occurred to me. Since the compiler math is expecting that all the constants would be in full precision, the compiler math doesn't need to change; it's just that the reduction in precision is happening too soon. It's evaluating and reducing each term of an expression, then the math is happening, and the answer is not coming out right. If instead everything was left at full precision until after the compiler math (because this is what the compiler math expects), and then the final answer was reduced in precision where possible, then it would work flawlessly. So the reduction in precision function only needs to run once on the final answer, not on every term before the calculation.

As Jonas said, this would result in less efficient code, since all the math will then be done at full precision, which is slower. As usual, it is a trade-off between size (=precision) and speed.

Michael.
Re: [fpc-pascal] Floating point question
Sorry for the kind of duplicate post, I submitted it yesterday morning and I thought it failed, so I re-did it and tried again.. then after that the original one showed up.

A thought occurred to me. Since the compiler math is expecting that all the constants would be in full precision, the compiler math doesn't need to change; it's just that the reduction in precision is happening too soon. It's evaluating and reducing each term of an expression, then the math is happening, and the answer is not coming out right. If instead everything was left at full precision until after the compiler math (because this is what the compiler math expects), and then the final answer was reduced in precision where possible, then it would work flawlessly. So the reduction in precision function only needs to run once on the final answer, not on every term before the calculation.

Sorry again for the duplicate.

James
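The idea of running the reduction check once, on the final answer only, can be sketched at run time as follows. This is not how the compiler is implemented; FitsInSingle is a hypothetical helper invented for this illustration, and it assumes that "can be reduced" means "survives a round trip through Single unchanged" and that the value lies within the range of Single.

```pascal
program ReduceAfterFolding;

// Hypothetical helper: True when x is unchanged by a round trip
// through Single. Assumes x is within the range of Single.
function FitsInSingle(x: Extended): Boolean;
var
  s: Single;
begin
  s := x;                   // round to single precision
  FitsInSingle := (s = x);  // s is widened again for the comparison
end;

var
  folded: Extended;

begin
  // Step 1: do all the math at the highest available precision.
  folded := Extended(8427.0) + Extended(33.0) / Extended(1440.0);

  // Step 2: only now decide whether the final answer can be reduced.
  if FitsInSingle(folded) then
    WriteLn('result could be stored as Single')
  else
    WriteLn('result needs more than Single precision');
end.
```

The point of the sketch is the ordering: the check runs on the finished value, not on 8427.0, 33.0 and 1440.0 individually.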
Re: [fpc-pascal] Floating point question
>> Overall, the intermediate float precision is a very difficult topic.

I agree it's a difficult topic; it all comes down to what your program is doing, and whether you need performance or precision.

>> And generate the slowest code possible on most platforms.

I can appreciate the need to reduce precision where it's possible for the sake of performance, especially when it won't make any difference. What makes it difficult is that there are many different reasons for wanting it one way or the other; it depends on the purpose of the program, and the compiler has no way to know what the purpose is. It occurs to me that one could want part of a program to be optimized for performance and another part of the same program to be optimized for precision. For example, if you are doing calculations to generate geometry, and also want to display the geometry on the screen, you would want maximum precision for the data you write out to a file, but since what you display on the screen will eventually become only integer values of pixels, you want to do that math as fast as possible, especially if you want to pan / zoom / rotate. Even though what the screen data is based on might be double precision or more, I can see how reducing its precision as fast as possible would be beneficial to increase performance.

So I'm trying to learn something. I agree it would be better to have performance where it's possible and precision when needed. But I just don't understand what is going on. I'm not trying to say that this reduction in precision should not be done; I understand the value in it. I'm trying to figure out why the math done with constants, where the compiler is doing the math, is not the same as when the program does the math with variables. If the solution is to typecast where needed to get the desired results, then why isn't it working the way I expect it to?
Below is a sample program. I'm not trying to make everything extended, in fact quite the opposite: there is no need for the input constants / variables to be Extended because they all fit perfectly in smaller data types, so I put them all into smaller datatypes as an example. I am defining constants explicitly and defining variables the exact same way, so I'm comparing apples to apples here. I have A as always an Integer, B as always a Byte, and C as always a single, with a value that fits in a single. My goal is to add the integer to a byte that's been divided by a single and get the result in Extended. When I do this with the variables, everything is as I expected; when I do this with constants, it's not as I expect. This is what I don't understand, and if this worked as expected then I think everyone would be happy. Whatever is happening for it to work correctly during program execution should also be happening when the compiler does the math.

The problem isn't that the constants got stored in lower precision; it's that they are somehow forcing the result of the calculation to also be at the lower precision and not re-evaluated after the math. It's completely legitimate to divide a low precision number by a low precision number and get a high precision result. It works with variables, so why doesn't it work with constants?

I suspect that what's happened is that there is something missing in the way the compiler does math, something that is not needed if it was always done at maximum precision, but that is needed with mixed precision. It's not the fact that the constants were reduced in precision; it's something to do with the way the math is done with constants of reduced precision that isn't being accounted for, and that is not necessary if calculating with full precision. It's not that the changes in 2.2 are the problem at all; it's that something else needed to be done at the same time that was missed.
The only way I can get the correct result when using constants is to re-cast ALL of them as extended: not just the ones involving division, and not the entire formula, but every single constant. This is what I don't understand.

>> The evaluation of the expression on the right of := does not know (and should not know) what the type is of the expression on the left.

Why can't the compiler do all the math at full precision and then evaluate only the result to see if that can be stored in a lower precision? If the expression on the right cannot and should not know the type on the left, then there is a good possibility that it's a high precision data type, and then there should be some provision to safeguard against data loss if the type is of high precision.

Why doesn't this work?

    JJ := Extended(A_Const+B_Const/C_Const);

It requires no knowledge of what is on the left. Why can't the math be done with high precision and the result be reduced to the smallest datatype? Math with low precision data types often results in high precision results. If I want to have a mixed program with portions in high precision and portion
Re: [fpc-pascal] Floating point question
It occurs to me that there is merit in reduction of precision to increase performance, and so I'm trying to learn how to do this correctly, but the thing that confuses me is that math with constants doesn't seem to be the same as math with variables, and I don't know why.

It also looks to me like when there is an expression such as:

    e := 8427.0 + 33.0 / 1440.0;

what is happening is that each term of the expression is evaluated individually to see if it can be reduced in precision, and then the math is carried out. But if the math was carried out at full precision first by the compiler, and THEN the entire answer was evaluated to see if it can be reduced in precision, the results would be what we are all expecting.

Regardless of that, however, when I am working with variables, an integer added to a byte that has been divided by a single results in an extended... it's legitimate to expect you could get an extended result from such an operation, just as dividing a byte by another byte could result in an extended answer. With variables, this seems to always be the case, but with constants, it does not seem to be the case. If constants just did the math the same as variables, then all this reduction in precision stuff would work flawlessly for everyone without re-casting everything.

Please consider the code below. I am comparing the results to what I get when I perform this math with the Windows Calculator. As you can see, no matter how I cast it, when using variables I get the expected answer, but when the compiler does the math, it's not working the same way. What seems to be happening with variables is that math on lower precision entities can produce higher precision results, while with constants the resulting precision is limited in some way, but in a way I don't understand, because it's being reduced to single precision, yet the lowest precision element is a byte.
In other words, with variables a byte / single is perfectly capable of producing an extended result, without re-casting. But with constants, doing the exact same thing forces the result to always be a single.

I don't think the real issue has anything to do with this reduction in precision at all; I think it has to do with whatever causes the compiler to do math differently than the executing program does with variables. I don't understand why I must individually re-cast every element of the equation to extended when using constants, while when I do the exact same thing with variables it's not necessary.

I am wondering if the way the compiler does the math, it is expecting that all constants would be full precision, and therefore the way it did the math before always came out right, but when the change was made in 2.2 to reduce the precision to variables, no corresponding adjustment was made to the way the compiler carries out math to compensate for the possibility that there was such a thing as a constant with reduced precision. So the compiler is doing math as if all input terms are at highest precision, therefore not needing to bother considering that the answer might be of higher precision than the input terms, but now that there is the possibility of the result being of higher precision, some adjustment to the way math is done by the compiler is necessary.

I just think if the compiler did all the math the same way the executing program does math with variables, then everything is solved for everyone... without any re-casting or unexpected results due to division, and while also preventing unnecessary precision. This has nothing to do with the reduction of precision; only the way the compiler is doing its calculations needs to be adjusted for this new situation. Just fixing the way the compiler does the math also requires no knowledge of the left side of the equation by the right.
The compiler just needs to do the calculations the same way as variables are calculated with the extra step of re-evaluating to see if the precision can be reduced when it's done.

James

    program Const_Vs_Var;
    Const
       A_const = Integer(8427);
       B_const = Byte(33);
       C_const = Single(1440.5);
       Win_Calc = 8427.0229087122526900381811870878;
       Const_Ans = A_Const+B_Const/C_Const;
    Var
       A_Var : Integer;
       B_Var : Byte;
       C_Var : Single;
       Const_Ans1, Const_Ans2, Const_Ans3, Var_Ans1, Var_Ans2, Var_Ans3 : Extended;
    Begin
       A_Var := A_Const;
       B_Var := B_Const;
       C_Var := C_Const;
       Var_Ans1 := A_Var+B_Var/C_Var;
       Const_Ans1 := A_Const+B_Const/C_Const;
       Var_Ans2 := Integer(A_Var)+Byte(B_Var)/Single(C_Var);
       Const_Ans2 := Integer(A_Const)+Byte(B_Const)/Single(C_Const);
       Var_Ans3 := Extended(A_Var)+Extended(B_Var)/Extended(C_Var);
       Const_Ans3 := Extended(A_Const)+Extended(B_Const)/Extended(C_Const);
       WRITELN ( ' Win_Calc = ', Win_Calc:20:20) ;
       WRITELN ( ' Const_Ans = ', Const_Ans:20:20 ,' Win_Calc-Const_Ans = ',Win_Calc-Const_Ans:20:20) ;
       WRITELN ( ' Const_Ans1 = ', Const_Ans1:20:20 ,' Win_Calc-Const_Ans1 = ',Win_Ca
Re: [fpc-pascal] Floating point question
In this example below, the performance argument does not count IMO, because the complete computation can be done at compile time. That's why IMO in all 3 cases the values on the right side should be computed with maximum precision (of course independent of the left side), and in an ideal world it should be done at compile time. But if not: anyway with max precision.

Tagging the FP constants with FP attributes like single, double and extended and then doing arithmetic on them which leads to MATHEMATICAL results which are unexpected is IMO wrong and would not be accepted in most other programming languages or compilers.

This is NOT about variables ... they have attributes and there you can explain all sorts of strange behaviour. It's about CONSTANT EXPRESSIONS (which can and should be evaluated at compile time, and the result should be the same, no matter if the evaluation is done at compile time or not).

That said: if you have arithmetic involving a single variable and a FP constant, say x + 1440.0, you don't need to handle this as an extended arithmetic IMO, if you accept my statement above. You can treat the 1440.0 as a single constant in this case, if you wish. It's all about context ...

Kind regards

Bernd

On 12.02.2024 at 10:44, Thomas Kurz via fpc-pascal wrote:
> I wouldn't say so. Or at least, not generally. Why can't the compiler do what the programmer intends to do:
>
>     var
>       s: single;
>       d: double;
>       e: extended;
>     begin
>       s := 8427.0 + 33.0 / 1440.0; // treat all constants as "single"
>       d := 8427.0 + 33.0 / 1440.0; // treat all constants as "double"
>       e := 8427.0 + 33.0 / 1440.0; // treat all constants as "extended"
>     end.
>
> Shouldn't this satisfy all the needs? Those caring for precision will work with double precision and don't have to take care for a loss in precision. Those caring for speed can use the single precision type and be sure that no costly conversion to double or extended will take place.
>
> - Original Message -
> From: Jonas Maebe via fpc-pascal
> To: fpc-pascal@lists.freepascal.org
> Sent: Sunday, February 11, 2024, 23:29:42
> Subject: [fpc-pascal] Floating point question
>
>> On 11/02/2024 23:21, Bernd Oppolzer via fpc-pascal wrote:
>>> and this would IMHO be the solution which is the easiest to document and maybe to implement and which would satisfy the users.
>>
>> And generate the slowest code possible on most platforms.
>>
>> Jonas
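Since the compiler does not look at the left side of the assignment, the intent expressed in the s/d/e example can instead be stated with explicit casts on the constants, which is the workaround mentioned elsewhere in this thread. The sketch below is only an illustration under that assumption: the program name is made up, and the precision actually used for the intermediate arithmetic depends on the target (on some platforms Extended is the same as Double).

```pascal
program ExplicitCasts;

var
  s: Single;
  e: Extended;

begin
  // Ask for single precision constants explicitly
  s := Single(8427.0) + Single(33.0) / Single(1440.0);

  // Ask for extended precision constants explicitly
  e := Extended(8427.0) + Extended(33.0) / Extended(1440.0);

  WriteLn('s = ', s:0:17);
  WriteLn('e = ', e:0:17);
end.
```

This is more verbose than the proposal quoted above, but it keeps the right side of := self-describing, which is the constraint mentioned in the thread.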