>>Overall, the intermediate float precision is a very difficult topic.

I agree it's a difficult topic. It all comes down to what your program is doing, and whether you need performance or precision.

>>And generate the slowest code possible on most platforms.

I can appreciate the need to reduce precision where possible for the sake of performance, especially when it won't make any difference. What makes it difficult is that there are many different reasons for wanting it one way or the other. It depends on the purpose of the program, and the compiler has no way to know what that purpose is.

It occurs to me that one could want part of a program optimized for performance and another part of the same program optimized for precision. For example, if you are doing calculations to generate geometry and also want to display that geometry on the screen, you would want maximum precision for the data you write out to a file. But since what you display on the screen eventually becomes integer pixel values, you want to do that math as fast as possible, especially if you want to pan / zoom / rotate. Even though the screen data might be based on double precision or more, I can see how reducing its precision as early as possible would help performance.

So I'm trying to learn something. I agree it would be better to have performance where it's possible and precision when it's needed. But I just don't understand what is going on. I'm not saying that this reduction in precision should not be done; I understand the value in it. I'm trying to figure out why math done with constants, where the compiler does the math, is not the same as math the program does with variables. If the solution is to typecast where needed to get the desired results, then why isn't it working the way I expect it to?

Below is a sample program. I'm not trying to make everything Extended, in fact quite the opposite. There is no need for the input constants / variables to be Extended because they all fit perfectly in smaller data types, so I put them all into smaller data types as an example. I am defining the constants explicitly and defining the variables the exact same way, so I'm comparing apples to apples here: A is always an Integer, B is always a Byte, and C is always a Single, with a value that fits in a Single.

My goal is to add the Integer to a Byte that's been divided by a Single and get the result in an Extended. When I do this with the variables, everything is as I expect; when I do this with constants, it's not. This is what I don't understand, and if this worked as expected then I think everyone would be happy. Whatever happens to make it work correctly during program execution should also happen when the compiler does the math.

The problem isn't that the constants got stored in lower precision; it's that they are somehow forcing the result of the calculation to also be at the lower precision, and the result is not re-evaluated after the math. It's completely legitimate to divide a low precision number by a low precision number and get a high precision result. It works with variables, so why doesn't it work with constants?

I suspect that there is something missing in the way the compiler does math, something that is not needed if the math is always done at maximum precision, but that is needed with mixed precision. It's not the fact that the constants were reduced in precision; it's something about the way math is done with reduced-precision constants that isn't being accounted for, and that isn't necessary when calculating at full precision. It's not the changes in 2.2 that are the problem at all; it's that something else needed to be done at the same time and was missed.

The only way I can get the correct result when using constants is to re-cast ALL of them as Extended: not just the ones involving division, and not the entire formula, but every single constant. This is what I don't understand.

>>The evaluation of the expression on the right of := does not know (and should not know) what the type is of the expression on the left.

Why can't the compiler do all the math at full precision and then evaluate only the result to see if it can be stored in a lower precision? If the expression on the right cannot and should not know the type on the left, then there is a good possibility that the left side is a high precision data type, and there should be some provision to safeguard against data loss in that case.

Why doesn't this work?

   JJ := Extended(A_Const+B_Const/C_Const);

It requires no knowledge of what is on the left. Why can't the math be done with high precision and the result then be reduced to the smallest data type that holds it? Math with low precision data types often produces high precision results.

If I want a mixed program, with some portions in high precision and other portions at the highest performance possible, then what is the correct way to accomplish the precision portions? Are we supposed to re-cast every constant at the highest precision in every formula to make sure we don't lose data? This doesn't need to be done with variables, so why does it need to be done with constants?

Please see my comments in the sample program. I hope it is readable, because sometimes e-mail breaks lines where I don't intend it to.

James

program Const_Vs_Var;

Const
   A_const = Integer(8427);
   B_const = Byte(33);
   C_const = Single(1440.5);

Var
   A_Var : Integer;
   B_Var : Byte;
   C_Var : Single;
   FF, GG, HH, II, JJ, KK, LL : Extended;

Begin
   A_Var := A_Const;
   B_Var := B_Const;
   C_Var := C_Const;

   // This is the baseline. The math done with variables comes out the way I expect it to.
   FF := A_Var+B_Var/C_Var;

   // This is just for emphasis that I am doing the math with the data types
   // explicitly defined, and I get the correct results.
   GG := Integer(A_Var)+Byte(B_Var)/Single(C_Var);

   // The result of this ONLY fits in an Extended, and the variable is Extended.
   // The constants are explicitly defined as above, so why is the precision of
   // the result reduced?
   HH := Integer(A_Const)+Byte(B_Const)/Single(C_Const);

   // Here I'm trying to specify that the result of the division should be
   // stored as an Extended.
   KK := A_Const+Extended(B_Const/C_Const);

   // I really expected this to work without all the typecasting, because the
   // constants are defined the way I want them to be.
   II := A_Const+B_Const/C_Const;

   // Here I am explicitly defining the result of the calculation to be Extended.
   // Why doesn't this work?
   JJ := Extended(A_Const+B_Const/C_Const);

   // This is what I need to do to get the results I want, but I don't understand
   // why. Why does the integer need to be converted to floating point here?
   LL := Extended(A_Const)+Extended(B_Const)/Extended(C_Const);

   WRITELN ( ' A_const = ',A_Const) ;          // A_const = 8427   //Integer
   WRITELN ( ' A_var = ',A_Var) ;              // A_var = 8427     //Integer
   WRITELN ( ' B_const = ',B_Const) ;          // B_const = 33     //Byte
   WRITELN ( ' B_var = ',B_Var) ;              // B_var = 33       //Byte
   WRITELN ( ' C_const = ',C_Const: 20 : 20 ) ; // C_const = 1440.50000000000000000000 //Single
   WRITELN ( ' C_var = ',C_Var: 20 : 20 ) ;     // C_var = 1440.50000000000000000000   //Single

   // FF = 8427.02290871225268987000 FF-FF = 0.00000000000000000000
   // This is what I expect.
   WRITELN ( ' FF = ',FF:20:20 ,' FF-FF = ',FF-FF:20:20) ;

   // GG = 8427.02290871225268987000 FF-GG = 0.00000000000000000000
   // This is what I expect.
   WRITELN ( ' GG = ',GG:20:20 ,' FF-GG = ',FF-GG:20:20) ;

   // HH = 8427.02246093750000000000 FF-HH = 0.00044777475268986677
   // I don't understand why this is different from GG.
   // It's an Int + Byte / Single and cast the same way.
   WRITELN ( ' HH = ',HH:20:20 ,' FF-HH = ',FF-HH:20:20) ;

   // II = 8427.02246093750000000000 FF-II = 0.00044777475268986677
   // I don't understand why this is different from FF. It's an Int + Byte / Single.
   WRITELN ( ' II = ',II:20:20 ,' FF-II = ',FF-II:20:20) ;

   // JJ = 8427.02246093750000000000 FF-JJ = 0.00044777475268986677
   // Why doesn't this casting work? I'm saying I want the result in an Extended.
   WRITELN ( ' JJ = ',JJ:20:20 ,' FF-JJ = ',FF-JJ:20:20) ;

   // KK = 8427.02290871180593967000 FF-KK = 0.00000000044675019240
   // Why is this off a little? I am casting the division to be Extended.
   WRITELN ( ' KK = ',KK:20:20 ,' FF-KK = ',FF-KK:20:20) ;

   // LL = 8427.02290871225268987000 FF-LL = 0.00000000000000000000
   // Why do I need to re-cast each constant as Extended? It's not what I really
   // want. I want to add an Integer to a Byte divided by a Single, do it
   // correctly, and store it as Extended.
   WRITELN ( ' LL = ',LL:20:20 ,' FF-LL = ',FF-LL:20:20) ;
End.

 A_const = 8427
 A_var = 8427
 B_const = 33
 B_var = 33
 C_const = 1440.50000000000000000000
 C_var = 1440.50000000000000000000
 FF = 8427.02290871225268987000 FF-FF = 0.00000000000000000000
 GG = 8427.02290871225268987000 FF-GG = 0.00000000000000000000
 HH = 8427.02246093750000000000 FF-HH = 0.00044777475268986677
 II = 8427.02246093750000000000 FF-II = 0.00044777475268986677
 JJ = 8427.02246093750000000000 FF-JJ = 0.00044777475268986677
 KK = 8427.02290871180593967000 FF-KK = 0.00000000044675019240
 LL = 8427.02290871225268987000 FF-LL = 0.00000000000000000000
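
One way to probe the suspicion that the constant expression is being evaluated in Single precision is to force that rounding by hand with variables. The sketch below is only an illustration under some assumptions: the names Full, Rounded, Quot, QuotExt and Mixed are made up for this test, C is declared Extended here so that the division is not carried out in Single on targets that evaluate Single expressions in Single precision, and on platforms where Extended is the same as Double the trailing digits will differ from the listing above. A Single has a 24-bit significand, so near 8427 its values are spaced 1/1024 apart, and the multiple of 1/1024 nearest to 8427.0229087... is 8427 + 23/1024 = 8427.0224609375, which is the value printed for HH, II and JJ.

```pascal
program SingleRoundingCheck;

var
  A: Integer;
  B: Byte;
  C: Extended;                      // Extended on purpose, see the note above
  Full, QuotExt, Mixed: Extended;
  Rounded, Quot: Single;

begin
  A := 8427;
  B := 33;
  C := 1440.5;

  // Whole expression at the widest precision available.
  Full := A + B / C;

  // Storing into a Single rounds the whole result to a 24-bit significand.
  Rounded := Full;

  // Round only the quotient to Single, then add at the wider precision.
  Quot := B / C;
  QuotExt := Quot;
  Mixed := A + QuotExt;

  WriteLn(' Full    = ', Full:20:20);
  WriteLn(' Rounded = ', Rounded:20:20);
  WriteLn(' Mixed   = ', Mixed:20:20);
  WriteLn(' Full - Rounded = ', (Full - Rounded):20:20);
  WriteLn(' Full - Mixed   = ', (Full - Mixed):20:20);
end.
```

If Rounded matches the HH value, that would suggest the whole constant expression was rounded to Single. If Mixed comes out close to the KK value, that would suggest that in the KK case only the division was rounded to Single before the addition.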
_______________________________________________
fpc-pascal maillist - fpc-pascal@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal