Re: [fpc-pascal] Floating point question

James Richters via fpc-pascal Sun, 04 Feb 2024 00:48:36 -0800

I don't understand it either. The result of 33/1440 is apparently being stored 
in single precision, but it shouldn't be.
If TT is Double or Extended, then every part of the evaluation of TT should be 
carried out the same way, whether it is evaluated by the compiler or by the 
program.  That is what I expect, but that is not what is happening.
 
program TESTDBL1 ;
 
Const
    TT_Const = 8427 + 33 / 1440.0 ;
    SS_Const = 8427 + Double(33 / 1440.0) ;
 
Var
    AA_Double : Double;
    BB_Double : Double;
    CC_Double : Double;
    DD_Double : Double;
    EE_Double : Double;
    FF_Double : Double;
    GG_Double : Double;
    HH_Double : Double;
    II_Double : Double;
    JJ_Double : Double;
    KK_Double : Double;
    SS_Double : Double;
    TT_Double : Double;
    VV_Single : Single;
    WW_Single : Single;
    XX_Single : Single;
    YY_Single : Single;
    ZZ_Single : Single;
 
begin
   AA_Double := 8427;
   BB_Double := 33/1440;
   CC_Double := AA_Double+BB_Double;
   DD_Double := 8427 + 33 / 1440.0 ;
   VV_Single := 8427;
   WW_Single := 33/1440;
   XX_Single := VV_Single+WW_Single;
   YY_Single := 8427 + 33 / 1440.0 ;
   ZZ_Single := DD_Double;
   EE_Double := Double(8427 + 33 / 1440.0) ;
   FF_Double := 8427 + Double(33 / 1440.0) ;
   GG_Double := Double(8427) + Double(33) / Double(1440.0) ;
   HH_Double := Double(8427 + Single(33 / 1440.0)) ;
   II_Double := 33;
   JJ_Double := 1440;
   KK_Double := AA_Double+II_Double/JJ_Double;
   SS_Double := SS_Const;
   TT_Double := TT_Const;
   
 
   WRITELN ( 'AA_Double := 8427;                                        =' , 
AA_Double : 20 : 20 ) ;
   WRITELN ( 'BB_Double := 33/1440;                                     =' , 
BB_Double : 20 : 20 ) ;
   WRITELN ( 'CC_Double := AA_Double+BB_Double;                         =' , 
CC_Double : 20 : 20 ) ;
   WRITELN ( 'DD_Double := 8427 + 33 / 1440.0 ;                         =' , 
DD_Double : 20 : 20 ) ;
   WRITELN ( 'VV_Single := 8427;                                        =' , 
VV_Single : 20 : 20 ) ;
   WRITELN ( 'WW_Single := 33/1440;                                     =' , 
WW_Single : 20 : 20 ) ;
   WRITELN ( 'XX_Single := VV_Single+WW_Single;                         =' , 
XX_Single : 20 : 20 ) ;
   WRITELN ( 'YY_Single := 8427 + 33 / 1440.0 ;                         =' , 
YY_Single : 20 : 20 ) ;
   WRITELN ( 'ZZ_Single := DD_Double;                                   =' , 
ZZ_Single : 20 : 20 ) ;
   WRITELN ( 'EE_Double := Double(8427 + 33 / 1440.0) ;                 =' , 
EE_Double : 20 : 20 ) ;
   WRITELN ( 'FF_Double := 8427 + Double(33 / 1440.0) ;                 =' , 
FF_Double : 20 : 20 ) ;
   WRITELN ( 'GG_Double := Double(8427) + Double(33) / Double(1440.0) ; =' , 
GG_Double : 20 : 20 ) ;
   WRITELN ( 'HH_Double := Double(8427 + Single(33 / 1440.0)) ;         =' , 
HH_Double : 20 : 20 ) ;
   WRITELN ( 'KK_Double := AA_Double+II_Double/JJ_Double;               =' , 
KK_Double : 20 : 20 ) ;
   WRITELN ( 'TT_Const = 8427 + 33 / 1440.0 ;                           =' , 
TT_Const  : 20 : 20 ) ;
   WRITELN ( 'SS_Const = 8427 + Double(33 / 1440.0);                    =' , 
SS_Const  : 20 : 20 ) ;
   WRITELN ( 'TT_Double := TT_Const;                                    =' , 
TT_Double : 20 : 20 ) ;
   WRITELN ( 'SS_Double := SS_Const;                                    =' , 
SS_Double : 20 : 20 ) ;
end.
 
AA_Double := 8427;                                        =8427.00000000000000000000
BB_Double := 33/1440;                                     =0.02291666666666666500
CC_Double := AA_Double+BB_Double;                         =8427.02291666666680000000
DD_Double := 8427 + 33 / 1440.0 ;                         =8427.02246093750000000000
VV_Single := 8427;                                        =8427.00000000000000000000
WW_Single := 33/1440;                                     =0.02291666716000000000
XX_Single := VV_Single+WW_Single;                         =8427.02246100000000000000
YY_Single := 8427 + 33 / 1440.0 ;                         =8427.02246100000000000000
ZZ_Single := DD_Double;                                   =8427.02246100000000000000
EE_Double := Double(8427 + 33 / 1440.0) ;                 =8427.02246093750000000000
FF_Double := 8427 + Double(33 / 1440.0) ;                 =8427.02291666716340000000
GG_Double := Double(8427) + Double(33) / Double(1440.0) ; =8427.02291666666680000000
HH_Double := Double(8427 + Single(33 / 1440.0)) ;         =8427.02246093750000000000
KK_Double := AA_Double+II_Double/JJ_Double;               =8427.02291666666680000000
TT_Const = 8427 + 33 / 1440.0 ;                           =8427.02246100000000000000
SS_Const = 8427 + Double(33 / 1440.0);                    =8427.02291666716340000000
TT_Double := TT_Const;                                    =8427.02246093750000000000
SS_Double := SS_Const;                                    =8427.02291666716340000000
 
I would actually expect values that are calculated by the compiler to ALWAYS 
be computed in extended, with only the final answer reduced to fit into a 
smaller variable. 
If this were the case, then all of the Double results would be 8427.0229…   
This may be debatable, but certainly when the result is to be stored in a 
double, all operations calculated by the compiler should also be carried out 
in doubles. I don't see how anything else could be argued to be correct.
This is not the case at all, or DD, EE, FF, and GG would all be 8427.0229…, but 
only FF and GG are, because there I explicitly cast the division (FF) or each 
operand (GG) to a double.
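 
The printed values fit a simple reading, sketched below with run-time variables so that no compiler constant folding is involved. Single values near 8427 are spaced 2^-10 (about 0.00098) apart, so the nearest Single to 8427.0229166… is 8427.0224609375, which is exactly what DD and EE show. FF matches a quotient that was rounded to Single and only then widened to Double, and GG matches the all-Double path. The variable names are made up for this sketch, and it only mimics the roundings; it is not a statement about what the compiler does internally.

```pascal
program SinglePathSketch;
{$mode objfpc}

var
  Base, Num, Den : Double;   // operands held in Double variables
  QuotD, SumD    : Double;
  QuotS, SumS    : Single;
  LikeDD, LikeFF : Double;

begin
  Base := 8427;
  Num  := 33;
  Den  := 1440;

  // All-Double path: corresponds to CC, GG and KK.
  QuotD := Num / Den;
  SumD  := Base + QuotD;

  // Quotient rounded to Single, then added in Double: corresponds to FF.
  QuotS  := QuotD;
  LikeFF := Base + QuotS;

  // The sum itself rounded to Single, then widened: corresponds to DD and EE.
  SumS   := SumD;
  LikeDD := SumS;

  WriteLn('all Double      = ', SumD   : 0 : 10);
  WriteLn('Single quotient = ', LikeFF : 0 : 10);
  WriteLn('Single sum      = ', LikeDD : 0 : 10);
end.
```

If the reading is right, the three lines should end in ...0229166667, ...0229166672 and ...0224609375 respectively.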
 
When the program executes and does the math at run time, as in the examples of 
BB, CC, and KK, it’s always correct, but when the compiler evaluates the 
expression, it does it wrong, storing portions of the calculation in a single 
even if the final result is a double. 
The compiler should ALWAYS use the highest precision possible, because the 
result can still be stored in reduced-precision variables, but once it’s been 
butchered by low precision, it can’t be fixed. 
 
Constants are also evaluated wrong. You don’t know what a constant is going 
to be used for, so all steps of evaluating a constant MUST be done in extended 
by the compiler, or the answer is just wrong. 
TT_Const and SS_Const should have been the same, so that when assigned to the 
double variables TT_Double and SS_Double they would also be the same.   
TT_Double and TT_Const are wrong.
 
I think this is a legitimate bug you have discovered.  I shouldn’t have to cast 
the division; it’s not what any user would expect to need to do. 
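 
One thing that may be worth trying instead of casts: FPC documents a {$MINFPCONSTPREC} directive (command-line form -CF) that sets the minimum precision used for floating point constants. The sketch below assumes that a value of 64 makes the compiler treat 1440.0 as at least a Double; whether that changes the results with the compiler build listed below is an assumption to check, not something established in this thread.

```pascal
program MinPrecSketch;
{$mode objfpc}
{$MINFPCONSTPREC 64}   // assumption: float constants get at least Double precision

const
  TT_Const = 8427 + 33 / 1440.0;

var
  TT_Double : Double;

begin
  TT_Double := 8427 + 33 / 1440.0;
  WriteLn('TT_Const  = ', TT_Const  : 20 : 20);
  WriteLn('TT_Double = ', TT_Double : 20 : 20);
end.
```

Comparing this output with and without the directive would show whether the constant's precision is what drives the 8427.0224… result.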
 
My tests were done on a Windows 10 64 bit machine with FPC Win32.
■ Free Pascal IDE Version 1.0.12 [2023/06/26]
■ Compiler Version 3.3.1-12875-gadf843196a



James
 
-----Original Message-----
From: fpc-pascal <fpc-pascal-boun...@lists.freepascal.org> On Behalf Of Thomas 
Kurz via fpc-pascal
Sent: Friday, February 2, 2024 4:37 PM
To: FPC-Pascal users discussions <fpc-pascal@lists.freepascal.org>
Cc: Thomas Kurz <fpc.2...@t-net.ruhr>
Subject: Re: [fpc-pascal] Floating point question
 
Well, 8427.0229...., that's what I want.
 
But what I get is 8427.0224....
 
And that's what I don't understand.
 
 
 
----- Original Message -----
From: Bernd Oppolzer via fpc-pascal <fpc-pascal@lists.freepascal.org>
To: Bart via fpc-pascal <fpc-pascal@lists.freepascal.org>
Sent: Sunday, January 28, 2024, 10:13:07
Subject: [fpc-pascal] Floating point question
 
To simplify the problem further:
 
the addition of 12 /24.0 and the subtraction of 0.5 should be removed, IMO, 
because both can be done with floats without loss of precision (0.5 can be 
represented exactly in float).
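 
A small check of that claim, as a sketch with made-up variable names: 0.5 is a power of two, so it has the same value in Single and Double, and 8426.5 is exactly representable, so subtracting and re-adding 0.5 cannot lose anything.

```pascal
program HalfIsExact;
{$mode objfpc}

var
  HalfS : Single;
  HalfD, Base, RoundTrip : Double;

begin
  HalfS := 0.5;          // 0.5 = 2^-1 fits exactly in a Single
  HalfD := 0.5;
  Base  := 8427;

  // 8426.5 is exactly representable, so this round trip is lossless.
  RoundTrip := (Base - HalfD) + HalfD;

  WriteLn('Single 0.5 equals Double 0.5: ', HalfS = HalfD);
  WriteLn('(8427 - 0.5) + 0.5 = 8427:    ', RoundTrip = Base);
end.
```

Both lines should print TRUE, which is why only the 33 / 1440.0 term matters for the discrepancy.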
 
So the problem can be reproduced IMO with this small Pascal program:
 
program TESTDBL1 ;
 
var TT : REAL ;
 
begin (* MAIN PROGRAM *)
   TT := 8427 + 33 / 1440.0 ;
   WRITELN ( 'tt=' , TT : 20 : 20 ) ;
end (* MAIN PROGRAM *) .
 
With my compiler, REAL is always DOUBLE, and the computation is carried out by 
a P-Code interpreter (or call it just-in-time compiler - much like Java), which 
is written in C.
 
The result is:
 
tt=8427.02291666666678790000
 
and it is the same, no matter if I use this simplified computation or the 
original
 
tt := (8427 - 0.5) + (12 / 24.0) + (33 / 1440.0);
 
My value (the middle line) is between the two other values:
 
tt=8427.02291666666680000000
tt=8427.02291666666678790000
ee=8427.02291666666666625000
 
The problem now is:
 
the printout of my value suggests an accuracy which in fact is not there, 
because with double, you can trust only the first 16 decimal digits ... after 
that, everything is speculative, a.k.a. wrong. That's why FPC, IMO, rounds at 
this place, prints the 8, and then only zeroes.
 
The extended format internally has more hex digits and therefore can reliably 
show more decimal digits.
But the last two are wrong, too (in the exact value the 6 repeats forever).
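 
To put a number on "not there", here is a minimal sketch that measures the spacing between adjacent Double values near 8427 by halving a step until adding half of it no longer changes the value. The names are made up for illustration, and the intermediate result is stored in a Double variable on purpose so that a wider x87 register does not hide the rounding.

```pascal
program DoubleSpacing;
{$mode objfpc}

var
  Base, Step, Probe : Double;

begin
  Base := 8427;
  Step := 1;
  // Halve Step for as long as adding half of it still changes Base.
  repeat
    Probe := Base + Step / 2;
    if Probe <> Base then
      Step := Step / 2;
  until Probe = Base;
  WriteLn('spacing of Double values near 8427 is about ', Step);
end.
```

The loop should stop at 2^-39, roughly 1.8E-12, so with four digits before the decimal point only about twelve decimals after it carry information; the remaining digits of a :20:20 printout are padding.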
 
HTH,
kind regards
 
Bernd
 
 
 
On 27.01.2024 at 22:53, Bart via fpc-pascal wrote:
> On Sat, Jan 27, 2024 at 6:23 PM Thomas Kurz via fpc-pascal 
> <fpc-pascal@lists.freepascal.org> wrote:
 
>> Hmmm... I don't think I can understand that. If the precision of "double" 
>> were that bad, it wouldn't be possible to store dates up to a precision of 
>> milliseconds in a TDateTime. I have a discrepancy of 40 seconds here.
> Consider the following simplified program:
> ====
> var
>    tt: double;
>    ee: extended;
 
> begin
>    tt := (8427 - Double(0.5)) + (12/ Double(24.0)) +
> (33/Double(1440.0)) + (0/Double(86400.0));
>    ee := (8427 - Extended(0.5)) + (12/ Extended(24.0)) +
> (33/Extended(1440.0)) + (0/Extended(86400.0));
>    writeln('tt=',tt:20:20);
>    writeln('ee=',ee:20:20);
> end.
> ===
 
> Now see what it outputs:
 
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>fpc test.pas
> Free Pascal Compiler version 3.2.2 [2021/05/15] for i386 ...
 
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
> tt=8427.02291666666680000000
> ee=8427.02291666666666625000
 
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>fpc -Px86_64 test.pas
> Free Pascal Compiler version 3.2.2 [2021/05/15] for x86_64 ..
 
> C:\Users\Bart\LazarusProjecten\ConsoleProjecten>test
> tt=8427.02291666666680000000
> ee=8427.02291666666680000000
 
> On Win64 both values are the same, because there Extended = Double.
> On Win32 the Extended version is a bit closer to the exact solution:
> 8427 - 1/2 + 1/2 + 33/1440 = 8427 + 11/480
 
> Simple as that.
 
> Bart
> _______________________________________________
> fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
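
Regarding Bart's remark that Extended equals Double on Win64: a quick way to see which situation a given target is in is to compare the sizes of the two types. This is only a sketch; the exact size reported for Extended depends on the target, so the program compares rather than assuming a particular number.

```pascal
program ExtendedSize;
{$mode objfpc}

begin
  // Where the target supports the 80-bit x87 type, Extended is larger
  // than Double; where it is an alias for Double, both sizes match.
  WriteLn('SizeOf(Double)   = ', SizeOf(Double));
  WriteLn('SizeOf(Extended) = ', SizeOf(Extended));
  if SizeOf(Extended) = SizeOf(Double) then
    WriteLn('Extended has the same size as Double on this target')
  else
    WriteLn('Extended is wider than Double on this target');
end.
```

Compiling it once for i386 and once with -Px86_64 on Windows should reproduce the difference between the two ee values quoted above.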

