* Jonathan S. Shapiro:

>>> Second, a 64-bit single-precision unsigned integer has far more precision
>>> than a 64-bit double precision float.
>>
>> But apparently not enough.

> Sorry; that's just not true. A double precision float effectively has a 53
> bit significand (52-bits plus an implicit "1" in the most significant bit
> position). Though IEEE doesn't specify how rounding modes apply to
> string->double conversion, even the obscure IEEE rounding modes can be done
> with an additional 3 bits of internal precision. So yes, a 64-bit unsigned
> integer really *does* have enough bits.

I suppose the routines you are looking at can only deal correctly with
a restricted set of inputs (probably those generated by the
corresponding binary-to-decimal conversion).  But they fail on
arbitrary-precision decimal input.

Here's why I think my counterexample is correct.

The two floating point values have the IEEE representation
3FEFFFFFFFFFFFFD and 3FEFFFFFFFFFFFFE.

3FEFFFFFFFFFFFFD is the rational number 2**(0x3fe - 1023) *
0x1ffffffffffffd * 2**(-52).  I'm using CLISP's rational
arithmetic (which is exact):

(setq D
  (* (expt 2 (- #x3fe 1023)) #x1ffffffffffffd (expt 2 -52)))

CLISP evaluates this to 9007199254740989/9007199254740992.

Similarly, 3FEFFFFFFFFFFFFE is this rational number:

(setq E
  (* (expt 2 (- #x3fe 1023)) #x1ffffffffffffe (expt 2 -52)))
  --> 4503599627370495/4503599627370496
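
For anyone without CLISP at hand, the same decoding can be cross-checked in Python. This is only a sketch: the helper name double_bits_to_fraction is made up for illustration, and it handles positive normal doubles only, which is all this example needs.

```python
import struct
from fractions import Fraction

def double_bits_to_fraction(hex_bits: str) -> Fraction:
    """Decode a 16-hex-digit IEEE 754 binary64 pattern into an exact rational.

    Hypothetical helper; positive normal numbers only.
    """
    bits = int(hex_bits, 16)
    exponent = (bits >> 52) & 0x7FF        # biased exponent field
    stored = bits & ((1 << 52) - 1)        # 52 stored significand bits
    assert bits >> 63 == 0 and 0 < exponent < 0x7FF, "positive normals only"
    significand = (1 << 52) | stored       # restore the implicit leading 1
    return Fraction(significand) * Fraction(2) ** (exponent - 1023 - 52)

D = double_bits_to_fraction("3FEFFFFFFFFFFFFD")
E = double_bits_to_fraction("3FEFFFFFFFFFFFFE")
print(D)  # 9007199254740989/9007199254740992
print(E)  # 4503599627370495/4503599627370496

# Independent check: let the hardware format decode the same bytes.
as_float = struct.unpack(">d", bytes.fromhex("3FEFFFFFFFFFFFFD"))[0]
assert Fraction(as_float) == D
```

Fraction(as_float) is exact for any finite double, so the last assertion confirms that the hand decoding agrees with the machine's own interpretation of the bit pattern.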

0.999999999999999722444243843710864894092082977294921875
is the rational number
999999999999999722444243843710864894092082977294921875/1000000000000000000000000000000000000000000000000000000:

(setq IN875
  999999999999999722444243843710864894092082977294921875/1000000000000000000000000000000000000000000000000000000)
  --> 18014398509481979/18014398509481984

0.99999999999999972244424384371086489409208297729492187 is this:

(setq IN87
  99999999999999972244424384371086489409208297729492187/100000000000000000000000000000000000000000000000000000)

The distances from IN87 to D and E are

(abs (- IN87 D))
  --> 2775557561562891351059079170227050781/50000000000000000000000000000000000000000000000000000
(abs (- IN87 E))
  --> 5551115123125782702118158340454101563/100000000000000000000000000000000000000000000000000000

The distance to D is smaller:

(< (abs (- IN87 D)) (abs (- IN87 E)))
  --> T

For IN875, the picture is this:

(abs (- IN875 D))
  --> 1/18014398509481984
(abs (- IN875 E))
  --> 1/18014398509481984

So IN875 is right in the middle between D and E, and the correct
conversion result depends on the rounding mode.  This is rather
unexpected.  (But not too surprising in retrospect, considering how I
came up with the number.)  I guess it explains why you got different
results when converting this number.
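
The tie can also be confirmed without CLISP. The following is a small sketch using Python's fractions module; the variable names mirror the ones above, and D and E are written directly as (2**53 - 3)/2**53 and (2**53 - 2)/2**53, which is my own shorthand for the two bit patterns.

```python
from fractions import Fraction

D = Fraction(2**53 - 3, 2**53)   # 3FEFFFFFFFFFFFFD
E = Fraction(2**53 - 2, 2**53)   # 3FEFFFFFFFFFFFFE

# Fraction parses a decimal string exactly, with no rounding.
IN875 = Fraction("0.999999999999999722444243843710864894092082977294921875")

midpoint = (D + E) / 2
print(IN875 == midpoint)                       # True
print(abs(IN875 - D) == abs(IN875 - E))        # True
print(abs(IN875 - D) == Fraction(1, 2**54))    # True
```

Since the input sits exactly on the midpoint, only the rounding rule decides between D and E. Under round-half-to-even the result would be E, whose significand is even.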

So let's introduce another number,
0.999999999999999722444243843710864894092082977294921876:

(setq IN876
  999999999999999722444243843710864894092082977294921876/1000000000000000000000000000000000000000000000000000000)
  --> 249999999999999930611060960927716223523020744323730469/250000000000000000000000000000000000000000000000000000

The distances are:

(abs (- IN876 D))
  --> 27755575615628913510590791702270507813/500000000000000000000000000000000000000000000000000000
(abs (- IN876 E))
  --> 6938893903907228377647697925567626953/125000000000000000000000000000000000000000000000000000

This time, the distance to E is shorter:

(< (abs (- IN876 D)) (abs (- IN876 E)))
  --> NIL

So, to recap, the correct conversion for
0.99999999999999972244424384371086489409208297729492187 is
0.9999999999999997, and for
0.999999999999999722444243843710864894092082977294921876, it's
0.9999999999999998.  For
0.999999999999999722444243843710864894092082977294921875, either
result is correct, depending on the rounding mode.  These numbers
agree in the first 53 decimal places, not just the first 18
representable in a 64-bit integer.
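
One way to see the recap in action is to feed the three strings to a conversion routine that is believed to round correctly. The sketch below uses CPython's float(), on the assumption that it performs correctly rounded string-to-double conversion with round-half-to-even (as it does on common IEEE 754 platforms); the dictionary name inputs is just for illustration.

```python
inputs = {
    "IN87":  "0.99999999999999972244424384371086489409208297729492187",
    "IN875": "0.999999999999999722444243843710864894092082977294921875",
    "IN876": "0.999999999999999722444243843710864894092082977294921876",
}

for name, text in inputs.items():
    value = float(text)                 # assumed correctly rounded
    print(name, repr(value), value.hex())
```

Under that assumption IN87 should come out as 0x1.ffffffffffffdp-1 (D) and IN876 as 0x1.ffffffffffffep-1 (E). IN875 is the tie and goes to E under round-half-to-even.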

I suppose if you have shorter inputs, using 64-bit arithmetic can work
(but I'd rather have a formal proof of that).  But you still need
multi-precision arithmetic for longer inputs.  You cannot simply
truncate the number by discarding all the remaining digits once you
hit the 64-bit overflow.
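
To make the truncation problem concrete, here is a sketch in Python with exact rationals. The helpers nearest and truncate_digits are hypothetical names, and the cutoff of 19 fractional digits is my assumption for "what still fits in a 64-bit unsigned integer"; nearest is only meaningful for inputs lying between D and E.

```python
from fractions import Fraction

D = Fraction(2**53 - 3, 2**53)   # 3FEFFFFFFFFFFFFD
E = Fraction(2**53 - 2, 2**53)   # 3FEFFFFFFFFFFFFE

def nearest(x: Fraction) -> Fraction:
    """Return whichever of D or E is closer to x (a tie goes to E)."""
    return D if abs(x - D) < abs(x - E) else E

def truncate_digits(text: str, digits: int = 19) -> str:
    """Keep only the first `digits` fractional digits and discard the rest."""
    whole, frac = text.split(".")
    return whole + "." + frac[:digits]

IN87 = "0.99999999999999972244424384371086489409208297729492187"
IN876 = "0.999999999999999722444243843710864894092082977294921876"

# Both inputs collapse to the same 19-digit prefix ...
print(truncate_digits(IN87) == truncate_digits(IN876))       # True
# ... so a truncating converter must give both the same answer,
print(nearest(Fraction(truncate_digits(IN876))) == D)        # True
# although the full-precision inputs round to different doubles.
print(nearest(Fraction(IN87)) == D, nearest(Fraction(IN876)) == E)
```

The 19-digit prefix 9999999999999997224 still fits in an unsigned 64-bit integer, yet it carries no information about which side of the midpoint the full input lies on.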
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
