Re: Numbers with small mantissa widths

Marc Nieper-Wißkirchen Mon, 02 Sep 2024 08:57:25 -0700

Dear Will,

Thank you for your most recent post; I find it very helpful.  For the
possible benefit of other readers as well, let me rephrase what you call the
"double rounding problem", which can be explained without large numbers:


Let M and N be two sets of (real) numbers into which a real number x is to
be rounded.  E.g., M and N can be the sets of floating-point numbers with
mantissa widths m and n.  Let us assume that M is a subset of N.  If
x = y - eps, where y is in N and eps > 0 is sufficiently small, rounding x
to N will yield y.  Assume further that y is not in M but that y - d and
y + d are its nearest neighbours in M.  A rounding algorithm from N to M then
has to decide, with no further information, whether to map y to y + d or to
y - d, both being equidistant from y.  Now, a direct rounding of x to M
gives y - d.  The two-step rounding, first to N and then to M, achieves this
only if the rounding algorithm from N to M maps y to y - d.  This does not
work universally, however: had we started with x = y + eps, the algorithm
would have to map y to y + d instead.

The simplest example is probably obtained when M and N are the sets of
floating-point numbers with mantissa widths 1 and 2.  Let x = 1011 in
binary notation.  Rounding to N gives y = 1100, and we have d = 0100.  So
rounding y to M gives either 10000 (round to even or round toward
+infinity) or 1000 (round toward zero).  A direct rounding of x = 1011 to M,
however, gives 1000.  Finally, play the same game with x = 1101, for which
the direct rounding to M gives 10000.
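The binary example can be replayed with exact rational arithmetic.  The
sketch below is only illustrative: `round_bits` is a hypothetical helper
(not part of any Scheme or Python library) that rounds a positive rational
to p significant bits, with the tie rule passed in as a parameter.

```python
from fractions import Fraction

def round_bits(x, p, tie="even"):
    """Round a positive rational x to the nearest number with p significant bits.

    Exact midpoints are resolved by `tie`: "even" (increment an odd truncated
    significand), "up" (toward +infinity) or "zero" (toward zero).
    """
    x = Fraction(x)
    # Find e with 2**e <= x < 2**(e+1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)  # spacing of p-bit numbers in [2**e, 2**(e+1))
    q = x // ulp                      # truncated significand (an int)
    r = x - q * ulp                   # remainder, 0 <= r < ulp
    if 2 * r > ulp or (2 * r == ulp and (tie == "up" or (tie == "even" and q % 2 == 1))):
        q += 1
    return q * ulp

for x in (0b1011, 0b1101):
    y = round_bits(x, 2)       # first step: round to N (2 bits)
    direct = round_bits(x, 1)  # single rounding straight to M (1 bit)
    print(f"x = {x:b}: y = {int(y):b}, direct = {int(direct):b}")
    for tie in ("even", "up", "zero"):
        print(f"    via N, ties {tie}: {int(round_bits(y, 1, tie)):b}")
```

For x = 1011 the two-step result is wrong with ties to even or toward
+infinity (10000 instead of 1000); for x = 1101 it is wrong with ties toward
zero (1000 instead of 10000).  No fixed tie rule from N to M fixes both.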

While the "double rounding" problem is, according to the specialist, the
main reason why converting decimal scientific notation to binary floating
point is non-trivial, it is noteworthy that the problem itself is
independent of any decimal (or other) printed representation.

One might want to explain the "double rounding" problem by the
discontinuity of the rounding function.  In fact, when the numbers in M and
N are sufficiently evenly spaced, remembering a sign, i.e., writing y + 0 or
y - 0 with a signed zero instead of just y, is enough to make double
rounding equivalent to single rounding, which I make more precise below.
Incidentally, one can see this as a hint that the IEEE floating-point
format is unbalanced in a certain sense: since branch cuts, for example,
can appear anywhere in the complex plane, not only 0 but every
floating-point number should remember whether it was obtained as a limit
from below or from above.
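To see the signed-zero remark at work, here is a sketch with a hypothetical
helper `round_signed`: the first rounding also reports on which side of y
the input lay (the "y + 0" or "y - 0"), and the second rounding uses that
side only to break exact ties.  The widths m = 2 and n = 4 and the range of
test values are arbitrary choices for illustration.

```python
from fractions import Fraction

def round_signed(x, p, side=0):
    """Round a positive rational x to p significant bits, ties to even.

    `side` plays the role of a signed zero attached to x: -1 means the true
    value lies just below x, +1 just above, 0 means x is exact.  It is only
    used to break exact ties.  Returns (y, side'), where side' records on
    which side of y the true value lies.
    """
    x = Fraction(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    if 2 * r > ulp or (2 * r == ulp and (side > 0 or (side == 0 and q % 2 == 1))):
        q += 1
    y = q * ulp
    if y != x:
        side = 1 if x > y else -1
    return y, side

m, n = 2, 4
xs = [Fraction(k, 64) for k in range(64, 513)]  # 1 <= x <= 8
plain_mismatches = signed_mismatches = 0
for x in xs:
    direct, _ = round_signed(x, m)
    y, side = round_signed(x, n)
    if round_signed(y, m)[0] != direct:        # forget the side
        plain_mismatches += 1
    if round_signed(y, m, side)[0] != direct:  # remember it
        signed_mismatches += 1
print(plain_mismatches, signed_mismatches)
```

Forgetting the side produces mismatches (x = 5 + 1/64 goes to 5 and then
to 4, whereas direct rounding gives 6); remembering it removes all of them
in this range.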

Let me now make the previous remark more precise with the following
proposition about binary floating-point approximations.

Proposition: Let x be a real number.  Let m < n be natural numbers.  Let y
be a rounding of x to n significant bits, i.e., a number with n significant
bits nearest to x (ties may be resolved either way).  Then there exists a
rounding z of y to m significant bits such that z is also a rounding of x
to m significant bits.

Proof: Let M and N be the sets of numbers with m and n significant bits,
respectively.  There exist t0 and t1 in M such that M does not intersect
the open interval ]t0, t1[ and such that x is in [t0, t1].  We distinguish
two cases.  In the first case, N does not intersect ]t0, t1[.  Then any
rounding of x to n significant bits is a rounding of x to m significant
bits and vice versa (and equal to t0 or t1).  In the second case, N does
intersect ]t0, t1[.  Then the spacing of N on [t0, t1] is (t1 - t0)/2^k for
some k >= 1, so the midpoint t = (t0 + t1)/2 is in N.  If x is in [t0, t],
then, because t0 and t are both in N, we have y in [t0, t].  It follows
that t0, which is a rounding of x to m significant bits, is also a rounding
of y to m significant bits.  If x is in [t, t1], we argue similarly.
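The Proposition can also be sanity-checked by brute force on small widths.
This is a finite check over an arbitrarily chosen grid of dyadic rationals,
not a proof.  `roundings` and `proposition_holds` are hypothetical helpers;
`roundings` returns every nearest p-bit number, so both tie choices are
covered.

```python
from fractions import Fraction

def roundings(x, p):
    """Return the set of all nearest p-bit numbers to a positive rational x
    (it has two elements exactly when x is a midpoint)."""
    x = Fraction(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    lo, hi = q * ulp, (q + 1) * ulp
    if r == 0 or 2 * r < ulp:
        return {lo}
    if 2 * r > ulp:
        return {hi}
    return {lo, hi}

def proposition_holds(m, n, xs):
    """For every x and every rounding y of x to n bits, some rounding of y
    to m bits must also be a rounding of x to m bits."""
    for x in xs:
        direct = roundings(x, m)
        for y in roundings(x, n):
            if not (roundings(y, m) & direct):
                return False
    return True

xs = [Fraction(k, 128) for k in range(128, 1025)]  # 1 <= x <= 8
ok = all(proposition_holds(m, n, xs) for m in range(1, 4) for n in range(m + 1, 7))
print(ok)
```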

After these preliminary remarks, let me finally come to the two claims in
Will's previous email.

Claim 1 is about how SRFI 77/R6RS is to be read.  Unless I misinterpret
anything in Will's text, Claim 1 is more or less literally what is in the
two documents, is consistent with Mike's paper, where the x|p idea came
from, and is also how Mike, as the author of SRFI 77 responsible for the
x|p notation, reads the text.  I therefore stand by Claim 1.

I would also like to add that the term "double rounding" might be slightly
misleading in this context.  In the notation 1.1|54, the first rounding,
from the mathematical number 1.1 to the mathematical number actually
represented by 1.1|54, is of a slightly different semantic quality than the
rounding of the mathematical number 1.1|54 to an IEEE double.  The precise
decimal number 1.1 plays no role in the lexical notation: any other decimal
number x for which the mathematical number x|54 is the same as 1.1|54
denotes the same number.
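As a small illustration, the sketch below computes the mathematical number
denoted by 1.1|54 exactly, using a hypothetical helper `round_bits` (ties to
even by default).  The second decimal, 1.1000000000000000000001, is an
arbitrary "other decimal number" with the same 54-bit rounding.

```python
from fractions import Fraction

def round_bits(x, p, tie="even"):
    """Round a positive rational x to p significant bits; `tie` is
    "even", "up" (toward +infinity) or "zero" (toward zero)."""
    x = Fraction(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    if 2 * r > ulp or (2 * r == ulp and (tie == "up" or (tie == "even" and q % 2 == 1))):
        q += 1
    return q * ulp

x = Fraction("1.1")
x54 = round_bits(x, 54)                   # the number denoted by 1.1|54
print(x54.denominator == 2**53)           # a dyadic rational with 54 significant bits
print(x - x54 == Fraction(1, 5 * 2**53))  # 1.1 lies 0.2 units of 2**-53 above it
other = Fraction("1.1000000000000000000001")
print(round_bits(other, 54) == x54)       # same x|54, hence the same number

# Rounding 1.1|54 on to 53 bits happens to be an exact tie: under ties to
# even it agrees with rounding 1.1 directly, under ties toward zero it does not.
print(round_bits(x54, 53) == round_bits(x, 53))
print(round_bits(x54, 53, "zero") == round_bits(x, 53))
```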

Claim 2, unlike Claim 1, is ultimately not about how to read SRFI
77/R6RS.  Whether Claim 2 holds or not does not affect the original topic of
this thread.  The original context of Claim 2 was a discussion of ways to
implement the semantics of Claim 1 efficiently.  If Claim 2 does not have
to be retracted, Will's method cited in Mike's paper seems to be an
efficient way to implement the semantics of Claim 1.  If Claim 2 has to be
retracted, reading numbers with an explicitly given mantissa width may not
be as efficient as reading numbers without one, but this would not make
explicit mantissa widths unviable because (1) some important
implementations like Chez Scheme have always used bigint arithmetic, so
there would be no difference in efficiency, and (2) the vast majority of
number literals in practice do not have an explicitly given mantissa width
and can be read as efficiently as before.

I made Claim 2 in response to an earlier post by Will, at a point when I
did not understand his argument that arbitrary precision would be needed to
process the x|p notation.  If Claim 2 has to be retracted, it will mean
that I have understood his argument.  (Given your helpful examples, I think
I have already understood your line of thinking, Will.)

That said, I don't think that Claim 2, in the way I intended it, has to be
retracted.  The reason is the Proposition I formulated above, together with
the fact that R6RS does not prescribe how ties are to be resolved when a
number has to be approximated by a number object with limited precision
(i.e., no rounding mode is specified).  Consequently, the IEEE double
approximation of x|53 is also a permissible IEEE double approximation of
x|p for any p > 53.  The only thing we lose in an implementation that makes
use of Claim 2 is that the expression (inexact #ex|p) need not evaluate to
the same number object as x|p, but I cannot find a statement that it must.
(That said, if I were an implementer, I would choose the slower method
using bigints so that this invariant still holds.)
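A rough sketch of that argument at double precision, under several
assumptions: Python's `float()` on an integer (which rounds correctly, ties
to even) stands in for reading x|p as if it were x|53; `roundings` and
`claim2_ok` are hypothetical helpers; and the test integers are arbitrary,
some deliberately placed right next to a 53-bit midpoint, where the two
methods can disagree.

```python
import random
from fractions import Fraction

def roundings(x, p):
    """All nearest p-bit numbers to a positive rational x (two at a midpoint)."""
    x = Fraction(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    lo, hi = q * ulp, (q + 1) * ulp
    if r == 0 or 2 * r < ulp:
        return {lo}
    if 2 * r > ulp:
        return {hi}
    return {lo, hi}

def claim2_ok(x, p):
    """True if float(x) (x read as x|53) is a nearest double to every
    rounding of x to p significant bits."""
    shortcut = Fraction(float(x))
    return all(shortcut in roundings(y, 53) for y in roundings(x, p))

random.seed(2024)
samples = [random.getrandbits(80) | (1 << 79) for _ in range(200)]
# Integers around a 53-bit midpoint a * 2**20 (a odd and 54 bits wide).
for _ in range(50):
    a = random.getrandbits(53) | (1 << 53) | 1
    samples += [a * 2**20 + d for d in (-2, -1, 0, 1, 2)]

ok = all(claim2_ok(x, p) for x in samples for p in range(54, 61))
# How often the bigint route float(x|54) lands on a different double than
# the shortcut float(x); both are still nearest doubles to x|54.
differing = sum(float(y) != float(x) for x in samples for y in roundings(x, 54))
print(ok, differing)
```

The first value printed is True and the second is positive: next to a
midpoint, the bigint route and the shortcut can pick different doubles, but
each is an equally good approximation of x|p.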

I can illustrate this point with the example 99999999999999983222783|54.
By Claim 1, this number literal describes the mathematical number
99999999999999983222784, which now has to be represented by an IEEE
double.  The two different methods ("double rounding" versus "single
rounding") give, as shown in Will's post, the approximations
99999999999999991611392 and 99999999999999974834176.  The point is that
both approximations differ from the mathematical number
99999999999999983222784 by exactly the same amount, namely 8388608
(= 2^23), so both are equally good approximations, and both methods give
results allowed by R6RS.
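The figures in this example can be re-derived with exact integer
arithmetic.  The sketch uses a hypothetical helper `round_bits` (round a
positive rational to p significant bits with a chosen tie rule) and relies
on Python's `float()` rounding integers correctly, ties to even, as a
stand-in for an IEEE double reader.

```python
from fractions import Fraction

def round_bits(x, p, tie="even"):
    """Round a positive rational x to p significant bits; `tie` is
    "even", "up" (toward +infinity) or "zero" (toward zero)."""
    x = Fraction(x)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)
    q = x // ulp
    r = x - q * ulp
    if 2 * r > ulp or (2 * r == ulp and (tie == "up" or (tie == "even" and q % 2 == 1))):
        q += 1
    return q * ulp

x = 99999999999999983222783
x54 = round_bits(x, 54)             # Claim 1, first step: x|54
up = round_bits(x54, 53, "up")      # one way to resolve the 53-bit tie
down = round_bits(x54, 53, "zero")  # the other way
print(x54, up, down)
print(up - x54, x54 - down)         # the two distances are equal
# Ties to even applied to x|54, versus rounding x directly to a double:
print(int(float(x54)), int(float(x)))
```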

Let me conclude by saying that we have not touched on whether the x|p
notation is particularly helpful or useful.  This is outside the scope of
this thread.  The only thing we can conclude from this thread is that the
notation does not affect the efficiency of the rest of the number system,
in particular, the reading and writing of numbers without an explicitly
given mantissa width.

Marc

On Sun, 1 Sept 2024 at 00:48, Will Clinger <[email protected]> wrote:

> For the benefit of anyone who might still be reading this August 2024
> resurrection of the SRFI 77 discussion thread, let me summarize.
>
> Marc Nieper-Wißkirchen has made several claims, and it looks to me
> as though two of his claims contradict each other.
>
> On 27 August 2024, in his remarks that resurrected this thread, Marc
> made a claim to the effect that (in my words):
>
>     Claim 1.  The R6RS requires the x|p notation to be read as (1) the best
>     possible binary floating-point approximation to x that uses p bits of
>     significand (which is mathematically well-defined), and then, in systems
>     that use binary floating-point representations for inexact reals, (2) that
>     best possible p-bit binary floating-point approximation is converted
>     to the best possible floating-point approximation that uses p bits "if
>     practical, or by the largest available precision if p or more bits of
>     significand are not practical within the implementation."
>
> I believe there is general agreement that the "if...practical" loopholes
> are there to accommodate the fact that implementations usually
> support only a small finite number of floating-point precisions for
> inexact reals.  To simplify the following discussion, I will assume the
> implementation supports IEEE double precision (53 bits of significand)
> as its only representation for inexact reals.
>
> Hence the external notation
>
>     99999999999999983222784|54
>
> would first convert 99999999999999983222784 to the best possible
> 54-bit floating point approximation to 99999999999999983222784,
> which is 99999999999999983222784.0, and would then convert that
> number to its best possible approximation using IEEE double precision
> arithmetic, which is 99999999999999991611392.0, which most
> implementations of Scheme would print as 1.0e23.
>
> For x = 99999999999999983222784, in an implementation that
> represents all inexact reals using IEEE double precision floating point,
> x|53 and x|54 would be read as the same inexact real.
>
> Marc claimed that, in such an implementation, x|53 and x|54 would
> always be read as the same inexact real, regardless of x:
>
> Claim 2.
> > ...in the case of inexact numbers, [...] the
> > explicitly given mantissa width can be truncated at the maximum number
> > of significant bits (which is 53 in the case of IEEE doubles).
>
> But Marc's Claim 2 contradicts my interpretation of his Claim 1.  Consider
> the external notation
>
>     99999999999999983222783|54
>
> The best 54-bit fp approximation to 99999999999999983222783 is
> 99999999999999983222784.0, and (as seen above), the best IEEE
> double precision approximation to 99999999999999983222784.0 is
> 99999999999999991611392.0, which should print as 1.0e23.
>
> Marc's Claim 2 says 99999999999999983222783|54 should be read as
> the same inexact real as 99999999999999983222783|53.  But the best
> IEEE double precision approximation to 99999999999999983222783 is
> 99999999999999974834176.0, which most implementations of Scheme
> would print as 9.999999999999997e22.
>
> When representing inexact reals using IEEE double precision (53 bits),
> 9.999999999999997e22 is not the same inexact real as 1.0e23.
>
> Marc must therefore retract one of his two claims (or explain how I have
> been misinterpreting his claims).  What I say next will depend upon which
> claim Marc decides to retract.
>
> (Spoiler alert:  The numerical examples above illustrate the "double
> rounding" problem, which is the main reason converting decimal scientific
> notation to binary floating-point is non-trivial.  Marc's Claim 1, as I
> interpret it, creates a double rounding problem.)
>
> Will Clinger
>
