Re: Numbers with small mantissa widths

Marc Nieper-Wißkirchen Tue, 03 Sep 2024 09:46:00 -0700

Am Di., 3. Sept. 2024 um 01:27 Uhr schrieb Will Clinger <[email protected]
>:


This is a lengthy statement of my opinions and some facts
> explaining those opinions, as I hope this can be my last
> submission to the SRFI 77 discussion of this subject.
>
> BRIEF RESPONSE TO MARC's MOST RECENT EMAIL.
>
> Marc seems to think he can retain both his Claim 1 and his
> Claim 2.  He's wrong about that.  To bypass further back
> and forth about this, I'll just say his Claim 1 is the one
> he cares about.  He made his Claim 2 as part of an argument
> that the semantics of his Claim 1 is easier to implement
> efficiently than I was saying.  Because he still doesn't
> see why he can't retain both claims, I suspect his "simple"
> algorithm actually relies on Claim 2 and therefore fails
> to implement the semantics of his Claim 1.  All that means
> is that, if/when he gets around to creating an algorithm
> that truly implements his Claim 1 semantics, his algorithm
> will be less efficient than he currently believes.
>
> I will therefore assume Marc will eventually come to
> realize he must choose between his Claim 1 and Claim 2,
> and that he will, at that point, retain Claim 1 while
> abandoning Claim 2.  That assumption allows me to
> proceed to my final summary of the situation.
>

As no reason was given why Will still thinks my Claim 2 contradicts Claim 1
(such reason would have to point out a mistake in the proof I made in my
previous post), I can't say much about it.

However, it is not too relevant to the actual topic of this thread for the
following two reasons:

(1) Claim 2 is only about further possibilities of implementing the
semantics.
(2) I wouldn't use the statement I made in Claim 2 to implement the
semantics anyway because I would like (inexact #eXXX) to evaluate to the
same result as the number literal XXX.  This is, in my opinion, much more
preferable than a potentially faster algorithm to read numbers with
explicit mantissa widths.

The "simple algorithm" (not yet fully tested) I created for Chez evaluates
x|p as (inexact #ex|p).  In other words, I only had to implement the
reading of literals of the form #ex|p.  The value of #ex|p, being an exact
number, has nothing to do with the IEEE double format nor with any other
inexact number formats used by the implementation.  Moreover, I have to use
bigint arithmetics because p can be arbitrarily large.  My "simple
algorithm" has nothing to do with Claim 2.


> SEMANTICS.
>
> In my previous missive, I somehow deleted an important
> SRFI 77 and R6RS loophole from my statement of Marc's
> Claim 1.  I'll correct that statement here.
>
>     Claim 1.  The R6RS requires the x|p notation to be read
>     as (1) the best possible binary floating-point
>     approximation to x that uses p bits of significand (which
>     is mathematically well-defined), and then, in systems
>     that use binary floating-point representations for
>     inexact reals, (2) that best possible p-bit binary
>     floating-point approximation is converted to the best
>     possible floating-point approximation that uses p bits
>     "if practical, or by a greater precision if a p-bit
>     significand is not practical, or by the largest available
>     precision if p or more bits of significand are not
>     practical within the implementation."
>
> In the part of that claim that quotes R6RS, the three
> "practical" loopholes are necessary because implementations
> of the R6RS are unlikely to support inexact reals of all
> possible precisions p.  In at least some implementations,
> all inexact reals are represented in IEEE double precision,
> with p=53.  Throughout most of this note, I will assume all
> inexact reals are represented in IEEE double precision,
> because dealing with multiple precisions for inexact reals
> complicates the exposition to no "real" purpose.  I will
> also assume the implementation uses correct rounding, which
> is to say external representations of inexact reals (other
> than the x|p notation, which I must treat separately) are
> converted into the best possible IEEE double precision
> approximation.
>
> I will refer to claim 1 as "Marc's preferred semantics", and
> will be comparing it to "Will's preferred semantics", which
> is
>
> Will's preferred semantics:
>
>     The R6RS allows implementations to read the x|p notation
>     by selecting a precision q that satisfies both (1) and (2):
>
>         (1) the implementation supports q-bit floating point
>         as one of its representations for inexact reals
>
>         (2) one of the following is true:
>
>             (a) q = p
>
>             (b) q > p, and the implementation does not support
>             inexact reals of any precision that lies between
>             p and q
>
>             (c) q < p, but the implementation does not support
>             inexact reals of any precision greater than q
>
>     Then, having selected q, x|p is read as an inexact real
>     whose value is the best possible q-bit approximation to x.
>

> It is my opinion that the deliberate ambiguity and deliberate
> loopholes of R6RS 4.2.8 allow (but do not require) Marc's
> preferred semantics, and also allow implementations of the
> R6RS that use Will's preferred semantics.
>

Will's semantics double the purpose of the S, F, D, and L markers.  In
fact, Will's semantics improve on the semantics of the S, F, D, and L
markers because the S, F, D, and L markers are relatively underspecified.

A Scheme standard that follows Will's semantics therefore doesn't need the
S, F, D, and L markers.  Yet they are in R6RS.

I already stated in some previous posts that the text as written does not
allow Will's semantics unless it is read so loosely that the x|p notation
becomes meaningless.  I excluded such loose reading because that cannot
have been the intent (and, as Mike wrote, wasn't the intent of the author
who added the x|p notation to R6RS).

I would like to repeat that this thread is not about what else could have
been specified about the x|p notation (Will's semantics are certainly
equally viable if the text of R6RS supported it and not Marc's semantics
and if there were no S, F, D, and L markers), but about what's said in SRFI
77/R6RS.

Even ignoring the above, there is another reason why I think that Will's
semantics are not preferable as long as other parts of SRFI 77/R6RS are not
changed: The meaning of #ex|p has never been disputed and I hope we agree
on its meaning.  With Will's semantics, one doesn't get that (inexact
#ex|p) evaluates to the value of x|p.


> It is also my opinion that implementations and most users of
> the R6RS should prefer Will's preferred semantics.  The rest
> of this note explains why I hold that opinion.
>
> ****************************************************************
>
> ACCURACY.
>
> Marc's preferred semantics generally implies double rounding,
> which is usually regarded as something to be avoided because
> it can degrade accuracy.
>
> Theorem.  For all x and for all p, Will's preferred semantics
> delivers an inexact real that approximates x at least as well
> as Marc's preferred semantics, and the inexact real obtained
> via Will's preferred semantics is often a better approximation
> to x than the inexact real obtained via Marc's preferred
> semantics.
>
> That theorem is uncontroversial.  It underscores the fact that
> Marc's preferred semantics is not about using the x|p notation
> as a good approximation for x.  Marc's preferred semantics is
> about using the x|p notation as an approximation for the p-bit
> floating point approximation of x.
>

Indeed, and this is exactly what SRFI 77/R6RS says about it.  I tried to
explain in my previous post why I think that the term "double rounding"
does not fit exactly.


> Aside from the use case described in Footnote 1, for which the
> x|p notation is clearly worthless, I am not aware of any common
> use case for which someone would really want to be computing
> q-bit floating point approximations to p-bit floating point
> approximations.  Such use cases might exist, but they are not
> common.
>
> On the other hand, it is not unheard of for someone to want to
> compute the inexact real that best approximates x using at
> least p bits of precision.  That might not be a common use
> case, but I maintain that it is more common than wanting to
> compute q-bit approximations to p-bit approximations of x.
>
> That is the main reason why I prefer Will's preferred semantics
> to Marc's preferred semantics, and it is why I believe Will's
> preferred semantics should be preferred by implementations and
> most users to Marc's preferred semantics.
>
> But there are other reasons as well, to which I will turn after
> quantifying the probability that the double rounding implied by
> Marc's preferred semantics will degrade accuracy.
>
> In the following questions, the interval (1, 1000000) serves
> only to guarantee that any real number within the interval
> can be approximated by a normalized IEEE double precision
> result.
>
> QUESTION:  If x is a real number selected at random from the
> interval (1, 1000000), what is the probability that the best
> IEEE double precision (53-bit) approximation to x is closer
> to x than the best 53-bit approximation to the best 54-bit
> approximation of x?
>
> ANSWER: Roughly 1/4.
>
> Sketch of proof:  The set of real numbers representable in
> IEEE double precision has measure zero, so we can calculate
> the probability without considering them.  Let a be the best
> 53-bit approximation to x for which a < x, and let b be the
> least 53-bit floating point value for which a < b.  With very
> high probability, (a+b)/2 is representable using 54 bits of
> precision, and is the only real number between a and b that
> can be represented using 54 bits.  Dividing the interval (a,b)
> into four equal quadrants, we find that an x in the quadrants
> (a, (3a+b)/4) or ((a+3b)/4, b) is best approximated by a or b,
> respectively, regardless of whether we use 53 or 54 bits of
> precision.  If the best 53-bit approximation to (a+b)/2 is b,
> however, then a is the best 53-bit approximation to every x
> in the interval ((3a+b)/4, (a+b)/2), but converting x to its
> best 54-bit approximation yields (a+b)/2, whose best 53-bit
> approximation is b, not a.  Thus the processing of converting
> x first to 54 bits and then to 53 bits involves double rounding
> that degrades accuracy for one quarter of the real numbers in
> (a, b).  Similarly if the best 53-bit approximation to (a+b)/2
> is a.  QED
>
> Generalizing the question, answer, and proof sketch, we find
> that the probability of degraded accuracy becomes 1/8 if we
> first convert to 55 bits and then to 53.  That probability
> becomes 1/16 if we first convert to 57 bits and then to 53.
> And so on.
>
> QUESTION:  If x is a real number selected at random from the
> interval (1, 1000000), what is the probability that the best
> IEEE double precision (53-bit) approximation to x is closer
> to x than the best 53-bit approximation to the best 52-bit
> approximation of x?
>
> ANSWER: Roughly 1/2.
>
> Generalizing, we find that the probability of degraded accuracy
> becomes 3/4 if we first convert to 51 bits and then to 53.  If
> we first convert to 50 bits and then to 53, the probability of
> degraded accuracy is 7/8.  And so on.
>
> From the above, it is perfectly clear that Marc's preferred
> semantics is not about finding accurate approximations to x,
> whereas Will's preferred semantics always delivers the most
> accurate q-bit approximation to x that is possible within a
> given implementation, for a particular precision q that is
> clearly allowed by the constraints stated in R6RS 4.2.8.
>

While the probability calculations are interesting on their own (and
implicit in my proof from my earlier post), they are irrelevant unless
someone modifies the R6RS text (and that someone should probably also get
rid of the S, F, D, and L markers).  The x|p notation, originally coming
from Mike's paper, simply serves a different purpose.  (Whether that
purpose is useful enough to warrant this explicit notation is outside the
scope of this thread.)

Even under the assumption that Will's semantics are more useful, all
usefulness would be gone if an interpretation could decide not to follow it
but to follow some other semantics (Marc's, for example).


> ****************************************************************
>
> EFFICIENCY.
>
> Marc's preferred semantics implies double rounding, whereas
> Will's preferred semantics implies single rounding, so Marc's
> preferred semantics is twice as expensive as Will's even if
> all rounding steps are equally efficient.
>
> It is likely, however, that rounding to one of the precisions
> supported by an implementation's selection of formats for
> inexact reals will be less costly than rounding to some other
> precision.  There are three reasons for that.  One is that
> all external representations for inexact numbers that don't
> involve the x|p notation will round to a supported precision,
> giving implementations an incentive to optimize that common
> case.  Another is that at least some supported precisions
> are likely to match a precision supported by floating point
> hardware, which allows implementations to use more efficient
> algorithms such as "Clinger's method".  (David Gay improved
> and implemented that method in C.  Nowadays some variation
> of that algorithm is part of multiple subsystems installed
> on virtually every general purpose computing system and smart
> phone.)
>
> The third reason is that, with Marc's preferred semantics,
> implementations that would otherwise use conversion routines
> available via standard C libraries would probably have to
> implement arbitrary-precision conversions in Scheme.  Chez
> Scheme might be fast enough to get by with using bignum
> arithmetic to support arbitrary precisions, but many other
> implementations are not.  In particular, some implementations
> are interpreted, so using Scheme code to implement the
> conversions mandated by Marc's preferred semantics might
> well be considerably slower than the conversions necessary
> for Will's preferred semantics.
>
> Thus Will's preferred semantics is likely to be implemented
> more than twice as efficiently as Marc's, and quite possibly
> a great deal more than twice as efficiently.
>

Any Scheme implementation must implement #ex|p anyway so the code for my
"simple algorithm" is already there.  I already stated that efficiency
would only be a concern if the x|p notation occurred often in practice,
which is not the case.  Moreover, an implementation that implements bigints
via Scheme code has other more important efficiency bottlenecks than the
reading of the x|p notation.


> ****************************************************************
>
> SUMMARY.
>
> The text of R6RS 4.2.8 allows Will's preferred semantics,
> and also allows Marc's preferred semantics.
>

This is claimed by Will but disputed by me.  As I said above and in earlier
posts, if both semantics were allowed, the meaning of x|p would be so
arbitrary that it would not serve any meaningful purpose.  Fortunately, a
strict, literal reading of the text fixes the semantics (and also gives
back the intended semantics).

Will's preferred semantics is more commonly useful, more
> accurate, and more efficient than Marc's preferred semantics.
>
> To dispute the "more commonly useful" part of that sentence,
> it would be necessary to cite applications of Marc's preferred
> semantics that are more common or more useful than expecting
> the x|p notation to deliver an accurate approximation for x.
>

This is not relevant to this thread.  It would be relevant if a revision of
SRFI 77 (or R6RS) were written.

That said, the purpose I see in what I should better call Mike's semantics
(because they come from his paper) is that it allows writing short literals
of (mathematical!) binary floating-point numbers using decimal notations.

To dispute the "more accurate" part of that sentence, it
> would be necessary to adopt a notion of "accurate" that is
> not about finding accurate approximations for x.  To argue
> for that alternative notion of accuracy, one would have to
> explain why that alternative notion is more commonly useful
> than what "accurate" means in the context of finding an
> accurate approximation for x.
>

This notion of accuracy plays no role in the semantics as written.  To say
it differently, we are back at the point of whether it would have been
better whether SRFI 77/R6RS used the notation for a different purpose.


> To dispute the "more efficient" part of that sentence would
> be silly.
>

This relies on the correctness of Claim 2, which hasn't been proven wrong
yet.  That said, as argued above, I prefer a less efficient solution anyway.

Marc


> Will Clinger
>
>
> FOOTNOTE 1.  I find it amusing to point out that "Clinger's
> method" for reading floating point numbers accurately is the
> only algorithm I can think of that needs to compute q-bit
> floating point approximations to p-bit floating point
> approximations.  In that algorithm, the precision p *must*
> be greater than the ultimate precision q, contrary to what
> R6RS 4.2.8 says about the x|p notation, which insists p
> *must* be less than or equal to the precision q of the
> second approximation whenever that is at all practical.
> So "Clinger's method" is not a use case for x|p notation,
> regardless of whose semantics you prefer.
>
>

Re: Numbers with small mantissa widths

Reply via email to