Hello Eike,

Let me strongly disagree with your post, and let me explain why.

> > Well, 15-digits might have been intended behaviour in the '70s, but is
> > hardly appropriate in 2008.
>
> IEEE 754 double did not change.

But spreadsheet use did change. Doubles don't fit anymore.

For the interested reader, I would always suggest starting with Prof. Kahan's texts. I infer from Prof. Kahan's "How JAVA's Floating-Point Hurts Everyone Everywhere" (http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) that Sun did not get it entirely right after all. [**]

I gave some telling examples on the OASIS list of how even tiny errors can end in full-blown catastrophe, with multi-million dollar losses and even loss of life - and these systems were NOT designed by non-professionals. To quote from my post on OASIS:
Otherwise, this might end like the maiden flight of the Ariane 5 rocket:
http://www.ima.umn.edu/~arnold/disasters/ariane.html
or the failed Patriot missile intercept:
http://www.ima.umn.edu/~arnold/disasters/patriot.html
To name but a few floating-point miscalculations.

Where does the problem stem from?
As I explained, spreadsheets are used in every imaginable and unimaginable way. Users perform (sometimes) non-trivial calculations.

One thing did change over the years: spreadsheets have grown in size. So has the need for *accuracy*, while the tolerance for *error* has steadily decreased.

A tiny error gets amplified, as the Patriot missile showed: allow it enough time, or enough operations (e.g. a spreadsheet with > 65,000 rows), and it will grow, sometimes catastrophically. Even a tiny error can become relevant under such circumstances, especially when it feeds into subsequent operations. The product of the Scud missile's velocity and the tiny clock error was large enough to preclude an effective interception. Would you want to live near a nuclear plant if part of its design was done in a spreadsheet? It is enough to fail once (and time plays against us, exposing even trivial errors).
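To make the amplification concrete, here is a minimal sketch (single precision C++; the real Patriot code used 24-bit fixed point, so this is an analogue, not a reconstruction) of how the representation error of 0.1 accumulates when it is summed a few million times:

#include <cstdio>

int main() {
    // 0.1 has no exact binary representation, so every addition carries a
    // tiny rounding error; summed often enough, the drift becomes visible.
    float t = 0.0f;                          // single-precision accumulator
    const long ticks = 100L * 3600L * 10L;   // 100 hours counted in tenths of a second
    for (long i = 0; i < ticks; ++i)
        t += 0.1f;

    double exact = ticks * 0.1;              // 360000 seconds
    std::printf("accumulated: %.3f s  exact: %.3f s  drift: %.3f s\n",
                t, exact, exact - t);
    return 0;
}

With 64-bit doubles the per-addition error is far smaller, but the mechanism is identical; it just needs more rows, or more dependent operations, to surface.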

Now, back to overflow:

> Detecting overflows doesn't help against roundoff errors.

Indeed, (float) 29,513,736 * (float) 92,842,033 won't produce an overflow, just a roundoff error.

BUT: (integer) 29,513,736 * (integer) 92,842,033 would raise an overflow, so the (explicit) conversion to float could be signalled to the user, who would then know that the result is inaccurate due to roundoff. This is only possible IF spreadsheets implement true integers, not just floats (and this seems sensible, because financial operations work mostly with exact numbers; [to work with integers, the last 2 digits of the currency (the cents) could be folded into the integer part, i.e. amounts stored in cents]).

[Integer: e.g. long or long long, or some other format; long long would have computed even the previous result exactly, so it would be an excellent choice.]
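A small sketch of the signalling idea (compiler-specific: it uses GCC/Clang's checked-multiply builtin, and is only an illustration, not how Calc would have to implement it):

#include <cstdint>
#include <cstdio>

int main() {
    const std::int64_t a = 29513736, b = 92842033;

    // 32-bit integer multiplication: the product does not fit, and the
    // overflow is detectable, so the user could be warned.
    std::int32_t narrow = 0;
    bool overflowed = __builtin_mul_overflow(static_cast<std::int32_t>(a),
                                             static_cast<std::int32_t>(b),
                                             &narrow);
    std::printf("32-bit overflow detected: %s\n", overflowed ? "yes" : "no");

    // Single-precision float: no overflow is raised, the result is
    // silently rounded to ~7 significant digits.
    float rounded = static_cast<float>(a) * static_cast<float>(b);

    // long long (64-bit): the exact 16-digit product still fits.
    std::int64_t exact = a * b;

    std::printf("float : %.0f\n", static_cast<double>(rounded));
    std::printf("exact : %lld\n", static_cast<long long>(exact));
    return 0;
}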

32-bit integers are not enough, because the revenues of bigger companies are already above this limit. I would suggest going to Sun's financial department and asking for the exact figures; I will assume a value above $10 billion for fiscal year 2007.

You need 11 digits to represent this number. If you add the 2 decimals (the cents), you end up with 13 digits, which is damn close to the maximum precision of ~15 digits for double. Just one unfortunate multiplication by 100 for some conversion and you lose accuracy. And if you do a forecast and end up with terms of the form X^2, you lose almost half the digits (and sometimes more).
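A minimal sketch of that digit loss, with a made-up 13-digit amount stored in cents and GCC/Clang's 128-bit integers as the exact reference:

#include <cstdint>
#include <cstdio>

int main() {
    // Hypothetical revenue of $12,345,678,901.23 stored as cents: 13 digits.
    const std::int64_t cents = 1234567890123LL;

    // Exact square (about 25 digits) via the compiler's 128-bit integers.
    __int128 exact = static_cast<__int128>(cents) * cents;

    // The same square in IEEE 754 double keeps only ~15-16 digits.
    double approx = static_cast<double>(cents) * static_cast<double>(cents);

    // The trailing digits are simply gone from the double result.
    long long lost = static_cast<long long>(exact - static_cast<__int128>(approx));
    std::printf("double result : %.0f\n", approx);
    std::printf("error vs exact: %lld\n", lost);
    return 0;
}

The double result is wrong in its trailing digits; harmless in isolation, but exactly the kind of intermediate value a forecast feeds back into further calculations.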

How many numbers on this site are close to this limit?
http://www.deutsche-bank.de/ir/en/content/funding.htm

Yes, there are companies that come close to this limit, and there are governments that have gone well beyond it. And make no mistake: auditors will look for every missing cent. But consider the intermediate results of a forecast over the next 10 years: you will lose damn many digits, not just a few cents, and over 10 years the errors add up.

Now, another big problem arises from unfortunate formulas. Sadly, well over 95% of users don't know much about numerical stability (to quote Prof. Kahan and James Gosling in a slightly different context).*

Terms of the form (x - 1) / (EXP(x - 1) - 1) should be familiar to users working in the financial sector. The term 1/(EXP(x - 1) - 1) is particularly unstable around x = 1:
x - 1 = 0 => division by 0 (the full term gives 0/0, i.e. NaN)
x - 1 = -0 (negative underflow) => result is -INF
x - 1 = +0 (positive underflow) => result is +INF

Actually, you can get any result between -INF and +INF. Considering that this is an investment rate, the end results summed over a few years would be catastrophic, I hope everyone agrees. (Fortunately, (x - 1) / (...) is somewhat more stable and loses only about half the digits around x = 1, but IF a user crafts the formula by hand and computes intermediate results, he will lose ALL the digits. How many users are numerical analysts able to devise numerically stable formulas? I am for sure NOT one of them.)
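A minimal sketch of the instability, together with one standard cure (the C/C++ math library's expm1(), which evaluates EXP(d) - 1 without the cancellation):

#include <cmath>
#include <cstdio>

// Naive, spreadsheet-style evaluation: EXP(x-1) - 1 cancels catastrophically
// near x = 1, and at exactly x = 1 it becomes 0/0, i.e. NaN.
double naive(double x) {
    return (x - 1.0) / (std::exp(x - 1.0) - 1.0);
}

// Stable variant: expm1(d) computes exp(d) - 1 accurately for tiny d,
// and the d == 0 case is handled explicitly (the limit of the term is 1).
double stable(double x) {
    const double d = x - 1.0;
    return (d == 0.0) ? 1.0 : d / std::expm1(d);
}

int main() {
    const double eps[] = {1e-5, 1e-8, 1e-12, 0.0};
    for (double e : eps) {
        double x = 1.0 + e;
        std::printf("x = 1 + %-6g  naive = %.17g   stable = %.17g\n",
                    e, naive(x), stable(x));
    }
    return 0;
}

Around x = 1 + 1e-8 the naive form has already lost roughly half of its significant digits, and at exactly x = 1 it returns NaN, while the expm1-based variant stays accurate throughout.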

Among the things to be aware of, I would quote Prof. Kahan:
*Thirteen Prevalent Misconceptions about Floating-Point Arithmetic:*
3. Arithmetic much more precise than the data it operates upon is needless, and wasteful.

My concern is that double is no longer adequate. Not for Sun, not for IBM, not for Google, not for any financial institution, and not for any government.

Is it possible to use greater precision? And still attain reasonable speed?
The answer is definitely yes.

R, Matlab and other financial tools are orders of magnitude faster than Calc, and they usually work with greater precision than Calc anyway. Therefore it should be at least theoretically possible to speed Calc up enough to work with higher precision at an acceptable speed.

I am looking forward to seeing at least the double-extended data type. And hopefully a long long (long64) integer. And some D(ecimal)-type functions. And then maybe some IEEE 754r features. And maybe some other great ideas. And probably I'll complain even then. ;)
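(For a rough idea of what double-extended already provides on common x86 hardware, compiled with GCC/Clang; the exact figures are platform-dependent, this just prints the guaranteed decimal digits of each type:)

#include <cfloat>
#include <cstdio>

int main() {
    // Decimal digits guaranteed to round-trip through each floating type.
    std::printf("float       : %2d digits\n", FLT_DIG);   // typically 6
    std::printf("double      : %2d digits\n", DBL_DIG);   // typically 15
    std::printf("long double : %2d digits\n", LDBL_DIG);  // 18 with x86's 80-bit extended format
    return 0;
}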

Sincerely,

Leonard

* “95% of the folks out there are completely clueless about floating-point.” (J.G., 28 Feb. 1998)
(Maybe more than 95%?) [Prof. Kahan]

** For a lot more details, please visit Prof. Kahan's website:
http://www.cs.berkeley.edu/~wkahan/

I have added some comments in-line below, e.g. regarding a DSUM() function for decimal floating point.


Eike Rathke wrote:
Hi Leonard,

On Friday, 2008-07-04 03:07:04 +0300, Leonard Mada wrote:

Well, 15-digits might have been intended behaviour in the '70s, but is hardly appropriate in 2008.

IEEE 754 double did not change.

I am really amazed that programs still cling to 15 digits. Well, the 64-bit double precision might limit calculations to ~16 digits, but professional programs should do better. Even MS Windows calculator outperforms both Excel and Calc (as the previous article states).

So what? Comparing apples (e.g. BCD arithmetic or IEEE 854) with oranges
(IEEE 754) does not help at all. I assume you do not want Calc to use
BCD or other arbitrary precision instead of floating point at the cost
of multiples of calculation time?

Yes, I do want special functions (e.g. a DSUM) to perform exactly these high accuracy operations. Users who depend on accurate results might invest the time in using these functions. Knowing that such functions do exist will also remove some of the misconceptions and errors.
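[As one sketch of what such a function could do internally - compensated (Kahan) summation recovers most of the digits an ordinary running sum throws away; this is only an illustration, not a claim about how a DSUM would be specified:]

#include <cstdio>
#include <vector>

// Kahan (compensated) summation: carries the rounding error of each addition
// in a correction term instead of discarding it. (Do not compile with
// -ffast-math, which would optimise the correction away.)
double kahan_sum(const std::vector<double>& v) {
    double sum = 0.0, c = 0.0;          // c accumulates the lost low-order bits
    for (double x : v) {
        double y = x - c;
        double t = sum + y;
        c = (t - sum) - y;              // what was rounded away in sum + y
        sum = t;
    }
    return sum;
}

int main() {
    // 100,000 rows of 0.1: the naive running sum drifts in its last digits;
    // the compensated sum stays within a unit in the last place of the total.
    std::vector<double> rows(100000, 0.1);
    double naive = 0.0;
    for (double x : rows) naive += x;
    std::printf("naive: %.15f\nkahan: %.15f\n", naive, kahan_sum(rows));
    return 0;
}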

I sincerely hope that OOo increases the precision; at least if an operation overflows, Calc should increase the precision to 24-32 digits. [Most existing processor designs implement the IEEE 754 exceptions, including the OVERFLOW exception, therefore it is possible to catch such an overflow. In this case, Calc should either display a warning that the result is wrong - and offer a robust calculation mode, or just use a greater precision. Hiding the error is not very rational.]

Detecting overflows doesn't help against roundoff errors.

And there are many other issues. Floating points spring undoubtedly to mind. I hope that the floating point arithmetic improves drastically, too.

With IEEE 754 most improvement is about curing symptoms and cheating the
user because she expects 0.3-0.2-0.1 to equal 0.0

I advise the interested reader to read some of Kahan's papers, see:
http://www.cs.berkeley.edu/~wkahan/
[e.g. http://www.cs.berkeley.edu/~wkahan/Mindless.pdf]
[Kahan is the architect of IEEE 754 - he is also known as "The Father of Floating Point".]

Well, yes, nice read, it states what can go wrong and at the end the
best solution is he hopes "that popular programming languages will
support Interval Arithmetic of arbitrarily high (within limits)
precision variable (coarsely) at run-time. Then programmers may use it
to prove most of their numerical software free from roundoff-induced
anomalies even if it runs sometimes slower than usual."

I don't see where we'd currently benefit, but maybe I browsed too
quickly.

For the interested reader:

Numerical Computation Guide
http://docs.sun.com/source/819-3693/index.html
    2005, Sun Microsystems Studio 11.

What Every Computer Scientist Should Know About Floating-Point
Arithmetic
http://docs.sun.com/source/819-0499/ncg_goldberg.html
    Appendix D of the Numerical Computation Guide.

Numerical Computation Guide
http://docs.sun.com/source/806-3568/index.html
    2002, may be better suited for high resolution displays,
    mathematical formulas have better readable graphic images.


  Eike


