Hello Eike,
Let me strongly disagree with your post, and let me explain why.
> > Well, 15-digits might have been intended behaviour in the '70s, but
> > is hardly appropriate in 2008.

> IEEE 754 double did not change.

But spreadsheet use did change. Doubles don't fit anymore.
For the interested reader, I would always suggest starting with
Prof. Kahan's texts. I infer from Prof. Kahan's "How JAVA's
Floating-Point Hurts Everyone Everywhere"
(http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) that Sun did not get
it entirely right after all. [**]
I gave some telling examples on the OASIS list of how even tiny errors
can end in full catastrophe, with multi-million losses and even loss of
life - and those systems were NOT designed by non-professionals. To
quote from my post on OASIS:
Otherwise, this might end as the maiden flight of the Ariane 5 rocket:
http://www.ima.umn.edu/~arnold/disasters/ariane.html
or the failed Patriot rocket:
http://www.ima.umn.edu/~arnold/disasters/patriot.html
To name but a few floating-point miscalculations.
Where does the problem stem from?
As I explained, spreadsheets are used in every imaginable and
unimaginable way. Users perform (sometimes) non-trivial calculations.
One thing did change over the years: spreadsheets have grown in size.
With that growth, the need for *accuracy* has increased, while the
tolerance for *error* has steadily decreased.
A tiny error gets amplified, as exemplified by the Patriot rocket: give
it enough time, or enough operations (e.g. a spreadsheet with > 65,000
rows), and it will get big, sometimes catastrophically big. Even a tiny
error might become relevant under such circumstances, especially if it
is used in subsequent operations. The product of the Scud missile's
velocity and the tiny clock error was large enough to prevent an
effective interception. Would you want to live near a nuclear plant if
some of its design was done using a spreadsheet? It is enough to fail
once (and time plays against us - exposing even trivial errors).
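
To make the amplification concrete, here is a minimal C++ sketch of the
Patriot drift, using the figures from the linked analysis (1/10 chopped
after 23 fractional bits in a 24-bit register, 100 hours of uptime, a
Scud at ~1,676 m/s); the numbers come from that page, the code is mine:

    #include <cmath>
    #include <cstdio>

    int main() {
        // 1/10 has no finite binary expansion; the Patriot's 24-bit
        // register chopped it after 23 fractional bits, giving an
        // error of about 9.5e-8 per stored tenth of a second.
        const double stored_tenth = std::floor(0.1 * (1 << 23)) / (1 << 23);
        const double err_per_tick = 0.1 - stored_tenth;   // ~9.5e-8 s

        // After 100 hours of uptime the clock has counted 3,600,000
        // tenths of a second:
        const double ticks = 100.0 * 3600.0 * 10.0;
        const double drift = err_per_tick * ticks;        // ~0.34 s

        std::printf("clock drift after 100 h: %.4f s\n", drift);
        std::printf("tracking error at 1676 m/s: %.0f m\n", drift * 1676.0);
        return 0;
    }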
Now, back to overflow:
> Detecting overflows doesn't help against roundoff errors.
Indeed, (float) 29,513,736 * (float) 92,842,033 won't produce an
overflow, just a roundoff error.
BUT: (integer) 29,513,736 * (integer) 92,842,033 WOULD raise an
overflow, and so the (explicit) conversion to float could be signaled
to the user. He would then know that the result is inaccurate due to
roundoff. This is only possible IF spreadsheets implement true
integers, not just floats (and this seems sensible, because financial
operations work mostly with exact numbers; [in order to work with
integers, the last 2 digits of the currency (the cents) could be folded
into the integer part]).
[Integer: e.g. long or long long, or some other format - although long
long would have computed even the previous result exactly. So long long
would be an excellent choice.]
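
A minimal sketch of such a check (plain C++; I widen to long long
before comparing against the 32-bit range - one possible
implementation, not Calc's actual code):

    #include <cstdint>
    #include <cstdio>

    int main() {
        const std::int32_t a = 29513736, b = 92842033;

        // Widen first, then check: if the exact product leaves the
        // 32-bit range, the overflow can be signalled to the user
        // instead of being silently wrapped or rounded.
        const long long exact = 1LL * a * b;  // 2,740,115,251,665,288
        if (exact > INT32_MAX || exact < INT32_MIN)
            std::printf("int32 overflow; exact in long long: %lld\n",
                        exact);

        // The single-precision path neither overflows nor warns - it
        // just rounds: the product needs ~51 bits, float keeps 24.
        const float rounded = (float)a * (float)b;
        std::printf("float product : %.0f\n", (double)rounded);
        std::printf("roundoff error: %.0f\n",
                    (double)rounded - (double)exact);
        return 0;
    }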
32-bit integers are not enough, because most revenues of bigger
companies are already above this limit. I would suggest going to Sun's
financial department and asking for the correct figures; I will assume
a value above $10 billion for fiscal year 2007.
You will need 11 digits to represent this number. If you add the 2
decimals, you end up with 13 digits - damn close to the maximum
precision of 15 digits for double. Just one unfortunate multiplication
by 100 for some conversion, and you lose accuracy. And imagine you do a
forecast and end up with terms of the form X^2: you lose almost half
the digits (and even more).
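
To see how much an X^2 term costs, here is a tiny sketch (the revenue
figure is made up, of course):

    #include <cstdio>

    int main() {
        // Hypothetical figure: $10 billion and one cent, held in cents
        // so the amount itself (13 digits) is still exact in a double.
        const double cents = 1000000000001.0;

        // A forecast term of the form X^2 needs 25 digits; double
        // keeps barely 16, so roughly the last 8 digits are noise.
        // Exact square: 1,000,000,000,002,000,000,000,001
        std::printf("double X^2: %.0f\n", cents * cents);
        return 0;
    }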
How many numbers on this site are close to this limit?
http://www.deutsche-bank.de/ir/en/content/funding.htm
Yes, there are companies that come close to the limit, and there are
governments that have passed beyond it. And the secret is, auditors out
there will look for every missing cent. But consider intermediate
results during a forecast for the next 10 years: you will lose damn
many digits, not just a few cents, and over 10 years the results will
add up.
Now, another big problem arises from unfortunate formulas. Sadly, way
over 95% of users don't know much about numerical stability (to quote
Prof. Kahan and James Gosling in a slightly different context).*
Terms of the form (x - 1)/(EXP(x - 1) - 1) should be familiar to users
working in the financial sector. The term 1/(EXP(x - 1) - 1) is
particularly unstable around x = 1:

x - 1 = 0 => division by 0 (the full term gives 0/0, i.e. NaN)
x - 1 = -0 (negative underflow) => result is -INF
x - 1 = +0 (positive underflow) => result is +INF

Actually, you can get almost any result between -INF and +INF.
Considering that this is an investment rate, the end results summed
over a few years would be catastrophic, I hope everyone agrees.
(Fortunately, (x - 1)/(...) is slightly more stable and loses only
about half the digits around x = 1, but IF a user manually crafts the
formula and computes intermediate results, then he will lose ALL the
digits. How many users are numerical analysts able to devise
numerically stable formulas? I am for sure NOT one of them.)
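
For those who want to see it, here is a minimal C++ sketch of the
instability and of the standard cure (expm1() is the C99/C++ library
function that computes EXP(u) - 1 without the cancellation; as far as
I know, Calc exposes no equivalent):

    #include <cmath>
    #include <cstdio>

    int main() {
        const double x = 1.0 + 1e-12;  // very close to the unstable point
        const double u = x - 1.0;

        // Naive form: EXP(u) is rounded to ~16 digits, so EXP(u) - 1
        // keeps only the few digits surviving the cancellation near 0.
        const double naive = u / (std::exp(u) - 1.0);

        // expm1() avoids the cancellation, so the quotient stays
        // accurate (it tends to 1 as u -> 0).
        const double stable = u / std::expm1(u);

        std::printf("naive : %.17g\n", naive);
        std::printf("stable: %.17g\n", stable);
        return 0;
    }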
If there are some things to be aware of, then I would like to quote
Prof. Kahan:
*Thirteen Prevalent Misconceptions about Floating-Point Arithmetic:*
3. Arithmetic much more precise than the data it operates upon is
needless, and wasteful.
I have concerns that double is not appropriate anymore. It is not
appropriate for Sun, nor for IBM, nor for Google, nor for any financial
institution nor any government.
Is it possible to use greater precision and still attain reasonable
speed? The answer is definitely yes.
R, Matlab and other financial tools are orders of magnitude faster than
Calc, and they usually work with greater precision than Calc anyway.
Therefore, it should be at least theoretically possible to speed Calc
up enough to work with higher precision.
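
One well-known trick in this direction is Kahan's compensated
summation - nearly double-double accuracy for sums at close to
plain-summation cost. A minimal sketch (my illustration, not a claim
about how Calc should implement it):

    #include <cstdio>

    int main() {
        // One extra variable carries the low-order bits that a plain
        // addition drops - relevant for a column with > 65,000 rows.
        double naive = 0.0, sum = 0.0, c = 0.0;
        for (int i = 0; i < 65536; ++i) {
            naive += 0.1;              // plain sum: error grows with n
            const double y = 0.1 - c;  // re-inject the bits lost before
            const double t = sum + y;  // low bits of y are dropped here
            c = (t - sum) - y;         // ...and recovered for next time
            sum = t;
        }
        std::printf("plain sum  : %.17g\n", naive);
        std::printf("compensated: %.17g\n", sum);
        std::printf("reference  : %.17g\n", 0.1 * 65536); // exact: * 2^16
        return 0;
    }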
I am looking forward to seeing at least the double-extended data type.
And hopefully a long long (long64) integer. And some D(ecimal)-type
functions. And then maybe some IEEE 754r features. And maybe some other
great ideas. And probably I'll complain even then. ;)
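
For a taste of what double-extended would buy (on x86, long double is
usually the 80-bit IEEE extended format, but this is
platform-dependent):

    #include <cfloat>
    #include <cstdio>

    int main() {
        // Guaranteed decimal digits of each type on this platform:
        std::printf("double     : %d digits\n", DBL_DIG);   // 15
        std::printf("long double: %d digits\n", LDBL_DIG);  // 18 on x86
        return 0;
    }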
Sincerely,
Leonard
* “95% of the folks out there are completely clueless about
floating-point.” (J.G., 28 Feb. 1998)
(Maybe more than 95% ?) [Prof. Kahan]
** For a lot more details, please visit Prof. Kahan's website:
http://www.cs.berkeley.edu/~wkahan/
I have added some comments in-line below, e.g. on a DSUM() function for
decimal floating point.
Eike Rathke wrote:
> Hi Leonard,
>
> On Friday, 2008-07-04 03:07:04 +0300, Leonard Mada wrote:
> > Well, 15-digits might have been intended behaviour in the '70s, but
> > is hardly appropriate in 2008.
>
> IEEE 754 double did not change.
> > I am really amazed that programs still cling to 15 digits. Well, the
> > 64-bit double precision might limit calculations to ~16 digits, but
> > professional programs should do better. Even the MS Windows
> > calculator outperforms both Excel and Calc (as the previous article
> > states).
>
> So what? Comparing apples (e.g. BCD arithmetic or IEEE 854) with
> oranges (IEEE 754) does not help at all. I assume you do not want Calc
> to use BCD or other arbitrary precision instead of floating point at
> the cost of multiples of calculation time?
Yes, I do want special functions (e.g. a DSUM) to perform exactly these
high accuracy operations. Users who depend on accurate results might
invest the time in using these functions. Knowing that such functions do
exist will also remove some of the misconceptions and errors.
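
For instance, a DSUM-style accumulation over cents-scaled 64-bit
integers stays exact where a plain double has already rounded (a sketch
of the idea, not a proposal for Calc's internals; the figures are made
up):

    #include <cstdio>

    int main() {
        // A balance of 1e16 cents plus one more cent: the 64-bit
        // integer keeps every cent; the double has already rounded.
        const long long cents = 10000000000000000LL + 1LL;
        const double   dbl   = 1.0e16 + 1.0;  // rounds back to 1.0e16

        std::printf("long long: %lld cents\n", cents);
        std::printf("double   : %.0f cents\n", dbl);
        return 0;
    }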
> > I sincerely hope that OOo increases the precision; at least if an
> > operation overflows, Calc should increase the precision to 24-32
> > digits. [Most existing processor designs implement the IEEE 754
> > exceptions, including the OVERFLOW exception, therefore it is
> > possible to catch such an overflow. In this case, Calc should either
> > display a warning that the result is wrong - and offer a robust
> > calculation mode - or just use a greater precision. Hiding the error
> > is not very rational.]
>
> Detecting overflows doesn't help against roundoff errors.
>
> > And there are many other issues. Floating points spring undoubtedly
> > to mind. I hope that the floating point arithmetic improves
> > drastically, too.
> With IEEE 754 most improvement is about curing symptoms and cheating
> the user because she expects 0.3-0.2-0.1 to equal 0.0
>
> > I advise the interested reader to read some of Kahan's papers, see:
> > http://www.cs.berkeley.edu/~wkahan/
> > [e.g. http://www.cs.berkeley.edu/~wkahan/Mindless.pdf]
> > [Kahan is the architect of IEEE 754 - he is also known as "The
> > Father of Floating Point".]
>
> Well, yes, nice read, it states what can go wrong and at the end the
> best solution is he hopes "that popular programming languages will
> support Interval Arithmetic of arbitrarily high (within limits)
> precision variable (coarsely) at run-time. Then programmers may use it
> to prove most of their numerical software free from roundoff-induced
> anomalies even if it runs sometimes slower than usual."
>
> I don't see where we'd currently benefit, but maybe I browsed too
> quickly.
> For the interested reader:
>
> Numerical Computation Guide
> http://docs.sun.com/source/819-3693/index.html
> 2005, Sun Microsystems Studio 11.
>
> What Every Computer Scientist Should Know About Floating-Point
> Arithmetic
> http://docs.sun.com/source/819-0499/ncg_goldberg.html
> Appendix D of the Numerical Computation Guide.
>
> Numerical Computation Guide
> http://docs.sun.com/source/806-3568/index.html
> 2002, may be better suited for high resolution displays,
> mathematical formulas have better readable graphic images.
>
>   Eike