Hello Eike,
Let me strongly disagree with your post, and let me explain why.
> > Well, 15-digits might have been intended behaviour in the '70s, but
> > is hardly appropriate in 2008.

> IEEE 754 double did not change.

But spreadsheet use did change. Doubles don't fit anymore.
For the interested reader, I would always suggest starting with
Prof. Kahan's texts. I infer from Prof. Kahan's "How JAVA's
Floating-Point Hurts Everyone Everywhere"
(http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf) that Sun did not get
it entirely right after all. [**]
I gave some telling examples on the OASIS list of how even tiny errors
can end in full catastrophe, with multi-million losses and even loss of
life - and those systems were NOT designed by non-professionals. To
quote from my post on OASIS:
Otherwise, this might end as the maiden flight of the Ariane 5 rocket:
http://www.ima.umn.edu/~arnold/disasters/ariane.html
or the failed Patriot rocket:
http://www.ima.umn.edu/~arnold/disasters/patriot.html
To name but a few floating-point miscalculations.
Where does the problem stem from?
As I explained, spreadsheets are used in every imaginable and
unimaginable way. Users perform (sometimes) non-trivial calculations.
One thing did change over the years: spreadsheets have grown in size.
With that growth, the need for *accuracy* has increased, while the
tolerance for *error* has steadily decreased.
A tiny error gets amplified, as exemplified by the Patriot rocket: give
it enough time, or enough operations (e.g. a spreadsheet with > 65,000
rows), and it will get big, sometimes catastrophically big. Even a tiny
error might become relevant under such circumstances, especially if it
is used in subsequent operations. The product of the Scud missile's
velocity and the tiny clock error was large enough to prevent an
effective interception. Would you want to live near a nuclear plant if
some of its design was done using a spreadsheet? It is enough to fail
once (and time plays against us - exposing even trivial errors).
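
To make the amplification concrete, here is a minimal C++ sketch of the
Patriot drift, using the figures from the linked analysis (1/10 chopped
after 23 fractional bits in a 24-bit register, 100 hours of uptime, a
Scud at ~1,676 m/s); the numbers come from that page, the code is mine:

    #include <cmath>
    #include <cstdio>

    int main() {
        // 1/10 has no finite binary expansion; the Patriot's 24-bit
        // register chopped it after 23 fractional bits, giving an
        // error of about 9.5e-8 per stored tenth of a second.
        const double stored_tenth = std::floor(0.1 * (1 << 23)) / (1 << 23);
        const double err_per_tick = 0.1 - stored_tenth;   // ~9.5e-8 s

        // After 100 hours of uptime the clock has counted 3,600,000
        // tenths of a second:
        const double ticks = 100.0 * 3600.0 * 10.0;
        const double drift = err_per_tick * ticks;        // ~0.34 s

        std::printf("clock drift after 100 h: %.4f s\n", drift);
        std::printf("tracking error at 1676 m/s: %.0f m\n", drift * 1676.0);
        return 0;
    }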
Now, back to overflow:
> Detecting overflows doesn't help against roundoff errors.
Indeed, (float) 29,513,736 * (float) 92,842,033 won't produce an
overflow, just a roundoff error.
BUT: (integer) 29,513,736 * (integer) 92,842,033 WOULD raise an
overflow, and so the (explicit) conversion to float could be signaled
to the user. He would then know that the result is inaccurate due to
roundoff. This is only possible IF spreadsheets implement true
integers, not just floats (and this seems sensible, because financial
operations work mostly with exact numbers; [in order to work with
integers, the last 2 digits of the currency (the cents) could be folded
into the integer part]).
[Integer: e.g. long or long long, or some other format - although long
long would have computed even the previous result exactly. So long long
would be an excellent choice.]
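
A minimal sketch of such a check (plain C++; I widen to long long
before comparing against the 32-bit range - one possible
implementation, not Calc's actual code):

    #include <cstdint>
    #include <cstdio>

    int main() {
        const std::int32_t a = 29513736, b = 92842033;

        // Widen first, then check: if the exact product leaves the
        // 32-bit range, the overflow can be signalled to the user
        // instead of being silently wrapped or rounded.
        const long long exact = 1LL * a * b;  // 2,740,115,251,665,288
        if (exact > INT32_MAX || exact < INT32_MIN)
            std::printf("int32 overflow; exact in long long: %lld\n",
                        exact);

        // The single-precision path neither overflows nor warns - it
        // just rounds: the product needs ~51 bits, float keeps 24.
        const float rounded = (float)a * (float)b;
        std::printf("float product : %.0f\n", (double)rounded);
        std::printf("roundoff error: %.0f\n",
                    (double)rounded - (double)exact);
        return 0;
    }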
32-bit integers are not enough, because most revenues of bigger
companies are already above this limit. I would suggest going to Sun's
financial department and asking for the correct figures; I will assume
a value above $10 billion for fiscal year 2007.
You will need 11 digits to represent this number. If you add the 2
decimals, you end up with 13 digits - damn close to the maximum
precision of 15 digits for double. Just one unfortunate multiplication
by 100 for some conversion, and you lose accuracy. And imagine you do a
forecast and end up with terms of the form X^2: you lose almost half
the digits (and even more).
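
To see how much an X^2 term costs, here is a tiny sketch (the revenue
figure is made up, of course):

    #include <cstdio>

    int main() {
        // Hypothetical figure: $10 billion and one cent, held in cents
        // so the amount itself (13 digits) is still exact in a double.
        const double cents = 1000000000001.0;

        // A forecast term of the form X^2 needs 25 digits; double
        // keeps barely 16, so roughly the last 8 digits are noise.
        // Exact square: 1,000,000,000,002,000,000,000,001
        std::printf("double X^2: %.0f\n", cents * cents);
        return 0;
    }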
How many numbers on this site are close to this limit?
http://www.deutsche-bank.de/ir/en/content/funding.htm
Yes, there are companies that come close to the limit, and there are
governments that have passed beyond it. And the secret is, auditors out
there will look for every missing cent. But consider intermediate
results during a forecast for the next 10 years: you will lose damn
many digits, not just a few cents, and over 10 years the results will
add up.
Now, another big problem arises from unfortunate formulas. Sadly, way
over 95% of users don't know much about numerical stability (to quote
Prof. Kahan and James Gosling in a slightly different context).*
Terms of the form (x - 1)/(EXP(x - 1) - 1) should be familiar to users
working in the financial sector. The term 1/(EXP(x - 1) - 1) is
particularly unstable around x = 1:

x - 1 = 0 => division by 0 (the full term gives 0/0, i.e. NaN)
x - 1 = -0 (negative underflow) => result is -INF
x - 1 = +0 (positive underflow) => result is +INF

Actually, you can get almost any result between -INF and +INF.
Considering that this is an investment rate, the end results summed
over a few years would be catastrophic, I hope everyone agrees.
(Fortunately, (x - 1)/(...) is slightly more stable and loses only
about half the digits around x = 1, but IF a user manually crafts the
formula and computes intermediate results, then he will lose ALL the
digits. How many users are numerical analysts able to devise
numerically stable formulas? I am for sure NOT one of them.)
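
For those who want to see it, here is a minimal C++ sketch of the
instability and of the standard cure (expm1() is the C99/C++ library
function that computes EXP(u) - 1 without the cancellation; as far as
I know, Calc exposes no equivalent):

    #include <cmath>
    #include <cstdio>

    int main() {
        const double x = 1.0 + 1e-12;  // very close to the unstable point
        const double u = x - 1.0;

        // Naive form: EXP(u) is rounded to ~16 digits, so EXP(u) - 1
        // keeps only the few digits surviving the cancellation near 0.
        const double naive = u / (std::exp(u) - 1.0);

        // expm1() avoids the cancellation, so the quotient stays
        // accurate (it tends to 1 as u -> 0).
        const double stable = u / std::expm1(u);

        std::printf("naive : %.17g\n", naive);
        std::printf("stable: %.17g\n", stable);
        return 0;
    }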
If there are some things to be aware of, then I would like to quote
Prof. Kahan:
*Thirteen Prevalent Misconceptions about Floating-Point Arithmetic:*
3. Arithmetic much more precise than the data it operates upon is
needless, and wasteful.
I have concerns that double is not appropriate anymore. It is not
appropriate for Sun, nor for IBM, nor for Google, nor for any financial
institution nor any government.
Is it possible to use greater precision and still attain reasonable
speed? The answer is definitely yes.
R, Matlab and other financial tools are orders of magnitude faster than
Calc, and they usually work with greater precision than Calc anyway.
Therefore, it should be at least theoretically possible to speed Calc
up enough to work with higher precision.
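
One well-known trick in this direction is Kahan's compensated
summation - nearly double-double accuracy for sums at close to
plain-summation cost. A minimal sketch (my illustration, not a claim
about how Calc should implement it):

    #include <cstdio>

    int main() {
        // One extra variable carries the low-order bits that a plain
        // addition drops - relevant for a column with > 65,000 rows.
        double naive = 0.0, sum = 0.0, c = 0.0;
        for (int i = 0; i < 65536; ++i) {
            naive += 0.1;              // plain sum: error grows with n
            const double y = 0.1 - c;  // re-inject the bits lost before
            const double t = sum + y;  // low bits of y are dropped here
            c = (t - sum) - y;         // ...and recovered for next time
            sum = t;
        }
        std::printf("plain sum  : %.17g\n", naive);
        std::printf("compensated: %.17g\n", sum);
        std::printf("reference  : %.17g\n", 0.1 * 65536); // exact: * 2^16
        return 0;
    }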
I am looking forward to seeing at least the double-extended data type.
And hopefully a long long (long64) integer. And some D(ecimal)-type
functions. And then maybe some IEEE 754r features. And maybe some other
great ideas. And probably I'll complain even then. ;)
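
For a taste of what double-extended would buy (on x86, long double is
usually the 80-bit IEEE extended format, but this is
platform-dependent):

    #include <cfloat>
    #include <cstdio>

    int main() {
        // Guaranteed decimal digits of each type on this platform:
        std::printf("double     : %d digits\n", DBL_DIG);   // 15
        std::printf("long double: %d digits\n", LDBL_DIG);  // 18 on x86
        return 0;
    }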
Sincerely,
Leonard
* “95% of the folks out there are completely clueless about
floating-point.” (J.G., 28 Feb. 1998)
(Maybe more than 95% ?) [Prof. Kahan]
** For a lot more details, please visit Prof. Kahan's website:
http://www.cs.berkeley.edu/~wkahan/
I have added some comments in-line below, e.g. on a DSUM() function for
decimal floating point.
Eike Rathke wrote:
> Hi Leonard,
>
> On Friday, 2008-07-04 03:07:04 +0300, Leonard Mada wrote:
> > Well, 15-digits might have been intended behaviour in the '70s, but
> > is hardly appropriate in 2008.
>
> IEEE 754 double did not change.
> > I am really amazed that programs still cling to 15 digits. Well, the
> > 64-bit double precision might limit calculations to ~16 digits, but
> > professional programs should do better. Even the MS Windows
> > calculator outperforms both Excel and Calc (as the previous article
> > states).
>
> So what? Comparing apples (e.g. BCD arithmetic or IEEE 854) with
> oranges (IEEE 754) does not help at all. I assume you do not want Calc
> to use BCD or other arbitrary precision instead of floating point at
> the cost of multiples of calculation time?
Yes, I do want special functions (e.g. a DSUM) to perform exactly these
high accuracy operations. Users who depend on accurate results might
invest the time in using these functions. Knowing that such functions do
exist will also remove some of the misconceptions and errors.
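
For instance, a DSUM-style accumulation over cents-scaled 64-bit
integers stays exact where a plain double has already rounded (a sketch
of the idea, not a proposal for Calc's internals; the figures are made
up):

    #include <cstdio>

    int main() {
        // A balance of 1e16 cents plus one more cent: the 64-bit
        // integer keeps every cent; the double has already rounded.
        const long long cents = 10000000000000000LL + 1LL;
        const double   dbl   = 1.0e16 + 1.0;  // rounds back to 1.0e16

        std::printf("long long: %lld cents\n", cents);
        std::printf("double   : %.0f cents\n", dbl);
        return 0;
    }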
> > I sincerely hope that OOo increases the precision; at least if an
> > operation overflows, Calc should increase the precision to 24-32
> > digits. [Most existing processor designs implement the IEEE 754
> > exceptions, including the OVERFLOW exception, therefore it is
> > possible to catch such an overflow. In this case, Calc should either
> > display a warning that the result is wrong - and offer a robust
> > calculation mode - or just use a greater precision. Hiding the error
> > is not very rational.]
>
> Detecting overflows doesn't help against roundoff errors.
>
> > And there are many other issues. Floating points spring undoubtedly
> > to mind. I hope that the floating point arithmetic improves
> > drastically, too.
> With IEEE 754 most improvement is about curing symptoms and cheating
> the user because she expects 0.3-0.2-0.1 to equal 0.0
>
> > I advise the interested reader to read some of Kahan's papers, see:
> > http://www.cs.berkeley.edu/~wkahan/
> > [e.g. http://www.cs.berkeley.edu/~wkahan/Mindless.pdf]
> > [Kahan is the architect of IEEE 754 - he is also known as "The
> > Father of Floating Point".]
>
> Well, yes, nice read, it states what can go wrong and at the end the
> best solution is he hopes "that popular programming languages will
> support Interval Arithmetic of arbitrarily high (within limits)
> precision variable (coarsely) at run-time. Then programmers may use it
> to prove most of their numerical software free from roundoff-induced
> anomalies even if it runs sometimes slower than usual."
>
> I don't see where we'd currently benefit, but maybe I browsed too
> quickly.
> For the interested reader:
>
> Numerical Computation Guide
> http://docs.sun.com/source/819-3693/index.html
> 2005, Sun Microsystems Studio 11.
>
> What Every Computer Scientist Should Know About Floating-Point
> Arithmetic
> http://docs.sun.com/source/819-0499/ncg_goldberg.html
> Appendix D of the Numerical Computation Guide.
>
> Numerical Computation Guide
> http://docs.sun.com/source/806-3568/index.html
> 2002, may be better suited for high resolution displays,
> mathematical formulas have better readable graphic images.
>
>   Eike