On Sat, 21 Jun 2003, Gennaro Prota wrote:

> On Fri, 20 Jun 2003 22:04:53 +0200 (CEST), Guillaume Melquiond wrote:
>
> [...]
>
> > I know this part of the standard. But it doesn't apply in the situation
> > I was describing. I was describing the case of a constant whose decimal
> > (and consequently binary) representation is not finite. It can be an
> > irrational number like pi; but it can also simply be a rational like 1/3.
>
> I was just trying to start from something sure, such as the standard
> requirements. I had some difficulties understanding your example.
>
> The way I read the paragraph quoted above, "the scaled value" is the
> value intended in the mathematical sense, not its truncated internal
> representation. So the compiler must behave *as if* it considered all
> the digits you provide and chose the nearest element (smaller or
> larger - BTW C99 is different in this regard). In your example:
>
>   the constant is 1.00050001236454786005785305678...
>   you write it with seven digits: 1.000500
>   the floating-point format only uses 4 digits
>
> you wouldn't write 1.000500 but, for instance, 1.00050001, and the
> hypothetical base-10 implementation should then consider all the
> digits even if it can store only 4 of them. Thus if it chooses the
> *larger* value nearest to that, it chooses 1.001. Of course if it
> chooses the smaller..., but that's another story.
Yes, you're right, but you have lost track of the reason why I was giving
this example. I was speaking about the conversion of a constant from
'long double' to 'double'. In your case, you can always add digits to
correct the problem. But in the situation I was describing, you can't:
the 'long double' format is fixed once and for all; you can't suddenly
add digits if you need them. So, even if you have the 'long double'
representation of a constant, you may still need the 'double'
representation of this constant if you want the 'double' version to be
the nearest number.

> Just to understand each other: suppose I write
>
>   double d = 2348542582773833227889480596789337027375682548908319870707290971532209025114608443463698998384768703031934976.0; // 2**360
>
> The value is 2**360. On my implementation, where DBL_MAX_EXP is 1024
> and FLT_RADIX is 2, isn't the compiler required to accept and
> represent it exactly?

Yes, it is.

> > When manipulating such a number, you can only give a finite number of
> > decimal digits. And so the phenomenon of double rounding I was
> > describing will occur, since you first do a rounding to get a finite
> > number of digits and then the compiler does another rounding (which
> > is described by the part of the standard you are quoting) to fit the
> > constant into the floating-point format.
>
> I'm not very expert in this area. Can you give a real example
> (constant given in decimal and internal representation non-decimal)?

Okay, here is a "real" example. Let's suppose you have an x86 processor:
the 'long double' mantissa is 64 bits wide, the 'double' mantissa is 53
bits (52 + one implicit). The constant is the fractional number:

  c = 73786976294838196225 / 55340232221128654848   (that is, 4/3 - 3413/2^64)

Since you don't want the compiler to do the rounding (because of the
standard, we can't be sure of the rounding direction), you give the
exact representation of the nearest 'long double':
long double c_l() {
  return 1.33333333333333314829616256247390992939472198486328125L;
}

We are sure c_l is the nearest 'long double'. Now we want the nearest
'double'. Can we simply do:

double c_d() {
  return c_l();
}

No, we can't. The constant was specially crafted to make this conversion
fail (when rounded to nearest). And it wasn't that difficult: a
four-decimal-digit number (3413) is enough to make it fail. In fact, it
is enough that 13 bits have a particular value. Consequently, more than
one constant out of 10000 may suffer from this problem. So it is rare,
but it is not impossible. That's why I was suggesting that a library
should provide a means of knowing whether the number representing a
constant is the nearest one or not.

Is it a bit clearer?

And for people still wondering why I didn't use this kind of code
(please note that the value of the constant has changed; this time it is
the constant rounded to 53 significant decimal digits):

#define c 1.3333333333333331483142325986820016699615128648777803L

long double c_l() { return c; }
double c_d() { return c; }

It's because the standard doesn't guarantee the direction of rounding.
So not only could the 'double' value be incorrect, but the 'long double'
value could be incorrect too. Contrary to the previous example, where at
least the 'long double' value was correct.

So, in the end, it's only a matter of what we expect from a constant
library. Do we expect it to give the nearest number for a given
precision? As for me, I do.

Regards,

Guillaume

PS: To summarize: for a given constant, if we want the best values
without relying on the rounding done by the compiler, we need an exact
representation for each available format. The most common floating-point
formats are 32, 64, 80, 64+64 and 128 bits. So that's 5 values per
constant (15 values if we also count lower and upper bounds). Because of
the huge number of required values, some of them may be missing.
If that is the case, the library needs a way to warn the program that it
will not get the nearest value but only an approximation.

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost