On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote: > >>>>> Ben Bolker <bbol...@gmail.com> > >>>>> on Sat, 5 Feb 2011 15:58:09 -0500 writes: > > > A bug was recently posted to the R bug database (which > > probably would better have been posted as a query here) as > > to why this happens: > > >> print(7.921,digits=2) > > [1] 8 > >> print(7.92,digits=2) > > [1] 7.9 > [...]
> I had started to delve into this after you've posted the bug > report. It is clearly a bug(let), > caused by code that has been in R from its very > beginning, at least in the first source code I've seen in 1994. > > The problem is not related to digits=2, > but using such a low number of digits shows it more > dramatically, e.g., also > > > print(5.9994, digits=4) > [1] 5.999 > > print(5.9994001, digits=4) > [1] 6 > > Interestingly, the problem seems *not* to be present for > digits = 1 (only). > > I haven't found time to mathematically "analyze" it for > determining a correct solution though. > Note that fixing this bug(let) will probably (very slightly) > change a huge number of R outputs .. so there is a caveat, > but nonetheless, we must approach it. > > The responsible code is the C function scientific() > in src/main/format.c I inspected the source of scientific() and formatReal() (2.13.0, revision 2011-02-08 r54284). Let me point out an example of the difference between the output of print() and rounding to "digits" significant digits, which is slightly different from the previous ones, since also the exponent changes. Namely, print(9.991, digits=3) [1] 10 while rounding to 3 digits yields 9.99. The reason is in scientific(), where the situation that rounding increases the exponent is tested using /* make sure that alpha is in [1,10) AFTER rounding */ if (10.0 - alpha < eps*alpha) { alpha /= 10.0; kp += 1; } Here, eps is determined in formatReal() as double eps = pow(10.0, -(double)R_print.digits); so we have eps = 10^-digits. The above condition on alpha is equivalent to alpha > 10.0/(1 + 10^-digits) For digits=3, this is alpha > 9.99000999000999 This bound may be verified as follows print(9.9900099900, digits=3) [1] 9.99 print(9.9900099901, digits=3) [1] 10 The existing algorithm for choosing the number of digits is designed to predict the format suitable for all numbers in a vector before the actual call of sprintf() or snprintf(). For speed, this algorithm should use the standard double precision, so it is necessarily inaccurate for precision 15 and more and there may be some rare such cases also for smaller precisions. For smaller precisions, say below 7, the algorithm can be made more precise. This would change the output in rare cases and mainly for printing single numbers. If a vector is printed, then the format is typically not determined by the problematic numbers. Changing the default behavior may be undesirable for backward compatibility reasons. If this is the case, then a possible solution is to make the documentation more precise on this and include pointers to possible solutions. The functions sprintf(), formatC() and signif() may be used. In particular, if signif() is used to round the numbers before printing, then we get the correct output print(signif(7.921, digits=2)) [1] 7.9 print(signif(9.9900099901, digits=3)) [1] 9.99 The current ?print.default contains digits: a non-null value for ‘digits’ specifies the minimum number of significant digits to be printed in values. The word "minimum" here means that all numbers in the vector will have at least the chosen number of digits, but some may have more. I suggest to add "See 'options' for more detail.". The current ?options contains ‘digits’: controls the number of digits to print when printing numeric values. It is a suggestion only. I suggest to extend this to It is a suggestion only and the actual number of printed digits may be smaller, if the relative error of the output number is less than 10^-digits. Use 'signif(, digits)' before printing to get the exact number of the printed significant digits. I appreciate to know the opinion of R developers on this. Petr Savicky. ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel