Re: [Rd] print(...,digits=2) behavior
This affects _many_ *.Rout.save checks in packages. I assume this is in the R-devel branch. I've got an addition to survival nearly ready to go (faster concordance calculation). At what point should I should I switch over to the newer version, fix up my .out files etc, to best mesh with the automatic checks on CRAN? It's a nuisance for me to update, but only a nuisance. I've been through it twice before (Splus version 4- 5 and Splus - R switch). I might as well time it so as to make life as easy as possible for you folks. Thanks for all the hard work. Terry T __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] print(...,digits=2) behavior
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote: [...] Namely, one important purpose of the test is to ensure that e.g. print(3.597, digits = 3) is printed as 3.6 and not 3.60 Now I have had -- since 1997 at least -- an R version of scientific() for more easy experimentation. Here's an updated version of that: [...] Now, I'm waiting for comments/suggestions .. In a previous email, i discussed the case print(9.991, digits=3). This case is already handled in your R level experimental function scientific(). I noticed this only after sending the previous email. I would like to suggest a modification of the algorithm. The purpose is to determine the minimum number of digits, such that the printed number has the same value as the number rounded to exactly getOption(digits) digits, but shorter representation. Algorithm. First, the mantissa of the number rounded to digits digits is computed as an integer by the formula a - floor(alpha*10^(digits - 1) + 0.5) The obtained integer number a satisfies 10^(digits-1) = a = 10^digits The case a = 10^digits corresponds to the situation that we have to increase the exponent. This may be tested either before the remaining part of the code or as the last step as below. For example, if alpha = 3.597 and digits = 3, we get a = 360. The question, whether (digits - 1) digits are sufficient for printing the number, is equivalent to the condition that a is divisible by 10. Similarly, (digits - 2) digits are sufficient if and only if a is divisible by 100. This may be tested using the following code nsig - digits for (j in 1:nsig) { a - a / 10 if (a == floor(a)) { nsig - nsig - 1 } else { break } } ## nsig == 0 if and only if we started with a == 10^digits if (nsig == 0) { nsig - 1 kpower - kpower + 1 } This code uses double precision for a, since values up to 10^digits may occur and digits may be 15 or more. The algorithm is not exact for digits = 15, but works reasonably. I suggest to use a different, slower and more accurate algorithm, if getOption(digits) = 16. Please, find in an attachment two versions of the above algorithm. One is a modification of your R level function from scientific.R and the other is a patch against src/main/format.c, which i tested. I suggest to consider the following test cases. The presented output is obtained using the patch. example1 - c( 7.94, 7.950001, 8.04, 8.050001) for (x in example1) print(x, digits=2) [1] 7.9 [1] 8 [1] 8 [1] 8.1 example2 - c( 3.599499, 3.599501, 3.600499, 3.600501) for (x in example2) print(x, digits=4) [1] 3.599 [1] 3.6 [1] 3.6 [1] 3.601 example3 - c( 1.004999, 1.005001) for (x in example3) print(x, digits=7) [1] 1 [1] 1.01 example4 - c( 9.994999, 9.995001) for (x in example4) print(x, digits=7) [1] 9.99 [1] 10 I appreciate comments. Petr Savicky. ###--- R function that does the same as 'scientific' in ###--- /usr/local/R-0.50/src/main/format.c ###--- ~~~||~~ scientific1 - function(x, digits = getOption('digits')) ## eps = 10^-(digits) { ##- /* for 1 number x , return ##-* sgn= 1_{x 0} {0/1} ##-* kpower = Exponent of 10; ##-* nsig = min(digits, #{significant digits of alpha}) ##-* ##-* where |x| = alpha * 10^kpower and 1 = alpha 10 ##-*/ eps - 10 ^(-digits) if (x == 0) { kpower - 0 nsig - 1 sgn - 0 } else { if(x 0) { sgn - 1; r - -x } else { sgn - 0; r - x } kpower - floor(log10(r));##-- r = |x| ; 10^k = r if (kpower = -308) ## close to denormalization -- added for R 2.x.y alpha - (r * 1e+30) / 10^(kpower+30) else alpha - r / 10^kpower ## a integer, 10^(digits-1) = a = 10^digits a - floor(alpha*10^(digits - 1) + 0.5) nsig - digits for (j in 1:nsig) { a - a / 10 if (a == floor(a)) { nsig - nsig - 1 } else { break } } if (nsig == 0) { nsig - 1 kpower - kpower + 1 } } left - kpower + 1 c(sgn = sgn, kpower = kpower, nsig = nsig, left = left, right = nsig - left, sleft = sgn + max(1,left)) } diff --minimal -U 5 -r R-devel/src/main/format.c R-devel-print/src/main/format.c --- R-devel/src/main/format.c 2010-04-27 17:52:24.0 +0200 +++ R-devel-print/src/main/format.c 2011-02-12 11:07:37.0 +0100 @@ -167,28 +167,26 @@ alpha = (r * 1e+30)/pow(10.0, (double)(kp+30)); } else alpha = r / pow(10.0, (double)kp); - /* make sure that alpha is in [1,10) AFTER rounding */ - - if (10.0 - alpha eps*alpha) { - alpha /= 10.0; -
Re: [Rd] print(...,digits=2) behavior
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote: Ben Bolker bbol...@gmail.com on Sat, 5 Feb 2011 15:58:09 -0500 writes: A bug was recently posted to the R bug database (which probably would better have been posted as a query here) as to why this happens: print(7.921,digits=2) [1] 8 print(7.92,digits=2) [1] 7.9 [...] I had started to delve into this after you've posted the bug report. It is clearly a bug(let), caused by code that has been in R from its very beginning, at least in the first source code I've seen in 1994. The problem is not related to digits=2, but using such a low number of digits shows it more dramatically, e.g., also print(5.9994, digits=4) [1] 5.999 print(5.9994001, digits=4) [1] 6 Interestingly, the problem seems *not* to be present for digits = 1 (only). I haven't found time to mathematically analyze it for determining a correct solution though. Note that fixing this bug(let) will probably (very slightly) change a huge number of R outputs .. so there is a caveat, but nonetheless, we must approach it. The responsible code is the C function scientific() in src/main/format.c I inspected the source of scientific() and formatReal() (2.13.0, revision 2011-02-08 r54284). Let me point out an example of the difference between the output of print() and rounding to digits significant digits, which is slightly different from the previous ones, since also the exponent changes. Namely, print(9.991, digits=3) [1] 10 while rounding to 3 digits yields 9.99. The reason is in scientific(), where the situation that rounding increases the exponent is tested using /* make sure that alpha is in [1,10) AFTER rounding */ if (10.0 - alpha eps*alpha) { alpha /= 10.0; kp += 1; } Here, eps is determined in formatReal() as double eps = pow(10.0, -(double)R_print.digits); so we have eps = 10^-digits. The above condition on alpha is equivalent to alpha 10.0/(1 + 10^-digits) For digits=3, this is alpha 9.99000999000999 This bound may be verified as follows print(9.9900099900, digits=3) [1] 9.99 print(9.9900099901, digits=3) [1] 10 The existing algorithm for choosing the number of digits is designed to predict the format suitable for all numbers in a vector before the actual call of sprintf() or snprintf(). For speed, this algorithm should use the standard double precision, so it is necessarily inaccurate for precision 15 and more and there may be some rare such cases also for smaller precisions. For smaller precisions, say below 7, the algorithm can be made more precise. This would change the output in rare cases and mainly for printing single numbers. If a vector is printed, then the format is typically not determined by the problematic numbers. Changing the default behavior may be undesirable for backward compatibility reasons. If this is the case, then a possible solution is to make the documentation more precise on this and include pointers to possible solutions. The functions sprintf(), formatC() and signif() may be used. In particular, if signif() is used to round the numbers before printing, then we get the correct output print(signif(7.921, digits=2)) [1] 7.9 print(signif(9.9900099901, digits=3)) [1] 9.99 The current ?print.default contains digits: a non-null value for ‘digits’ specifies the minimum number of significant digits to be printed in values. The word minimum here means that all numbers in the vector will have at least the chosen number of digits, but some may have more. I suggest to add See 'options' for more detail.. The current ?options contains ‘digits’: controls the number of digits to print when printing numeric values. It is a suggestion only. I suggest to extend this to It is a suggestion only and the actual number of printed digits may be smaller, if the relative error of the output number is less than 10^-digits. Use 'signif(, digits)' before printing to get the exact number of the printed significant digits. I appreciate to know the opinion of R developers on this. Petr Savicky. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] print(...,digits=2) behavior
Ben Bolker bbol...@gmail.com on Sat, 5 Feb 2011 15:58:09 -0500 writes: A bug was recently posted to the R bug database (which probably would better have been posted as a query here) as to why this happens: print(7.921,digits=2) [1] 8 print(7.92,digits=2) [1] 7.9 Two things I *haven't* done to help make sense of this for myself are (1) writing out the binary representations to see if something obvious pops out about why this would be a breakpoint and (2) poking in the source code (I did a little bit of this but gave up). I know that confusion over rounding etc. is very common, and my first reaction to this sort of question is always that there must be some sort of confusion (either (1) in a FAQ 7.31-ish sort of way that floating point values have finite precision in the first place, or (2) a confusion over the difference between the value and the representation of the number, or (3) more subtly, about the differences among various choices of rounding conventions). However, in this case I am a bit stumped: I don't see that any of the standard confusions apply. I grepped the R manuals for rounding and didn't find anything useful ... I have a very strong prior that such a core part of R must be correct, and that therefore I (and the original bug reporter) must be misunderstanding something. Thoughts/references? I had started to delve into this after you've posted the bug report. It is clearly a bug(let), caused by code that has been in R from its very beginning, at least in the first source code I've seen in 1994. The problem is not related to digits=2, but using such a low number of digits shows it more dramatically, e.g., also print(5.9994, digits=4) [1] 5.999 print(5.9994001, digits=4) [1] 6 Interestingly, the problem seems *not* to be present for digits = 1 (only). I haven't found time to mathematically analyze it for determining a correct solution though. Note that fixing this bug(let) will probably (very slightly) change a huge number of R outputs .. so there is a caveat, but nonetheless, we must approach it. The responsible code is the C function scientific() in src/main/format.c a somewhat direct R interface to its functionality is given by R's format.info() : format.info(7.92, digits=2) [1] 3 1 0 format.info(7.921, digits=2) [1] 1 0 0 which means that 7.92 will use 3 characters, 1 digit after the decimal 7.921 will use 1 character, 0 digits after decimal. The crucial buggy part of the code is {line 180 ff} : /* compute number of digits */ *nsig = R_print.digits; for (j = 1; j = *nsig; j++) { if (fabs(alpha - floor(alpha+0.5)) eps * alpha) { *nsig = j; break; } alpha *= 10.0; } where notably if (fabs(alpha - floor(alpha+0.5)) eps * alpha) { is not quite the correct check. Note that eps = 10^-2 in our case and alpha = 7.92 or 7.921 and that 8 - 7.921 = 0.079 7.921/10^2 is just at the border. Looking for all the border cases is solving the above analytically with '=' and setting k = ceiling(alpha) : p0 - function(...) paste(..., sep=) k - 6:9; names(k) - p0(k=,k) d - 1:5; names(d) - p0(d=,d) xc - outer(k, 1+10^-d, /) print(xc, digits= 11) d=1 d=2 d=3d=4 d=5 k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.46 k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.37 k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.28 k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.19 So we see that for our case, 7.920792... is more exactly the border case. One change that is *not* correct is making it an absolute instead of relative comparison, i.e. replacing the RHS eps * alpha by eps Namely, one important purpose of the test is to ensure that e.g. print(3.597, digits = 3) is printed as 3.6 and not 3.60 Now I have had -- since 1997 at least -- an R version of scientific() for more easy experimentation. Here's an updated version of that: ###--- R function that does the same as 'scientific' in ###--- /usr/local/R-0.50/src/main/format.c ###--- ~~~||~~ scientific - function(x, digits = getOption('digits')) ## eps = 10^-(digits) { ##- /* for 1 number x , return ##-* sgn= 1_{x 0} {0/1} ##-* kpower = Exponent of 10; ##-* nsig = min(digits, #{significant digits of alpha}) ##-* ##-* where |x| = alpha * 10^kpower and 1 = alpha 10 ##-*/ eps - 10 ^(-digits) if (x == 0) { kpower - 0 nsig - 1 sgn - 0 } else { if(x 0) { sgn - 1; r - -x } else { sgn - 0; r - x } kpower - floor(log10(r));##-- r = |x| ; 10^k = r if (kpower = -308) ## close to
Re: [Rd] print(...,digits=2) behavior
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote: Ben Bolker bbol...@gmail.com on Sat, 5 Feb 2011 15:58:09 -0500 writes: A bug was recently posted to the R bug database (which probably would better have been posted as a query here) as to why this happens: print(7.921,digits=2) [1] 8 print(7.92,digits=2) [1] 7.9 Two things I *haven't* done to help make sense of this for myself are (1) writing out the binary representations to see if something obvious pops out about why this would be a breakpoint and (2) poking in the source code (I did a little bit of this but gave up). I know that confusion over rounding etc. is very common, and my first reaction to this sort of question is always that there must be some sort of confusion (either (1) in a FAQ 7.31-ish sort of way that floating point values have finite precision in the first place, or (2) a confusion over the difference between the value and the representation of the number, or (3) more subtly, about the differences among various choices of rounding conventions). However, in this case I am a bit stumped: I don't see that any of the standard confusions apply. I grepped the R manuals for rounding and didn't find anything useful ... I have a very strong prior that such a core part of R must be correct, and that therefore I (and the original bug reporter) must be misunderstanding something. Thoughts/references? I had started to delve into this after you've posted the bug report. It is clearly a bug(let), caused by code that has been in R from its very beginning, at least in the first source code I've seen in 1994. The problem is not related to digits=2, but using such a low number of digits shows it more dramatically, e.g., also print(5.9994, digits=4) [1] 5.999 print(5.9994001, digits=4) [1] 6 I think that this may be a consequence of the following analysis, which i did some time ago using the source code of scientific() function in src/main/format.c. The conclusion was the following. Printing to k digits looks for the smallest number of digits l = k so that the relative error of the printed mantissa (significand) is at most 10^-k. If the mantissa begins with a digit less than 5, then this condition is stronger than having absolute error at most 5*10^-k. So, in this case, we cannot get lower accuracy than rounding to k digits. If the mantissa begins with 5 or more, then having relative error at most 10^-k is a weaker condition than having absolute error at most 5*10^-k. In this case, the chosen l may be smaller than k even if the printed number has larger error than the number rounded to k digits. This may be seen in the following example xx - 8 + (6:10)*10^-7 for (x in xx) print(x) [1] 8 [1] 8 [1] 8 [1] 8.01 [1] 8.01 where all the printed numbers are 8.01 if rounded to 7 digits. The cases print(7.921,digits=2) [1] 8 print(7.92,digits=2) [1] 7.9 seem to have the same explanation. The relative errors of the numbers rounded to a single digit are (8 - 7.921)/7.921 [1] 0.009973488 (8 - 7.92)/7.92 [1] 0.01010101 In the first case, this is less than 10^-2 and so, l = 1 is used. In the second case, the relative error for l = 1 is larger than 10^-2 an so, l = 2 is chosen. In the cases print(5.9994, digits=4) [1] 5.999 print(5.9994001, digits=4) [1] 6 the relative errors of one digit output are (6 - 5.9994)/5.9994 [1] 0.00010001 (6 - 5.9994001)/5.9994001 [1] 9.999333e-05 Here, one digit output is chosen if the relative error is less than 10^-4 and not otherwise. Petr Savicky. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] print(...,digits=2) behavior
A bug was recently posted to the R bug database (which probably would better have been posted as a query here) as to why this happens: print(7.921,digits=2) [1] 8 print(7.92,digits=2) [1] 7.9 Two things I *haven't* done to help make sense of this for myself are (1) writing out the binary representations to see if something obvious pops out about why this would be a breakpoint and (2) poking in the source code (I did a little bit of this but gave up). I know that confusion over rounding etc. is very common, and my first reaction to this sort of question is always that there must be some sort of confusion (either (1) in a FAQ 7.31-ish sort of way that floating point values have finite precision in the first place, or (2) a confusion over the difference between the value and the representation of the number, or (3) more subtly, about the differences among various choices of rounding conventions). However, in this case I am a bit stumped: I don't see that any of the standard confusions apply. I grepped the R manuals for rounding and didn't find anything useful ... I have a very strong prior that such a core part of R must be correct, and that therefore I (and the original bug reporter) must be misunderstanding something. Thoughts/references? cheers Ben Bolker __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel