>>>>> Ben Bolker <bbol...@gmail.com>
>>>>> on Sat, 5 Feb 2011 15:58:09 -0500 writes:
> A bug was recently posted to the R bug database (which
> probably would better have been posted as a query here) as
> to why this happens:
>> print(7.921,digits=2)
> [1] 8
>> print(7.92,digits=2)
> [1] 7.9
> Two things I *haven't* done to help make sense of this
> for myself are (1) writing out the binary representations
> to see if something obvious pops out about why this would
> be a breakpoint and (2) poking in the source code (I did a
> little bit of this but gave up).
> I know that confusion over rounding etc. is very common,
> and my first reaction to this sort of question is always
> that there must be some sort of confusion (either (1) in a
> FAQ 7.31-ish sort of way that floating point values have
> finite precision in the first place, or (2) a confusion
> over the difference between the value and the
> representation of the number, or (3) more subtly, about
> the differences among various choices of rounding
> conventions).
> However, in this case I am a bit stumped: I don't see
> that any of the standard confusions apply. I grepped the
> R manuals for "rounding" and didn't find anything useful
> ... I have a very strong prior that such a core part of R
> must be correct, and that therefore I (and the original
> bug reporter) must be misunderstanding something.
> Thoughts/references?
I had started to delve into this after you've posted the bug
report. It is clearly a bug(let),
caused by code that has been in R from its very
beginning, at least in the first source code I've seen in 1994.
The problem is not related to digits=2,
but using such a low number of digits shows it more
dramatically, e.g., also
> print(5.9994, digits=4)
[1] 5.999
> print(5.9994001, digits=4)
[1] 6
Interestingly, the problem seems *not* to be present for
digits = 1 (only).
I haven't found time to mathematically "analyze" it for
determining a correct solution though.
Note that fixing this bug(let) will probably (very slightly)
change a huge number of R outputs .. so there is a caveat,
but nonetheless, we must approach it.
The responsible code is the C function scientific()
in src/main/format.c
a somewhat direct R interface to its functionality is given by
R's format.info() :
> format.info(7.92, digits=2)
[1] 3 1 0
> format.info(7.921, digits=2)
[1] 1 0 0
which means that 7.92 will use 3 characters, 1 digit after the decimal
7.921 will use 1 character, 0 digits after decimal.
The crucial "buggy" part of the code is {line 180 ff} :
/* compute number of digits */
*nsig = R_print.digits;
for (j = 1; j <= *nsig; j++) {
if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
*nsig = j;
break;
}
alpha *= 10.0;
}
where notably
if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
is not quite the correct check.
Note that eps = 10^-2 in our case and alpha = 7.92 or 7.921
and that 8 - 7.921 = 0.079 < 7.921/10^2 is just at the border.
Looking for "all the border cases" is solving the above
analytically with '=' and setting k = ceiling(alpha) :
> p0 <- function(...) paste(..., sep="")
> k <- 6:9; names(k) <- p0("k=",k)
> d <- 1:5; names(d) <- p0("d=",d)
> xc <- outer(k, 1+10^-d, "/")
> print(xc, digits= 11)
d=1 d=2 d=3 d=4 d=5
k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.9999400006
k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.9999300007
k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.9999200008
k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.9999100009
So we see that for our case, 7.920792... is more exactly the
border case.
One change that is *not* correct is making it an absolute
instead of relative comparison, i.e. replacing the RHS
eps * alpha by eps
Namely, one important purpose of the test is to ensure that e.g.
print(3.597, digits = 3)
is printed as 3.6 and not 3.60
Now I have had -- since 1997 at least -- an R version of
scientific() for more easy experimentation.
Here's an updated version of that:
###--- R function that does the same as 'scientific' in
###--- /usr/local/R-0.50/src/main/format.c
###--- ~~~~~~~~~~~||||||~~~~~~~~~~~~~~~~~~
scientific <- function(x, digits = getOption('digits')) ## eps = 10^-(digits)
{
##- /* for 1 number x , return
##- * sgn = 1_{x < 0} {0/1}
##- * kpower = Exponent of 10;
##- * nsig = min(digits, #{significant digits of alpha})
##- *
##- * where |x| = alpha * 10^kpower and 1 <= alpha < 10
##- */
eps <- 10 ^(-digits)
if (x == 0) {
kpower <- 0
nsig <- 1
sgn <- 0
} else {
if(x < 0) {
sgn <- 1; r <- -x
} else {
sgn <- 0; r <- x
}
kpower <- floor(log10(r));##--> r = |x| ; 10^k <= r
if (kpower <= -308) ## close to denormalization -- added for R 2.x.y
alpha <- (r * 1e+30) / 10^(kp+30)
else
alpha <- r / 10^kpower
## make sure that alpha is in [1,10)
if (10 - alpha < eps) {
cat(".. scientific(.) correcting alpha=",format(alpha),
". x=", formatC(x,dig=2,w=1), " >>> RARE ! <<<\n")
alpha <- alpha/10
kpower<- kpower + 1
}
## compute number of digits
a <- alpha
nsig <- digits
for (j in 1:nsig) {
if (abs(a - floor(a+0.5)) < eps * a) {
nsig <- j
break
}
a <- a * 10
}
}
left <- kpower + 1
c(sgn = sgn, kpower = kpower, nsig = nsig,
left = left, right = nsig - left,
sleft = sgn + max(1,left))
}
names.num <- function(v, dig=2)
{
## Purpose: Add names to vector if it hasn't
## Author: Martin Maechler, Date: 16 Jul 97, 08:49
names(v) <- formatC(v, w=1, dig=dig)
v
}
do.sci <- function(v, nam.dig = 3, digits = getOption('digits'))
{
## Purpose: Call scientific(.) and do some more
## -------------------------------------------------------------------------
## Author: Martin Maechler, Date: 16 Jul 97, 17:03
## -- names.num(.) is from /u/maechler/R/Util.R ---
r <- sapply(names.num(v, dig= nam.dig), scientific, digits=digits)
cbind(r, min = apply(r,1,min), max = apply(r,1,max))
}
and here some of the experimentation code:
####-- Examples / Experiment, using scientific(),
#### the R function that does the same as 'scientific' in
#### <Rsource>/src/main/format.c
source("scientific.R")# for function definitions
## ~~~~~~~~~~~~
scientific(pi)
scientific(pi,14)
scientific(pi,17)# 0 0 16
tens <- names.num(10 ^ (-2:6))
do.sci(tens)
do.sci(3.1 * tens)
do.sci(pi * tens)
do.sci(pi * tens, dig = 12)
## v1 from /u/maechler/R/print-tests.R
if(!exists("v1")) v1 <- 10^c(-4:0,2,4)
do.sci(v1)
## Ben's example (*, digits=2)
x2 <- c(outer(c(-1,1), c(7.92, 7.921)))
x2 <- c(x2, 10^c(-10,-3, -1:1, 3,10))
do.sci(x2, 4, digits = 2)
-------
Now, I'm waiting for comments/suggestions ..
Martin
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel