>>>>> Ben Bolker <bbol...@gmail.com>
>>>>>     on Sat, 5 Feb 2011 15:58:09 -0500 writes:

    >   A bug was recently posted to the R bug database (which
    > probably would better have been posted as a query here) as
    > to why this happens:

    >> print(7.921,digits=2)
    > [1] 8
    >> print(7.92,digits=2)
    > [1] 7.9

    >   Two things I *haven't* done to help make sense of this
    > for myself are (1) writing out the binary representations
    > to see if something obvious pops out about why this would
    > be a breakpoint and (2) poking in the source code (I did a
    > little bit of this but gave up).

    >   I know that confusion over rounding etc. is very common,
    > and my first reaction to this sort of question is always
    > that there must be some sort of confusion (either (1) in a
    > FAQ 7.31-ish sort of way that floating point values have
    > finite precision in the first place, or (2) a confusion
    > over the difference between the value and the
    > representation of the number, or (3) more subtly, about
    > the differences among various choices of rounding
    > conventions).

    >   However, in this case I am a bit stumped: I don't see
    > that any of the standard confusions apply.  I grepped the
    > R manuals for "rounding" and didn't find anything useful
    > ...  I have a very strong prior that such a core part of R
    > must be correct, and that therefore I (and the original
    > bug reporter) must be misunderstanding something.

    >   Thoughts/references?

I had started to delve into this after you've posted the bug
report. It is clearly a bug(let),
caused by code that has been in  R  from its very
beginning, at least in the first source code I've seen in 1994.

The problem is not related to digits=2,
but using such a low number of digits shows it more
dramatically, e.g., also

 > print(5.9994, digits=4)
 [1] 5.999
 > print(5.9994001, digits=4)
 [1] 6

Interestingly, the problem seems *not* to be present for
digits = 1 (only).

I haven't found time to mathematically "analyze" it for
determining a correct solution though.
Note that fixing this bug(let) will probably (very slightly)
change a huge number of R outputs .. so there is a caveat,
but nonetheless, we must approach it.

The responsible code is the C function  scientific()
in src/main/format.c 

a somewhat direct R interface to its functionality is given by
R's format.info() :

  > format.info(7.92, digits=2)
  [1] 3 1 0
  > format.info(7.921, digits=2)
  [1] 1 0 0

which means that 7.92 will use 3 characters, 1 digit after the decimal
7.921 will use 1 character, 0 digits after decimal.

The crucial "buggy" part of the code is  {line 180 ff} :

        /* compute number of digits */

        *nsig = R_print.digits;
        for (j = 1; j <= *nsig; j++) {
            if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
                *nsig = j;
                break;
            }
            alpha *= 10.0;
        }

where notably
            if (fabs(alpha - floor(alpha+0.5)) < eps * alpha) {
is not quite the correct check.
Note that  eps = 10^-2  in our case and  alpha = 7.92 or 7.921
and that  8 - 7.921 = 0.079 <  7.921/10^2  is just at the border.

Looking for "all the border cases" is solving the above
analytically with '=' and setting  k = ceiling(alpha) :

> p0 <- function(...) paste(..., sep="")
> k <- 6:9; names(k) <- p0("k=",k)
> d <- 1:5; names(d) <- p0("d=",d)
> xc <- outer(k, 1+10^-d, "/")
> print(xc, digits= 11)
             d=1          d=2         d=3        d=4          d=5
k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.9999400006
k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.9999300007
k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.9999200008
k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.9999100009

So we see that for our case, 7.920792... is more exactly the
border case.

One change that is *not* correct is making it an absolute
instead of relative comparison, i.e.  replacing the RHS
eps * alpha  by  eps

Namely, one important purpose of the test is to ensure that e.g.

  print(3.597, digits = 3)

is printed as  3.6   and not 3.60

Now I have had -- since 1997 at least -- an R version of
scientific() for more easy experimentation.
Here's an updated version of that:

###--- R function that does the same as 'scientific' in

###--- /usr/local/R-0.50/src/main/format.c
###--- ~~~~~~~~~~~||||||~~~~~~~~~~~~~~~~~~

scientific <- function(x, digits = getOption('digits')) ## eps = 10^-(digits)
{
  ##-   /* for 1 number  x , return
  ##-    *      sgn    = 1_{x < 0}  {0/1}
  ##-    *      kpower = Exponent of 10;
  ##-    *      nsig   = min(digits, #{significant digits of alpha})
  ##-    *
  ##-    * where  |x| = alpha * 10^kpower   and  1 <= alpha < 10
  ##-    */

  eps <- 10 ^(-digits)

  if (x == 0) {
    kpower <- 0
    nsig <- 1
    sgn <- 0
  } else {
    if(x < 0) {
      sgn <- 1; r <- -x
    } else {
      sgn <- 0; r <- x
    }
    kpower <- floor(log10(r));##-->      r = |x| ;  10^k <= r

    if (kpower <= -308) ## close to denormalization -- added for R 2.x.y
        alpha <- (r * 1e+30) / 10^(kp+30)
    else
        alpha <- r / 10^kpower

    ## make sure that alpha is in [1,10)

    if (10 - alpha < eps) {
      cat(".. scientific(.) correcting alpha=",format(alpha),
          ".  x=", formatC(x,dig=2,w=1), "  >>> RARE ! <<<\n")
      alpha <- alpha/10
      kpower<- kpower + 1
    }

    ## compute number of digits

    a <- alpha
    nsig <- digits
    for (j in 1:nsig) {
      if (abs(a - floor(a+0.5)) < eps * a) {
        nsig <- j
        break
      }
      a <- a * 10
    }
  }
  left <- kpower + 1
  c(sgn = sgn, kpower = kpower, nsig = nsig,
    left = left, right = nsig - left,
    sleft = sgn + max(1,left))
}

names.num <- function(v, dig=2)
{
  ## Purpose: Add names to vector if it hasn't
  ## Author: Martin Maechler, Date: 16 Jul 97, 08:49
  names(v) <- formatC(v, w=1, dig=dig)
  v
}

do.sci <- function(v, nam.dig = 3, digits = getOption('digits'))
{
  ## Purpose: Call scientific(.) and do some more
  ## -------------------------------------------------------------------------
  ## Author: Martin Maechler, Date: 16 Jul 97, 17:03
  ## -- names.num(.) is from /u/maechler/R/Util.R ---
  r <- sapply(names.num(v, dig= nam.dig), scientific, digits=digits)
  cbind(r, min = apply(r,1,min), max = apply(r,1,max))
}
and here some of the experimentation code:

####-- Examples / Experiment, using  scientific(),
####   the R function that does the same as 'scientific' in
####   <Rsource>/src/main/format.c

source("scientific.R")# for function definitions
##      ~~~~~~~~~~~~

scientific(pi)
scientific(pi,14)
scientific(pi,17)# 0 0 16

tens <- names.num(10 ^ (-2:6))
do.sci(tens)
do.sci(3.1 * tens)
do.sci(pi  * tens)
do.sci(pi  * tens, dig = 12)

## v1 from /u/maechler/R/print-tests.R
if(!exists("v1")) v1 <- 10^c(-4:0,2,4)
do.sci(v1)

## Ben's example (*, digits=2)
x2 <- c(outer(c(-1,1), c(7.92, 7.921)))
x2 <- c(x2, 10^c(-10,-3, -1:1, 3,10))
do.sci(x2, 4, digits = 2)
-------

Now, I'm waiting for comments/suggestions ..
Martin
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to