Re: [Rd] print(...,digits=2) behavior

2011-03-08 Thread Terry Therneau
 This affects _many_ *.Rout.save checks in packages.

  I assume this is in the R-devel branch.

 I've got an addition to survival nearly ready to go (faster concordance
calculation).  At what point should I should I switch over to the newer
version, fix up my .out files etc, to best mesh with the automatic
checks on CRAN?  

   It's a nuisance for me to update, but only a nuisance.  I've been
through it twice before (Splus version 4- 5 and Splus - R switch).  I
might as well time it so as to make life as easy as possible for you
folks. Thanks for all the hard work.

Terry T

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] print(...,digits=2) behavior

2011-02-13 Thread Petr Savicky
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
[...]
 Namely, one important purpose of the test is to ensure that e.g.
 
   print(3.597, digits = 3)
 
 is printed as  3.6   and not 3.60
 
 Now I have had -- since 1997 at least -- an R version of
 scientific() for more easy experimentation.
 Here's an updated version of that:

[...]
 
 Now, I'm waiting for comments/suggestions ..

In a previous email, i discussed the case print(9.991, digits=3).
This case is already handled in your R level experimental function
scientific(). I noticed this only after sending the previous email.

I would like to suggest a modification of the algorithm. The purpose
is to determine the minimum number of digits, such that the printed number
has the same value as the number rounded to exactly getOption(digits)
digits, but shorter representation.

Algorithm. First, the mantissa of the number rounded to digits digits
is computed as an integer by the formula

  a - floor(alpha*10^(digits - 1) + 0.5)

The obtained integer number a satisfies

  10^(digits-1) = a = 10^digits

The case a = 10^digits corresponds to the situation that we have to
increase the exponent. This may be tested either before the remaining
part of the code or as the last step as below.

For example, if alpha = 3.597 and digits = 3, we get a = 360. The
question, whether (digits - 1) digits are sufficient for printing the
number, is equivalent to the condition that a is divisible by 10.
Similarly, (digits - 2) digits are sufficient if and only if a is
divisible by 100. This may be tested using the following code

nsig - digits
for (j in 1:nsig) {
a - a / 10
if (a == floor(a)) {
nsig - nsig - 1
} else {
break
}
}
## nsig == 0 if and only if we started with a == 10^digits
if (nsig == 0) {
nsig - 1
kpower - kpower + 1
}

This code uses double precision for a, since values up to 10^digits
may occur and digits may be 15 or more. The algorithm is not exact
for digits = 15, but works reasonably. I suggest to use a different,
slower and more accurate algorithm, if getOption(digits) = 16.

Please, find in an attachment two versions of the above algorithm. One
is a modification of your R level function from scientific.R and the
other is a patch against src/main/format.c, which i tested.

I suggest to consider the following test cases. The presented output
is obtained using the patch.

  example1 - c(
  7.94,
  7.950001,
  8.04,
  8.050001)
  for (x in example1) print(x, digits=2)
  
  [1] 7.9
  [1] 8
  [1] 8
  [1] 8.1
  
  example2 - c(
  3.599499,
  3.599501,
  3.600499,
  3.600501)
  for (x in example2) print(x, digits=4)
  
  [1] 3.599
  [1] 3.6
  [1] 3.6
  [1] 3.601
  
  example3 - c(
  1.004999,
  1.005001)
  for (x in example3) print(x, digits=7)
  
  [1] 1
  [1] 1.01
  
  example4 - c(
  9.994999,
  9.995001)
  for (x in example4) print(x, digits=7)
  
  [1] 9.99
  [1] 10

I appreciate comments.

Petr Savicky.

###--- R function that does the same as 'scientific' in

###--- /usr/local/R-0.50/src/main/format.c
###--- ~~~||~~

scientific1 - function(x, digits = getOption('digits')) ## eps = 10^-(digits)
{
  ##-   /* for 1 number  x , return
  ##-*  sgn= 1_{x  0}  {0/1}
  ##-*  kpower = Exponent of 10;
  ##-*  nsig   = min(digits, #{significant digits of alpha})
  ##-*
  ##-* where  |x| = alpha * 10^kpower   and  1 = alpha  10
  ##-*/

  eps - 10 ^(-digits)

  if (x == 0) {
kpower - 0
nsig - 1
sgn - 0
  } else {
if(x  0) {
  sgn - 1; r - -x
} else {
  sgn - 0; r - x
}
kpower - floor(log10(r));##--  r = |x| ;  10^k = r

if (kpower = -308) ## close to denormalization -- added for R 2.x.y
alpha - (r * 1e+30) / 10^(kpower+30)
else
alpha - r / 10^kpower

## a integer, 10^(digits-1) = a = 10^digits

a - floor(alpha*10^(digits - 1) + 0.5)
nsig - digits
for (j in 1:nsig) {
a - a / 10
if (a == floor(a)) {
nsig - nsig - 1
} else {
break
}
}
if (nsig == 0) {
nsig - 1
kpower - kpower + 1
}
  }
  left - kpower + 1
  c(sgn = sgn, kpower = kpower, nsig = nsig,
left = left, right = nsig - left,
sleft = sgn + max(1,left))
}

diff --minimal -U 5 -r R-devel/src/main/format.c R-devel-print/src/main/format.c
--- R-devel/src/main/format.c   2010-04-27 17:52:24.0 +0200
+++ R-devel-print/src/main/format.c 2011-02-12 11:07:37.0 +0100
@@ -167,28 +167,26 @@
alpha = (r * 1e+30)/pow(10.0, (double)(kp+30));
}
else
alpha = r / pow(10.0, (double)kp);
 
-   /* make sure that alpha is in [1,10) AFTER rounding */
-
-   if (10.0 - alpha  eps*alpha) {
-   alpha /= 10.0;
-

Re: [Rd] print(...,digits=2) behavior

2011-02-09 Thread Petr Savicky
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
  Ben Bolker bbol...@gmail.com
  on Sat, 5 Feb 2011 15:58:09 -0500 writes:
 
A bug was recently posted to the R bug database (which
  probably would better have been posted as a query here) as
  to why this happens:
 
  print(7.921,digits=2)
  [1] 8
  print(7.92,digits=2)
  [1] 7.9
 
[...]

 I had started to delve into this after you've posted the bug
 report. It is clearly a bug(let),
 caused by code that has been in  R  from its very
 beginning, at least in the first source code I've seen in 1994.
 
 The problem is not related to digits=2,
 but using such a low number of digits shows it more
 dramatically, e.g., also
 
   print(5.9994, digits=4)
  [1] 5.999
   print(5.9994001, digits=4)
  [1] 6
 
 Interestingly, the problem seems *not* to be present for
 digits = 1 (only).
 
 I haven't found time to mathematically analyze it for
 determining a correct solution though.
 Note that fixing this bug(let) will probably (very slightly)
 change a huge number of R outputs .. so there is a caveat,
 but nonetheless, we must approach it.
 
 The responsible code is the C function  scientific()
 in src/main/format.c 

I inspected the source of scientific() and formatReal() (2.13.0, revision
2011-02-08 r54284). Let me point out an example of the difference between
the output of print() and rounding to digits significant digits, which
is slightly different from the previous ones, since also the exponent
changes. Namely,

  print(9.991, digits=3)
  [1] 10

while rounding to 3 digits yields 9.99. The reason is in scientific(),
where the situation that rounding increases the exponent is tested using

/* make sure that alpha is in [1,10) AFTER rounding */

if (10.0 - alpha  eps*alpha) {
alpha /= 10.0;
kp += 1;
}

Here, eps is determined in formatReal() as

double eps = pow(10.0, -(double)R_print.digits);

so we have eps = 10^-digits. The above condition on alpha is equivalent to

  alpha  10.0/(1 + 10^-digits)

For digits=3, this is

  alpha  9.99000999000999

This bound may be verified as follows

  print(9.9900099900, digits=3)
  [1] 9.99

  print(9.9900099901, digits=3)
  [1] 10

The existing algorithm for choosing the number of digits is designed to
predict the format suitable for all numbers in a vector before the actual
call of sprintf() or snprintf(). For speed, this algorithm should use
the standard double precision, so it is necessarily inaccurate for precision
15 and more and there may be some rare such cases also for smaller
precisions. For smaller precisions, say below 7, the algorithm can be made
more precise. This would change the output in rare cases and mainly for
printing single numbers. If a vector is printed, then the format is typically
not determined by the problematic numbers.

Changing the default behavior may be undesirable for backward compatibility
reasons. If this is the case, then a possible solution is to make the
documentation more precise on this and include pointers to possible
solutions. The functions sprintf(), formatC() and signif() may be used. In
particular, if signif() is used to round the numbers before printing, then we
get the correct output

  print(signif(7.921, digits=2))  
  [1] 7.9

  print(signif(9.9900099901, digits=3))
  [1] 9.99

The current ?print.default contains

  digits: a non-null value for ‘digits’ specifies the minimum number of
  significant digits to be printed in values.

The word minimum here means that all numbers in the vector will have
at least the chosen number of digits, but some may have more. I suggest
to add See 'options' for more detail..

The current ?options contains

‘digits’: controls the number of digits to print when printing
  numeric values.  It is a suggestion only.

I suggest to extend this to

   It is a suggestion only and the actual number of printed digits
   may be smaller, if the relative error of the output number is
   less than 10^-digits. Use 'signif(, digits)' before printing to get
   the exact number of the printed significant digits.

I appreciate to know the opinion of R developers on this.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] print(...,digits=2) behavior

2011-02-07 Thread Martin Maechler
 Ben Bolker bbol...@gmail.com
 on Sat, 5 Feb 2011 15:58:09 -0500 writes:

   A bug was recently posted to the R bug database (which
 probably would better have been posted as a query here) as
 to why this happens:

 print(7.921,digits=2)
 [1] 8
 print(7.92,digits=2)
 [1] 7.9

   Two things I *haven't* done to help make sense of this
 for myself are (1) writing out the binary representations
 to see if something obvious pops out about why this would
 be a breakpoint and (2) poking in the source code (I did a
 little bit of this but gave up).

   I know that confusion over rounding etc. is very common,
 and my first reaction to this sort of question is always
 that there must be some sort of confusion (either (1) in a
 FAQ 7.31-ish sort of way that floating point values have
 finite precision in the first place, or (2) a confusion
 over the difference between the value and the
 representation of the number, or (3) more subtly, about
 the differences among various choices of rounding
 conventions).

   However, in this case I am a bit stumped: I don't see
 that any of the standard confusions apply.  I grepped the
 R manuals for rounding and didn't find anything useful
 ...  I have a very strong prior that such a core part of R
 must be correct, and that therefore I (and the original
 bug reporter) must be misunderstanding something.

   Thoughts/references?

I had started to delve into this after you've posted the bug
report. It is clearly a bug(let),
caused by code that has been in  R  from its very
beginning, at least in the first source code I've seen in 1994.

The problem is not related to digits=2,
but using such a low number of digits shows it more
dramatically, e.g., also

  print(5.9994, digits=4)
 [1] 5.999
  print(5.9994001, digits=4)
 [1] 6

Interestingly, the problem seems *not* to be present for
digits = 1 (only).

I haven't found time to mathematically analyze it for
determining a correct solution though.
Note that fixing this bug(let) will probably (very slightly)
change a huge number of R outputs .. so there is a caveat,
but nonetheless, we must approach it.

The responsible code is the C function  scientific()
in src/main/format.c 

a somewhat direct R interface to its functionality is given by
R's format.info() :

   format.info(7.92, digits=2)
  [1] 3 1 0
   format.info(7.921, digits=2)
  [1] 1 0 0

which means that 7.92 will use 3 characters, 1 digit after the decimal
7.921 will use 1 character, 0 digits after decimal.

The crucial buggy part of the code is  {line 180 ff} :

/* compute number of digits */

*nsig = R_print.digits;
for (j = 1; j = *nsig; j++) {
if (fabs(alpha - floor(alpha+0.5))  eps * alpha) {
*nsig = j;
break;
}
alpha *= 10.0;
}

where notably
if (fabs(alpha - floor(alpha+0.5))  eps * alpha) {
is not quite the correct check.
Note that  eps = 10^-2  in our case and  alpha = 7.92 or 7.921
and that  8 - 7.921 = 0.079   7.921/10^2  is just at the border.

Looking for all the border cases is solving the above
analytically with '=' and setting  k = ceiling(alpha) :

 p0 - function(...) paste(..., sep=)
 k - 6:9; names(k) - p0(k=,k)
 d - 1:5; names(d) - p0(d=,d)
 xc - outer(k, 1+10^-d, /)
 print(xc, digits= 11)
 d=1  d=2 d=3d=4  d=5
k=6 5.4545454545 5.9405940594 5.994005994 5.99940006 5.46
k=7 6.3636363636 6.9306930693 6.993006993 6.99930007 6.37
k=8 7.2727272727 7.9207920792 7.992007992 7.99920008 7.28
k=9 8.1818181818 8.9108910891 8.991008991 8.99910009 8.19

So we see that for our case, 7.920792... is more exactly the
border case.

One change that is *not* correct is making it an absolute
instead of relative comparison, i.e.  replacing the RHS
eps * alpha  by  eps

Namely, one important purpose of the test is to ensure that e.g.

  print(3.597, digits = 3)

is printed as  3.6   and not 3.60

Now I have had -- since 1997 at least -- an R version of
scientific() for more easy experimentation.
Here's an updated version of that:

###--- R function that does the same as 'scientific' in

###--- /usr/local/R-0.50/src/main/format.c
###--- ~~~||~~

scientific - function(x, digits = getOption('digits')) ## eps = 10^-(digits)
{
  ##-   /* for 1 number  x , return
  ##-*  sgn= 1_{x  0}  {0/1}
  ##-*  kpower = Exponent of 10;
  ##-*  nsig   = min(digits, #{significant digits of alpha})
  ##-*
  ##-* where  |x| = alpha * 10^kpower   and  1 = alpha  10
  ##-*/

  eps - 10 ^(-digits)

  if (x == 0) {
kpower - 0
nsig - 1
sgn - 0
  } else {
if(x  0) {
  sgn - 1; r - -x
} else {
  sgn - 0; r - x
}
kpower - floor(log10(r));##--  r = |x| ;  10^k = r

if (kpower = -308) ## close to 

Re: [Rd] print(...,digits=2) behavior

2011-02-07 Thread Petr Savicky
On Mon, Feb 07, 2011 at 09:56:18AM +0100, Martin Maechler wrote:
  Ben Bolker bbol...@gmail.com
  on Sat, 5 Feb 2011 15:58:09 -0500 writes:
 
A bug was recently posted to the R bug database (which
  probably would better have been posted as a query here) as
  to why this happens:
 
  print(7.921,digits=2)
  [1] 8
  print(7.92,digits=2)
  [1] 7.9
 
Two things I *haven't* done to help make sense of this
  for myself are (1) writing out the binary representations
  to see if something obvious pops out about why this would
  be a breakpoint and (2) poking in the source code (I did a
  little bit of this but gave up).
 
I know that confusion over rounding etc. is very common,
  and my first reaction to this sort of question is always
  that there must be some sort of confusion (either (1) in a
  FAQ 7.31-ish sort of way that floating point values have
  finite precision in the first place, or (2) a confusion
  over the difference between the value and the
  representation of the number, or (3) more subtly, about
  the differences among various choices of rounding
  conventions).
 
However, in this case I am a bit stumped: I don't see
  that any of the standard confusions apply.  I grepped the
  R manuals for rounding and didn't find anything useful
  ...  I have a very strong prior that such a core part of R
  must be correct, and that therefore I (and the original
  bug reporter) must be misunderstanding something.
 
Thoughts/references?
 
 I had started to delve into this after you've posted the bug
 report. It is clearly a bug(let),
 caused by code that has been in  R  from its very
 beginning, at least in the first source code I've seen in 1994.
 
 The problem is not related to digits=2,
 but using such a low number of digits shows it more
 dramatically, e.g., also
 
   print(5.9994, digits=4)
  [1] 5.999
   print(5.9994001, digits=4)
  [1] 6

I think that this may be a consequence of the following
analysis, which i did some time ago using the source code
of scientific() function in src/main/format.c. The
conclusion was the following.

Printing to k digits looks for the smallest number of
digits l = k so that the relative error of the printed
mantissa (significand) is at most 10^-k. If the mantissa
begins with a digit less than 5, then this condition is
stronger than having absolute error at most 5*10^-k. So,
in this case, we cannot get lower accuracy than rounding
to k digits.

If the mantissa begins with 5 or more, then having relative
error at most 10^-k is a weaker condition than having absolute
error at most 5*10^-k. In this case, the chosen l may be
smaller than k even if the printed number has larger error
than the number rounded to k digits.

This may be seen in the following example

  xx - 8 + (6:10)*10^-7
  for (x in xx) print(x)

  [1] 8
  [1] 8
  [1] 8
  [1] 8.01
  [1] 8.01

where all the printed numbers are 8.01 if rounded
to 7 digits.

The cases 

  print(7.921,digits=2)
  [1] 8

  print(7.92,digits=2)
  [1] 7.9

seem to have the same explanation. The relative errors of the
numbers rounded to a single digit are

  (8 - 7.921)/7.921
  [1] 0.009973488

  (8 - 7.92)/7.92
  [1] 0.01010101

In the first case, this is less than 10^-2 and so, l = 1
is used. In the second case, the relative error for l = 1
is larger than 10^-2 an so, l = 2 is chosen.

In the cases

  print(5.9994, digits=4)
  [1] 5.999
  print(5.9994001, digits=4)
  [1] 6

the relative errors of one digit output are

  (6 - 5.9994)/5.9994
  [1] 0.00010001
  
  (6 - 5.9994001)/5.9994001
  [1] 9.999333e-05

Here, one digit output is chosen if the relative error is
less than 10^-4 and not otherwise.

Petr Savicky.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] print(...,digits=2) behavior

2011-02-05 Thread Ben Bolker

  A bug was recently posted to the R bug database (which probably would
better have been posted as a query here) as to why this happens:

 print(7.921,digits=2)
[1] 8
 print(7.92,digits=2)
[1] 7.9

  Two things I *haven't* done to help make sense of this for myself are
(1) writing out the binary representations to see if something obvious
pops out about why this would be a breakpoint and (2) poking in the
source code (I did a little bit of this but gave up).

  I know that confusion over rounding etc. is very common, and my first
reaction to this sort of question is always that there must be some sort
of confusion (either (1) in a FAQ 7.31-ish sort of way that floating
point values have finite precision in the first place, or (2) a
confusion over the difference between the value and the representation
of the number, or (3) more subtly, about the differences among various
choices of rounding conventions).

  However, in this case I am a bit stumped: I don't see that any of the
standard confusions apply.  I grepped the R manuals for rounding and
didn't find anything useful ...  I have a very strong prior that such a
core part of R must be correct, and that therefore I (and the original
bug reporter) must be misunderstanding something.

  Thoughts/references?

  cheers
Ben Bolker

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel