On 14-03-16 2:13 AM, Mike Miller wrote:
On Sat, 15 Mar 2014, peter dalgaard wrote:

On 15 Mar 2014, at 20:54, Mike Miller <mbmille...@gmail.com> wrote:

$ cat data1.txt
0.005
0.00499999999999989

I don't know why it shows 17 digits and doesn't round to 15, but it is showing 
that the numbers are different, for some reason.
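
For readers who want to reproduce the two values, here is a minimal sketch in R. The variable names a and b are only for illustration; the point is that the two expressions are mathematically equal but are stored as different doubles.

```r
# Both expressions are mathematically 0.005, but the stored doubles differ.
a <- 1 - 0.995
b <- 2 - 1.995

same_bits  <- a == b                   # FALSE: the two doubles are not identical
near_equal <- isTRUE(all.equal(a, b))  # TRUE: equal within all.equal()'s default tolerance

# 15 significant digits, the precision that showed up in data1.txt
shown <- sprintf("%.15g", c(a, b))

print(same_bits)
print(near_equal)
print(shown)   # "0.005" "0.00499999999999989"
```

Comparing with all.equal() rather than == is the usual way to treat such values as "the same" inside R; it does not change what gets written to a file.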


Aiding my weakening eyesight a little:

0.004 999 999 999 999 89

Notice that that makes 15 _significant_ digits.

OK, now I feel really stupid.  Of course it's 15 mantissa digits, not 15
%f digits, or whatever that should be called.  Sorry about that.
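
The distinction between significant digits and decimal places can be checked directly with sprintf(). This is only a sketch: "%.15g" asks for 15 significant digits and "%.15f" for 15 digits after the decimal point, and the variable names are illustrative.

```r
b <- 2 - 1.995

sig15 <- sprintf("%.15g", b)   # 15 significant digits: "0.00499999999999989"
dec15 <- sprintf("%.15f", b)   # 15 decimal places:     "0.005000000000000"

# The two zeros after the decimal point are not significant, so 15
# significant digits occupy 17 decimal places here.
n_decimals <- nchar(sub("^0\\.", "", sig15))   # 17

print(sig15)
print(dec15)
print(n_decimals)
```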


Do you understand why there is a difference between 1-0.995 and 2-1.995
in their internal representations?

Let's see,  that'll be like

1 - 2/3 vs. 10 - 29/3

on a decimal computer if someone is perverse enough to give input in
base 3 (i.e., 1.0 - 0.2 ternary vs. 101.0 - 100.2 ternary). Assume that
the computer is floating point with 3 significant digits (and possibly
taking some liberties compared to what real computers really do), we
have

   1 = 1.000 * 10^0
  10 = 1.000 * 10^1
 2/3 = 0.667 * 10^0
29/3 = 0.967 * 10^1

 1 - 2/3  = 0.333 * 10^0
10 - 29/3 = 0.033 * 10^1 = 0.330 * 10^0

So, yes, I think I do understand how these things can happen.
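
The toy 3-digit decimal machine above can be imitated in R by rounding every intermediate value with signif(). This is a rough sketch, not a model of real hardware; fl3 is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: round to 3 significant decimal digits, imitating
# the toy decimal machine described above.
fl3 <- function(x) signif(x, 3)

d1 <- fl3(fl3(1)  - fl3(2/3))    # 1  - 0.667 gives 0.333
d2 <- fl3(fl3(10) - fl3(29/3))   # 10 - 9.67  gives 0.33

# Both results approximate 1/3, but the second has lost a digit.
err1 <- abs(d1 - 1/3)   # about 0.00033
err2 <- abs(d2 - 1/3)   # about 0.0033

print(c(d1, d2))
print(c(err1, err2))
```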

Yes, and that's a nice explanation, but you had me at "_significant_".  I
don't know why I didn't get that in the first place.  So the difference in
my example is that 0.995 is 9.950e-1 so that the 5 is the third
significant digit and in 1.995, the 5 is the fourth significant digit, so
1-0.995 provides a more precise representation of 0.005 than does 2-1.995.
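
In binary the analogous effect comes from the power-of-two exponent rather than the decimal digit position: 0.995 lies in [0.5, 1) and 1.995 in [1, 2), so the spacing between adjacent doubles is twice as large around 1.995. A small sketch, where ulp is a hypothetical helper that assumes a positive, normal double:

```r
# Hypothetical helper: spacing between adjacent doubles near x
# (assumes x is positive and not subnormal).
ulp <- function(x) 2^floor(log2(x)) * .Machine$double.eps

ulp_0995 <- ulp(0.995)          # 2^-53
ulp_1995 <- ulp(1.995)          # 2^-52
ratio <- ulp_1995 / ulp_0995    # 2

print(c(ulp_0995, ulp_1995, ratio))
```

A coarser grid around 1.995 means the stored value can sit farther from the decimal 1.995, and that error carries over into 2-1.995.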

I always knew there was some numerical reason why I was getting very long
stretches of 9s or 0s in the write.table() output, but my concern is
really with how to prevent that from happening.  So the question still is,
how do I avoid getting 0.00499999999999989 in my output file when I want
0.005?  I'm sure I'm not alone in this.  It looks like the standard answer
is to use format().  For example, I could do this:

write.table(format(data, digits=13, trim=TRUE), file="data.txt", row.names=FALSE,
col.names=FALSE, quote=FALSE)

You could also round the numbers to 13 digits before printing, e.g.

write.table(signif(data, digits=13), ...)

(or use round() if you want to specify decimal places instead of significant digits).
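
A self-contained sketch of the rounding approach, writing to a temporary file so the output can be inspected. The data frame and file names are made up for the example, and signif() on a data frame assumes that every column is numeric.

```r
# Example data: two values that should both print as 0.005
data <- data.frame(x = c(1 - 0.995, 2 - 1.995))
out <- tempfile(fileext = ".txt")

# Unrounded: write.table() shows the representation error
write.table(data, file = out,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
raw_lines <- readLines(out)       # "0.005" "0.00499999999999989"

# Rounded to 13 significant digits before writing
write.table(signif(data, digits = 13), file = out,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
signif_lines <- readLines(out)    # "0.005" "0.005"

# Rounded to 3 decimal places instead
write.table(round(data, digits = 3), file = out,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
round_lines <- readLines(out)     # "0.005" "0.005"

print(raw_lines)
print(signif_lines)
print(round_lines)
```

Unlike format(), signif() and round() return numbers, so the intermediate object is the same size as the original data rather than a set of strings.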

Duncan Murdoch


Using format() that way does fix the long numbers -- all of them are reduced to three digits.
The one thing that concerns me is that when format() is called, isn't it
making an object that could take up a lot of memory if the data frame is
large?  The data frame created by format() might use a lot more memory
than the original data frame if it is converting a lot of doubles (8
bytes) to a lot of possibly 16-byte strings.  For example, -10/81 takes up
8 bytes as a double, but converted by format with digits=13 it uses 16
bytes to include the sign, the zero and the decimal point (plus a
delimiter when there are many per line of output):

write.table(format(-10/81, digits=13), row.names=FALSE, col.names=FALSE, quote=FALSE)
-0.1234567901235
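
The character count can be confirmed without writing anything out. This is a small sketch; s is just an illustrative name.

```r
s <- format(-10/81, digits = 13)

is_char <- is.character(s)   # TRUE: format() returns a string, not a number
n_chars <- nchar(s)          # 16: sign, zero, decimal point, 13 digits

print(s)                     # "-0.1234567901235"
print(is_char)
print(n_chars)
```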

I'm assuming that write.table() is streaming the data into a file (or
stdout) and not creating a complete representation of the output in memory
before it does that.  It looks like format() creates a data frame where
all variables are converted to character type.  Thus, it wouldn't be just
for convenience that one might want digits=N to be an option in the
write.table() function.  It would be very useful with large data frames,
making it possible to write out things that would be too large to handle
using format().  When files are already super-large, we really want to
avoid expanding the number of digits per value in the output.
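
One possible workaround, sketched under assumptions: round with signif() instead of format(), and write the data frame in row chunks with append=TRUE, so only one chunk is copied at a time. The helper write_rounded_chunks and its arguments are hypothetical, it assumes all columns are numeric, and it says nothing about whether write.table() itself streams its output.

```r
# Size of a numeric data frame versus its format()ted (character) version
set.seed(1)
df <- data.frame(x = runif(1e4), y = runif(1e4))
size_numeric   <- as.numeric(object.size(df))
size_character <- as.numeric(object.size(format(df, digits = 13)))

# Hypothetical helper: round and write `chunk_rows` rows at a time.
write_rounded_chunks <- function(df, file, digits = 13, chunk_rows = 10000) {
  n <- nrow(df)
  if (n == 0) {
    file.create(file)
    return(invisible(file))
  }
  for (s in seq(1, n, by = chunk_rows)) {
    rows <- s:min(s + chunk_rows - 1, n)
    # Only this chunk is copied and rounded; later chunks are appended.
    write.table(signif(df[rows, , drop = FALSE], digits = digits),
                file = file, append = s > 1,
                row.names = FALSE, col.names = FALSE, quote = FALSE)
  }
  invisible(file)
}

# Check that chunked output matches rounding the whole data frame at once
out_all     <- tempfile(fileext = ".txt")
out_chunked <- tempfile(fileext = ".txt")
write.table(signif(df, digits = 13), file = out_all,
            row.names = FALSE, col.names = FALSE, quote = FALSE)
write_rounded_chunks(df, out_chunked, digits = 13, chunk_rows = 3000)
same_output <- identical(readLines(out_all), readLines(out_chunked))

print(c(numeric = size_numeric, character = size_character))
print(same_output)
```

The size comparison is only indicative; the exact numbers depend on the R version and on how many distinct strings the formatted columns contain.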

Mike


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
