On Sun, 16 Mar 2014, Duncan Murdoch wrote:

On 14-03-16 2:13 AM, Mike Miller wrote:

I always knew there was some numerical reason why I was getting very long stretches of 9s or 0s in the write.table() output, but my concern is really with how to prevent that from happening. So the question still is, how do I avoid getting 0.00499999999999989 in my output file when I want 0.005? I'm sure I'm not alone in this. It looks like the standard answer is to use format(). For example, I could do this:

write.table(format(data, digits=13, trim=TRUE), file="data.txt", row.names=FALSE, col.names=FALSE, quote=FALSE)

You could also round the numbers to 13 digits before printing, e.g.

write.table(signif(data, digits=13), ...)

(or use round() if you want to specify decimal places instead of significant digits).
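Below is a minimal sketch of what rounding before writing changes. It uses the value from the question in a one-column data frame that is made up for illustration. As far as I know, write.table() writes doubles with up to 15 significant digits, which is why the representation error shows up. Rounding first with signif() or round() gives 0.005.

```r
# The kind of value that appears after arithmetic on decimal fractions
x <- 0.00499999999999989

# Written as-is (hypothetical one-column data frame, printed to the console)
write.table(data.frame(x = x), row.names = FALSE, col.names = FALSE)
# 0.00499999999999989

# Rounded to 13 significant digits, or to 3 decimal places, before writing
write.table(data.frame(x = signif(x, digits = 13)), row.names = FALSE, col.names = FALSE)
write.table(data.frame(x = round(x, digits = 3)), row.names = FALSE, col.names = FALSE)
# 0.005 (both)
```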


I like that idea. It can be used in exactly that way only if all of the variables in the data frame are numeric. I can use signif() on the numeric variables before using write.table():

data[,c(5:9,11,13,17:21)] <- signif(data[,c(5:9,11,13,17:21)], digits=5)
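If listing the column indices by hand gets tedious, a minimal sketch that picks the double columns automatically is shown below. The small data frame is a hypothetical stand-in for the real data. signif() on a whole data frame fails when any column is non-numeric, so this version applies it column by column with lapply() and leaves the character and integer columns alone.

```r
# Hypothetical stand-in: an integer ID, a character genotype, two double columns
data <- data.frame(id = c(3100674L, 3100765L),
                   geno = c("TT", "CT"),
                   x = c(-0.11869237, 0.01434426),
                   y = c(0.00012708, 0.08502718),
                   stringsAsFactors = FALSE)

# TRUE for double (non-integer numeric) columns only
is_dbl <- vapply(data, is.double, logical(1))

# Round just those columns; the list from lapply() replaces them in place
data[is_dbl] <- lapply(data[is_dbl], signif, digits = 5)

write.table(data, row.names = FALSE, col.names = FALSE, quote = FALSE)
# 3100674 TT -0.11869 0.00012708
# 3100765 CT 0.014344 0.085027
```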

Then write.table(data) does what I'd want. It works better than format(). Example:

data2 <- data
data2[,c(5:9,11,13,17:21)] <- signif(data2[,c(5:9,11,13,17:21)], digits=5)

write.table(format(data[1:10,], digits=5, trim=TRUE), row.names=FALSE, col.names=FALSE, quote=FALSE)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.1320 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1 GA 1 0 1 0.000 0 1.000 1.00000
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2 GG 1 0 1 0.000 0 1.000 1.00000
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.4470 1 0 TT 1 GA 0 0 2 0.000 1 1.000 1.00000
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0 AA 0 0 2 0.000 1 1.000 1.00000
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.03952420 0.3057 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0 AA 1 0 1 0.000 0 1.000 1.00000

write.table(data2[1:10,], row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0 0 2 0 0 0 0
3100765 303321 6 1 0.014344 -0.013654 -0.0017613 0.085027 1.0365 1 1 CT 1 GA 1 0 1 0 0 1 1
3101201 304352 6 1 -0.017105 -0.016957 0.032039 0.008849 0.4279 1 1 CT 2 GG 1 0 1 0 0 1 1
3101862 305250 6 1 -0.013283 0.010848 -0.017008 -0.036924 -0.447 1 0 TT 1 GA 0 0 2 0 1 1 1
3103579 305847 6 1 0.015939 0.0096043 -0.04379 -0.022247 -0.3365 1 0 TT 1 GA 0 0 2 0 0 0 0
3103645 305961 6 1 0.20441 -0.10901 0.27271 -0.2989 1.5818 1 2 CC 0 AA 2 0 0 0 0 2 4
3104098 308536 6 1 0.028421 0.056281 -0.071545 -0.11511 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.048404 0.026672 -0.054875 -0.036405 0.4499 1 0 TT 0 AA 0 0 2 0 1 1 1
5100094 503136 6 1 0.19703 -0.41046 0.086957 -0.039524 0.3057 1 2 CC 0 AA 2 0 0 0 0 2 4
5100938 503615 6 1 0.00098838 0.026718 0.04513 0.047903 -0.1743 2 1 CT 0 AA 1 0 1 0 0 1 1

format() with digits=5 still shows 7 or 8 significant digits for some values (e.g., -0.1090142 and -0.11869237). Why? signif() shows only 5. Another thing I like about the signif() version of the write.table() output is that a number like 1.00000 is written simply as 1. I think I would always want that.
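A likely answer to the "Why?", based on the documented behavior of format(): ?format describes digits as a suggestion, with enough decimal places used so that the smallest (in magnitude) number has that many significant digits, and a data frame is formatted one column at a time. In column 5, 0.00098838 needs 8 decimal places to show 5 significant digits, so every value in that column gets 8 decimals. The sketch below uses two values from that column. write.table() on the signif() result formats each value on its own instead.

```r
# Two values taken from column 5 of the output above
v <- c(-0.11869237, 0.00098838)

# One number of decimal places is chosen for the whole vector:
# 5 significant digits of 0.00098838 need 8 decimals, so both values get 8
format(v, digits = 5, trim = TRUE)
# "-0.11869237" "0.00098838"

# signif() rounds each element separately, and write.table() writes each
# value with only the digits it needs
write.table(data.frame(v = signif(v, digits = 5)), row.names = FALSE, col.names = FALSE)
# -0.11869
# 0.00098838
```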

I also think that the signif() approach, replacing some variables with signif() versions of themselves, doesn't force me to create a huge additional data frame.

In R, if I do this...

data <- signif(data, digits=12)

...do I need to have enough memory to hold two copies of the data frame called "data"? If the answer is "yes," then that is a problem.
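As far as I understand R's copy semantics, data <- signif(data, digits=12) builds the complete result before the name "data" is rebound, so at the peak the old and new versions coexist. The sketch below is one way to keep the extra memory to roughly one column. It replaces the double columns one at a time, assuming R 3.1.0 or later, where (my understanding) replacing a column with [[<- copies only the list of column pointers, not the other columns. The data frame is hypothetical.

```r
# Hypothetical data: two double columns and one integer column, 1e6 rows
data <- data.frame(a = seq(0, 1, length.out = 1e6),
                   b = seq(-1, 0, length.out = 1e6),
                   id = seq_len(1e6))

# Round each double column in turn; each step allocates one new column
# rather than a second copy of the whole data frame
for (j in which(vapply(data, is.double, logical(1)))) {
  data[[j]] <- signif(data[[j]], digits = 12)
}
```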

I assume that "data" and "signif(data, digits=12)" use the same amount of memory, 8 bytes per numeric value (double precision), which is much better than "format(data, digits=12)", where each number becomes a character string of more than 12 bytes.
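The assumption can be checked with object.size(). Below is a minimal sketch on one hypothetical column of random doubles. signif() returns a double vector of the same size. format() returns a character vector, which stores a pointer per element plus a separate string object for each distinct value, so it comes out several times larger.

```r
# One hypothetical column of 100,000 doubles
x <- runif(1e5, min = -1, max = 1)

# Doubles: 8 bytes per value plus a small fixed header
print(object.size(x), units = "Kb")

# signif() keeps the values as doubles, so the size is unchanged
print(object.size(signif(x, digits = 12)), units = "Kb")

# format() converts to character strings, which cost far more per value
print(object.size(format(x, digits = 12)), units = "Kb")
```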

Mike

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
