On Mon, 17 Mar 2014, Berend Hasselman wrote:

On 17-03-2014, at 21:03, Mike Miller <mbmille...@gmail.com> wrote:

…...
data[,c(5:9,11,13,17:21)] <- signif(data[,c(5:9,11,13,17:21)], digits=5)

Then write.table(data) does what I'd want.  It works better than format(). 
Example:

data2 <- data
data2[,c(5:9,11,13,17:21)] <- signif(data2[,c(5:9,11,13,17:21)], digits=5)

write.table(format(data[1:10,], digits=5, trim=T), row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.1320 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1 GA 1 0 1 0.000 0 1.000 1.00000
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2 GG 1 0 1 0.000 0 1.000 1.00000
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.4470 1 0 TT 1 GA 0 0 2 0.000 1 1.000 1.00000
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1 GA 0 0 2 0.000 0 0.000 0.00000
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0 AA 0 0 2 0.000 1 1.000 1.00000
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.03952420 0.3057 1 2 CC 0 AA 2 0 0 0.000 0 2.000 4.00000
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0 AA 1 0 1 0.000 0 1.000 1.00000

write.table(data2[1:10,], row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0 0 2 0 0 0 0
3100765 303321 6 1 0.014344 -0.013654 -0.0017613 0.085027 1.0365 1 1 CT 1 GA 1 0 1 0 0 1 1
3101201 304352 6 1 -0.017105 -0.016957 0.032039 0.008849 0.4279 1 1 CT 2 GG 1 0 1 0 0 1 1
3101862 305250 6 1 -0.013283 0.010848 -0.017008 -0.036924 -0.447 1 0 TT 1 GA 0 0 2 0 1 1 1
3103579 305847 6 1 0.015939 0.0096043 -0.04379 -0.022247 -0.3365 1 0 TT 1 GA 0 0 2 0 0 0 0
3103645 305961 6 1 0.20441 -0.10901 0.27271 -0.2989 1.5818 1 2 CC 0 AA 2 0 0 0 0 2 4
3104098 308536 6 1 0.028421 0.056281 -0.071545 -0.11511 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.048404 0.026672 -0.054875 -0.036405 0.4499 1 0 TT 0 AA 0 0 2 0 1 1 1
5100094 503136 6 1 0.19703 -0.41046 0.086957 -0.039524 0.3057 1 2 CC 0 AA 2 0 0 0 0 2 4
5100938 503615 6 1 0.00098838 0.026718 0.04513 0.047903 -0.1743 2 1 CT 0 AA 1 0 1 0 0 1 1

format() with digits=5 is still showing 7 significant digits.  Why? signif() 
only shows 5.


From the help of format:

digits: "how many significant digits are to be used for numeric and complex x. The default, NULL, uses getOption("digits"). This is a suggestion: enough decimal places will be used so that the smallest (in magnitude) number has this many significant digits, and also to satisfy nsmall. (For the interpretation for complex numbers see signif.)"

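So if I read this correctly, the smallest number (in magnitude) in each column will have 5 significant digits. Larger numbers in the same column may get more, because every element of the column is formatted with the same number of decimal places and to a common width (see argument trim).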
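
To make the difference concrete, here is a small sketch using two values that appear in the sixth column of the output above. The variable names x, shared and each are made up for this illustration. format() chooses one number of decimal places for the whole vector, while signif() rounds each element on its own:

```r
# Two values from the sixth column of the table shown earlier in the thread.
x <- c(0.0073947, -0.1090142)

# format() picks a single number of decimal places for the whole vector:
# the smallest value needs 7 decimals to show 5 significant digits,
# so the larger value is printed with 7 decimals as well.
shared <- format(x, digits = 5, trim = TRUE)
print(shared)   # "0.0073947" "-0.1090142", as in the format() output above

# signif() rounds each element independently, so the larger value
# really is cut to 5 significant digits.
each <- as.character(signif(x, digits = 5))
print(each)     # "0.0073947" "-0.10901", as in the signif() output above
```

This matches the two tables: the format() version keeps -0.1090142, the signif() version shows -0.10901.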


Thanks! Another thing I've figured out: Use of "drop0trailing=T" in format() fixes the .00000 stuff that I didn't like:

write.table(format(data[1:10,], digits=5, trim=T, drop0trailing=T), row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0 0 2 0 0 0 0
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1 GA 1 0 1 0 0 1 1
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2 GG 1 0 1 0 0 1 1
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.447 1 0 TT 1 GA 0 0 2 0 1 1 1
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1 GA 0 0 2 0 0 0 0
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0 AA 2 0 0 0 0 2 4
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0 AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0 AA 0 0 2 0 1 1 1
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.0395242 0.3057 1 2 CC 0 AA 2 0 0 0 0 2 4
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0 AA 1 0 1 0 0 1 1

That's pretty close to the signif() output I was getting (above) but with a few digits added because of the small numbers (as you explained).

format() with trim=T seems to just delete the spaces that format() would have added for column alignment. It doesn't seem to affect the number of digits displayed.
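
A quick way to check that observation, as a sketch with a made-up vector x (not taken from the data above): the trimmed strings should be the padded strings with the leading blanks removed and nothing else changed.

```r
# Illustrative values only; not from the data frame discussed in this thread.
x <- c(1.5, 10.25, 100)

padded  <- format(x)               # common width, padded with leading spaces
trimmed <- format(x, trim = TRUE)  # same digits, padding suppressed

print(padded)
print(trimmed)

# TRUE if trim only affects the padding and not the digits
print(identical(trimws(padded), trimmed))
```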

I still have more to figure out, but for most smaller table-writing jobs, I think something like the last command above will be my usual approach. In real life, I would use a tab delimiter, though.
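
For reference, a minimal sketch of the tab-delimited variant. The data frame toy and its columns id, x and y are invented for illustration, and capture.output() is only used here so the written lines can be inspected; in practice a file name would be passed to write.table() via its file argument.

```r
# Hypothetical stand-in for the real data frame.
toy <- data.frame(id = c(1L, 2L),
                  x  = c(0.0073947, -0.1090142),
                  y  = c(0, 0.944))

# Same format() call as above, applied to the toy data.
formatted <- format(toy, digits = 5, trim = TRUE, drop0trailing = TRUE)

# sep = "\t" gives a tab delimiter instead of the default single space.
out_lines <- capture.output(
  write.table(formatted, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
)
print(out_lines)
```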

I'm still unsure about the best way for dealing with very large data frames, though. There's probably a good way to stream data into a file so that it doesn't have to be written as an additional large object in memory. There must be a way to make a connection and then just pipe the formatted data into it. Maybe something related to sprintf() will work.
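
One possible approach, sketched here under assumptions rather than offered as the recommended way: open a file connection once and write the data frame in blocks of rows, so that only one rounded block exists in memory at a time. The function name write_in_chunks and its arguments are hypothetical. It uses signif() rather than format(), because signif() works element by element and so gives the same result regardless of how the rows are split, whereas format() depends on the other values in the column.

```r
# Hypothetical helper: write `df` to `path` in blocks of `chunk_size` rows,
# rounding the columns named or indexed by `cols` to `digits` significant digits.
write_in_chunks <- function(df, path, cols, digits = 5, chunk_size = 10000) {
  con <- file(path, open = "wt")
  on.exit(close(con))              # close the connection even if an error occurs

  n <- nrow(df)
  if (n == 0) return(invisible(path))

  for (start in seq(1, n, by = chunk_size)) {
    rows  <- start:min(start + chunk_size - 1, n)
    chunk <- df[rows, , drop = FALSE]
    # Round only the requested columns of this block.
    chunk[cols] <- lapply(chunk[cols], signif, digits = digits)
    # Writing to an already open connection continues where the last block ended.
    write.table(chunk, file = con, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(path)
}

# Usage with a small made-up data frame and a temporary file.
demo <- data.frame(id = 1:5,
                   x  = c(0.123456789, 1.23456789, 12.3456789,
                          123.456789, 0.00123456789))
tmp <- tempfile(fileext = ".tsv")
write_in_chunks(demo, tmp, cols = "x", digits = 3, chunk_size = 2)
written <- readLines(tmp)
print(written)
```

The chunk size is a trade-off: larger blocks mean fewer write.table() calls, smaller blocks mean a smaller temporary copy.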

Mike