On 14-03-17 6:22 PM, Mike Miller wrote:
On Mon, 17 Mar 2014, Berend Hasselman wrote:
On 17-03-2014, at 21:03, Mike Miller <mbmille...@gmail.com> wrote:
…...
data[,c(5:9,11,13,17:21)] <- signif(data[,c(5:9,11,13,17:21)], digits=5)
Then write.table(data) does what I'd want. It works better than format().
Example:
data2 <- data
data2[,c(5:9,11,13,17:21)] <- signif(data2[,c(5:9,11,13,17:21)], digits=5)
write.table(format(data[1:10,], digits=5, trim=T), row.names=F, col.names=F,
quote=F)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.1320 1 0 TT 1
GA 0 0 2 0.000 0 0.000 0.00000
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1
GA 1 0 1 0.000 0 1.000 1.00000
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2
GG 1 0 1 0.000 0 1.000 1.00000
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.4470 1 0 TT
1 GA 0 0 2 0.000 1 1.000 1.00000
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1
GA 0 0 2 0.000 0 0.000 0.00000
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0
AA 2 0 0 0.000 0 2.000 4.00000
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0
AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0
AA 0 0 2 0.000 1 1.000 1.00000
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.03952420 0.3057 1 2 CC 0
AA 2 0 0 0.000 0 2.000 4.00000
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0
AA 1 0 1 0.000 0 1.000 1.00000
write.table(data2[1:10,], row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1 GA 0
0 2 0 0 0 0
3100765 303321 6 1 0.014344 -0.013654 -0.0017613 0.085027 1.0365 1 1 CT 1 GA 1
0 1 0 0 1 1
3101201 304352 6 1 -0.017105 -0.016957 0.032039 0.008849 0.4279 1 1 CT 2 GG 1 0
1 0 0 1 1
3101862 305250 6 1 -0.013283 0.010848 -0.017008 -0.036924 -0.447 1 0 TT 1 GA 0
0 2 0 1 1 1
3103579 305847 6 1 0.015939 0.0096043 -0.04379 -0.022247 -0.3365 1 0 TT 1 GA 0
0 2 0 0 0 0
3103645 305961 6 1 0.20441 -0.10901 0.27271 -0.2989 1.5818 1 2 CC 0 AA 2 0 0 0
0 2 4
3104098 308536 6 1 0.028421 0.056281 -0.071545 -0.11511 0.9974 1 0 TT 0 AA 0 1
1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.048404 0.026672 -0.054875 -0.036405 0.4499 1 0 TT 0 AA 0
0 2 0 1 1 1
5100094 503136 6 1 0.19703 -0.41046 0.086957 -0.039524 0.3057 1 2 CC 0 AA 2 0 0
0 0 2 4
5100938 503615 6 1 0.00098838 0.026718 0.04513 0.047903 -0.1743 2 1 CT 0 AA 1 0
1 0 0 1 1
format() with digits=5 is still showing 7 significant digits. Why? signif()
only shows 5.
From the help of format:
digits "how many significant digits are to be used for numeric and
complex x. The default, NULL, uses getOption("digits"). This is a
suggestion: enough decimal places will be used so that the smallest (in
magnitude) number has this many significant digits, and also to satisfy
nsmall. (For the interpretation for complex numbers see signif.)"
So if I read this correctly, the smallest number (in magnitude) will have 5
significant digits. Larger numbers may get more, because every value in a
column is shown with the same number of decimal places (see also the
argument trim).
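
A small sketch of that behaviour, using two values taken from column 5 of the output above. The names x, f and s are only for illustration, and as.character() is used here as a stand-in for how each rounded value gets written out on its own.

```r
# Two values from column 5 of the example output ('x' is an illustrative name).
x <- c(-0.11869237, 0.00098838)

# format() chooses ONE number of decimal places for the whole vector: enough
# for the smallest value (in magnitude) to show 5 significant digits.
# That is 8 decimals here, so the larger value ends up with 8 significant digits.
f <- format(x, digits = 5, trim = TRUE)
print(f)   # "-0.11869237" "0.00098838"

# signif() rounds each value to 5 significant digits on its own, so the
# larger value really is cut back to -0.11869.
s <- signif(x, digits = 5)
print(as.character(s))
```

So digits=5 in format() behaves like a lower bound for the column, while signif() is applied to each number separately.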
Thanks! Another thing I've figured out: Use of "drop0trailing=T" in
format() fixes the .00000 stuff that I didn't like:
write.table(format(data[1:10,], digits=5, trim=T, drop0trailing=T),
row.names=F, col.names=F, quote=F)
3100674 303164 6 1 -0.11869237 0.0073947 0.0084493 0.00012708 -0.132 1 0 TT 1
GA 0 0 2 0 0 0 0
3100765 303321 6 1 0.01434426 -0.0136545 -0.0017613 0.08502718 1.0365 1 1 CT 1
GA 1 0 1 0 0 1 1
3101201 304352 6 1 -0.01710451 -0.0169568 0.0320392 0.00884896 0.4279 1 1 CT 2
GG 1 0 1 0 0 1 1
3101862 305250 6 1 -0.01328316 0.0108479 -0.0170081 -0.03692398 -0.447 1 0 TT 1
GA 0 0 2 0 1 1 1
3103579 305847 6 1 0.01593935 0.0096043 -0.0437904 -0.02224669 -0.3365 1 0 TT 1
GA 0 0 2 0 0 0 0
3103645 305961 6 1 0.20441289 -0.1090142 0.2727132 -0.29890268 1.5818 1 2 CC 0
AA 2 0 0 0 0 2 4
3104098 308536 6 1 0.02842117 0.0562814 -0.0715448 -0.11510562 0.9974 1 0 TT 0
AA 0 1 1 0.944 0 0.944 0.89114
3104361 306928 6 1 -0.04840401 0.0266719 -0.0548747 -0.03640484 0.4499 1 0 TT 0
AA 0 0 2 0 1 1 1
5100094 503136 6 1 0.19702704 -0.4104611 0.0869569 -0.0395242 0.3057 1 2 CC 0
AA 2 0 0 0 0 2 4
5100938 503615 6 1 0.00098838 0.0267176 0.0451301 0.04790277 -0.1743 2 1 CT 0
AA 1 0 1 0 0 1 1
That's pretty close to the signif() output I was getting (above) but with
a few digits added because of the small numbers (as you explained).
format() with trim=T seems to just delete the spaces that format() would
have added for column alignment. It doesn't seem to affect the number of
digits displayed.
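
For what it's worth, here is a minimal sketch of what trim and drop0trailing do on their own. The vectors y and x are made-up values (x resembles the last few columns above), not the real data.

```r
# Illustrative values only.
y <- c(1, 10, 100)
x <- c(0, 0.944, 2)

# By default format() pads every element to a common width for alignment.
padded  <- format(y)                # "  1" " 10" "100"

# trim = TRUE only removes that padding; the digits are unchanged.
trimmed <- format(y, trim = TRUE)   # "1" "10" "100"

# Without drop0trailing, whole numbers carry the zeros needed to match 0.944.
zeros   <- format(x)                # "0.000" "0.944" "2.000"

# drop0trailing = TRUE strips those trailing zeros (and the bare decimal point).
clean   <- format(x, trim = TRUE, drop0trailing = TRUE)   # "0" "0.944" "2"
```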
I still have more to figure out, but for most smaller table-writing jobs,
I think something like the last command above will be my usual approach.
In real life, I would use a tab delimiter, though.
I'm still unsure about the best way for dealing with very large data
frames, though. There's probably a good way to stream data into a file so
that it doesn't have to be written as an additional large object in
memory. There must be a way to make a connection and then just pipe the
formatted data into it. Maybe something related to sprintf() will work.
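
One possible sketch of the connection idea, not a tested recipe for huge data: open a file connection once and write the data frame in row chunks, so that only one rounded chunk is copied at a time. The function name write_in_chunks and its arguments (num_cols, chunk_size) are hypothetical; it uses signif() rather than sprintf(), and a tab delimiter.

```r
# Hypothetical helper: write a data frame to a tab-delimited file chunk by chunk.
# 'num_cols' gives the columns (names or indices) to round with signif().
write_in_chunks <- function(df, path, num_cols, digits = 5, chunk_size = 1000) {
  con <- file(path, open = "wt")          # open once; later writes append
  on.exit(close(con), add = TRUE)
  n <- nrow(df)
  if (n == 0) return(invisible(path))
  for (start in seq(1, n, by = chunk_size)) {
    rows  <- start:min(start + chunk_size - 1, n)
    chunk <- df[rows, , drop = FALSE]     # only this chunk is copied
    chunk[num_cols] <- lapply(chunk[num_cols], signif, digits = digits)
    write.table(chunk, file = con, sep = "\t", quote = FALSE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(path)
}

# Usage on a tiny made-up data frame.
demo <- data.frame(id = 1:6, x = c(0.123456789, 2.5, 1/3, 100.123456, 0, 7e-5))
demo_path <- tempfile(fileext = ".txt")
write_in_chunks(demo, demo_path, num_cols = "x", chunk_size = 4)
print(readLines(demo_path))
```

Because the connection stays open, each write.table() call adds to the same file, and the full rounded copy of the data frame never has to exist in memory.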
You've never explained why you want to write these gigantic text files.
Text is a lossy way to store numbers: it takes 15 bytes to store
about 8 bytes of information, and you'll probably lose a few bits at the
end. Why not write your files in binary, storing exactly what you have
in memory? It'll be a lot faster to write and to read, you won't need
to duplicate the data before writing, etc.
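
A minimal sketch of what that could look like, assuming saveRDS()/readRDS() for a whole R object and writeBin()/readBin() for a bare numeric vector. The vector x and the temporary file paths are made up for illustration.

```r
# Illustrative values only.
x <- c(-0.11869237, 0.00098838, 1/3)

# A text round trip at 5 significant digits does not give back the same number.
text_ok <- as.numeric(format(1/3, digits = 5)) == 1/3    # FALSE

# saveRDS() stores the object exactly as it is in memory (works for data frames too).
rds_path <- tempfile(fileext = ".rds")
saveRDS(x, rds_path)
rds_ok <- identical(readRDS(rds_path), x)                 # TRUE

# writeBin() writes the raw doubles, 8 bytes each by default.
bin_path <- tempfile(fileext = ".bin")
con <- file(bin_path, open = "wb")
writeBin(x, con)
close(con)

con <- file(bin_path, open = "rb")
y <- readBin(con, what = "double", n = length(x))
close(con)
bin_ok <- identical(y, x)                                 # TRUE
```

The .rds file is only readable from R; the raw writeBin() file can be read by other programs, provided they know the element type, count and byte order.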
Duncan Murdoch
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.