On 14-03-17 8:43 PM, Mike Miller wrote:
On Mon, 17 Mar 2014, Duncan Murdoch wrote:

On 14-03-17 6:22 PM, Mike Miller wrote:

Thanks! Another thing I've figured out: using drop0trailing=TRUE in
format() gets rid of the trailing .00000 stuff that I didn't like:

write.table(format(data[1:10,], digits=5, trim=TRUE, drop0trailing=TRUE),
row.names=FALSE, col.names=FALSE, quote=FALSE)
[snip]

I still have more to figure out, but for most smaller table-writing
jobs, I think something like the last command above will be my usual
approach. In real life, I would use a tab delimiter, though.
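A minimal sketch of that tab-delimited variant, assuming a small made-up data frame `df` in place of `data` and a temporary output file (both are illustrative choices, not from the thread). `sep = "\t"` is the standard write.table() argument for the delimiter:

```r
# Made-up example data standing in for the real data frame.
df <- data.frame(id = c("a", "b"), n = c(1L, 2L), x = c(1.5, 2.25),
                 stringsAsFactors = FALSE)
out_file <- tempfile(fileext = ".txt")

# format() converts every column to character first, so write.table()
# writes those strings unchanged; sep = "\t" makes the file tab-delimited.
write.table(format(df, digits = 5, trim = TRUE, drop0trailing = TRUE),
            file = out_file, sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

lines <- readLines(out_file)
```

Note that format() still builds a complete character copy of the selected rows in memory before anything is written.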

I'm still unsure about the best way to deal with very large data
frames. There's probably a good way to stream data into a file so that
the formatted copy doesn't have to be held in memory as an additional
large object. There must be a way to open a connection and then just
pipe the formatted data into it. Maybe something based on sprintf()
will work.
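One way to do that, sketched under stated assumptions: open a file connection once, then format and write the data frame a block of rows at a time with sprintf() and writeLines(), so only one block's worth of text exists in memory at any moment. The helper name write_chunked, the chunk size and the example columns are all hypothetical, and the sketch assumes character rather than factor columns (convert factors with as.character() first).

```r
# Hypothetical helper: write df to a text file in blocks of chunk_rows rows.
write_chunked <- function(df, path, fmt, chunk_rows = 10000L) {
  con <- file(path, open = "w")
  on.exit(close(con))
  n <- nrow(df)
  start <- 1L
  while (start <= n) {
    end <- min(start + chunk_rows - 1L, n)
    chunk <- df[start:end, , drop = FALSE]
    # Each column becomes one sprintf() argument; the single format
    # string is recycled over all rows of the block.
    lines <- do.call(sprintf, c(list(fmt), unname(as.list(chunk))))
    writeLines(lines, con)
    start <- end + 1L
  }
  invisible(n)
}

# Made-up example: 3 rows written in blocks of 2.
df <- data.frame(id = c("s1", "s2", "s3"), chr = c(1L, 2L, 3L),
                 pos = c(100L, 200L, 300L), x = c(0.25, 1, 2.5),
                 stringsAsFactors = FALSE)
path <- tempfile(fileext = ".tsv")
write_chunked(df, path, fmt = "%s\t%d\t%d\t%.4f", chunk_rows = 2L)
```

The block size trades memory for the number of write calls; the connection stays open for the whole loop, so each block is appended to the same file.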

You've never explained why you want to write these gigantic text files.
Text is a lossy way to store numbers: it takes 15 bytes to store about
8 bytes of information, and you'll probably lose a few bits at the end.
Why not write your files in binary, storing exactly what you have in
memory? It'll be a lot faster to write and to read, you won't need to
duplicate the data before writing, etc.
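A small illustration of the precision point, using 1/3 as an example value that needs more than 15 significant digits: writeBin() and readBin() give back the identical double at 8 bytes per value, while printing with 15 significant digits and parsing the text back does not. The temporary file is an arbitrary choice for the example.

```r
x <- c(1/3, pi, 1e-300)

# Binary: each double is written as its 8 in-memory bytes.
bin_file <- tempfile(fileext = ".bin")
writeBin(x, bin_file)
from_bin <- readBin(bin_file, what = "double", n = length(x))

# Text: 15 significant digits per value, then parsed back to numbers.
from_text <- as.numeric(sprintf("%.15g", x))
```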


Thanks for asking, Duncan.  A typical problem is that I am running 12
processes at once on a 12-core machine with 32 GB of RAM, so each process
has to be limited to about 2.5 GB total.  Then I try to load as much data
as I can within that limitation.  The output data does not always need to
be in text format, but it usually does because it has to be read by other
programs.

Other programs are unlikely to be able to read save() files, but they should be able to read the output of writeBin. Not all programs can do it easily, e.g. I wouldn't want to try to do that in Excel (though I think it can be done with VBA), but most should be able to.
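Reading such a file from another program is simplest when the layout is fixed explicitly. A minimal sketch, assuming a header-less file of little-endian 8-byte doubles stored in R's column-major order (this layout is an illustrative choice, not a standard): a reader in C or Python would then need only the number of rows and columns.

```r
# Made-up numeric matrix; R stores it column by column.
m <- matrix(c(1.5, -2, 3.25, 4, 5.5, 6), nrow = 2)

path <- tempfile(fileext = ".f64")
con <- file(path, open = "wb")
# size = 8 and endian = "little" pin down the byte layout explicitly.
writeBin(as.vector(m), con, size = 8, endian = "little")
close(con)

# Reading it back needs only the length, the element type and the layout.
back <- matrix(readBin(path, what = "double", n = length(m),
                       size = 8, endian = "little"),
               nrow = nrow(m))

# The 8 bytes that represent 1.0 in this layout (IEEE 754 double).
one_bytes <- writeBin(1, raw(), size = 8, endian = "little")
```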

The main reasons to use text files are so that humans can read the output or so that you can keep it for a long time and not worry about losing the documentation of the internal format; neither of those seems to apply to your use case. Binary files are better for interprocess communication, because you skip two conversion steps.

Duncan Murdoch


I was hoping I could read a line from a data frame and format it like
this:

sprintf(c(rep("%s",2), rep("%d",2), rep("%.4f",4)), data[1,1:8])

But sprintf reads vectors, so they have to be of a single type.
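A common workaround, shown as a sketch with made-up data: paste the per-column formats into one format string and pass each column as a separate argument with do.call(), so that every argument sprintf() sees is a vector of a single type. Because sprintf() is vectorized, the same call also formats all rows at once.

```r
# Made-up 2-row data frame with the column types from the example above.
df <- data.frame(a = c("x", "u"), b = c("y", "v"), c = 1:2, d = 3:4,
                 e = c(0.5, 1), f = c(1.25, 2), g = c(2, 3), h = c(3.125, 4),
                 stringsAsFactors = FALSE)

fmt <- paste(c(rep("%s", 2), rep("%d", 2), rep("%.4f", 4)), collapse = "\t")

# One row: each column of df[1, 1:8] becomes its own sprintf() argument.
row1 <- do.call(sprintf, c(list(fmt), unname(as.list(df[1, 1:8]))))

# All rows: the single format string is recycled over the column vectors.
all_rows <- do.call(sprintf, c(list(fmt), unname(as.list(df[, 1:8]))))
```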

Thanks for your help.

Mike


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.