Re: [R] Hand-crafting an .RData file

Adam D. I. Kramer Mon, 09 Nov 2009 22:33:04 -0800

Thanks as always for a very helpful response. I'm now loading a few million
rows in only a few seconds.


Cordially,
Adam Kramer

On Mon, 9 Nov 2009, Prof Brian Ripley wrote:

The R 'save' format (as used for the saved workspace .RData) is described in the 'R Internals' manual (section 1.8). It is intended for R objects, and you would first have to create one[*] of those in your other application. That seems a lot of work.
The normal way to transfer numeric data between applications is to write a binary file: R can read such files with readBin(), and it also has wrappers/C-code to read a number of common binary data formats (e.g. those from SPSS).
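
A minimal sketch of the readBin() route, assuming the other application writes a headerless file of 8-byte little-endian doubles. The file name, the values, and the layout are made up for illustration, and writeBin() only stands in for the exporting program:

```r
# Hypothetical data and file; a real exporter would produce this file itself.
path <- tempfile(fileext = ".bin")
x <- c(1.5, 2.25, -3, 1e6)

# Stand-in for the exporting application: raw doubles, no header, no separators.
con <- file(path, "wb")
writeBin(x, con, size = 8, endian = "little")
close(con)

# Reading side: derive the element count from the file size (8 bytes per double).
n <- file.size(path) / 8
con <- file(path, "rb")
y <- readBin(con, what = "double", n = n, size = 8, endian = "little")
close(con)

# The values survive the round trip exactly, with no text parsing involved.
stopifnot(identical(x, y))
```

The size and endian arguments have to match whatever the writer actually emits, so they are worth stating explicitly rather than relying on platform defaults.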
With character data there are more issues (and more formats, see also readChar()), but load() is not particularly fast for those.
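
For character data, one possible layout is fixed-width fields read with readChar(). This is only a sketch under that assumption: the 5-byte field width and the sample strings are hypothetical, and writeChar() stands in for the exporting program:

```r
# Hypothetical fixed-width layout: three fields of exactly 5 bytes each,
# written back to back with no terminators.
path <- tempfile(fileext = ".bin")
con <- file(path, "wb")
writeChar(c("alpha", "beta ", "gamma"), con, nchars = c(5, 5, 5), eos = NULL)
close(con)

# Read the fields back by telling readChar() how many bytes each one has.
con <- file(path, "rb")
s <- readChar(con, nchars = rep(5, 3), useBytes = TRUE)
close(con)

# Strip the space padding used to fill out short fields.
s <- trimws(s)
stopifnot(identical(s, c("alpha", "beta", "gamma")))
```

Variable-length strings would need a length prefix or a terminator convention agreed with the writer, which is part of why character data has "more issues".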
Ultimately the R functions pay a performance price for their flexibility so hand-crafted C code to read the format can be worthwhile: but see the comments below about whether I/O speed is that important.
[*] the 'save' format is a serialization of a single R object, even if you save many objects, since the object(s) are combined into a pairlist.
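
To get a feel for the footnote, the sketch below saves two objects into one uncompressed file, peeks at the leading bytes, and loads both names back. The object names are made up, and the exact magic string is an assumption that depends on the R version and on the ascii/version arguments to save():

```r
# Two hypothetical objects saved into a single uncompressed binary file.
x <- 1:3
y <- c("a", "b")
path <- tempfile(fileext = ".RData")
save(x, y, file = path, compress = FALSE)

# The file starts with a short magic string identifying the save format
# (typically "RDX2\n" or "RDX3\n" for binary saves, depending on R version).
magic <- rawToChar(readBin(path, what = "raw", n = 5))
print(magic)

# load() restores every object from that one serialized stream and
# returns the names it created.
env <- new.env()
restored <- load(path, envir = env)
print(restored)
stopifnot(identical(env$x, x), identical(env$y, y))
```

Everything after the header is R's own serialization of the combined object, which is what another application would have to reproduce byte for byte.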
On Sun, 8 Nov 2009, Adam D. I. Kramer wrote:
Hello,

        I frequently have to export a large quantity of data from some
source (for example, a database, or a hand-written perl script) and then
read it into R.  This occasionally takes a lot of time; I'm usually using
read.table("filename",comment.char="",quote="") to read the data once it is
written to disk.
Specifying colClasses and nrows will usually help.
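
As a small illustration of that hint, assuming a tab-delimited file with one integer, one numeric, and one character column (the file and its contents are invented for the example):

```r
# Hypothetical three-column, tab-delimited file with no header row.
path <- tempfile(fileext = ".txt")
writeLines(c("1\t2.5\talpha", "2\t3.5\tbeta", "3\t4.5\tgamma"), path)

# Declaring the column types and the row count up front spares read.table()
# from guessing types and from growing its buffers as it reads.
d <- read.table(path, sep = "\t", header = FALSE,
                comment.char = "", quote = "",
                colClasses = c("integer", "numeric", "character"),
                nrows = 3)
str(d)
```

For a real export, nrows can be a mild over-estimate of the true row count; the column classes need to match the data exactly.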
To read from a database, packages such as RODBC use binary data transfer: with suitable tuning this can be fast.
        However, I *know* that the program that generates the data is more
or less just calling printf in a for loop to create the csv or tab-delimited
file, writing, then having R parse it, which is pretty inefficient.
Instead, I am interested in figuring out how to write the data in .RData
format so that I can load() it instead of read.table() it.
Without more details it is hard to say if it is inefficient. read.table() can read data pretty fast (millions of items per second) if used following the hints in the 'R Data' manual. See e.g.
https://stat.ethz.ch/pipermail/r-devel/2004-December/031733.html
Almost anything non-trivial one might do with such data is much slower. The trend is to write richer (and slower to read) data formats.
        Trolling the internet, however, has not suggested anything about the
specification for an .RData file. Could somebody link me to a specification
or some information that would instruct me on how to construct a .RData
file (either compressed or uncompressed)?

        Also, I am open to other suggestions of how to get load()-like
efficiency in some other way.

Many thanks,
Adam D. I. Kramer
--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


