On Mon, 23 Aug 2004, Tony Plate wrote:
>
> One idea I was thinking about was to have a new class of object that
> referred to data in a file on disk, and which had all the standard methods
> of matrices and arrays, i.e., subsetting ("["), dim, dimnames, etc. The
> object in memory would only store the array attributes, while the actual
> array data (the elements) would reside in a file. When some extraction
> method was called, it would access data in the file and return the
> appropriate data. With sensible use of seek operations, the data access
> could probably be quite fast. The file format of the object on disk could
> possibly be the standard serialized binary format as used in .RData
> files. Of course, if the object was larger than would fit in memory, then
> trying to extract too large a subarray would exhaust memory, but it should
> be possible to efficiently extract reasonably sized subarrays. To be more
> useful, one would want apply() to work with such arrays. That would
> be doable, either by creating a new method for apply, or possibly just for
> aperm.

This is what RPgSql does with proxy dataframes and what I did (read-only)
for netCDF access. It's a good idea if you have a data format for which
random access is fairly fast. I'm not sure that the standard serialized
binary format satisfies this. Fixed-format text files would work, but
free-format ones wouldn't -- seek() only helps when you can work out
where to seek without reading all the data.

	-thomas

______________________________________________
[EMAIL PROTECTED] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
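
To make the point about seek() concrete, here is a minimal sketch of random access into a fixed-size binary layout. It is written in Python purely for illustration (in R, seek() and readBin() on a file connection would play the same role), and it is not how RPgSql or the netCDF interface work. The file layout (raw little-endian doubles in column-major order, with no header) and the names write_matrix, read_element and read_column are assumptions made up for this example.

```python
import os
import struct
import tempfile

ITEM = struct.calcsize("<d")  # 8 bytes per little-endian double


def write_matrix(path, columns):
    """Write equal-length columns back to back (column-major, as R stores matrices)."""
    with open(path, "wb") as f:
        for col in columns:
            f.write(struct.pack(f"<{len(col)}d", *col))


def read_element(path, nrow, i, j):
    """Return element (i, j), 0-based, by seeking straight to its byte offset."""
    with open(path, "rb") as f:
        # Every element has the same width, so the offset is pure arithmetic.
        f.seek((j * nrow + i) * ITEM)
        return struct.unpack("<d", f.read(ITEM))[0]


def read_column(path, nrow, j):
    """Return column j (0-based) with one seek and one contiguous read."""
    with open(path, "rb") as f:
        f.seek(j * nrow * ITEM)
        return list(struct.unpack(f"<{nrow}d", f.read(nrow * ITEM)))


# Hypothetical 3 x 2 matrix, used only to exercise the functions.
path = os.path.join(tempfile.mkdtemp(), "matrix.bin")
write_matrix(path, [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
element = read_element(path, nrow=3, i=1, j=1)  # second row, second column
column = read_column(path, nrow=3, j=0)         # first column
```

The offset can be computed from the indices alone only because every element occupies the same number of bytes. In a free-format text file the field widths vary, so no such formula exists, and finding a given element would mean scanning everything before it (or building an index from one full pass).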