On Thu, 21 Aug 2008, Roland Rau wrote:
Hi

Avram Aelony wrote: (in part)

1. How do others handle situations with large data sets (gigabytes, terabytes) for analysis in R?

I usually try to store the data in an SQLite database and interface with it via functions from the RSQLite (and DBI) packages.
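A minimal sketch of that workflow (the database file name, table name, and query below are only placeholders; adapt them to your own data):

    library(DBI)
    library(RSQLite)

    ## Store the data once in an on-disk SQLite file ...
    con <- dbConnect(SQLite(), dbname = "bigdata.sqlite")
    dbWriteTable(con, "measurements", my_data_frame)  # my_data_frame is a placeholder

    ## ... and later pull back only the subset you actually need.
    subset_df <- dbGetQuery(con,
        "SELECT * FROM measurements WHERE year = 2008")

    dbDisconnect(con)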

No idea about Question No. 2, though.

Hope this helps,
Roland


P.S. When I am sure that I will only need a certain subset of a large data set, I still prefer to do some pre-processing in awk (gawk); see the small sketch below.

P.P.S. The sizes of my data sets are in the gigabyte range (not the terabyte range). This might be important if your data sets are *really large* and you want to use SQLite: http://www.sqlite.org/whentouse.html
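For illustration only, such a pre-filtering step could be driven from R like this; the file name "big.txt" and the awk condition are made up, and the right filter depends entirely on your data:

    ## Let gawk select the rows of interest before R ever sees the file;
    ## here we keep only lines whose third field equals 2008.
    dat <- read.table(pipe("gawk '$3 == 2008' big.txt"), header = FALSE)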


I use netCDF for (genomic) data sets in the 100 GB range, with the ncdf package, because SQLite was too slow for the sort of queries I needed. HDF5 would be another possibility; I'm not sure of the current status of the HDF5 support in Bioconductor, though.
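A rough sketch of reading one slice of a large netCDF file with the ncdf package (the file and variable names are invented for the example):

    library(ncdf)

    ## Open the file read-only and pull out rows 1-1000 of a 2-d variable;
    ## a count of -1 means "everything along that dimension".
    nc <- open.ncdf("genotypes.nc")
    block <- get.var.ncdf(nc, "genotype",
                          start = c(1, 1), count = c(1000, -1))
    close.ncdf(nc)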

        -thomas

Thomas Lumley                   Assoc. Professor, Biostatistics
[EMAIL PROTECTED]       University of Washington, Seattle
