When working with datasets too large to fit in memory it is usually
best to use an actual database: read the data into the database, then
pull only the records that you want into R.  There are several packages
for working with databases, but two of the simplest are RSQLite and
sqldf (installing them will install the database backend for you).
The read.csv.sql function in the sqldf package will read in a csv file
by first loading it into the database and then pulling the desired
subset (you need to know some basic SQL) into R; all the database work
is handled in the background for you.
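
For example, a rough, untested sketch for your case (assuming a
tab-delimited file called mydata.txt with a header row -- the file name
is just a placeholder; in SQLite, abs(random()) % 10 = 0 keeps roughly
one row in ten, since random() returns a signed 64-bit integer):

library(sqldf)

## Read roughly 10% of the rows of a tab-delimited file into R.
## Inside the sql argument the data is always referred to as "file".
subset10 <- read.csv.sql("mydata.txt",
    sql    = "select * from file where abs(random()) % 10 = 0",
    header = TRUE,
    sep    = "\t")

You would then repeat this for each of your 12 files (or loop over
them) and rbind the pieces together.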

On Thu, Sep 18, 2014 at 5:48 PM, Stephen HK Wong <hon...@stanford.edu> wrote:
> Dear All,
>
> I have a tab-delimited table of 4 columns and many millions of rows. I
> don't have enough memory to read.table() that 1 Gb file, and actually I
> have 12 text files like that. Is there a way that I can just randomly
> read.table() in 10% of the rows? I was able to do that using the colbycol
> package, but it is no longer available. Many thanks!!
>
>
>
> Stephen HK Wong
> Stanford, California 94305-5324
>



-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
