The read.csv.sql function in the sqldf package may make this approach quite simple.
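
As a rough sketch of what that could look like (toy data only; the column name obs_id is an assumption standing in for whatever identifies one multinomial observation in the real file), one could sample observation ids in R and let read.csv.sql pull every row that belongs to them:

```r
library(sqldf)

# Toy stand-in for the large csv file. `obs_id` is a hypothetical column
# that groups the rows of one multinomial observation.
csv_path <- tempfile(fileext = ".csv")
toy <- data.frame(obs_id = c(1, 1, 2, 3, 3, 3, 4, 4),
                  x      = c("a", "b", "a", "c", "b", "a", "c", "c"))
write.csv(toy, csv_path, row.names = FALSE, quote = FALSE)

# Sample whole observations by id, not individual rows.
set.seed(1)
ids <- sample(unique(toy$obs_id), size = 2)

# read.csv.sql loads the file into a temporary SQLite database and returns
# only the rows selected by the query; the table is referred to as `file`.
query <- sprintf("select * from file where obs_id in (%s)",
                 paste(ids, collapse = ", "))
sub <- read.csv.sql(csv_path, sql = query)
```

With the real file the ids would have to come from a query such as "select distinct obs_id from file" rather than from a data frame already in memory. Each call to read.csv.sql imports the csv again, so for many bootstrap draws a database that is built once, as suggested below, is probably cheaper.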

On Thu, Aug 16, 2012 at 10:12 AM, jim holtman <jholt...@gmail.com> wrote:
> Why not put this into a database, and then you can easily extract the
> records you want by specifying the record numbers. You pay the one time
> expense of creating the database, but then have much faster access to
> the data as you make subsequent runs.
>
> On Thu, Aug 16, 2012 at 9:44 AM, Tudor Medallion
> <tudormedall...@googlemail.com> wrote:
>> Hello,
>>
>> I'm most grateful for your time to read this.
>>
>> I have an uber-size 30GB file of 6 million records and 3000 (mostly
>> categorical data) columns in csv format. I want to bootstrap subsamples for
>> multinomial regression, but it's proving difficult even with the 64GB RAM
>> in my machine and twice that in swap file; the process becomes super slow
>> and halts.
>>
>> I'm thinking about generating subsample indices in R and feeding them into
>> a system command using sed or awk, but don't know how to do this. If
>> someone knew of a clean way to do this using just R commands, I would be
>> really grateful.
>>
>> One problem is that I need to pick complete observations of subsamples,
>> that is, I need to have all the rows of a particular multinomial observation
>> - they are not the same length from observation to observation. I plan to
>> use glmnet and then some fancy transforms to get an approximation to the
>> multinomial case. One other point is that I don't know how to choose sample
>> size to fit around memory limits.
>>
>> Appreciate your thoughts greatly.
>>
>>
>> > R.version
>>
>> platform       x86_64-pc-linux-gnu
>> arch           x86_64
>> os             linux-gnu
>> system         x86_64, linux-gnu
>> status
>> major          2
>> minor          15.1
>> year           2012
>> month          06
>> day            22
>> svn rev        59600
>> language       R
>> version.string R version 2.15.1 (2012-06-22)
>> nickname       Roasted Marshmallows
>>
>>
>> tags: read.csv(), system(), awk, sed, sample(), glmnet, multinomial, MASS.
>>
>> Yoda
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.

--
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com