Re: [R] Suggestion for big files [was: Re: A comment about R:]

Wensui Liu Fri, 06 Jan 2006 07:30:59 -0800

RG,

Actually, SQLite provides a way to read a *.csv file directly into a database.


Just for your consideration.
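
A minimal sketch of one way to do this from within R. SQLite's own command-line shell has an `.import` command; the version below instead assumes the RSQLite method of dbWriteTable() that accepts a file name in place of a data frame. The tiny CSV file, the temporary database path, and the column names are hypothetical stand-ins, not the real data.

```r
library(DBI)
library(RSQLite)

# Hypothetical stand-in for the real CSV file: a tiny file in a temp directory.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,income", "1,34,52000", "2,51,61000", "3,28,39000"), csv_path)

# Hypothetical database location (any writable path would do).
db_path <- tempfile(fileext = ".db3")
con <- dbConnect(SQLite(), dbname = db_path)

# Passing a file name (not a data frame) asks RSQLite to import the file.
dbWriteTable(con, "wvs", csv_path, header = TRUE, sep = ",")

# Pull only the variables needed for the analysis.
sub <- dbGetQuery(con, "SELECT id, income FROM wvs")
print(sub)

dbDisconnect(con)
```

Once the table exists, only the columns named in the SELECT statement are brought into R's memory.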

On 1/5/06, ronggui <[EMAIL PROTECTED]> wrote:
>
> 2006/1/6, jim holtman <[EMAIL PROTECTED]>:
> > If what you are reading in is numeric data, then it would require
> > (807 * 118519 * 8 bytes) about 765 MB just to store a single copy of the
> > object -- more memory than you have on your computer.  If you were
> > reading it in, then the problem is the paging that was occurring.
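
The estimate above can be checked with a few lines of R. This is only a sketch, assuming every variable is stored as an 8-byte double; the variable names are illustrative.

```r
n_vars <- 807            # variables (columns), from the thread
n_obs  <- 118519         # observations (rows), from the thread
bytes_per_double <- 8    # assumption: every value is stored as a double

bytes <- n_vars * n_obs * bytes_per_double
cat(sprintf("%.0f bytes = %.0f MB = %.0f MiB\n",
            bytes, bytes / 1e6, bytes / 2^20))
```

Either way of counting megabytes gives well over the 512 MB of RAM mentioned in the thread, before R makes any intermediate copies.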
> In fact, if I read it in 3 pieces, each is about 170M.
>
> >
> > You have to look at storing this in a database and working on a subset
> > of the data.  Do you really need to have all 807 variables in memory at
> > the same time?
>
> Yes, I don't need all the variables, but I don't know how to get just the
> necessary variables into R.
>
> In the end I read the data in pieces and used the RSQLite package to write
> it to a database, and then did the analysis. If I were familiar with
> database software, using a database (and R) would be the best choice, but
> converting the file into a database format is not an easy job for me. I
> asked for help on the SQLite list, but the solution was not satisfying
> because it required knowledge of a third scripting language. After
> searching the internet, I arrived at this solution:
>
> #begin
> library(DBI)
> library(RSQLite)
>
> # Forward slashes avoid invalid escapes such as "\w" in R strings.
> f <- file("D:/wvsevs_sb_v4.csv", "r")
> con <- dbConnect(SQLite(), dbname = "c:/sqlite/database.db3")
> tim1 <- Sys.time()
> chunk <- 2500   # lines read per pass
>
> # The first chunk carries the header line and fixes the column names.
> dat <- read.table(text = readLines(f, chunk), header = TRUE, sep = ",",
>                   quote = "")
> nms <- names(dat)
> dbWriteTable(con, "wvs", dat)
>
> repeat {
>   tt <- readLines(f, chunk)
>   if (length(tt) == 0) break   # end of file reached
>   dat <- read.table(text = tt, header = FALSE, sep = ",", quote = "",
>                     col.names = nms)
>   dbWriteTable(con, "wvs", dat, append = TRUE)
> }
> close(f)
> dbDisconnect(con)
> print(Sys.time() - tim1)   # elapsed time
> #end
> It's not the best solution, but it works.
>
>
>
> > If you use 'scan', you could specify that you do not want some of the
> > variables read in, so it might make a more reasonably sized object.
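
A small sketch of skipping unwanted variables at read time, using a hypothetical four-column file in place of the real one. In scan(), a NULL entry in the `what` list skips that field; the read.csv() alternative shown alongside uses "NULL" in `colClasses` for the same purpose.

```r
# Hypothetical stand-in for the real file: 4 columns, of which 2 are wanted.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,income,region", "1,34,52000,N", "2,51,61000,S"), csv_path)

# With scan(): a NULL entry in 'what' means that field is skipped.
cols <- scan(csv_path, sep = ",", skip = 1, quiet = TRUE,
             what = list(id = integer(), NULL, income = numeric(), NULL))
wanted <- as.data.frame(cols[c("id", "income")])

# With read.csv(): a colClasses entry of "NULL" drops that column.
wanted2 <- read.csv(csv_path,
                    colClasses = c("integer", "NULL", "numeric", "NULL"))

print(wanted)
print(wanted2)
```

For a file with 807 variables, the `what` list or `colClasses` vector would normally be built programmatically rather than typed out.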
> >
> >
> > On 1/5/06, François Pinard <[EMAIL PROTECTED]> wrote:
> > > [ronggui]
> > >
> > > >R is weak when handling large data files.  I have a data file with 807
> > > >vars and 118519 obs., in CSV format.  Stata can read it in 2 minutes,
> > > >but on my PC R can hardly handle it.  My PC has a 1.7 GHz CPU and
> > > >512 MB of RAM.
> > >
> > > Just (another) thought.  I used to use SPSS, many, many years ago, on
> > > CDC machines, where the CPU had limited memory and no kind of paging
> > > architecture.  Files did not need to be very large to be too large.
> > >
> > > SPSS had a feature that was useful then: the capability of sampling
> > > a big dataset directly at file read time, well before processing
> > > starts.  Maybe something similar could help in R (that is, instead of
> > > reading the whole data in memory, _then_ sampling it.)
> > >
> > > One can read records from a file, up to a preset amount of them.  If
> > > the file happens to contain more records than that preset number (the
> > > number of records in the whole file is not known beforehand), already
> > > read records may be dropped at random and replaced by other records
> > > coming from the file being read.  If the random selection algorithm is
> > > properly chosen, it can be made so that all records in the original
> > > file have equal probability of being kept in the final subset.
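
The scheme described above is commonly known as reservoir sampling. Below is a minimal sketch in R, not an existing R facility: the function name `reservoir_sample` and its arguments are hypothetical, and the test file is generated on the spot. It reads the file in chunks and keeps at most `k` data lines in memory.

```r
# Keep a uniform random sample of k data lines from a file of unknown length,
# reading it in chunks so the whole file never has to sit in memory.
reservoir_sample <- function(path, k, chunk = 1000, header = TRUE) {
  con <- file(path, "r")
  on.exit(close(con))
  head_line <- if (header) readLines(con, 1) else character(0)
  kept <- character(0)   # the reservoir
  seen <- 0              # data lines read so far
  repeat {
    lines <- readLines(con, chunk)
    if (length(lines) == 0) break        # end of file
    for (ln in lines) {
      seen <- seen + 1
      if (seen <= k) {
        kept[seen] <- ln                 # fill the reservoir first
      } else {
        j <- sample.int(seen, 1)         # uniform on 1..seen
        if (j <= k) kept[j] <- ln        # replace with probability k/seen
      }
    }
  }
  # Parse only the kept lines (plus the header, if any).
  read.csv(text = c(head_line, kept), header = header)
}

# Usage on a hypothetical generated file of 10000 records.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:10000, x = rnorm(10000)), csv_path,
          row.names = FALSE)
set.seed(1)
smp <- reservoir_sample(csv_path, k = 100)
print(nrow(smp))
```

The per-line loop is slow in R; it is written this way to keep the replacement rule visible, not for speed.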
> > >
> > > If such a sampling facility was built right within usual R reading
> > > routines (triggered by an extra argument, say), it could offer
> > > a compromise for processing large files, and also sometimes accelerate
> > > computations for big problems, even when memory is not at stake.
> > >
> > > --
> > > François Pinard   http://pinard.progiciels-bpi.ca
> > >
> > > ______________________________________________
> > > R-help@stat.math.ethz.ch mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
> > >
> >
> >
> >
> > --
> > Jim Holtman
> > Cincinnati, OH
> > +1 513 247 0281
> >
> > What is the problem you are trying to solve?
>
>
> --
> 黄荣贵
> Department of Sociology
> Fudan University




--
WenSui Liu
(http://statcompute.blogspot.com)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children's Hospital Medical Center
