> -----Original Message-----
>
> [ronggui]
>
> >R is weak when handling large data files. I have a data file: 807 vars,
> >118519 obs., in CSV format. Stata can read it in 2 minutes, but on
> >my PC R almost cannot handle it. My PC's CPU is 1.7G; RAM is 512M.
>
> Just (another) thought. I used to use SPSS, many, many years ago, on
> CDC machines, where the CPU had limited memory and no kind of paging
> architecture. Files did not need to be very large to be too large.
>
> SPSS had a feature that was useful then: the capability of
> sampling a big dataset directly at file read time, well before
> processing starts. Maybe something similar could help in R (that is,
> instead of reading the whole data into memory, _then_ sampling it.)
>
> One can read records from a file, up to a preset number of them. If the
> file happens to contain more records than that preset number (the number
> of records in the whole file is not known beforehand), already read
> records may be dropped at random and replaced by other records coming
> from the file being read. If the random selection algorithm is properly
> chosen, it can be made so that all records in the original file have
> equal probability of being kept in the final subset.
>
> If such a sampling facility were built right into the usual R reading
> routines (triggered by an extra argument, say), it could offer
> a compromise for processing large files, and also sometimes accelerate
> computations for big problems, even when memory is not at stake.
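The scheme described above is usually called reservoir sampling. Here is a minimal sketch of it, written in Python only for illustration; it is not an existing R facility, and the names `reservoir_sample` and `sample_csv` are hypothetical. It assumes a CSV file with a single header line and that the preset number of records `k` fits in memory.

```python
import csv
import io
import random


def reservoir_sample(rows, k, rng=None):
    """Return a uniform random sample of at most k items from an iterable
    whose length is not known beforehand (single pass, O(k) memory)."""
    if rng is None:
        rng = random.Random()
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(row)
        else:
            # Record number i+1 is kept with probability k / (i + 1);
            # if kept, it replaces a randomly chosen earlier record.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


def sample_csv(handle, k, rng=None):
    """Read a CSV stream, keep the header, and sample k data records."""
    reader = csv.reader(handle)
    header = next(reader, None)  # None if the stream is empty
    return header, reservoir_sample(reader, k, rng)


if __name__ == "__main__":
    # Small in-memory stand-in for a large file on disk.
    text = "id,value\n" + "".join(f"{i},{i * i}\n" for i in range(1000))
    header, rows = sample_csv(io.StringIO(text), k=10, rng=random.Random(42))
    print(header)
    print(len(rows))
```

Only `k` records are ever held at once, so memory use does not depend on the size of the file, and every record ends up in the final subset with the same probability.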
Since I often work with images and other large data sets, I have been thinking about a "BLOb" (binary large object, though it wouldn't necessarily have to be binary) package for R: one that would handle I/O for such creatures and only bring as much data into the R space as is actually needed.

So I see 3 possibilities:

1. The sort of functionality you describe is implemented in the R internals (by people other than me).
2. Some individuals (perhaps myself included) write such a package.
3. This thread fizzles out and we do nothing.

I guess I will wait and see what discussion, if any, ensues from this point, and which of these three options seems worth pursuing.

> --
> François Pinard   http://pinard.progiciels-bpi.ca
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html