I'd like to fit linear models on very large datasets. My data frames are about 2,000,000 rows x 200 columns of doubles, and I am using a 64-bit build of R. I've googled extensively and gone over the "R Data Import/Export" guide. My primary issue is that although my data is about 4 GB in ASCII form (and therefore considerably smaller as binary doubles), R consumes about 12 GB of virtual memory.
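For reference, a back-of-the-envelope check suggests the raw numbers alone should need only about 3 GB:

## 2e6 rows x 200 columns x 8 bytes per double:
2e6 * 200 * 8 / 2^30    # ~3 GiB, so 12 GB suggests several copies are being made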
What exactly are my options to improve this? I looked into the biglm package, but the problem is that it relies on the update() function and is therefore not transparent (I am using a sophisticated script which is hard to modify). I really liked the concept behind the LM package described here: http://www.econ.uiuc.edu/~roger/research/rq/RMySQL.html but it is no longer available.

How could one fit linear models to very large datasets without loading the entire set into memory, but instead from a file or database (possibly through a connection), using a relatively simple modification of standard lm()? Alternatively, how could one reduce R's memory usage on a large dataset (by changing some of R's default parameters, or even by using on-the-fly compression)? I don't mind considerably higher CPU time. Thank you in advance for your help.
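For concreteness, this is the kind of chunked update() loop biglm requires, and which I would rather not weave into my existing script (the file name "bigdata.csv", chunk size, and column names y, x1, ..., x199 below are made up):

library(biglm)

col_names  <- c("y", paste0("x", 1:199))
form       <- as.formula(paste("y ~", paste(col_names[-1], collapse = " + ")))
chunk_rows <- 100000

con <- file("bigdata.csv", open = "r")
## The first chunk initialises the model; subsequent chunks update it.
chunk <- read.table(con, sep = ",", nrows = chunk_rows,
                    col.names = col_names, colClasses = "numeric")
fit <- biglm(form, data = chunk)
repeat {
  chunk <- try(read.table(con, sep = ",", nrows = chunk_rows,
                          col.names = col_names, colClasses = "numeric"),
               silent = TRUE)
  if (inherits(chunk, "try-error")) break   # no lines left to read
  fit <- update(fit, chunk)
}
close(con)
summary(fit)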
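What I am really after is something like the following one-pass accumulation of the normal equations, which reproduces lm()'s OLS coefficients without ever holding the full design matrix in memory (again with made-up file and column names, and with the caveat that the normal equations are less numerically stable than the QR decomposition lm() uses):

## One-pass accumulation of X'X and X'y, solved once at the end.
col_names <- c("y", paste0("x", 1:199))
p   <- 200                       # intercept + 199 predictors
xtx <- matrix(0, p, p)           # running t(X) %*% X
xty <- matrix(0, p, 1)           # running t(X) %*% y

con <- file("bigdata.csv", open = "r")
repeat {
  chunk <- try(read.table(con, sep = ",", nrows = 100000,
                          col.names = col_names, colClasses = "numeric"),
               silent = TRUE)
  if (inherits(chunk, "try-error")) break
  X   <- cbind(1, as.matrix(chunk[, -1]))   # design matrix for this chunk
  xtx <- xtx + crossprod(X)                 # t(X) %*% X
  xty <- xty + crossprod(X, chunk$y)        # t(X) %*% y
}
close(con)
beta <- solve(xtx, xty)          # OLS coefficients, same as coef(lm(...))

Is there an existing package that wraps something like this behind the standard lm() interface, so that my script would not need restructuring?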