Here's a skeletal example. Embellish as needed:

p <- 5
n <- 300
set.seed(1)
## Simulated data: response in column 1, p predictors in the rest.
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file="c:/temp/big.txt", row.names=FALSE, col.names=FALSE)

## Accumulators for X'X and X'y (the extra 1 is for the intercept).
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open="r")
for (i in 1:3) {
    ## Read the next 100 lines from the open connection.
    x <- matrix(scan(f, nlines=100), 100, p + 1, byrow=TRUE)
    xtx <- xtx + crossprod(cbind(1, x[, -1]))
    xty <- xty + crossprod(cbind(1, x[, -1]), x[, 1])
}
close(f)
solve(xtx, xty)
coef(lm.fit(cbind(1, dat[,-1]), dat[,1]))  ## check result
unlink("c:/temp/big.txt")  ## clean up

Andy

-----Original Message-----
From: Sachin J [mailto:[EMAIL PROTECTED]]
Sent: Monday, April 24, 2006 5:09 PM
To: Liaw, Andy; R-help@stat.math.ethz.ch
Subject: RE: [R] Handling large dataset & dataframe [Broadcast]

Hi Andy:

I searched through the R archives to find out how to handle large data sets using readLines and other related R functions. I couldn't find a single post that elaborates on the process. Can you provide an example, or pointers to postings that elaborate on it?

Thanx in advance
Sachin

"Liaw, Andy" <[EMAIL PROTECTED]> wrote:

Instead of reading the entire data in at once, you read a chunk at a time, compute X'X and X'y on that chunk, and accumulate (i.e., add) them. There are examples in "S Programming", taken from independent replies by the two authors to a post on S-news, if I remember correctly.

Andy

From: Sachin J
>
> Gabor:
>
> Can you elaborate more?
>
> Thanx
> Sachin
>
> Gabor Grothendieck wrote:
> You just need the much smaller cross product matrix X'X and
> vector X'Y so you can build those up as you read the data in
> in chunks.
>
>
> On 4/24/06, Sachin J wrote:
> > Hi,
> >
> > I have a dataset consisting of 350,000 rows and 266 columns. Out of
> > 266 columns 250 are dummy variable columns. I am trying to read this
> > data set into an R dataframe object but am unable to do it due to memory
> > size limitations (object size created is too large to handle in R). Is
> > there a way to handle such a large dataset in R?
> >
> > My PC has 1GB of RAM and 55 GB of hard disk space, running Windows XP.
> >
> > Any pointers would be of great help.
> >
> > TIA
> > Sachin
> >
> > ______________________________________________
> > R-help@stat.math.ethz.ch mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide!
> > http://www.R-project.org/posting-guide.html
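
The chunk-and-accumulate idea discussed in this thread can also be sketched as a small function that keeps reading until the file is exhausted, so the number of rows need not be known up front. This is only a sketch under assumptions: the name chunked_ols and its arguments (path, n_cols, chunk_lines) are made up for illustration, and the file is assumed to be whitespace-delimited, all numeric, without a header, with the response in the first column.

```r
## Hypothetical helper (not from the thread): accumulate X'X and X'y
## chunk by chunk, then solve the normal equations.
chunked_ols <- function(path, n_cols, chunk_lines = 100) {
  xtx <- matrix(0, n_cols, n_cols)   # intercept + (n_cols - 1) predictors
  xty <- numeric(n_cols)
  con <- file(path, open = "r")
  on.exit(close(con))                # close the connection even on error
  repeat {
    vals <- scan(con, nlines = chunk_lines, quiet = TRUE)
    if (length(vals) == 0) break     # nothing left to read
    chunk <- matrix(vals, ncol = n_cols, byrow = TRUE)
    X <- cbind(1, chunk[, -1, drop = FALSE])  # design matrix with intercept
    xtx <- xtx + crossprod(X)
    xty <- xty + crossprod(X, chunk[, 1])
  }
  drop(solve(xtx, xty))              # coefficient vector, intercept first
}

## Usage on simulated data; 250 rows is deliberately not a multiple of
## the chunk size, so the last chunk is short.
set.seed(1)
n <- 250
p <- 5
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
tmp <- tempfile(fileext = ".txt")
write.table(dat, file = tmp, row.names = FALSE, col.names = FALSE)
beta_chunked <- chunked_ols(tmp, n_cols = p + 1)
beta_full <- unname(coef(lm.fit(cbind(1, dat[, -1]), dat[, 1])))
print(all.equal(beta_chunked, beta_full))
unlink(tmp)
```

This works because the cross product of the row-stacked chunks equals the sum of the per-chunk cross products, so only the (p + 1) x (p + 1) matrix and a vector of length p + 1 stay in memory. Solving the normal equations directly is numerically less stable than the QR decomposition lm.fit uses, so a check against a full fit on a subset is worthwhile when predictors are nearly collinear.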