Here's a skeletal example.  Embellish as needed:
 
p <- 5
n <- 300
set.seed(1)
## Fake data: response in column 1, p predictors in the rest.
dat <- cbind(rnorm(n), matrix(runif(n * p), n, p))
write.table(dat, file = "c:/temp/big.txt", row.names = FALSE, col.names = FALSE)

## Accumulate X'X and X'y one chunk of 100 rows at a time.
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open = "r")
for (i in 1:3) {                    # 300 rows = 3 chunks of 100
    x <- matrix(scan(f, nlines = 100, quiet = TRUE), 100, p + 1, byrow = TRUE)
    X <- cbind(1, x[, -1])          # prepend the intercept column
    xtx <- xtx + crossprod(X)       # X'X for this chunk
    xty <- xty + crossprod(X, x[, 1])  # X'y for this chunk
}
close(f)
solve(xtx, xty)                     # regression coefficients
coef(lm.fit(cbind(1, dat[, -1]), dat[, 1]))  ## check result

unlink("c:/temp/big.txt")  ## clean up.
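
If the number of rows isn't known in advance, or isn't an exact multiple of
the chunk size, the same accumulation can be driven by a loop that stops
when scan() comes back empty. A minimal sketch along those lines (the file
name and chunk size are placeholders; it assumes the same layout, with the
response in column 1):

p <- 5                          # number of predictor columns
chunk <- 100                    # rows to read per pass
xtx <- matrix(0, p + 1, p + 1)
xty <- numeric(p + 1)
f <- file("c:/temp/big.txt", open = "r")
repeat {
    v <- scan(f, nlines = chunk, quiet = TRUE)
    if (length(v) == 0) break               # end of file
    x <- matrix(v, ncol = p + 1, byrow = TRUE)  # handles a short last chunk
    X <- cbind(1, x[, -1])                  # intercept + predictors
    xtx <- xtx + crossprod(X)
    xty <- xty + crossprod(X, x[, 1])
}
close(f)
solve(xtx, xty)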
 
Andy

-----Original Message-----
From: Sachin J [mailto:[EMAIL PROTECTED] 
Sent: Monday, April 24, 2006 5:09 PM
To: Liaw, Andy; R-help@stat.math.ethz.ch
Subject: RE: [R] Handling large dataset & dataframe [Broadcast]


Hi Andy:
 
I searched through the R-help archives to find out how to handle a large
data set using readLines and other related R functions, but couldn't find
a single post that elaborates on the process. Could you provide an example,
or pointers to postings that do?

Thanks in advance
Sachin
 

"Liaw, Andy" <[EMAIL PROTECTED]> wrote:

Instead of reading the entire data set in at once, you read a chunk at a
time, compute X'X and X'y on that chunk, and accumulate (i.e., add) them.
There are examples in "S Programming", taken from independent replies by the
two authors to a post on S-news, if I remember correctly.
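
In matrix terms, the reason chunking works: if the rows of X and y are
split into chunks X_1, ..., X_k and y_1, ..., y_k, then

    X'X = X_1'X_1 + ... + X_k'X_k
    X'y = X_1'y_1 + ... + X_k'y_k

and the least-squares coefficients are solve(X'X, X'y), so only the small
(p+1) x (p+1) matrix and a length-(p+1) vector ever need to be in memory
at once. For scale: a 350,000 x 266 data set takes about
350000 * 266 * 8 bytes, roughly 745 MB, as a single numeric matrix, before
any copies R makes, which is why reading it whole into 1 GB of RAM fails.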

Andy

From: Sachin J
> 
> Gabor:
> 
> Could you elaborate a bit more?
> 
> Thanx
> Sachin
> 
> Gabor Grothendieck wrote:
> You just need the much smaller cross-product matrix X'X and
> vector X'y, so you can build those up as you read the data
> in, in chunks.
> 
> 
> On 4/24/06, Sachin J wrote:
> > Hi,
> >
> > I have a data set consisting of 350,000 rows and 266 columns; 250 of
> > the 266 columns are dummy variables. I am trying to read this data
> > set into an R data frame but am unable to, due to memory limitations
> > (the object created is too large for R to handle). Is there a way to
> > handle such a large data set in R?
> >
> > My PC has 1 GB of RAM and 55 GB of hard disk space, and runs Windows XP.
> >
> > Any pointers would be of great help.
> >
> > TIA
> > Sachin


