It's normal for me to create a list of data.frames and then use do.call('rbind', ...) on that list to produce a single data.frame. However, I've noticed that as the list grows large, it seems better to do this in chunks. As an example, here's a list of 20,000 similar data.frames.

# create a list of 20,000 small data.frames of random size
dat <- vector("list", 20000)
for(i in seq_along(dat)) {
  size <- sample(1:30, 1)
  dat[[i]] <- data.frame(id=rep(i, size),
                         value=rnorm(size),
                         letter=sample(LETTERS, size, replace=TRUE),
                         ind=sample(c(TRUE,FALSE), size, replace=TRUE))
}
# combine into one data.frame, normal usage
# system.time(do.call('rbind', dat)) # takes 2-3 minutes
combine <- function(x, steps=NA, verbose=FALSE) {
  nr <- length(x)
  if(is.na(steps)) steps <- nr
  # bump 'steps' up until it divides the list length evenly
  while(nr %% steps != 0) steps <- steps + 1
  if(verbose) cat(sprintf("number of chunks: %s\n", steps))
  # rbind each chunk of nr/steps data.frames, then rbind the chunks together
  dl <- vector("list", steps)
  for(i in seq(steps)) {
    ix <- seq(from=(i-1)*nr/steps + 1, length.out=nr/steps)
    dl[[i]] <- do.call("rbind", x[ix])
  }
  do.call("rbind", dl)
}
# combine into one data.frame
system.time(combine(dat, 100)) # takes 5-10 seconds
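
For what it's worth, the same chunking idea can be written without choosing a step size at all, by repeatedly rbind-ing adjacent pairs until a single data.frame remains, so that each individual rbind call only ever sees two arguments. This is just a rough sketch of that divide-and-conquer variant (combine2 is my own name for it, not anything in base R), and I haven't tuned it:

# pairwise/recursive version of the chunking idea (sketch, not tuned)
combine2 <- function(x) {
  while(length(x) > 1) {
    n <- length(x)
    odd <- if(n %% 2 == 1) x[n] else NULL   # carry an unpaired last element forward
    x <- lapply(seq_len(n %/% 2), function(i) rbind(x[[2*i - 1]], x[[2*i]]))
    x <- c(x, odd)
  }
  x[[1]]
}
# system.time(combine2(dat))  # I'd expect this to be in the same ballpark as combine(dat, 100)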

I'm very surprised by this result. Does this improvement seem reasonable? I would have thought "do.call" could do something like this by default when the length of "args" is very large. Is using "do.call" not recommended in this scenario?
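
As a point of comparison only (not base R, and not an answer to the do.call question), data.table::rbindlist is written for exactly this case and takes the list directly:

# requires the data.table package
# library(data.table)
# system.time(rbindlist(dat))  # returns a data.table (which extends data.frame)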

Regards,
Cole Beck
