On Thu, Mar 19, 2009 at 8:40 PM, jim holtman <jholt...@gmail.com> wrote:
> Try this technique.  I use it with large data objects since it is
> sometimes faster, and uses less memory, by using indices:
>
> x <- read.table(textConnection("   v1 v2 n1 n2
> 1   a a1  1 21
> 2   a a1  2 22
> 3   a a1  3 23
> 4   a a2  4 24
> 5   a a3  5 25
> 6   b b1  6 26
> 7   b b1  7 27
> 8   b b2  8 28
> 9   b b2  9 29
> 10  b b2 10 30
> 11  c c1 11 31
> 12  c c2 12 32
> 13  c c2 13 33
> 14  c c2 14 34
> 15  c c3 15 35
> 16  d d1 16 36
> 17  d d2 17 37
> 18  d d3 18 38
> 19  d d4 19 39
> 20  d d4 20 40"), header=TRUE)
> closeAllConnections()
> # use indices to reduce memory
> x.ind <- split(seq(nrow(x)), list(x$v1, x$v2), drop=TRUE)
> # now aggregate using the indices
> x.agg <- do.call(rbind, lapply(x.ind, function(.seg){
>     data.frame(v1=x$v1[.seg[1]], v2=x$v2[.seg[1]],
>         n1=sum(x$n1[.seg]), n2=sum(x$n2[.seg]))
> }))

This is basically the approach that the plyr package
(http://had.co.nz/plyr) uses behind a user-friendly interface.

Hadley

-- 
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
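
P.S. For anyone following along, here is a rough sketch of how the same per-group sums might be written with plyr. It assumes the plyr package is installed, and it rebuilds the toy data from the quoted message with data.frame() instead of read.table() so that it runs on its own. The ddply()/summarise call is just one way to spell it, not necessarily what the original poster had in mind.

```r
library(plyr)  # assumption: plyr is installed

# Same toy data as in the quoted message, rebuilt inline
x <- data.frame(
  v1 = rep(c("a", "b", "c", "d"), each = 5),
  v2 = c("a1", "a1", "a1", "a2", "a3",
         "b1", "b1", "b2", "b2", "b2",
         "c1", "c2", "c2", "c2", "c3",
         "d1", "d2", "d3", "d4", "d4"),
  n1 = 1:20,
  n2 = 21:40
)

# Split x by the (v1, v2) combinations that occur, sum n1 and n2
# within each piece, and bind the pieces back into one data frame
x.agg <- ddply(x, .(v1, v2), summarise, n1 = sum(n1), n2 = sum(n2))
```

The result has one row per (v1, v2) pair that actually occurs in the data, which corresponds to what drop=TRUE achieves in the split() call of the index-based version.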