Hi Thomas,

Here is a comparison of performance times for my own igroupSums versus using split and rowsum:
> x <- rnorm(2e6)
> i <- rep(1:1e6, 2)
>
> unix.time(suma <- unlist(lapply(split(x, i), sum)))
[1] 8.188 0.076 8.263 0.000 0.000
>
> names(suma) <- NULL
>
> unix.time(sumb <- igroupSums(x, i))
[1] 0.036 0.000 0.035 0.000 0.000
>
> all.equal(suma, sumb)
[1] TRUE
>
> unix.time(sumc <- rowsum(x, i))
[1] 0.744 0.000 0.742 0.000 0.000
>
> sumc <- sumc[, 1]
> names(sumc) <- NULL
> all.equal(suma, sumc)
[1] TRUE

So my implementation of igroupSums is faster and already handles NAs. I have also implemented igroupMins, igroupMaxs, igroupAnys, igroupAlls, igroupCounts, igroupMeans, and igroupRanges. The igroup functions I implemented do not handle weights yet, but they do handle NAs properly.

Assuming I clean them up, is anyone in the R developer group interested? Or would you rather I instead extend the rowsum approach to create rowcount, rowmax, rowmin, etc., using a hash-function approach? All of these approaches simply use different ways to map group codes to integers and then apply the functions the same way.

Thanks,
Kevin

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
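P.S. To make the "map group codes to integers, then accumulate" idea concrete, here is a minimal pure-R sketch (this is illustrative only, not the compiled igroupSums implementation; the function name `group_sums` is hypothetical). The real speed comes from doing the accumulation loop in C, but the mapping step is the same:

```r
# Hypothetical pure-R sketch of the group-sum idea:
# 1. map arbitrary group codes to dense integers 1..K,
# 2. accumulate sums into a preallocated result vector.
group_sums <- function(x, i, na.rm = FALSE) {
  levs <- unique(i)
  g <- match(i, levs)           # group codes -> integers 1..K
  if (na.rm) x[is.na(x)] <- 0   # optional NA handling
  out <- numeric(length(levs))
  for (k in seq_along(x))       # this loop is C code in practice
    out[g[k]] <- out[g[k]] + x[k]
  names(out) <- as.character(levs)
  out
}

# e.g. group_sums(c(1, 2, 3, 4), c("a", "b", "a", "b"))
# gives 4 for group "a" and 6 for group "b"
```

Swapping the `+` accumulation for `min`, `max`, a counter, and so on gives the other igroup functions; only the per-element update differs.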