> There's definitely something amiss with aggregate() here since similar
> functions from other packages can reproduce your 'control' sum. I expect
> ddply() will have some timing issues because of all the subgrouping in
> your data frame, but data.table did very well and the summaryBy()
> function in the doBy package did OK:
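(The calls being compared were presumably along these lines; a sketch, not the original benchmark, assuming the dat with grouping columns x1..x8 and numeric y from earlier in the thread:)

library(doBy)
library(data.table)

# summaryBy() from doBy: grouped sum via a formula interface
system.time(summaryBy(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8,
                      data = dat, FUN = sum))

# data.table: the same grouped sum on a keyed table
dt <- as.data.table(dat)
setkeyv(dt, paste0("x", 1:8))
system.time(dt[, sum(y), by = key(dt)])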
Well, if you use the right plyr function, it works just fine:

system.time(count(dat, c("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"), "y"))
#   user  system elapsed
#  9.754   1.314  11.073

Which illustrates something that I've believed for a while about data.table: it's not the indexing that speeds things up, it's the custom data structure. If you use ddply() with data frames, it's slow because data frames are slow. I think the right way to resolve this is to make data frames more efficient, perhaps using some kind of mutable interface where necessary for high-performance operations.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
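A quick way to see the data-structure point (a sketch under assumed inputs, not the thread's benchmark: dat is simulated below with eight grouping columns and a numeric y, and timings will vary by machine) is that an unkeyed data.table, with no index at all, still beats the same grouped sum on a plain data frame:

library(plyr)
library(data.table)

# Simulated stand-in for the thread's data: 8 grouping columns, one numeric y.
set.seed(1)
n   <- 1e5
dat <- as.data.frame(replicate(8, sample(letters[1:4], n, replace = TRUE)))
names(dat) <- paste0("x", 1:8)
dat$y <- rnorm(n)

# Plain data frame via plyr: pays data.frame overhead in every subgroup.
system.time(ddply(dat, paste0("x", 1:8), summarise, y = sum(y)))

# Unkeyed data.table: no index, same data, same grouped sum.
dt <- as.data.table(dat)
system.time(dt[, sum(y), by = names(dt)[1:8]])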