Hi,

> * Steve Lianoglou <[email protected]> [2012-11-26 17:32:21 -0500]:
>
>> --8<---------------cut here---------------start------------->8---
>>> f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
>>> f
>>    id country delay
>> 1   1       6     1
>> 2   2       7     2
>> 3   3       8     3
>> 4   1       6     4
>> 5   2       7     5
>> 6   3       8     6
>> 7   1       6     7
>> 8   2       7     8
>> 9   3       8     9
>> 10  1       6    10
>> 11  2       7    11
>> 12  3       8    12
>>> f <- as.data.table(f)
>>> setkey(f,id)
>>> delays <-
>>> f[,list(min=min(delay),max=max(delay),count=.N,country=unique(country)),by="id"]
>>> delays
>>    id min max count country
>> 1:  1   1  10     4       6
>> 2:  2   2  11     4       7
>> 3:  3   3  12     4       8
>> --8<---------------cut here---------------end--------------->8---
>>
>> this is still too slow, apparently because of unique.
>> how do I speed it up?
>
> I think I'm missing something.
>
> Your call to `min(delay)` and `max(delay)` will return the minimum and
> maximum delays within the particular "id" you are grouping by. I guess
> there must be several values for "country" within each "id" group --
> do you really want the same min and max values to be replicated as
> many times as there are unique "country"s?

there is precisely one country for each id.
i.e., unique(country) is the same as country[1].
thanks a lot for the suggestion!

> R> result <- f[, list(min=min(delay), max=max(delay),
>      count=.N,country=country[1L]), by="share.id"]

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
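
For reference, here is a minimal sketch of the `country[1L]` suggestion applied to the toy data from the start of the thread. It assumes the data.table package is installed and groups by the toy column `id` rather than `share.id` from the real data. The names `delays_unique` and `delays_first` are made up for this illustration. It only checks that the two forms agree when each id has exactly one country; it does not measure the speed difference.

```r
library(data.table)

# Toy data from the thread: 3 ids, each with exactly one country.
f <- data.table(id = rep(1:3, 4), country = rep(6:8, 4), delay = 1:12)
setkey(f, id)

# Original form: unique() is evaluated once per group.
delays_unique <- f[, list(min = min(delay), max = max(delay),
                          count = .N, country = unique(country)),
                   by = "id"]

# Suggested form: take the first country in each group instead.
# This is only equivalent when every id maps to a single country.
delays_first <- f[, list(min = min(delay), max = max(delay),
                         count = .N, country = country[1L]),
                  by = "id"]

# Both forms should give the same per-id summary on this data.
stopifnot(identical(delays_unique$country, delays_first$country))
print(delays_first)
```

If an id could ever carry more than one country, `country[1L]` would silently keep only the first one, so it may be worth checking that assumption on the real data before relying on it.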

