This is great! sqldf is exactly the kind of thing I was looking for, for
other stuff.
I suppose you can speed up both functions 1 and 5 using aggregate and
tapply only once, as was suggested earlier. But it comes at the
expense of readability.
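
A rough sketch of what "only once" could look like for the tapply variant. This is one possible reading, not necessarily what was suggested earlier in the thread; the function name aggregate.fixations.once is made up here, and d is assumed to have numeric columns roi, dur and x like the functions quoted below.

```r
# Sketch: collapse runs of identical roi using a single tapply() call.
# Assumes d has columns roi, dur and x (as in the functions in this thread).
aggregate.fixations.once <- function(d) {
  starts <- c(TRUE, diff(d$roi) != 0)  # TRUE at the first row of each run
  run <- cumsum(starts)                # run id for every row
  d2 <- d[starts, ]                    # keep the first row of each run
  # One pass over the groups: each group yields c(sum of dur, mean of x).
  stats <- tapply(seq_len(nrow(d)), run,
                  function(i) c(sum(d$dur[i]), mean(d$x[i])))
  stats <- do.call(rbind, stats)       # one row per run, two columns
  d2$dur <- unname(stats[, 1])
  d2$x <- unname(stats[, 2])
  d2
}

# Tiny made-up example: three runs (roi 1, 1 | 2, 2, 2 | 1)
d <- data.frame(roi = c(1, 1, 2, 2, 2, 1),
                dur = c(10, 20, 5, 5, 5, 7),
                x   = c(1, 3, 2, 4, 6, 9))
res <- aggregate.fixations.once(d)
```

The grouping is done once over row indices instead of once per column, which is where the readability cost comes from.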
Nikhil
On 9 Dec 2009, at 7:59AM, Titus von der Malsburg wrote:
On Wed, Dec 9, 2009 at 12:11 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
Here are a couple of solutions. The first uses by and the second
sqldf:
Brilliant! Now I have a whole collection of solutions. I did a simple
performance comparison with a data frame that has 7929 lines. The results
were as follows (loading the appropriate packages is not included in the
measurements):
times <- c(0.248, 0.551, 41.080, 0.160, 0.190)
names(times) <- c("aggregate", "summaryBy", "by+transform", "sqldf", "tapply")
barplot(times, log="y", ylab="time (s, log scale)")
So sqldf clearly wins, followed by tapply and aggregate. summaryBy is slower
than necessary because it computes both mean /and/ sum for both x and dur.
by+transform presumably suffers from the construction of many intermediate
data frames.
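
For anyone who wants to repeat the comparison, timings of this kind could be collected roughly as follows. This is only a sketch: the data are randomly generated stand-ins (the original data frame is not part of this thread), the name collapse.tapply is made up, and only the tapply-based variant is timed.

```r
# Made-up stand-in data; column names roi, dur and x follow this thread,
# the values are random and only serve to have something to time.
set.seed(1)
n <- 7929
d <- data.frame(roi = sample(1:5, n, replace = TRUE),
                dur = runif(n, 100, 400),
                x   = runif(n, 0, 1024))

# tapply-based variant, following the one posted in this thread
collapse.tapply <- function(d) {
  idx <- c(TRUE, diff(d$roi) != 0)
  d2 <- d[idx, ]
  idx <- cumsum(idx)
  d2$dur <- as.vector(tapply(d$dur, idx, sum))
  d2$x <- as.vector(tapply(d$x, idx, mean))
  d2
}

# system.time() reports user, system and elapsed time in seconds
timing <- system.time(res <- collapse.tapply(d))
print(timing["elapsed"])
```

Single runs this short are noisy, so repeating the call several times and comparing the totals is probably safer.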
Are there any canonical places where R recipes are collected? If so, I would
write up a summary.
These were the competitors:
# Gary's and Nikhil's aggregate solution:
aggregate.fixations1 <- function(d) {
idx <- c(TRUE,diff(d$roi)!=0)
d2 <- d[idx,]
idx <- cumsum(idx)
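# [[2]] extracts the aggregated values as a plain vector
# ([2] would store a one-column data frame inside d2)
d2$dur <- aggregate(d$dur, list(idx), sum)[[2]]
d2$x <- aggregate(d$x, list(idx), mean)[[2]]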
d2
}
# Marek's summaryBy solution:
library(doBy)
aggregate.fixations2 <- function(d) {
idx <- c(TRUE,diff(d$roi)!=0)
d2 <- d[idx,]
d$idx <- cumsum(idx)
d2$r <- summaryBy(dur+x~idx, data=d, FUN=c(sum, mean))[c("dur.sum", "x.mean")]
d2
}
# Gabor's by+transform solution:
aggregate.fixations3 <- function(d) {
idx <- cumsum(c(TRUE,diff(d$roi)!=0))
d2 <- do.call(rbind, by(d, idx, function(x)
transform(x, dur = sum(dur), x = mean(x))[1, , drop = FALSE]))
d2
}
# Gabor's sqldf solution:
library(sqldf)
aggregate.fixations4 <- function(d) {
idx <- c(TRUE,diff(d$roi)!=0)
d2 <- d[idx,]
d$idx <- cumsum(idx)
d2$r <- sqldf("select sum(dur), avg(x) x from d group by idx")
d2
}
# Titus' solution using plain old tapply:
aggregate.fixations5 <- function(d) {
idx <- c(TRUE,diff(d$roi)!=0)
d2 <- d[idx,]
idx <- cumsum(idx)
d2$dur <- tapply(d$dur, idx, sum)
d2$x <- tapply(d$x, idx, mean)
d2
}
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.