This is great!! Sqldf is exactly the kind of thing I was looking for, other stuff.

I suppose you can speed up both functions 1 and 5 using aggregate and tapply only once, as was suggested earlier. But it comes at the expense of readability.

Nikhil

On 9 Dec 2009, at 7:59AM, Titus von der Malsburg wrote:

On Wed, Dec 9, 2009 at 12:11 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:
Here are a couple of solutions. The first uses by and the second sqldf:

Brilliant! Now I have a whole collection of solutions. I did a simple
performance comparison with a data frame that has 7929 lines.

The results were as following (loading appropriate packages is not included in
the measurements):

times <- c(0.248, 0.551, 41.080, 0.16, 0.190)
names(times) <- c("aggregate","summaryBy","by +transform","sqldf","tapply")
barplot(times, log="y", ylab="log(s)")

So sqldf clearly wins followed by tapply and aggregate. summaryBy is slower
than necessary because it computes for x and dur both, mean /and/ sum.
by+transform presumably suffers from the contruction of many intermediate data
frames.

Are there any canonical places where R-recipes are collected? If yes I would
write-up a summary.

These were the competitors:

# Gary's and Nikhil's aggregate solution:

aggregate.fixations1 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  idx  <- cumsum(idx)
  d2$dur <- aggregate(d$dur, list(idx), sum)[2]
  d2$x   <- aggregate(d$x, list(idx), mean)[2]

  d2
}

# Marek's symmaryBy:

library(doBy)

aggregate.fixations2 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  d$idx  <- cumsum(idx)
  d2$r <- summaryBy(dur+x~idx, data=d, FUN=c(sum,
mean))[c("dur.sum", "x.mean")]
  d2
}

# Gabor's by+transform solution:

aggregate.fixations3 <- function(d) {

  idx  <- cumsum(c(TRUE,diff(d$roi)!=0))

  d2 <- do.call(rbind, by(d, idx, function(x)
transform(x, dur = sum(dur), x = mean(x))[1,,drop = FALSE ]))

  d2
}

# Gabor's sqldf solution:

library(sqldf)

aggregate.fixations4 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  d$idx  <- cumsum(idx)
  d2$r <- sqldf("select sum(dur), avg(x) x from d group by idx")

  d2
}

# Titus' solution using plain old tapply:

aggregate.fixations5 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  idx  <- cumsum(idx)
  d2$dur <- tapply(d$dur, idx, sum)
  d2$x <- tapply(d$x, idx, mean)

  d2
}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to