Re: [R] conditionally merging adjacent rows in a data frame

Nikhil Kaza Wed, 09 Dec 2009 05:39:01 -0800

This is great!! Sqldf is exactly the kind of thing I was looking for, for other stuff.

I suppose you can speed up both functions 1 and 5 using aggregate and tapply only once, as was suggested earlier. But it comes at the expense of readability.


Nikhil

On 9 Dec 2009, at 7:59AM, Titus von der Malsburg wrote:

On Wed, Dec 9, 2009 at 12:11 AM, Gabor Grothendieck
<ggrothendi...@gmail.com> wrote:

Here are a couple of solutions. The first uses by and the second sqldf:

Brilliant! Now I have a whole collection of solutions. I did a simple
performance comparison with a data frame that has 7929 lines. The results
were as follows (loading the appropriate packages is not included in the
measurements):

times <- c(0.248, 0.551, 41.080, 0.16, 0.190)
names(times) <- c("aggregate","summaryBy","by+transform","sqldf","tapply")
barplot(times, log="y", ylab="log(s)")

So sqldf clearly wins, followed by tapply and aggregate. summaryBy is slower
than necessary because it computes both mean /and/ sum for both x and dur.
by+transform presumably suffers from the construction of many intermediate
data frames.
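
A minimal sketch of how one such timing could be taken, assuming synthetic data in place of the original 7929-line data frame (which is not part of this thread). The name merge_runs and the value ranges are made up for illustration; the function follows the tapply approach listed further down, and no timings are claimed here.

```r
# Synthetic stand-in for the 7929-row data frame (an assumption; the
# original data is not available in this thread).
set.seed(1)
n <- 7929
d <- data.frame(roi = sample(1:5, n, replace = TRUE),
                dur = runif(n, 50, 500),
                x   = runif(n, 0, 1024))

# Hypothetical helper: merge adjacent rows that share the same roi,
# summing dur and averaging x within each run.
merge_runs <- function(d) {
  starts <- c(TRUE, diff(d$roi) != 0)   # TRUE where a new run begins
  run    <- cumsum(starts)              # run id for every row
  d2     <- d[starts, ]                 # first row of each run
  d2$dur <- as.vector(tapply(d$dur, run, sum))
  d2$x   <- as.vector(tapply(d$x, run, mean))
  d2
}

# system.time() reports user, system and elapsed time in seconds.
timing <- system.time(res <- merge_runs(d))
timing["elapsed"]
```

Repeating the call several times and comparing medians would probably give more stable numbers than a single run.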

Are there any canonical places where R recipes are collected? If yes, I would
write up a summary.

These were the competitors:

# Gary's and Nikhil's aggregate solution:

aggregate.fixations1 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  idx  <- cumsum(idx)
  d2$dur <- aggregate(d$dur, list(idx), sum)[2]
  d2$x   <- aggregate(d$x, list(idx), mean)[2]

  d2
}

# Marek's summaryBy:

library(doBy)

aggregate.fixations2 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  d$idx  <- cumsum(idx)
  d2$r <- summaryBy(dur+x~idx, data=d,
                    FUN=c(sum, mean))[c("dur.sum", "x.mean")]
  d2
}

# Gabor's by+transform solution:

aggregate.fixations3 <- function(d) {

  idx  <- cumsum(c(TRUE,diff(d$roi)!=0))

  d2 <- do.call(rbind, by(d, idx, function(x)
    transform(x, dur = sum(dur), x = mean(x))[1, , drop = FALSE]))

  d2
}

# Gabor's sqldf solution:

library(sqldf)

aggregate.fixations4 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  d$idx  <- cumsum(idx)
  d2$r <- sqldf("select sum(dur), avg(x) x from d group by idx")

  d2
}

# Titus' solution using plain old tapply:

aggregate.fixations5 <- function(d) {

  idx  <- c(TRUE,diff(d$roi)!=0)
  d2     <- d[idx,]

  idx  <- cumsum(idx)
  d2$dur <- tapply(d$dur, idx, sum)
  d2$x <- tapply(d$x, idx, mean)

  d2
}
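
All five functions rely on the same indexing idiom: mark the row where each run of identical roi values starts, then turn those marks into a run id with cumsum. A small worked example on a made-up data frame (the values are hypothetical, chosen only to make the intermediate results easy to follow):

```r
# Hypothetical toy data: six consecutive fixations.
d <- data.frame(roi = c(1, 1, 2, 2, 2, 1),
                dur = c(100, 50, 80, 20, 40, 60),
                x   = c(10, 12, 30, 34, 32, 11))

# TRUE wherever roi differs from the previous row (the first row always starts a run).
starts <- c(TRUE, diff(d$roi) != 0)   # TRUE FALSE TRUE FALSE FALSE TRUE

# Cumulative sum of the starts gives one id per run.
run <- cumsum(starts)                 # 1 1 2 2 2 3

# Keep the first row of each run, then replace dur and x by per-run summaries.
merged <- d[starts, ]
merged$dur <- as.vector(tapply(d$dur, run, sum))   # 150 140 60
merged$x   <- as.vector(tapply(d$x, run, mean))    # 11 32 11
merged
```

Note that the last row stays separate from the first two even though roi is 1 in both cases, because only adjacent rows are merged.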

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

