> I didn't dispute whether '%>%' may be useful -- I just pointed out that it
> is slow. However, it is only part of the problem: 'filter()' and
> 'select()', although aesthetically pleasing, also seem to be slow:
>
>> all.states <- data.frame(state.x77, Name = rownames(state.x77))
>>
>> f1 <- function()
> +     all.states[all.states$Frost > 150, c("Name", "Frost")]
>>
>> f2 <- function()
> +     subset(all.states, Frost > 150, select = c("Name", "Frost"))
>>
>> f3 <- function() {
> +     filt <- subset(all.states, Frost > 150)
> +     subset(filt, select = c("Name", "Frost"))
> + }
>>
>> f4 <- function()
> +     all.states %>% subset(Frost > 150) %>%
> +         subset(select = c("Name", "Frost"))
>>
>> f5 <- function()
> +     select(filter(all.states, Frost > 150), Name, Frost)
>>
>> f6 <- function()
> +     all.states %>% filter(Frost > 150) %>% select(Name, Frost)
>>
>> mb <- microbenchmark(
> +     f1(), f2(), f3(), f4(), f5(), f6(),
> +     times = 1000L
> + )
>> print(mb, signif = 3L)
> Unit: microseconds
>  expr min   lq      mean median   uq  max neval cld
>  f1() 115  124  134.8812    129  134 1500  1000 a
>  f2() 128  141  147.4694    145  151 1520  1000 a
>  f3() 303  328  344.3175    338  348 1740  1000 b
>  f4() 458  494  518.0830    510  523 1890  1000 c
>  f5() 806  848  887.7270    875  894 3510  1000 d
>  f6() 971 1010 1056.5659   1040 1060 3110  1000 e
>
> So, using '%>%', but leaving 'filter()' and 'select()' out of the equation,
> as in 'f4()' is only half as bad as the "full" 'dplyr' idiom in 'f6()'. In
> this case, since we're talking microseconds, the speed-up is negligible but
> that *is* beside the point.

When benchmarking, it's important to consider both the relative and
absolute difference and to think about how the cost scales as the data
grows - the cost of using %>% is fixed, and 500 µs doesn't seem like a
huge performance penalty to me.

Hadley

--
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
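
As a rough illustration of the relative-versus-absolute distinction in the reply, the sketch below takes the median timings from the quoted benchmark table and computes both views against f1(). The second half is an assumption for illustration only: it treats the 500 µs figure from the reply as a fixed per-call overhead and adds it to hypothetical base runtimes (the `base_us` values are made up) to show how its share of the total shrinks as the underlying work grows. It uses base R only and does not re-run the benchmark.

```r
# Median timings (microseconds) copied from the quoted benchmark table.
med <- c(f1 = 129, f2 = 145, f3 = 338, f4 = 510, f5 = 875, f6 = 1040)

# Relative difference: how many times slower each variant is than f1().
relative <- med / med[["f1"]]

# Absolute difference: extra microseconds per call compared with f1().
absolute <- med - med[["f1"]]

print(round(cbind(median = med, relative = relative, absolute = absolute), 2))

# Assumption: a fixed per-call overhead of 500 microseconds, added to
# hypothetical base runtimes that grow with the size of the data.
overhead_us <- 500
base_us <- c(1e2, 1e4, 1e6)  # made-up runtimes, in microseconds
overhead_share <- overhead_us / (base_us + overhead_us)

# Share of total runtime spent on the fixed overhead, in percent.
print(round(100 * overhead_share, 2))
```

Under these assumptions the ratio looks large while the work is tiny, but the absolute cost stays constant, so its share of the total falls as the base runtime grows.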