Alexander Shenkin wrote: > Though, from my limited understanding, the 'apply' family of functions > are actually just loops. Please correct me if I'm wrong. So, while > more readable (which is important), they're not necessarily more > efficient than explicit 'for' loops.
Hi Allie -- This uses an R-level loop (and a lot of C loops!), but the length of the loop is only as long as the maximum lag f0 <- function(df0, max_lag) { max_lag <- min(nrow(df0), max_lag) a <- df0[[1]] ans <- df <- df0[,-1, drop=FALSE] for (lag in seq_len(max_lag)) { idx <- diff(a, lag) <= max_lag pad <- logical(lag) ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),] } cbind(a, ans) } it makes the assumption that 'a' is sorted and unique, as in a time series. This f1 <- function(df0, max_lag) { max_lag <- min(nrow(df0), max_lag) a <- df0[[1]] ans <- df0[,-1, drop=FALSE] lag <- 1 while(sum(idx <- diff(a, lag) <= max_lag) != 0) { pad <- logical(lag) ans[c(pad, idx),] <- ans[c(pad, idx),] + df[c(idx, pad),] lag <- lag + 1 } cbind(a, ans) } relaxes the assumption that 'a' is unique, I think, but I haven't tested carefully; it seems to perform about the same as f0. I think there's a clever recursive solution in there, too. This is my implementation of Phil's solution phil0 <- function(df0, max_lag) { with(df0, { g <- function(x) apply(df0[a - x >= -max_lag & a - x <= 0, c('b','c')], 2, sum) data.frame(a, t(sapply(a, g))) }) } Here's my implementation of Chuck Berry's solution chuck0 <- function(df0, max_lag) { criterion <- as.matrix(dist(df0$a)) <= max_lag & outer(df0$a,df0$a,">=") criterion %*% as.matrix(df0[, c("b","c")]) } Here's a data generator setup <- function(n, m) ## n: number of rows ## m: expected counts per sum { a0 <- sort(sample(seq_len(m * n), n)) data.frame(a=a0, b=as.integer(runif(n, 1, 10)), c=as.integer(runif(n, 1, 10))) } and a comparison with df0 <- setup(10^3, 3) max_lag <- 5 > system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE) user system elapsed 0.016 0.000 0.016 > system.time(phil0res <- phil0(df0, max_lag), gcFirst=TRUE) user system elapsed 0.960 0.000 0.962 > system.time(chuck0res <- chuck0(df0, 5), gcFirst=TRUE) user system elapsed 0.252 0.000 0.254 > all.equal(f0res, phil0res) [1] TRUE > all.equal(as.matrix(f0res[,2:3]), chuck0res, check.attributes=FALSE) [1] TRUE The f0 solution seems to be usable up to about a million rows, > df0 <- setup(10^6, 3) > system.time(f0res <- f0(df0, max_lag), gcFirst=TRUE) user system elapsed 2.680 0.004 2.700 Martin > > allie > > On 9/2/2009 3:13 AM, Phil Spector wrote: >> Here's one way (assuming your data frame is named dat): >> >> with(dat, >> data.frame(a,t(sapply(a,function(x){ >> apply(dat[a - x >= -5 & a - x <= >> 0,c('b','c')],2,sum)})))) >> >> >> - Phil Spector >> Statistical Computing Facility >> Department of Statistics >> UC Berkeley >> spec...@stat.berkeley.edu >> >> >> >> On Tue, 1 Sep 2009, dolar wrote: >> >>> Would like some tips on how to avoid loops as I know they are slow in R >>> >>> i've got a data frame : >>> >>> a b c >>> 1 5 2 >>> 4 6 9 >>> 5 2 3 >>> 8 3 2 >>> >>> What i'd like is to sum for each value of a, the sum of b and the sum >>> of c >>> where a equal to or less than (with a distance of 5) >>> >>> i.e. for row three >>> we have a=5 >>> i'd like to sum up b and sum up c with the above rule >>> since 5, 4 and 1 are less than (within a distance of 5) or equal to >>> 5, then >>> we should get the following result: >>> >>> a b c >>> 5 13 14 >>> >>> the overall result should be >>> a b c >>> 1 5 2 >>> 4 11 11 >>> 5 13 14 >>> 8 11 14 >>> >>> how can i do this without a loop? >>> -- >>> View this message in context: >>> http://www.nabble.com/Avoiding-loops-tp25251376p25251376.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.