Loops are not slow, but your code did a lot of unneeded operations in each loop. E.g, you computed D$id==i & D$t==t for each row of D. That involves 2*nrow(D) equality tests for each of the nrow(D) rows, i.e., it is quadratic in N*T.
Then you did a data.frame replacement operation D[k,]$y <- newValue where k is D$id==1&D$t==t. This extracts the k'th row of D, then extracts the 1-row 'y' column from it, replaces it with the new value, then puts that row back into D. If you must use a data.frame, the equivalent D$y[k] <- newValue is probably much faster (data.frames are lists of columns, so replacing a column is fast). Using a matrix to organize things is less flexible, but faster because you don't have to search when you want to find the element for a given id and time - you just do a little arithmetic to get the offset from the start of the matrix. Bill Dunlap TIBCO Software wdunlap tibco.com On Thu, Nov 6, 2014 at 2:05 PM, Matteo Richiardi <matteo.richia...@gmail.com > wrote: > Hi William, > that's super. Thanks a lot. I knew that R is slow with loops, but did not > imagine so slow! B.t.w., what's the reason? > Final question: in your code you have mean(M[t-1L,]): what is the 'L' > for? I removed it at apparently the code produces the same output... > > Thanks again, > Matteo > > On 6 November 2014 18:46, William Dunlap <wdun...@tibco.com> wrote: > >> I find that representing the simulated data as a T row by N column matrix >> allows for a clearer and faster simulation function. E.g., compare the >> output of the following two functions, the first of which uses your code >> and the second a matrix representation (which I convert to a data.frame at >> the end so I can compare outputs easily). I timed both of them for T=10^3 >> times and N=50 individuals; both gave the same results and f1 was 10000 >> times faster than f0: >> > set.seed(1); t0 <- system.time(s0 <- f0(N=50,T=1000)) >> > set.seed(1); t1 <- system.time(s1 <- f1(N=50,T=1000)) >> > rbind(t0, t1) >> user.self sys.self elapsed user.child sys.child >> t0 436.87 0.11 438.48 NA NA >> t1 0.04 0.00 0.04 NA NA >> > all.equal(s0, s1) >> [1] TRUE >> >> The functions are: >> >> f0 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0) >> { >> # construct the data frame and initialize y >> D <- data.frame( >> id = rep(1:N,T), >> t = rep(1:T, each = N), >> y = rep(y0,N*T) >> ) >> >> # update y >> for(t in 2:T){ >> ybar.L1 = mean(D[D$t==t-1,"y"]) >> for(i in 1:N){ >> epsilon = rnorm(1,mean=m_e,sd=sd_e) >> D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon >> } >> } >> D >> } >> >> f1 <- function(N = 20, T = 100, lambda = 0.1, m_e = 0, sd_e = 1, y0 = 0) >> { >> # same process simulated using a matrix representation >> # The T rows are times, the N columns are individuals >> M <- matrix(y0, nrow=T, ncol=N) >> if (T > 1) for(t in 2:T) { >> ybar.L1 <- mean(M[t-1L,]) >> epsilon <- rnorm(N, mean=m_e, sd=sd_e) >> M[t,] <- lambda * y0 + (1-lambda)*ybar.L1 + epsilon >> } >> # convert to the data.frame representation that f0 uses >> tM <- t(M) >> data.frame(id = as.vector(row(tM)), t = as.vector(col(tM)), y = >> as.vector(tM)) >> } >> >> >> >> Bill Dunlap >> TIBCO Software >> wdunlap tibco.com >> >> On Thu, Nov 6, 2014 at 6:47 AM, Matteo Richiardi < >> matteo.richia...@gmail.com> wrote: >> >>> I wish to simulate the following stochastic process, for i = 1...N >>> individuals and t=1...T periods: >>> >>> y_{i,t} = y_0 + lambda Ey_{t-1} + epsilon_{i,t} >>> >>> where Ey_{t-1} is the average of y over the N individuals computed at >>> time >>> t-1. >>> >>> My solution (below) works but is incredibly slow. Is there a faster but >>> still clear and readable alternative? >>> >>> Thanks a lot. Matteo >>> >>> rm(list=ls()) >>> library(plyr) >>> y0 = 0 >>> lambda = 0.1 >>> N = 20 >>> T = 100 >>> m_e = 0 >>> sd_e = 1 >>> >>> # construct the data frame and initialize y >>> D = data.frame( >>> id = rep(1:N,T), >>> t = rep(1:T, each = N), >>> y = rep(y0,N*T) >>> ) >>> >>> # update y >>> for(t in 2:T){ >>> ybar.L1 = mean(D[D$t==t-1,"y"]) >>> for(i in 1:N){ >>> epsilon = rnorm(1,mean=m_e,sd=sd_e) >>> D[D$id==i & D$t==t,]$y = lambda*y0+(1-lambda)*ybar.L1+epsilon >>> } >>> } >>> >>> ybar <- ddply(D,~t,summarise,mean=mean(y)) >>> >>> plot(ybar, col = "blue", type = "l") >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.