> -----Original Message----- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Spencer Graves > Sent: Monday, March 08, 2004 11:21 AM > To: Prof Brian Ripley > Cc: Fred J.; r-help; Peter Dalgaard > Subject: Re: [R] applying data generating function > > > With "gc()" right before each call to "proc.time", as > Brian Ripley > and Gabor Grothendieck suggested, the times were substantially more > stable. For the for loop, extending the vector with each of 1e5 > iterations, I got 181.25, 181.27, 182.72, 182.44, and 182.56. The > averages of the last 3 of these tests are as follows: > > 10 100 1000 10000 1e+05 > for loop 0 0.01 0.05 1.13 182.14 > gen e + for loop 0 0.00 0.03 0.26 2.58 > create storage + for loop 0 0.00 0.04 0.39 3.94 > sapply 0 0.00 0.03 0.32 4.05 > replicate 0 0.00 0.03 0.31 3.55 > > Without "gc()", I got 192.05, 182.02, 126.04, 130.30, and 118.64 for > extending the vector with each for loop iteration. > > Three more observations about this: > > 1. Without "gc()", the times started higher but declined by > roughly a third. This suggests that R may actually be storing > intermediate "semi-compiled" code in "garbage" and using it when the > situation warrants -- but "gc()" discards it. > > 2. Increasing N from 1e4 to 1e5 increased the time NOT by a > factor of 10 but by a factor of 161 = 182/1.13 when the length of the > vector was extended in each iteration.
Growing a vector inside for loop is expensive and inefficient because R needs to allocate a new vector of sufficient length, copy the data over, then de-reference the old vector. (This my understanding of how it works. Hopefully I'm not too far off...) HTH, Andy > 3. The fastest was indeed to generate "e" and allocate all > required storage before entering the loop, but the major > improvement was > due to allocation of storage before initiating the for loop. > However, > with 1000 or fewer iterations, the difference was hardly detectable. > > Best Wishes, > spencer graves > > Prof Brian Ripley wrote: > > >You need to run gc() before running such timings in R, as > the first run > >often has to pay for a level-0 garbage collection. That is > normally the > >cause of (1), although I haven't seen differences as large > as 10 secs (but > >have no idea of the speed of your machine, and have seen 3 secs). > > > >On Sun, 7 Mar 2004, Spencer Graves wrote: > > > > > > > >> Peter's enumeration of alternatives inspired me to > compare compute > >>times for N = 10^(2:5), with the following results: > >> > >>*** R 1.8.1 under Windows 2000, IBM Thinkpad T30: > >> 10 100 1000 10000 1e+05 > >>for loop 0 0.01 0.09 1.27 192.05 > >>gen e + for loop 0 0.00 0.03 0.22 2.58 > >>create storage + for loop 0 0.01 0.05 0.34 3.45 > >>sapply 0 0.00 0.04 0.28 3.82 > >>replicate 0 0.01 0.05 0.29 4.02 > >> > >> I repeated this with the "for loop" both first and last. The > >>times tended to decline on replication, with the "for loop" > time for N = > >>1e5 = 182.02, 126.04 (with the "for loop" last), 130.30 ("for loop" > >>last), and 118.64 ("for loop" first again). > >> > >> Conclusions: > >> > >> (1) Apparently, in some cases, R picks up speed upon > replication > >> > >> (2) The first 3 times for the "for loop" with N = 1e5 made me > >>wonder if there was an order effect, with the "for loop" > being longer in > >>the first position. However, the last run with the "for > loop" again > >>first had the shortest time of 118.64, contradicting that > hypothesis. > >> > >> By comparison, I also tried this under S-Plus 6.2: > >> > >>*** S-Plus 6.2, Windows 2000, IBM Thinkpad T30 ("for loop" first): > >> 10 100 1000 10000 100000 > >> for loop 0.01 0.05 0.331 3.976 273.073 > >> gen e + for loop 0.00 0.04 0.320 3.154 29.112 > >>create storage + for loop 0.01 0.03 0.231 2.113 22.242 > >> sapply 0.00 0.04 0.380 4.757 23.003 > >> > >> The script I used appears below. As Peter said, "the > only really > >>crucial [issue] is to avoid the inefficient append by > preallocating" the > >>vectors to be generated. Moreover, this is only an issue > for long loop, > >>with a threshold of between 1e4 and 1e5 in this example. > For shorter > >>loops, the programmers' time is far more valuable. > >> > >>Enjoy. spencer graves > >>#################### > >> > >> > >>N.gen <- c(10, 100, 1000, 10000, 1e5) > >>mtds <- c("for loop", "gen e + for loop", "create storage + > for loop", > >> "sapply", "replicate") > >>m <- length(N.gen) > >>ellapsed.time <- array(NA, dim=c(m, length(mtds))) > >>dimnames(ellapsed.time) <- list(N.gen, mtds) > >> > >>for(iN in 1:m){ > >> cat("\n", iN, "") > >> N <- N.gen[iN] > >>#for loop > >>set.seed(123) > >>start.time <- proc.time() > >>f<-function (x.) { 3.8*x.*(1-x.) + rnorm(1,0,.001) } > >>v=c() > >>x=.1 # starting point > >>for (i in 1:N) { x=f(x); v=append(v,x) } > >>ellapsed.time[iN, "for loop"] <- (proc.time()-start.time)[3] > >>cat(mtds[1], "") > >> > >>#gen e + for loop > >>set.seed(123) > >>start.time <- proc.time() > >>e <- 0.001*rnorm(N) > >>X <- rep(0.1, N+1) > >>for(i in 2:(N+1)) > >> X[i] <- (3.8*X[i-1]*(1-X[i-1])+e[i-1]) > >>ellapsed.time[iN, "gen e + for loop"] <- (proc.time()-start.time)[3] > >>cat(mtds[2], "") > >> > >>#create storage + for loop > >>set.seed(123) > >>start.time <- proc.time() > >>V <- numeric(N) > >>xv <- .1 ; for (i in 1:N) { xv <- f(xv); V[i] <- xv } > >>ellapsed.time[iN, "create storage + for loop"] <- > >>(proc.time()-start.time)[3] > >>cat(mtds[3], "") > >> > >>#sapply > >>set.seed(123) > >>start.time <- proc.time() > >>xa <- .1 ; va <- sapply(1:N, function(i) xa <<- f(xa)) > >>ellapsed.time[iN, "sapply"] <- (proc.time()-start.time)[3] > >>cat(mtds[4], "") > >> > >>if(!is.null(version$language)){ > >>#replicate > >>set.seed(123) > >>start.time <- proc.time() > >>z <- .1 ; vr <- replicate(N, z <<- f(z)) > >>ellapsed.time[iN, "replicate"] <- (proc.time()-start.time)[3] > >>cat(mtds[5], "") > >>} > >> > >>} > >> > >>t(ellapsed.time) > >>############################# > >>Peter Dalgaard wrote: > >> > >> > >> > >>>Christophe Pallier <[EMAIL PROTECTED]> writes: > >>> > >>> > >>> > >>> > >>> > >>>>Fred J. wrote: > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>>I need to generate a data set based on this equation > >>>>>X(t) = 3.8x(t-1) (1-x(t-1)) + e(t), where e(t) is a > >>>>>N(0,0,001) random variable > >>>>>I need say 100 values. > >>>>> > >>>>>How do I do this? > >>>>> > >>>>> > >>>>> > >>>>> > >>>>I assume X(t) and x(t) are the same (?). > >>>> > >>>>f<-function (x) { 3.8*x*(1-x) + rnorm(1,0,.001) } > >>>>v=c() > >>>>x=.1 # starting point > >>>>for (i in 1:100) { x=f(x); v=append(v,x) } > >>>> > >>>>There may be smarter ways... > >>>> > >>>> > >>>> > >>>> > >>>Yes, but the only really crucial one is to avoid the > inefficient append by > >>>preallocating the v: > >>> > >>>v <- numeric(100) > >>>x <- .1 ; for (i in 1:100) { x <- f(x); v[i] <- x } > >>> > >>>apart from that you can use implicit loops: > >>> > >>>x <- .1 ; v <- sapply(1:100, function(i) x <<- f(x)) > >>> > >>>or > >>> > >>>z <- .1 ; v <- replicate(100, z <<- f(z)) > >>> > >>>(You cannot use x there because of a variable capture > issue which is a > >>>bit of a bug. I intend to fix it for 1.9.0.) > >>> > >>> > >>> > >>> > >>> > >>______________________________________________ > >>[EMAIL PROTECTED] mailing list > >>https://www.stat.math.ethz.ch/mailman/listinfo/r-help > >>PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > >> > >> > >> > >> > > > > > > > > > ______________________________________________ > [EMAIL PROTECTED] mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide! > http://www.R-project.org/posting-guide.html > > ------------------------------------------------------------------------------ Notice: This e-mail message, together with any attachments,...{{dropped}} ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html