My code seems to be spending most of its time in assignment statements, in some cases simple assignment of a model frame or model matrix.
Can anyone provide any insights into what's going on, or how to speed things up? For starters, is it possible that the reports are not accurate, or that I am misreading them?

In R 3.0.1 (running under ESS):

> Rprof(line.profiling=TRUE)
> system.time(r <- totalEffect(dodata[[1]], dodata[[2]], 1:3, 4))
   user  system elapsed
 21.629   0.756  22.469
> Rprof(NULL)
> summaryRprof(lines="both")
$by.self
                           self.time self.pct total.time total.pct
box.R#158                       6.74    29.56      13.06     57.28
simulator.multinomial.R#64      2.92    12.81       2.96     12.98
simulator.multinomial.R#63      2.76    12.11       2.76     12.11
box.R#171                       2.54    11.14       5.08     22.28
simulator.d1.R#70               0.98     4.30       0.98      4.30
simulator.d1.R#71               0.98     4.30       0.98      4.30
densMap.R#42                    0.72     3.16       0.86      3.77
"standardGeneric"               0.52     2.28      11.30     49.56
......

Here's some of the code. Comments mark the line numbers that appear in the profile.

box.R:

sp <- merge(sexpartner, data, by="studyidx")
sp$y <- numFactor(sp$pEthnic) #I think y is not used but must be present
data(sims.c1[[k]]) <- sp   ###<<<<< line 158
sp0 <- sp
sp <- sim(sims.c1[[k]], i)
ctable[[k]] <- update.c1(ctable[[k]], sp)
if (is.null(i.c1.in)) {
    i.c1.in <- match("pEthnic", colnames(sp0))
    i.c1.out <- match(c("studyidx", "n", "pEthnic"), colnames(sp))
}
sp0 <- merge(sp0[,-i.c1.in], sp[,i.c1.out], by=c("studyidx", "n"))
# d1
sp0 <- sp0[sp0$pIsMale == 1,]
# avoid lots of conversion warnings
sp0$pEthnic <- factor(sp0$pEthnic, levels=partRaceLevels)
data(sims.d1[[k]]) <- sp0   ###<<<<< line 171
sp <- sim(sims.d1[[k]], i)
dtable[[k]] <- update.d1(dtable[[k]], sp)
rngstate[[k]] <- .Random.seed

The timing seems odd. As far as I can tell, those two lines do nothing except invoke data<-. If data<- is slow, I would expect the time to be charged to the data<- function (in a different file), not to the line that calls it. In fact, the other big time items are inside the data<- functions.
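The profiled lines 158 and 171 are calls to a replacement function on a list element. The sketch below shows roughly how R evaluates a line like `data(sims.c1[[k]]) <- sp`, following the "Subset assignment" rules in the R Language Definition. The class `toySim` and the small objects are hypothetical stand-ins, not the real simulator classes. The sketch shows that the single source line calls `[[<-` as well as `data<-`. That may be relevant to which line the profiler charges the time to, but it does not settle the question.

```r
library(methods)

# Hypothetical stand-in for simulator.multinomial: a single data.frame slot.
setClass("toySim", representation(data = "data.frame"))

# Replacement generic with the same (obj, value) signature as in the post.
setGeneric("data<-", function(obj, value) standardGeneric("data<-"))
setMethod("data<-", c("toySim", "data.frame"), function(obj, value) {
    obj@data <- value
    obj
})

sims <- list(new("toySim", data = data.frame(a = 1)),
             new("toySim", data = data.frame(a = 2)))
sp <- data.frame(a = 3:5)
k <- 2

# 1. The form used at box.R line 158.
sims1 <- sims
data(sims1[[k]]) <- sp

# 2. Approximately what R evaluates for form 1: an extraction (`*tmp*`[[k]]),
#    a call to data<-, and a call to [[<- that stores the result in the list.
`*tmp*` <- sims
sims2 <- `[[<-`(`*tmp*`, k, value = `data<-`(`*tmp*`[[k]], value = sp))
rm(`*tmp*`)

print(identical(sims1, sims2))  # TRUE if the two forms agree
```

Because `*tmp*` and `sims` share the same list, the `[[<-` step has to leave the original `sims` untouched. Whether it does so by copying, and how much it copies, is up to R's internals.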
simulator.multinomial.R:

setMethod("data<-", c("simulator.multinomial", "data.frame"), function(obj, value) {
    mf <- model.frame(obj@dataFormula, data=value)
    mf$iCluster <- fromOrig(obj@idmap, as.character(mf$studyidx))
    if (any(is.na(mf$iCluster)))
        stop("New studyidx--need to draw from meta distn")
    mm <- model.matrix(obj@modelFormula, data=mf)
    obj@data <- mf   ##<<< line 63
    obj@mm <- mm     ##<<< line 64
    return(obj)
})

The mm and data slots have type restrictions, but no other validation tests.

setClass("simulator.multinomial",
         representation(fit="stanfit",
                        idmap="sIDMap",
                        modelFormula="formula",
                        categories="ANY", # could be factor or character
                        # categories should be in the order of their numeric codes in y
                        # cached results
                        coef="list",
                        data="data.frame",
                        dataFormula="formula",
                        mm="matrix"))

Does it matter that, e.g., a model frame is more than a vanilla data frame? I thought that assignment, given R's lazy copying behavior, was essentially resetting a pointer and so should be fast. Or maybe the time is going to garbage collecting the previous contents of the slots?

Ross Boylan
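The sketch below is one way to turn the slot-assignment question into the kind of minimal, self-contained example the posting guide asks for. It mirrors the data<- method above but uses a hypothetical class `toyMF`. In that class, a large numeric `fit` slot stands in for the stanfit object, and the formulas and data are made up. If each slot assignment copies the whole object, the loop over the object with the large `fit` slot should take noticeably longer. Whether it does may depend on the R version, so no particular timing is claimed here. The last lines check what a model frame actually is once it sits in a "data.frame" slot.

```r
library(methods)

# Hypothetical stand-in for simulator.multinomial. The "fit" slot is a large
# numeric vector standing in for the stanfit object; the formulas are made up.
setClass("toyMF", representation(fit = "numeric",
                                 dataFormula = "formula",
                                 modelFormula = "formula",
                                 data = "data.frame",
                                 mm = "matrix"))

setGeneric("data<-", function(obj, value) standardGeneric("data<-"))
setMethod("data<-", c("toyMF", "data.frame"), function(obj, value) {
    mf <- model.frame(obj@dataFormula, data = value)
    mm <- model.matrix(obj@modelFormula, data = mf)
    obj@data <- mf   # plays the role of simulator.multinomial.R line 63
    obj@mm <- mm     # plays the role of line 64
    obj
})

# Hypothetical helper: build a toyMF whose "fit" slot has n_fit elements.
make_obj <- function(n_fit) {
    new("toyMF", fit = numeric(n_fit),
        dataFormula = ~ x + g, modelFormula = ~ x + g)
}

# Hypothetical helper: call data<- repeatedly, as box.R does once per pass,
# and return the elapsed time in seconds.
time_assign <- function(obj, df, reps = 50) {
    system.time(for (i in seq_len(reps)) data(obj) <- df)[["elapsed"]]
}

df <- data.frame(x = rnorm(200), g = factor(sample(letters[1:3], 200, TRUE)))
small <- make_obj(1)
large <- make_obj(5e6)   # about 40 MB in the stand-in "fit" slot

cat("small fit slot:", time_assign(small, df), "s\n")
cat("large fit slot:", time_assign(large, df), "s\n")

data(small) <- df
print(class(small@data))                           # still just "data.frame"
print("terms" %in% names(attributes(small@data)))  # but it carries a "terms" attribute
```

Running this under Rprof(line.profiling=TRUE) would show whether the toy data<- lines take a similar share of the time to lines 63 and 64 in the real code.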