RE: [R] performance gap between R 1.7.1 and 1.8.0

Liaw, Andy Thu, 04 Dec 2003 11:24:30 -0800

> From: Martin Maechler [mailto:[EMAIL PROTECTED] 
[...]
> A very first step of diagnosis might be to activate
>   trace(read.dcf)
>   trace(library)
>   options(verbose = TRUE)
> 
> A step further might be to patch read.dcf such that it prints
> info 
>        if(getOption("verbose")) { <<print what I am trying to read>> }
> 
> Martin


To refresh people's memory, the problem is that the identical code took more
than twice as long in R-1.8.x than in R-1.7.x.  I've managed to strip the
code down to the following and still see the same problem.  It basically
bootstraps mixture models to groups of data.  It uses boot() from the boot
package and flexmix() with intercept only, in the flexmix package.

=============================================
fitmix <- function(x, verbosefit=FALSE) {
  bpunlist <- function(x) {
    unlisted <- numeric()
    for(i in 1:length([EMAIL PROTECTED])) {
      unlisted <- c(unlisted, unlist(parameters(x, component=i)))
    }
    unlisted
  }
  unifit <-  flexmix(x ~ 1, k = 1, control=list(verbose=0))
  unlist(unlist(list(G = 1, proportion = [EMAIL PROTECTED],
                     param=matrix(bpunlist(unifit), nrow=2),
                     bic=BIC(unifit), [EMAIL PROTECTED])))
}

bootmix <- function(data, i) {
  d <- data
  d.grps <- split.data.frame(d[i,], d[i, "id"])
  comps <- vector("list", length=length(d.grps))
  for (j in seq(along=comps)) {
    thefit <- try(fitmix(d.grps[[j]][,1], verbosefit=FALSE))
    comps[[j]] <- if (inherits(thefit, "try-error")) rep(NA, 48) else thefit
  }
  return(unlist(comps))
}

## The actual commands:
require(flexmix)
require(boot)
set.seed(76421)
x <- rnorm(3e3)
id <- gl(10, 300)
mydf <- data.frame(x, id)
Rprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE), sep="."))
system.time(res <- boot(mydf, bootmix, strata=mydf[,"id"], R=5))
Rprof(NULL)
summaryRprof(filename=paste("Rprof.out", system("hostname -s", int=TRUE),
               sep="."))$by.self[1:20,]
=============================================

I ran this on a Xeon 2.4GHz running Mandrake 9.0, R compiled from source.

This took 2 seconds in R-1.7.1, with the output:

                  self.time self.pct total.time total.pct
"FUN"                  0.18        9       0.24        12
"names"                0.16        8       0.16         8
"initialize"           0.10        5       0.80        40
"apply"                0.08        4       0.30        15
"[.data.frame"         0.06        3       0.22        11
"inherits"             0.06        3       0.18         9
"is.null"              0.06        3       0.06         3
"match"                0.06        3       0.22        11
"seq.default"          0.06        3       0.06         3
"structure"            0.06        3       0.08         4
"any"                  0.04        2       0.14         7
"el"                   0.04        2       0.04         2
"FLXfit"               0.04        2       1.18        59
"lapply"               0.04        2       0.22        11
"lm.wfit"              0.04        2       0.16         8
"names<-.default"      0.04        2       0.04         2
"slot<-"               0.04        2       0.18         9
"<"                    0.02        1       0.02         1
"|"                    0.02        1       0.02         1
":"                    0.02        1       0.02         1

... and took over 7 seconds in R-1.8.1, with the output:

                self.time self.pct total.time total.pct
"paste"              0.52      6.7       1.10      14.1
"read.dcf"           0.38      4.9       0.96      12.3
"exists"             0.36      4.6       0.50       6.4
"lapply"             0.36      4.6       1.36      17.4
"names"              0.34      4.4       0.40       5.1
"names<-"            0.32      4.1       0.44       5.6
"inherits"           0.22      2.8       0.34       4.4
"=="                 0.20      2.6       0.20       2.6
".Call"              0.20      2.6       0.20       2.6
"seq"                0.18      2.3       0.50       6.4
"seq.default"        0.18      2.3       0.28       3.6
"dynGet"             0.16      2.1       0.46       5.9
"topenv"             0.16      2.1       0.18       2.3
"unique"             0.16      2.1       1.00      12.8
"apply"              0.14      1.8       0.20       2.6
"as.list"            0.14      1.8       0.18       2.3
".find.package"      0.14      1.8       1.88      24.1
"sapply"             0.14      1.8       1.54      19.7
"any"                0.12      1.5       1.68      21.5
"initialize"         0.12      1.5       4.98      63.8

I then did the following:

> trace(read.dcf, recover)
[1] "read.dcf"
> res <- boot(mydf, bootmix, strata=mydf[,"id"], R=1)
Tracing read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace") on entry 

Enter a frame number, or 0 to exit   
1:boot(mydf, bootmix, strata = mydf[, "id"], R = 1) 
2:statistic(data, original, ...) 
3:try(fitmix(d.grps[[j]][, 1], verbosefit = FALSE)) 
4:fitmix(d.grps[[j]][, 1], verbosefit = FALSE) 
5:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 
6:flexmix(x ~ 1, k = 1, control = list(verbose = 0)) 
7:flexmix(formula = formula, data = data, k = k, cluster = cluster, model =
list(F 
8:FLXglm() 
9:new("FLXmodel", weighted = TRUE, formula = formula, name = paste("FLXglm",
famil 
10:initialize(value, ...) 
11:initialize(value, ...) 
12:getClass(Class) 
13:.requirePackage(package) 
14:trySilent(loadNamespace(package)) 
15:eval.parent(call) 
16:eval(expr, p) 
17:eval(expr, envir, enclos) 
18:try(loadNamespace(package)) 
19:loadNamespace(package) 
20:packageHasNamespace(package, package.lib) 
21:read.dcf(file.path(package.lib, package, "DESCRIPTION"), fields =
"Namespace") 

I'm really out of ideas at this point.  Can anyone see the problem given the
above, or at least tell me what to try next?

Best,
Andy

______________________________________________
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help

RE: [R] performance gap between R 1.7.1 and 1.8.0

Reply via email to