Adam,

Getting the variance of an MLE when the true parameter is on the boundary of the parameter space is a very difficult problem. It is known that the standard bootstrap does not work there. There are some subsampling approaches (see the Springer book "Subsampling" by Politis, Romano, and Wolf), but I am not an expert on this.
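For concreteness, a minimal sketch of the subsampling idea, assuming i.i.d. rows in a data frame dat and a scalar estimator est_fun (both hypothetical names), and assuming the usual sqrt(n) rate, which may itself change at a boundary:

# Approximate the law of sqrt(n)*(theta_hat - theta) by that of
# sqrt(b)*(theta_b - theta_hat) over subsamples of size b drawn
# without replacement, then invert to get a confidence interval.
subsample_ci <- function(dat, est_fun, b, B = 500, level = 0.95) {
  n <- nrow(dat)
  theta_hat <- est_fun(dat)
  reps <- replicate(B, {
    idx <- sample(n, b)                    # subsample without replacement
    sqrt(b) * (est_fun(dat[idx, , drop = FALSE]) - theta_hat)
  })
  alpha <- 1 - level
  q <- quantile(reps, c(1 - alpha/2, alpha/2))   # upper quantile first
  unname(theta_hat - q / sqrt(n))                # c(lower, upper)
}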
However, an important question is how we know that the truth is actually on the boundary. If you can assume that the truth is in the interior, then you can use standard approaches: the observed Hessian or the bootstrap to get standard errors.

Best,
Ravi

From: Adam Zeilinger [mailto:ad...@ucr.edu]
Sent: Sunday, December 02, 2012 5:04 PM
To: Ravi Varadhan
Cc: Adam Zeilinger (zeil0...@umn.edu); r-help@r-project.org
Subject: Re: [R] model selection with spg and AIC (or, convert list to fitted model object)

Dear Ravi,

Thank you so much for the help. I switched to the optimx function, but I continue to use the spg method (for the most part) because I found that only spg consistently converges given different datasets. I also decided to use AIC rather than a likelihood ratio test.

I have a new question. I would like to construct 95% confidence intervals for the parameter estimates from the best model. In a previous R-help thread, you said that it is a bad idea to use the Hessian matrix computed from optim/optimx or hessian() [numDeriv package] when the optimization is constrained and parameter estimates are on the boundary, because the MLE is likely at a local minimum: http://tolstoy.newcastle.edu.au/R/e15/help/11/09/6673.html

In the same thread, you suggested using the Hessian matrix from augmented Lagrangian optimization with auglag() [alabama package] (with some caveats). I would like to construct 95% CIs with auglag(), but I don't understand how to write the inequality-constraint (hin) function. Could you please help me write the hin function?

Below is R code for my MLE problem, using data that result in parameter estimates on the boundary, together with my unsuccessful attempt at auglag() optimization. Note: I have gradient functions for NLL1 and NLL2, but they are very large and don't seem to improve optimization, so they are not included here. I can supply them if needed for the auglag() function. Also, I am running R 2.15.2 on a Windows 7 64-bit machine.
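(For reference, a minimal, self-contained sketch of the "standard approaches" Ravi mentions above, on a toy problem with an interior optimum; every name below is illustrative, not part of the model in this thread.)

library(numDeriv)

# toy data and a smooth negative log-likelihood with an interior optimum
set.seed(1)
x <- rnorm(50, mean = 2, sd = 1)
nll <- function(th) -sum(dnorm(x, mean = th[1], sd = exp(th[2]), log = TRUE))

fit <- optim(c(0, 0), nll, method = "BFGS")
H  <- hessian(nll, fit$par)            # observed information matrix
se <- sqrt(diag(solve(H)))             # standard errors from inverse Hessian
cbind(lower = fit$par - 1.96 * se,     # approximate 95% Wald intervals
      upper = fit$par + 1.96 * se)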
##########################################################################
library(optimx)
library(alabama)

# define multinomial distribution
dmnom2 <- function(x, prob, log = FALSE) {
  r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
  if (log) r else exp(r)
}

# data frame y
y <- structure(list(t  = c(0.167, 0.5, 1, 12, 18, 24, 36),
                    n1 = c(1L, 1L, 1L, 8L, 9L, 10L, 12L),
                    n2 = c(0L, 1L, 2L, 6L, 5L, 3L, 2L),
                    n3 = c(13L, 12L, 11L, 0L, 0L, 1L, 0L)),
               .Names = c("t", "n1", "n2", "n3"),
               class = "data.frame", row.names = 36:42)

# Negative log-likelihood functions
NLL1 <- function(par, y) {
  p1 <- par[1]
  p2 <- p1
  mu1 <- par[2]
  mu2 <- mu1
  t <- y$t
  n1 <- y$n1
  n2 <- y$n2
  n3 <- y$n3
  P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) -
         mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) -
         exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
           mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
         2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
           sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
      4*(mu2*p1 + mu1*(mu2 + p2))))*t)/
    (2*(mu2*p1 + mu1*(mu2 + p2))*
      sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) -
         mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) -
         exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
           mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
         2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
           sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
      4*(mu2*p1 + mu1*(mu2 + p2))))*t)/
    (2*(mu2*p1 + mu1*(mu2 + p2))*
      sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P3 <- 1 - P1 - P2
  p.all <- c(P1, P2, P3)
  if (all(p.all > 0 & p.all < 1))
    -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
  else 1e07
}

NLL2 <- function(par, y) {
  p1 <- par[1]
  p2 <- par[2]
  mu1 <- par[3]
  mu2 <- par[4]
  t <- y$t
  n1 <- y$n1
  n2 <- y$n2
  n3 <- y$n3
  P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) -
         mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) -
         exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
           mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
         2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
           sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
      4*(mu2*p1 + mu1*(mu2 + p2))))*t)/
    (2*(mu2*p1 + mu1*(mu2 + p2))*
      sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) -
         mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) -
         exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
           mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
         2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
           4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
           sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
      4*(mu2*p1 + mu1*(mu2 + p2))))*t)/
    (2*(mu2*p1 + mu1*(mu2 + p2))*
      sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P3 <- 1 - P1 - P2
  p.all <- c(P1, P2, P3)
  if (all(p.all > 0 & p.all < 1))
    -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
  else 1e07
}
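# (sketch) quick sanity check before optimizing: both NLLs should return
# finite values at the starting values; infeasible points return the
# 1e07 penalty instead
NLL1(c(0.1, 0.1), y)
NLL2(c(0.1, 0.1, 0.1, 0.1), y)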
# Optimization with optimx()
op1 <- optimx(par = c(0.1, 0.1), fn = NLL1, y = y, method = c("spg"),
              lower = rep(0.0001, 2),
              control = list(all.methods = FALSE, ftol = 1e-20, maxit = 5000))
op2 <- optimx(par = c(0.1, 0.1, 0.1, 0.1), fn = NLL2, y = y, method = c("spg"),
              lower = rep(0.0001, 4),
              control = list(all.methods = FALSE, maxit = 5000, ftol = 1e-20))

# AIC corrected for small sample sizes: k is the vector of parameter
# estimates, L is the minimized negative log-likelihood, and n is the
# sample size
aicc <- function(k, L, n) {
  kl <- length(k)
  aic <- 2*kl + 2*L
  aicc <- aic + (2*kl*(kl + 1))/(n - kl - 1)
  aicc
}

# AICc values for NLL models 1 and 2
aic1 <- aicc(op1$par[[1]], op1$fvalues[[1]], sum(y$n1[1], y$n2[1], y$n3[1]))
aic2 <- aicc(op2$par[[1]], op2$fvalues[[1]], sum(y$n1[1], y$n2[1], y$n3[1]))

# Model 2 is best, but one parameter is on the boundary (mu1 = 0.0001)

# Augmented Lagrangian optimization: my unsuccessful attempt
# (note: as written, h[i] >= 0 enforces only x[i] >= -0.0001;
#  one possible correction is sketched at the end of the thread)
hin <- function(x) {
  h <- rep(NA, 4)
  h[1] <- 0.0001 + x[1]
  h[2] <- 0.0001 + x[2]
  h[3] <- 0.0001 + x[3]
  h[4] <- 0.0001 + x[4]
  h
}
lag.op <- auglag(par = c(0.1, 0.1, 0.1, 0.1), fn = NLL2, hin = hin, y = y)
########################################################################

Any suggestions would be much appreciated. Thanks in advance for any help.

Adam Zeilinger

On 10/11/2012 2:36 PM, Ravi Varadhan wrote:

Adam,

See the attached R code that solves your problem and beyond. One important issue is that you are enforcing the constraints only indirectly. You need to make sure that P1, P2, and P3 (which are functions of the original parameters and time) are all between 0 and 1. It is not enough to impose constraints on the original parameters p1, p2, mu1, and mu2.

I also show you how to do a likelihood ratio test with the results from spg. You can do the same for nlminb or Rvmmin. Finally, I show you how to use optimx to compare different algorithms. This shows that, in addition to spg, you also get very good results (as seen from objective function values and KKT conditions) with a number of other algorithms (e.g., Rvmmin, nlminb), many of which are much faster than spg.

This example illustrates the utility of optimx(). I am always surprised that more R users doing optimization are not using optimx. It is a very powerful tool for benchmarking unconstrained and box-constrained optimization problems. It deserves to be used widely, in my biased, but correct, opinion.

Ravi

Ravi Varadhan, Ph.D.
Assistant Professor
The Center on Aging and Health
Division of Geriatric Medicine & Gerontology
Johns Hopkins University
rvarad...@jhmi.edu
410-502-2619

--
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger
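One possible way the hin function might be written, sketched under the assumption that lower bounds of 0.0001 on the four raw parameters are the only constraints imposed (per Ravi's point above, fully enforcing 0 < P1, P2, P3 < 1 would require additional constraint components evaluated at each time point); hin2 and lag.op2 are illustrative names, untested against the data in this thread:

# auglag() requires hin(x) >= 0 at feasible points, and it passes the same
# extra arguments (here y) to hin that it passes to fn, so hin must accept y.
# x - 0.0001 >= 0 enforces x >= 0.0001; the attempt above (0.0001 + x)
# enforces only x >= -0.0001.
hin2 <- function(x, y) x - 0.0001
lag.op2 <- auglag(par = c(0.1, 0.1, 0.1, 0.1), fn = NLL2, hin = hin2, y = y)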