Dear Ravi,

Thank you so much for the help.  I switched to the optimx() function 
but I continue to use the spg method (for the most part) because I found 
that only spg consistently converges across different datasets.  I also 
decided to use AIC rather than a likelihood ratio test.

I have a new question.  I would like to construct 95% confidence 
intervals for the parameter estimates from the best model.  In a 
previous R-help thread, you said it was a bad idea to use the Hessian 
matrix computed by optim/optimx or hessian() [numDeriv package] when the 
optimization is constrained and parameter estimates lie on the boundary, 
because the solution need not be an interior local minimum where the 
usual asymptotics apply:

http://tolstoy.newcastle.edu.au/R/e15/help/11/09/6673.html
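
For reference, here is the kind of Wald-type interval I would have 
computed if the estimates were in the interior (a sketch only; fit.par is 
a placeholder for the parameter estimates, NLL2 is defined in the code 
below, and it assumes the Hessian of the negative log-likelihood is 
positive definite at the solution):

library(numDeriv)
H  <- hessian(func = NLL2, x = fit.par, y = y)  # fit.par: hypothetical interior estimates
se <- sqrt(diag(solve(H)))                      # SEs from the inverse Hessian
ci <- cbind(lower = fit.par - qnorm(0.975)*se,
            upper = fit.par + qnorm(0.975)*se)

My understanding from your post is that exactly this is not justified 
when an estimate sits on the boundary.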

In the same thread, you suggested using the Hessian matrix from augmented 
Lagrangian optimization with auglag() [alabama package] (with some 
caveats).  I would like to construct 95% CIs with auglag(), but I am not 
confident I am writing the inequality constraint function (hin) 
correctly.  Could you please check my attempt below?

Below is R code for my MLE problem, using data that result in a parameter 
estimate on the boundary, together with my attempt at auglag() 
optimization.  Note: I have gradient functions for NLL1 and NLL2, but 
they are very long and do not seem to improve the optimization, so they 
are not included here; I can supply them if they are needed for 
auglag().  Also, I am running R 2.15.2 on a Windows 7 64-bit machine.

##########################################################################

library(optimx)
library(alabama)

# Multinomial (log) density written out with lgamma(); the combinatorial
# term is constant in the parameters, so it does not affect the MLE
dmnom2 <- function(x, prob, log = FALSE) {
   r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
   if (log) r else exp(r)
}
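
# Sanity check (my addition): for a single multinomial count vector,
# dmnom2() should agree with stats::dmultinom()
dmnom2(c(2, 3, 5), prob = c(0.2, 0.3, 0.5), log = TRUE)
dmultinom(c(2, 3, 5), prob = c(0.2, 0.3, 0.5), log = TRUE)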

# data frame y: one row per sampling time t; n1-n3 are the multinomial counts
y <- data.frame(t  = c(0.167, 0.5, 1, 12, 18, 24, 36),
                n1 = c(1L, 1L, 1L, 8L, 9L, 10L, 12L),
                n2 = c(0L, 1L, 2L, 6L, 5L, 3L, 2L),
                n3 = c(13L, 12L, 11L, 0L, 0L, 1L, 0L))


# Negative log-likelihood functions
NLL1 <- function(par, y) {
   p1  <- par[1]
   p2  <- p1      # model 1 constrains p2 = p1
   mu1 <- par[2]
   mu2 <- mu1     # ... and mu2 = mu1
   t  <- y$t
   n1 <- y$n1
   n2 <- y$n2
   n3 <- y$n3
   # shared subexpressions factored out of the original formulas (same algebra)
   S  <- mu1 + mu2 + p1 + p2
   D  <- mu2*p1 + mu1*(mu2 + p2)
   r  <- sqrt(S^2 - 4*D)
   E1 <- exp(r*t)
   E2 <- exp((1/2)*(S + r)*t)
   P1 <- p1*((E1 - 1)*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) -
             mu2*r - E1*mu2*r + 2*E2*mu2*r)/E2/(2*D*r)
   P2 <- p2*((E1 - 1)*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) -
             mu1*r - E1*mu1*r + 2*E2*mu1*r)/E2/(2*D*r)
   P3 <- 1 - P1 - P2
   p.all <- c(P1, P2, P3)
   # large penalty if any probability falls outside (0, 1)
   if (all(p.all > 0 & p.all < 1))
      -dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE)
   else 1e07
}

NLL2 <- function(par, y) {
   p1  <- par[1]
   p2  <- par[2]
   mu1 <- par[3]
   mu2 <- par[4]
   t  <- y$t
   n1 <- y$n1
   n2 <- y$n2
   n3 <- y$n3
   # shared subexpressions factored out of the original formulas (same algebra)
   S  <- mu1 + mu2 + p1 + p2
   D  <- mu2*p1 + mu1*(mu2 + p2)
   r  <- sqrt(S^2 - 4*D)
   E1 <- exp(r*t)
   E2 <- exp((1/2)*(S + r)*t)
   P1 <- p1*((E1 - 1)*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) -
             mu2*r - E1*mu2*r + 2*E2*mu2*r)/E2/(2*D*r)
   P2 <- p2*((E1 - 1)*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) -
             mu1*r - E1*mu1*r + 2*E2*mu1*r)/E2/(2*D*r)
   P3 <- 1 - P1 - P2
   p.all <- c(P1, P2, P3)
   # large penalty if any probability falls outside (0, 1)
   if (all(p.all > 0 & p.all < 1))
      -dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE)
   else 1e07
}
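
# Quick sanity check (my addition): both NLLs should return finite values
# at the starting values used below
NLL1(c(0.1, 0.1), y)
NLL2(c(0.1, 0.1, 0.1, 0.1), y)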

# Optimization with optimx(); lower bounds keep all parameters positive
op1 <- optimx(par = c(0.1, 0.1),
              fn = NLL1, y = y,
              method = "spg",
              lower = rep(0.0001, 2),
              control = list(all.methods = FALSE,
                             ftol = 1e-20,
                             maxit = 5000))

op2 <- optimx(par = c(0.1, 0.1, 0.1, 0.1),
              fn = NLL2, y = y,
              method = "spg",
              lower = rep(0.0001, 4),
              control = list(all.methods = FALSE,
                             ftol = 1e-20,
                             maxit = 5000))
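
# Convergence check (my addition; I believe this version of optimx returns
# a 'conv' code alongside 'par' and 'fvalues', with 0 indicating success)
op1$conv[[1]]
op2$conv[[1]]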

# AIC corrected for small sample sizes
aicc <- function(k, L, n){
   # k: vector of parameter estimates; L: minimized negative log-likelihood;
   # n: sample size
   kl <- length(k)
   aic <- 2*kl + 2*L
   aicc <- aic + (2*kl*(kl + 1))/(n - kl - 1)
   aicc
}

# AICc values for NLL models 1 and 2
aic1 <- aicc(op1$par[[1]], op1$fvalues[[1]], sum(y$n1[1], y$n2[1], y$n3[1]))
aic2 <- aicc(op2$par[[1]], op2$fvalues[[1]], sum(y$n1[1], y$n2[1], y$n3[1]))
# Model 2 is best, but one parameter is on the boundary (mu1 = 0.0001)

# Augmented Lagrangian optimization
hin <- function(x){
   h <- rep(NA, 4)
   h[1] <- 0.0001 + x[1]
   h[2] <- 0.0001 + x[2]
   h[3] <- 0.0001 + x[3]
   h[4] <- 0.0001 + x[4]
   h
}

lag.op <- auglag(par = c(0.1, 0.1, 0.1, 0.1),
                  fn = NLL2, hin = hin, y = y)
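
# Following your earlier advice, I suspect the constraints should also
# force P1, P2, and P3 into (0, 1) at every time point, not just bound the
# raw parameters.  Below is my guess at how to do that; the helper probs2()
# is my addition and just repeats the P1/P2 algebra from NLL2:
probs2 <- function(par, y) {
   p1 <- par[1]; p2 <- par[2]; mu1 <- par[3]; mu2 <- par[4]
   t  <- y$t
   S  <- mu1 + mu2 + p1 + p2
   D  <- mu2*p1 + mu1*(mu2 + p2)
   r  <- sqrt(S^2 - 4*D)
   E1 <- exp(r*t)
   E2 <- exp((1/2)*(S + r)*t)
   P1 <- p1*((E1 - 1)*((-mu2)*(mu2 - p1 + p2) + mu1*(mu2 + 2*p2)) -
             mu2*r - E1*mu2*r + 2*E2*mu2*r)/E2/(2*D*r)
   P2 <- p2*((E1 - 1)*(-mu1^2 + 2*mu2*p1 + mu1*(mu2 - p1 + p2)) -
             mu1*r - E1*mu1*r + 2*E2*mu1*r)/E2/(2*D*r)
   c(P1, P2, 1 - P1 - P2)
}

hin2 <- function(x, y) {
   p.all <- probs2(x, y)
   # all components must be >= 0: parameters above 0.0001 and P's in (0, 1)
   c(x - 0.0001, p.all, 1 - p.all)
}

lag.op2 <- auglag(par = c(0.1, 0.1, 0.1, 0.1),
                  fn = NLL2, hin = hin2, y = y)

# If the auglag() result exposes the Hessian (as your earlier post
# suggests), I would then compute Wald-type 95% CIs like this (sketch,
# assuming lag.op2$hessian exists and is invertible):
# se <- sqrt(diag(solve(lag.op2$hessian)))
# cbind(lower = lag.op2$par - qnorm(0.975)*se,
#       upper = lag.op2$par + qnorm(0.975)*se)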

########################################################################

Any suggestions would be much appreciated.  Thanks in advance for your help.

Adam Zeilinger



On 10/11/2012 2:36 PM, Ravi Varadhan wrote:
> Adam,
>
> See the attached R code that solves your problem and beyond.  One important 
> issue is that you are enforcing constraints only indirectly.  You need to 
> make sure that P1, P2, and P3 (which are functions of original parameters and 
> time) are all between 0 and 1.  It is not enough to impose constraints on 
> original parameters p1, p2, mu1 and mu2.
>
> I also show you how to do a likelihood ratio test with the results from spg.  
> You can also do the same for nlminb or Rvmmin.
>
> Finally, I also show you how to use optimx to compare different algorithms. 
> This shows you that in addition to spg, you also get very good results (as 
> seen from objective function values and KKT conditions) with a number of 
> other algorithms (e.g., Rvmmin, nlminb), many of which are much faster than 
> spg.
>
> This example illustrates the utility of optimx().  I am always surprised 
> that more R users doing optimization are not using optimx.  It is a very 
> powerful tool for benchmarking unconstrained and box-constrained 
> optimization problems.  It deserves to be used widely, in my biased, but 
> correct, opinion.
>
> Ravi
>
> Ravi Varadhan, Ph.D.
> Assistant Professor
> The Center on Aging and Health
> Division of Geriatric Medicine & Gerontology
> Johns Hopkins University
> rvarad...@jhmi.edu
> 410-502-2619
>

-- 
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger

