"However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid."
Let me rephrase this. I think that, as a measure of dispersion, the standard error obtained using the augmented Lagrangian algorithm is correct. However, what is *not known* is the asymptotic distribution of the parameter estimates when constraints are active. This is a "non-regular" situation in which MLEs can have unusual asymptotic behavior. We cannot, in general, assume normality and use the standard error estimates to construct confidence intervals or calculate significance levels.

Best,
Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


From: Ravi Varadhan
Sent: Wednesday, September 07, 2011 1:37 PM
To: 'gpet...@uark.edu'; 'nas...@uottawa.ca'; 'r-help@r-project.org'
Subject: Hessian matrix issue

Yes, numDeriv::hessian is very accurate. So, I normally take the output from the optimizer, *if it is a local optimum*, and then apply numDeriv::hessian to it. I then compute the standard errors from it.

However, it is important to know that you have actually obtained a local optimum. In fact, you need the gradient and Hessian to verify this: the first- and second-order KKT conditions.

This is tricky in *constrained* optimization problems. If you have constraints, and at least one constraint is active at the best parameter estimate, then the gradient of the original objective function need not be close to zero, and its Hessian is also not the right object. If you employ an augmented Lagrangian approach (see alabama::auglag), then you can obtain the correct gradient and Hessian at the best parameter estimate. These correspond to the modified objective function, which includes a Lagrangian term augmented by a quadratic penalty term, and they can be used to check the KKT conditions. Here is a simple example:

require(alabama)

fr <- function(x) {
    ## Rosenbrock Banana function
    x1 <- x[1]
    x2 <- x[2]
    100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

grr <- function(x) {
    ## Gradient of 'fr'
    x1 <- x[1]
    x2 <- x[2]
    c(-400 * x1 * (x2 - x1 * x1) - 2 * (1 - x1),
      200 * (x2 - x1 * x1))
}

p0 <- c(0, 0)

ans1 <- optim(par=p0, fn=fr, gr=grr, upper=c(0.9, Inf),
              method="L-BFGS-B", hessian=TRUE)
grr(ans1$par)   # this will not be close to zero
ans1$hessian

# Using an augmented Lagrangian optimizer
hcon <- function(x) 0.9 - x[1]
hcon.jac <- function(x) matrix(c(-1, 0), 1, 2)
ans2 <- auglag(par=p0, fn=fr, gr=grr, hin=hcon, hin.jac=hcon.jac)
ans2$gradient   # this will be close to zero
ans2$hessian

However, it is not known whether the standard errors obtained from this Hessian are asymptotically valid.

Best,
Ravi.

-------------------------------------------------------
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu
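
A minimal sketch of the numDeriv-based workflow described in the quoted message. The normal-model negative log-likelihood, the simulated data, and the choice of BFGS are illustrative assumptions, not part of the original post; the point is only the pattern: optimize, check that the gradient is near zero and the Hessian is positive definite, then take sqrt(diag(solve(H))) as standard errors.

require(numDeriv)

set.seed(1)
y <- rnorm(100, mean = 5, sd = 2)

negll <- function(theta) {
    ## theta = c(mean, log(sd)); the log parameterization keeps sd > 0
    -sum(dnorm(y, mean = theta[1], sd = exp(theta[2]), log = TRUE))
}

fit <- optim(par = c(0, 0), fn = negll, method = "BFGS")

## first- and second-order checks for an (unconstrained) local minimum
g <- numDeriv::grad(negll, fit$par)
H <- numDeriv::hessian(negll, fit$par)
max(abs(g))               # should be close to zero
all(eigen(H)$values > 0)  # should be TRUE

## standard errors from the inverse of the numerically accurate Hessian
se <- sqrt(diag(solve(H)))
cbind(estimate = fit$par, se = se)

In an unconstrained, regular problem this inverse-Hessian recipe is the usual asymptotic approximation; the thread's caveat is specifically about what happens when constraints are active.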
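
Continuing the auglag example from the quoted message, a rough sketch of how ans2$gradient and ans2$hessian (which refer to the augmented Lagrangian, not the raw objective) might be used to check the KKT conditions. The tolerance 1e-6 is an arbitrary choice, and the second-order check below is a simplified sufficient condition rather than the full statement on the tangent space of the active constraints.

## stationarity of the augmented Lagrangian at the solution
max(abs(ans2$gradient)) < 1e-6

## simplified second-order check: Hessian of the augmented Lagrangian
## is positive definite
all(eigen(ans2$hessian)$values > 0)

## the inequality constraint x1 <= 0.9 is active at the solution
hcon(ans2$par)   # approximately zero

If fr were a negative log-likelihood, sqrt(diag(solve(ans2$hessian))) would give standard errors in the usual way; but, as the thread notes, their asymptotic validity when a constraint is active at the solution is not established.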
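
To make the "non-regular" point concrete, here is an illustrative toy simulation, not from the original thread: estimating a normal mean subject to mu >= 0 when the true mu is 0, so the constraint is active in truth. The constrained MLE is max(0, ybar), and its sampling distribution piles up at the boundary instead of being approximately normal, which is why the usual Wald-type intervals built from standard errors cannot be trusted in such cases.

set.seed(123)
n <- 50
mle <- replicate(5000, {
    y <- rnorm(n, mean = 0, sd = 1)   # true mean sits on the boundary
    max(0, mean(y))                   # MLE of mu under the constraint mu >= 0
})
mean(mle == 0)   # roughly 0.5: a point mass at the boundary, not a normal shape
hist(mle, breaks = 50, main = "Constrained MLE with an active constraint")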