Your function named 'gradient' is not the correct gradient. Take as an example the following point x0, very near to the true minimum,
x0 <- c(-0.2517964, 0.4898680, -0.2517962, 0.4898681, 0.7500995) then you get > gradient(x0) [1] -0.0372110470 0.0001816991 -0.0372102284 0.0001820976 0.0144292657 but the numerical gradient is different: > library(numDeriv) > grad(fn, x0) [1] -6.151645e-07 -5.507219e-07 1.969143e-07 -1.563892e-07 -4.955502e-08 that is the derivative is close to 0 in any direction -- as it should be for an optimum. No wonder, optim et al. get confused when applying this 'gradient'. Regards Hans Werner Doran, Harold wrote: > > I have constructed the function mml2 (below) based on the likelihood > function described in the minimal latex I have pasted below for anyone who > wants to look at it. This function finds parameter estimates for a basic > Rasch (IRT) model. Using the function without the gradient, using either > nlminb or optim returns the correct parameter estimates and, in the case > of optim, the correct standard errors. > > By correct, I mean they match another software program as well as the > rasch() function in the ltm package. > > Now, when I pass the gradient to optim, I get a message of successful > convergence, but the parameter estimates are not correct, but they are > *very* close to being correct. But, when I use nlminb with the gradient, I > get a message of false convergence and again, the estimates are off, but > again very close to being correct. > > This is best illustrated via the examples: > > ### Sample data set > set.seed(1234) > tmp <- data.frame(item1 = sample(c(0,1), 20, replace=TRUE), item2 = > sample(c(0,1), 20, replace=TRUE), item3 = sample(c(0,1), 20, > replace=TRUE),item4 = sample(c(0,1), 20, replace=TRUE),item5 = > sample(c(0,1), 20, replace=TRUE)) > > ## Use function mml2 (below) with optim with use of gradient >> mml2(tmp,Q=10) > $par > [1] -0.2438733 0.4889333 -0.2438733 0.4889333 0.7464162 > > $value > [1] 63.86376 > > $counts > function gradient > 45 6 > > $convergence > [1] 0 > > $message > NULL > > $hessian > [,1] [,2] [,3] [,4] [,5] > [1,] 4.095479 0.000000 0.000000 0.000000 0.000000 > [2,] 0.000000 3.986293 0.000000 0.000000 0.000000 > [3,] 0.000000 0.000000 4.095479 0.000000 0.000000 > [4,] 0.000000 0.000000 0.000000 3.986293 0.000000 > [5,] 0.000000 0.000000 0.000000 0.000000 3.800898 > > ## Use same function but use nlminb with use of gradient >> mml2(tmp,Q=10) > $par > [1] -0.2456398 0.4948889 -0.2456398 0.4948889 0.7516308 > > $objective > [1] 63.86364 > > $convergence > [1] 1 > > $message > [1] "false convergence (8)" > > $iterations > [1] 4 > > $evaluations > function gradient > 41 4 > > > ### use nlminb but turn off use of gradient >> mml2(tmp,Q=10) > $par > [1] -0.2517961 0.4898682 -0.2517961 0.4898682 0.7500994 > > $objective > [1] 63.8635 > > $convergence > [1] 0 > > $message > [1] "relative convergence (4)" > > $iterations > [1] 8 > > $evaluations > function gradient > 11 64 > > ### Use optim and turn off gradient > >> mml2(tmp,Q=10) > $par > [1] -0.2517990 0.4898676 -0.2517990 0.4898676 0.7500906 > > $value > [1] 63.8635 > > $counts > function gradient > 22 7 > > $convergence > [1] 0 > > $message > NULL > > $hessian > [,1] [,2] [,3] [,4] [,5] > [1,] 3.6311153 -0.3992959 -0.4224747 -0.3992959 -0.3764526 > [2,] -0.3992959 3.5338195 -0.3992959 -0.3960956 -0.3798141 > [3,] -0.4224747 -0.3992959 3.6311153 -0.3992959 -0.3764526 > [4,] -0.3992959 -0.3960956 -0.3992959 3.5338195 -0.3798141 > [5,] -0.3764526 -0.3798141 -0.3764526 -0.3798141 3.3784816 > > The parameter estimates with and without the use of the gradient are so > close that I am inclined to believe that the gradient is correct and maybe > the problem is elsewhere. > > It seems odd that optim seems to converge but nlminb does not with the use > of the gradient. But, with the use of the gradient in either case, the > parameter estimates differ from what I think are the correct values. So, > at this point I am unclear if the problem is somewhere in the way the > functions are used or how I am passing the gradient or if the problem lies > in the way I have constructed the gradient itself. > > Below is the function and also some latex for those interested in looking > at the likelihood function. > > Thanks for any reactions > Harold > >> sessionInfo() > R version 2.10.0 (2009-10-26) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ltm_0.9-2 polycor_0.7-7 sfsmisc_1.0-9 mvtnorm_0.9-8 msm_0.9.4 > MASS_7.3-3 MiscPsycho_1.5 > [8] statmod_1.4.1 > > loaded via a namespace (and not attached): > [1] splines_2.10.0 survival_2.35-7 tools_2.10.0 > > > mml2 <- function(data, Q, startVal = NULL, gr = TRUE, ...){ > if(!is.null(startVal) && length(startVal) != ncol(data) ){ > stop("Length of argument startVal not > equal to the number of parameters estimated") > } > if(!is.null(startVal)){ > startVal <- startVal > } else { > p <- colMeans(data) > startVal <- as.vector(log((1 - p)/p)) > } > qq <- gauss.quad.prob(Q, dist = 'normal') > rr1 <- matrix(0, nrow = Q, ncol = nrow(data)) > data <- as.matrix(data) > L <- nrow(data) > C <- ncol(data) > fn <- function(b){ > for(j in 1:Q){ > for(i in 1:nrow(data)){ > rr1[j,i] > <- exp(sum(dbinom(data[i,], 1, plogis(qq$nodes[j]-b), log = TRUE))) * > > qq$weights[j] > } > } > -sum(log(colSums(rr1))) > } > gradient <- function(b){ > p <- outer(qq$nodes, b, plogis) * L > x <- t(matrix(colSums(data), nrow= C, Q)) > w <- matrix(qq$weights, Q, C) > rr2 <- (-x + p) * w > -colSums(rr2) > } > opt <- optim(startVal, fn, gr = gradient, hessian = TRUE, > method = "BFGS") > #opt <- nlminb(startVal, fn, gradient=gradient) > #list("coefficients" = opt$par, "LogLik" = -opt$value, > "Std.Error" = 1/sqrt(diag(opt$hessian))) > opt > } > > > > ### latex describing the likelihood function > \documentclass{article} > \usepackage{latexsym} > \usepackage{amssymb,amsmath,bm} > \newcommand{\Prb}{\operatorname{Prob}} > \title{Marginal Maximum Likelihood for IRT Models} > \author{Harold C. Doran} > \begin{document} > \maketitle > The likelihood function is written as: > \begin{equation} > \label{eqn:mml} > f(x) = > \prod^N_{i=1}\int_{\Theta}\prod^K_{j=1}p(x|\theta,\beta)f(\theta)d\theta > \end{equation} > \noindent where $N$ indexes the total number of individuals, $K$ indexes > the total number of items, $p(x|\theta,\beta)$ is the data likelihood and > $f(\theta)$ is a population distribution. For the rasch model, the data > likelihood is: > \begin{equation} > p(x|\theta,\beta) = \prod^N_{i=1}\prod^K_{j=1} \Pr(x_{ij} = 1 | \theta_i, > \beta_j)^{x_{ij}} \left[1 - \Pr(X_{ij} = 1 | \theta_i, > \beta_j)\right]^{(1-x_{ij})} > \end{equation} > \begin{equation} > \label{rasch} > \Pr(x_{ij} = 1 | \theta_i, \beta_j) = \frac{1}{1 + > e^{-(\theta_i-\beta_j)}} \quad i = (1, \ldots, K); j = (1, \ldots, N) > \end{equation} > \noindent where $\theta_i$ is the ability estimate of person $i$ and > $\beta_j$ is the location parameter for item $j$. The integral in > Equation~(\ref{eqn:mml}) has no closed form expression, so it is > approximated using Gauss-Hermite quadrature as: > \begin{equation} > \label{eqn:mml:approx} > f(x) \approx > \prod^N_{i=1}\sum_{q=1}^{Q}\prod^K_{j=1}p(x|\theta_q,\beta)w_q > \end{equation} > \noindent where $q$ indexes the node at quadrature point $q$ and $w$ is > the weight at quadrature point $q$. With Equation~(\ref{eqn:mml:approx}), > the remaining challenge is to find $\underset{x}{\operatorname{arg\,max}} > \, f(x)$. > \end{document} > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://n4.nabble.com/optim-or-nlminb-for-minimization-which-to-believe-tp930841p930942.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.