Hi Harold,

Hans is right. You can see this if you use BB::spg, which compares the
analytic gradient against a numerical approximation before it starts
optimizing:
require(BB)
opt2 <- spg(startVal, fn)             # this is fine
opt3 <- spg(startVal, fn, gradient)   # this is not fine!

> opt3 <- spg(startVal, fn, gradient)
Gradient check details:  max. relative difference in gradients = 0.001697913

 analytic gradient: 0.1724284 -0.3382045 0.1724284 -0.3382045 -0.4902559
numerical gradient: 0.2826690 -0.2569404 0.2826690 -0.2569404 -0.4223231

Error in spg(startVal, fn, gradient) :
  Analytic gradient does not seem correct!  See comparison above.
  Fix it, remove it, or increase checkGrad.tol.

Ravi.

____________________________________________________________________

Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu


----- Original Message -----
From: Hans W Borchers <hwborch...@googlemail.com>
Date: Sunday, November 29, 2009 6:18 pm
Subject: Re: [R] optim or nlminb for minimization, which to believe?
To: r-help@r-project.org

> Your function named 'gradient' is not the correct gradient. Take as an
> example the following point x0, very near to the true minimum:
>
>     x0 <- c(-0.2517964, 0.4898680, -0.2517962, 0.4898681, 0.7500995)
>
> Then you get
>
>     > gradient(x0)
>     [1] -0.0372110470  0.0001816991 -0.0372102284  0.0001820976  0.0144292657
>
> but the numerical gradient is different:
>
>     > library(numDeriv)
>     > grad(fn, x0)
>     [1] -6.151645e-07 -5.507219e-07  1.969143e-07 -1.563892e-07 -4.955502e-08
>
> That is, the derivative is close to 0 in every direction -- as it should
> be at an optimum.
>
> No wonder optim et al. get confused when applying this 'gradient'.
>
> Regards
> Hans Werner
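Hans's numDeriv comparison can also be reproduced with a few lines of base R,
which makes it easy to drop into any optimization problem. A minimal sketch
(the helper num_grad is not from any package, and fn, gradient, and x0 are as
defined in this thread):

    ## central-difference approximation to the gradient of f at x
    num_grad <- function(f, x, h = 1e-6) {
        sapply(seq_along(x), function(i) {
            e <- rep(0, length(x))
            e[i] <- h
            (f(x + e) - f(x - e)) / (2 * h)
        })
    }

    ## a large value here flags a wrong analytic gradient
    max(abs(gradient(x0) - num_grad(fn, x0)))

This is essentially what spg's internal gradient check and numDeriv::grad do,
just with less numerical care.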
> Doran, Harold wrote:
> >
> > I have constructed the function mml2 (below) based on the likelihood
> > function described in the minimal latex I have pasted below for anyone
> > who wants to look at it. The function finds parameter estimates for a
> > basic Rasch (IRT) model. Without the gradient, either nlminb or optim
> > returns the correct parameter estimates and, in the case of optim, the
> > correct standard errors.
> >
> > By correct, I mean they match another software program as well as the
> > rasch() function in the ltm package.
> >
> > Now, when I pass the gradient to optim, I get a message of successful
> > convergence, but the parameter estimates are not correct -- although
> > they are *very* close. When I use nlminb with the gradient, I get a
> > message of false convergence, and again the estimates are off, but only
> > slightly.
> >
> > This is best illustrated via the examples:
> >
> > ### Sample data set
> > set.seed(1234)
> > tmp <- data.frame(item1 = sample(c(0, 1), 20, replace = TRUE),
> >                   item2 = sample(c(0, 1), 20, replace = TRUE),
> >                   item3 = sample(c(0, 1), 20, replace = TRUE),
> >                   item4 = sample(c(0, 1), 20, replace = TRUE),
> >                   item5 = sample(c(0, 1), 20, replace = TRUE))
> >
> > ## Use function mml2 (below) with optim, using the gradient
> >> mml2(tmp, Q = 10)
> > $par
> > [1] -0.2438733  0.4889333 -0.2438733  0.4889333  0.7464162
> >
> > $value
> > [1] 63.86376
> >
> > $counts
> > function gradient
> >       45        6
> >
> > $convergence
> > [1] 0
> >
> > $message
> > NULL
> >
> > $hessian
> >          [,1]     [,2]     [,3]     [,4]     [,5]
> > [1,] 4.095479 0.000000 0.000000 0.000000 0.000000
> > [2,] 0.000000 3.986293 0.000000 0.000000 0.000000
> > [3,] 0.000000 0.000000 4.095479 0.000000 0.000000
> > [4,] 0.000000 0.000000 0.000000 3.986293 0.000000
> > [5,] 0.000000 0.000000 0.000000 0.000000 3.800898
> >
> > ## Same function, but with nlminb using the gradient
> >> mml2(tmp, Q = 10)
> > $par
> > [1] -0.2456398  0.4948889 -0.2456398  0.4948889  0.7516308
> >
> > $objective
> > [1] 63.86364
> >
> > $convergence
> > [1] 1
> >
> > $message
> > [1] "false convergence (8)"
> >
> > $iterations
> > [1] 4
> >
> > $evaluations
> > function gradient
> >       41        4
> >
> > ### nlminb with the gradient turned off
> >> mml2(tmp, Q = 10)
> > $par
> > [1] -0.2517961  0.4898682 -0.2517961  0.4898682  0.7500994
> >
> > $objective
> > [1] 63.8635
> >
> > $convergence
> > [1] 0
> >
> > $message
> > [1] "relative convergence (4)"
> >
> > $iterations
> > [1] 8
> >
> > $evaluations
> > function gradient
> >       11       64
> >
> > ### optim with the gradient turned off
> >> mml2(tmp, Q = 10)
> > $par
> > [1] -0.2517990  0.4898676 -0.2517990  0.4898676  0.7500906
> >
> > $value
> > [1] 63.8635
> >
> > $counts
> > function gradient
> >       22        7
> >
> > $convergence
> > [1] 0
> >
> > $message
> > NULL
> >
> > $hessian
> >            [,1]       [,2]       [,3]       [,4]       [,5]
> > [1,]  3.6311153 -0.3992959 -0.4224747 -0.3992959 -0.3764526
> > [2,] -0.3992959  3.5338195 -0.3992959 -0.3960956 -0.3798141
> > [3,] -0.4224747 -0.3992959  3.6311153 -0.3992959 -0.3764526
> > [4,] -0.3992959 -0.3960956 -0.3992959  3.5338195 -0.3798141
> > [5,] -0.3764526 -0.3798141 -0.3764526 -0.3798141  3.3784816
> >
> > The parameter estimates with and without the gradient are so close that
> > I am inclined to believe the gradient is correct and the problem lies
> > elsewhere.
> >
> > It seems odd that optim reports convergence but nlminb does not when the
> > gradient is used. Either way, with the gradient the parameter estimates
> > differ from what I believe are the correct values. So at this point I am
> > unclear whether the problem is in the way the functions are called, in
> > how I am passing the gradient, or in the way I have constructed the
> > gradient itself.
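A quick way to confirm that the no-gradient runs found the better point is to
evaluate the objective at both solutions. Here b_grad and b_nograd are just
the $par vectors copied from the optim outputs above, and fn is the objective
defined inside mml2 (it has to be pulled out to the top level, with qq, data,
L, and Q set up as in mml2, before it can be called directly):

    b_grad   <- c(-0.2438733, 0.4889333, -0.2438733, 0.4889333, 0.7464162)
    b_nograd <- c(-0.2517990, 0.4898676, -0.2517990, 0.4898676, 0.7500906)
    fn(b_grad)    # 63.86376
    fn(b_nograd)  # 63.8635 -- lower, i.e. the better minimum

The gap is tiny, which is why the estimates look "very close" even though the
gradient-assisted runs stop short of the optimum.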
> > Below is the function, and also some latex for those interested in
> > looking at the likelihood function.
> >
> > Thanks for any reactions
> > Harold
> >
> >> sessionInfo()
> > R version 2.10.0 (2009-10-26)
> > i386-pc-mingw32
> >
> > locale:
> > [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> > [5] LC_TIME=English_United States.1252
> >
> > attached base packages:
> > [1] stats     graphics  grDevices utils     datasets  methods   base
> >
> > other attached packages:
> > [1] ltm_0.9-2      polycor_0.7-7  sfsmisc_1.0-9  mvtnorm_0.9-8  msm_0.9.4      MASS_7.3-3     MiscPsycho_1.5
> > [8] statmod_1.4.1
> >
> > loaded via a namespace (and not attached):
> > [1] splines_2.10.0  survival_2.35-7 tools_2.10.0
> >
> > mml2 <- function(data, Q, startVal = NULL, gr = TRUE, ...) {
> >     ## gr is accepted but not used; the gradient is toggled by editing
> >     ## the optim/nlminb call at the bottom
> >     if (!is.null(startVal) && length(startVal) != ncol(data)) {
> >         stop("Length of argument startVal not equal to the number of parameters estimated")
> >     }
> >     if (is.null(startVal)) {
> >         ## default starts: logit of the proportion incorrect per item
> >         p <- colMeans(data)
> >         startVal <- as.vector(log((1 - p) / p))
> >     }
> >     qq   <- gauss.quad.prob(Q, dist = "normal")  # nodes/weights from statmod
> >     data <- as.matrix(data)
> >     L    <- nrow(data)  # persons
> >     C    <- ncol(data)  # items
> >     rr1  <- matrix(0, nrow = Q, ncol = L)
> >     fn <- function(b) {
> >         ## minus the log marginal likelihood, by quadrature
> >         for (j in 1:Q) {
> >             for (i in 1:L) {
> >                 rr1[j, i] <- exp(sum(dbinom(data[i, ], 1,
> >                                             plogis(qq$nodes[j] - b),
> >                                             log = TRUE))) * qq$weights[j]
> >             }
> >         }
> >         -sum(log(colSums(rr1)))
> >     }
> >     gradient <- function(b) {
> >         p   <- outer(qq$nodes, b, plogis) * L
> >         x   <- t(matrix(colSums(data), nrow = C, ncol = Q))
> >         w   <- matrix(qq$weights, Q, C)
> >         rr2 <- (-x + p) * w
> >         -colSums(rr2)
> >     }
> >     opt <- optim(startVal, fn, gr = gradient, hessian = TRUE, method = "BFGS")
> >     # opt <- nlminb(startVal, fn, gradient = gradient)
> >     # list(coefficients = opt$par, LogLik = -opt$value,
> >     #      Std.Error = 1/sqrt(diag(opt$hessian)))
> >     opt
> > }
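For what it's worth, the gradient above never forms the posterior weight of
each quadrature node for each person, and that appears to be where it goes
wrong. Differentiating the quadrature approximation of the marginal
likelihood (the final equation in the latex below) gives, for item j,

    d(-log f)/d b_j  =  sum_i x_ij  -  sum_q n_q P_qj,

where P_qj = plogis(node_q - b_j) and n_q = sum_i pi_iq, with
pi_iq = w_q L_iq / sum_q' w_q' L_iq' the posterior weight of node q for
person i. The quoted gradient is this same formula with n_q replaced by the
prior counts L * w_q, which would explain why it is close but not equal.
A sketch of the corrected version as a drop-in replacement inside mml2 --
my own derivation, untested:

    gradient <- function(b) {
        P <- plogis(outer(qq$nodes, b, "-"))  # Q x C: P(x = 1 | node q, item j)
        for (j in 1:Q) {
            for (i in 1:L) {
                ## w_q * L_iq, reusing rr1 exactly as fn does
                rr1[j, i] <- exp(sum(dbinom(data[i, ], 1, P[j, ],
                                            log = TRUE))) * qq$weights[j]
            }
        }
        post <- sweep(rr1, 2, colSums(rr1), "/")  # posterior node weights per person
        nq   <- rowSums(post)                     # expected persons at each node
        colSums(data) - as.vector(nq %*% P)       # d(-log f)/d b_j
    }

At the optimum this vanishes exactly when the observed item scores equal the
expected item scores, the usual MML stationarity condition for the Rasch model.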
> >
> > ### latex describing the likelihood function
> > \documentclass{article}
> > \usepackage{latexsym}
> > \usepackage{amssymb,amsmath,bm}
> > \newcommand{\Prb}{\operatorname{Prob}}
> > \title{Marginal Maximum Likelihood for IRT Models}
> > \author{Harold C. Doran}
> > \begin{document}
> > \maketitle
> > The likelihood function is written as:
> > \begin{equation}
> > \label{eqn:mml}
> > f(x) = \prod^N_{i=1}\int_{\Theta}\prod^K_{j=1}p(x|\theta,\beta)f(\theta)\,d\theta
> > \end{equation}
> > \noindent where $N$ indexes the total number of individuals, $K$ indexes
> > the total number of items, $p(x|\theta,\beta)$ is the data likelihood,
> > and $f(\theta)$ is a population distribution. For the Rasch model, the
> > data likelihood is:
> > \begin{equation}
> > p(x|\theta,\beta) = \prod^N_{i=1}\prod^K_{j=1}
> > \Pr(x_{ij} = 1 | \theta_i, \beta_j)^{x_{ij}}
> > \left[1 - \Pr(x_{ij} = 1 | \theta_i, \beta_j)\right]^{(1-x_{ij})}
> > \end{equation}
> > \begin{equation}
> > \label{rasch}
> > \Pr(x_{ij} = 1 | \theta_i, \beta_j) = \frac{1}{1 + e^{-(\theta_i-\beta_j)}}
> > \quad i = (1, \ldots, N);\; j = (1, \ldots, K)
> > \end{equation}
> > \noindent where $\theta_i$ is the ability of person $i$ and $\beta_j$ is
> > the location parameter for item $j$. The integral in
> > Equation~(\ref{eqn:mml}) has no closed-form expression, so it is
> > approximated using Gauss-Hermite quadrature as:
> > \begin{equation}
> > \label{eqn:mml:approx}
> > f(x) \approx \prod^N_{i=1}\sum_{q=1}^{Q}\prod^K_{j=1}p(x|\theta_q,\beta)w_q
> > \end{equation}
> > \noindent where $\theta_q$ is the node and $w_q$ the weight at
> > quadrature point $q$. With Equation~(\ref{eqn:mml:approx}), the
> > remaining challenge is to find
> > $\underset{\beta}{\operatorname{arg\,max}} \, f(x)$.
> > \end{document}

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.