I have written the function mml2 (below) based on the likelihood function 
described in the minimal LaTeX pasted at the end of this message for anyone who 
wants to look at it. The function finds parameter estimates for a basic Rasch 
(IRT) model. Without the gradient, either nlminb or optim returns the correct 
parameter estimates and, in the case of optim, the correct standard errors.

By correct, I mean they match another software program as well as the rasch() 
function in the ltm package.

Now, when I pass the gradient to optim, I get a message of successful 
convergence, but the parameter estimates are not correct, although they are 
*very* close. When I use nlminb with the gradient, I get a message of false 
convergence, and again the estimates are off, but only slightly.

This is best illustrated with the examples below:

### Sample data set
set.seed(1234)
tmp <- data.frame(item1 = sample(c(0,1), 20, replace=TRUE),
                  item2 = sample(c(0,1), 20, replace=TRUE),
                  item3 = sample(c(0,1), 20, replace=TRUE),
                  item4 = sample(c(0,1), 20, replace=TRUE),
                  item5 = sample(c(0,1), 20, replace=TRUE))
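
For reference, the values I am calling correct come from rasch() in the ltm 
package with the common discrimination fixed at 1 (i.e., a Rasch model), along 
these lines (if I have the constraint syntax right):

library(ltm)
fit <- rasch(tmp, constraint = cbind(ncol(tmp) + 1, 1))  # fix the discrimination at 1
coef(fit)  # difficulty estimates to compare against mml2(tmp, Q = 10)$par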

## Use function mml2 (below) with optim, supplying the gradient
> mml2(tmp,Q=10)
$par
[1] -0.2438733  0.4889333 -0.2438733  0.4889333  0.7464162

$value
[1] 63.86376

$counts
function gradient
      45        6

$convergence
[1] 0

$message
NULL

$hessian
         [,1]     [,2]     [,3]     [,4]     [,5]
[1,] 4.095479 0.000000 0.000000 0.000000 0.000000
[2,] 0.000000 3.986293 0.000000 0.000000 0.000000
[3,] 0.000000 0.000000 4.095479 0.000000 0.000000
[4,] 0.000000 0.000000 0.000000 3.986293 0.000000
[5,] 0.000000 0.000000 0.000000 0.000000 3.800898

## Use the same function, but with nlminb, supplying the gradient
> mml2(tmp,Q=10)
$par
[1] -0.2456398  0.4948889 -0.2456398  0.4948889  0.7516308

$objective
[1] 63.86364

$convergence
[1] 1

$message
[1] "false convergence (8)"

$iterations
[1] 4

$evaluations
function gradient
      41        4


### Use nlminb without the gradient
> mml2(tmp,Q=10)
$par
[1] -0.2517961  0.4898682 -0.2517961  0.4898682  0.7500994

$objective
[1] 63.8635

$convergence
[1] 0

$message
[1] "relative convergence (4)"

$iterations
[1] 8

$evaluations
function gradient
      11       64

### Use optim without the gradient

> mml2(tmp,Q=10)
$par
[1] -0.2517990  0.4898676 -0.2517990  0.4898676  0.7500906

$value
[1] 63.8635

$counts
function gradient
      22        7

$convergence
[1] 0

$message
NULL

$hessian
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  3.6311153 -0.3992959 -0.4224747 -0.3992959 -0.3764526
[2,] -0.3992959  3.5338195 -0.3992959 -0.3960956 -0.3798141
[3,] -0.4224747 -0.3992959  3.6311153 -0.3992959 -0.3764526
[4,] -0.3992959 -0.3960956 -0.3992959  3.5338195 -0.3798141
[5,] -0.3764526 -0.3798141 -0.3764526 -0.3798141  3.3784816

The parameter estimates with and without the gradient are so close that I am 
inclined to believe the gradient is correct and the problem lies elsewhere.

It seems odd that optim appears to converge while nlminb does not when the 
gradient is used. In either case, with the gradient the parameter estimates 
differ from what I believe are the correct values. So, at this point I am 
unclear whether the problem is in the way the optimizers are called, in how I 
am passing the gradient, or in the way I have constructed the gradient itself.
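
One check that might help isolate the issue: paste the body of mml2 (below) up 
through the definitions of fn and gradient into the workspace, so that qq, rr1, 
data, L, C, and startVal exist, and then, assuming the numDeriv package is 
installed, compare the analytic gradient against a numerical one:

library(numDeriv)
b0 <- startVal                           # or any trial point of length C
max(abs(gradient(b0) - grad(fn, b0)))    # should be near 0 if the analytic gradient is right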

Below are the function and some LaTeX for those interested in looking at the 
likelihood function.

Thanks for any reactions
Harold

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] ltm_0.9-2      polycor_0.7-7  sfsmisc_1.0-9  mvtnorm_0.9-8  msm_0.9.4      MASS_7.3-3     MiscPsycho_1.5
[8] statmod_1.4.1

loaded via a namespace (and not attached):
[1] splines_2.10.0  survival_2.35-7 tools_2.10.0


mml2 <- function(data, Q, startVal = NULL, gr = TRUE, ...){
    ## gauss.quad.prob() comes from the statmod package;
    ## gr is currently unused -- I toggle the optimizer calls by hand below
    if(!is.null(startVal) && length(startVal) != ncol(data)){
        stop("Length of argument startVal not equal to the number of parameters estimated")
    }
    if(is.null(startVal)){
        ## default starting values: logits of the item proportions correct
        p <- colMeans(data)
        startVal <- as.vector(log((1 - p)/p))
    }
    qq   <- gauss.quad.prob(Q, dist = "normal")
    rr1  <- matrix(0, nrow = Q, ncol = nrow(data))
    data <- as.matrix(data)
    L <- nrow(data)   # number of persons
    C <- ncol(data)   # number of items
    ## negative marginal log-likelihood, approximated by quadrature
    fn <- function(b){
        for(j in 1:Q){
            for(i in 1:L){
                rr1[j, i] <- exp(sum(dbinom(data[i, ], 1,
                                plogis(qq$nodes[j] - b), log = TRUE))) * qq$weights[j]
            }
        }
        -sum(log(colSums(rr1)))
    }
    ## analytic gradient of fn -- this is the piece I suspect may be wrong
    gradient <- function(b){
        p <- outer(qq$nodes, b, plogis) * L
        x <- t(matrix(colSums(data), nrow = C, ncol = Q))
        w <- matrix(qq$weights, Q, C)
        rr2 <- (-x + p) * w
        -colSums(rr2)
    }
    opt <- optim(startVal, fn, gr = gradient, hessian = TRUE, method = "BFGS")
    ## opt <- nlminb(startVal, fn, gradient = gradient)
    ## list("coefficients" = opt$par, "LogLik" = -opt$value,
    ##      "Std.Error" = 1/sqrt(diag(opt$hessian)))
    opt
}



### LaTeX describing the likelihood function
\documentclass{article}
\usepackage{latexsym}
\usepackage{amssymb,amsmath,bm}
\newcommand{\Prb}{\operatorname{Prob}}
\title{Marginal Maximum Likelihood for IRT Models}
\author{Harold C. Doran}
\begin{document}
\maketitle
The likelihood function is written as:
\begin{equation}
\label{eqn:mml}
f(x) = \prod^N_{i=1}\int_{\Theta}\prod^K_{j=1}p(x|\theta,\beta)f(\theta)d\theta
\end{equation}
\noindent where $N$ indexes the total number of individuals, $K$ indexes the 
total number of items, $p(x|\theta,\beta)$ is the data likelihood and 
$f(\theta)$ is a population distribution. For the rasch model, the data 
likelihood is:
\begin{equation}
p(x|\theta,\beta) = \prod^N_{i=1}\prod^K_{j=1} \Pr(x_{ij} = 1 | \theta_i, 
\beta_j)^{x_{ij}} \left[1 - \Pr(X_{ij} = 1 | \theta_i, 
\beta_j)\right]^{(1-x_{ij})}
\end{equation}
\begin{equation}
\label{rasch}
\Pr(x_{ij} = 1 | \theta_i, \beta_j) = \frac{1}{1 + e^{-(\theta_i-\beta_j)}} 
\quad i = (1, \ldots, N); \; j = (1, \ldots, K)
\end{equation}
\noindent where $\theta_i$ is the ability estimate of person $i$ and $\beta_j$ 
is the location parameter for item $j$. The integral in 
Equation~(\ref{eqn:mml}) has no closed form expression, so it is approximated 
using Gauss-Hermite quadrature as:
\begin{equation}
\label{eqn:mml:approx}
f(x) \approx \prod^N_{i=1}\sum_{q=1}^{Q}\prod^K_{j=1}p(x|\theta_q,\beta)w_q
\end{equation}
\noindent where $\theta_q$ is the node and $w_q$ the weight at quadrature 
point $q$. With Equation~(\ref{eqn:mml:approx}), the remaining challenge is to 
find $\underset{\beta}{\operatorname{arg\,max}} \, f(x)$.
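\noindent For reference, and in case it helps anyone check my R code, here is 
the gradient of the log of Equation~(\ref{eqn:mml:approx}) with respect to 
$\beta_j$ as I have derived it (the error may well be in this derivation rather 
than in the code), where $p(x_{ik}|\theta_q,\beta_k)$ denotes the likelihood of 
person $i$'s response to item $k$ evaluated at node $\theta_q$:
\begin{equation}
\frac{\partial \log f(x)}{\partial \beta_j} \approx
\sum^N_{i=1}\sum_{q=1}^{Q}\pi_{iq}\left[\Pr(x_{ij} = 1 \mid \theta_q, \beta_j) - x_{ij}\right],
\qquad
\pi_{iq} = \frac{w_q\prod^K_{k=1}p(x_{ik}|\theta_q,\beta_k)}{\sum_{q'=1}^{Q}w_{q'}\prod^K_{k=1}p(x_{ik}|\theta_{q'},\beta_k)}
\end{equation}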
\end{document}
