Re: [R] problem with convergence in mle2/optim function

Adam Zeilinger Sun, 07 Oct 2012 22:44:17 -0700

Dear R Help,

Thank you to those who responded to my questions about mle2/optimconvergence. A few responders pointed out that the optim error seems toarise when either one of the probabilities P1, P2, or P3 become negativeor infinite. One suggested examining the exponential terms within theP1 and P2 equations.

I may have made some progress along these lines. The exponential termsin the equations for P1 and P2 go to infinity at certain (large) valuesof t. The exponential terms can be found in lines 1, 5, and 7 in the P1and P2 expressions below. Here is some example code:


###########################################################################
# P1 and P2 equations
P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
  mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))) -
  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
  mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))
P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
  mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))) -
  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
  mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))

# Vector of t values
tv <- c(1:200)

# 'true' parameter values
p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001

# Function to calculate probabilities from 'true' parameter values
psim <- function(x){
  params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
  eval.P1 <- eval(P1, params)
  eval.P2 <- eval(P2, params)
  P3 <- 1 - eval.P1 - eval.P2
  c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
}
pdat <- sapply(tv, psim, simplify = TRUE)
Pdat <- as.data.frame(t(pdat))
names(Pdat) <- c("time", "P1", "P2", "P3")

matplot(Pdat[,-1], type = "l", xlab = "time", ylab = "Probability",
        col = c("black", "brown", "blue"),
        lty = c(1:3), lwd = 2, ylim = c(0,1))
legend("topright", c("P1", "P2", "P3"),
       col = c("black", "brown", "blue"),
       lty = c(1:3), lwd = 2)
Pdat[160:180,] # psim function begins to return "NaN" at t = 178

# exponential terms in P1 and P2 expressions are problematic
params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = 178)

exp1 <- expression(exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))
eval(exp1, params) # returns Inf at t = 178

exp2 <- expression(exp((1/2)*(mu1 + mu2 + p1 + p2 +
  sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t))
eval(exp2, params) # also returns Inf at t = 178
##########################################################################

The time step at which the exponential terms go to infinity depends onthe values of the parameters p1, p2, mu1, mu2. It seems that theconvergence problems may be due to these exponential terms going toinfinity. Thus my convergence problem appears to be an overflow problem?

Unfortunately, I'm not sure where to go from here. Due to thecomplexity of the P1 and P2 equations, there's no clear way to rearrangethe equations to eliminate t from the exponential terms.

Does anyone have any suggestions on how to address this problem?Perhaps there is a way to bound p1, p2, mu1, and mu2 to avoid theexponential terms going to infinity? Or bound P1 and P2?


Any suggestions would be greatly appreciated.

Adam Zeilinger



On 10/5/2012 3:06 AM, Berend Hasselman wrote:


On 05-10-2012, at 07:12, Adam Zeilinger wrote:

Hello R Help,

I am trying solve an MLE convergence problem: I would like to estimate four 
parameters, p1, p2, mu1, mu2, which relate to the probabilities, P1, P2, P3, of 
a multinomial (trinomial) distribution.  I am using the mle2() function and 
feeding it a time series dataset composed of four columns: time point, number 
of successes in category 1, number of successes in category 2, and number of 
success in category 3.  The column headers are: t, n1, n2, and n3.

The mle2() function converges occasionally, and I need to improve the rate of convergence when used 
in a stochastic simulation, with multiple stochastically generated datasets.  When mle2() does not 
converge, it returns an error: "Error in optim(par = c(2, 2, 0.001, 0.001), fn = function (p) 
: L-BFGS-B needs finite values of 'fn'."  I am using the L-BFGS-B optimization method with a 
lower box constraint of zero for all four parameters.  While I do not know any theoretical upper 
limit(s) to the parameter values, I have not seen any parameter estimates above 2 when using 
empirical data.  It seems that when I start with certain 'true' parameter values, the rate of 
convergence is quite high, whereas other "true" parameter values are very difficult to 
estimate.  For example, the true parameter values p1 = 2, p2 = 2, mu1 = 0.001, mu2 = 0.001 causes 
convergence problems, but the parameter values p1 = 0.3, p2 = 0.3, mu1 = 0.08, mu2 = 0.08 lead to 
high convergence rate.  I've chos!

e!

! n these two sets of values because they represent the upper and lower 
estimates of parameter values derived from graphical methods.


First, do you have any suggestions on how to improve the rate of convergence and avoid 
the "finite values of 'fn'" error?  Perhaps it has to do with the true 
parameter values being so close to the boundary?  If so, any suggestions on how to 
estimate parameter values that are near zero?

Here is reproducible and relevant code from my stochastic simulation:

########################################################################
library(bbmle)
library(combinat)

# define multinomial distribution
dmnom2 <- function(x,prob,log=FALSE) {
  r <- lgamma(sum(x) + 1) + sum(x * log(prob) - lgamma(x + 1))
  if (log) r else exp(r)
}

# vector of time points
tv <- 1:20

# Negative log likelihood function
NLL.func <- function(p1, p2, mu1, mu2, y){
  t <- y$tv
  n1 <- y$n1
  n2 <- y$n2
  n3 <- y$n3
  P1 <- (p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
    mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))) -
    exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
    mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
    2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
    sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
    sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P2 <- (p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
    mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))) -
    exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
    mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
    2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
    sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
    exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
    4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
    sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))))
  P3 <- 1 - P1 - P2
  p.all <- c(P1, P2, P3)
  -sum(dmnom2(c(n1, n2, n3), prob = p.all, log = TRUE))
}

## Generate simulated data
# Model equations as expressions,
P1 <- expression((p1*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*((-mu2)*(mu2 - p1 + p2) +
  mu1*(mu2 + 2*p2)) - mu2*sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))) -
  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
  mu2*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu2*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))

P2 <- expression((p2*((-1 + exp(sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2)))*t))*(-mu1^2 + 2*mu2*p1 +
  mu1*(mu2 - p1 + p2)) - mu1*sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))) -
  exp(sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))*t)*
  mu1*sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2))) +
  2*exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)*mu1*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))/
  exp((1/2)*(mu1 + mu2 + p1 + p2 + sqrt((mu1 + mu2 + p1 + p2)^2 -
  4*(mu2*p1 + mu1*(mu2 + p2))))*t)/(2*(mu2*p1 + mu1*(mu2 + p2))*
  sqrt((mu1 + mu2 + p1 + p2)^2 - 4*(mu2*p1 + mu1*(mu2 + p2)))))

# True parameter values
p1t = 2; p2t = 2; mu1t = 0.001; mu2t = 0.001

# Function to calculate probabilities from 'true' parameter values
psim <- function(x){
  params <- list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t, t = x)
  eval.P1 <- eval(P1, params)
  eval.P2 <- eval(P2, params)
  P3 <- 1 - eval.P1 - eval.P2
  c(x, matrix(c(eval.P1, eval.P2, P3), ncol = 3))
}
pdat <- sapply(tv, psim, simplify = TRUE)
Pdat <- as.data.frame(t(pdat))
names(Pdat) <- c("time", "P1", "P2", "P3")

# Generate simulated data set from probabilities
n = rep(20, length(tv))
p = as.matrix(Pdat[,2:4])
y <- as.data.frame(rmultinomial(n,p))
yt <- cbind(tv, y)
names(yt) <- c("tv", "n1", "n2", "n3")

# mle2 call
mle.fit <- mle2(NLL.func, data = list(y = yt),
                start = list(p1 = p1t, p2 = p2t, mu1 = mu1t, mu2 = mu2t),
                control = list(maxit = 5000, factr = 1e-10, lmm = 17),
                method = "L-BFGS-B", skip.hessian = TRUE,
                lower = list(p1 = 0, p2 = 0, mu1 = 0, mu2 = 0))

###########################################################################

I interpret the error as having to do with the finite difference approximation 
failing.  If so, perhaps a gradient function would help? If you agree, I've 
described my unsuccessful attempt at writing a gradient function below.  If a 
gradient function is unnecessary, ignore the remainder of this message.


After playing with your function, I can't agree with your interpretation of 
what could be wrong.
During optim iterations your function is dmnom2 is getting negative values for 
prob and that leads to the error messages.
I checked this by inserting the following lines in NLL.func after the 
assignment to p.all:

  cat("NLL.func p.all {P1,P2,P3}\n")
  print(matrix(p.all, ncol=3))

At some stage entries for P1, P2, P3 become negative (which ones and how many 
depends on the random number generator).
Try set.seed(1), set.seed(11) and set.seed(413) to see what happens.

The expressions are too complicated for further analysis.
Assuming your expressions are correct, you will need to restrict P1,P2,P3 to 
take on valid values.

Berend

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger

--
Adam Zeilinger
Post Doctoral Scholar
Department of Entomology
University of California Riverside
www.linkedin.com/in/adamzeilinger

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with convergence in mle2/optim function

Reply via email to