Dear R experts,
I have been encountering problems with the "optim" routine using "for"
loops. I am determining the optimal parameters of several nested models by
minimizing the negative Log-Likelihood (NLL) of a dataset.
The aim is to find the model which best describes the data. To this end, I
am simulating artificial data sets based on the model with the least number
of parameters (6) and the parameters determined with the field data. For
each artificial set I estimate the parameters of the model with 6 parameters
and the next more complex model with 7 parameters (two of these parameters
are equal in the 6-parameter model) by minimizing the corresponding NLL with
"optim". In theory the 7-parameter model should fit the data either equally
or better than the 6-parameter model. Therefore the difference of the
minimal NLLs should be 0 or larger.
For 500 data sets I use the following code:
require(odesolve)
res=matrix(nrow=500,ncol=18)
colnames(res)=c("a_23","beta_23","mu_23","d_23","I_23","M_23","NLL_23",
"a_21","beta_21","mu_21","e_21","d_21","I_21","M_21","NLL_21",
"NLL23_min_21","conv23","conv21")
for (s in 1:500)
{
assign("data",read.table(paste("populations/TE23simset_",s,".txt",sep="")),e
nv=MyEnv) #reading a data set
M23=optim(rep(0.1,6),NLL23,method="L-BFGS-B",lower=0,
upper=c(Inf,Inf,Inf,Inf,1,1),control=c(maxit=150))
if (M23$convergence==0)
{
M21=optim(rep(0.1,7),NLL21,method="L-BFGS-B",lower=0,
upper=c(Inf,Inf,Inf,Inf,Inf,1,1),control=c(maxit=150))
}
res[s,1]=M23$par[1]
res[s,2]=M23$par[2]
res[s,3]=M23$par[3]
res[s,4]=M23$par[4]
res[s,5]=M23$par[5]
res[s,6]=M23$par[6]
res[s,7]=M23$value
res[s,8]=M21$par[1]
res[s,9]=M21$par[2]
res[s,10]=M21$par[3]
res[s,11]=M21$par[4]
res[s,12]=M21$par[5]
res[s,13]=M21$par[6]
res[s,14]=M21$par[7]
res[s,15]=M21$value
res[s,16]=M23$value-M21$value
res[s,17]=M23$convergence
res[s,18]=M21$convergence
write.table(res,"compare23_21TE061221.txt")
rm(M23,M21)
}
}
For some strange reason the results do not correspond to what I expect:
about 10% of the solutions have a difference of NLL smaller than 0. I have
verified the optimisation of these results manually and found that a minimal
NLL was ignored and a higher NLL was returned at "convergence". To check
what was happening I inserted a printing line in the NLL function to print
all parameters and the NLL as the procedure goes on. To my surprise "optim"
then stopped at the minimal NLL which had been ignored before. I have then
reduced the machine precision to .Machine$double.digits=8 thinking, that the
printing was slowing down the procedure and by reducing the machine
precision to speed up the calculations. For an individual calculation this
solved my problem. However if I implemented the same procedure in the loop
above, the same impossible results occurred again.
Can anyone tell me where I should be looking for the problem, or what it is
and how I could solve it?
Thanks a lot for your help
Sincerely
Simon
********************************************************************
Simon Ruegg
Dr.med.vet., PhD candidate
Institute for Parasitology
Winterthurstr. 266a
8057 Zurich
Switzerland
phone: +41 44 635 85 93
fax: +41 44 635 89 07
e-mail: [EMAIL PROTECTED]
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.