This is not the proper venue for a discussion of the mathematics of
optimization, however interesting it may be. Please take it off
list.

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
Clifford Stoll




On Fri, Feb 20, 2015 at 6:03 AM, Doran, Harold <hdo...@air.org> wrote:
> John et al
>
> Thank you for your advice below. It was sloppy of me not to verify that my
> code below was reproducible. I have tried a few of your suggestions and
> wrapped the working code into the function below, called pl2. The function
> properly lands on the right model parameters when I use either optim() or
> nlminb() (for nlminb I had to increase the maximum number of iterations).
>
> The function is enormously slow. At first, I created the object rr1 with
> two nested calls to sapply(). This works, but creates an extremely large
> matrix at each iteration.
>
> library(statmod)
> dat <- replicate(20, sample(c(0,1), 2000, replace = TRUE))
> a <- b <- rep(1, 20)
> Q <- 10
> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
> nds <- qq$nodes
> wts <- qq$weights
>
> rr1 <- sapply(1:nrow(dat), function(j)
>     sapply(1:Q, function(i)
>         exp(sum(dbinom(dat[j, ], 1,
>                        1 / (1 + exp(-1.7 * a * (qq$nodes[i] - b))),
>                        log = TRUE))) * qq$weights[i]))
>
> So, to reduce memory use, I thought I would do it this way, which is
> equivalent and doesn't create such a large matrix, but instead uses an
> explicit loop. Both approaches are equally slow.
>
> rr1 <- numeric(nrow(dat))
> for (j in 1:length(rr1)) {
>     rr1[j] <- sum(sapply(1:Q, function(i)
>         exp(sum(dbinom(dat[j, ], 1,
>                        1 / (1 + exp(-1.7 * a * (nds[i] - b))),
>                        log = TRUE))) * wts[i]))
> }
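>
> A third version I sketched, which I believe computes the same rr1 with no
> per-row loop at all, rewrites the Bernoulli log-likelihood as matrix
> products (this assumes dat is strictly a 0/1 matrix; I would appreciate a
> check of the algebra):
>
> P <- sapply(nds, function(x) 1 / (1 + exp(-1.7 * a * (x - b))))  # items x Q
> logL <- dat %*% log(P) + (1 - dat) %*% log(1 - P)                # persons x Q
> rr1 <- as.vector(exp(logL) %*% wts)                              # quadrature sum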
>
> As you noted, my likelihood is not complex; in fact I have another program
> that uses Newton-Raphson with the analytic first and second derivatives,
> because they are so easy to find, and in that program the model converges
> very (very) quickly. My purpose in using numerical differentiation is
> partly experiential, and I hope to apply this approach to problems for
> which the analytic derivatives might not be so easy to come by.
>
> I think the basic idea for improving speed is to supply the gradient, which
> I understand to be the vector of first derivatives of my likelihood
> function. Is that right? If so, I am not sure how to think about the
> gradient function in a multi-parameter problem. Since I am maximizing
> w.r.t. a and b (these are the parameters of the model), I would have a
> vector of first partials for a and another for b, so I do not conceptually
> understand what the gradient would be in this instance; some clarification
> would be helpful.
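>
> My tentative understanding, which I would appreciate having checked: the
> optimizer wants a single gradient function that takes the full parameter
> vector and returns one numeric vector of the same length -- here the 20
> partials for a followed by the 20 partials for b, ordered exactly like
> params. If that is right, then a purely numerical stand-in (slow, but
> useful for testing, using the fn defined in pl2 below) might be:
>
> library(numDeriv)
> gr <- function(params) grad(fn, params)   # length-40 vector: c(d/da, d/db)
> # opt <- nlminb(startVal, fn, gradient = gr)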
>
> Below is the working function, which, as I noted, is enormously slow. Any
> advice on speed improvements here would be helpful. Thank you.
>
> pl2 <- function(dat, Q, startVal = NULL, ...){
>     # a and b are estimated per column, so there are 2 * ncol(dat) parameters
>     if (!is.null(startVal) && length(startVal) != 2 * ncol(dat)) {
>         stop("Length of argument startVal not equal to the number of parameters estimated")
>     }
>     if (is.null(startVal)) {
>         p <- colMeans(dat)
>         startValA <- rep(1, ncol(dat))
>         startValB <- as.vector(log((1 - p) / p))
>         startVal <- c(startValA, startValB)
>     }
>     rr1 <- numeric(nrow(dat))
>     qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
>     nds <- qq$nodes
>     wts <- qq$weights
>     dat <- as.matrix(dat)
>     nItems <- ncol(dat)
>     fn <- function(params){
>         a <- params[1:nItems]
>         b <- params[(nItems + 1):(2 * nItems)]
>         for (j in 1:length(rr1)) {
>             rr1[j] <- sum(sapply(1:Q, function(i)
>                 exp(sum(dbinom(dat[j, ], 1,
>                                1 / (1 + exp(-1.7 * a * (nds[i] - b))),
>                                log = TRUE))) * wts[i]))
>         }
>         -sum(log(rr1))
>     }
>     # opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE)
>     opt <- nlminb(startVal, fn)
>     # opt <- Rcgmin(startVal, fn)
>     opt
>     # list("coefficients" = opt$par, "LogLik" = -opt$value,
>     #      "Std.Error" = sqrt(diag(solve(opt$hessian))))
> }
>
> dat <- replicate(20, sample(c(0,1), 2000, replace = TRUE))
> r2 <- pl2(dat, Q = 10)
>
> -----Original Message-----
> From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca]
> Sent: Wednesday, February 18, 2015 9:07 AM
> To: r-help@r-project.org; Doran, Harold
> Subject: Re: [R] multiple parameter optimization with optim()
>
> Some observations -- no solution here though:
>
> 1) the code is not executable. I tried. Maybe that makes it reproducible!
>     Typos such as "stat mod", undefined Q etc.
>
> 2) My experience is that any setup using an ?apply approach that doesn't
> then check that the structure of the data is correct has a high probability
> of failure due to a mismatch with the optimizer's requirements.
> It's worth being VERY pedestrian in setting up optimization functions and
> checking obsessively that you get what you expect and that there are no
> regions you are likely to wander into with divide-by-0, log(negative), etc.
> (e.g. the sketch just below).
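>
> A minimal pre-flight check in that spirit (a sketch only, reusing the
> poster's fn and startVal) traps most of these problems before the
> optimizer ever runs:
>
> stopifnot(length(startVal) == 2 * ncol(dat))  # one a and one b per column
> f0 <- fn(startVal)
> stopifnot(is.finite(f0))                      # catches log(0), NaN, Inf up front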
>
> 3) optim() is a BAD choice here. I wrote the source for three of the codes, 
> and the one most appropriate for many parameters (CG) I have been deprecating 
> for about 30 years. Use Rcgmin or something else instead.
>
> 4) If at all possible, analytic gradients are needed for CG-like codes. You
> probably need to dig into the source code for dbinom() to derive them, but
> your function is not particularly complicated and doesn't have "if"
> statements etc. However, you could test a case using the numDeriv gradient
> that is an option for Rcgmin, though it will be painfully slow.
> For a one-off computation that may still be acceptable; see the sketch below.
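>
> Something along these lines (a sketch only; Rcgmin takes a user-supplied
> gr function, and numDeriv::grad is one way to build it numerically):
>
> library(Rcgmin)
> library(numDeriv)
> grn <- function(params) grad(fn, params)  # numerical gradient of the objective
> opt <- Rcgmin(startVal, fn, gr = grn)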
>
> JN
>
> On 15-02-18 06:00 AM, r-help-requ...@r-project.org wrote:
>> Message: 37
>> Date: Tue, 17 Feb 2015 23:03:24 +0000
>> From: "Doran, Harold" <hdo...@air.org>
>> To: "r-help@r-project.org" <r-help@r-project.org>
>> Subject: [R] multiple parameter optimization with optim()
>> Message-ID: <d10931e1.23c0e%hdo...@air.org>
>> Content-Type: text/plain; charset="UTF-8"
>>
>> I am trying to generalize a working piece of code for a single parameter to
>> a multiple-parameter problem. Reproducible code is below. The parameters to
>> be estimated are a, b, and c. The estimation problem is such that there is
>> one set of a, b, c parameters for each column of the data. Hence, in this
>> sample data with 20 columns, there are 20 a-params, 20 b-params, and 20
>> c-params.
>>
>> Because I am estimating so many parameters, I am not certain that I have
>> properly indicated to the function the right number of parameters to
>> estimate, or whether I have generated sensible starting values.
>>
>> Thanks for any help.
>> Harold
>>
>> dat <- replicate(20, sample(c(0,1), 2000, replace = T))
>> library(stat mod)
>> qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1)
>> nds <- qq$nodes
>> wts <- qq$weights
>> fn <- function(params){
>>     a <- params[1:ncol(dat)]
>>     b <- params[1:ncol(dat)]
>>     c <- params[1:ncol(dat)]
>>     L <- sapply(1:ncol(dat), function(i)
>>         dbinom(dat[, i], 1,
>>                c + ((1 - c) / (1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i]))
>>     r1 <- prod(colSums(L * wts))
>>     -log(r1)
>> }
>> startVal <- rep(.5, ncol(dat))
>> opt <- optim(startVal, fn)
>>
>

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
