This is not the proper venue for a discussion of the mathematics of optimization, no matter that it is interesting. Please take it off list.
Cheers, Bert Bert Gunter Genentech Nonclinical Biostatistics (650) 467-7374 "Data is not information. Information is not knowledge. And knowledge is certainly not wisdom." Clifford Stoll On Fri, Feb 20, 2015 at 6:03 AM, Doran, Harold <hdo...@air.org> wrote: > John et al > > Thank you for your advice below. It was sloppy of me not to verify my > reproducible code below. I have tried a few of your suggestions and wrapped > the working code into the function below called pl2. The function properly > lands on the right model parameters when I use the optim or nlminb (for > nlminb I had to increase max iterations). > > The function is enormously slow. At first, I created the object rr1 with two > calls to sapply(). This works, but creates an extremely large matrix at each > iteration. > > library(statmod) > dat <- replicate(20, sample(c(0,1), 2000, replace = T)) > a <- b <- rep(1, 20) > Q <- 10 > qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1) > nds <- qq$nodes > wts <- qq$weights > > rr1 <- sapply(1:nrow(dat), function(j) > sapply(1:Q, function(i) > > exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a * (qq$nodes[i] - b))), log = > TRUE))) * qq$weights[i])) > > So, I thought to reduce some memory, I would do it this way which is > equivalent, doesn't create such a large matrix, but instead uses an explicit > loop. Both approaches are still equally as slow. > > rr1 <- numeric(nrow(dat)) > for(j in 1:length(rr1)){ > rr1[j] <- sum(sapply(1:Q, > function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a > * (nds[i] - b))), log = TRUE))) * wts[i])) > } > > As you noted, my likelihood is not complex; in fact I have another program > that uses newton-raphson with the analytic first and second derivatives > because they are so easy to find. In that program, the model converges very > (very) quickly. My purpose in using numeric differentiation is experiential > in some respects and hoping to apply this to problems for which the analytic > derivatives might not be so easy to come by. > > I think the basic idea here to improve speed is to make a call to the > gradient, which I understand to be the vector of first derivatives of my > likelihood function, is that right? If that is right, in a multi-parameter > problem, I'm not sure how to think about the gradient function. Since I am > maximizing w.r.t. a and b (these are the parameters of the model), I would > have a vector of first partials for a and another for b. So I conceptually do > not understand what the gradient would be in this instance, perhaps some > clarification would be helpful. > > Below is the working function, which as I noted is enormously slow. Any > advice on speed improvements here would be helpful. Thank you > > pl2 <- function(dat, Q, startVal = NULL, ...){ > if(!is.null(startVal) && length(startVal) != ncol(dat) ){ > stop("Length of argument startVal not equal > to the number of parameters estimated") > } > if(!is.null(startVal)){ > startVal <- startVal > } else { > p <- colMeans(dat) > startValA <- rep(1, ncol(dat)) > startValB <- as.vector(log((1 - p)/p)) > startVal <- c(startValA,startValB) > } > rr1 <- numeric(nrow(dat)) > qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma=1) > nds <- qq$nodes > wts <- qq$weights > dat <- as.matrix(dat) > fn <- function(params){ > a <- params[1:20] > b <- params[21:40] > for(j in 1:length(rr1)){ > rr1[j] <- sum(sapply(1:Q, > function(i) exp(sum(dbinom(dat[j,], 1, 1/ (1 + exp(- 1.7 * a > * (nds[i] - b))), log = TRUE))) * wts[i])) > } > -sum(log(rr1)) > } > #opt <- optim(startVal, fn, method = "BFGS", hessian = TRUE) > opt <- nlminb(startVal, fn) > #opt <- Rcgmin(startVal, fn) > opt > #list("coefficients" = opt$par, "LogLik" = -opt$value, > "Std.Error" = sqrt(diag(solve(opt$hessian)))) > } > > dat <- replicate(20, sample(c(0,1), 2000, replace = T)) > r2 <- pl2(datat, Q =10) > > -----Original Message----- > From: Prof J C Nash (U30A) [mailto:nas...@uottawa.ca] > Sent: Wednesday, February 18, 2015 9:07 AM > To: r-help@r-project.org; Doran, Harold > Subject: Re: [R] multiple parameter optimization with optim() > > Some observations -- no solution here though: > > 1) the code is not executable. I tried. Maybe that makes it reproducible! > Typos such as "stat mod", undefined Q etc. > > 2) My experience is that any setup with a ?apply approach that doesn't then > check to see that the structure of the data is correct has a high probability > of failure due to mismatch with the optimizer requirements. > It's worth being VERY pedestrian in setting up optimization functions and > checking obsessively that you get what you expect and that there are no > regions you are likely to wander into with divide by 0, log(negative), etc. > > 3) optim() is a BAD choice here. I wrote the source for three of the codes, > and the one most appropriate for many parameters (CG) I have been deprecating > for about 30 years. Use Rcgmin or something else instead. > > 4) If possible, analytic gradients are needed for CG like codes. You probably > need to dig out some source code for dbinom() to do this, but your function > is not particularly complicated, and doesn't have "if" > statements etc. However, you could test a case using the numDeriv gradient > that is an option for Rcgmin, but it will be painfully slow. > For a one-off computation, that may still be acceptable. > > JN > > On 15-02-18 06:00 AM, r-help-requ...@r-project.org wrote: >> Message: 37 >> Date: Tue, 17 Feb 2015 23:03:24 +0000 >> From: "Doran, Harold" <hdo...@air.org> >> To: "r-help@r-project.org" <r-help@r-project.org> >> Subject: [R] multiple parameter optimization with optim() >> Message-ID: <d10931e1.23c0e%hdo...@air.org> >> Content-Type: text/plain; charset="UTF-8" >> >> I am trying to generalize a working piece of code for a single parameter to >> a multiple parameter problem. Reproducible code is below. The parameters to >> be estimated are a, b, and c. The estimation problem is such that there is >> one set of a, b, c parameters for each column of the data. Hence, in this >> sample data with 20 columns, there are 20 a params, 20 b-params, and 20 >> c-params. >> >> Because I am estimating so many parameters, I am not certain that I have >> indicated to the function properly the right number of params to estimate >> and also if I have generated starting values in a sufficient way. >> >> Thanks for any help. >> Harold >> >> dat <- replicate(20, sample(c(0,1), 2000, replace = T)) library(stat >> mod) qq <- gauss.quad.prob(Q, dist = 'normal', mu = 0, sigma = 1) nds >> <- qq$nodes wts <- qq$weights fn <- function(params){ a <- >> params[1:ncol(dat)] b <- params[1:ncol(dat)] c <- params[1:ncol(dat)] >> L <- sapply(1:ncol(dat), function(i) dbinom(dat[,i], 1, c + ((1 - >> c)/(1 + exp(-1.7 * a * (nds[i] - b)))) * wts[i])) >> r1 <- prod(colSums(L * wts)) >> -log(r1) >> } >> startVal <- rep(.5, ncol(dat)) >> opt <- optim(startVal, fn) >> >> [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.