Re: [R] optim bug/help?
Hi Kevin,

There is no substitute for knowing your problem as well as you possibly can (I know this can be challenging in complicated, real problems). If you plot the contours of the function, you will see that they are banana-shaped around the solution: the objective function is not very sensitive to changes along the "length" of the banana, but changes rapidly in the perpendicular direction. A concise way of saying all this is that the problem is ill-conditioned.

A consequence of such ill-conditioning is that the numerical solution becomes highly sensitive to perturbations in the "data", where "data" includes all the parameters (related to both the function and the numerical algorithm) that are required to solve the problem. The step size used in the finite-difference approximation of the gradient is one such datum with a significant impact on the solution, as Ben pointed out, precisely because of this ill-conditioning. The starting value is another crucial piece of data, as Patrick pointed out.

In the Rosenbrock example, the ill-conditioning is due to the scaling factor "100" in the function; increase it and the problem becomes more severe. One way to get a sense of the degree of ill-conditioning is to look at the ratio of the largest to the smallest eigenvalue of the inverse of the Hessian matrix:

ans <- optim(c(-1, 1), fr, grr, method = "BFGS", hessian = TRUE)
## grr is the analytic gradient from the ?optim example (frg in Ben's message below)
eigen(solve(ans$hessian))$values

Hope this is helpful,
Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: [EMAIL PROTECTED]
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED]
Sent: Thursday, October 23, 2008 4:03 PM
To: [EMAIL PROTECTED]; Ben Bolker
Subject: Re: [R] optim bug/help?

I was just trying to get a feel for the general strengths and weaknesses of the algorithms. This is obviously a "demo" function and probably not representative of the "real" optimization problems that I will face. My concern was that if there is inaccuracy when I know the answer, it might be worse when I don't, if I don't understand the characteristics of the algorithms involved.

Kevin

Ben Bolker <[EMAIL PROTECTED]> wrote:
> [Ben's reply and code snipped; quoted in full in the original message at the bottom of this thread.]
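To make the ill-conditioning visible, here is a minimal sketch (not from the original messages) of the contour plot and eigenvalue check Ravi describes; it assumes the fr and frg definitions from Ben Bolker's message at the bottom of the thread.

## Sketch: draw the banana-shaped contours and measure conditioning.
## Assumes fr (objective) and frg (gradient) from Ben's message below.
s1 <- seq(-2, 2, length.out = 200)
s2 <- seq(-1, 3, length.out = 200)
z  <- outer(s1, s2, function(a, b) 100 * (b - a^2)^2 + (1 - a)^2)
contour(s1, s2, z, levels = c(0.5, 2, 10, 50, 250, 1000),
        xlab = "x1", ylab = "x2")
points(1, 1, pch = 19)   ## the true minimum at (1, 1)

## Condition number: ratio of the largest to the smallest eigenvalue of
## the Hessian at the solution (the inverse Hessian gives the same ratio).
ans <- optim(c(-1.2, 1), fr, frg, method = "BFGS", hessian = TRUE)
ev  <- eigen(ans$hessian)$values
max(ev) / min(ev)   ## on the order of 2.5e3: markedly ill-conditioned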
Re: [R] optim bug/help?
[EMAIL PROTECTED] wrote:
> I was just trying to get a feel for the general strengths and weaknesses of the algorithms. This is obviously a "demo" function and probably not representative of the "real" optimization problems that I will face. My concern was that if there is inaccuracy when I know the answer, it might be worse when I don't, if I don't understand the characteristics of the algorithms involved.

One approach to this problem is to do the optimization several times with different (random?) starts.

Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")

> [rest of the quoted thread snipped; see the original messages below.]
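A rough sketch of that multi-start idea (again assuming fr and frg from Ben Bolker's message at the bottom of the thread):

## Sketch: rerun optim() from random starting points and keep the best
## result; agreement across starts is reassuring, disagreement a warning.
set.seed(42)
starts <- matrix(runif(20, min = -2, max = 2), ncol = 2)  ## 10 random 2-D starts
fits <- lapply(seq_len(nrow(starts)), function(i)
    optim(starts[i, ], fr, gr = frg, method = "BFGS"))
values <- sapply(fits, `[[`, "value")
fits[[which.min(values)]]$par   ## best run; should land near c(1, 1)
range(values)                   ## a wide spread would hint at trouble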
Re: [R] optim bug/help?
I was just trying to get a feel for the general strengths and weaknesses of the algorithms. This is obviously a "demo" function and probably not representative of the "real" optimization problems that I will face. My concern was that if there is inaccuracy when I know the answer, it might be worse when I don't, if I don't understand the characteristics of the algorithms involved.

Kevin

Ben Bolker <[EMAIL PROTECTED]> wrote:
> [Ben's reply quoted in full in the original message below.]
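Even when the true answer is not known, the optimality conditions at a returned solution can be checked directly; a minimal sketch, assuming fr and frg from Ben's message below:

## Sketch: sanity checks for a solution when the true optimum is unknown.
ans <- optim(c(-1.2, 1), fr, gr = frg, method = "BFGS", hessian = TRUE)
frg(ans$par)                ## analytic gradient: should be near zero
eigen(ans$hessian)$values   ## all positive => local minimum, not a saddle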
Re: [R] optim bug/help?
charter.net> writes:

> In the documentation for 'optim' it gives the following function:
>
> fr <- function(x) {   ## Rosenbrock Banana function
>     x1 <- x[1]
>     x2 <- x[2]
>     100 * (x2 - x1 * x1)^2 + (1 - x1)^2
> }
> optim(c(-1.2, 1), fr)
>
> When I run this code I get:
>
> $par
> [1] 1.000260 1.000506
>
> I am sure I am missing something, but why isn't 1,1 a better answer? If I
> plug 1,1 into the function, it seems that the function is zero, whereas the
> answer given yields a function result of 8.82e-8. This was after 195 calls
> to the function (as reported by optim). The documentation indicates that
> 'reltol' is about 1e-8. Is this the limit that I am bumping up against?
>
> Kevin

Yes, this is basically just numeric fuzz, the bulk of which probably comes
from finite-difference evaluation of the derivative. As demonstrated below,
you can get a lot closer by defining an analytic gradient function.

May I ask why this level of accuracy is important? (Insert Tukey quotation
about approximate answers to the right question here ...)

Ben Bolker

-----

fr <- function(x) {   ## Rosenbrock Banana function
  x1 <- x[1]
  x2 <- x[2]
  100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

## analytic gradient of fr
frg <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  c(100 * (4 * x1^3 - 2 * x2 * 2 * x1) - 2 + 2 * x1,
    100 * (2 * x2 - 2 * x1^2))
}

## use numericDeriv to double-check my calculus
x1 <- 1.5
x2 <- 1.7
numericDeriv(quote(fr(c(x1, x2))), c("x1", "x2"))
frg(c(x1, x2))

##
optim(c(-1.2, 1), fr)                             ## Nelder-Mead (default)
optim(c(-1.2, 1), fr, method = "BFGS")            ## BFGS, numerical gradient
optim(c(-1.2, 1), fr, gr = frg, method = "BFGS")  ## BFGS, analytic gradient
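To quantify the gain from the analytic gradient, a short comparison sketch (not part of the original message) of how close each call gets to the known minimum at (1, 1), including a run with a tightened 'reltol':

## Sketch: accuracy of the calls above, plus a tighter tolerance.
## Assumes fr and frg as defined in this message.
true_par <- c(1, 1)
f1 <- optim(c(-1.2, 1), fr)                              ## Nelder-Mead
f2 <- optim(c(-1.2, 1), fr, gr = frg, method = "BFGS")   ## analytic gradient
f3 <- optim(c(-1.2, 1), fr, gr = frg, method = "BFGS",
            control = list(reltol = 1e-14, maxit = 1000))
sapply(list(f1, f2, f3), function(f) max(abs(f$par - true_par)))
## Expect the error to shrink from around 1e-4 toward machine precision.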