Henry Wolkowicz (Google his page for many optimization references) mentioned to me that this function is a standard example used to show that first-order methods (e.g., steepest descent) can fail by repeatedly crossing back and forth over the valley, whereas second-order methods (damped Newton, damped quasi-Newton, trust region) use curvature information to converge in a few iterations.
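You can see the contrast directly in R with the banana() and Dbanana() defined in Spencer's message below. optim() has no pure steepest-descent option, so method="CG" (conjugate gradients, still a first-order method) stands in for it here, with "BFGS" (quasi-Newton) for comparison; the exact counts will vary, so treat this only as an illustration:

## assumes banana() and Dbanana() from the quoted message have been defined
fo <- optim(c(-1.2, 1), banana, Dbanana, method = "CG",
            control = list(maxit = 10000))
qn <- optim(c(-1.2, 1), banana, Dbanana, method = "BFGS")
fo$counts; fo$par   # first-order: many evaluations, may still fall short of (1, 1)
qn$counts; qn$par   # quasi-Newton: far fewer evaluations, should end up near (1, 1)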
On 12/4/05, Spencer Graves <[EMAIL PROTECTED]> wrote:
> GENERAL REFERENCE ON NONLINEAR OPTIMIZATION?
>
> What are your favorite references on nonlinear optimization? I like
> Bates and Watts (1988) Nonlinear Regression Analysis and Its
> Applications (Wiley), especially for its key insights regarding
> parameter effects vs. intrinsic curvature. Before I spend time and
> money on several of the references cited on the help pages for "optim",
> "nlm", etc., I thought I'd ask you all for your thoughts.
>
> ROSENBROCK'S BANANA VALLEY FUNCTION?
>
> Beyond this, I wonder if someone could help me understand the lessons one
> should take from Rosenbrock's banana valley function:
>
> banana <- function(x){
>   100*(x[2]-x[1]^2)^2+(1-x[1])^2
> }
>
> This is a quartic in x[1] and a parabola in x[2], with a unique minimum at
> x[1]=x[2]=1. Over the range (-1, 2)x(-1,1), it looks like a long,
> curved, deep, narrow banana-shaped valley. It is a known hard test problem
> in nonlinear optimization, but these difficulties don't affect "nlm" or
> "nlminb" until the hessian is provided analytically (with R 2.2.0 under
> Windows XP):
>
> nlm(banana, c(-1.2, 1))    # found the minimum in 23 iterations
> nlminb(c(-1.2, 1), banana) # found the min in 35 iterations
>
> Dbanana <- function(x){
>   c(-400*x[1]*(x[2] - x[1]^2) - 2*(1-x[1]),
>     200*(x[2] - x[1]^2))
> }
> banana1 <- function(x){
>   b <- 100*(x[2]-x[1]^2)^2+(1-x[1])^2
>   attr(b, "gradient") <- Dbanana(x)
>   b
> }
>
> nlm(banana1, c(-1.2, 1))            # solved the problem in 24 iterations
> nlminb(c(-1.2, 1), banana, Dbanana) # solution in 35 iterations
>
> D2banana <- function(x){
>   a11 <- (2 - 400*(x[2] - x[1]^2) + 800*x[2]*x[1]^2)
>   a21 <- (-400*x[1])
>   matrix(c(a11,a21,a21,200),2,2)
> }
> banana2 <- function(x){
>   b <- 100*(x[2]-x[1]^2)^2+(1-x[1])^2
>   attr(b, "gradient") <- Dbanana(x)
>   attr(b, "hessian") <- D2banana(x)
>   b
> }
>
> nlm(banana2, c(-1.2, 1))
> # Found the valley but not the minimum
> # in the default 100 iterations.
> nlm(banana2, c(-1.2, 1), iterlim=10000)
> # found the minimum to 3 significant digits in 5017 iterations.
>
> nlminb(c(-1.2, 1), banana, Dbanana, D2banana)
> # took 95 iterations to find the answer to double precision.
>
> To understand this better, I wrote my own version of "nlm" (see
> below), and learned that the hessian is often indefinite, with one
> eigenvalue positive and the other negative. If I understand correctly,
> a negative eigenvalue of the hessian can push the next step in a direction
> that increases rather than decreases the function. I tried a few things
> that accelerated the convergence slightly, but my "nlm." still had
> not converged after 100 iterations.
>
> What might be done to improve the performance of something like "nlm"
> without substantially increasing the overhead for other problems?
>
> Thanks.
> spencer graves
> #############################
> nlm. <- function(f=fgh, p=c(-1.2, 1),
>                  gradtol=1e-6, steptol=1e-6, iterlim=100){
>   # R code version of "nlm"
>   # requiring analytic gradient and hessian
>   #
>   # Initial evaluation
>   f.i <- f(p)
>   f0 <- f.i+1
>   # Iterate
>   for(i in 1:iterlim){
>     df <- attr(f.i, "gradient")
>     # Gradient sufficiently small?
>     if(sum(df^2)<(gradtol^2)){
>       return(list(minimum=f.i, estimate=p+dp,
>         gradient=df, hessian=d2f, code=1,
>         iterations=i))
>     }
>     #
>     d2f <- attr(f.i, "hessian")
>     dp <- (-solve(d2f, df))
>     # Step sufficiently small?
>     if(sum(dp^2)<(steptol^2)){
>       return(list(minimum=f.i, estimate=p+dp,
>         gradient=df, hessian=d2f, code=2,
>         iterations=i))
>     }
>     # Next iter
>     f0 <- f.i
>     f.i <- f(p+dp)
>     # Step size control
>     if(f.i>=f0){
>       for(j in 1:iterlim){
>         {
>           if(j==1){
>             d2f.eig <- eigen(d2f, symmetric=TRUE)
>             cat("\nstep size control; i=", i,
>               "; p=", round(p, 3), "; dp=", signif(dp, 2),
>               "; eig(hessian)=", signif(d2f.eig$values, 4))
>             v.max <- (1+max(abs(d2f.eig$values)))
>             v.adj <- pmax(.001*v.max, abs(d2f.eig$values))
>             evec.df <- (t(d2f.eig$vectors) %*% df)
>             dp <- (-(d2f.eig$vectors %*%
>               (evec.df/(1+v.adj))))
>           }
>           else{
>             cat(".")
>             dp <- dp/2
>           }
>         }
>         f.i <- f(p+dp)
>         f2 <- f(p+dp/2)
>         if(f2<f.i){
>           dp <- dp/2
>           f.i <- f2
>         }
>         if(f.i<f0)break # j
>       }
>       if(f.i>=f0){
>         cat("\n")
>         return(list(minimum=f0, estimate=p,
>           gradient=attr(f0, "gradient"),
>           hessian=attr(f0, "hessian"), code=3,
>           iterations=i))
>       }
>     }
>     p <- p+dp
>     cat(i, p, f.i, "\n")
>   }
>   return(list(minimum=f.i, estimate=p,
>     gradient=df, hessian=d2f, code=4,
>     iterations=i))
> }
>
> --
> Spencer Graves, PhD
> Senior Development Engineer
> PDF Solutions, Inc.
> 333 West San Carlos Street Suite 700
> San Jose, CA 95110, USA
>
> [EMAIL PROTECTED]
> www.pdf.com
> Tel: 408-938-4420
> Fax: 408-280-7915
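Regarding the observation that the hessian is often indefinite: that is easy to check directly with the D2banana() quoted above. At the usual starting point both eigenvalues happen to be positive, but at other points along the search path (for example c(0, 1)) one eigenvalue is negative, and then the raw Newton step -solve(d2f, df) need not be a descent direction:

eigen(D2banana(c(-1.2, 1)), symmetric = TRUE)$values   # both positive here
eigen(D2banana(c(0, 1)), symmetric = TRUE)$values      # one negative: indefinite
eigen(D2banana(c(-0.5, 1)), symmetric = TRUE)$values   # indefinite here as well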
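Also, before reading too much into the iteration counts with analytic derivatives, it is worth confirming that the coded gradient and hessian agree with finite differences at a few test points. A base-R sketch (the step size 1e-5, the test point, and the helper names num.grad/num.hess are mine, not from any package):

num.grad <- function(f, x, h = 1e-5){
  # central differences of f, one coordinate at a time
  sapply(seq_along(x), function(i){
    e <- replace(numeric(length(x)), i, h)
    (f(x + e) - f(x - e)) / (2 * h)
  })
}
num.hess <- function(g, x, h = 1e-5){
  # central differences of the analytic gradient g; column i is d g / d x[i]
  sapply(seq_along(x), function(i){
    e <- replace(numeric(length(x)), i, h)
    (g(x + e) - g(x - e)) / (2 * h)
  })
}
x0 <- c(0.5, 2)                                   # arbitrary test point
max(abs(Dbanana(x0) - num.grad(banana, x0)))      # should be tiny (~1e-6 or less)
max(abs(D2banana(x0) - num.hess(Dbanana, x0)))    # likewise, if the hessian is right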
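Finally, a minimal sketch of the "damped Newton" idea mentioned at the top: shift the hessian by mu*I until its smallest eigenvalue is safely positive, then backtrack on the step length until the function actually decreases. The name newton.mod, the constants, and the stopping rules are all mine (this is not how nlm or nlminb work internally), but every Newton direction is then a descent direction because H + mu*I is positive definite:

newton.mod <- function(f, gr, he, p, iterlim = 200,
                       gradtol = 1e-8, steptol = 1e-12){
  for(i in 1:iterlim){
    g <- gr(p)
    if(sum(g^2) < gradtol^2) break
    H <- he(p)
    # shift the hessian so its smallest eigenvalue is at least 1e-3
    ev <- eigen(H, symmetric = TRUE, only.values = TRUE)$values
    mu <- max(0, 1e-3 - min(ev))
    dp <- -solve(H + mu * diag(length(p)), g)
    # backtracking line search: halve the step until f decreases
    alpha <- 1
    while(f(p + alpha * dp) >= f(p) && alpha > steptol) alpha <- alpha / 2
    p <- p + alpha * dp
  }
  list(estimate = p, minimum = f(p), iterations = i)
}
newton.mod(banana, Dbanana, D2banana, c(-1.2, 1))  # untuned; just to show the idea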