Re: [R] optim bug/help?

2008-10-23 Thread Ravi Varadhan
Hi Kevin,

There is no substitute for knowing your problem as well as you possibly can
(I know this can be challenging in complicated, real problems).  If you plot
the contours of the function, you will see that they are banana-shaped
around the solution, i.e., the objective function is not very sensitive to
changes along the "length" of the banana, but changes rapidly in the
perpendicular direction.  A concise way of saying all this is that the
problem is ill-conditioned.  A consequence of such ill-conditioning is that
the numerical solution becomes highly sensitive to perturbations in the
"data", where "data" here includes all the parameters (related to both the
function and the numerical algorithm) that are required to solve the
problem.  The step size used to compute the finite-difference approximation
of the gradient is one such datum that has a significant impact on the
solution, as Ben pointed out, precisely because of this ill-conditioning.
The starting value is another crucial piece of data, as Patrick pointed out.
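
For example, a quick contour plot makes the banana-shaped valley visible
(just a sketch; the grid and the contour levels below are arbitrary choices):

 x1 <- seq(-2, 2, length.out = 200)
 x2 <- seq(-1, 3, length.out = 200)
 z  <- outer(x1, x2, function(a, b) 100 * (b - a * a)^2 + (1 - a)^2)
 contour(x1, x2, z, levels = c(0.5, 2, 10, 50, 200, 1000),
         xlab = "x1", ylab = "x2")
 points(1, 1, pch = 19)   ## the true minimum at (1, 1)

(In optim(), the finite-difference step size is controlled by control$ndeps,
which defaults to 1e-3 for each parameter.)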

In this example, the ill-conditioning comes from the scaling factor "100" in
the Rosenbrock function.  Increase that factor, and the problem becomes more
severe.
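
A quick way to see this (only a sketch; 1000 is just an arbitrary larger
factor, "frbig" is an ad-hoc name, and fr is the Rosenbrock function from
?optim):

 frbig <- function(x) 1000 * (x[2] - x[1] * x[1])^2 + (1 - x[1])^2
 optim(c(-1.2, 1), fr)$par      ## original scaling (factor 100)
 optim(c(-1.2, 1), frbig)$par   ## compare how far each lands from c(1, 1)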

One way to get a sense of the degree of ill-conditioning is to look at the
ratio of the largest to the smallest eigenvalue of the inverse of the Hessian
matrix (i.e., its condition number).

 ans <- optim(c(-1, 1), fr, grr, method = "BFGS", hessian = TRUE)  ## fr, grr as in ?optim
 eigen(solve(ans$hessian))$values   ## eigenvalues of the inverse Hessian
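
The ratio itself can be computed directly; for a symmetric positive-definite
Hessian it coincides with the 2-norm condition number reported by kappa()
with exact = TRUE (a small aside, not essential):

 ev <- eigen(solve(ans$hessian))$values
 max(ev) / min(ev)                  ## ratio of largest to smallest eigenvalue
 kappa(ans$hessian, exact = TRUE)   ## the same quantity via kappa()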

Hope this is helpful,
Ravi.


---

Ravi Varadhan, Ph.D.

Assistant Professor, The Center on Aging and Health

Division of Geriatric Medicine and Gerontology 

Johns Hopkins University

Ph: (410) 502-2619

Fax: (410) 614-9625

Email: [EMAIL PROTECTED]

Webpage:  http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html

 





-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of [EMAIL PROTECTED]
Sent: Thursday, October 23, 2008 4:03 PM
To: [EMAIL PROTECTED]; Ben Bolker
Subject: Re: [R] optim bug/help?

I was just trying to get a feel for the general strengths and weaknesses of
the algorithms. This is obviously a "demo" function and probably not
representative of the "real" optimization problems that I will face. My
concern was that if there is inaccuracy when I know the answer, it might be
worse when I don't, if I don't understand the characteristics of the
algorithms involved.

Kevin


Re: [R] optim bug/help?

2008-10-23 Thread Patrick Burns

[EMAIL PROTECTED] wrote:

I was just trying to get a feel for the general strengths and weaknesses of
the algorithms. This is obviously a "demo" function and probably not
representative of the "real" optimization problems that I will face. My
concern was that if there is inaccuracy when I know the answer, it might be
worse when I don't, if I don't understand the characteristics of the
algorithms involved.


One approach to this problem is to do the optimization
several times with different (random?) starts.
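
For example, something along these lines (only a sketch; the number of starts
and the sampling box are arbitrary, and fr is the Rosenbrock function from
?optim):

set.seed(1)                                   ## reproducible random starts
starts <- matrix(runif(20, -2, 2), ncol = 2)  ## ten random starting points in [-2, 2]^2
fits   <- lapply(seq_len(nrow(starts)),
                 function(i) optim(starts[i, ], fr, method = "BFGS"))
vals   <- sapply(fits, function(f) f$value)
fits[[which.min(vals)]]$par                   ## best of the ten runs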


Patrick Burns
[EMAIL PROTECTED]
+44 (0)20 8525 0696
http://www.burns-stat.com
(home of S Poetry and "A Guide for the Unwilling S User")



Re: [R] optim bug/help?

2008-10-23 Thread rkevinburton
I was just trying to get a feel for the general strengths and weaknesses of
the algorithms. This is obviously a "demo" function and probably not
representative of the "real" optimization problems that I will face. My
concern was that if there is inaccuracy when I know the answer, it might be
worse when I don't, if I don't understand the characteristics of the
algorithms involved.

Kevin

 Ben Bolker <[EMAIL PROTECTED]> wrote: 
>   Yes, this is basically just numeric fuzz, the
> bulk of which probably comes from finite-difference
> evaluation of the derivative.
> As demonstrated below, you can get a lot closer
> by defining an analytic gradient function.
> May I ask why this level of accuracy is important?
> (Insert Tukey quotation about approximate answers
> to the right question here ...)
> 
>   Ben Bolker



Re: [R] optim bug/help?

2008-10-22 Thread Ben Bolker
  charter.net> writes:

> 
> In the documentation for 'optim' it gives the following function:
> 
> fr <- function(x) {   ## Rosenbrock Banana function
> x1 <- x[1]
> x2 <- x[2]
> 100 * (x2 - x1 * x1)^2 + (1 - x1)^2
> }
> optim(c(-1.2,1), fr)
> 
> When I run this code I get:
> 
> $par
> [1] 1.000260 1.000506
> 
> I am sure I am missing something but why isn't 1,1 a better answer? If I plug
> 1,1 in the function it seems that the function is zero. Whereas the answer
> given gives a function result of 8.82e-8. This was after 195 calls to the
> function (as reported by optim). The documentation indicates that the
> 'reltol' is about 1e-8. Is this the limit that I am bumping up against?
> 
> Kevin
> 

  Yes, this is basically just numeric fuzz, the
bulk of which probably comes from finite-difference
evaluation of the derivative.
As demonstrated below, you can get a lot closer
by defining an analytic gradient function.
May I ask why this level of accuracy is important?
(Insert Tukey quotation about approximate answers
to the right question here ...)

  Ben Bolker
-

fr <- function(x) {   ## Rosenbrock Banana function
x1 <- x[1]
x2 <- x[2]
100 * (x2 - x1 * x1)^2 + (1 - x1)^2
}

## gradient function
frg <- function(x) {
  x1 <- x[1]
  x2 <- x[2]
  c(100*(4*x1^3-2*x2*2*x1)-2+2*x1,
100*(2*x2-2*x1^2))
}

## use numericDeriv to double-check my calculus
x1 <- 1.5
x2 <- 1.7
numericDeriv(quote(fr(c(x1,x2))),c("x1","x2"))
frg(c(x1,x2))
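## (optional cross-check: deriv() can also build the gradient symbolically;
##  the result should agree with frg above)
frd <- deriv(~ 100 * (x2 - x1 * x1)^2 + (1 - x1)^2, c("x1", "x2"),
             function.arg = TRUE)
attr(frd(x1, x2), "gradient")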

## compare: Nelder-Mead, BFGS with finite-difference gradient, BFGS with analytic gradient
optim(c(-1.2,1), fr)  ## Nelder-Mead
optim(c(-1.2,1), fr,method="BFGS")
optim(c(-1.2,1), fr, gr=frg,method="BFGS")  ## use gradient
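
If you do want the derivative-free Nelder-Mead run itself to land closer to
c(1,1), tightening the convergence tolerance usually helps somewhat, at the
cost of more function evaluations (a rough sketch; the particular tolerance
and iteration limit are arbitrary):

optim(c(-1.2,1), fr, control = list(reltol = 1e-12, maxit = 5000))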

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.