Re: [R] BFGS versus L-BFGS-B

2011-02-26 Thread Prof. John C Nash
Bert has put a finger on one of the key questions: Are we there yet?

Global optima are very difficult to find in many, possibly most, problems. We can check for
local optima. In the optimx package (currently being reset on CRAN due to a variable-naming
glitch with Rvmmin -- it should be back in a couple of days) we carry out the
Karush-Kuhn-Tucker (KKT) tests, but even these have scale dependencies. On R-forge there's
a new package called kktc, under our optimization and solving packages project, that
separates off the tests. However, that's only a start.
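
To make the scale dependence concrete, here is a minimal base-R sketch of the kind of gradient and curvature checks that KKT-style tests perform. The objective and tolerances are illustrative assumptions; the actual tests in kktc differ in detail.

```r
## First-order check: gradient ~ 0 at the candidate, judged on a scale
## involving the function value -- which is where scale dependence creeps in.
## Second-order check: Hessian (near) positive semidefinite.
fn <- function(x) sum((x - 2)^2)            # toy objective, minimum at (2, 2, 2)
xstar <- rep(2, 3)                          # candidate solution
h <- 1e-6
g <- vapply(seq_along(xstar), function(i) { # central-difference gradient
  e <- replace(numeric(length(xstar)), i, h)
  (fn(xstar + e) - fn(xstar - e)) / (2 * h)
}, numeric(1))
H <- optimHess(xstar, fn)                   # numeric Hessian from stats
kkt1 <- max(abs(g)) < 1e-5 * (1 + abs(fn(xstar)))       # gradient test
kkt2 <- all(eigen(H, symmetric = TRUE)$values > -1e-8)  # curvature test
```

Note how both tolerances must be judged relative to the scale of the function and parameters, which is exactly why these tests are only a start.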

Constraints require us to do more work, too.

A couple of very large-scale problems have been noted. The minimization of energy for
collections of atoms or molecules was suggested, but my experience with that type of
problem is that it has a huge number of local minima and is not amenable to the techniques
the original poster had in mind; certainly neither optim(BFGS), Rcgmin, nor
optim(L-BFGS-B) is appropriate as it stands, though they may be helpful as part of some
sort of polyalgorithm.

Also, as I mentioned in private emails but did not post, L-BFGS-B and Rcgmin should offer
much lower memory demands than optim(BFGS) for problems with large numbers of parameters,
since BFGS maintains an explicit inverse-Hessian approximation, i.e., n*n doubles to
store. BFGS and L-BFGS-B have very different internals.
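
The storage difference is easy to quantify in outline (illustrative arithmetic only; the actual implementations carry additional workspace):

```r
## Rough memory for the dense inverse-Hessian approximation in
## optim(method = "BFGS"): an n x n matrix of 8-byte doubles.
n <- 10000
bfgs_bytes <- n * n * 8
bfgs_bytes / 2^20          # ~762.9 MiB for the matrix alone

## L-BFGS-B instead keeps m correction pairs (two length-n vectors each);
## optim's default is m = 5 (the "lmm" control parameter).
m <- 5
lbfgsb_bytes <- 2 * m * n * 8
lbfgsb_bytes / 2^20        # ~0.76 MiB
```

So at 10,000 parameters the dense approximation alone is roughly three orders of magnitude larger than the limited-memory correction pairs.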

JN



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Bert Gunter
Thanks to all for clarifications. So I'm off base, but whether waaay
off base depends on whether there is a reasonably well defined optimum
to converge to. Which begs the question, I suppose: How does one know
whether one has converged to such an optimum?  This is always an
issue, of course, even for a few parameters; but maybe more so with so
many.

-- Bert




Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Ravi Varadhan
I have worked on a 2D image reconstruction problem in PET (positron emission
tomography) using a Poisson model.  Here, each pixel intensity is an unknown
parameter.  I have solved problems of size 128 x 128 using an accelerated EM
algorithm.  Ken Lange has shown that you can achieve term by term separation
using a minorization inequality, and hence the problem simplifies greatly.
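
For context, the separated update can be sketched as follows. This is a toy version of the classic ML-EM iteration for y ~ Poisson(A %*% lambda) with hypothetical sizes, not the actual reconstruction code described above.

```r
## Each intensity lambda[j] gets its own multiplicative update, so the
## high-dimensional problem separates term by term.
set.seed(1)
A <- matrix(runif(30), nrow = 6, ncol = 5)   # toy system matrix
lambda_true <- c(1, 2, 3, 2, 1)
y <- rpois(6, A %*% lambda_true)             # simulated detector counts
lambda <- rep(1, 5)                          # positive starting intensities
for (k in 1:500) {
  yhat <- as.vector(A %*% lambda)            # current fitted means
  lambda <- lambda * as.vector(t(A) %*% (y / yhat)) / colSums(A)
}
```

The update keeps all intensities nonnegative automatically and increases the Poisson likelihood monotonically, which is the practical payoff of the minorization argument.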

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology School of Medicine Johns
Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu




Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Prof. John C Nash
For functions that have a reasonable structure, i.e., one or at most a few optima, it is
certainly a sensible task. Separable functions are certainly nicer (10K one-dimensional
minimizations), but it is pretty easy to devise functions, e.g., generalizations of
Rosenbrock, Chebyquad and other functions, that are high-dimensional but not separable.

Admittedly, there are not a lot of real-world examples that are publicly available. More
would be useful.

JN





Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Bert Gunter
On Fri, Feb 25, 2011 at 12:00 PM, Brian Tsai  wrote:
> Hi John,
>
> Thanks so much for the informative reply!  I'm currently trying to optimize
> ~10,000 parameters simultaneously - for some reason,

-- Some expert (Ravi, John?) please correct me, but: Is the above not
complete nonsense? I can't imagine poking around usefully in
10K-dimensional space for an extremum, unless maybe one can find the
extremum by 10K separate 1-dim optimizations. And maybe not even then.

Am I way off base here, or has Brian merely described just another
inefficient way to produce random numbers?

-- Bert



Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Brian Tsai
Hi John,

Thanks so much for the informative reply!  I'm currently trying to optimize
~10,000 parameters simultaneously. For some reason, when I compare the
memory usage of L-BFGS-B and BFGS with all default input parameters,
L-BFGS-B only uses about 1/7 of the memory; I'm a bit surprised that it
isn't a lot less. BFGS is also definitely converging a lot slower.

My other question is that L-BFGS-B is returning 'non-finite' errors with
respect to the gradient function I'm supplying. Again, all the parameters
I'm optimizing need to be non-negative (so I'm optimizing the log of the
parameters), but the gradient at some point divides by each parameter, so
when some of the parameters go to 0, the gradient becomes infinite. Do you
(or anyone else) have any suggestions for how to prevent this? Is the only
way to force the parameters to be larger than some number close to 0
(e.g., 1e-10), or to modify the gradient function to zero out the entries
for small parameters?
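
To make the failure mode and the small-positive-lower-bound workaround concrete, here is a toy sketch (the objective and bounds are hypothetical, not the actual model): the gradient divides by each parameter, so a hard zero would make it non-finite, and a strictly positive lower bound in L-BFGS-B keeps the iterates away from it.

```r
f  <- function(p) sum(p - log(p))   # toy objective, minimized at p = 1
gr <- function(p) 1 - 1 / p         # divides by p: non-finite at p = 0
fit <- optim(rep(0.5, 5), f, gr, method = "L-BFGS-B",
             lower = rep(1e-10, 5)) # keep parameters strictly positive
fit$par                             # all approximately 1
```

Whether such a bound distorts the answer depends on whether the true optimum is near zero, which is the question the replies below take up.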

Thanks!

Brian.




[R] BFGS versus L-BFGS-B

2011-02-25 Thread Prof. John C Nash
There are considerable differences between the algorithms. And "BFGS" is an unfortunate
nomenclature, since there are so many variants that are VERY different. It was called
"variable metric" in my book, from which the code was derived, and that code was from
Roger Fletcher's Fortran VM code based on his 1970 paper. L-BFGS-B is a later and more
complicated algorithm with some pretty nice properties. The code is much larger.

Re: "less memory" -- this will depend on the number of parameters, but to my knowledge
there are no good benchmark studies of memory and performance. Perhaps someone wants to
propose one for Google Summer of Code (see
http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011).

The optimx package can call Rvmmin, which has box constraints (also Rcgmin, which is
intended for very low memory), as well as several other methods with box constraints,
including L-BFGS-B. Worth a try if you are seeking a method for multiple "production"
runs. Unfortunately, we seem to have some CRAN check errors on Solaris and some old
releases -- platforms I do not have -- so it may be a few days or more until we sort out
the issues, which seem to be related to alignment of the underlying packages for which
optimx is a wrapper.

Use of transformation can be very effective. But again, I don't think there are good
studies on whether use of box constraints or transformations is "better" and when. That is
another project, for which I have made some tentative beginnings. Collaborations welcome.

Best,

JN
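
As a small self-contained illustration of the two optim methods on a non-separable test function (base R only; a sketch, not a benchmark):

```r
## Generalized Rosenbrock in n dimensions: non-separable, minimum of 0
## at (1, ..., 1).
rosen <- function(x) {
  n <- length(x)
  sum(100 * (x[-1] - x[-n]^2)^2 + (1 - x[-n])^2)
}
rosen_gr <- function(x) {                    # analytic gradient
  n <- length(x)
  g <- numeric(n)
  g[-n] <- g[-n] - 400 * x[-n] * (x[-1] - x[-n]^2) - 2 * (1 - x[-n])
  g[-1] <- g[-1] + 200 * (x[-1] - x[-n]^2)
  g
}
x0 <- rep(0, 12)
fit_bfgs   <- optim(x0, rosen, rosen_gr, method = "BFGS",
                    control = list(maxit = 1000))
fit_lbfgsb <- optim(x0, rosen, rosen_gr, method = "L-BFGS-B",
                    control = list(maxit = 1000))
c(BFGS = fit_bfgs$value, `L-BFGS-B` = fit_lbfgsb$value)  # both near 0
```

Both methods solve this comfortably at small n; the interesting differences (memory, iteration counts, sensitivity to scaling) only emerge at much larger n.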





Re: [R] BFGS versus L-BFGS-B

2011-02-25 Thread Ben Bolker
Brian Tsai  gmail.com> writes:

> 
> Hi all,
> 
> I'm trying to figure out the effective differences between BFGS and L-BFGS-B
> are, besides the obvious that L-BFGS-B should be using a lot less memory,
> and the user can provide box constraints.
> 
> 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but
> use less memory?

  L-BFGS-B is a bit more finicky: for example, it does not allow
non-finite (infinite or NA) return values from the objective function,
while BFGS does (although neither does during the initial function evaluation).
I don't know offhand of other differences, although speed may differ.

> 2) If i'm optimizing with respect to a variable x that must be non-negative,
> a common approach is to do a change of variables x = exp(y), and optimize
> unconstrained with respect to y.  Is optimization using box constraints on
> x, likely to produce as good a result as unconstrained optimization on y?

  It depends.  If the optimal solution is on the boundary (i.e. x=0)
then optimization on the transformed variable (y = log(x), so that
x = exp(y)) will work very badly, since the optimizer must drive y
toward -Inf. On the other hand, if the solution is in the interior
then transforming sometimes works even better -- for example, the
goodness-of-fit surface may be closer to quadratic (which sometimes
has advantages in terms of inference) with the transformed than with
the untransformed parameter.
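
A minimal sketch of the interior-optimum case, where both routes work (toy one-parameter objective with its minimum at x = 2, well inside the feasible region):

```r
f <- function(x) (x - 2)^2                 # toy objective, minimum at x = 2

## (a) box constraint on x directly
fit_box <- optim(1, f, method = "L-BFGS-B", lower = 1e-10)

## (b) unconstrained optimization on y = log(x)
fit_log <- optim(log(1), function(y) f(exp(y)), method = "BFGS")

c(box = fit_box$par, log = exp(fit_log$par))   # both approximately 2
```

Moving the minimum to the boundary (x = 0) would leave route (a) usable but send route (b) chasing y toward -Inf, which is the pathology described above.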

  Ben Bolker



[R] BFGS versus L-BFGS-B

2011-02-24 Thread Brian Tsai
Hi all,

I'm trying to figure out the effective differences between BFGS and L-BFGS-B
are, besides the obvious that L-BFGS-B should be using a lot less memory,
and the user can provide box constraints.

1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but
uses less memory?

2) If I'm optimizing with respect to a variable x that must be non-negative,
a common approach is to do a change of variables x = exp(y) and optimize
unconstrained with respect to y.  Is optimization using box constraints on
x likely to produce as good a result as unconstrained optimization on y?

- Brian.

