Re: [R] BFGS versus L-BFGS-B
Bert has put a finger on one of the key questions: are we there yet? Global optima are very difficult to find in many, possibly most, problems. We can check for local optima. In package optimx (currently being reset on CRAN due to a variable-naming glitch with Rvmmin -- should be there in a couple of days) we carry out the Karush-Kuhn-Tucker (KKT) tests, but even these have scale dependencies. On R-forge there's a new package under our optimization and solving packages project, called kktc, that separates off the tests. However, that's only a start, and constraints require us to do more work too.

A couple of very large-scale problems have been noted. The minimization of energy for collections of atoms or molecules was suggested, but my experience with that type of problem is that it has a huge number of local minima and is not amenable to the techniques the original poster had in mind; certainly neither optim(BFGS), Rcgmin, nor optim(L-BFGS-B) is appropriate as it stands, though they may be helpful in some sort of polyalgorithm.

Also (mentioned in private emails rather than posted): L-BFGS-B and Rcgmin should offer much lower memory demands than optim(BFGS) for problems with large numbers of parameters, since optim(BFGS) keeps an explicit inverse-Hessian approximation, i.e., n*n doubles to store. BFGS and L-BFGS-B have very different internals.

JN

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
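The KKT checks mentioned above can be approximated numerically: the gradient should be (near) zero and the Hessian positive semi-definite at a local minimum. A minimal sketch of that idea in R -- this is not the kktc code itself; the numDeriv package is used for the derivatives, and the tolerance and test function are illustrative:

```r
library(numDeriv)  # numerical grad() and hessian()

## Illustrative objective: 2-parameter Rosenbrock function
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

ans <- optim(c(-1.2, 1), fr, method = "BFGS")

## KKT condition 1: gradient approximately zero at the reported solution.
## Note the tolerance is scale-dependent, as discussed above.
g <- grad(fr, ans$par)
kkt1 <- max(abs(g)) < 1e-3 * (1 + abs(ans$value))

## KKT condition 2: Hessian positive semi-definite (no negative eigenvalues)
H <- hessian(fr, ans$par)
kkt2 <- all(eigen(H, symmetric = TRUE, only.values = TRUE)$values > -1e-6)

c(kkt1 = kkt1, kkt2 = kkt2)
```

The scale dependence is visible in the `1e-3 * (1 + abs(ans$value))` test: rescaling the objective or the parameters changes whether the same candidate point "passes".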
Re: [R] BFGS versus L-BFGS-B
Thanks to all for clarifications. So I'm off base, but whether waaay off base depends on whether there is a reasonably well-defined optimum to converge to. Which raises the question, I suppose: how does one know whether one has converged to such an optimum? This is always an issue, of course, even for a few parameters, but maybe more so with so many.

-- Bert
Re: [R] BFGS versus L-BFGS-B
I have worked on a 2-D image-reconstruction problem in PET (positron emission tomography) using a Poisson model. Here, each pixel intensity is an unknown parameter. I have solved problems of size 128 x 128 using an accelerated EM algorithm. Ken Lange has shown that you can achieve term-by-term separation using a minorization inequality, and hence the problem simplifies greatly.

Ravi.

---
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu
Re: [R] BFGS versus L-BFGS-B
For functions that have a reasonable structure, i.e., one or at most a few optima, it is certainly a sensible task. Separable functions are certainly nicer (10K one-dimensional minimizations), but it is pretty easy to devise functions, e.g., generalizations of Rosenbrock, Chebyquad, and others, that are high-dimension but not separable.

Admittedly, there are not a lot of real-world examples that are publicly available. More would be useful.

JN
Re: [R] BFGS versus L-BFGS-B
On Fri, Feb 25, 2011 at 12:00 PM, Brian Tsai wrote:
> Thanks so much for the informative reply! I'm currently trying to optimize
> ~10,000 parameters simultaneously - for some reason,

Some expert (Ravi, John?) please correct me, but: is the above not complete nonsense? I can't imagine poking around usefully in 10K-dimensional space for an extremum, unless maybe one can find the extremum by 10K separate one-dimensional optimizations. And maybe not then either.

Am I way off base here, or has Brian merely described just another inefficient way to produce random numbers?

-- Bert
Re: [R] BFGS versus L-BFGS-B
Hi John,

Thanks so much for the informative reply! I'm currently trying to optimize ~10,000 parameters simultaneously. For some reason, when I compare the memory usage of L-BFGS-B and BFGS with all default input parameters, L-BFGS-B uses only about 1/7 of the memory; I'm a bit surprised that it isn't a lot less. BFGS, though, is definitely converging a lot more slowly.

My other question is that L-BFGS-B is returning 'non-finite' errors with respect to the gradient function I'm supplying. Again, all the parameters I'm optimizing need to be non-negative (so I'm optimizing the log of the parameters), but the gradient at some point divides by each parameter, so when some of the parameters go to 0, the gradient becomes infinite. Do you (or anyone else) have any suggestions for how to prevent this? Is the only way to force the parameters to be larger than some number close to 0 (e.g., 1e-10), or to modify the gradient function to set the entries for small parameters to 0?

Thanks!

Brian.
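One common workaround for the non-finite gradient problem Brian describes is the first option he mentions: give L-BFGS-B a small strictly positive lower bound, so the iterates can never reach the singularity at 0. A sketch with a made-up objective whose gradient has the same 1/x blow-up (the function, bound, and starting values are illustrative, not Brian's actual problem):

```r
## Toy objective with a log term: the gradient has a 1/x singularity at 0.
fn <- function(x) sum(x - log(x))   # componentwise minimum at x = 1
gr <- function(x) 1 - 1 / x         # infinite as any x[i] -> 0

p0 <- rep(0.5, 5)

## Keep the lower bound strictly positive (1e-8 is an arbitrary small
## safeguard) so gr() is finite everywhere inside the feasible box.
res <- optim(p0, fn, gr, method = "L-BFGS-B",
             lower = rep(1e-8, length(p0)))
res$par   # each component should converge toward 1
```

If the true solution actually sits at 0 for some components, the bound will be active there and the reported parameters will be pinned at 1e-8, which is worth checking after the run.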
Re: [R] BFGS versus L-BFGS-B
There are considerable differences between the algorithms. And "BFGS" is unfortunate nomenclature, since there are so many variants that are VERY different. It was called "variable metric" in my book, from which the code was derived, and that code was from Roger Fletcher's Fortran VM code based on his 1970 paper. L-BFGS-B is a later and more complicated algorithm with some pretty nice properties. The code is much larger.

Re: "less memory" -- this will depend on the number of parameters, but to my knowledge there are no good benchmark studies of memory and performance. Perhaps someone wants to propose one for Google Summer of Code (see http://rwiki.sciviews.org/doku.php?id=developers:projects:gsoc2011).

The optimx package can call Rvmmin, which has box constraints (also Rcgmin, which is intended for very low memory), along with several other methods with box constraints, including L-BFGS-B. Worth a try if you are seeking a method for multiple "production" runs. Unfortunately, we seem to have some CRAN check errors on Solaris and some old releases -- platforms I do not have -- so it may be a few days or more until we sort out the issues, which seem to be related to alignment of the underlying packages for which optimx is a wrapper.

Use of transformation can be very effective. But again, I don't think there are good studies on whether use of box constraints or transformations is "better", and when. Another project, to which I have made some tentative beginnings. Collaborations welcome.

Best,

JN
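The optimx wrapper John mentions lets several of these box-constrained methods be compared in one call. A rough sketch, assuming the package version in question exposes Rvmmin, Rcgmin, and L-BFGS-B as method names (the test function is a trivial stand-in):

```r
library(optimx)

fn <- function(x) sum((x - 1)^2)   # trivial convex test function
gr <- function(x) 2 * (x - 1)

p0 <- rep(0, 10)

## Run the variable-metric, conjugate-gradient, and limited-memory
## methods under the same non-negativity box and compare results.
res <- optimx(p0, fn, gr,
              method = c("Rvmmin", "Rcgmin", "L-BFGS-B"),
              lower = 0)
summary(res, order = value)
```

The summary table makes it easy to spot a method that stopped at a worse objective value or reported a non-zero convergence code.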
Re: [R] BFGS versus L-BFGS-B
Brian Tsai writes:
> I'm trying to figure out what the effective differences between BFGS and L-BFGS-B
> are, besides the obvious: that L-BFGS-B should use a lot less memory,
> and the user can provide box constraints.
>
> 1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but
> uses less memory?

L-BFGS-B is a bit more finicky: for example, it does not allow non-finite (infinite or NA) return values from the objective function, while BFGS does (although neither does during the initial function evaluation). I don't know offhand of other differences, although speed may differ.

> 2) If I'm optimizing with respect to a variable x that must be non-negative,
> a common approach is to do a change of variables x = exp(y), and optimize
> unconstrained with respect to y. Is optimization using box constraints on
> x likely to produce as good a result as unconstrained optimization on y?

It depends. If the optimal solution is on the boundary (i.e., x = 0), then optimization on the transformed variable y = log(x) will work very badly, since the optimum is pushed out to y = -Inf. On the other hand, if the solution is in the interior, then transforming sometimes works even better -- for example, the goodness-of-fit surface may be closer to quadratic (which sometimes has advantages in terms of inference) in the transformed parameter than in the untransformed one.

Ben Bolker
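Ben's boundary case can be seen concretely. A sketch on a made-up one-parameter problem whose unconstrained minimum is negative, so the constrained optimum sits exactly on the boundary x = 0, where the log transform behaves badly:

```r
f <- function(x) (x + 1)^2   # unconstrained minimum at x = -1

## Box-constrained: the solver walks straight to the boundary x = 0.
box <- optim(1, f, method = "L-BFGS-B", lower = 0)
box$par   # 0, the boundary solution

## Log-transformed: x = exp(y), so the constrained optimum corresponds
## to y -> -Inf. The optimizer chases ever-smaller y and stops at some
## very negative value rather than at a genuine stationary point.
g  <- function(y) f(exp(y))
tr <- optim(0, g, method = "BFGS")
exp(tr$par)   # a tiny positive x, with tr$par very negative
```

For an interior solution (say f with its minimum at x = +1) both approaches work, and the transformed surface may even be better conditioned, as Ben notes.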
[R] BFGS versus L-BFGS-B
Hi all,

I'm trying to figure out what the effective differences between BFGS and L-BFGS-B are, besides the obvious: that L-BFGS-B should use a lot less memory, and that the user can provide box constraints.

1) Why would you ever want to use BFGS, if L-BFGS-B does the same thing but uses less memory?

2) If I'm optimizing with respect to a variable x that must be non-negative, a common approach is to do a change of variables x = exp(y) and optimize unconstrained with respect to y. Is optimization using box constraints on x likely to produce as good a result as unconstrained optimization on y?

- Brian.
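For reference, the two methods being compared are both selected through optim's `method` argument; only L-BFGS-B accepts the `lower`/`upper` bounds. A minimal sketch on a stand-in quadratic:

```r
fn <- function(x) sum((x - 2)^2)   # stand-in objective, minimum at x = 2

p0 <- rep(0, 3)

## Full-memory BFGS: maintains an n x n inverse-Hessian approximation.
b1 <- optim(p0, fn, method = "BFGS")

## Limited-memory, box-constrained variant: stores only a few vector
## pairs, and honours the lower/upper box.
b2 <- optim(p0, fn, method = "L-BFGS-B", lower = 0, upper = 10)

rbind(BFGS = b1$par, `L-BFGS-B` = b2$par)   # both rows near 2
```

The n x n storage of plain BFGS is the memory difference discussed later in the thread: at ~10,000 parameters that approximation alone is 10^8 doubles, which L-BFGS-B avoids.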