In econometrics it was common to start an optimization with Nelder-Mead and then switch to one of the other algorithms to finish it. As John Nash states, NM gets one close; switching then speeds up the final solution.
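
A minimal sketch of that two-stage approach, assuming the Rosenbrock function from the optim() help page as a stand-in objective. The names fr, grr, stage1 and stage2, and the maxit cap, are illustrative choices and not anything from the thread:

```r
# Rosenbrock test function and its analytic gradient (stand-in objective).
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                     200 * (x[2] - x[1]^2))

start <- c(-1.2, 1)

# Stage 1: a capped Nelder-Mead run to get near the optimum.
stage1 <- optim(start, fr, method = "Nelder-Mead",
                control = list(maxit = 200))

# Stage 2: restart from the Nelder-Mead result and finish with BFGS,
# supplying the analytic gradient.
stage2 <- optim(stage1$par, fr, grr, method = "BFGS")

print(stage1$par)
print(stage2$par)
print(stage2$convergence)
```

Whether the switch pays off depends on the function, so it is worth comparing the counts and value components of the two results on a real problem.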
John

John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:fra...@tcd.ie
mailto:fra...@gmail.com

On 15 November 2015 at 20:05, Mark Leeds <marklee...@gmail.com> wrote:

> and just to add to John's comments, since he's too modest, in my
> experience, the algorithm in the rvmmin package (written by John) shows
> great improvement compared to the L-BFGS-B algorithm, so I don't use
> L-BFGS-B anymore. L-BFGS-B often has a dangerous convergence issue in
> that it can claim to converge when it hasn't, which, to me, is worse
> than not converging. Most likely it has to do with the link below,
> which was pointed out to me by John a while back.
>
> http://www.ece.northwestern.edu/~morales/PSfiles/acm-remark.pdf
>
>
> On Sun, Nov 15, 2015 at 2:41 PM, ProfJCNash <profjcn...@gmail.com> wrote:
>
> > Agreed on the default algorithm issue. That is important for users to
> > know, and I'm happy to underline it. Also that CG (which is based on one
> > of my codes) should be deprecated. BFGS (also based on one of my codes
> > from long ago) does much better than I would ever have expected.
> >
> > Over the years I've tried different Nelder-Mead implementations. Cannot
> > say I've found any that is always better than that in optim() (also
> > based on an old code of mine), though nmkb() from the dfoptim package
> > seems to do better a lot of the time, and it has a transformation method
> > for bounds, which may be useful, but does have the awkwardness that one
> > cannot start on a bound. For testing a function, I don't think it makes
> > a lot of difference which variant of NM one uses if the trace is on to
> > catch never-ending runs. For production use, it is a really good idea to
> > try different methods on a sample of likely cases and choose a method
> > that does well. That is the motivation for the optimx package or the
> > opm() function of the newer optimz (on R-forge) that I'm still polishing.
> > optimz has a function optimr() that has the same call as optim() but
> > incorporates over a dozen optimizers via method = "(selected method)".
> >
> > As a gradient-free choice, the Powell codes from minqa or other packages
> > (there are several implementations) can sometimes have spectacular
> > performance, but they also flub rather more regularly than Nelder-Mead
> > in my experience. That is, when they are good, they are very very good,
> > and when they are not they are horrid. (Plagiarism!)
> >
> > JN
> >
> > On 15-11-15 12:46 PM, Ravi Varadhan wrote:
> > > Hi John,
> > > My main point is not about Nelder-Mead per se. It is *primarily* about
> > > the Nelder-Mead implementation in optim().
> > >
> > > The users of optim() should be cautioned regarding the default
> > > algorithm and that they should consider alternatives such as "BFGS" in
> > > optim(), or other implementations of Nelder-Mead.
> > >
> > > Best regards,
> > > Ravi
> > > ________________________________________
> > > From: ProfJCNash <profjcn...@gmail.com>
> > > Sent: Sunday, November 15, 2015 12:21 PM
> > > To: Ravi Varadhan; 'r-help@r-project.org'; lorenzo.ise...@gmail.com
> > > Cc: b...@xs4all.nl; Gabor Grothendieck
> > > Subject: Re: Cautioning optim() users about "Nelder-Mead" default -
> > > (originally) Optim instability
> > >
> > > Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is
> > > "bad" per se. Its issues are that it assumes the parameters are all on
> > > the same scale, and the termination (not convergence) test can't use
> > > gradients, so it tends to get "near" the optimum very quickly -- say
> > > only 10% of the computational effort -- then spends an awful amount of
> > > effort deciding it's got there. It often will do poorly when the
> > > function has nearly "flat" zones, e.g., a long valley with very low
> > > slope.
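
A small sketch of the scaling point quoted above, assuming a made-up objective whose two parameters differ by four orders of magnitude. The function fn, the start values and the parscale values are hypothetical; control$parscale is the documented optim() control for telling it the typical size of each parameter:

```r
# Hypothetical objective with its minimum at p = c(1, 20000), so the
# two parameters live on very different scales.
fn <- function(p) (p[1] - 1)^2 + ((p[2] - 20000) / 10000)^2

start <- c(0.5, 5000)

# Nelder-Mead on the raw parameters.
raw <- optim(start, fn, method = "Nelder-Mead")

# Same run with parscale: optim() then works on par / parscale,
# which puts both parameters on a comparable scale.
scaled <- optim(start, fn, method = "Nelder-Mead",
                control = list(parscale = c(1, 10000)))

print(raw$par)
print(scaled$par)
```

Comparing raw$counts with scaled$counts, ideally with trace turned on, gives a rough idea of how much the scaling matters for a given function.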
> > >
> > > So my message is still that Nelder-Mead is an unfortunate default -- it
> > > has been chosen I believe because it is generally robust and doesn't
> > > need gradients. BFGS really should use accurate gradients, preferably
> > > computed analytically, so it would only be a good default in that case
> > > or with very good approximate gradients (which are costly
> > > computationally).
> > >
> > > However, if you understand what NM is doing, and use it accordingly, it
> > > is a valuable tool. I generally use it as a first try BUT turn on the
> > > trace to watch what it is doing as a way to learn a bit about the
> > > function I am minimizing. Rarely would I use it as a production
> > > minimizer.
> > >
> > > Best, JN
> > >
> > > On 15-11-15 12:02 PM, Ravi Varadhan wrote:
> > >> Hi,
> > >>
> > >> While I agree with the comments about paying attention to parameter
> > >> scaling, a major issue here is that the default optimization
> > >> algorithm, Nelder-Mead, is not very good. It is unfortunate that the
> > >> optim implementation chose this as the "default" algorithm. I have
> > >> several instances where people have come to me with poor results from
> > >> using optim(), because they did not realize that the default algorithm
> > >> is bad. We (John Nash and I) have pointed this out before, but the R
> > >> core has not addressed this issue due to backward compatibility
> > >> reasons.
> > >>
> > >> There is a better implementation of Nelder-Mead in the "dfoptim"
> > >> package.
> > >>
> > >> library(dfoptim)
> > >>
> > >> # Nelder-Mead via dfoptim::nmk() from three sets of starting values
> > >> mm_def1 <- nmk(par = par_ini1, fn = min.perc_error, data = data)
> > >> mm_def2 <- nmk(par = par_ini2, fn = min.perc_error, data = data)
> > >> mm_def3 <- nmk(par = par_ini3, fn = min.perc_error, data = data)
> > >>
> > >> print(mm_def1$par)
> > >> print(mm_def2$par)
> > >> print(mm_def3$par)
> > >>
> > >> In general, better implementations of optimization algorithms are
> > >> available in packages such as "optimx", "nloptr". It is unfortunate
> > >> that most naïve users of optimization in R do not recognize this.
> > >> Perhaps, there should be a "message" in the optim help file that
> > >> points this out to the users.
> > >>
> > >> Hope this is helpful,
> > >>
> > >> Ravi
> > >>
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.