Github user staple commented on the pull request: https://github.com/apache/spark/pull/4934#issuecomment-78192837

Hi, replying to some of the statements above:

> It seems @staple has already implemented backtracking (because he has results in the JIRA), but kept them out of this PR to keep it simple, so we can tackle that afterwards.

I wrote a backtracking implementation (and checked that it performs the same as the TFOCS implementation). Currently it is just a port of the TFOCS version. I'd need a little time to make it Scala / Spark idiomatic, but the turnaround would be pretty fast.

> For example, if we add line search option, what is the semantic of agd.setStepSize(1.0).useLineSearch()

TFOCS supports a suggested initial Lipschitz value (a variable named "L"), which is just a starting point for line search. A corresponding behavior here would be to treat the step size as an initial suggestion only when line search is enabled. It may be desirable to use a parameter name like "L" instead of "stepSize" to make the meaning clearer.

In TFOCS you can disable backtracking line search by setting several parameters (L, Lexact, alpha, and beta), which individually control different aspects of the backtracking implementation. For Spark it may make sense to provide backtracking modes that are configured explicitly: for example a fixed Lipschitz bound (no backtracking), backtracking line search based on the TFOCS implementation, or possibly an alternative line search implementation that is more conservative about performing round trip aggregations. There could then be a setBacktrackingMode() setter to configure which mode is used.

Moving forward there may be a need to support additional acceleration algorithms in addition to Auslender and Teboulle. These might be configurable via a setAlgorithm() function.
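To make that concrete, here is a rough sketch of what an explicitly configured optimizer could look like. This is only an illustration of the idea that backtracking behavior is selected directly rather than implied by whether a step size was given; the class name AccGradientDescent, the BacktrackingMode values, and the setL setter are hypothetical names I'm using for discussion, not part of the existing GradientDescent API.

```scala
// Hypothetical configuration surface for an accelerated gradient optimizer.
// None of these names exist in MLlib today; this is only a sketch.

/** Illustrative backtracking modes, loosely following the TFOCS options. */
sealed trait BacktrackingMode
object BacktrackingMode {
  /** Use a fixed Lipschitz bound L; no line search, no extra aggregations. */
  case object FixedLipschitz extends BacktrackingMode
  /** Backtracking line search modeled on the TFOCS implementation. */
  case object TfocsLineSearch extends BacktrackingMode
  /** A more conservative search that performs fewer round trip aggregations. */
  case object ConservativeLineSearch extends BacktrackingMode
}

class AccGradientDescent {
  private var initialL: Double = 1.0
  private var mode: BacktrackingMode = BacktrackingMode.TfocsLineSearch
  private var algorithm: String = "auslender-teboulle"

  /** Initial Lipschitz estimate; only a starting point when line search is enabled. */
  def setL(l: Double): this.type = { initialL = l; this }

  /** Select how (or whether) backtracking line search is performed. */
  def setBacktrackingMode(m: BacktrackingMode): this.type = { mode = m; this }

  /** Select the acceleration scheme (e.g. Auslender and Teboulle). */
  def setAlgorithm(name: String): this.type = { algorithm = name; this }
}

// Example: fixed step of 1 / L, no line search round trips.
// new AccGradientDescent().setL(10.0)
//   .setBacktrackingMode(BacktrackingMode.FixedLipschitz)
```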
> Btw, I don't think we need to stick to the current GradientDescent API. The accelerated gradient takes a smooth convex function which provides gradient and optionally the Lipschitz constant. The implementation of Nesterov's method doesn't need to know RDDs.

This is good to know. I had been assuming we would stick with the existing GradientDescent API, including the Gradient and Updater delegates. Currently the applySmooth and applyProjector functions (named after the corresponding TFOCS functions) serve as a bridge between the acceleration implementation (which is relatively unaware of RDDs) and Spark specific RDD aggregations.

This seems like a good time to mention that the backtracking implementation in TFOCS caches the (expensive to compute) linear operator component of the objective function, which significantly reduces the cost of backtracking. A similar implementation is possible in Spark, though the performance benefit may not be as significant because two round trips would still be required per iteration. (See p. 3 of my design doc linked in the JIRA for more detail.) One reason I suggested not implementing linear operator caching in the design doc is that it's incompatible with the existing Gradient interface. If we are considering an alternative interface, it may be worth revisiting this issue.

The objective function "interface" used by TFOCS involves the functions applyLinear (the linear operator component of the objective), applySmooth (the smooth portion of the objective), and applyProjector (the nonsmooth portion of the objective), along with a number of numeric and categorical parameters. Theoretically we could adopt a similar interface (with or without applyLinear, depending), where RDD specific operations are encapsulated within the various apply* functions.
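As a very rough illustration of that idea, the sketch below shows one hypothetical way such an interface could be expressed in Scala, with the RDD work hidden behind the apply* methods and the linear operator output exposed so it can be cached for cheaper backtracking. The trait name and method signatures are assumptions for discussion, not a proposed final design; only the applyLinear / applySmooth / applyProjector names come from TFOCS.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical Scala rendering of a TFOCS-style objective interface. Names and
// signatures are illustrative only; the point is that the acceleration loop sees
// Vectors and scalars, while all RDD-specific work lives inside the apply* methods.
trait TfocsStyleObjective {

  /** applyLinear: the expensive linear operator component of the objective
    * (e.g. the distributed product A * x). Because accelerated methods form new
    * iterates as combinations of earlier points, TFOCS caches these outputs so
    * backtracking can reuse them rather than recomputing the product each time. */
  def applyLinear(x: Vector): RDD[Double]

  /** applySmooth: value and gradient of the smooth portion of the objective,
    * computed from the (possibly cached) linear operator output via an RDD
    * aggregation. */
  def applySmooth(x: Vector, ax: RDD[Double]): (Double, Vector)

  /** applyProjector: the nonsmooth portion (a prox / projection step), which is
    * a local vector operation and does not touch RDDs. */
  def applyProjector(x: Vector, stepSize: Double): Vector
}
```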
Finally, I wanted to mention that I'm in the Bay Area and am happy to meet in person to discuss this project if that would be helpful.