[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110392#comment-15110392 ]
ASF GitHub Bot commented on FLINK-1994:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1397#discussion_r50382928

  --- Diff: docs/libs/ml/optimization.md ---
    @@ -256,6 +271,79 @@ The full list of supported prediction functions can be found [here](#prediction-
     </tbody>
     </table>

    +#### Effective Learning Rate
    +
    +Where:
    +- $j$ is the iteration number
    +- $\eta_j$ is the step size on step $j$
    +- $\eta_0$ is the initial step size
    +- $\lambda$ is the regularization constant
    +- $\tau$ is the decay constant
    +
    +<table class="table table-bordered">
    +  <thead>
    +    <tr>
    +      <th class="text-left" style="width: 20%">Function Name</th>
    +      <th class="text-center">Description</th>
    +      <th class="text-center">Function</th>
    +      <th class="text-center">Called As</th>
    +    </tr>
    +  </thead>
    +  <tbody>
    +    <tr>
    +      <td><strong>Default</strong></td>
    +      <td>
    +        <p>
    +          The default function used for determining the step size. It is equivalent to the inverse scaling method with $\tau = 0.5$. This special case is kept as the default to maintain backwards compatibility.
    +        </p>
    +      </td>
    +      <td class="text-center">$\eta_j = \eta_0/\sqrt{j}$</td>
    +      <td class="text-center">`default`</td>
    +    </tr>
    +    <tr>
    +      <td><strong>Constant</strong></td>
    +      <td>
    +        <p>
    +          The step size is constant throughout the learning task.
    +        </p>
    +      </td>
    +      <td class="text-center">$\eta_j = \eta_0$</td>
    +      <td class="text-center">`constant`</td>
    +    </tr>
    +    <tr>
    +      <td><strong>Leon Bottou's Method</strong></td>
    +      <td>
    +        <p>
    +          This is the `'optimal'` method of sklearn.
    +          Chooses the optimal initial $t_0 = \frac{1}{\lambda \cdot \eta_0}$, based on Leon Bottou's [Learning with Large Data Sets](http://leon.bottou.org/slides/largescale/lstut.pdf).
    +        </p>
    +      </td>
    +      <td class="text-center">$\eta_j = \frac{1}{\lambda \cdot (\frac{1}{\lambda \cdot \eta_0} + j - 1)}$</td>
    +      <td class="text-center">`bottou`</td>
    +    </tr>
    +    <tr>
    +      <td><strong>Inverse Scaling</strong></td>
    +      <td>
    +        <p>
    +          A very common method for determining the step size.
    +        </p>
    +      </td>
    +      <td class="text-center">$\eta_j = \frac{\eta_0}{j^{\tau}}$</td>
    +      <td class="text-center">`invScaling`</td>
    +    </tr>
    +    <tr>
    +      <td><strong>Wei Xu's Method</strong></td>
    +      <td>
    +        <p>
    +          Method proposed by Wei Xu in [Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent](http://arxiv.org/pdf/1107.2490.pdf).
    +        </p>
    +      </td>
    +      <td class="text-center">$\eta_j = \eta_0 \cdot (1 + \lambda \cdot \eta_0 \cdot j)^{-\tau}$</td>
--- End diff --

math environment missing

> Add different gain calculation schemes to SGD
> ---------------------------------------------
>
>                 Key: FLINK-1994
>                 URL: https://issues.apache.org/jira/browse/FLINK-1994
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Trevor Grant
>            Priority: Minor
>              Labels: ML, Starter
>
> The current SGD implementation uses {{stepsize/sqrt(iterationNumber)}} as the gain for the weight updates. It would be good to make the gain calculation configurable and to provide different strategies for that. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
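For anyone comparing the schedules in the diff above, here is a dependency-free Python sketch of the five effective-learning-rate formulas. This is illustrative only, not FlinkML's actual API (which is Scala); the function names and signatures are hypothetical.

```python
import math

def default_rate(eta0, j):
    # Default: eta_j = eta0 / sqrt(j), i.e. inverse scaling with tau = 0.5.
    return eta0 / math.sqrt(j)

def constant_rate(eta0, j):
    # Constant: the step size never decays.
    return eta0

def bottou_rate(eta0, lam, j):
    # Bottou: eta_j = 1 / (lam * (1/(lam * eta0) + j - 1)).
    # Note that eta_1 == eta0, and the rate decays on the order of 1/j.
    return 1.0 / (lam * (1.0 / (lam * eta0) + j - 1))

def inv_scaling_rate(eta0, tau, j):
    # Inverse scaling: eta_j = eta0 / j^tau.
    return eta0 / j ** tau

def xu_rate(eta0, lam, tau, j):
    # Xu: eta_j = eta0 * (1 + lam * eta0 * j)^(-tau);
    # the cited paper suggests tau = 3/4.
    return eta0 * (1.0 + lam * eta0 * j) ** (-tau)
```

One quick sanity check these functions make easy: `default_rate(eta0, j)` and `inv_scaling_rate(eta0, 0.5, j)` agree for every `j`, which is exactly the backwards-compatibility claim in the Default row.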