[ https://issues.apache.org/jira/browse/FLINK-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110397#comment-15110397 ]

ASF GitHub Bot commented on FLINK-1994:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1397#discussion_r50383721
  
    --- Diff: docs/libs/ml/optimization.md ---
    @@ -256,6 +271,79 @@ The full list of supported prediction functions can be 
found [here](#prediction-
           </tbody>
         </table>
     
    +#### Effective Learning Rate
    +
    +Where:
    +- $j$ is the iteration number
    +- $\eta_j$ is the step size on step $j$
    +- $\eta_0$ is the initial step size
    +- $\lambda$ is the regularization constant
    +- $k$ is the decay constant
    +
    +<table class="table table-bordered">
    +    <thead>
    +      <tr>
    +        <th class="text-left" style="width: 20%">Function Name</th>
    +        <th class="text-center">Description</th>
    +        <th class="text-center">Function</th>
    +        <th class="text-center">Called As</th>
    +      </tr>
    +    </thead>
    +    <tbody>
    +      <tr>
    +        <td><strong>Default</strong></td>
    +        <td>
    +          <p>
    +            The function default method used for determining the step 
size. This is equivalent to the inverse scaling method for $\tau$=0.5.  This 
special case is kept as the default to maintain backwards compatibility.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \eta_0/\sqrt{j}$</td>
    +        <td class="text-center">`default`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Constant</strong></td>
    +        <td>
    +          <p>
    +            The step size is constant throughout the learning task.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \eta_0$</td>
    +        <td class="text-center">`constant`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Leon Bottou's Method</strong></td>
    +        <td>
    +          <p>
    +            This is the `'optimal'` method of sklearn.  Chooses optimal 
initial $t_0 = \lambda \cdot eta_0$, based on Leon Bottou's [Learning with 
Large Data Sets ](http://leon.bottou.org/slides/largescale/lstut.pdf)
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \frac{1}{\lambda \cdot (\frac{1}{\lambda \cdot \eta_0} + j - 1)}$</td>
    +        <td class="text-center">`bottou`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Inverse Scaling</strong></td>
    +        <td>
    +          <p>
    +            A very common method for determining the step size.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \frac{\eta_0}{j^{\tau}}$</td>
    --- End diff --
    
    Maybe don't use `frac` here but instead `\lambda/ j^\tau`, because the 
exponent of `j` is rendered really small with `frac`.
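The four schedules in the diff's table can be checked numerically. Below is a minimal sketch (Python used purely for readability; the function names are illustrative and not part of FlinkML's API), using the symbols defined above: $\eta_0$ initial step size, $\lambda$ regularization constant, $\tau$ decay exponent, $j$ iteration number.

```python
import math

# Illustrative sketch of the effective learning-rate schedules from the docs
# table. eta0 = initial step size, lam = regularization constant,
# tau = decay exponent, j = iteration number (1-based).
# Function names are hypothetical, not FlinkML API.

def default_rate(eta0, j):
    # Default: eta_j = eta_0 / sqrt(j)  (inverse scaling with tau = 0.5)
    return eta0 / math.sqrt(j)

def constant_rate(eta0, j):
    # Constant: eta_j = eta_0 for every iteration
    return eta0

def bottou_rate(eta0, lam, j):
    # Bottou: eta_j = 1 / (lam * (1/(lam * eta0) + j - 1))
    # Note that at j = 1 this reduces to eta_0, like the other schedules.
    return 1.0 / (lam * (1.0 / (lam * eta0) + j - 1))

def inverse_scaling_rate(eta0, tau, j):
    # Inverse scaling: eta_j = eta_0 / j^tau
    return eta0 / j ** tau

eta0, lam = 0.1, 0.01
# All schedules agree at the first iteration:
print(default_rate(eta0, 1), bottou_rate(eta0, lam, 1))
```

A quick sanity check of the "Default" row: with $\eta_0 = 0.1$, the step size at iteration 4 is $0.1/\sqrt{4} = 0.05$, matching inverse scaling with $\tau = 0.5$.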


> Add different gain calculation schemes to SGD
> ---------------------------------------------
>
>                 Key: FLINK-1994
>                 URL: https://issues.apache.org/jira/browse/FLINK-1994
>             Project: Flink
>          Issue Type: Improvement
>          Components: Machine Learning Library
>            Reporter: Till Rohrmann
>            Assignee: Trevor Grant
>            Priority: Minor
>              Labels: ML, Starter
>
> The current SGD implementation uses the formula 
> {{stepsize/sqrt(iterationNumber)}} as the gain for the weight updates. It 
> would be good to make the gain calculation configurable and to provide 
> different strategies for that. For example:
> * stepsize/(1 + iterationNumber)
> * stepsize*(1 + regularization * stepsize * iterationNumber)^(-3/4)
> See also how to properly select the gains [1].
> Resources:
> [1] http://arxiv.org/pdf/1107.2490.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)