Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1397#discussion_r50383721
  
    --- Diff: docs/libs/ml/optimization.md ---
    @@ -256,6 +271,79 @@ The full list of supported prediction functions can be 
found [here](#prediction-
           </tbody>
         </table>
     
    +#### Effective Learning Rate
    +
    +The effective learning rate is determined by one of the functions listed below, where:
    +- $j$ is the iteration number
    +- $\eta_j$ is the step size on step $j$
    +- $\eta_0$ is the initial step size
    +- $\lambda$ is the regularization constant
    +- $k$ is the decay constant
    +
    +<table class="table table-bordered">
    +    <thead>
    +      <tr>
    +        <th class="text-left" style="width: 20%">Function Name</th>
    +        <th class="text-center">Description</th>
    +        <th class="text-center">Function</th>
    +        <th class="text-center">Called As</th>
    +      </tr>
    +    </thead>
    +    <tbody>
    +      <tr>
    +        <td><strong>Default</strong></td>
    +        <td>
    +          <p>
    +            The default method for determining the step size. It is equivalent to the inverse scaling method with $\tau = 0.5$. This special case is kept as the default to maintain backwards compatibility.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \eta_0/\sqrt{j}$</td>
    +        <td class="text-center">`default`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Constant</strong></td>
    +        <td>
    +          <p>
    +            The step size is constant throughout the learning task.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \eta_0$</td>
    +        <td class="text-center">`constant`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Leon Bottou's Method</strong></td>
    +        <td>
    +          <p>
    +            This is the `'optimal'` method of sklearn. It chooses an optimal initial $t_0 = \frac{1}{\lambda \cdot \eta_0}$, based on Leon Bottou's [Learning with Large Data Sets](http://leon.bottou.org/slides/largescale/lstut.pdf)
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \frac{1}{\lambda \cdot (\frac{1}{\lambda \cdot \eta_0} + j - 1)}$</td>
    +        <td class="text-center">`bottou`</td>
    +      </tr>
    +      <tr>
    +        <td><strong>Inverse Scaling</strong></td>
    +        <td>
    +          <p>
    +            A commonly used method in which the step size decays polynomially with the iteration number $j$.
    +          </p>
    +        </td>
    +        <td class="text-center">$\eta_j = \frac{\eta_0}{j^{\tau}}$</td>
    --- End diff ---
    
    Maybe don't use `frac` here but instead `\lambda/ j^\tau`, because the 
exponent of `j` is rendered really small with `frac`.
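
For readers comparing the four schedules side by side, here is a minimal numerical sketch (plain Python, not Flink ML code; the function names are illustrative only, assuming `eta0` is the initial step size, `lam` the regularization constant, and `tau` the decay exponent):

```python
import math

def default_step(eta0, j):
    # Default schedule: inverse scaling with tau = 0.5, i.e. eta_j = eta_0 / sqrt(j)
    return eta0 / math.sqrt(j)

def constant_step(eta0, j):
    # Constant schedule: the step size never changes
    return eta0

def bottou_step(eta0, lam, j):
    # Bottou's schedule: eta_j = 1 / (lam * (1/(lam * eta0) + j - 1));
    # note that at j = 1 this reduces to eta_0
    return 1.0 / (lam * (1.0 / (lam * eta0) + j - 1))

def inverse_scaling_step(eta0, tau, j):
    # Inverse scaling schedule: eta_j = eta_0 / j**tau
    return eta0 / j ** tau
```

With `tau = 0.5`, `inverse_scaling_step` reproduces `default_step` exactly, which is the backwards-compatibility special case mentioned in the table.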

