[GitHub] incubator-spark pull request: new MLlib documentation for optimiza...

martinjaggi Sun, 09 Feb 2014 05:27:05 -0800

GitHub user martinjaggi opened a pull request:

    https://github.com/apache/incubator-spark/pull/566


    new MLlib documentation for optimization, regression and classification

    New documentation with TeX formulas, hopefully improving the usability and
    reproducibility of the offered MLlib methods.
    Also made some minor changes in the code for consistency. Scala tests pass.
    
    This is the rebased branch; I deleted the old PR.
    
    JIRA:
    https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark copy-MLlib-d

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/566.patch

----
commit 1d7ba79c27458687833bafeb7668c5ceba8489ca
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T16:33:24Z

    renaming LeastSquaresGradient
    
    to avoid confusion with a squared regularizer or a squared gradient.
    Added some more comments on what the loss functions are good for.

commit e6ef3e8b870af40996a2e8f1bc3765262cf6c714
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T16:34:45Z

    use d for the number of features
    
    try to be consistent: n is the number of data examples in the RDD,
    and each of them has d entries (also in the documentation)

commit da439c161d1763dfce9833efe4f8b845b00e5bef
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T17:13:17Z

    correct scaling for MSE loss
    
    to be consistent with the documentation

commit d28ac774f4582370748aca379d2931520ff68fa4
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T17:15:44Z

    new classification and regression documentation
    
    with complete mathematical formulations. trying to be general for
    adding future ML methods as well. table of all subgradients used for
    reference.
    this change also required a small addition to the mathjax
    configuration, to allow equation numbers.

commit 580b846c19bb98d4000d46a3682d3855b0a488e7
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T17:16:51Z

    new optimization documentation
    
    explaining GD and SGD and the distributed versions that MLlib
    implements.

commit 6c7a7dc07afbf471fef7bf14a5751c80c6bb39aa
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T17:38:57Z

    better comments in SGD code for regression

commit 25acda728ae6ec3f3eea5462073f70b29526b3e2
Author: Martin Jaggi <[email protected]>
Date:   2014-02-07T22:41:42Z

    lambda R() in documentation

commit d82f9e80605c6dabf90135416f8921834621fdfe
Author: Martin Jaggi <[email protected]>
Date:   2014-02-08T17:31:05Z

    telling what updater actually does
    
    also use proper scaling for the L2 regularization (using 1/2 as in the
    documentation)

commit fa355498331180623664a425783b96a12a8bcee9
Author: Martin Jaggi <[email protected]>
Date:   2014-02-08T17:56:01Z

    remove broken url

commit 564cb0cee0ae2fb41baa264ed21f28c9336bc1ba
Author: Martin Jaggi <[email protected]>
Date:   2014-02-08T17:57:12Z

    better description of GradientDescent

commit 29e5d65e6098c3f25bcbeada99bcf9cbb151eea7
Author: Martin Jaggi <[email protected]>
Date:   2014-02-08T20:30:35Z

    line wrap at 100 chars

----
