GitHub user martinjaggi opened a pull request:
https://github.com/apache/incubator-spark/pull/566
new MLlib documentation for optimization, regression and classification
New documentation with TeX formulas, hopefully improving the usability and
reproducibility of the offered MLlib methods.
Also made some minor changes in the code for consistency. Scala tests pass.
This is the rebased branch; I deleted the old PR.
JIRA:
https://spark-project.atlassian.net/browse/MLLIB-19
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-spark copy-MLlib-d
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-spark/pull/566.patch
----
commit 1d7ba79c27458687833bafeb7668c5ceba8489ca
Author: Martin Jaggi
Date: 2014-02-07T16:33:24Z
renaming LeastSquaresGradient
so that it is not confused with a squared regularizer or a squared gradient.
Added some more comments on what the loss functions are good for.
commit e6ef3e8b870af40996a2e8f1bc3765262cf6c714
Author: Martin Jaggi
Date: 2014-02-07T16:34:45Z
use d for the number of features
Consistently use n for the number of data examples in the RDD and d for the
number of entries (features) in each example, in both code and documentation.
commit da439c161d1763dfce9833efe4f8b845b00e5bef
Author: Martin Jaggi
Date: 2014-02-07T17:13:17Z
correct scaling for MSE loss
to be consistent with the documentation
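For reference, a least-squares loss with the 1/2 factor can be sketched in Python as below. This is a minimal illustration, assuming the documented convention L(w; x, y) = 1/2 (w^T x - y)^2 with gradient (w^T x - y) x, so that averaging over the n examples gives one half of the mean squared error. The helper names are hypothetical and are not MLlib's LeastSquaresGradient API.

```python
def least_squares_loss_and_gradient(w, x, y):
    """Loss 0.5 * (w.x - y)^2 and its gradient (w.x - y) * x for one example.

    Hypothetical helper for illustration; not MLlib's API.
    """
    diff = sum(wi * xi for wi, xi in zip(w, x)) - y
    loss = 0.5 * diff * diff
    grad = [diff * xi for xi in x]
    return loss, grad


def averaged_loss(w, data):
    """Mean of the per-example losses over n examples, i.e. 0.5 * MSE."""
    n = len(data)
    return sum(least_squares_loss_and_gradient(w, x, y)[0] for x, y in data) / n
```

With the 1/2 factor the gradient carries no stray constant 2, which keeps the code and the formulas in the documentation term-by-term identical.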
commit d28ac774f4582370748aca379d2931520ff68fa4
Author: Martin Jaggi
Date: 2014-02-07T17:15:44Z
new classification and regression documentation
Includes complete mathematical formulations, written generally enough to
accommodate future ML methods, and a table of all subgradients used, for
reference.
This change also required a small addition to the MathJax configuration to
allow equation numbers.
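As an illustration of what entries in such a subgradient table look like, here is a Python sketch of two common losses, assuming labels y in {-1, +1}: the hinge loss max(0, 1 - y w^T x) used by SVMs, and the logistic loss log(1 + exp(-y w^T x)). The function names are hypothetical. At the kink of the hinge loss (y w^T x = 1) any vector on the segment between -y x and 0 is a valid subgradient; this sketch picks 0.

```python
import math


def dot(w, x):
    """Inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge_subgradient(w, x, y):
    """A subgradient of max(0, 1 - y * w.x), with y in {-1, +1}."""
    if y * dot(w, x) < 1.0:
        return [-y * xi for xi in x]
    return [0.0 for _ in x]  # margin >= 1: the loss is flat here


def logistic_gradient(w, x, y):
    """Gradient of log(1 + exp(-y * w.x)), i.e. -y * x / (1 + exp(y * w.x))."""
    scale = -y / (1.0 + math.exp(y * dot(w, x)))
    return [scale * xi for xi in x]
```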
commit 580b846c19bb98d4000d46a3682d3855b0a488e7
Author: Martin Jaggi
Date: 2014-02-07T17:16:51Z
new optimization documentation
Explains gradient descent (GD), stochastic gradient descent (SGD), and the
distributed versions that MLlib implements.
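A rough single-machine sketch of mini-batch SGD in Python: each iteration samples a fraction of the n examples, averages their gradients, and takes a step of size step_size / sqrt(t). The parameter names loosely mirror MLlib's stepSize, numIterations and miniBatchFraction, but the function itself is hypothetical, and the 1/sqrt(t) decay is an assumption of this sketch. In a distributed version the per-example gradients would be summed in parallel over the RDD partitions rather than in a local loop.

```python
import math
import random


def minibatch_sgd(data, grad_fn, w0, step_size=1.0, num_iterations=100,
                  mini_batch_fraction=1.0, seed=42):
    """Hypothetical mini-batch SGD over a local list of (x, y) examples.

    grad_fn(w, x, y) returns the loss gradient for a single example.
    """
    rng = random.Random(seed)
    w = list(w0)
    for t in range(1, num_iterations + 1):
        # Sample each example independently with probability mini_batch_fraction.
        batch = [ex for ex in data if rng.random() < mini_batch_fraction]
        if not batch:
            continue  # empty sample: skip this iteration
        # Sum the per-example gradients (the part a cluster would parallelize).
        total = [0.0] * len(w)
        for x, y in batch:
            g = grad_fn(w, x, y)
            total = [a + b for a, b in zip(total, g)]
        eta = step_size / math.sqrt(t)
        # Step along the averaged gradient.
        w = [wi - eta * gi / len(batch) for wi, gi in zip(w, total)]
    return w
```

With mini_batch_fraction=1.0 every example is sampled in every iteration, so the loop reduces to full (batch) gradient descent.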
commit 6c7a7dc07afbf471fef7bf14a5751c80c6bb39aa
Author: Martin Jaggi
Date: 2014-02-07T17:38:57Z
better comments in SGD code for regression
commit 25acda728ae6ec3f3eea5462073f70b29526b3e2
Author: Martin Jaggi
Date: 2014-02-07T22:41:42Z
lambda R() in documentation
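Written out, the regularized objective that the lambda R() notation refers to presumably has the following form, assuming the conventions used elsewhere in this PR (n examples x_i in R^d with labels y_i, a loss L, and a regularizer R weighted by lambda >= 0):

```latex
\min_{\mathbf{w} \in \mathbb{R}^d} \quad
f(\mathbf{w}) := \lambda\, R(\mathbf{w})
  + \frac{1}{n} \sum_{i=1}^{n} L(\mathbf{w}; \mathbf{x}_i, y_i)
```

For example, R(w) = 1/2 ||w||_2^2 gives L2 regularization.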
commit d82f9e80605c6dabf90135416f8921834621fdfe
Author: Martin Jaggi
Date: 2014-02-08T17:31:05Z
describe what the updater actually does
Also use the proper scaling for the L2 regularizer (a factor of 1/2, as in
the documentation).
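What an L2 updater does can be sketched in Python as follows, assuming the regularizer (lambda/2) ||w||^2 and a decaying step size step_size / sqrt(t). Because the gradient of (1/2) ||w||^2 is w, one step on the regularized objective first shrinks the weights by (1 - eta * lambda) and then subtracts eta times the loss gradient. The function names are hypothetical, not MLlib's Updater interface.

```python
import math


def squared_l2_update(w, grad, step_size, iteration, reg_param):
    """One step on reg_param * 0.5 * ||w||^2 + loss, given the loss gradient.

    w_new = w - eta * (grad + reg_param * w)
          = (1 - eta * reg_param) * w - eta * grad,
    with eta = step_size / sqrt(iteration) (assumed decay schedule).
    """
    eta = step_size / math.sqrt(iteration)
    return [(1.0 - eta * reg_param) * wi - eta * gi for wi, gi in zip(w, grad)]


def l2_reg_value(w, reg_param):
    """Value of the regularization term reg_param * 0.5 * ||w||^2."""
    return reg_param * 0.5 * sum(wi * wi for wi in w)
```

The 1/2 factor is what makes the regularizer's gradient exactly reg_param * w, with no extra constant.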
commit fa355498331180623664a425783b96a12a8bcee9
Author: Martin Jaggi
Date: 2014-02-08T17:56:01Z
remove broken url
commit 564cb0cee0ae2fb41baa264ed21f28c9336bc1ba
Author: Martin Jaggi
Date: 2014-02-08T17:57:12Z
better description of GradientDescent
commit 29e5d65e6098c3f25bcbeada99bcf9cbb151eea7
Author: Martin Jaggi
Date: 2014-02-08T20:30:35Z
line wrap at 100 chars
----