GitHub user martinjaggi opened a pull request:
https://github.com/apache/incubator-spark/pull/563
new MLlib documentation for optimization, regression and classification
new documentation with TeX formulas, hopefully improving the usability and
reproducibility of the offered MLlib methods.
also made some minor changes in the code for consistency; the Scala tests pass.
for easier merging, we could maybe rebase these changes (only commits after
Feb 7 are relevant) once
https://github.com/apache/incubator-spark/pull/552
is merged?
JIRA:
https://spark-project.atlassian.net/browse/MLLIB-19
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-spark/pull/563.patch
----
commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi <[email protected]>
Date: 2014-02-06T15:57:23Z
minor update on how to compile the documentation
commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi <[email protected]>
Date: 2014-02-06T15:59:43Z
enable MathJax formulas in the .md documentation files
code by @shivaram
commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi <[email protected]>
Date: 2014-02-06T16:31:29Z
split MLlib documentation by technique
and linked from the main mllib-guide.md page
commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi <[email protected]>
Date: 2014-02-06T17:04:26Z
enabling inline LaTeX formulas with $.$
same MathJax configuration as used on math.stackexchange.com
sample usage in the linear algebra (SVD) documentation
commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T02:19:38Z
minor polishing, as suggested by @pwendell
commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T16:33:24Z
renaming LeastSquaresGradient
so it is not confused with a squared regularizer or a squared gradient. added
some more comments on what the loss functions are good for
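As a rough illustration of what the renamed LeastSquaresGradient computes, here is a self-contained sketch (my own object and method names, not MLlib's actual Gradient API): the least-squares loss for one example $(x, y)$ and its gradient with respect to the weights $w$.

```scala
// Hypothetical sketch, not MLlib's actual API: least-squares loss and gradient
// for a single example (x, y) with weight vector w.
object LeastSquaresSketch {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  // loss L(w; x, y) = (1/2) * (w . x - y)^2
  def loss(w: Array[Double], x: Array[Double], y: Double): Double = {
    val err = dot(w, x) - y
    0.5 * err * err
  }

  // gradient dL/dw = (w . x - y) * x
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val err = dot(w, x) - y
    x.map(_ * err)
  }

  def main(args: Array[String]): Unit = {
    val w = Array(1.0, -2.0)
    val x = Array(3.0, 0.5)
    val y = 1.0
    // w . x = 2.0, so err = 1.0, loss = 0.5, gradient = (3.0, 0.5)
    println(loss(w, x, y))                       // 0.5
    println(gradient(w, x, y).mkString(","))     // 3.0,0.5
  }
}
```

The 1/2 factor is what makes the gradient come out as plain `err * x`, which is why the scaling matters for consistency with the docs.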
commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T16:34:45Z
use d for the number of features
trying to be consistent: n is the number of data examples in the RDD,
and each of them has d entries (also in the documentation)
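The convention this commit adopts can be summarized in one formula (my own sketch of the setup, not copied from the new docs): with $n$ training examples $x_i \in \mathbb{R}^d$ in the RDD, labels $y_i$, and weights $w \in \mathbb{R}^d$, the regularized training objective reads

```latex
% n = number of data examples in the RDD, d = number of features
% x_i \in \mathbb{R}^d, labels y_i, weights w \in \mathbb{R}^d
f(w) \;=\; \frac{1}{n} \sum_{i=1}^{n} L(w;\, x_i, y_i) \;+\; \lambda\, R(w)
```

so $n$ always indexes examples and $d$ always counts features.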
commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T17:13:17Z
correct scaling for MSE loss
to be consistent with the documentation
commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T17:15:44Z
new classification and regression documentation
with complete mathematical formulations, written generally so that future
ML methods can be added as well. includes a table of all subgradients
used, for reference.
this change also required a small addition to the MathJax
configuration to allow equation numbers.
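To give a flavor of one entry in such a subgradient table, here is a hedged sketch (my own names, not MLlib's API) of the hinge loss used for linear SVMs: $L(w; x, y) = \max(0,\, 1 - y\, w \cdot x)$ with labels $y \in \{-1, +1\}$. A valid subgradient is $-y\,x$ when the margin is violated and $0$ otherwise.

```scala
// Hypothetical sketch of one subgradient-table entry: the hinge loss.
// Labels y are assumed to be -1 or +1.
object HingeSketch {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  def loss(w: Array[Double], x: Array[Double], y: Double): Double =
    math.max(0.0, 1.0 - y * dot(w, x))

  // subgradient: -y * x if the margin y * (w . x) < 1, else the zero vector
  def subgradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] =
    if (y * dot(w, x) < 1.0) x.map(_ * -y) else x.map(_ => 0.0)

  def main(args: Array[String]): Unit = {
    val w = Array(0.5, 0.0)
    val x = Array(1.0, 2.0)
    // margin y * (w . x) = 0.5 < 1, so loss = 0.5 and subgradient = -x
    println(loss(w, x, 1.0))                       // 0.5
    println(subgradient(w, x, 1.0).mkString(","))  // -1.0,-2.0
  }
}
```

The case split is exactly why these are *sub*gradients: the hinge is not differentiable at margin 1.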
commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T17:16:51Z
new optimization documentation
explaining gradient descent (GD) and stochastic gradient descent (SGD),
and the distributed versions that MLlib implements.
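A minimal local sketch of the SGD scheme that documentation describes (a toy illustration under my own assumptions, not MLlib's distributed implementation, which instead averages gradients over a sampled fraction of the RDD per iteration): pick one example, step against its gradient, repeat.

```scala
// Hypothetical local SGD sketch on 1-d least squares: examples (x, y)
// generated from y = 2 * x, so the weight should converge to w = 2.0.
object SgdSketch {
  def run(): Double = {
    val data = Array((1.0, 2.0), (2.0, 4.0), (3.0, 6.0))
    var w = 0.0
    val stepSize = 0.05
    for (iter <- 0 until 200) {
      val (x, y) = data(iter % data.length)   // cycle through examples
      val grad = (w * x - y) * x              // gradient of (1/2)(w*x - y)^2
      w -= stepSize * grad                    // SGD step on one example
    }
    w
  }

  def main(args: Array[String]): Unit = {
    println(run())  // converges to (approximately) 2.0
  }
}
```

Cycling through examples deterministically stands in for random sampling here, just to keep the sketch reproducible.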
commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T17:38:57Z
better comments in SGD code for regression
commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi <[email protected]>
Date: 2014-02-07T22:41:42Z
lambda R() in documentation
commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi <[email protected]>
Date: 2014-02-08T17:31:05Z
telling what the updater actually does
also use the proper scaling for the L2 regularization (a factor of 1/2,
as in the documentation)
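As a hedged illustration of what such an updater step amounts to (my own sketch, not MLlib's actual Updater API): with regularizer $R(w) = \frac{1}{2}\|w\|^2$, whose gradient is simply $w$ (the point of the 1/2 factor), the combined update is $w \leftarrow w - \eta\,(g + \lambda w)$ for loss gradient $g$.

```scala
// Hypothetical sketch of an L2-regularized update step, not MLlib's Updater.
// R(w) = (1/2) * ||w||^2 has gradient w, so the step adds lambda * w
// to the loss gradient before moving.
object L2UpdaterSketch {
  def update(w: Array[Double], grad: Array[Double],
             stepSize: Double, lambda: Double): Array[Double] =
    // w_new = w - stepSize * (grad + lambda * w)
    w.zip(grad).map { case (wi, gi) => wi - stepSize * (gi + lambda * wi) }

  def main(args: Array[String]): Unit = {
    val updated = update(Array(1.0, -1.0), Array(0.5, 0.5), 0.1, 0.1)
    // per coordinate: 1 - 0.1*(0.5 + 0.1) and -1 - 0.1*(0.5 - 0.1)
    println(updated.mkString(","))
  }
}
```

Without the 1/2 in $R$, the regularization gradient would be $2w$ and the effective regularization strength would silently double, which is the inconsistency this commit fixes.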
commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi <[email protected]>
Date: 2014-02-08T17:56:01Z
remove broken url
commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi <[email protected]>
Date: 2014-02-08T17:57:12Z
better description of GradientDescent
----