GitHub user martinjaggi reopened a pull request: https://github.com/apache/incubator-spark/pull/563
new MLlib documentation for optimization, regression and classification

    New documentation with TeX formulas, hopefully improving the usability
    and reproducibility of the offered MLlib methods. I also made some minor
    changes in the code for consistency. The Scala tests pass.

    For easier merging, we could maybe rebase these changes (only the commits
    after Feb 7 are relevant) once
    https://github.com/apache/incubator-spark/pull/552 is merged?

    JIRA: https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark polishing-opt-MLlib

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/563.patch

----

commit d73948db0d9bc36296054e79fec5b1a657b4eab4
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-06T15:57:23Z

    minor update on how to compile the documentation

commit d1c5212b93c67436543c2d8ddbbf610fdf0a26eb
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-06T15:59:43Z

    enable MathJax formulas in the .md documentation files (code by @shivaram)

commit bbafafd2b497a5acaa03a140bb9de1fbb7d67ffa
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-06T16:31:29Z

    split the MLlib documentation by technique, linked from the main
    mllib-guide.md page

commit dcd2142c164b2f602bf472bb152ad55bae82d31a
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-06T17:04:26Z

    enable inline LaTeX formulas with $.$, using the same MathJax
    configuration as math.stackexchange.com; sample usage in the linear
    algebra (SVD) documentation

commit 0364bfabbfc347f917216057a20c39b631842481
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T02:19:38Z

    minor polishing, as suggested by @pwendell

commit 93d74988c33a9e4ef0d15e39c8b8fc9e6c36bb28
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T16:33:24Z

    rename LeastSquaresGradient so as not to confuse it with a squared
    regularizer or a squared gradient; added some more comments on what the
    loss functions are good for

commit e4cbe99bbcf7f53ebb8f1a0d2e0b869a4922bca4
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T16:34:45Z

    use d for the number of features, for consistency: n is the number of
    data examples in the RDD, and each of them has d entries (also in the
    documentation)

commit 79768fd3429df5c6d56f05ac93bdd8cf4355d946
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T17:13:17Z

    correct the scaling of the MSE loss to be consistent with the
    documentation

commit 1e228062b01ac806c4bd032eb0975a8b92431fd9
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T17:15:44Z

    new classification and regression documentation with complete
    mathematical formulations, trying to stay general enough for adding
    future ML methods as well; includes a table of all subgradients used, for
    reference. This change also required a small addition to the MathJax
    configuration, to allow equation numbers.

commit 89e472f4121debb175b625ab0c138e24c4e60de8
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T17:16:51Z

    new optimization documentation explaining GD and SGD, and the distributed
    versions that MLlib implements
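The commits above repeatedly reference one regularized objective (n data examples, d features, regularizer lambda * R(w)). As a sketch of that formulation — the exact constants are my assumption from the commit messages, not quoted from the PR's documentation — it presumably reads:

```latex
% Sketch only: constants are assumed, not copied from the PR's docs.
% n data examples (x_i, y_i), each x_i in R^d; regularizer \lambda R(w).
\min_{w \in \mathbb{R}^d} \; f(w) :=
    \frac{1}{n} \sum_{i=1}^{n} L(w; x_i, y_i) \; + \; \lambda \, R(w)

% For least squares (cf. the renamed LeastSquaresGradient), one common
% choice is L(w; x, y) = \tfrac{1}{2} (w^\top x - y)^2, whose subgradient
% (w^\top x - y)\,x is what a gradient-descent step
% w^{(t+1)} = w^{(t)} - \alpha \, \nabla f(w^{(t)}) would use.
```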
commit a33be78a47bad1745a03a6e0ee1a4ea1a7893805
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T17:38:57Z

    better comments in the SGD code for regression

commit 73f5e71e3d9a253ff378907fca202b8d6aae1268
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-07T22:41:42Z

    lambda R() in the documentation

commit eec58c9c860def9b3b7604c990ec1697812bcbbf
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-08T17:31:05Z

    explain what the Updater actually does; also use the proper scaling for
    the L2 regularization (a factor of 1/2, as in the documentation)

commit 2c1cf8d35145081a61865f55f4e48fcfbafddbbe
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-08T17:56:01Z

    remove broken URL

commit ecbac73a7450fc90ef1509d9a410c9b627617130
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-08T17:57:12Z

    better description of GradientDescent

commit eae3dce25a4b68bf32ece1ca7783f9b2ffd56dff
Author: Martin Jaggi <m.ja...@gmail.com>
Date: 2014-02-08T20:30:35Z

    wrap lines at 100 characters

----
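To make the scaling discussion above concrete — the 1/n factor on the loss and the 1/2 factor on the L2 regularizer, so that the regularizer contributes exactly lambda * w to the gradient — here is a minimal sketch in NumPy. This is a hypothetical illustration, not MLlib's actual code; the constants match the commit messages' description, and all function names here are mine.

```python
import numpy as np

# Hypothetical sketch (not MLlib's code): batch gradient descent for
# L2-regularized least squares, with the scalings the commits describe:
#   f(w) = (1/n) * sum_i 1/2 * (w.x_i - y_i)^2  +  (lam/2) * ||w||^2
# so the regularizer's gradient contribution is exactly lam * w.

def loss(w, X, y, lam):
    resid = X @ w - y                       # n residuals
    return 0.5 * np.mean(resid ** 2) + 0.5 * lam * (w @ w)

def gradient(w, X, y, lam):
    resid = X @ w - y
    return X.T @ resid / len(y) + lam * w   # data term + regularizer term

def gradient_descent(X, y, lam=0.1, step=0.1, iters=100):
    w = np.zeros(X.shape[1])                # d-dimensional weight vector
    for _ in range(iters):
        w -= step * gradient(w, X, y, lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                # n=50 examples, d=3 features
y = X @ np.array([1.0, -2.0, 0.5])
w = gradient_descent(X, y)
assert loss(w, X, y, 0.1) < loss(np.zeros(3), X, y, 0.1)
```

An SGD variant, as in the documented GradientDescent, would replace the full-data gradient with the subgradient on a sampled subset of the n examples at each step.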