GitHub user martinjaggi opened a pull request: https://github.com/apache/incubator-spark/pull/566

    new MLlib documentation for optimization, regression and classification

    new documentation with tex formulas, hopefully improving usability and
    reproducibility of the offered MLlib methods.
    also did some minor changes in the code for consistency. scala tests pass.

    this is the rebased branch, i deleted the old PR

    jira:
    https://spark-project.atlassian.net/browse/MLLIB-19

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark copy-MLlib-d

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/566.patch

----
commit 1d7ba79c27458687833bafeb7668c5ceba8489ca
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T16:33:24Z

    renaming LeastSquaresGradient not to confuse with squared regularizer
    or a squared gradient. added some more comments as what the loss
    functions are good for

commit e6ef3e8b870af40996a2e8f1bc3765262cf6c714
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T16:34:45Z

    use d for the number of features

    try to be consistent, that n is the number of data examples in the
    RDD, and each of them has d entries (also in documentation)

commit da439c161d1763dfce9833efe4f8b845b00e5bef
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T17:13:17Z

    correct scaling for MSE loss to be consistent with the documentation

commit d28ac774f4582370748aca379d2931520ff68fa4
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T17:15:44Z

    new classification and regression documentation with complete
    mathematical formulations. trying to be general for adding future ML
    methods as well. table of all subgradients used for reference.

    this change also required a small addition to the mathjax
    configuration, to allow equation numbers.

commit 580b846c19bb98d4000d46a3682d3855b0a488e7
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T17:16:51Z

    new optimization documentation explaining GD and SGD and the
    distributed versions that MLlib implements.

commit 6c7a7dc07afbf471fef7bf14a5751c80c6bb39aa
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T17:38:57Z

    better comments in SGD code for regression

commit 25acda728ae6ec3f3eea5462073f70b29526b3e2
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-07T22:41:42Z

    lambda R() in documentation

commit d82f9e80605c6dabf90135416f8921834621fdfe
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-08T17:31:05Z

    telling what updater actually does

    also use proper scaling for the L2 regularization (using 1/2 as in
    the documentation)

commit fa355498331180623664a425783b96a12a8bcee9
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-08T17:56:01Z

    remove broken url

commit 564cb0cee0ae2fb41baa264ed21f28c9336bc1ba
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-08T17:57:12Z

    better description of GradientDescent

commit 29e5d65e6098c3f25bcbeada99bcf9cbb151eea7
Author: Martin Jaggi <m.ja...@gmail.com>
Date:   2014-02-08T20:30:35Z

    line wrap at 100 chars

----