GitHub user iyerr3 opened a pull request: https://github.com/apache/madlib/pull/229

    SVM: Add minibatch as a new solver

    Additional author: Nikhil Kak <n...@pivotal.io>

    This work is based on the original work by
    Xiaocheng Tang <xiaochen...@gmail.com> in #75.

    This PR adds two main features:
    1. A Minibatch solver that takes as input a batch of data
    2. Minibatching for SVM

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/iyerr3/incubator-madlib feature/svm_minibatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/229.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #229

----
commit 8dde3fc0e42be6fbd97585cc046060b84d624da1
Author: Rahul Iyer <riyer@...>
Date:   2018-01-08T21:21:16Z

    New module: Add minibatch as additional optimization framework

commit 8b0af20c79af41d6c5fe023b32bd885f15df5bd1
Author: Rahul Iyer <riyer@...>
Date:   2018-01-08T21:23:55Z

    SVM: Add minibatch capabilities

commit 2526d61f36e740df47e6b103e485133deb99ec43
Author: Rahul Iyer <riyer@...>
Date:   2018-01-09T00:55:34Z

    Add new dataset for batch

commit 6a8649771d7950d376285b24e25070fd524be519
Author: Nikhil Kak <nkak@...>
Date:   2018-01-09T18:52:18Z

    Add install-check test for svm minibatch

commit b5d1adbc5b501640cb0230d5622143d9b25ce4f5
Author: Nikhil Kak <nkak@...>
Date:   2018-01-10T18:27:57Z

    Add predict call for svm installcheck test

commit a943b1ac6c162a33eaf269d775061ddc559dd360
Author: Rahul Iyer <riyer@...>
Date:   2018-01-10T23:30:00Z

    Update model in getLoss function

commit 971403769f65a05e95a07d2df44c8a30120d025c
Author: Nikhil Kak <nkak@...>
Date:   2018-01-11T00:46:07Z

    Refactor svm minibatch to add comments and update variable names.

    We are now using a ColumnVector instead of MappedColumnVector because
    the minibatch transition function wasn't able to convert ColumnVector
    to MappedColumnVector. This required us to not rebind tuple.depVar and
    instead just assign it to y.

commit c86a36ca149a99d569ba54ddaaabe585f729df59
Author: Nikhil Kak <nkak@...>
Date:   2018-01-11T21:47:16Z

    Add classification test for svm minibatch and add relevant asserts.

commit a43bff2c0392770148fba2d04fb75c19d48803ee
Author: Rahul Iyer <riyer@...>
Date:   2018-01-12T00:42:13Z

    SVM: Fix classification with minibatching

    Changes:
    - Unnest minibatch array data to get all dependent labels
    - Transform dependent labels to 1/-1 by unnesting and then rebuilding
      array
    - Update install-check to use text labels and better thresholds for
      assert

commit 5d29f4ed7eaa8bdac16b72136f70aeca7997b54d
Author: Nikhil Kak <nkak@...>
Date:   2018-01-12T01:30:21Z

    Use correct count for averaging the gradient/loss

commit b9a69ddabdade4a3f57577de391d0f093ed93630
Author: Rahul Iyer <riyer@...>
Date:   2018-01-12T21:48:45Z

    SVM: Fix minibatch data in install-check

commit 60bda8a147d3a354329c9720fd41bb13650799a9
Author: Rahul Iyer <riyer@...>
Date:   2018-01-18T00:09:49Z

    Add assert for data validation (+ comment updates)

----


---
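
For readers unfamiliar with the technique, the following is a minimal sketch of what a minibatch update for a linear SVM can look like. It is an illustration only, not the code in this pull request: it assumes a hinge loss with L2 regularization, and every name in it (to_signed_labels, minibatch_step, stepsize, reg) is hypothetical. It mirrors two points from the commit log: labels are mapped to 1/-1 before training, and the gradient and loss are averaged over the number of rows in the batch.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def to_signed_labels(labels: Sequence[str], positive: str) -> List[float]:
    """Map arbitrary class labels to +1.0 / -1.0 (hypothetical helper)."""
    return [1.0 if label == positive else -1.0 for label in labels]


def minibatch_step(
    w: Vector,
    batch_x: Sequence[Sequence[float]],
    batch_y: Sequence[float],
    stepsize: float,
    reg: float,
) -> Tuple[Vector, float]:
    """One subgradient step on the L2-regularized hinge loss.

    Returns the updated weights and the mean hinge loss of the batch,
    evaluated at the weights that were passed in.
    """
    n = len(batch_x)
    if n == 0:
        raise ValueError("empty batch")
    grad = [0.0] * len(w)
    loss = 0.0
    for x, y in zip(batch_x, batch_y):
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        if margin < 1.0:
            # Only rows that violate the margin contribute to loss and gradient.
            loss += 1.0 - margin
            for j, xj in enumerate(x):
                grad[j] -= y * xj
    # Average over the number of rows in this batch, then add the L2 term.
    new_w = [wi - stepsize * (gj / n + reg * wi) for wi, gj in zip(w, grad)]
    return new_w, loss / n
```

A caller would typically loop over batches for several epochs, passing the weights returned by one call into the next and tracking the returned loss to decide when to stop.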