GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/613
[WIP] - [FLINK-1807/1889] - Optimization framework and initial SGD implementation

This is a WIP PR for the optimization framework of the Flink ML library. The design is a mix of how sklearn and Apache Spark implement the optimization frameworks for their learning algorithms.

The idea is that a Learner can take a Solver, a LossFunction and a RegularizationType as parameters, similar to the design that sklearn uses and that Spark seems to be heading toward. This gives users flexibility in how they design their learning algorithms. A Solver uses the LossFunction and RegularizationType to optimize the weights against the provided DataSet of LabeledVector (label, featuresVector).

As you will see in the TODOs, there are still many open questions about the design. No real RegularizationType has been implemented yet, so that interface could change depending on what we end up needing for the regularization calculation.

A first implementation of Stochastic Gradient Descent is included. The stochastic part is still missing, because we are blocked on a sample operator for DataSet. Until that is available, each iteration has to map over the whole data set.

If you run the tests you will see that the third test, where we try to perform just one step of the optimization, does not work. I haven't been able to figure out why this happens yet; any help would be appreciated.

I've also included a wrapper for BLAS functions that was copied directly from MLlib.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/613.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #613

----
commit 1ed6032b6505488549785ff38b5805586a0465cb
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-04-21T08:59:34Z

    Interfaces for the optimization framework.

    BLAS.scala was directly copied from the Apache Spark project.

commit 5a40f14790fd024fdd9a01069262627cda2126a4
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-04-21T09:01:50Z

    Added Stochastic Gradient Descent initial version and some tests.

----
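
For readers who want a concrete picture of the Solver / LossFunction / RegularizationType split described in this PR, here is a minimal sketch in plain Scala. It is not the code from the PR: every signature is an assumption made for illustration, a plain Seq stands in for Flink's DataSet, and SquaredLoss, NoRegularization and GradientDescent are hypothetical names. Like the initial version described above, it does no sampling and computes the gradient over the whole data set in each iteration.

```scala
// Hypothetical sketch: plain Scala collections stand in for Flink's DataSet,
// and every signature below is an assumption, not the PR's actual interface.
object OptimizationSketch {
  final case class LabeledVector(label: Double, features: Vector[Double])

  trait LossFunction {
    /** Gradient of the loss for one example with respect to the weights. */
    def gradient(example: LabeledVector, weights: Vector[Double]): Vector[Double]
  }

  trait RegularizationType {
    /** Adjusts the weights after a gradient step has been taken. */
    def apply(weights: Vector[Double], stepSize: Double): Vector[Double]
  }

  /** Loss 0.5 * (w . x - y)^2, whose gradient is (w . x - y) * x. */
  object SquaredLoss extends LossFunction {
    def gradient(example: LabeledVector, weights: Vector[Double]): Vector[Double] = {
      val prediction = example.features.zip(weights).map { case (x, w) => x * w }.sum
      val residual = prediction - example.label
      example.features.map(_ * residual)
    }
  }

  /** Placeholder that leaves the weights unchanged. */
  object NoRegularization extends RegularizationType {
    def apply(weights: Vector[Double], stepSize: Double): Vector[Double] = weights
  }

  trait Solver {
    def optimize(data: Seq[LabeledVector], initialWeights: Vector[Double]): Vector[Double]
  }

  /** Gradient descent without sampling; assumes `data` is non-empty. */
  class GradientDescent(
      loss: LossFunction,
      regularization: RegularizationType,
      stepSize: Double,
      iterations: Int) extends Solver {

    def optimize(data: Seq[LabeledVector], initialWeights: Vector[Double]): Vector[Double] =
      (1 to iterations).foldLeft(initialWeights) { (weights, _) =>
        // Sum the per-example gradients over the whole data set, then average.
        val summed = data
          .map(loss.gradient(_, weights))
          .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
        val averaged = summed.map(_ / data.size)
        // Take one step against the averaged gradient, then regularize.
        val stepped = weights.zip(averaged).map { case (w, g) => w - stepSize * g }
        regularization(stepped, stepSize)
      }
  }
}
```

A solver built as `new GradientDescent(SquaredLoss, NoRegularization, 0.1, 100)` can then be handed to whatever plays the Learner role. Swapping the loss or the regularization does not require touching the solver, which is the flexibility the design aims for.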