GitHub user thvasilo opened a pull request:

    https://github.com/apache/flink/pull/613

    [WIP] - [FLINK-1807/1889] - Optimization framework and initial SGD implementation

    This is a WIP PR for the optimization framework of the Flink ML library.
    
    The design is a mix of how sklearn and Apache Spark implement their
    learning-algorithm optimization frameworks.
    
    The idea is that a Learner takes a Solver, a LossFunction and a
    RegularizationType as parameters, similar to the design that sklearn uses
    and that Spark seems to be heading toward. This gives users flexibility in
    how they design their learning algorithms.
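
    A minimal sketch of how these pieces could fit together. The trait names
    follow the description above, but every method signature is an assumption,
    and a local Seq stands in for Flink's DataSet so the snippet runs without
    Flink. NoRegularization is a hypothetical placeholder, since no real
    RegularizationType exists yet.

```scala
// Hypothetical sketch: names mirror the PR description, signatures are assumed.
// Seq[LabeledVector] stands in for Flink's DataSet[LabeledVector].
final case class LabeledVector(label: Double, features: Array[Double])

trait LossFunction {
  /** Loss of a single example under the current weights. */
  def loss(example: LabeledVector, weights: Array[Double]): Double
  /** Gradient of that loss with respect to the weights. */
  def gradient(example: LabeledVector, weights: Array[Double]): Array[Double]
}

trait RegularizationType {
  /** Penalty added to the objective for the given weights. */
  def penalty(weights: Array[Double]): Double
}

trait Solver {
  /** Returns optimized weights, starting from initialWeights. */
  def optimize(
      data: Seq[LabeledVector],
      initialWeights: Array[Double],
      loss: LossFunction,
      regularization: RegularizationType): Array[Double]
}

// Placeholder regularization that adds nothing to the objective (assumption).
object NoRegularization extends RegularizationType {
  def penalty(weights: Array[Double]): Double = 0.0
}

// The Learner only wires the pluggable parts together (sklearn-style composition).
final class Learner(solver: Solver, loss: LossFunction, reg: RegularizationType) {
  def fit(data: Seq[LabeledVector], dimension: Int): Array[Double] =
    solver.optimize(data, Array.fill(dimension)(0.0), loss, reg)
}
```

    Swapping the optimizer or the loss then means passing a different object to
    the Learner's constructor, without touching the learner itself.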
    
    A Solver uses the LossFunction and RegularizationType to optimize the
    weights for the provided DataSet of LabeledVector (label, featuresVector).
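
    To make the LossFunction role concrete, here is a hypothetical squared loss
    for a linear model, 0.5 * (w . x - y)^2, whose gradient (w . x - y) * x is
    what a Solver would consume. LabeledVector is redefined locally as a plain
    case class, and SquaredLoss is an illustrative name, not part of this PR.

```scala
// Local stand-in for a labeled example: (label, featuresVector).
final case class LabeledVector(label: Double, features: Array[Double])

// Hypothetical squared loss for a linear model with weights w.
object SquaredLoss {
  private def dot(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    var s = 0.0
    var i = 0
    while (i < a.length) { s += a(i) * b(i); i += 1 }
    s
  }

  /** 0.5 * (w . x - y)^2 */
  def loss(ex: LabeledVector, w: Array[Double]): Double = {
    val r = dot(w, ex.features) - ex.label
    0.5 * r * r
  }

  /** Gradient of the loss above with respect to w: (w . x - y) * x */
  def gradient(ex: LabeledVector, w: Array[Double]): Array[Double] = {
    val r = dot(w, ex.features) - ex.label
    ex.features.map(_ * r)
  }
}
```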
    
    As the TODOs show, many design questions are still open. No concrete
    RegularizationType has been implemented yet, so that interface could change
    depending on what the regularization calculation ends up needing.
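
    One possibility for what the regularization calculation might need: an L2
    penalty exposes both a value for the objective and a gradient term for the
    weight update. This is a hypothetical sketch; the class name
    L2Regularization and the parameter regParameter are assumptions.

```scala
// Hypothetical L2 regularization with strength regParameter (lambda).
final case class L2Regularization(regParameter: Double) {
  /** Objective term: 0.5 * lambda * ||w||^2 */
  def penalty(w: Array[Double]): Double =
    0.5 * regParameter * w.map(x => x * x).sum

  /** Gradient of the penalty with respect to w: lambda * w */
  def gradient(w: Array[Double]): Array[Double] =
    w.map(_ * regParameter)
}
```

    If a Solver needs the gradient term as well as the penalty, the interface
    would have to carry both, which is one way the current trait could change.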
    
    A first implementation of Stochastic Gradient Descent is included. The
    stochastic part is still missing, because we are blocked on a sample
    operator for DataSet; instead, each iteration has to map over the whole
    dataset.
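
    Mapping over the whole data in every iteration means the update behaves
    like full-batch gradient descent rather than SGD. A sketch of that update,
    w <- w - stepSize * (mean gradient over all examples), under assumptions:
    squared loss, a fixed step size, a local Seq in place of DataSet, and
    hypothetical names (BatchGradientDescent, step, run).

```scala
// Local stand-in for a labeled example: (label, featuresVector).
final case class LabeledVector(label: Double, features: Array[Double])

object BatchGradientDescent {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Mean squared-loss gradient over ALL examples (no sampling). */
  def meanGradient(data: Seq[LabeledVector], w: Array[Double]): Array[Double] = {
    require(data.nonEmpty, "data must not be empty")
    val summed = data
      .map { ex =>
        val residual = dot(w, ex.features) - ex.label
        ex.features.map(_ * residual)
      }
      .reduce((a, b) => a.indices.map(i => a(i) + b(i)).toArray)
    summed.map(_ / data.size)
  }

  /** One update: w <- w - stepSize * meanGradient(data, w) */
  def step(data: Seq[LabeledVector], w: Array[Double], stepSize: Double): Array[Double] = {
    val g = meanGradient(data, w)
    w.indices.map(i => w(i) - stepSize * g(i)).toArray
  }

  /** Applies step() a fixed number of times, starting from w0. */
  def run(data: Seq[LabeledVector], w0: Array[Double], stepSize: Double, iterations: Int): Array[Double] =
    (1 to iterations).foldLeft(w0)((w, _) => step(data, w, stepSize))
}
```

    For a single example with label 2.0 and feature 1.0, starting from w = 0
    with stepSize = 0.5, one step gives w = 1.0; hand-computed values like this
    are one way to check a one-step test. A true SGD variant would replace the
    full pass with a sampled mini-batch once a sample operator is available.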
    If you run the tests, you will see that the third test, which performs just
    one step of the optimization, does not work. I haven't been able to figure
    out why yet; any help would be appreciated.
    
    I've also included a wrapper for BLAS functions that was copied directly
    from MLlib.
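
    For readers unfamiliar with the BLAS wrapper, the level-1 routines it
    exposes have simple in-place semantics. This dependency-free sketch mimics
    axpy (y := a * x + y), dot and scal (x := a * x) on dense arrays; it is not
    the copied MLlib code, which also handles sparse vectors and delegates
    dense operations to netlib-java. The object name DenseBLAS is hypothetical.

```scala
// Pure-Scala stand-ins for three level-1 BLAS routines on dense arrays.
object DenseBLAS {
  /** y := a * x + y, modifying y in place. */
  def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit = {
    require(x.length == y.length, "dimension mismatch")
    var i = 0
    while (i < x.length) { y(i) += a * x(i); i += 1 }
  }

  /** Inner product x . y */
  def dot(x: Array[Double], y: Array[Double]): Double = {
    require(x.length == y.length, "dimension mismatch")
    var s = 0.0
    var i = 0
    while (i < x.length) { s += x(i) * y(i); i += 1 }
    s
  }

  /** x := a * x, modifying x in place. */
  def scal(a: Double, x: Array[Double]): Unit = {
    var i = 0
    while (i < x.length) { x(i) *= a; i += 1 }
  }
}
```

    In-place updates like axpy let a solver accumulate gradients into an
    existing buffer instead of allocating a new vector per example.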

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/thvasilo/flink optimization

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 1ed6032b6505488549785ff38b5805586a0465cb
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-04-21T08:59:34Z

    Interfaces for the optimization framework.
    
    BLAS.scala was directly copied from the Apache Spark project.

commit 5a40f14790fd024fdd9a01069262627cda2126a4
Author: Theodore Vasiloudis <t...@sics.se>
Date:   2015-04-21T09:01:50Z

    Added Stochastic Gradient Descent initial version and some tests.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
