GitHub user gaborhermann opened a pull request:

    https://github.com/apache/flink/pull/2819

    [FLINK-4961] [ml] SGD for Matrix Factorization (WIP)

    Please note that this is a work-in-progress PR, opened to discuss some
    design questions. A few minor things remain to be done, including the
    documentation (the Scaladocs are done). Apart from these and the open
    design questions, the PR is ready.
    
    Some notes:
    - Generalized the matrix factorization methods into a `MatrixFactorization`
      abstract class (this slightly modifies `ALS`).
    - The algorithm can be executed in parts with `MLTools.persist`, just like
      `ALS`, to use less memory.
    - The algorithm initializes block IDs randomly and also shuffles the data
      when performing the updates. However, it can be made deterministic by
      setting a seed.
    - The objective function is simply squared loss with L2 regularization, in
      contrast to the weighted-lambda-regularization used by `ALS`. This could
      be extended later to support other regularization methods, as SGD is more
      flexible in terms of loss functions.
    - The same methods as in the `GradientDescent` implementation could be used
      to change the learning rate dynamically.
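    To make the notes above more concrete, here is a minimal Python sketch (not
    the Scala code in this PR) of DSGD-style matrix factorization under stated
    assumptions: the names `train_dsgd`, `strata`, and `squared_loss` are
    hypothetical, users and items get random block IDs from one seeded RNG, the
    ratings of each block are shuffled before the updates, and the per-rating L2
    term is scaled by 1/n_u and 1/n_i so that the updates follow plain L2
    regularization rather than weighted-lambda regularization. The 1/sqrt(t)
    learning-rate decay is only an illustrative choice.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: names and defaults are illustrative, not the Flink ML API.


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def strata(num_blocks, sub_epoch):
    """Blocks (user_block, item_block) updated in one sub-epoch.

    No two returned blocks share a user block or an item block, so a
    distributed implementation could update them in parallel.
    """
    return [(b, (b + sub_epoch) % num_blocks) for b in range(num_blocks)]


def squared_loss(ratings, P, Q, lam):
    """Squared loss plus plain L2 regularization of every factor vector."""
    err = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    reg = sum(dot(v, v) for v in P.values()) + sum(dot(v, v) for v in Q.values())
    return err + lam * reg


def train_dsgd(ratings, k=2, num_blocks=2, epochs=30, lr0=0.05, lam=0.01, seed=42):
    rng = random.Random(seed)  # one seeded RNG makes the whole run deterministic
    n_user = Counter(u for u, _, _ in ratings)  # number of ratings per user
    n_item = Counter(i for _, i, _ in ratings)  # number of ratings per item
    users, items = sorted(n_user), sorted(n_item)

    # Small random initial factors.
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(k)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.1) for _ in range(k)] for i in items}

    # Random block ID initialization, then group the ratings into blocks.
    user_block = {u: rng.randrange(num_blocks) for u in users}
    item_block = {i: rng.randrange(num_blocks) for i in items}
    blocks = {}
    for u, i, r in ratings:
        blocks.setdefault((user_block[u], item_block[i]), []).append((u, i, r))

    for epoch in range(1, epochs + 1):
        lr = lr0 / math.sqrt(epoch)  # illustrative 1/sqrt(t) decay
        for s in range(num_blocks):
            for key in strata(num_blocks, s):
                block = blocks.get(key, [])
                rng.shuffle(block)  # shuffle the ratings of the block before updating
                for u, i, r in block:
                    p, q = P[u], Q[i]
                    err = r - dot(p, q)
                    # SGD step on (r - p.q)^2 + lam * (|p|^2 / n_u + |q|^2 / n_i);
                    # summed over all ratings this is squared loss + plain L2.
                    # The factor 2 of the gradient is folded into lr.
                    P[u] = [pf + lr * (err * qf - lam * pf / n_user[u]) for pf, qf in zip(p, q)]
                    Q[i] = [qf + lr * (err * pf - lam * qf / n_item[i]) for pf, qf in zip(p, q)]
    return P, Q
```

    Within one sub-epoch the blocks returned by `strata` share no user block and
    no item block, so a distributed implementation could update them in
    parallel; the sketch simply processes them one after another. Running
    `train_dsgd` twice with the same `seed` yields identical factors.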

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gaborhermann/flink dsgd

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2819.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2819
    
----
commit 88fffbf86e7a2ac8b1adc459e01e084ab2492e07
Author: Daniel Abram <[email protected]>
Date:   2016-11-16T13:34:51Z

    [FLINK-4961] SGD for Matrix Factorization

commit 9bd6f2ea4a4fec2e7f4c64cf2b14453f3ba91e48
Author: Gábor Hermann <[email protected]>
Date:   2016-11-16T13:35:10Z

    [FLINK-4961] SGD for Matrix Factorization test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes to have it, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
