GitHub user gaborhermann opened a pull request:
https://github.com/apache/flink/pull/2819
[FLINK-4961] [ml] SGD for Matrix Factorization (WIP)
Please note that this is a work-in-progress PR, opened to discuss some design
questions. A few minor things remain to be done, including the documentation
(Scala docs are done). Apart from these and the open design questions, the PR
is ready.
Some notes:
- Generalized the matrix factorization methods into a `MatrixFactorization`
abstract class (this slightly modifies `ALS`).
- The algorithm could be executed in parts with `MLTools.persist`, just
like in `ALS` (to use less memory).
- The algorithm initializes block IDs randomly and also shuffles the data
when doing the updates. However, it can be made deterministic by setting a
seed.
- The objective function is simply squared loss with L2 regularization, in
contrast to `ALS`'s weighted-lambda-regularization. This could be extended
later to use other regularization methods too, as SGD is more flexible in terms
of loss functions.
- The same methods could be used for dynamically changing the learning rate
as in the `GradientDescent` implementation.
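
To make the objective and the role of the seed concrete, here is a minimal
single-machine sketch in plain Python. It is not the code in this PR and does
not use the Flink API: the names `sgd_mf` and `squared_error`, the parameter
names and the default values are all assumptions chosen for illustration, and
the distributed blocking is left out entirely. It only shows one plausible form
of the SGD update for squared loss with L2 regularization, with a seeded
shuffle so that runs are reproducible.

```python
import random


def sgd_mf(ratings, num_users, num_items, num_factors=2,
           learning_rate=0.05, lambda_=0.01, iterations=200, seed=42):
    """Illustrative SGD for matrix factorization (hypothetical helper).

    ratings is a list of (user, item, rating) triples. The objective is
    squared loss plus L2 regularization on both factor vectors; the constant
    factor of the gradient is absorbed into learning_rate.
    """
    rng = random.Random(seed)  # a fixed seed makes the whole run deterministic
    users = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
             for _ in range(num_users)]
    items = [[rng.uniform(-0.1, 0.1) for _ in range(num_factors)]
             for _ in range(num_items)]
    data = list(ratings)
    for _ in range(iterations):
        rng.shuffle(data)  # visit the ratings in a (seeded) random order
        for u, i, r in data:
            p, q = users[u], items[i]
            err = r - sum(a * b for a, b in zip(p, q))
            for k in range(num_factors):
                pk, qk = p[k], q[k]
                # gradient step on (r - p.q)^2 + lambda * (|p|^2 + |q|^2)
                p[k] += learning_rate * (err * qk - lambda_ * pk)
                q[k] += learning_rate * (err * pk - lambda_ * qk)
    return users, items


def squared_error(ratings, users, items):
    """Sum of squared prediction errors over the observed ratings."""
    return sum((r - sum(a * b for a, b in zip(users[u], items[i]))) ** 2
               for u, i, r in ratings)
```

Calling `sgd_mf` twice with the same `seed` gives identical factors, which is
the sense in which the randomized algorithm can be made deterministic. A
different regularization or loss would only change the two update lines in the
inner loop.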
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gaborhermann/flink dsgd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/2819.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2819
----
commit 88fffbf86e7a2ac8b1adc459e01e084ab2492e07
Author: Daniel Abram <[email protected]>
Date: 2016-11-16T13:34:51Z
[FLINK-4961] SGD for Matrix Factorization
commit 9bd6f2ea4a4fec2e7f4c64cf2b14453f3ba91e48
Author: Gábor Hermann <[email protected]>
Date: 2016-11-16T13:35:10Z
[FLINK-4961] SGD for Matrix Factorization test
----