Hi Stavros,

Thanks for bringing this up.

There have been past [1] and recent [2, 3] discussions about the Flink libraries, because there are some stalled PRs and overloaded committers. (Actually, Till is the only committer shepherding both the CEP and ML libraries, and AFAIK he has a ton of other responsibilities and work to do.) Thus it's hard to get code reviewed and merged, and without merged code it's hard to gain committer status. So there are not many committers who can review e.g. ML algorithm implementations, and the cycle goes on. Until this is resolved somehow, we should help the committers by reviewing each other's PRs.

I think prioritizing features (b) is a good way to start. We could identify the most blocking features and concentrate on reviewing and merging them before moving forward. E.g. the evaluation framework is quite important for an ML library in my opinion, and its PR has been stalled for a long time [4].

Regarding (c), there are general style guides for contributing to Flink, so we should follow those. Is there something more ML-specific you think we could follow? We should definitely declare that we follow scikit-learn, and make sure contributions comply with that.

In terms of features (a, d), I think we should first see the bigger picture. That is, it would be nice to discuss a clearer direction for Flink ML. I've seen a lot of interest in contributing to Flink ML lately. I believe we should rethink our goals, so that contribution efforts go into making a usable and useful library. Are we trying to implement as many useful algorithms as possible to create a scalable ML library? That would seem ambitious, and of course there are already many frameworks and libraries with a similar goal (e.g. Spark MLlib, Mahout). Should we instead create connectors to existing libraries? Then we could not really do Flink-specific optimizations. Should we go for online machine learning (as Flink is concentrating on streaming)? We already have a connector to SAMOA. We could go on with questions like this. Maybe I'm missing something, but I haven't seen such directions declared.

Cheers,
Gabor

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Opening-a-discussion-on-FlinkML-td10265.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-CEP-development-is-stalling-td15237.html#a15341
[3] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/New-Flink-team-member-Kate-Eri-td15349.html
[4] https://github.com/apache/flink/pull/1849

On 2017-02-20 11:43, Stavros Kontopoulos wrote:

(Resending with the appropriate topic)

Hi,

I would like to start a discussion about next steps for Flink ML.
Currently there is a lot of work going on, but it needs a push forward.

Some topics to discuss:

a) How several features should be planned and get aligned with Flink
releases.
b) Priorities of what should be done.
c) Basic guidelines for code: styleguides, scikit-learn compliance etc
d) Missing features important for the success of the library, next steps
etc...

Thoughts?

Best,
Stavros
