Re: [DISCUSS] Embracing Table API in Flink ML

Yun Gao Tue, 20 Nov 2018 08:44:03 -0800

Hi Weihua,

    Thanks for the exciting proposal!

I have quickly read through it, and I really appreciate the idea of
providing an ML Pipeline API similar to that of the commonly used library
scikit-learn, since it greatly reduces the learning cost for AI engineers
moving to the Flink platform.

Currently we are also working on a related issue, namely enhancing the
stream iteration of Flink to support both SGD and online learning; it also
supports batch training as a special case. We have a rough design and will
start a new discussion in the next few days. I think the enhanced stream
iteration will help to implement Estimators directly in Flink, and it may help
to simplify the online learning pipeline by eliminating the requirement to load
the models from external file systems.

I will read the design doc more carefully. Thanks again for sharing the
design doc!

Yours sincerely
Yun Gao

------------------------------------------------------------------
From: Weihua Jiang <weihua.ji...@gmail.com>
Sent: Tuesday, 20 November 2018, 20:53
To: dev <dev@flink.apache.org>
Subject: [DISCUSS] Embracing Table API in Flink ML

ML Pipeline is an idea introduced by Scikit-learn
<https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed this
idea and made their own implementations [Spark ML Pipeline
<https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
<https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].

NOTE: though I am using the term "ML", ML Pipeline shall apply to both ML
and DL pipelines.

ML Pipeline is quite helpful for model composition (i.e. using model(s) for
feature engineering). It also enables logic reuse across the training and
inference phases (via pipeline persistence and loading), which is essential
for AI engineering. ML Pipeline can also be a good base for a Flink-based AI
engineering platform if we can give ML Pipeline good tooling support
(i.e. human-readable metadata).
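
As a rough illustration of the two points above (stage composition, and
reuse of the same logic for training and inference via persistence), here is
a minimal plain-Python sketch. It is not the API proposed in the design
document and it does not use Flink or scikit-learn. All names in it
(MeanCenterer, Scaler, Pipeline, params, to_json, from_json, STAGES) are
hypothetical and chosen only for this example.

```python
import json


class MeanCenterer:
    """Hypothetical estimator stage: learns a mean in fit(), subtracts it in transform()."""

    def __init__(self, mean=None):
        self.mean = mean

    def fit(self, xs):
        self.mean = sum(xs) / len(xs)
        return self

    def transform(self, xs):
        return [x - self.mean for x in xs]

    def params(self):
        return {"mean": self.mean}


class Scaler:
    """Hypothetical stateless stage: multiplies every value by a fixed factor."""

    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, xs):
        return self  # nothing to learn

    def transform(self, xs):
        return [x * self.factor for x in xs]

    def params(self):
        return {"factor": self.factor}


# Registry used to rebuild stages from their persisted type names.
STAGES = {"MeanCenterer": MeanCenterer, "Scaler": Scaler}


class Pipeline:
    """Chains stages; each stage is fitted on the output of the previous one."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, xs):
        for stage in self.stages:
            xs = stage.fit(xs).transform(xs)
        return self

    def transform(self, xs):
        for stage in self.stages:
            xs = stage.transform(xs)
        return xs

    def to_json(self):
        # Human-readable metadata: stage type plus its parameters.
        return json.dumps(
            [{"type": type(s).__name__, "params": s.params()} for s in self.stages],
            indent=2,
        )

    @classmethod
    def from_json(cls, text):
        return cls([STAGES[d["type"]](**d["params"]) for d in json.loads(text)])


# Training phase: fit the whole pipeline once and persist it.
trained = Pipeline([MeanCenterer(), Scaler(factor=2.0)]).fit([1.0, 2.0, 3.0])
saved = trained.to_json()

# Inference phase: reload and apply exactly the same logic to new data.
restored = Pipeline.from_json(saved)
prediction_input = restored.transform([4.0])
```

In this sketch the persisted form is plain JSON, so the fitted pipeline can
be inspected by a person or by tooling before it is loaded for inference.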

As the Table API will be the unified high-level API for both stream and
batch processing, I want to initiate the design discussion of a new
Table-based Flink ML Pipeline.

I drafted a design document [1] for this discussion. This design tries to
create a new ML Pipeline implementation so that concrete ML/DL algorithms
can fit into this new API to achieve interoperability.

Any feedback is highly appreciated.

Thanks

Weihua

[1]
https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing
