Hi Becket,

Thanks a lot for the Table API enhancement design doc.
I am working on some simple ML algorithms using this new ML pipeline. I will let you know if any Table API enhancement is needed.

Thanks
Weihua

Becket Qin <becket....@gmail.com> wrote on Tue, Nov 20, 2018 at 10:43 PM:

> Hi Weihua,
>
> Thanks for the well-written design doc!
>
> The abstraction of ML pipeline is pretty handy for AI engineers. As
> Jincheng mentioned, there is an ongoing effort to enhance the Table API
> for ML. But it would still be helpful to understand what is missing in
> the Table API to fully support the ML pipeline. Given that there are quite
> a few proposed APIs and different related items to discuss, do you think
> having some examples of how the pipeline works would facilitate the
> discussion?
>
> Again, thanks for kicking off the discussion.
>
> Jiangjie (Becket) Qin
>
>
> On Tue, Nov 20, 2018 at 9:17 PM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi Weihua,
> >
> > Thanks for bringing up this discussion!
> >
> > I quickly read the Google doc, and I fully agree that ML can be well
> > supported on the Table API (at some stage in the future).
> > In fact, Xiaowei and I have already brought up a discussion on enhancing
> > the Table API. In the first phase, we will add support for
> > map/flatmap/agg/flatagg in the Table API.
> > So I am very happy to be involved in this discussion and will leave
> > comments in the Google doc later.
> >
> > It would be great if you could add a phased implementation plan to the
> > Google doc. What do you think?
> >
> > Thanks,
> > Jincheng
> >
> >
> > Weihua Jiang <weihua.ji...@gmail.com> wrote on Tue, Nov 20, 2018 at 8:53 PM:
> >
> > > ML Pipeline is the idea brought by Scikit-learn
> > > <https://arxiv.org/abs/1309.0238>. Both Spark and Flink have borrowed
> > > this idea and made their own implementations [Spark ML Pipeline
> > > <https://spark.apache.org/docs/latest/ml-pipeline.html>, Flink ML Pipeline
> > > <https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/libs/ml/pipelines.html>].
> > >
> > > NOTE: though I am using the term "ML", ML Pipeline shall apply to both
> > > ML and DL pipelines.
> > >
> > > ML Pipeline is quite helpful for model composition (i.e. using model(s)
> > > for feature engineering). It also enables logic reuse across the
> > > training and inference phases (via pipeline persistence and loading),
> > > which is essential for AI engineering. ML Pipeline can also be a good
> > > base for a Flink-based AI engineering platform if we can give ML
> > > Pipeline good tooling support (i.e. human-readable metadata).
> > >
> > > As the Table API will be the unified high-level API for both stream and
> > > batch processing, I want to initiate the design discussion of a new
> > > Table-based Flink ML Pipeline.
> > >
> > > I drafted a design document [1] for this discussion. This design tries
> > > to create a new ML Pipeline implementation so that concrete ML/DL
> > > algorithms can fit into this new API to achieve interoperability.
> > >
> > > Any feedback is highly appreciated.
> > >
> > > Thanks
> > >
> > > Weihua
> > >
> > > [1]
> > > https://docs.google.com/document/d/1PLddLEMP_wn4xHwi6069f3vZL7LzkaP0MN9nAB63X90/edit?usp=sharing