Hi Jiangjie,

Thanks a lot for your feedback, and also for our offline discussion! Yes,
you're right! The row-based APIs you mentioned are very friendly to Flink
users! To follow the concepts of traditional databases, naming the
corresponding functions RowValued/TableValued functions may be more
appropriate. From the perspective of return values in the Table API, we then
have three types of functions:
- ColumnValuedFunction - ScalarFunction & AggregateFunction; the result is a column.
- RowValuedFunction - the MapFunction I'll propose is a RowValuedFunction; the result is a single row.
- TableValuedFunction - the FlatMapFunction I'll propose is a TableValuedFunction; the result is a table.

The details will be described in the upcoming FLIP/design doc.

About the input types, I think we can support both column parameters and row
parameters. I believe the meaning you want to express is consistent with
mine, so we are on the same page, right?

And thank you for liking the proposal. I hope that we can work together to
advance this work!

Best,
Jincheng

Becket Qin <becket....@gmail.com> 于2018年11月2日周五 上午1:25写道:

> Thanks for the proposal, Jincheng.
>
> This makes a lot of sense. As a programming interface, the Table API is
> especially attractive because it supports both batch and stream. However,
> the relational-only API often forces users to shoehorn their logic into a
> bunch of user-defined functions. Introducing some more flexible APIs (e.g.
> row-based APIs) to process records would really help here.
>
> Besides the processing API, another useful improvement would be allowing
> batch tables and stream tables to run in the same job, which is actually a
> quite common scenario.
>
> As you said, there is a lot of work that could be done here. I am looking
> forward to the upcoming FLIPs.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Nov 2, 2018 at 12:10 AM jincheng sun <sunjincheng...@gmail.com>
> wrote:
>
> > Hi Timo,
> >
> > I am very grateful for your feedback, and I was very excited to hear
> > that you have also considered adding a process function to the Table API.
> >
> > I agree with adding support for the ProcessFunction on the Table API,
> > which is actually part of my proposal for enhancing the functionality of
> > the Table API. In fact, supporting the ProcessFunction means supporting
> > user-defined operators.
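To make the classification by return value concrete, here is a small
conceptual sketch in plain Python (this is not Flink code, and the function
names are hypothetical illustrations only): a column-valued function maps a
row to one value, a row-valued function maps a row to exactly one row, and a
table-valued function maps a row to zero or more rows.

```python
# Conceptual sketch (NOT Flink API): the three function kinds, classified
# by what each produces per input row.

# ColumnValuedFunction (like a scalar UDF): one input row -> one column value.
def scalar_upper(name):
    return name.upper()

# RowValuedFunction (the proposed MapFunction): one input row -> one row.
def map_fun(row):
    name, score = row
    return (name.upper(), score * 2)          # exactly one output row

# TableValuedFunction (the proposed FlatMapFunction): one input row -> a table.
def flat_map_fun(row):
    name, score = row
    for word in name.split(" "):
        yield (word, score)                   # zero or more output rows

table = [("a b", 1), ("c", 2)]

mapped = [map_fun(r) for r in table]          # still one row per input row
flattened = [out for r in table for out in flat_map_fun(r)]  # row count may change
```

The key observable difference is cardinality: `mapped` always has the same
row count as the input, while `flattened` may grow or shrink it.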
> > As you said, a ProcessFunction can implement any logic, including
> > user-defined windows, which leaves the user with enough freedom and
> > control. At the same time, the Co-ProcessFunction needs to be supported,
> > so we can implement the logic of a user-defined JOIN through the
> > Co-ProcessFunction. Of course, the Co-ProcessFunction also needs to
> > introduce the concept of Connect, and will introduce a new ConnectedTable
> > type in the Table API. And I also think this opens the Table API to more
> > event-driven applications.
> >
> > About the ProcessFunction: apart from the timer functionality, it should
> > be completely equivalent to the FlatMapFunction, so maybe we can support
> > map and flatMap on Table and support processFunction on GroupedTable,
> > because, for reasons of state, the timer of a ProcessFunction can only be
> > applied to a KeyedStream.
> >
> > You are right that ANSI SQL has difficulty expressing complex operator
> > logic such as a ProcessFunction, so once we decide to make these
> > enhancements to the Table API, it means that Flink SQL covers only the
> > ANSI-SQL operations, while the Table API's operations are a superset of
> > SQL. The Flink high-level API would then include a query language, SQL,
> > and a powerful programming language, the Table API. In this way, SQL
> > serves the simple ETL user groups, while the Table API serves user groups
> > that need to customize complex logic, and these users can still enjoy the
> > benefits of the query optimizer. We may need more refinement and hard
> > work to support these functions, but this may be a good direction for our
> > efforts.
> >
> > Thanks,
> > Jincheng
> >
> > Timo Walther <twal...@apache.org> 于2018年11月1日周四 下午10:08写道:
> >
> > > Hi Jincheng,
> > >
> > > I was also thinking about introducing a process function for the Table
> > > API several times. This would allow defining more complex logic (custom
> > > windows, timers, etc.)
> > > embedded into a relational API with schema
> > > awareness and optimization around the black box. Of course this would
> > > mean that the Table API diverges from the SQL API; however, it would
> > > also open the Table API up for more event-driven applications.
> > >
> > > Maybe it would be possible to define timers and firing logic using
> > > Table API expressions and UDFs. Within planning this would be treated
> > > as a special Calc node.
> > >
> > > Just some ideas that might be interesting for new use cases.
> > >
> > > Regards,
> > > Timo
> > >
> > >
> > > Am 01.11.18 um 13:12 schrieb Aljoscha Krettek:
> > > > Hi Jincheng,
> > > >
> > > > these points sound very good! Are there any concrete proposals for
> > > > changes? For example a FLIP/design document?
> > > >
> > > > See here for FLIPs:
> > > > https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > >
> > > > Best,
> > > > Aljoscha
> > > >
> > > >> On 1. Nov 2018, at 12:51, jincheng sun <sunjincheng...@gmail.com> wrote:
> > > >>
> > > >> *--------I am sorry for the formatting of the email content. I
> > > >> reformat the **content** as follows-----------*
> > > >>
> > > >> *Hi ALL,*
> > > >>
> > > >> With the continuous efforts from the community, the Flink system has
> > > >> been continuously improved, which has attracted more and more users.
> > > >> Flink SQL is a canonical, widely used relational query language.
> > > >> However, there are still some scenarios where Flink SQL fails to meet
> > > >> user needs in terms of functionality and ease of use, such as:
> > > >>
> > > >> *1. In terms of functionality*
> > > >> Iteration, user-defined window, user-defined join, user-defined
> > > >> GroupReduce, etc. Users cannot express these with SQL;
> > > >>
> > > >> *2. In terms of ease of use*
> > > >>
> > > >> - Map - e.g. “dataStream.map(mapFun)”.
> > > >>   Although “table.select(udf1(), udf2(), udf3()....)” can be used to
> > > >>   accomplish the same function, with a map() function returning 100
> > > >>   columns, one has to define or call 100 UDFs when using SQL, which
> > > >>   is quite involved.
> > > >> - FlatMap - e.g. “dataStream.flatMap(flatMapFun)”. Similarly, it can
> > > >>   be implemented with “table.join(udtf).select()”. However, it is
> > > >>   obvious that DataStream is easier to use than SQL.
> > > >>
> > > >> Due to the above two reasons, some users have to use the DataStream
> > > >> API or the DataSet API. But when they do that, they lose the
> > > >> unification of batch and streaming. They will also lose the
> > > >> sophisticated optimizations such as codegen, aggregate join
> > > >> transpose, and multi-stage agg from Flink SQL.
> > > >>
> > > >> We believe that enhancing the functionality and productivity is vital
> > > >> for the successful adoption of the Table API. To this end, the Table
> > > >> API still requires more effort from every contributor in the
> > > >> community. We see great opportunity in improving our users’
> > > >> experience through this work. Any feedback is welcome.
> > > >>
> > > >> Regards,
> > > >>
> > > >> Jincheng
> > >
> > >
> >
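The ease-of-use argument in the thread above (one map() returning many
columns versus one scalar UDF per column) can be illustrated with a small
conceptual sketch in plain Python. This is not Flink code; `select`,
`table_map`, and the UDF names are hypothetical stand-ins for the two API
styles being compared.

```python
# Conceptual sketch (NOT Flink API) of the map-vs-select argument.

# In the select() style, every output column needs its own scalar UDF ...
def udf1(row): return row["a"] + 1
def udf2(row): return row["a"] * 2
def udf3(row): return row["a"] - 1
# ... and so on, up to one UDF per output column (100 columns -> 100 UDFs).

def select(table, *udfs):
    """Apply one scalar UDF per output column to each row."""
    return [tuple(udf(row) for udf in udfs) for row in table]

# In the map() style, a single row-valued function builds the whole output row.
def map_fun(row):
    return (row["a"] + 1, row["a"] * 2, row["a"] - 1)

def table_map(table, fun):
    """Apply one row-valued function to each row."""
    return [fun(row) for row in table]

table = [{"a": 1}, {"a": 2}]

# Both styles produce the same result; the difference is that select()
# needs one function per column, while map() needs one function in total.
assert select(table, udf1, udf2, udf3) == table_map(table, map_fun)
```

The point is purely about authoring effort: as the output width grows, the
select() style multiplies the number of user-defined functions, while the
map() style stays a single function.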