Hi Fabian,
Thank you for your deep thoughts on this. I think most of the questions
you mentioned are well worth in-depth discussion! I would like to share my
thoughts on the following questions:

1. Do we need to move all DataSet API functionality into the Table API?
I think most of the DataSet functionality should be added to the Table API,
such as map, flatMap, groupReduce, etc., because these operations are very
easy for users to use. A rough sketch is included below.
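To make this concrete, here is a minimal sketch (Java) of what I mean. The
select()-based part uses APIs that already exist today; the map()-style call
is only a proposal from the enhancement outline and does not exist yet, and
the UDF and column names are made up for illustration:

import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.functions.ScalarFunction;

public class TableMapSketch {

    // An ordinary scalar UDF that can already be used today.
    public static class Normalize extends ScalarFunction {
        public String eval(String raw) {
            return raw == null ? null : raw.trim().toLowerCase();
        }
    }

    public static Table example(StreamTableEnvironment tEnv, Table input) {
        // Today: register the UDF and apply it column by column via select().
        tEnv.registerFunction("normalize", new Normalize());
        Table viaSelect = input.select("normalize(name) as name, age");

        // Proposed (hypothetical, per the enhancement outline): a map-style
        // operation that takes the whole row, closer to dataStream.map(mapFun):
        //   Table viaMap = input.map(new MyRowMapFunction());

        return viaSelect;
    }
}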

2. Do we support explicit physical operations like partitioning, sorting or
optimizer hints?
I think we should not add physical operations such as sortPartition,
partitionCustom, etc. From my point of view, those physical operations are
only about optimization, which can be solved by hints (I think we should add
a hints feature to both the Table API and SQL). A small sketch of the idea
follows.
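Just to illustrate the idea (the hint syntax is purely hypothetical; neither
the Table API nor Flink SQL offers such a mechanism today):

import org.apache.flink.table.api.Table;

public class HintSketch {
    public static Table example(Table orders) {
        // What exists today: only the logical operations; the optimizer
        // decides the physical plan (shuffle, sort, join strategy).
        Table totals = orders
                .groupBy("userId")
                .select("userId, amount.sum as total");

        // Hypothetical: instead of exposing partitionCustom/sortPartition,
        // attach advisory information that the optimizer may use or ignore,
        // e.g. orders.hint("shuffleBy", "userId").groupBy("userId")...
        return totals;
    }
}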

3. Do we want to support retractions in iteration?
I think supporting iteration is a very complicated feature. I am not sure,
but I think the iteration could be implemented along the lines of the
current batch mode, without supporting retraction for now, assuming that the
trained data will not be updated within the current iteration; the updated
data would then be used in the next iteration. So I think we should have an
in-depth discussion in a new thread.

BTW, I see that you have already left very useful comments in the Google
doc:
https://docs.google.com/document/d/1tnpxg31EQz2-MEzSotwFzqatsB4rNLz0I-l_vPa5H4Q/edit#

Thanks again for both your mail feedback and doc comments!

Best,
Jincheng



Fabian Hueske <fhue...@gmail.com> wrote on Tue, Nov 6, 2018, 6:21 PM:

> Thanks for the replies Xiaowei and others!
>
> You are right, I did not consider the batch optimization that would be
> missing if the DataSet API would be ported to extend the DataStream API.
> By extending the scope of the Table API, we can gain a holistic logical &
> physical optimization which would be great!
> Is your plan to move all DataSet API functionality into the Table API?
> If so, do you envision any batch-related API in DataStream at all or should
> this be done by converting a batch table to DataStream? I'm asking because
> if there would be batch features in DataStream, we would need some
> optimization there as well.
>
> I think the proposed separation of Table API (stateless APIs) and
> DataStream (APIs that expose state handling) is a good idea.
> On a side note, the DataSet API discouraged state handing in user function,
> so porting this Table API would be quite "natural".
>
> As I said before, I like that we can incrementally extend the Table API.
> Map and FlatMap functions do not seem too difficult.
> Reduce, GroupReduce, Combine, GroupCombine, MapPartition might be more
> tricky, esp. if we want to support retractions.
> Iterations should be a challenge. I assume that Calcite does not support
> iterations, so we probably need to split query / program and optimize parts
> separately (IIRC, this is also how Flink's own optimizer handles this).
> To what extend are you planning to support explicit physical operations
> like partitioning, sorting or optimizer hints?
>
> I haven't had a look in the design document that you shared. Probably, I
> find answers to some of my questions there ;-)
>
> Regarding the question of SQL or Table API, I agree that extending the
> scope of the Table API does not limit the scope for SQL.
> By adding more operations to the Table API we can expand it to use case
> that are not well-served by SQL.
> As others have said, we'll of course continue to extend and improve Flink's
> SQL support (within the bounds of the standard).
>
> Best, Fabian
>
> Am Di., 6. Nov. 2018 um 10:09 Uhr schrieb jincheng sun <
> sunjincheng...@gmail.com>:
>
> > Hi Jark,
> > Glad to see your feedback!
> > That's Correct, The proposal is aiming to extend the functionality for
> > Table API! I like add "drop" to fit the use case you mentioned. Not only
> > that, if a 100-columns Table. and our UDF needs these 100 columns, we
> don't
> > want to define the eval as eval(column0...column99), we prefer to define
> > eval as eval(Row)。Using it like this: table.select(udf (*)). All we also
> > need to consider if we put the columns package as a row. In a scenario
> like
> > this, we have Classification it as cloumn operation, and  list the
> changes
> > to the column operation after the map/flatMap/agg/flatAgg phase is
> > completed. And Currently,  Xiaowei has started a threading outlining
> which
> > talk about what we are proposing. Please see the detail in the mail
> thread:
> > Please see the detail in the mail thread:
> >
> >
> https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
> > <
> >
> https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
> > >
> >  .
> >
> > At this stage the Table API Enhancement Outline as follows:
> >
> >
> https://docs.google.com/document/d/1tnpxg31EQz2-MEzSotwFzqatsB4rNLz0I-l_vPa5H4Q/edit?usp=sharing
> >
> > Please let we know if you have further thoughts or feedback!
> >
> > Thanks,
> > Jincheng
> >
> >
> > Jark Wu <imj...@gmail.com> 于2018年11月6日周二 下午3:35写道:
> >
> > > Hi jingcheng,
> > >
> > > Thanks for your proposal. I think it is a helpful enhancement for
> > TableAPI
> > > which is a solid step forward for TableAPI.
> > > It doesn't weaken SQL or DataStream, because the conversion between
> > > DataStream and Table still works.
> > > People with advanced cases (e.g. complex and fine-grained state
> control)
> > > can go with DataStream,
> > > but most general cases can stay in TableAPI. This works is aiming to
> > extend
> > > the functionality for TableAPI,
> > > to extend the usage scenario, to help TableAPI becomes a more widely
> used
> > > API.
> > >
> > > For example, someone want to drop one column from a 100-columns Table.
> > > Currently, we have to convert
> > > Table to DataStream and use MapFunction to do that, or select the
> > remaining
> > > 99 columns using Table.select API.
> > > But if we support Table.drop() method for TableAPI, it will be a very
> > > convenient method and let users stay in Table.
> > >
> > > Looking forward to the more detailed design and further discussion.
> > >
> > > Regards,
> > > Jark
> > >
> > > jincheng sun <sunjincheng...@gmail.com> 于2018年11月6日周二 下午1:05写道:
> > >
> > > > Hi Rong Rong,
> > > >
> > > > Sorry for the late reply, And thanks for your feedback!  We will
> > continue
> > > > to add more convenience features to the TableAPI, such as map,
> flatmap,
> > > > agg, flatagg, iteration etc. And I am very happy that you are
> > interested
> > > on
> > > > this proposal. Due to this is a long-term continuous work, we will
> push
> > > it
> > > > in stages.  Currently  Xiaowei has started a threading outlining
> which
> > > talk
> > > > about what we are proposing. Please see the detail in the mail
> thread:
> > > >
> > > >
> > >
> >
> https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
> > > > <
> > > >
> > >
> >
> https://mail.google.com/mail/u/0/#search/xiaowei/FMfcgxvzLWzfvCnmvMzzSfxHTSfdwLkB
> > > > >
> > > >  .
> > > >
> > > > The Table API Enhancement Outline as follows:
> > > >
> > > >
> > >
> >
> https://docs.google.com/document/d/1tnpxg31EQz2-MEzSotwFzqatsB4rNLz0I-l_vPa5H4Q/edit?usp=sharing
> > > >
> > > > Please let we know if you have further thoughts or feedback!
> > > >
> > > > Thanks,
> > > > Jincheng
> > > >
> > > > Fabian Hueske <fhue...@gmail.com> 于2018年11月5日周一 下午7:03写道:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > Thanks for this interesting proposal.
> > > > > I like that we can push this effort forward in a very fine-grained
> > > > manner,
> > > > > i.e., incrementally adding more APIs to the Table API.
> > > > >
> > > > > However, I also have a few questions / concerns.
> > > > > Today, the Table API is tightly integrated with the DataSet and
> > > > DataStream
> > > > > APIs. It is very easy to convert a Table into a DataSet or
> DataStream
> > > and
> > > > > vice versa. This mean it is already easy to combine custom logic an
> > > > > relational operations. What I like is that several aspects are
> > clearly
> > > > > separated like retraction and timestamp handling (see below) + all
> > > > > libraries on DataStream/DataSet can be easily combined with
> > relational
> > > > > operations.
> > > > > I can see that adding more functionality to the Table API would
> > remove
> > > > the
> > > > > distinction between DataSet and DataStream. However, wouldn't we
> get
> > a
> > > > > similar benefit by extending the DataStream API for proper support
> > for
> > > > > bounded streams (as is the long-term goal of Flink)?
> > > > > I'm also a bit skeptical about the optimization opportunities we
> > would
> > > > > gain. Map/FlatMap UDFs are black boxes that cannot be easily
> removed
> > > > > without additional information (I did some research on this a few
> > years
> > > > ago
> > > > > [1]).
> > > > >
> > > > > Moreover, I think there are a few tricky details that need to be
> > > resolved
> > > > > to enable a good integration.
> > > > >
> > > > > 1) How to deal with retraction messages? The DataStream API does
> not
> > > > have a
> > > > > notion of retractions. How would a MapFunction or FlatMapFunction
> > > handle
> > > > > retraction? Do they need to be aware of the change flag? Custom
> > > windowing
> > > > > and aggregation logic would certainly need to have that
> information.
> > > > > 2) How to deal with timestamps? The DataStream API does not give
> > access
> > > > to
> > > > > timestamps. In the Table API / SQL these are exposed as regular
> > > > attributes.
> > > > > How can we ensure that timestamp attributes remain valid (i.e.
> > aligned
> > > > with
> > > > > watermarks) if the output is produced by arbitrary code?
> > > > > There might be more issues of this kind.
> > > > >
> > > > > My main question would be how much would we gain with this proposal
> > > over
> > > > a
> > > > > tight integration of Table API and DataStream API, assuming that
> > batch
> > > > > functionality is moved to DataStream?
> > > > >
> > > > > Best, Fabian
> > > > >
> > > > > [1]
> http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> > > > >
> > > > >
> > > > > Am Mo., 5. Nov. 2018 um 02:49 Uhr schrieb Rong Rong <
> > > walter...@gmail.com
> > > > >:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > Thank you for the proposal! I think being able to define a
> process
> > /
> > > > > > co-process function in table API definitely opens up a whole new
> > > level
> > > > of
> > > > > > applications using a unified API.
> > > > > >
> > > > > > In addition, as Tzu-Li and Hequn have mentioned, the benefit of
> > > > > > optimization layer of Table API will already bring in additional
> > > > benefit
> > > > > > over directly programming on top of DataStream/DataSet API. I am
> > very
> > > > > > interested an looking forward to seeing the support for more
> > complex
> > > > use
> > > > > > cases, especially iterations. It will enable table API to define
> > much
> > > > > > broader, event-driven use cases such as real-time ML
> > > > prediction/training.
> > > > > >
> > > > > > As Timo mentioned, This will make Table API diverge from the SQL
> > API.
> > > > But
> > > > > > as from my experience Table API was always giving me the
> impression
> > > to
> > > > > be a
> > > > > > more sophisticated, syntactic-aware way to express relational
> > > > operations.
> > > > > > Looking forward to further discussion and collaborations on the
> > FLIP
> > > > doc.
> > > > > >
> > > > > > --
> > > > > > Rong
> > > > > >
> > > > > > On Sun, Nov 4, 2018 at 5:22 PM jincheng sun <
> > > sunjincheng...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi tison,
> > > > > > >
> > > > > > > Thanks a lot for your feedback!
> > > > > > > I am very happy to see that community contributors agree to
> > > enhanced
> > > > > the
> > > > > > > TableAPI. This work is a long-term continuous work, we will
> push
> > it
> > > > in
> > > > > > > stages, we will soon complete  the enhanced list of the first
> > > phase,
> > > > we
> > > > > > can
> > > > > > > go deep discussion  in google doc. thanks again for joining on
> > the
> > > > very
> > > > > > > important discussion of the Flink Table API.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jincheng
> > > > > > >
> > > > > > > Tzu-Li Chen <wander4...@gmail.com> 于2018年11月2日周五 下午1:49写道:
> > > > > > >
> > > > > > > > Hi jingchengm
> > > > > > > >
> > > > > > > > Thanks a lot for your proposal! I find it is a good start
> point
> > > for
> > > > > > > > internal optimization works and help Flink to be more
> > > > > > > > user-friendly.
> > > > > > > >
> > > > > > > > AFAIK, DataStream is the most popular API currently that
> Flink
> > > > > > > > users should describe their logic with detailed logic.
> > > > > > > > From a more internal view the conversion from DataStream to
> > > > > > > > JobGraph is quite mechanically and hard to be optimized. So
> > when
> > > > > > > > users program with DataStream, they have to learn more
> > internals
> > > > > > > > and spend a lot of time to tune for performance.
> > > > > > > > With your proposal, we provide enhanced functionality of
> Table
> > > API,
> > > > > > > > so that users can describe their job easily on Table aspect.
> > This
> > > > > gives
> > > > > > > > an opportunity to Flink developers to introduce an optimize
> > phase
> > > > > > > > while transforming user program(described by Table API) to
> > > internal
> > > > > > > > representation.
> > > > > > > >
> > > > > > > > Given a user who want to start using Flink with simple ETL,
> > > > > pipelining
> > > > > > > > or analytics, he would find it is most naturally described by
> > > > > SQL/Table
> > > > > > > > API. Further, as mentioned by @hequn,
> > > > > > > >
> > > > > > > > SQL is a widely used language. It follows standards, is a
> > > > > > > > > descriptive language, and is easy to use
> > > > > > > >
> > > > > > > >
> > > > > > > > thus we could expect with the enhancement of SQL/Table API,
> > Flink
> > > > > > > > becomes more friendly to users.
> > > > > > > >
> > > > > > > > Looking forward to the design doc/FLIP!
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > tison.
> > > > > > > >
> > > > > > > >
> > > > > > > > jincheng sun <sunjincheng...@gmail.com> 于2018年11月2日周五
> > 上午11:46写道:
> > > > > > > >
> > > > > > > > > Hi Hequn,
> > > > > > > > > Thanks for your feedback! And also thanks for our offline
> > > > > discussion!
> > > > > > > > > You are right, unification of batch and streaming is very
> > > > important
> > > > > > for
> > > > > > > > > flink API.
> > > > > > > > > We will provide more detailed design later, Please let me
> > know
> > > if
> > > > > you
> > > > > > > > have
> > > > > > > > > further thoughts or feedback.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Jincheng
> > > > > > > > >
> > > > > > > > > Hequn Cheng <chenghe...@gmail.com> 于2018年11月2日周五
> 上午10:02写道:
> > > > > > > > >
> > > > > > > > > > Hi Jincheng,
> > > > > > > > > >
> > > > > > > > > > Thanks a lot for your proposal. It is very encouraging!
> > > > > > > > > >
> > > > > > > > > > As we all know, SQL is a widely used language. It follows
> > > > > > standards,
> > > > > > > > is a
> > > > > > > > > > descriptive language, and is easy to use. A powerful
> > feature
> > > of
> > > > > SQL
> > > > > > > is
> > > > > > > > > that
> > > > > > > > > > it supports optimization. Users only need to care about
> the
> > > > logic
> > > > > > of
> > > > > > > > the
> > > > > > > > > > program. The underlying optimizer will help users
> optimize
> > > the
> > > > > > > > > performance
> > > > > > > > > > of the program. However, in terms of functionality and
> ease
> > > of
> > > > > use,
> > > > > > > in
> > > > > > > > > some
> > > > > > > > > > scenarios sql will be limited, as described in Jincheng's
> > > > > proposal.
> > > > > > > > > >
> > > > > > > > > > Correspondingly, the DataStream/DataSet api can provide
> > > > powerful
> > > > > > > > > > functionalities. Users can write
> > > > > ProcessFunction/CoProcessFunction
> > > > > > > and
> > > > > > > > > get
> > > > > > > > > > the timer. Compared with SQL, it provides more
> > > functionalities
> > > > > and
> > > > > > > > > > flexibilities. However, it does not support optimization
> > like
> > > > > SQL.
> > > > > > > > > > Meanwhile, DataStream/DataSet api has not been unified
> > which
> > > > > means,
> > > > > > > for
> > > > > > > > > the
> > > > > > > > > > same logic, users need to write a job for each stream and
> > > > batch.
> > > > > > > > > >
> > > > > > > > > > With TableApi, I think we can combine the advantages of
> > both.
> > > > > Users
> > > > > > > can
> > > > > > > > > > easily write relational operations and enjoy
> optimization.
> > At
> > > > the
> > > > > > > same
> > > > > > > > > > time, it supports more functionality and ease of use.
> > Looking
> > > > > > forward
> > > > > > > > to
> > > > > > > > > > the detailed design/FLIP.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Hequn
> > > > > > > > > >
> > > > > > > > > > On Fri, Nov 2, 2018 at 9:48 AM Shaoxuan Wang <
> > > > > wshaox...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Aljoscha,
> > > > > > > > > > > Glad that you like the proposal. We have completed the
> > > > > prototype
> > > > > > of
> > > > > > > > > most
> > > > > > > > > > > new proposed functionalities. Once collect the feedback
> > > from
> > > > > > > > community,
> > > > > > > > > > we
> > > > > > > > > > > will come up with a concrete FLIP/design doc.
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Shaoxuan
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Nov 1, 2018 at 8:12 PM Aljoscha Krettek <
> > > > > > > aljos...@apache.org
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Jincheng,
> > > > > > > > > > > >
> > > > > > > > > > > > these points sound very good! Are there any concrete
> > > > > proposals
> > > > > > > for
> > > > > > > > > > > > changes? For example a FLIP/design document?
> > > > > > > > > > > >
> > > > > > > > > > > > See here for FLIPs:
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > > > > > > > > > >
> > > > > > > > > > > > Best,
> > > > > > > > > > > > Aljoscha
> > > > > > > > > > > >
> > > > > > > > > > > > > On 1. Nov 2018, at 12:51, jincheng sun <
> > > > > > > sunjincheng...@gmail.com
> > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *--------I am sorry for the formatting of the email
> > > > > content.
> > > > > > I
> > > > > > > > > > reformat
> > > > > > > > > > > > > the **content** as follows-----------*
> > > > > > > > > > > > >
> > > > > > > > > > > > > *Hi ALL,*
> > > > > > > > > > > > >
> > > > > > > > > > > > > With the continuous efforts from the community, the
> > > Flink
> > > > > > > system
> > > > > > > > > has
> > > > > > > > > > > been
> > > > > > > > > > > > > continuously improved, which has attracted more and
> > > more
> > > > > > users.
> > > > > > > > > Flink
> > > > > > > > > > > SQL
> > > > > > > > > > > > > is a canonical, widely used relational query
> > language.
> > > > > > However,
> > > > > > > > > there
> > > > > > > > > > > are
> > > > > > > > > > > > > still some scenarios where Flink SQL failed to meet
> > > user
> > > > > > needs
> > > > > > > in
> > > > > > > > > > terms
> > > > > > > > > > > > of
> > > > > > > > > > > > > functionality and ease of use, such as:
> > > > > > > > > > > > >
> > > > > > > > > > > > > *1. In terms of functionality*
> > > > > > > > > > > > >    Iteration, user-defined window, user-defined
> join,
> > > > > > > > user-defined
> > > > > > > > > > > > > GroupReduce, etc. Users cannot express them with
> SQL;
> > > > > > > > > > > > >
> > > > > > > > > > > > > *2. In terms of ease of use*
> > > > > > > > > > > > >
> > > > > > > > > > > > >   - Map - e.g. “dataStream.map(mapFun)”. Although
> > > > > > > > > > “table.select(udf1(),
> > > > > > > > > > > > >   udf2(), udf3()....)” can be used to accomplish
> the
> > > same
> > > > > > > > > function.,
> > > > > > > > > > > > with a
> > > > > > > > > > > > >   map() function returning 100 columns, one has to
> > > define
> > > > > or
> > > > > > > call
> > > > > > > > > 100
> > > > > > > > > > > > UDFs
> > > > > > > > > > > > >   when using SQL, which is quite involved.
> > > > > > > > > > > > >   - FlatMap -  e.g.
> “dataStrem.flatmap(flatMapFun)”.
> > > > > > Similarly,
> > > > > > > > it
> > > > > > > > > > can
> > > > > > > > > > > be
> > > > > > > > > > > > >   implemented with “table.join(udtf).select()”.
> > > However,
> > > > it
> > > > > > is
> > > > > > > > > > obvious
> > > > > > > > > > > > that
> > > > > > > > > > > > >   dataStream is easier to use than SQL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Due to the above two reasons, some users have to
> use
> > > the
> > > > > > > > DataStream
> > > > > > > > > > API
> > > > > > > > > > > > or
> > > > > > > > > > > > > the DataSet API. But when they do that, they lose
> the
> > > > > > > unification
> > > > > > > > > of
> > > > > > > > > > > > batch
> > > > > > > > > > > > > and streaming. They will also lose the
> sophisticated
> > > > > > > > optimizations
> > > > > > > > > > such
> > > > > > > > > > > > as
> > > > > > > > > > > > > codegen, aggregate join transpose and multi-stage
> agg
> > > > from
> > > > > > > Flink
> > > > > > > > > SQL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > We believe that enhancing the functionality and
> > > > > productivity
> > > > > > is
> > > > > > > > > vital
> > > > > > > > > > > for
> > > > > > > > > > > > > the successful adoption of Table API. To this end,
> > > Table
> > > > > API
> > > > > > > > still
> > > > > > > > > > > > > requires more efforts from every contributor in the
> > > > > > community.
> > > > > > > We
> > > > > > > > > see
> > > > > > > > > > > > great
> > > > > > > > > > > > > opportunity in improving our user’s experience from
> > > this
> > > > > > work.
> > > > > > > > Any
> > > > > > > > > > > > feedback
> > > > > > > > > > > > > is welcome.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Jincheng
> > > > > > > > > > > > >
> > > > > > > > > > > > > jincheng sun <sunjincheng...@gmail.com>
> > 于2018年11月1日周四
> > > > > > > 下午5:07写道:
> > > > > > > > > > > > >
> > > > > > > > > > > > >> Hi all,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> With the continuous efforts from the community,
> the
> > > > Flink
> > > > > > > system
> > > > > > > > > has
> > > > > > > > > > > > been
> > > > > > > > > > > > >> continuously improved, which has attracted more
> and
> > > more
> > > > > > > users.
> > > > > > > > > > Flink
> > > > > > > > > > > > SQL
> > > > > > > > > > > > >> is a canonical, widely used relational query
> > language.
> > > > > > > However,
> > > > > > > > > > there
> > > > > > > > > > > > are
> > > > > > > > > > > > >> still some scenarios where Flink SQL failed to
> meet
> > > user
> > > > > > needs
> > > > > > > > in
> > > > > > > > > > > terms
> > > > > > > > > > > > of
> > > > > > > > > > > > >> functionality and ease of use, such as:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   -
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   In terms of functionality
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Iteration, user-defined window, user-defined join,
> > > > > > > user-defined
> > > > > > > > > > > > >> GroupReduce, etc. Users cannot express them with
> > SQL;
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   -
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>   In terms of ease of use
> > > > > > > > > > > > >>   -
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>      Map - e.g. “dataStream.map(mapFun)”. Although
> > > > > > > > > > > “table.select(udf1(),
> > > > > > > > > > > > >>      udf2(), udf3()....)” can be used to
> accomplish
> > > the
> > > > > same
> > > > > > > > > > > function.,
> > > > > > > > > > > > with a
> > > > > > > > > > > > >>      map() function returning 100 columns, one has
> > to
> > > > > define
> > > > > > > or
> > > > > > > > > call
> > > > > > > > > > > > 100 UDFs
> > > > > > > > > > > > >>      when using SQL, which is quite involved.
> > > > > > > > > > > > >>      -
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>      FlatMap -  e.g.
> > “dataStrem.flatmap(flatMapFun)”.
> > > > > > > Similarly,
> > > > > > > > > it
> > > > > > > > > > > can
> > > > > > > > > > > > >>      be implemented with
> > “table.join(udtf).select()”.
> > > > > > However,
> > > > > > > > it
> > > > > > > > > is
> > > > > > > > > > > > obvious
> > > > > > > > > > > > >>      that datastream is easier to use than SQL.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Due to the above two reasons, some users have to
> use
> > > the
> > > > > > > > > DataStream
> > > > > > > > > > > API
> > > > > > > > > > > > or
> > > > > > > > > > > > >> the DataSet API. But when they do that, they lose
> > the
> > > > > > > > unification
> > > > > > > > > of
> > > > > > > > > > > > batch
> > > > > > > > > > > > >> and streaming. They will also lose the
> sophisticated
> > > > > > > > optimizations
> > > > > > > > > > > such
> > > > > > > > > > > > as
> > > > > > > > > > > > >> codegen, aggregate join transpose  and multi-stage
> > agg
> > > > > from
> > > > > > > > Flink
> > > > > > > > > > SQL.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> We believe that enhancing the functionality and
> > > > > productivity
> > > > > > > is
> > > > > > > > > > vital
> > > > > > > > > > > > for
> > > > > > > > > > > > >> the successful adoption of Table API. To this end,
> > > > Table
> > > > > > API
> > > > > > > > > still
> > > > > > > > > > > > >> requires more efforts from every contributor in
> the
> > > > > > community.
> > > > > > > > We
> > > > > > > > > > see
> > > > > > > > > > > > great
> > > > > > > > > > > > >> opportunity in improving our user’s experience
> from
> > > this
> > > > > > work.
> > > > > > > > Any
> > > > > > > > > > > > feedback
> > > > > > > > > > > > >> is welcome.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Regards,
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Jincheng
> > > > > > > > > > > > >>
> > > > > > > > > > > > >>
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
