@Yuxia,

We are aware of the issue you mentioned. Actually, I don't think the
DataStream API can cover everything in the DataSet API in exactly the same
way, because the fundamental models, concepts, and primitives of the two
APIs are completely different. Many of the DataSet APIs, especially those
that access the full data set at once, do not fit the DataStream concepts
at all. I think what's important is that users can achieve the same
functionality, even if they need to code it in a different way.

We have gone through all the existing DataSet APIs, and categorized them
into 3 kinds:
- APIs that are well supported by DataStream API as is. E.g., map, reduce
on grouped dataset, etc.
- APIs that can be achieved by DataStream API as is, but at a cost
(programming complexity or computation efficiency). E.g., reduce on full
dataset, sort partition, etc. Admittedly, there is room for improvement on
these. We may keep improving these for the DataStream API, or we can
concentrate on supporting them better in the new ProcessFunction API.
Either way, I don't think we should block the retiring of DataSet API on
them.
- There are also a few APIs that cannot be supported by the DataStream API
as is, unless users write their custom operators from the ground up. Only
left/rightOuterJoin and combineGroup fall into this category. I think
combineGroup is probably not a problem, because it is more like a variant
of reduceGroup that allows the framework to execute more efficiently. As
for the outer joins, depending on how badly they are needed, they can be
supported by emitting the non-joined entries upon triggering a window join.
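
To illustrate the outer-join idea in the last item, here is a minimal
sketch in plain Java. It deliberately uses no Flink classes; the class
name, method name and the (key, value) array layout are all hypothetical.
It only shows what a window would need to emit when it fires: every matched
pair, plus each left-side entry that found no match, padded with null. In a
real job, something like a windowed coGroup might be one place to host such
logic, but that is an assumption, not a worked-out design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; all names here are hypothetical.
final class WindowLeftOuterJoinSketch {

    /**
     * Simulates one window firing. Each input element is a {key, value} pair
     * buffered for the window. Returns rows of [key, leftValue, rightValue],
     * where rightValue is null for left entries that had no match.
     */
    static List<List<String>> fireWindow(List<String[]> leftBuffer,
                                         List<String[]> rightBuffer) {
        // Index the right side of the window by key.
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (String[] r : rightBuffer) {
            rightByKey.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }

        List<List<String>> out = new ArrayList<>();
        for (String[] l : leftBuffer) {
            List<String> matches = rightByKey.get(l[0]);
            if (matches == null) {
                // Non-joined entry: emit it anyway, padded with null.
                out.add(Arrays.asList(l[0], l[1], null));
            } else {
                // Regular inner-join output for this key.
                for (String r : matches) {
                    out.add(Arrays.asList(l[0], l[1], r));
                }
            }
        }
        return out;
    }
}
```

A right outer join would follow by swapping the two inputs. Note that the
result is only "outer" with respect to a single window: entries whose
partner arrives in a different window are still reported as non-joined.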

We are also planning to draft a guideline to help users migrate from
DataSet to DataStream, which should demonstrate how users can achieve
things like sort-partition with the DataStream API.
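
As a rough picture of what a sort-partition replacement involves, the
following plain-Java sketch buffers every record of one partition and sorts
only when the bounded input ends. It uses no Flink classes, and the class
and method names are hypothetical; it is meant to show the shape of the
logic and its cost (the whole partition is held in memory), not the content
of the planned guideline.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration only; all names here are hypothetical.
final class SortPartitionSketch<T> {
    private final Comparator<T> order;
    private final List<T> buffer = new ArrayList<>();

    SortPartitionSketch(Comparator<T> order) {
        this.order = order;
    }

    /** Called for every record routed to this partition. */
    void onElement(T record) {
        buffer.add(record);
    }

    /** Called once when the bounded input is exhausted; returns the sorted partition. */
    List<T> onEndOfInput() {
        List<T> sorted = new ArrayList<>(buffer);
        sorted.sort(order);
        buffer.clear(); // release the buffered records
        return sorted;
    }
}
```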

Last but not least, I'd like to point out that the decision to deprecate
and eventually remove the DataSet API was approved in FLIP-131 [1], and all
the prerequisites mentioned in the FLIP have been completed.

Best,

Xintong


[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741



On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li <jingsongl...@gmail.com> wrote:

> +1 to Leonard and Galen and Jing.
>
> About Source and Sink.
> We're still missing quite a bit of work, including functionality, ease of
> use, and bug fixes, and I'm not sure we'll be completely done by 2.0.
> Until that's done, we won't be in a position to clean up the old APIs.
>
> Best,
> Jingsong
>
> On Wed, Jul 12, 2023 at 9:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >
> > Hi, Xintong.
> > Sorry to disturb the vote. I just found an email [1] about the DataSet API
> > on the flink-user-zh channel, and from what I have observed it is not an
> > isolated case.
> >
> > Removing DataSet is a must-have item in release 2.0. But as the user email
> > asked, if we remove DataSet, how can users implement Sort/PartitionBy, etc.
> > as they did with DataSet?
> > Will we also provide a similar API in DataStream, or some other alternative,
> > before we remove DataSet?
> > Btw, as far as I can see, DataStream is missing many of the APIs needed to
> > replace DataSet. I think it may well take a lot of effort to fully cover
> > the missing APIs.
> >
> > [1] https://lists.apache.org/thread/syjmt8f74gh8ok3z4lhgt95zl4dzn168
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Jing Ge" <j...@ververica.com.INVALID>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Wednesday, July 12, 2023, 1:23:40 AM
> > Subject: Re: [VOTE] Release 2.0 must-have work items
> >
> > Agree with what Leonard said. There are actually more issues with the new
> > Source and SinkV2 [1].
> >
> > Speaking of must-have vs nice-to-have, I think it depends on the priority.
> > If removing them has higher priority, we should keep the related tasks as
> > must-have and make sure enough effort is put into solving those issues, so
> > that we are able to remove those APIs.
> >
> > Best regards,
> > Jing
> >
> > [1] https://lists.apache.org/thread/90qc9nrlzf0vbvg92klzp9ftxxc43nbk
> >
> > On Tue, Jul 11, 2023 at 10:26 AM Leonard Xu <xbjt...@gmail.com> wrote:
> >
> > > Thanks Xintong for driving this great work! But I have to give my
> > > -1 (binding) here:
> > >
> > > -1 to marking the "deprecate SourceFunction/SinkFunction/SinkV1" item as
> > > a must-have for release 2.0.
> > >
> > > I do a lot of connector work in the community, and I have two insights
> > > from past experience:
> > >
> > > 1. Many developers have reported that it is very difficult to migrate from
> > > SourceFunction to the new Source [1]. Migrating existing connectors
> > > after deprecating SourceFunction is very difficult. Some developers (Flavio
> > > Pompermaier) reported that they gave up the migration because it was too
> > > complicated. I believe these are not isolated cases. This means that
> > > deprecating the SourceFunction-related interfaces requires community
> > > contributors to reduce the migration cost before the migration work starts.
> > >
> > > 2. IIRC, SinkV2 cannot currently cover the functionality of SinkFunction, as
> > > described in FLIP-287 [2]. This means no migration path exists once
> > > SinkFunction/SinkV1 is deprecated, so we cannot mark the related
> > > SinkFunction/SinkV1 interfaces as deprecated in 1.18.
> > >
> > > Based on these two observations, I think we should not mark deprecating
> > > these interfaces as a must-have for 2.0. Maintaining the two sets of
> > > source/sink interfaces is not a concern for me; users can choose which
> > > interface to implement according to their capacity and needs.
> > >
> > > Btw, some work items in 2.0 are marked as must-have, but no contributor
> > > has claimed them yet. I think this is a risk, and I hope the Release
> > > Managers can pay attention to it.
> > >
> > > Thank you to all the RMs for your work, and sorry again for interrupting
> > > the vote.
> > >
> > > Best,
> > > Leonard
> > >
> > > [1] https://lists.apache.org/thread/sqq26s9rorynr4vx4nhxz3fmmxpgtdqp
> > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > >
> > > > On Jul 11, 2023, at 4:11 PM, Yuan Mei <yuanmei.w...@gmail.com>
> wrote:
> > > >
> > > > On second thought, I think "Eager State Declaration" is probably not a
> > > > must-have.
> > > >
> > > > I was originally thinking it is a prerequisite for "state querying for
> > > > disaggregated state management".
> > > >
> > > > Since disaggregated state management itself is not a must-have, "Eager
> > > > State Declaration" is not one either. We can downgrade it to "nice to
> > > > have" if there are no objections.
> > > >
> > > > Best
> > > >
> > > > Yuan
> > > >
> > > > On Mon, Jul 10, 2023 at 7:02 PM Jing Ge <j...@ververica.com.invalid>
> > > wrote:
> > > >
> > > >> +1
> > > >>
> > > >> On Mon, Jul 10, 2023 at 12:52 PM Yu Li <car...@gmail.com> wrote:
> > > >>
> > > >>> +1 (binding)
> > > >>>
> > > >>> Thanks for driving this and great to see us moving forward.
> > > >>>
> > > >>> Best Regards,
> > > >>> Yu
> > > >>>
> > > >>>
> > > >>> On Mon, 10 Jul 2023 at 11:59, Feng Wang <wangfeng...@gmail.com>
> wrote:
> > > >>>
> > > >>>> +1
> > > >>>> Thanks for driving this, looking forward to the next stage of Flink.
> > > >>>>
> > > >>>> On Fri, Jul 7, 2023 at 5:31 PM Xintong Song <
> tonysong...@gmail.com>
> > > >>> wrote:
> > > >>>>
> > > >>>>> Hi all,
> > > >>>>>
> > > >>>>> I'd like to start the VOTE for the must-have work items for
> release
> > > >> 2.0
> > > >>>>> [1]. The corresponding discussion thread is [2].
> > > >>>>>
> > > >>>>> Please note that once the vote is approved, any changes to the must-have
> > > >>>>> items (adding / removing must-have items, changing the priority) require
> > > >>>>> another vote. Assigning contributors / reviewers, updating descriptions /
> > > >>>>> progress, and changes to nice-to-have items do not require another vote.
> > > >>>>>
> > > >>>>> The vote will be open until at least July 12, following the
> consensus
> > > >>>>> voting process. Votes of PMC members are binding.
> > > >>>>>
> > > >>>>> Best,
> > > >>>>>
> > > >>>>> Xintong
> > > >>>>>
> > > >>>>>
> > > >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > >>>>>
> > > >>>>> [2] https://lists.apache.org/thread/l3dkdypyrovd3txzodn07lgdwtwvhgk4
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > >
>
