I agree that we could downgrade "Eager state declaration" to a nice-to-have 
feature.

For the deprecation of "queryable state", can we rename the item to deprecating
the "current implementation of queryable state"? The ability to query internal
state is actually very useful for debugging, and it could open up more
possibilities for extending Flink SQL to behave more like a database.

Just as Yuan replied in the previous email [1], the current implementation of 
queryable state has many design problems. However, I don't want users to feel 
that this feature cannot be done well, and maybe we can redesign it. As far as 
I know, RisingWave already supports queryable state with a better user 
experience [2].


[1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
[2] https://syntaxbug.com/06a3e7c554/

Best
Yun Tang
________________________________
From: Xintong Song <tonysong...@gmail.com>
Sent: Friday, July 14, 2023 13:51
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: Re: [VOTE] Release 2.0 must-have work items

Thanks for the support, Yu.

We will have the guideline before removing DataSet. We are currently
prioritizing work that needs to be done before the 1.18 feature freeze, and
will soon get back to working on the guideline. We expect to get the
guideline ready before or soon after the 1.18 release, which will
definitely be before removing DataSet in 2.0.

Best,

Xintong



On Fri, Jul 14, 2023 at 1:06 PM Yu Li <car...@gmail.com> wrote:

> It's great to see the discussion about what we need to improve on
> (completely) switching from DataSet API to DataStream API from the user
> perspective. I feel that these improvements would happen faster (only) when
> we seriously prepare to remove the DataSet APIs with a target release, just
> like what we are doing now. And the same applies to the SinkV1 related
> discussions (smile).
>
> I support Xintong's opinion on keeping "Remove the DataSet APIs" as a
> must-have item. Meanwhile, I support Yuxia's opinion that we should
> explicitly let our users know how to migrate their existing DataSet API
> based applications afterwards, meaning that the guideline Xintong mentioned
> is a must-have (rather than best effort) before removing the DataSet APIs.
>
> Best Regards,
> Yu
>
>
> On Wed, 12 Jul 2023 at 14:00, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
>
> > Thanks Xintong for the clarification. A guideline to help users migrate
> > from DataSet to DataStream will definitely be helpful.
> >
> > Best regards,
> > Yuxia
> >
> > ----- Original Message -----
> > From: "Xintong Song" <tonysong...@gmail.com>
> > To: "dev" <dev@flink.apache.org>
> > Sent: Wednesday, July 12, 2023, 11:40:12 AM
> > Subject: Re: [VOTE] Release 2.0 must-have work items
> >
> > @Yuxia,
> >
> > We are aware of the issue that you mentioned. Actually, I don't think the
> > DataStream API can cover everything in the DataSet API in exactly the
> > same way, because the fundamental model, concepts and primitives of the
> > two sets of APIs are completely different. Many of the DataSet APIs,
> > especially those accessing the full data set at once, do not fit in the
> > DataStream concepts at all. I think what's important is that users can
> > achieve the same function, even if they may need to code in a different
> > way.
> >
> > We have gone through all the existing DataSet APIs, and categorized them
> > into 3 kinds:
> > - APIs that are well supported by DataStream API as is. E.g., map, reduce
> > on grouped dataset, etc.
> > - APIs that can be achieved by DataStream API as is, but with a price
> > (programming complexity, or computation efficiency). E.g., reduce on full
> > dataset, sort partition, etc. Admittedly, there is room for improvement
> > on these. We may keep improving these for the DataStream API, or we can
> > concentrate on supporting them better in the new ProcessFunction API.
> > Either way, I don't think we should block the retiring of DataSet API on
> > them.
> > - There are also a few APIs that cannot be supported by the DataStream
> > API as is, unless users write their custom operators from the ground up.
> > Only left/rightOuterJoin and combineGroup fall into this category. I think
> > combineGroup is probably not a problem, because this is more like a
> > variant of reduceGroup that allows the framework to execute more
> > efficiently. As for the outer joins, depending on how badly this is
> > needed, it can be supported by emitting the non-joined entries upon
> > triggering a window join.
> >
> > We are also planning to draft a guideline to help users migrate from
> > DataSet to DataStream, which should demonstrate how users can achieve
> > things like sort-partition with DataStream API.
> >
> > Last but not least, I'd like to point out that the decision to deprecate
> > and eventually remove the DataSet API was approved in FLIP-131 [1], and
> > all the prerequisites mentioned in the FLIP have been completed.
> >
> > Best,
> >
> > Xintong
> >
> >
> > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> >
> >
> >
> > On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li <jingsongl...@gmail.com>
> > wrote:
> >
> > > +1 to Leonard and Galen and Jing.
> > >
> > > About Source and Sink.
> > > We're still missing quite a bit of work, covering functionality, ease
> > > of use, and bug fixes, and I'm not sure we'll be completely done by 2.0.
> > > Until that's done, we won't be in a position to clean up the old APIs.
> > >
> > > Best,
> > > Jingsong
> > >
> > > On Wed, Jul 12, 2023 at 9:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> > > >
> > > > Hi, Xintong.
> > > > Sorry to disturb the voting. I just found an email [1] about the
> > > > DataSet API from the flink-user-zh channel, and from what I have
> > > > observed I don't think it is an isolated case.
> > > >
> > > > Removing DataSet is a must-have item in release 2.0. But as the
> > > > user's email said, if we remove DataSet, how can users implement
> > > > Sort/PartitionBy, etc., as they did with DataSet?
> > > > Will we also provide a similar API in DataStream, or something else,
> > > > before we remove DataSet?
> > > > Btw, as far as I can see, with regard to replacing DataSet with
> > > > DataStream, DataStream is missing many APIs. I think it may well take
> > > > much effort to fully cover the missing APIs.
> > > >
> > > > [1] https://lists.apache.org/thread/syjmt8f74gh8ok3z4lhgt95zl4dzn168
> > > >
> > > > Best regards,
> > > > Yuxia
> > > >
> > > > ----- Original Message -----
> > > > From: "Jing Ge" <j...@ververica.com.INVALID>
> > > > To: "dev" <dev@flink.apache.org>
> > > > Sent: Wednesday, July 12, 2023, 1:23:40 AM
> > > > Subject: Re: [VOTE] Release 2.0 must-have work items
> > > >
> > > > Agree with what Leonard said. There are actually more issues wrt the
> > > > new Source and SinkV2 [1].
> > > >
> > > > Speaking of must-have vs nice-to-have, I think it depends on the
> > > > priority. If removing them has higher priority, we should keep the
> > > > related tasks as must-have and make sure enough effort is put into
> > > > solving those issues, so that we are able to remove those APIs.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > [1] https://lists.apache.org/thread/90qc9nrlzf0vbvg92klzp9ftxxc43nbk
> > > >
> > > > On Tue, Jul 11, 2023 at 10:26 AM Leonard Xu <xbjt...@gmail.com> wrote:
> > > >
> > > > > > Thanks Xintong for driving this great work! But I have to give my
> > > > > > -1 (binding) here:
> > > > > >
> > > > > > -1 to marking the "deprecate SourceFunction/SinkFunction/SinkV1"
> > > > > > item as a must-have for release 2.0.
> > > > >
> > > > > > I do a lot of connector work in the community, and I have two
> > > > > > insights from past experience:
> > > > >
> > > > > > 1. Many developers reported that it is very difficult to migrate
> > > > > > from SourceFunction to the new Source [1]. Migrating existing
> > > > > > connectors after SourceFunction is deprecated is very difficult.
> > > > > > Some developers (Flavio Pompermaier) reported that they gave up
> > > > > > the migration because it was too complicated. I believe these are
> > > > > > not just a few cases. This means that deprecating the
> > > > > > SourceFunction-related interfaces requires community contributors
> > > > > > to reduce the migration cost before starting the migration work.
> > > > >
> > > > > > 2. IIRC, SinkV2 currently cannot cover the functionality of
> > > > > > SinkFunction, as described in FLIP-287 [2]. This means that a
> > > > > > migration path after deprecating SinkFunction/SinkV1 does not
> > > > > > exist, so we cannot mark the related SinkFunction/SinkV1
> > > > > > interfaces as deprecated in 1.18.
> > > > >
> > > > > > Based on these two observations, I think we should not mark these
> > > > > > items as must-have in 2.0. Maintaining the two sets of
> > > > > > source/sink interfaces is not a concern for me; users can choose
> > > > > > which interface to implement according to their capacity and
> > > > > > needs.
> > > > >
> > > > > > Btw, some work items in 2.0 are marked as must-have, but no
> > > > > > contributor has claimed them yet. I think this is a risk and hope
> > > > > > the Release Managers can pay attention to it.
> > > > >
> > > > > > Thank you all RMs for your work, and sorry again for interrupting
> > > > > > the vote.
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > [1]
> https://lists.apache.org/thread/sqq26s9rorynr4vx4nhxz3fmmxpgtdqp
> > > > > [2]
> > > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> > > > >
> > > > > > > On Jul 11, 2023, at 4:11 PM, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > > > > >
> > > > > > > On second thought, I think "Eager State Declaration" is
> > > > > > > probably not a must-have.
> > > > > > >
> > > > > > > I was originally thinking it is a prerequisite for "state
> > > > > > > querying for disaggregated state management".
> > > > > > >
> > > > > > > Since disaggregated state management itself is not a must-have,
> > > > > > > "Eager State Declaration" is not one either. We can downgrade
> > > > > > > it to "nice to have" if there is no objection.
> > > > > >
> > > > > > Best
> > > > > >
> > > > > > Yuan
> > > > > >
> > > > > > > On Mon, Jul 10, 2023 at 7:02 PM Jing Ge <j...@ververica.com.invalid> wrote:
> > > > > >
> > > > > >> +1
> > > > > >>
> > > > > > >> On Mon, Jul 10, 2023 at 12:52 PM Yu Li <car...@gmail.com> wrote:
> > > > > >>
> > > > > >>> +1 (binding)
> > > > > >>>
> > > > > >>> Thanks for driving this and great to see us moving forward.
> > > > > >>>
> > > > > >>> Best Regards,
> > > > > >>> Yu
> > > > > >>>
> > > > > >>>
> > > > > > >>> On Mon, 10 Jul 2023 at 11:59, Feng Wang <wangfeng...@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> +1
> > > > > > >>>> Thanks for driving this, looking forward to the next stage of
> > > > > > >>>> Flink.
> > > > > >>>>
> > > > > > >>>> On Fri, Jul 7, 2023 at 5:31 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Hi all,
> > > > > >>>>>
> > > > > > >>>>> I'd like to start the VOTE for the must-have work items for
> > > > > > >>>>> release 2.0 [1]. The corresponding discussion thread is [2].
> > > > > >>>>>
> > > > > > >>>>> Please note that once the vote is approved, any changes to
> > > > > > >>>>> the must-have items (adding / removing must-have items,
> > > > > > >>>>> changing the priority) require another vote. Assigning
> > > > > > >>>>> contributors / reviewers, updating descriptions / progress,
> > > > > > >>>>> and changes to nice-to-have items do not require another
> > > > > > >>>>> vote.
> > > > > >>>>>
> > > > > > >>>>> The vote will be open until at least July 12, following the
> > > > > > >>>>> consensus voting process. Votes of PMC members are binding.
> > > > > >>>>>
> > > > > >>>>> Best,
> > > > > >>>>>
> > > > > >>>>> Xintong
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> [1]
> > > https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > > >>>>>
> > > > > >>>>> [2]
> > > https://lists.apache.org/thread/l3dkdypyrovd3txzodn07lgdwtwvhgk4
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > >
> >
>
