Thanks for the clarification.

> If the list of "Remove deprecated APIs" means, we must remove the code in Flink-2.0 initial release, I would vote -1 for queryable state before we get an alternative.
FYI, the removal of queryable state is currently marked as the `must-have` priority. Of course it's not a final decision and that's exactly why we are collecting feedback about the list now.

Best,

Xintong

On Mon, Jul 17, 2023 at 3:15 PM Yun Tang <myas...@live.com> wrote:

> Hi Xintong,
>
> If the current implementation of queryable state would not block the implementation of disaggregated state-backends, I prefer not removing the implementation until we have a better solution (maybe based on the queryable snapshot). cc @Yuan.
>
> If the list of "Remove deprecated APIs" means, we must remove the code in Flink-2.0 initial release, I would vote -1 for queryable state before we get an alternative.
> And I will raise the concern in the Flink roadmap discussion.
>
> Best
> Yun Tang
> ________________________________
> From: Xintong Song <tonysong...@gmail.com>
> Sent: Monday, July 17, 2023 10:07
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: Re: [VOTE] Release 2.0 must-have work items
>
> @Yun,
> I see your point that the ability queryable state is trying to provide is meaningful but the current implementation of the feature is problematic. So what's your opinion on deprecating the current queryable state? Do you think we need to wait until there is a new implementation of queryable state to remove the current one? Or maybe the current implementation is not well functional anyway and we can treat the removal of it as independent from introducing a new one?
>
> > However, I don't want to make users feel that this feature cannot be done well, and maybe we can redesign this feature.
>
> TBH, the impression that I got from the roadmap[1] is that the queryable state is retiring and will be replaced by the state processor api. If this is not the impression we want users to have, you probably also need to raise it in the roadmap discussion [2].
>
> Best,
>
> Xintong
>
> [1] https://flink.apache.org/roadmap
>
> [2] https://lists.apache.org/thread/szdr4ngrfcmo7zko4917393zbqhgw0v5
>
> On Mon, Jul 17, 2023 at 9:53 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> > I'd propose to downgrade "Refactor the API modules" to TBD. The original proposal was based on the condition that we are allowed to introduce in-place API breaking changes in release 2.0. As the migration period is introduced, and we are no longer planning to do in-place changes / removal for DataStream (and same for APIs in `flink-core`), we need to re-evaluate whether it's feasible to do things like moving classes to different module / packages, turning concrete classes into interfaces on the API classes.
> >
> > Best,
> >
> > Xintong
> >
> > On Mon, Jul 17, 2023 at 1:10 AM Yun Tang <myas...@live.com> wrote:
> >
> >> I agree that we could downgrade "Eager state declaration" to a nice-to-have feature.
> >>
> >> For the deprecation of "queryable state", can we just rename to deprecate "current implementation of queryable state"? The feature to query the internal state is actually very useful for debugging and could provide more possibility to extend FlinkSQL more like a database.
> >>
> >> Just as Yuan replied in the previous email [1], current implementation of queryable state has many problems in design. However, I don't want to make users feel that this feature cannot be done well, and maybe we can redesign this feature. As far as I know, risingwave already support queryable state with better user experience [2].
> >>
> >> [1] https://lists.apache.org/thread/9hmwcjb3q5c24pk3qshjvybfqk62v17m
> >> [2] https://syntaxbug.com/06a3e7c554/
> >>
> >> Best
> >> Yun Tang
> >> ________________________________
> >> From: Xintong Song <tonysong...@gmail.com>
> >> Sent: Friday, July 14, 2023 13:51
> >> To: dev@flink.apache.org <dev@flink.apache.org>
> >> Subject: Re: [VOTE] Release 2.0 must-have work items
> >>
> >> Thanks for the support, Yu.
> >>
> >> We will have the guideline before removing DataSet. We are currently prioritizing works that need to be done before the 1.18 feature freeze, and will soon get back to working on the guidelines. We expect to get the guideline ready before or soon after the 1.18 release, which will definitely be before removing DataSet in 2.0.
> >>
> >> Best,
> >>
> >> Xintong
> >>
> >> On Fri, Jul 14, 2023 at 1:06 PM Yu Li <car...@gmail.com> wrote:
> >>
> >> > It's great to see the discussion about what we need to improve on (completely) switching from DataSet API to DataStream API from the user perspective. I feel that these improvements would happen faster (only) when we seriously prepare to remove the DataSet APIs with a target release, just like what we are doing now. And the same applies to the SinkV1 related discussions (smile).
> >> >
> >> > I support Xintong's opinion on keeping "Remove the DataSet APIs" a must-have item, meantime I support Yuxia's opinion that we should explicitly let our users know how to migrate their existing DataSet API based applications afterwards, meaning that the guideline Xintong mentioned is a must-have (rather than best efforts) before removing the DataSet APIs.
> >> >
> >> > Best Regards,
> >> > Yu
> >> >
> >> > On Wed, 12 Jul 2023 at 14:00, yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >> >
> >> > > Thanks Xintong for clarification.
> >> > > A guideline to help users migrating from DataSet to DataStream will definitely be helpful.
> >> > >
> >> > > Best regards,
> >> > > Yuxia
> >> > >
> >> > > ----- Original Message -----
> >> > > From: "Xintong Song" <tonysong...@gmail.com>
> >> > > To: "dev" <dev@flink.apache.org>
> >> > > Sent: Wednesday, July 12, 2023 11:40:12 AM
> >> > > Subject: Re: [VOTE] Release 2.0 must-have work items
> >> > >
> >> > > @Yuxia,
> >> > >
> >> > > We are aware of the issue that you mentioned. Actually, I don't think the DataStream API can cover everything in the DataSet API in exactly the same way, because the fundamental model, concepts and primitives of the two sets of APIs are completely different. Many of the DataSet APIs, especially those accessing the full data set at once, do not fit in the DataStream concepts at all. I think what's important is that users can achieve the same function, even if they may need to code in a different way.
> >> > >
> >> > > We have gone through all the existing DataSet APIs, and categorized them into 3 kinds:
> >> > > - APIs that are well supported by DataStream API as is. E.g., map, reduce on grouped dataset, etc.
> >> > > - APIs that can be achieved by DataStream API as is, but with a price (programming complexity, or computation efficiency). E.g., reduce on full dataset, sort partition, etc. Admittedly, there is room for improvement on these. We may keep improving these for the DataStream API, or we can concentrate on supporting them better in the new ProcessFunction API. Either way, I don't think we should block the retiring of DataSet API on them.
> >> > > - There are also a few APIs that cannot be supported by the DataStream API as is, unless users write their custom operators from the ground up.
> >> > > Only left/rightOuterJoin and combineGroup fall into this category. I think combineGroup is probably not a problem, because this is more like a variant of reduceGroup that allows the framework to execute more efficiently. As for the outer joins, depending on how badly this is needed, it can be supported by emitting the non-joined entries upon triggering a window join.
> >> > >
> >> > > We are also planning to draft a guideline to help users migrating from DataSet to DataStream, which should demonstrate how users can achieve things like sort-partition with DataStream API.
> >> > >
> >> > > Last but not least, I'd like to point out that the decision to deprecate and eventually remove the DataSet API was approved in FLIP-131, and all the prerequisites mentioned in the FLIP have been completed.
> >> > >
> >> > > Best,
> >> > >
> >> > > Xintong
> >> > >
> >> > > [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741
> >> > >
> >> > > On Wed, Jul 12, 2023 at 10:20 AM Jingsong Li <jingsongl...@gmail.com> wrote:
> >> > >
> >> > > > +1 to Leonard and Galen and Jing.
> >> > > >
> >> > > > About Source and Sink.
> >> > > > We're still missing quite a bit of work, including functionality, including ease of use, including bug fixes, and I'm not sure we'll be completely done by 2.0.
> >> > > > Until that's done, we won't be in a position to clean up the old APIs.
> >> > > >
> >> > > > Best,
> >> > > > Jingsong
> >> > > >
> >> > > > On Wed, Jul 12, 2023 at 9:41 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >> > > > >
> >> > > > > Hi, Xintong.
> >> > > > > Sorry to disturb the voting. I just found an email[1] about DataSet API from flink-user-zh channel.
> >> > > > > And I think it's not just a single case according to my observation.
> >> > > > >
> >> > > > > Remove DataSet is a must have item in release-2.0. But as the user email said, if we remove DataSet, how can users implement Sort/PartitionBy, etc. as they did with DataSet?
> >> > > > > Will we also provide similar API in DataStream or some other thing before we remove DataSet?
> >> > > > > Btw, as far as I see, with regard to replacing DataSet with DataStream, DataStream is missing many APIs. I think it may well take much effort to fully cover the missing APIs.
> >> > > > >
> >> > > > > [1] https://lists.apache.org/thread/syjmt8f74gh8ok3z4lhgt95zl4dzn168
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Yuxia
> >> > > > >
> >> > > > > ----- Original Message -----
> >> > > > > From: "Jing Ge" <j...@ververica.com.INVALID>
> >> > > > > To: "dev" <dev@flink.apache.org>
> >> > > > > Sent: Wednesday, July 12, 2023 1:23:40 AM
> >> > > > > Subject: Re: [VOTE] Release 2.0 must-have work items
> >> > > > >
> >> > > > > agree with what Leonard said. There are actually more issues wrt the new Source and SinkV2[1]
> >> > > > >
> >> > > > > Speaking of must-have vs nice-to-have, I think it depends on the priority. If removing them has higher priority, we should keep related tasks as must-have and make sure enough effort will be put to solve those issues and therefore be able to remove those APIs.
> >> > > > >
> >> > > > > Best regards,
> >> > > > > Jing
> >> > > > >
> >> > > > > [1] https://lists.apache.org/thread/90qc9nrlzf0vbvg92klzp9ftxxc43nbk
> >> > > > >
> >> > > > > On Tue, Jul 11, 2023 at 10:26 AM Leonard Xu <xbjt...@gmail.com> wrote:
> >> > > > >
> >> > > > > > Thanks Xintong for driving this great work!
> >> > > > > > But I've to give my -1(binding) here:
> >> > > > > >
> >> > > > > > -1 to mark "deprecate SourceFunction/SinkFunction/SinkV1" item as must to have for release 2.0.
> >> > > > > >
> >> > > > > > I do a lot of connector work in the community, and I have two insights from past experience:
> >> > > > > >
> >> > > > > > 1. Many developers reported that it is very difficult to migrate from SourceFunction to new Source [1]. The migration of existing connectors after deprecating SourceFunction is very difficult. Some developers (Flavio Pompermaier) reported that they gave up the migration because it was too complicated. I believe it's not a few cases. This means that deprecating SourceFunction related interfaces requires community contributors to reduce the migration cost before starting the migration work.
> >> > > > > >
> >> > > > > > 2. IIRC, the function of SinkV2 cannot currently cover SinkFunction as described in FLIP-287[2], it means the migration path after deprecating SinkFunction/SinkV1 does not exist, thus we cannot mark the related interfaces of SinkFunction/SinkV1 as deprecated in 1.18.
> >> > > > > >
> >> > > > > > Based on these two insights, I think we should not mark these interfaces as must to have in 2.0. Maintaining the two sets of source/sink interfaces is not a concern for me, users can choose the interface to implement according to their energy and needs.
> >> > > > > >
> >> > > > > > Btw, some work items in 2.0 are marked as must to have, but no contributor has claimed them yet.
> >> > > > > > I think this is a risk and hope the Release Managers could pay attention to it.
> >> > > > > >
> >> > > > > > Thank you all RMs for your work, sorry again for interrupting the vote
> >> > > > > >
> >> > > > > > Best,
> >> > > > > > Leonard
> >> > > > > >
> >> > > > > > [1] https://lists.apache.org/thread/sqq26s9rorynr4vx4nhxz3fmmxpgtdqp
> >> > > > > > [2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=240880853
> >> > > > > >
> >> > > > > > > On Jul 11, 2023, at 4:11 PM, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> >> > > > > > >
> >> > > > > > > As a second thought, I think "Eager State Declaration" is probably not a must-have.
> >> > > > > > >
> >> > > > > > > I was originally thinking it is a prerequisite for "state querying for disaggregated state management".
> >> > > > > > >
> >> > > > > > > Since disaggregated state management itself is not a must-have, "Eager State Declaration" is not as well. We can downgrade it to "nice to have" if no objection.
> >> > > > > > >
> >> > > > > > > Best
> >> > > > > > >
> >> > > > > > > Yuan
> >> > > > > > >
> >> > > > > > > On Mon, Jul 10, 2023 at 7:02 PM Jing Ge <j...@ververica.com.invalid> wrote:
> >> > > > > > >
> >> > > > > > >> +1
> >> > > > > > >>
> >> > > > > > >> On Mon, Jul 10, 2023 at 12:52 PM Yu Li <car...@gmail.com> wrote:
> >> > > > > > >>
> >> > > > > > >>> +1 (binding)
> >> > > > > > >>>
> >> > > > > > >>> Thanks for driving this and great to see us moving forward.
> >> > > > > > >>>
> >> > > > > > >>> Best Regards,
> >> > > > > > >>> Yu
> >> > > > > > >>>
> >> > > > > > >>> On Mon, 10 Jul 2023 at 11:59, Feng Wang <wangfeng...@gmail.com> wrote:
> >> > > > > > >>>
> >> > > > > > >>>> +1
> >> > > > > > >>>> Thanks for driving this, looking forward to the next stage of flink.
> >> > > > > > >>>>
> >> > > > > > >>>> On Fri, Jul 7, 2023 at 5:31 PM Xintong Song <tonysong...@gmail.com> wrote:
> >> > > > > > >>>>
> >> > > > > > >>>>> Hi all,
> >> > > > > > >>>>>
> >> > > > > > >>>>> I'd like to start the VOTE for the must-have work items for release 2.0 [1]. The corresponding discussion thread is [2].
> >> > > > > > >>>>>
> >> > > > > > >>>>> Please note that once the vote is approved, any changes to the must-have items (adding / removing must-have items, changing the priority) require another vote. Assigning contributors / reviewers, updating descriptions / progress, changes to nice-to-have items do not require another vote.
> >> > > > > > >>>>>
> >> > > > > > >>>>> The vote will be open until at least July 12, following the consensus voting process. Votes of PMC members are binding.
> >> > > > > > >>>>>
> >> > > > > > >>>>> Best,
> >> > > > > > >>>>>
> >> > > > > > >>>>> Xintong
> >> > > > > > >>>>>
> >> > > > > > >>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> >> > > > > > >>>>>
> >> > > > > > >>>>> [2] https://lists.apache.org/thread/l3dkdypyrovd3txzodn07lgdwtwvhgk4