Thanks for working on this, Weijie.

The design flaws of the current DataStream API (i.e., V1) have been a pain
for a long time. It's great to see an effort under way to resolve them.

Significant changes to such an important and comprehensive set of public
APIs deserve caution. From that perspective, the ideas of introducing a
new set of APIs that gradually replaces the current one, splitting the
introduction of the new APIs into many separate FLIPs, and marking
intermediate APIs @Experimental until all of them are completed make
great sense to me.

Besides, the ideas of generalized watermarks and execution hints sound
quite interesting. Looking forward to more detailed discussions in the
corresponding sub-FLIPs.

+1 for the roadmap.

Best,

Xintong



On Tue, Jan 30, 2024 at 11:00 AM weijie guo <guoweijieres...@gmail.com>
wrote:

> Hi Wencong:
>
> > The Processing TimerService is currently defined as one of the basic
> > primitives, partly because it's understood that you have to choose
> > between processing time and event time. The other part of the reason is
> > that it needs to work based on the task's mailbox thread model to avoid
> > concurrency issues. Could you clarify the second part of the reason?
>
> Since the processing logic of the operators takes place in the mailbox
> thread, the processing timer's callback function must also be executed in
> the mailbox to ensure thread safety.
> If we do not define the Processing TimerService as a primitive, there is
> no way for the user to dispatch custom logic to the mailbox thread.
>
>
> Best regards,
>
> Weijie
>
>
> On Mon, Jan 29, 2024 at 17:12, Xuannan Su <suxuanna...@gmail.com> wrote:
>
> > Hi Weijie,
> >
> > Thanks for driving the work! There are indeed many pain points in the
> > current DataStream API, which are challenging to resolve with its
> > existing design. It is a great opportunity to propose a new DataStream
> > API that tackles these issues. I like the way we've divided the FLIP
> > into multiple sub-FLIPs; the roadmap is clear and comprehensible. +1
> > for the umbrella FLIP. I am eager to see the sub-FLIPs!
> >
> > Best regards,
> > Xuannan
> >
> >
> >
> >
> > On Wed, Jan 24, 2024 at 8:55 PM Wencong Liu <liuwencle...@163.com> wrote:
> > >
> > > Hi Weijie,
> > >
> > >
> > > Thank you for the effort you've put into the DataStream API! By
> > > reorganizing and redesigning the DataStream API, as well as addressing
> > > some of the unreasonable designs within it, we can enhance the
> > > efficiency of job development for developers. It also allows developers
> > > to design more flexible Flink jobs to meet business requirements.
> > >
> > >
> > > I have conducted a comprehensive review of the DataStream API design in
> > > versions 1.18 and 1.19. I found quite a few functional defects in the
> > > DataStream API, such as the lack of corresponding APIs in batch
> > > processing scenarios. In the upcoming 1.20 version, I will further
> > > improve the DataStream API in batch computing scenarios.
> > >
> > >
> > > The issues existing in the old DataStream API (which can be referred to
> > > as V1) can be addressed from a design perspective in the initial version
> > > of V2. I hope to also have the opportunity to participate in the
> > > development of DataStream V2 and make my contribution.
> > >
> > >
> > > Regarding FLIP-408, I have a question: The Processing TimerService is
> > > currently defined as one of the basic primitives, partly because it's
> > > understood that you have to choose between processing time and event
> > > time. The other part of the reason is that it needs to work based on the
> > > task's mailbox thread model to avoid concurrency issues. Could you
> > > clarify the second part of the reason?
> > >
> > > Best,
> > > Wencong Liu
> > >
> > >
> > > At 2023-12-26 14:42:20, "weijie guo" <guoweijieres...@gmail.com> wrote:
> > > >Hi devs,
> > > >
> > > >
> > > >I'd like to start a discussion about FLIP-408: [Umbrella] Introduce
> > > >DataStream API V2 [1].
> > > >
> > > >
> > > >The DataStream API is one of the two main APIs that Flink provides for
> > > >writing data processing programs. It was introduced practically on
> > > >day one of the project and has evolved for nearly a decade, and we are
> > > >observing more and more problems with it. Fixing these problems
> > > >requires significant breaking changes, which makes an in-place
> > > >refactor impractical. Therefore, we propose to introduce a new set of
> > > >APIs, the DataStream API V2, to gradually replace the original
> > > >DataStream API.
> > > >
> > > >
> > > >The proposal to introduce a whole new set of APIs is complex and
> > > >includes massive changes. We are planning to break it down into
> > > >multiple sub-FLIPs for incremental discussion. This FLIP is only used
> > > >as an umbrella, mainly focusing on motivation, goals, and overall
> > > >planning. That is to say, more design and implementation details will
> > > >be discussed in other FLIPs.
> > > >
> > > >
> > > >It's hard to imagine the detailed design of the new API if we're just
> > > >talking about this umbrella FLIP, so it would probably be difficult to
> > > >give an opinion on it. Therefore, I have prepared two sub-FLIPs [2][3]
> > > >at the same time, and the discussions of them will be posted later in
> > > >separate threads.
> > > >
> > > >
> > > >Looking forward to hearing from you, thanks!
> > > >
> > > >
> > > >Best regards,
> > > >
> > > >Weijie
> > > >
> > > >
> > > >
> > > >[1]
> > > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-408%3A+%5BUmbrella%5D+Introduce+DataStream+API+V2
> > > >
> > > >[2]
> > > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-409%3A+DataStream+V2+Building+Blocks%3A+DataStream%2C+Partitioning+and+ProcessFunction
> > > >
> > > >[3]
> > > >https://cwiki.apache.org/confluence/display/FLINK/FLIP-410%3A++Config%2C+Context+and+Processing+Timer+Service+of+DataStream+API+V2
> >
>
