Hi Chesnay

>"Move Calcite rules from Scala to Java": I would hope that this would be
>an entirely internal change, and could thus be an incremental process
>independent of major releases.
>What is the actual scale of this item; how much are we actually re-writing?

Thanks for asking.
Yes, you're right, this should be an internal change.
I was also thinking about an incremental change (rule by rule, or in
reasonably small groups of rules).
And yes, this could be an activity independent of major releases.

The problem actually affects the subclasses of RelOptRule.
Currently I see 60+ such rules (in Scala) using the mentioned deprecated
API.
There are also subclasses of ConverterRule (50+), which do not have this
issue.
Moving those could be considered as a next step, so that all the rules end
up in Java.

On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com> wrote:

> Hi Alex & Gyula,
>
> > By compatibility discussion do you mean the "[DISCUSS] FLIP-321: Introduce
> > an API deprecation process" thread [1]?
>
> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong url
> in my previous email. Sorry for the mistake.
>
> > I am also curious to know if the rationale behind this new API has been
> > previously discussed on the mailing list. Do we have a list of shortcomings
> > in the current DataStream API that it tries to resolve? How does the
> > current ProcessFunction functionality fit into the picture? Will it be kept
> > as is or subsumed by new API?
>
> > I don't think we should create a replacement for the DataStream API unless
> > we have a very good reason to do so and with a proper discussion about this
> > as Alex said.
>
>
> The ProcessFunction API, which is intended to replace the DataStream API, is
> still a proposal, not a decision. Sorry for the confusion, I should have
> been more careful with my words, not giving the impression that this is
> something we'll do anyway.
>
> There will be a FLIP describing the motivations and designs in detail, for
> the community to discuss and vote on. We are still working on it. TBH, this
> is not trivial and we would need more time on it.
>
> Just to quickly share some background:
>
>    - We see quite some problems with the current DataStream APIs
>       - Users are working with concrete classes rather than interfaces,
>       which means
>          - Users can access methods that are designed to be used by internal
>          classes, even though they are annotated with `@Internal`. E.g.,
>          `DataStream#getTransformation`.
>          - Changes to the non-API implementations (e.g., `Transformation`)
>          would affect the API classes (e.g., `DataStream`), which makes it
>          hard to provide binary compatibility.
>       - Internal classes are used as parameter / return-value of public
>       APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
>       `StreamTask`, which is returned from
>       `AbstractStreamOperator#getContainingTask`, is Internal.
>       - In many cases, users are asked to extend the API classes, rather
>       than implementing interfaces. E.g., `AbstractStreamOperator`.
>          - Any changes to the base classes, even the internal part, may
>          affect the behavior of the user-provided sub-classes
>          - Users can override the behavior of the base classes
>       - The API module `flink-streaming-java` contains non-API classes, and
>       depends on internal modules such as `flink-runtime`, which means
>          - Changes to the internal modules may affect the API modules, which
>          requires users to re-build their applications upon upgrading
>          - The artifact users need for building their applications is larger
>          than necessary.
>       - We probably should not expose operators (e.g.,
>       `AbstractStreamOperator`) to users. Functions should be enough for
>       users to define their data processing logic. Exposing operator-level
>       concepts (e.g., mailbox thread model, checkpoint barrier alignment,
>       etc.) is unnecessary and, because of compatibility considerations,
>       limits how much such exposed mechanisms can be improved.
>       - The current DataStream API seems to be a mixture of many things,
>       making it hard to understand, especially for newcomers. It might be
>       better to re-organize it into several parts (the taxonomy below is
>       just an example; we are still working on this):
>          - The most fundamental stateful stream processing: streams,
>          partitions / key, process functions, state, timeline-service
>          - An extension for common batch-streaming unified functions: map,
>          flatmap, filter, agg, reduce, join, etc.
>          - An extension for windowing supports: window, triggering
>          - An extension for event-time supports: event time, watermark
>          - The extensions are like short-cuts / sugars, without which users
>          can probably still achieve the same behavior by working with the
>          fundamental APIs, but would be a lot easier with the extensions
>    - The original plan was to do in-place refactors / changes on the
>    DataStream API. Some related items are listed in this doc [2] attached to
>    the kicking off email [3]. Not all of the above issues are listed, because
>    we hadn't looked into this as deeply by that time as we have now.
>    - We proposed this as a new API rather than in-place refactors in the
>    2.0 work item list, because we realized the changes might be too big for an
>    in-place change. First having a new API then gradually retiring the old one
>    would help users to smoothly migrate between them.
>
> A thorough discussion is definitely needed once the FLIP is out. And of
> course it's possible that the FLIP might be rejected. Given that we are
> planning for release 2.0, I just feel it would be better to bring this up
> early even if the concrete plan is not yet ready.
>
> Best,
>
> Xintong
>
>
> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
>
> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
>
> > Hey!
> >
> > I share the same concerns mentioned above regarding the "ProcessFunction
> > API".
> >
> > I don't think we should create a replacement for the DataStream API unless
> > we have a very good reason to do so and with a proper discussion about this
> > as Alex said.
> >
> > Cheers,
> > Gyula
> >
> > On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
> > alexander.fedu...@gmail.com> wrote:
> >
> > > Hi Xintong,
> > >
> > > By compatibility discussion do you mean the "[DISCUSS] FLIP-321: Introduce
> > > an API deprecation process" thread [1]?
> > >
> > > I am also curious to know if the rationale behind this new API has been
> > > previously discussed on the mailing list. Do we have a list of shortcomings
> > > in the current DataStream API that it tries to resolve? How does the
> > > current ProcessFunction functionality fit into the picture? Will it be kept
> > > as is or subsumed by new API?
> > >
> > > [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > >
> > > Best,
> > > Alex
> > >
> > > On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com> wrote:
> > >
> > > > >
> > > > > The ProcessFunction API item is giving me the most headaches because it's
> > > > > very unclear what it actually entails; like is it an entirely separate API
> > > > > to DataStream (sounds like it is!) or an extension of DataStream. How much
> > > > > will it share the internals with DataStream etc.; how does it relate to the
> > > > > Table API (w.r.t. switching APIs / what Table API uses underneath).
> > > > >
> > > >
> > > > I totally understand your confusion. We started planning this after kicking
> > > > off the release 2.0, so there's still a lot to be explored and the plan
> > > > keeps changing.
> > > >
> > > >
> > > >    - In the beginning, we planned to do an in-place refactor of DataStream
> > > >    API, until the API migration period is proposed.
> > > >    - Then we want to make it an entirely separate API to DataStream, and
> > > >    listed as a must-have for release 2.0 so that we can remove DataStream once
> > > >    it's ready.
> > > >    - However, depending on the outcome of the API compatibility discussion
> > > >    [1], we may not be able to remove DataStream in 2.0 anyway, which means we
> > > >    might need to re-evaluate the necessity of this item for 2.0.
> > > >
> > > > I'd say we wait a bit longer for the compatibility discussion [1] and
> > > > decide the priority for this item afterwards.
> > > >
> > > >
> > > > Best,
> > > >
> > > > Xintong
> > > >
> > > >
> > > > [1] https://lists.apache.org/list.html?dev@flink.apache.org
> > > >
> > > >
> > > > On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <ches...@apache.org> wrote:
> > > >
> > > > > by-and-large I'm quite happy with the list of items.
> > > > >
> > > > > I'm curious as to why the "Disaggregated State Management" item is marked
> > > > > as a must-have; will it require changes that break something? What prevents
> > > > > it from being added in 2.1?
> > > > >
> > > > > We may want to update the Java 17 item to "Make Java 17 the default, drop
> > > > > Java 8/11". Maybe even split it into a must-have "Drop Java 8" and a
> > > > > nice-to-have "Drop Java 11"?
> > > > >
> > > > > "Move Calcite rules from Scala to Java": I would hope that this would be
> > > > > an entirely internal change, and could thus be an incremental process
> > > > > independent of major releases.
> > > > > What is the actual scale of this item; how much are we actually re-writing?
> > > > >
> > > > > "Add MetricGroup#getLogicalScope": I'd raise this to a must-have; i think
> > > > > I marked it down as nice-to-have only because it depends on another item.
> > > > >
> > > > > The ProcessFunction API item is giving me the most headaches because it's
> > > > > very unclear what it actually entails; like is it an entirely separate API
> > > > > to DataStream (sounds like it is!) or an extension of DataStream. How much
> > > > > will it share the internals with DataStream etc.; how does it relate to the
> > > > > Table API (w.r.t. switching APIs / what Table API uses underneath).
> > > > >
> > > > > There are a few items I added as ideas which don't have a priority yet;
> > > > > would love to get some feedback on those.
> > > > >
> > > > > On 21/06/2023 08:41, Xintong Song wrote:
> > > > >
> > > > > Hi devs,
> > > > >
> > > > > As previously discussed in [1], we had been collecting work item proposals
> > > > > for the 2.0 release until June 15th, on the wiki page [2].
> > > > >
> > > > >    - As we have passed the due date, I'd like to kindly remind everyone *not
> > > > >    to add / remove items directly on the wiki page*. If needed, please post
> > > > >    in this thread or reach out to the release managers instead.
> > > > >    - I've reached out to some folks for clarifications about their
> > > > >    proposals. Some of them mentioned that they can not yet tell whether we
> > > > >    should do an item or not, and would need more time / discussions to make
> > > > >    the decision. So I added a new symbol for items whose priorities are `TBD`.
> > > > >
> > > > > Now it's time to collaboratively decide a minimum set of must-have items.
> > > > > I've gone through the entire list of proposed items, and found most of them
> > > > > make quite much sense. So I think an online sync might not be necessary for
> > > > > this. I'd like to go with this DISCUSS thread, where everyone can comment
> > > > > on how they think the list can be improved, followed by a VOTE to formally
> > > > > make the decision.
> > > > >
> > > > > Any feedback and opinions, including but not limited to the following
> > > > > aspects, will be appreciated.
> > > > >
> > > > >    - Important items that are missing from the list
> > > > >    - Concerns regarding the listed items or their priorities
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > >
> > > > > [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> > > > >
> > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Best regards,
Sergey
