Hi,

I think one more thing we should consider for 2.0 is changing the default
values of configuration options to improve the out-of-the-box user experience.

Currently, in order to run a Flink job, users may need to set a bunch of
configuration options, such as mini-batch, checkpoint interval, exactly-once,
incremental checkpoints, etc. This is verbose and hard for beginners.
Most of these options can have a universally applicable value. Because
changing a default value is a breaking change, I think it's worth considering
changing them in 2.0.
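
As a rough illustration of the boilerplate in question, here is a minimal
sketch of the settings a newcomer might end up copying into every job. The
option keys are the ones I recall from the Flink 1.17 documentation and
should be double-checked against the release in use; the values and the
class name `BeginnerDefaults` are purely hypothetical, not proposed defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the options a beginner often sets by hand.
// Keys are assumed from the Flink 1.17 docs; values are illustrative only.
public class BeginnerDefaults {

    public static Map<String, String> build() {
        Map<String, String> conf = new LinkedHashMap<>();
        // Checkpointing: interval, consistency mode, incremental snapshots.
        conf.put("execution.checkpointing.interval", "3min");
        conf.put("execution.checkpointing.mode", "EXACTLY_ONCE");
        conf.put("state.backend.incremental", "true");
        // Table/SQL mini-batch optimization.
        conf.put("table.exec.mini-batch.enabled", "true");
        conf.put("table.exec.mini-batch.allow-latency", "5s");
        conf.put("table.exec.mini-batch.size", "5000");
        return conf;
    }

    public static void main(String[] args) {
        // Print the options in flink-conf.yaml style (key: value).
        build().forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```

If sensible defaults shipped with 2.0, none of these entries would need to
be set explicitly for a first job.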

What do you think?

Best,
Jark


On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com> wrote:

> Hi Chesnay
>
> >"Move Calcite rules from Scala to Java": I would hope that this would be
> >an entirely internal change, and could thus be an incremental process
> >independent of major releases.
> >What is the actual scale of this item; how much are we actually
> >re-writing?
>
> Thanks for asking.
> Yes, you're right, that should be an internal change.
> I was also thinking about an incremental change (rule by rule, or in
> reasonably small groups of rules).
> And yes, this could be an activity independent of major releases.
>
> The problem actually concerns the children of RelOptRule.
> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
> API.
> There are also children of ConverterRule (50+) which do not have such
> issues.
> Having all the rules in Java could be considered as the next step.
>
> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com>
> wrote:
>
> > Hi Alex & Gyula,
> >
> > > By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > > Introduce an API deprecation process" thread [1]?
> > >
> >
> > Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
> > URL in my previous email. Sorry for the mistake.
> >
> > > I am also curious to know if the rationale behind this new API has been
> > > previously discussed on the mailing list. Do we have a list of
> > > shortcomings in the current DataStream API that it tries to resolve? How
> > > does the current ProcessFunction functionality fit into the picture?
> > > Will it be kept as is or subsumed by new API?
> > >
> >
> > > I don't think we should create a replacement for the DataStream API
> > > unless we have a very good reason to do so and with a proper discussion
> > > about this as Alex said.
> >
> >
> > The ProcessFunction API, which is intended to replace the DataStream API,
> > is still a proposal, not a decision. Sorry for the confusion; I should
> > have been more careful with my words and not given the impression that
> > this is something we'll do anyway.
> >
> > There will be a FLIP describing the motivations and designs in detail, for
> > the community to discuss and vote on. We are still working on it. TBH,
> > this is not trivial and we will need more time on it.
> >
> > Just to quickly share some background:
> >
> >    - We see quite a few problems with the current DataStream APIs
> >       - Users are working with concrete classes rather than interfaces,
> >       which means
> >          - Users can access methods that are designed to be used by
> >          internal classes, even though they are annotated with
> >          `@Internal`. E.g., `DataStream#getTransformation`.
> >          - Changes to the non-API implementations (e.g., `Transformation`)
> >          would affect the API classes (e.g., `DataStream`), which makes it
> >          hard to provide binary compatibility.
> >       - Internal classes are used as parameters / return values of public
> >       APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
> >       `StreamTask`, which is returned from
> >       `AbstractStreamOperator#getContainingTask`, is Internal.
> >       - In many cases, users are asked to extend the API classes, rather
> >       than implementing interfaces. E.g., `AbstractStreamOperator`.
> >          - Any changes to the base classes, even the internal part, may
> >          affect the behavior of the user-provided sub-classes
> >          - Users can override the behavior of the base classes
> >       - The API module `flink-streaming-java` contains non-API classes,
> >       and depends on internal modules such as `flink-runtime`, which means
> >          - Changes to the internal modules may affect the API modules,
> >          which requires users to re-build their applications upon upgrading
> >          - The artifact users need for building their applications is
> >          larger than necessary.
> >       - We probably should not expose operators (e.g.,
> >       `AbstractStreamOperator`) to users. Functions should be enough for
> >       users to define their data processing logic. Exposing operator-level
> >       concepts (e.g., mailbox thread model, checkpoint barrier alignment,
> >       etc.) is unnecessary and, given compatibility considerations, limits
> >       how much such exposed mechanisms can be improved.
> >       - The current DataStream API seems to be a mixture of many things,
> >       making it hard to understand, especially for newcomers. It might be
> >       better to re-organize it into several parts: (the taxonomy below is
> >       just an example, we are still working on this)
> >          - The most fundamental stateful stream processing: streams,
> >          partitions / key, process functions, state, timeline-service
> >          - An extension for common batch-streaming unified functions: map,
> >          flatmap, filter, agg, reduce, join, etc.
> >          - An extension for windowing support: window, triggering
> >          - An extension for event-time support: event time, watermark
> >          - The extensions are like shortcuts / sugar, without which users
> >          can probably still achieve the same behavior by working with the
> >          fundamental APIs, but it would be a lot easier with the extensions
> >    - The original plan was to do in-place refactors / changes on the
> >    DataStream API. Some related items are listed in this doc [2] attached
> >    to the kick-off email [3]. Not all of the above issues are listed,
> >    because we hadn't looked into this as deeply by that time as we have
> >    now.
> >    - We proposed this as a new API rather than in-place refactors in the
> >    2.0 work item list, because we realized the changes might be too big
> >    for an in-place change. First having a new API, then gradually retiring
> >    the old one, would help users to smoothly migrate between them.
> >
> > A thorough discussion is definitely needed once the FLIP is out. And of
> > course it's possible that the FLIP might be rejected. Given that we are
> > planning for release 2.0, I just feel it would be better to bring this up
> > early even though the concrete plan is not yet ready.
> >
> > Best,
> >
> > Xintong
> >
> >
> > [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> > [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
> >
> > On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
> >
> > > Hey!
> > >
> > > I share the same concerns mentioned above regarding the
> > > "ProcessFunction API".
> > >
> > > I don't think we should create a replacement for the DataStream API
> > > unless we have a very good reason to do so and with a proper discussion
> > > about this as Alex said.
> > >
> > > Cheers,
> > > Gyula
> > >
> > > On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
> > > alexander.fedu...@gmail.com> wrote:
> > >
> > > > Hi Xintong,
> > > >
> > > > By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > > > Introduce an API deprecation process" thread [1]?
> > > >
> > > > I am also curious to know if the rationale behind this new API has
> > > > been previously discussed on the mailing list. Do we have a list of
> > > > shortcomings in the current DataStream API that it tries to resolve?
> > > > How does the current ProcessFunction functionality fit into the
> > > > picture? Will it be kept as is or subsumed by new API?
> > > >
> > > > [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > > >
> > > > Best,
> > > > Alex
> > > >
> > > > On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
> > > wrote:
> > > >
> > > > > >
> > > > > > > The ProcessFunction API item is giving me the most headaches
> > > > > > > because it's very unclear what it actually entails; like is it
> > > > > > > an entirely separate API to DataStream (sounds like it is!) or
> > > > > > > an extension of DataStream. How much will it share the
> > > > > > > internals with DataStream etc.; how does it relate to the Table
> > > > > > > API (w.r.t. switching APIs / what Table API uses underneath).
> > > > > >
> > > > >
> > > > > > I totally understand your confusion. We started planning this
> > > > > > after kicking off the release 2.0, so there's still a lot to be
> > > > > > explored and the plan keeps changing.
> > > > > >
> > > > > >
> > > > > >    - In the beginning, we planned to do an in-place refactor of
> > > > > >    the DataStream API, until the API migration period was proposed.
> > > > > >    - Then we wanted to make it an entirely separate API from
> > > > > >    DataStream, and listed it as a must-have for release 2.0 so that
> > > > > >    we can remove DataStream once it's ready.
> > > > > >    - However, depending on the outcome of the API compatibility
> > > > > >    discussion [1], we may not be able to remove DataStream in 2.0
> > > > > >    anyway, which means we might need to re-evaluate the necessity
> > > > > >    of this item for 2.0.
> > > > > >
> > > > > > I'd say we wait a bit longer for the compatibility discussion [1]
> > > > > > and decide the priority for this item afterwards.
> > > > >
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > >
> > > > > [1] https://lists.apache.org/list.html?dev@flink.apache.org
> > > > >
> > > > >
> > > > > > On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler
> > > > > > <ches...@apache.org> wrote:
> > > > >
> > > > > > > By and large I'm quite happy with the list of items.
> > > > > > >
> > > > > > > I'm curious as to why the "Disaggregated State Management" item
> > > > > > > is marked as a must-have; will it require changes that break
> > > > > > > something? What prevents it from being added in 2.1?
> > > > > > >
> > > > > > > We may want to update the Java 17 item to "Make Java 17 the
> > > > > > > default, drop Java 8/11". Maybe even split it into a must-have
> > > > > > > "Drop Java 8" and a nice-to-have "Drop Java 11"?
> > > > > > >
> > > > > > > "Move Calcite rules from Scala to Java": I would hope that this
> > > > > > > would be an entirely internal change, and could thus be an
> > > > > > > incremental process independent of major releases.
> > > > > > > What is the actual scale of this item; how much are we actually
> > > > > > > re-writing?
> > > > > > >
> > > > > > > "Add MetricGroup#getLogicalScope": I'd raise this to a
> > > > > > > must-have; I think I marked it down as nice-to-have only because
> > > > > > > it depends on another item.
> > > > > > >
> > > > > > > The ProcessFunction API item is giving me the most headaches
> > > > > > > because it's very unclear what it actually entails; like is it
> > > > > > > an entirely separate API to DataStream (sounds like it is!) or
> > > > > > > an extension of DataStream. How much will it share the
> > > > > > > internals with DataStream etc.; how does it relate to the Table
> > > > > > > API (w.r.t. switching APIs / what Table API uses underneath).
> > > > > > >
> > > > > > > There are a few items I added as ideas which don't have a
> > > > > > > priority yet; would love to get some feedback on those.
> > > > > >
> > > > > > On 21/06/2023 08:41, Xintong Song wrote:
> > > > > >
> > > > > > Hi devs,
> > > > > >
> > > > > > > As previously discussed in [1], we had been collecting work item
> > > > > > > proposals for the 2.0 release until June 15th, on the wiki page
> > > > > > > [2].
> > > > > > >
> > > > > > >    - As we have passed the due date, I'd like to kindly remind
> > > > > > >    everyone *not to add / remove items directly on the wiki
> > > > > > >    page*. If needed, please post in this thread or reach out to
> > > > > > >    the release managers instead.
> > > > > > >    - I've reached out to some folks for clarifications about
> > > > > > >    their proposals. Some of them mentioned that they cannot yet
> > > > > > >    tell whether we should do an item or not, and would need more
> > > > > > >    time / discussions to make the decision. So I added a new
> > > > > > >    symbol for items whose priorities are `TBD`.
> > > > > > >
> > > > > > > Now it's time to collaboratively decide a minimum set of
> > > > > > > must-have items. I've gone through the entire list of proposed
> > > > > > > items, and found that most of them make a lot of sense. So I
> > > > > > > think an online sync might not be necessary for this. I'd like
> > > > > > > to go with this DISCUSS thread, where everyone can comment on
> > > > > > > how they think the list can be improved, followed by a VOTE to
> > > > > > > formally make the decision.
> > > > > > >
> > > > > > > Any feedback and opinions, including but not limited to the
> > > > > > > following aspects, will be appreciated.
> > > > > >
> > > > > >    - Important items that are missing from the list
> > > > > >    - Concerns regarding the listed items or their priorities
> > > > > >
> > > > > > Looking forward to your feedback.
> > > > > >
> > > > > > Best,
> > > > > >
> > > > > > Xintong
> > > > > >
> > > > > >
> > > > > > [1]
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> > > > > >
> > > > > > [2]
> https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Best regards,
> Sergey
>
