Thanks all for the discussion.

The wiki has been updated as discussed. I'm starting a vote now.

Best,

Xintong



On Wed, Jul 5, 2023 at 9:52 AM Xintong Song <tonysong...@gmail.com> wrote:

> Hi ConradJam,
>
> I think Chesnay has already put his name as the Contributor for the two
> tasks you listed. Maybe you can reach out to him to see if you can
> collaborate on this.
>
> In general, I don't think contributing to a release 2.0 issue is much
> different from contributing to a regular issue. We haven't yet created JIRA
> tickets for all the listed tasks because many of them need further
> discussion and / or FLIPs to decide whether and how they should be
> performed.
>
> Best,
>
> Xintong
>
>
>
> On Mon, Jul 3, 2023 at 10:37 PM ConradJam <jam.gz...@gmail.com> wrote:
>
>> Hi Community:
>>   I see some tasks in the 2.0 list that haven't been assigned yet. I would
>> like to take on some of the tasks that I can complete. How do I apply to
>> the community for these tasks? I am interested in the following parts of
>> FLINK-32377 <https://issues.apache.org/jira/browse/FLINK-32377>. Do I need
>> to create the issues myself and assign them to myself?
>>
>> - the current timestamp, which is problematic w.r.t. caching and testing,
>> while providing no value.
>> - Remove JarRequestBody#programArgs in favor of #programArgsList.
>>
>> [1] FLINK-32377 <https://issues.apache.org/jira/browse/FLINK-32377>
>>
>> On Fri, Jun 30, 2023 at 12:53 AM Teoh, Hong <lian...@amazon.co.uk.invalid> wrote:
>>
>> > Thanks Xintong for driving the effort.
>> >
>> > I'd add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
>> > especially the types. We have various configs that encode Time /
>> > MemorySize values as Long instead!
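
As a minimal illustration of the typing concern above, the plain-Java sketch below contrasts a timeout stored as a bare long with one stored as a java.time.Duration. The class, constant, and method names are hypothetical and are not actual Flink options or APIs.

```java
import java.time.Duration;

public class TypedOptionSketch {
    // Untyped: the unit (ms? s?) lives only in documentation or in the key name.
    static final long HEARTBEAT_TIMEOUT = 50_000L;

    // Typed: the unit is carried by the value itself.
    static final Duration HEARTBEAT_TIMEOUT_TYPED = Duration.ofSeconds(50);

    // Callers convert explicitly at the point where a raw number is required.
    static long timeoutMillis(Duration timeout) {
        return timeout.toMillis();
    }

    public static void main(String[] args) {
        System.out.println(timeoutMillis(HEARTBEAT_TIMEOUT_TYPED)); // prints 50000
    }
}
```

With the typed variant, a reader does not have to guess whether 50000 means milliseconds or seconds.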
>> >
>> > Regards,
>> > Hong
>> >
>> >
>> >
>> > > On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
>> > >
>> > >
>> > > Thanks for driving this effort, Xintong!
>> > >
>> > > To Chesnay
>> > >> I'm curious as to why the "Disaggregated State Management" item is
>> > >> marked as a must-have; will it require changes that break something?
>> > >> What prevents it from being added in 2.1?
>> > >
>> > > As to "Disaggregated State Management".
>> > >
>> > > We plan to provide a new type of state backend to support DFS as primary
>> > > storage.
>> > > To achieve this, we need at least two sets of changes (not entirely sure
>> > > yet, since we are still in the design and prototyping phase):
>> > >
>> > > 1. Statebackend Change
>> > > 2. State Access Change
>> > >
>> > > Not all of the related interfaces are `@Internal`. Some of them, like
>> > > `StateBackend`, are `@PublicEvolving`.
>> > > So, you are right in the sense that "Disaggregated State Management" itself
>> > > probably does not need to be a "Must Have".
>> > >
>> > > But I was hoping that changes related to public APIs can be finalized and
>> > > merged in Flink 2.0 (I will fix the wiki accordingly).
>> > >
>> > > I also agree with Jark that 2.0 is a good chance to rework the default
>> > > value of configurations.
>> > >
>> > > Best
>> > > Yuan
>> > >
>> > >
>> > > On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org>
>> > wrote:
>> > >
>> > >> Something else configuration-related is that there are a bunch of
>> > >> options where the type isn't quite correct (e.g., a String where it
>> > >> could be an enum, a string where it should be an int or something).
>> > >> Could do a pass over those as well.
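
A small plain-Java sketch of the "String where it could be an enum" point above: parsing the raw value into an enum once, at the configuration boundary, rejects typos early instead of letting them surface later. The enum and its values are hypothetical and do not correspond to an actual Flink option.

```java
import java.util.Locale;

public class EnumOptionSketch {
    // Hypothetical set of allowed values for an illustrative option.
    enum RestartMode { NONE, FIXED_DELAY, EXPONENTIAL_DELAY }

    // Normalizes a user-supplied string such as "fixed-delay" and maps it to
    // the enum; Enum.valueOf throws IllegalArgumentException for unknown names.
    static RestartMode parse(String raw) {
        String normalized = raw.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        return RestartMode.valueOf(normalized);
    }

    public static void main(String[] args) {
        System.out.println(parse("fixed-delay")); // prints FIXED_DELAY
    }
}
```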
>> > >>
>> > >> On 29/06/2023 13:50, Jark Wu wrote:
>> > >>> Hi,
>> > >>>
>> > >>> I think one more thing we need to consider doing in 2.0 is changing the
>> > >>> default values of configuration options to improve the out-of-box user
>> > >>> experience.
>> > >>>
>> > >>> Currently, in order to run a Flink job, users may need to set
>> > >>> a bunch of configurations, such as minibatch, checkpoint interval,
>> > >>> exactly-once, incremental-checkpoint, etc. It's very verbose and hard to
>> > >>> use for beginners.
>> > >>> Most of them can have a universally applicable value. Because changing
>> > >>> a default value is a breaking change, I think it's worth considering
>> > >>> changing them in 2.0.
>> > >>>
>> > >>> What do you think?
>> > >>>
>> > >>> Best,
>> > >>> Jark
>> > >>>
>> > >>>
>> > >>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
>> > >> wrote:
>> > >>>
>> > >>>> Hi Chesnay
>> > >>>>
>> > >>>>> "Move Calcite rules from Scala to Java": I would hope that this would
>> > >>>>> be an entirely internal change, and could thus be an incremental
>> > >>>>> process independent of major releases.
>> > >>>>> What is the actual scale of this item; how much are we actually
>> > >>>>> re-writing?
>> > >>>>
>> > >>>> Thanks for asking.
>> > >>>> Yes, you're right, that should be an internal change.
>> > >>>> I was also thinking about an incremental change (rule by rule or a
>> > >>>> reasonably small group of rules).
>> > >>>> And yes, this could be an activity independent of major releases.
>> > >>>>
>> > >>>> The problem is actually with the children of RelOptRule.
>> > >>>> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
>> > >>>> API.
>> > >>>> There are also children of ConverterRule (50+) which do not have such
>> > >>>> issues.
>> > >>>> Maybe it could be considered as the next step to have all the rules in
>> > >>>> Java.
>> > >>>>
>> > >>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <
>> tonysong...@gmail.com>
>> > >>>> wrote:
>> > >>>>
>> > >>>>> Hi Alex & Gyula,
>> > >>>>>
>> > >>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>> > >>>>>> Introduce an API deprecation process" thread [1]?
>> > >>>>>
>> > >>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
>> > >>>>> URL in my previous email. Sorry for the mistake.
>> > >>>>>
>> > >>>>>> I am also curious to know if the rationale behind this new API has been
>> > >>>>>> previously discussed on the mailing list. Do we have a list of
>> > >>>>>> shortcomings in the current DataStream API that it tries to resolve?
>> > >>>>>> How does the current ProcessFunction functionality fit into the
>> > >>>>>> picture? Will it be kept as is or subsumed by new API?
>> > >>>>>
>> > >>>>>> I don't think we should create a replacement for the DataStream API
>> > >>>>>> unless we have a very good reason to do so and with a proper discussion
>> > >>>>>> about this as Alex said.
>> > >>>>>
>> > >>>>> The ProcessFunction API, which is intended to replace the DataStream
>> > >>>>> API, is still a proposal, not a decision. Sorry for the confusion, I
>> > >>>>> should have been more careful with my words, not giving the impression
>> > >>>>> that this is something we'll do anyway.
>> > >>>>>
>> > >>>>> There will be a FLIP describing the motivations and designs in detail,
>> > >>>>> for the community to discuss and vote on. We are still working on it.
>> > >>>>> TBH, this is not trivial and we would need more time on it.
>> > >>>>>
>> > >>>>> Just to quickly share some background:
>> > >>>>>
>> > >>>>>    - We see quite some problems with the current DataStream APIs
>> > >>>>>       - Users are working with concrete classes rather than interfaces,
>> > >>>>>       which means
>> > >>>>>          - Users can access methods that are designed to be used by
>> > >>>>>          internal classes, even though they are annotated with
>> > >>>>>          `@Internal`. E.g., `DataStream#getTransformation`.
>> > >>>>>          - Changes to the non-API implementations (e.g.,
>> > >>>>>          `Transformation`) would affect the API classes (e.g.,
>> > >>>>>          `DataStream`), which makes it hard to provide binary
>> > >>>>>          compatibility.
>> > >>>>>       - Internal classes are used as parameter / return-value of public
>> > >>>>>       APIs. E.g., while `AbstractStreamOperator` is PublicEvolving,
>> > >>>>>       `StreamTask`, which is returned from
>> > >>>>>       `AbstractStreamOperator#getContainingTask`, is Internal.
>> > >>>>>       - In many cases, users are asked to extend the API classes, rather
>> > >>>>>       than implementing interfaces. E.g., `AbstractStreamOperator`.
>> > >>>>>          - Any changes to the base classes, even the internal part, may
>> > >>>>>          affect the behavior of the user-provided sub-classes
>> > >>>>>          - Users can override the behavior of the base classes
>> > >>>>>       - The API module `flink-streaming-java` contains non-API classes,
>> > >>>>>       and depends on internal modules such as `flink-runtime`, which
>> > >>>>>       means
>> > >>>>>          - Changes to the internal modules may affect the API modules,
>> > >>>>>          which requires users to re-build their applications upon
>> > >>>>>          upgrading
>> > >>>>>          - The artifact users need for building their applications is
>> > >>>>>          larger than necessary.
>> > >>>>>       - We probably should not expose operators (e.g.,
>> > >>>>>       `AbstractStreamOperator`) to users. Functions should be enough
>> > >>>>>       for users to define their data processing logic. Exposing
>> > >>>>>       operator-level concepts (e.g., mailbox thread model, checkpoint
>> > >>>>>       barrier alignment, etc.) is unnecessary and limits the
>> > >>>>>       improvement regarding such exposed mechanisms with compatibility
>> > >>>>>       considerations.
>> > >>>>>       - The current DataStream API seems to be a mixture of many things,
>> > >>>>>       making it hard to understand especially for newcomers. It might
>> > >>>>>       be better to re-organize it into several parts: (the taxonomy
>> > >>>>>       below is just an example, we are still working on this)
>> > >>>>>          - The most fundamental stateful stream processing: streams,
>> > >>>>>          partitions / key, process functions, state, timeline-service
>> > >>>>>          - An extension for common batch-streaming unified functions:
>> > >>>>>          map, flatmap, filter, agg, reduce, join, etc.
>> > >>>>>          - An extension for windowing supports: window, triggering
>> > >>>>>          - An extension for event-time supports: event time, watermark
>> > >>>>>          - The extensions are like short-cuts / sugars, without which
>> > >>>>>          users can probably still achieve the same behavior by working
>> > >>>>>          with the fundamental APIs, but would be a lot easier with the
>> > >>>>>          extensions
>> > >>>>>    - The original plan was to do in-place refactors / changes on
>> > >>>>>    DataStream API. Some related items are listed in this doc [2]
>> > >>>>>    attached to the kicking off email [3]. Not all of the above issues
>> > >>>>>    are listed, because we hadn't looked into this as deeply as now by
>> > >>>>>    that time.
>> > >>>>>    - We proposed this as a new API rather than in-place refactors in the
>> > >>>>>    2.0 work item list, because we realized the changes might be too big
>> > >>>>>    for an in-place change. First having a new API then gradually
>> > >>>>>    retiring the old one would help users to smoothly migrate between
>> > >>>>>    them.
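
The "concrete classes rather than interfaces" concern in the list above can be sketched in plain Java as follows. This is only an illustration under stated assumptions: `Stream`, `ListStream`, `describeInternals`, and `of` are hypothetical names invented for this example and are not Flink classes or a proposed design. Users program against the interface, so helpers on the implementation class are not reachable through the API type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ApiSurfaceSketch {
    // Hypothetical public API: exposes only user-facing operations.
    interface Stream<T> {
        <R> Stream<R> map(Function<T, R> fn);
        List<T> collect();
    }

    // Hypothetical internal implementation; user code never names this type.
    private static final class ListStream<T> implements Stream<T> {
        private final List<T> elements;

        ListStream(List<T> elements) {
            this.elements = new ArrayList<>(elements);
        }

        // Internal-only helper: not part of Stream, so it can change freely
        // without affecting code written against the interface.
        String describeInternals() {
            return "ListStream[" + elements.size() + "]";
        }

        @Override
        public <R> Stream<R> map(Function<T, R> fn) {
            List<R> out = new ArrayList<>();
            for (T element : elements) {
                out.add(fn.apply(element));
            }
            return new ListStream<>(out);
        }

        @Override
        public List<T> collect() {
            return new ArrayList<>(elements);
        }
    }

    // Factory method: the only way user code obtains a Stream.
    static <T> Stream<T> of(List<T> elements) {
        return new ListStream<>(elements);
    }

    public static void main(String[] args) {
        Stream<Integer> doubled = of(List.of(1, 2, 3)).map(x -> x * 2);
        System.out.println(doubled.collect()); // prints [2, 4, 6]
    }
}
```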
>> > >>>>>
>> > >>>>> A thorough discussion is definitely needed once the FLIP is out. And of
>> > >>>>> course it's possible that the FLIP might be rejected. Given that we are
>> > >>>>> planning for release 2.0, I just feel it would be better to bring this
>> > >>>>> up early even if the concrete plan is not yet ready.
>> > >>>>>
>> > >>>>> Best,
>> > >>>>>
>> > >>>>> Xintong
>> > >>>>>
>> > >>>>>
>> > >>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>> > >>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
>> > >>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
>> > >>>>>
>> > >>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org>
>> > wrote:
>> > >>>>>
>> > >>>>>> Hey!
>> > >>>>>>
>> > >>>>>> I share the same concerns mentioned above regarding the
>> > >>>>>> "ProcessFunction API".
>> > >>>>>>
>> > >>>>>> I don't think we should create a replacement for the DataStream API
>> > >>>>>> unless we have a very good reason to do so and with a proper discussion
>> > >>>>>> about this as Alex said.
>> > >>>>>>
>> > >>>>>> Cheers,
>> > >>>>>> Gyula
>> > >>>>>>
>> > >>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
>> > >>>>>> alexander.fedu...@gmail.com> wrote:
>> > >>>>>>
>> > >>>>>>> Hi Xintong,
>> > >>>>>>>
>> > >>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>> > >>>>>>> Introduce an API deprecation process" thread [1]?
>> > >>>>>>>
>> > >>>>>>> I am also curious to know if the rationale behind this new API has
>> > >>>>>>> been previously discussed on the mailing list. Do we have a list of
>> > >>>>>>> shortcomings in the current DataStream API that it tries to resolve?
>> > >>>>>>> How does the current ProcessFunction functionality fit into the
>> > >>>>>>> picture? Will it be kept as is or subsumed by new API?
>> > >>>>>>>
>> > >>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>> > >>>>>>>
>> > >>>>>>> Best,
>> > >>>>>>> Alex
>> > >>>>>>>
>> > >>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
>> > >>>>>>> wrote:
>> > >>>>>>>
>> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
>> > >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
>> > >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
>> > >>>>>>>>> DataStream. How much will it share the internals with DataStream
>> > >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
>> > >>>>>>>>> what Table API uses underneath).
>> > >>>>>>>>
>> > >>>>>>>> I totally understand your confusion. We started planning this after
>> > >>>>>>>> kicking off the release 2.0, so there's still a lot to be explored
>> > >>>>>>>> and the plan keeps changing.
>> > >>>>>>>>
>> > >>>>>>>>    - In the beginning, we planned to do an in-place refactor of the
>> > >>>>>>>>    DataStream API, until the API migration period was proposed.
>> > >>>>>>>>    - Then we wanted to make it an entirely separate API to
>> > >>>>>>>>    DataStream, and listed it as a must-have for release 2.0 so that
>> > >>>>>>>>    we can remove DataStream once it's ready.
>> > >>>>>>>>    - However, depending on the outcome of the API compatibility
>> > >>>>>>>>    discussion [1], we may not be able to remove DataStream in 2.0
>> > >>>>>>>>    anyway, which means we might need to re-evaluate the necessity of
>> > >>>>>>>>    this item for 2.0.
>> > >>>>>>>>
>> > >>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1]
>> > >>>>>>>> and decide the priority for this item afterwards.
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> Best,
>> > >>>>>>>>
>> > >>>>>>>> Xintong
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <
>> > >>>> ches...@apache.org
>> > >>>>>>>> wrote:
>> > >>>>>>>>
>> > >>>>>>>>> by-and-large I'm quite happy with the list of items.
>> > >>>>>>>>>
>> > >>>>>>>>> I'm curious as to why the "Disaggregated State Management" item is
>> > >>>>>>>>> marked as a must-have; will it require changes that break something?
>> > >>>>>>>>> What prevents it from being added in 2.1?
>> > >>>>>>>>>
>> > >>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the default,
>> > >>>>>>>>> drop Java 8/11". Maybe even split it into a must-have "Drop Java 8"
>> > >>>>>>>>> and a nice-to-have "Drop Java 11"?
>> > >>>>>>>>>
>> > >>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this would
>> > >>>>>>>>> be an entirely internal change, and could thus be an incremental
>> > >>>>>>>>> process independent of major releases.
>> > >>>>>>>>> What is the actual scale of this item; how much are we actually
>> > >>>>>>>>> re-writing?
>> > >>>>>>>>>
>> > >>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a must-have; I
>> > >>>>>>>>> think I marked it down as nice-to-have only because it depends on
>> > >>>>>>>>> another item.
>> > >>>>>>>>>
>> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
>> > >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
>> > >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
>> > >>>>>>>>> DataStream. How much will it share the internals with DataStream
>> > >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
>> > >>>>>>>>> what Table API uses underneath).
>> > >>>>>>>>>
>> > >>>>>>>>> There are a few items I added as ideas which don't have a priority
>> > >>>>>>>>> yet; would love to get some feedback on those.
>> > >>>>>>>>>
>> > >>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
>> > >>>>>>>>>
>> > >>>>>>>>> Hi devs,
>> > >>>>>>>>>
>> > >>>>>>>>> As previously discussed in [1], we had been collecting work item
>> > >>>>>>>>> proposals for the 2.0 release until June 15th, on the wiki page [2].
>> > >>>>>>>>>
>> > >>>>>>>>>    - As we have passed the due date, I'd like to kindly remind
>> > >>>>>>>>>    everyone *not to add / remove items directly on the wiki page*.
>> > >>>>>>>>>    If needed, please post in this thread or reach out to the release
>> > >>>>>>>>>    managers instead.
>> > >>>>>>>>>    - I've reached out to some folks for clarifications about their
>> > >>>>>>>>>    proposals. Some of them mentioned that they cannot yet tell
>> > >>>>>>>>>    whether we should do an item or not, and would need more time /
>> > >>>>>>>>>    discussions to make the decision. So I added a new symbol for
>> > >>>>>>>>>    items whose priorities are `TBD`.
>> > >>>>>>>>>
>> > >>>>>>>>> Now it's time to collaboratively decide a minimum set of must-have
>> > >>>>>>>>> items. I've gone through the entire list of proposed items, and found
>> > >>>>>>>>> that most of them make a lot of sense. So I think an online sync
>> > >>>>>>>>> might not be necessary for this. I'd like to go with this DISCUSS
>> > >>>>>>>>> thread, where everyone can comment on how they think the list can be
>> > >>>>>>>>> improved, followed by a VOTE to formally make the decision.
>> > >>>>>>>>>
>> > >>>>>>>>> Any feedback and opinions, including but not limited to the following
>> > >>>>>>>>> aspects, will be appreciated.
>> > >>>>>>>>>
>> > >>>>>>>>>    - Important items that are missing from the list
>> > >>>>>>>>>    - Concerns regarding the listed items or their priorities
>> > >>>>>>>>>
>> > >>>>>>>>> Looking forward to your feedback.
>> > >>>>>>>>>
>> > >>>>>>>>> Best,
>> > >>>>>>>>>
>> > >>>>>>>>> Xintong
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
>> > >>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
>> > >>>>>>>>>
>> > >>>>>>>>>
>> > >>>>
>> > >>>> --
>> > >>>> Best regards,
>> > >>>> Sergey
>> > >>>>
>> > >>
>> > >>
>> >
>> >
>>
>> --
>> Best
>>
>> ConradJam
>>
>
