Hi Community,
  I see some tasks in the 2.0 list that haven't been assigned yet, and I'd
like to take on some that I can complete. How do I apply to the community
for these tasks? I am interested in the following parts of FLINK-32377 [1].
Do I need to create the issues myself and assign them to myself?

- the current timestamp, which is problematic w.r.t. caching and testing,
while providing no value.
- Remove JarRequestBody#programArgs in favor of #programArgsList.

[1] https://issues.apache.org/jira/browse/FLINK-32377

On Fri, Jun 30, 2023 at 00:53, Teoh, Hong <lian...@amazon.co.uk.invalid> wrote:

> Thanks Xintong for driving the effort.
>
> I’d add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
> especially the types. We have various configs that encode Time / MemorySize
> that are Long instead!
>
> Regards,
> Hong
>
>
>
> > On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> >
> > Thanks for driving this effort, Xintong!
> >
> > To Chesnay
> >> I'm curious as to why the "Disaggregated State Management" item is
> >> marked as a must-have; will it require changes that break something?
> >> What prevents it from being added in 2.1?
> >
> > As to "Disaggregated State Management".
> >
> > We plan to provide a new type of state backend to support DFS as primary
> > storage.
> > To achieve this, we need at least two kinds of changes (not entirely sure
> > yet, since we are still in the design and prototyping phase):
> >
> > 1. Statebackend Change
> > 2. State Access Change
> >
> > Not all of the related interfaces are `@Internal`. Some of them, like
> > `StateBackend`, are `@PublicEvolving`.
> > So, you are right in the sense that "Disaggregated State Management" itself
> > probably does not need to be a "Must Have".
> >
> > But I was hoping that changes related to public APIs can be finalized and
> > merged in Flink 2.0 (I will fix the wiki accordingly).
> >
> > I also agree with Jark that 2.0 is a good chance to rework the default
> > value of configurations.
> >
> > Best
> > Yuan
> >
> >
> > On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org>
> > wrote:
> >
> >> Something else configuration-related is that there are a bunch of
> >> options where the type isn't quite correct (e.g., a String where it
> >> could be an enum, a string where it should be an int or something).
> >> Could do a pass over those as well.
> >>
> >> On 29/06/2023 13:50, Jark Wu wrote:
> >>> Hi,
> >>>
> >>> I think one more thing we need to consider doing in 2.0 is changing the
> >>> default values of configuration options to improve the out-of-box user
> >>> experience.
> >>>
> >>> Currently, in order to run a Flink job, users may need to set a bunch of
> >>> configurations, such as minibatch, checkpoint interval, exactly-once,
> >>> incremental-checkpoint, etc. It's very verbose and hard to use for
> >>> beginners.
> >>> Most of them can have a universally applicable value. Because changing the
> >>> default value is a breaking change, I think it's worth considering
> >>> changing them in 2.0.
> >>>
> >>> What do you think?
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>
> >>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi Chesnay
> >>>>
> >>>>> "Move Calcite rules from Scala to Java": I would hope that this would be
> >>>>> an entirely internal change, and could thus be an incremental process
> >>>>> independent of major releases.
> >>>>> What is the actual scale of this item; how much are we actually
> >>>>> re-writing?
> >>>>
> >>>> Thanks for asking.
> >>>> Yes, you're right, that should be an internal change.
> >>>> I was also thinking about an incremental change (rule by rule, or by
> >>>> reasonably small groups of rules).
> >>>> And yes, this could be an activity independent of major releases.
> >>>>
> >>>> The problem is actually with the children of RelOptRule.
> >>>> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
> >>>> API.
> >>>> There are also children of ConverterRule (50+) which do not have such
> >>>> issues.
> >>>> Maybe it could be considered as the next step to have all the rules in
> >>>> Java.
> >>>>
> >>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Alex & Gyula,
> >>>>>
> >>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> >>>>>> Introduce an API deprecation process" thread [1]?
> >>>>>
> >>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
> >>>>> URL in my previous email. Sorry for the mistake.
> >>>>>
> >>>>>> I am also curious to know if the rationale behind this new API has been
> >>>>>> previously discussed on the mailing list. Do we have a list of
> >>>>>> shortcomings in the current DataStream API that it tries to resolve? How
> >>>>>> does the current ProcessFunction functionality fit into the picture?
> >>>>>> Will it be kept as is or subsumed by new API?
> >>>>>
> >>>>>> I don't think we should create a replacement for the DataStream API
> >>>>>> unless we have a very good reason to do so and with a proper discussion
> >>>>>> about this as Alex said.
> >>>>>
> >>>>> The ProcessFunction API, which aims to replace the DataStream API, is
> >>>>> still a proposal, not a decision. Sorry for the confusion. I should have
> >>>>> been more careful with my words, so as not to give the impression that
> >>>>> this is something we'll do anyway.
> >>>>>
> >>>>> There will be a FLIP describing the motivations and designs in detail,
> >>>>> for the community to discuss and vote on. We are still working on it.
> >>>>> TBH, this is not trivial and we would need more time on it.
> >>>>>
> >>>>> Just to quickly share some background:
> >>>>>
> >>>>>    - We see quite a few problems with the current DataStream APIs
> >>>>>       - Users are working with concrete classes rather than interfaces,
> >>>>>       which means
> >>>>>          - Users can access methods that are designed to be used by
> >>>>>          internal classes, even though they are annotated with
> >>>>>          `@Internal`. E.g., `DataStream#getTransformation`.
> >>>>>          - Changes to the non-API implementations (e.g.,
> >>>>>          `Transformation`) would affect the API classes (e.g.,
> >>>>>          `DataStream`), which makes it hard to provide binary
> >>>>>          compatibility.
> >>>>>       - Internal classes are used as parameters / return values of
> >>>>>       public APIs. E.g., while `AbstractStreamOperator` is
> >>>>>       PublicEvolving, `StreamTask`, which is returned from
> >>>>>       `AbstractStreamOperator#getContainingTask`, is Internal.
> >>>>>       - In many cases, users are asked to extend the API classes, rather
> >>>>>       than implementing interfaces. E.g., `AbstractStreamOperator`.
> >>>>>          - Any changes to the base classes, even the internal part, may
> >>>>>          affect the behavior of the user-provided sub-classes
> >>>>>          - Users can override the behavior of the base classes
> >>>>>       - The API module `flink-streaming-java` contains non-API classes,
> >>>>>       and depends on internal modules such as `flink-runtime`, which
> >>>>>       means
> >>>>>          - Changes to the internal modules may affect the API modules,
> >>>>>          which requires users to re-build their applications upon
> >>>>>          upgrading
> >>>>>          - The artifact users need for building their applications is
> >>>>>          larger than necessary.
> >>>>>       - We probably should not expose operators (e.g.,
> >>>>>       `AbstractStreamOperator`) to users. Functions should be enough for
> >>>>>       users to define their data processing logic. Exposing
> >>>>>       operator-level concepts (e.g., mailbox thread model, checkpoint
> >>>>>       barrier alignment, etc.) is unnecessary and, because of
> >>>>>       compatibility considerations, limits how much such exposed
> >>>>>       mechanisms can be improved.
> >>>>>       - The current DataStream API seems to be a mixture of many things,
> >>>>>       making it hard to understand, especially for newcomers. It might
> >>>>>       be better to re-organize it into several parts (the taxonomy below
> >>>>>       is just an example; we are still working on this):
> >>>>>          - The most fundamental stateful stream processing: streams,
> >>>>>          partitions / key, process functions, state, timeline-service
> >>>>>          - An extension for common batch-streaming unified functions:
> >>>>>          map, flatmap, filter, agg, reduce, join, etc.
> >>>>>          - An extension for windowing support: window, triggering
> >>>>>          - An extension for event-time support: event time, watermark
> >>>>>          - The extensions are like short-cuts / sugar, without which
> >>>>>          users can probably still achieve the same behavior by working
> >>>>>          with the fundamental APIs, but it would be a lot easier with
> >>>>>          the extensions
> >>>>>    - The original plan was to do in-place refactors / changes on the
> >>>>>    DataStream API. Some related items are listed in this doc [2]
> >>>>>    attached to the kick-off email [3]. Not all of the above issues are
> >>>>>    listed, because we hadn't looked into this as deeply at that time as
> >>>>>    we have now.
> >>>>>    - We proposed this as a new API rather than in-place refactors in the
> >>>>>    2.0 work item list, because we realized the changes might be too big
> >>>>>    for an in-place change. First having a new API and then gradually
> >>>>>    retiring the old one would help users migrate smoothly between them.
> >>>>>
> >>>>> A thorough discussion is definitely needed once the FLIP is out. And of
> >>>>> course it's possible that the FLIP might be rejected. Given that we are
> >>>>> planning for release 2.0, I just feel it would be better to bring this
> >>>>> up early, even though the concrete plan is not yet ready.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Xintong
> >>>>>
> >>>>>
> >>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> >>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> >>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
> >>>>>
> >>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
> >>>>>
> >>>>>> Hey!
> >>>>>>
> >>>>>> I share the same concerns mentioned above regarding the
> >>>>>> "ProcessFunction API".
> >>>>>>
> >>>>>> I don't think we should create a replacement for the DataStream API
> >>>>>> unless we have a very good reason to do so and with a proper discussion
> >>>>>> about this, as Alex said.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Gyula
> >>>>>>
> >>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov
> >>>>>> <alexander.fedu...@gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Xintong,
> >>>>>>>
> >>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> >>>>>>> Introduce an API deprecation process" thread [1]?
> >>>>>>>
> >>>>>>> I am also curious to know if the rationale behind this new API has
> >>>>>>> been previously discussed on the mailing list. Do we have a list of
> >>>>>>> shortcomings in the current DataStream API that it tries to resolve?
> >>>>>>> How does the current ProcessFunction functionality fit into the
> >>>>>>> picture? Will it be kept as is or subsumed by new API?
> >>>>>>>
> >>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> Alex
> >>>>>>>
> >>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
> >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
> >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
> >>>>>>>>> DataStream. How much will it share the internals with DataStream
> >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
> >>>>>>>>> what Table API uses underneath).
> >>>>>>>>
> >>>>>>>> I totally understand your confusion. We started planning this after
> >>>>>>>> kicking off the release 2.0, so there's still a lot to be explored
> >>>>>>>> and the plan keeps changing.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    - In the beginning, we planned to do an in-place refactor of the
> >>>>>>>>    DataStream API, until the API migration period was proposed.
> >>>>>>>>    - Then we wanted to make it an entirely separate API from
> >>>>>>>>    DataStream, and listed it as a must-have for release 2.0 so that
> >>>>>>>>    we could remove DataStream once it's ready.
> >>>>>>>>    - However, depending on the outcome of the API compatibility
> >>>>>>>>    discussion [1], we may not be able to remove DataStream in 2.0
> >>>>>>>>    anyway, which means we might need to re-evaluate the necessity of
> >>>>>>>>    this item for 2.0.
> >>>>>>>>
> >>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1] and
> >>>>>>>> decide the priority for this item afterwards.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Xintong
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <ches...@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> By and large, I'm quite happy with the list of items.
> >>>>>>>>>
> >>>>>>>>> I'm curious as to why the "Disaggregated State Management" item is
> >>>>>>>>> marked as a must-have; will it require changes that break something?
> >>>>>>>>> What prevents it from being added in 2.1?
> >>>>>>>>>
> >>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the default,
> >>>>>>>>> drop Java 8/11". Maybe even split it into a must-have "Drop Java 8"
> >>>>>>>>> and a nice-to-have "Drop Java 11"?
> >>>>>>>>>
> >>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this
> >>>>>>>>> would be an entirely internal change, and could thus be an
> >>>>>>>>> incremental process independent of major releases.
> >>>>>>>>> What is the actual scale of this item; how much are we actually
> >>>>>>>>> re-writing?
> >>>>>>>>>
> >>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a must-have; I
> >>>>>>>>> think I marked it down as nice-to-have only because it depends on
> >>>>>>>>> another item.
> >>>>>>>>>
> >>>>>>>>> The ProcessFunction API item is giving me the most headaches because
> >>>>>>>>> it's very unclear what it actually entails; like is it an entirely
> >>>>>>>>> separate API to DataStream (sounds like it is!) or an extension of
> >>>>>>>>> DataStream. How much will it share the internals with DataStream
> >>>>>>>>> etc.; how does it relate to the Table API (w.r.t. switching APIs /
> >>>>>>>>> what Table API uses underneath).
> >>>>>>>>>
> >>>>>>>>> There are a few items I added as ideas which don't have a priority
> >>>>>>>>> yet; would love to get some feedback on those.
> >>>>>>>>>
> >>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
> >>>>>>>>>
> >>>>>>>>> Hi devs,
> >>>>>>>>>
> >>>>>>>>> As previously discussed in [1], we had been collecting work item
> >>>>>>>>> proposals for the 2.0 release until June 15th, on the wiki page [2].
> >>>>>>>>>
> >>>>>>>>>    - As we have passed the due date, I'd like to kindly remind
> >>>>>>>>>    everyone *not to add / remove items directly on the wiki page*.
> >>>>>>>>>    If needed, please post in this thread or reach out to the release
> >>>>>>>>>    managers instead.
> >>>>>>>>>    - I've reached out to some folks for clarifications about their
> >>>>>>>>>    proposals. Some of them mentioned that they cannot yet tell
> >>>>>>>>>    whether we should do an item or not, and would need more time /
> >>>>>>>>>    discussions to make the decision. So I added a new symbol for
> >>>>>>>>>    items whose priorities are `TBD`.
> >>>>>>>>>
> >>>>>>>>> Now it's time to collaboratively decide a minimum set of must-have
> >>>>>>>>> items. I've gone through the entire list of proposed items, and
> >>>>>>>>> found that most of them make a lot of sense. So I think an online
> >>>>>>>>> sync might not be necessary for this. I'd like to go with this
> >>>>>>>>> DISCUSS thread, where everyone can comment on how they think the
> >>>>>>>>> list can be improved, followed by a VOTE to formally make the
> >>>>>>>>> decision.
> >>>>>>>>>
> >>>>>>>>> Any feedback and opinions, including but not limited to the
> >>>>>>>>> following aspects, will be appreciated.
> >>>>>>>>>
> >>>>>>>>>    - Important items that are missing from the list
> >>>>>>>>>    - Concerns regarding the listed items or their priorities
> >>>>>>>>>
> >>>>>>>>> Looking forward to your feedback.
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Xintong
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> >>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> >>>>>>>>>
> >>>>>>>>>
> >>>>
> >>>> --
> >>>> Best regards,
> >>>> Sergey
> >>>>
> >>
> >>
>
>

-- 
Best

ConradJam
