Thanks Xintong for driving the effort.

I’d add a +1 to reworking configs, as suggested by @Jark and @Chesnay, 
especially the types. We have various configs that encode a Time or 
MemorySize value but are typed as Long instead!
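
To make the type point concrete, here is a minimal sketch in plain Java (no 
Flink dependency). The class `TypedOptionSketch`, its fields, and the 
`ByteSize` helper are hypothetical names made up for illustration, not 
existing Flink options or classes; `ByteSize` merely stands in for a 
MemorySize-like type.

```java
import java.time.Duration;

/** Illustration only: all names here are hypothetical, not Flink APIs. */
public class TypedOptionSketch {

    // Untyped style: the unit (ms? s?) lives only in the docs or the key name.
    static final long TIMEOUT_AS_LONG = 50_000L;

    // Typed style: the unit travels with the value.
    static final Duration TIMEOUT = Duration.ofMillis(TIMEOUT_AS_LONG);

    /** Hypothetical stand-in for a MemorySize-like type. */
    static final class ByteSize {
        private final long bytes;

        private ByteSize(long bytes) {
            this.bytes = bytes;
        }

        static ByteSize ofMebiBytes(long mebiBytes) {
            return new ByteSize(mebiBytes * 1024L * 1024L);
        }

        long getBytes() {
            return bytes;
        }
    }

    // Untyped style again: nothing says whether this is bytes, KiB, or MiB.
    static final long BUFFER_AS_LONG = 134_217_728L;

    // Typed style: the factory method names the unit explicitly.
    static final ByteSize BUFFER = ByteSize.ofMebiBytes(128);

    public static void main(String[] args) {
        System.out.println(TIMEOUT.getSeconds() + " s");
        System.out.println(BUFFER.getBytes() + " bytes");
    }
}
```

With the typed variants, a reader or caller no longer has to guess the unit, 
which is the main thing a bare Long loses.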

Regards,
Hong



> On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> 
> Thanks for driving this effort, Xintong!
> 
> To Chesnay
>> I'm curious as to why the "Disaggregated State Management" item is
>> marked as a must-have; will it require changes that break something?
>> What prevents it from being added in 2.1?
> 
> As to "Disaggregated State Management".
> 
> We plan to provide a new type of state backend to support DFS as primary
> storage.
> To achieve this, we need at least two kinds of changes (not entirely sure
> yet, since we are still in the design and prototyping phase):
> 
> 1. Statebackend Change
> 2. State Access Change
> 
> Not all of the related interfaces are `@Internal`. Some of them, like
> `StateBackend`, are `@PublicEvolving`.
> So, you are right in the sense that "Disaggregated State Management" itself
> probably does not need to be a "Must Have"
> 
> But I was hoping that changes related to public APIs can be finalized and
> merged in Flink 2.0 (I will fix the wiki accordingly).
> 
> I also agree with Jark that 2.0 is a good chance to rework the default
> values of configurations.
> 
> Best
> Yuan
> 
> 
> On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org> wrote:
> 
>> Something else configuration-related is that there are a bunch of
>> options where the type isn't quite correct (e.g., a String where it
>> could be an enum, a string where it should be an int or something).
>> Could do a pass over those as well.
>> 
>> On 29/06/2023 13:50, Jark Wu wrote:
>>> Hi,
>>> 
>>> I think one more thing we need to consider to do in 2.0 is changing the
>>> default value of configuration to improve out-of-box user experience.
>>> 
>>> Currently, in order to run a Flink job, users may need to set
>>> a bunch of configurations, such as minibatch, checkpoint interval,
>>> exactly-once,
>>> incremental-checkpoint, etc. It's very verbose and hard to use for
>>> beginners.
>>> Most of them can have a universally applicable value. Because changing
>>> a default value is a breaking change, I think it's worth considering
>>> changing them in 2.0.
>>> 
>>> What do you think?
>>> 
>>> Best,
>>> Jark
>>> 
>>> 
>>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
>> wrote:
>>> 
>>>> Hi Chesnay
>>>> 
>>>>> "Move Calcite rules from Scala to Java": I would hope that this would
>> be
>>>>> an entirely internal change, and could thus be an incremental process
>>>>> independent of major releases.
>>>>> What is the actual scale of this item; how much are we actually
>>>> re-writing?
>>>> 
>>>> Thanks for asking.
>>>> Yes, you're right, that should be an internal change.
>>>> I was also thinking about an incremental change (rule by rule, or a
>>>> reasonably small group of rules at a time).
>>>> And yes, this could be an activity independent of major releases.
>>>> 
>>>> The problem is actually for children of RelOptRule.
>>>> Currently I see 60+ such rules (in Scala) using the mentioned deprecated
>>>> api.
>>>> There are also children of ConverterRule (50+) which do not have such
>>>> issues.
>>>> Maybe it could be considered as the next step to have all the rules in
>>>> Java.
>>>> 
>>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Alex & Gyula,
>>>>> 
>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>>>>>> Introduce an API deprecation process" thread [1]?
>>>>>> 
>>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
>>>>> URL in my previous email. Sorry for the mistake.
>>>>> 
>>>>>> I am also curious to know if the rationale behind this new API has been
>>>>>> previously discussed on the mailing list. Do we have a list of
>>>>>> shortcomings in the current DataStream API that it tries to resolve?
>>>>>> How does the current ProcessFunction functionality fit into the
>>>>>> picture? Will it be kept as is or subsumed by new API?
>>>>>> 
>>>>>> I don't think we should create a replacement for the DataStream API
>>>>>> unless we have a very good reason to do so and with a proper discussion
>>>>>> about this as Alex said.
>>>>> 
>>>>> The ProcessFunction API, which is meant to replace the DataStream API,
>>>>> is still a proposal, not a decision. Sorry for the confusion; I should
>>>>> have been more careful with my words and not given the impression that
>>>>> this is something we'll do anyway.
>>>>> 
>>>>> There will be a FLIP describing the motivations and designs in detail,
>>>>> for the community to discuss and vote on. We are still working on it.
>>>>> TBH, this is not trivial and we would need more time on it.
>>>>> 
>>>>> Just to quickly share some background:
>>>>> 
>>>>>    - We see quite a few problems with the current DataStream APIs
>>>>>       - Users are working with concrete classes rather than interfaces,
>>>>>       which means
>>>>>          - Users can access methods that are designed to be used by
>>>>>          internal classes, even though they are annotated with
>>>>>          `@Internal`. E.g., `DataStream#getTransformation`.
>>>>>          - Changes to the non-API implementations (e.g.,
>>>>>          `Transformation`) would affect the API classes (e.g.,
>>>>>          `DataStream`), which makes it hard to provide binary
>>>>>          compatibility.
>>>>>       - Internal classes are used as parameters / return values of
>>>>>       public APIs. E.g., while `AbstractStreamOperator` is
>>>>>       PublicEvolving, `StreamTask`, which is returned from
>>>>>       `AbstractStreamOperator#getContainingTask`, is Internal.
>>>>>       - In many cases, users are asked to extend the API classes, rather
>>>>>       than implementing interfaces. E.g., `AbstractStreamOperator`.
>>>>>          - Any changes to the base classes, even the internal part, may
>>>>>          affect the behavior of the user-provided sub-classes
>>>>>          - Users can override the behavior of the base classes
>>>>>       - The API module `flink-streaming-java` contains non-API classes,
>>>>>       and depends on internal modules such as `flink-runtime`, which
>>>>>       means
>>>>>          - Changes to the internal modules may affect the API modules,
>>>>>          which requires users to re-build their applications upon
>>>>>          upgrading
>>>>>          - The artifact users need for building their applications is
>>>>>          larger than necessary.
>>>>>       - We probably should not expose operators (e.g.,
>>>>>       `AbstractStreamOperator`) to users. Functions should be enough for
>>>>>       users to define their data processing logic. Exposing
>>>>>       operator-level concepts (e.g., mailbox thread model, checkpoint
>>>>>       barrier alignment, etc.) is unnecessary, and compatibility
>>>>>       considerations then limit how much such exposed mechanisms can be
>>>>>       improved.
>>>>>       - The current DataStream API seems to be a mixture of many things,
>>>>>       making it hard to understand, especially for newcomers. It might
>>>>>       be better to re-organize it into several parts (the taxonomy below
>>>>>       is just an example; we are still working on this):
>>>>>          - The most fundamental stateful stream processing: streams,
>>>>>          partitions / key, process functions, state, timeline-service
>>>>>          - An extension for common batch-streaming unified functions:
>>>>>          map, flatmap, filter, agg, reduce, join, etc.
>>>>>          - An extension for windowing support: window, triggering
>>>>>          - An extension for event-time support: event time, watermark
>>>>>          - The extensions are like short-cuts / sugar, without which
>>>>>          users can probably still achieve the same behavior by working
>>>>>          with the fundamental APIs, but it would be a lot easier with
>>>>>          the extensions
>>>>>    - The original plan was to do in-place refactors / changes on the
>>>>>    DataStream API. Some related items are listed in this doc [2]
>>>>>    attached to the kick-off email [3]. Not all of the above issues are
>>>>>    listed, because at that time we had not looked into this as deeply
>>>>>    as we have now.
>>>>>    - We proposed this as a new API rather than in-place refactors in the
>>>>>    2.0 work item list, because we realized the changes might be too big
>>>>>    for an in-place change. First having a new API, then gradually
>>>>>    retiring the old one, would help users migrate smoothly between
>>>>>    them.
>>>>> 
>>>>> A thorough discussion is definitely needed once the FLIP is out. And of
>>>>> course it's possible that the FLIP might be rejected. Given that we are
>>>>> planning for release 2.0, I just feel it would be better to bring this
>>>>> up early, even though the concrete plan is not yet ready.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> Xintong
>>>>> 
>>>>> 
>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
>>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
>>>>> 
>>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:
>>>>> 
>>>>>> Hey!
>>>>>> 
>>>>>> I share the same concerns mentioned above regarding the
>>>> "ProcessFunction
>>>>>> API".
>>>>>> 
>>>>>> I don't think we should create a replacement for the DataStream API
>>>>> unless
>>>>>> we have a very good reason to do so and with a proper discussion about
>>>>> this
>>>>>> as Alex said.
>>>>>> 
>>>>>> Cheers,
>>>>>> Gyula
>>>>>> 
>>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
>>>>>> alexander.fedu...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Xintong,
>>>>>>> 
>>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
>>>>>> Introduce
>>>>>>> an API deprecation process" thread [1]?
>>>>>>> 
>>>>>>> I am also curious to know if the rationale behind this new API has
>>>> been
>>>>>>> previously discussed on the mailing list. Do we have a list of
>>>>>> shortcomings
>>>>>>> in the current DataStream API that it tries to resolve? How does the
>>>>>>> current ProcessFunction functionality fit into the picture? Will it
>>>> be
>>>>>> kept
>>>>>>> as is or subsumed by new API?
>>>>>>> 
>>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
>>>>>>> 
>>>>>>> Best,
>>>>>>> Alex
>>>>>>> 
>>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
>>>>>> wrote:
>>>>>>>>> The ProcessFunction API item is giving me the most headaches
>>>>> because
>>>>>>> it's
>>>>>>>>> very unclear what it actually entails; like is it an entirely
>>>>>> separate
>>>>>>>> API
>>>>>>>>> to DataStream (sounds like it is!) or an extension of DataStream.
>>>>> How
>>>>>>>> much
>>>>>>>>> will it share the internals with DataStream etc.; how does it
>>>>> relate
>>>>>> to
>>>>>>>> the
>>>>>>>>> Table API (w.r.t. switching APIs / what Table API uses
>>>> underneath).
>>>>>>>> I totally understand your confusion. We started planning this after
>>>>>>> kicking
>>>>>>>> off the release 2.0, so there's still a lot to be explored and the
>>>>> plan
>>>>>>>> keeps changing.
>>>>>>>> 
>>>>>>>> 
>>>>>>>>    - In the beginning, we planned to do an in-place refactor of the
>>>>>>>>    DataStream API, until the API migration period was proposed.
>>>>>>>>    - Then we wanted to make it an entirely separate API from
>>>>>>>>    DataStream, listed as a must-have for release 2.0 so that we can
>>>>>>>>    remove DataStream once it's ready.
>>>>>>>>    - However, depending on the outcome of the API compatibility
>>>>>>>>    discussion [1], we may not be able to remove DataStream in 2.0
>>>>>>>>    anyway, which means we might need to re-evaluate the necessity of
>>>>>>>>    this item for 2.0.
>>>>>>>> 
>>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1]
>>>> and
>>>>>>>> decide the priority for this item afterwards.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> 
>>>>>>>> Xintong
>>>>>>>> 
>>>>>>>> 
>>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <
>>>> ches...@apache.org
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> by-and-large I'm quite happy with the list of items.
>>>>>>>>> 
>>>>>>>>> I'm curious as to why the "Disaggregated State Management" item
>>>> is
>>>>>>> marked
>>>>>>>>> as a must-have; will it require changes that break something?
>>>> What
>>>>>>>> prevents
>>>>>>>>> it from being added in 2.1?
>>>>>>>>> 
>>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the
>>>>> default,
>>>>>>> drop
>>>>>>>>> Java 8/11". Maybe even split it into a must-have "Drop Java 8"
>>>> and
>>>>> a
>>>>>>>>> nice-to-have "Drop Java 11"?
>>>>>>>>> 
>>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this
>>>>> would
>>>>>>> be
>>>>>>>>> an entirely internal change, and could thus be an incremental
>>>>> process
>>>>>>>>> independent of major releases.
>>>>>>>>> What is the actual scale of this item; how much are we actually
>>>>>>>> re-writing?
>>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a
>>>> must-have; i
>>>>>>> think
>>>>>>>>> I marked it down as nice-to-have only because it depends on
>>>> another
>>>>>>> item.
>>>>>>>>> The ProcessFunction API item is giving me the most headaches
>>>>> because
>>>>>>> it's
>>>>>>>>> very unclear what it actually entails; like is it an entirely
>>>>>> separate
>>>>>>>> API
>>>>>>>>> to DataStream (sounds like it is!) or an extension of DataStream.
>>>>> How
>>>>>>>> much
>>>>>>>>> will it share the internals with DataStream etc.; how does it
>>>>> relate
>>>>>> to
>>>>>>>> the
>>>>>>>>> Table API (w.r.t. switching APIs / what Table API uses
>>>> underneath).
>>>>>>>>> There are a few items I added as ideas which don't have a
>>>> priority
>>>>>> yet;
>>>>>>>>> would love to get some feedback on those.
>>>>>>>>> 
>>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
>>>>>>>>> 
>>>>>>>>> Hi devs,
>>>>>>>>> 
>>>>>>>>> As previously discussed in [1], we had been collecting work item
>>>>>>>> proposals
>>>>>>>>> for the 2.0 release until June 15th, on the wiki page [2].
>>>>>>>>> 
>>>>>>>>>    - As we have passed the due date, I'd like to kindly remind
>>>>>> everyone
>>>>>>>> *not
>>>>>>>>>    to add / remove items directly on the wiki page*. If needed,
>>>>>> please
>>>>>>>> post
>>>>>>>>>    in this thread or reach out to the release managers instead.
>>>>>>>>>    - I've reached out to some folks for clarifications about
>>>> their
>>>>>>>>>    proposals. Some of them mentioned that they can not yet tell
>>>>>> whether
>>>>>>>> we
>>>>>>>>>    should do an item or not, and would need more time /
>>>> discussions
>>>>>> to
>>>>>>>> make
>>>>>>>>>    the decision. So I added a new symbol for items whose
>>>> priorities
>>>>>> are
>>>>>>>> `TBD`.
>>>>>>>>> Now it's time to collaboratively decide a minimum set of
>>>> must-have
>>>>>>> items.
>>>>>>>>> I've gone through the entire list of proposed items, and found that
>>>>>>>>> most of them make a lot of sense. So I think an online sync might not
>>>>>>>>> be necessary for this. I'd like to go with this DISCUSS thread, where
>>>>>>>>> everyone can comment on how they think the list can be improved,
>>>>>>>>> followed by a VOTE to formally make the decision.
>>>>>>>>> 
>>>>>>>>> Any feedback and opinions, including but not limited to the
>>>>> following
>>>>>>>>> aspects, will be appreciated.
>>>>>>>>> 
>>>>>>>>>    - Important items that are missing from the list
>>>>>>>>>    - Concerns regarding the listed items or their priorities
>>>>>>>>> 
>>>>>>>>> Looking forward to your feedback.
>>>>>>>>> 
>>>>>>>>> Best,
>>>>>>>>> 
>>>>>>>>> Xintong
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
>>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
>>>>>>>>> 
>>>>>>>>> 
>>>> 
>>>> --
>>>> Best regards,
>>>> Sergey
>>>> 
>> 
>> 
