Hi,

Speaking of "Move Calcite rules from Scala to Java", I was wondering if
this thread is the right place to talk about it. AFAIK, the Flink community
has decided to abandon Scala. That is the reason, I guess, we want to move
those Calcite rules from Scala to Java. On the other hand, new Scala code
will be added while developing new features [1]. Do we have any thoughts
wrt the Scala code strategy?

Best regards,
Jing



[1] https://lists.apache.org/thread/tnygl4n3q1fx75cl2vclc78j8mrxmz6y

On Mon, Jul 3, 2023 at 10:31 AM Xintong Song <tonysong...@gmail.com> wrote:

> Thanks all for the discussion.
>
>
> IIUC, we need to make the following changes. Please correct me if I get it
> wrong.
>
>
> 1. Disaggregated State Management - Clarify that only the public API
> related part is must-have for 2.0.
>
> 2. Java version support - Split it into 3 items: a) make Java 17 the
> default (must-have), b) drop Java 8 (must-have), and c) drop Java 11
> (nice-to-have)
>
> 3. Add MetricGroup#getLogicalScope - Should be promoted to must-have
>
> 4. ProcessFunction API - Should be downgraded to nice-to-have
>
> 5. Configuration - Add an item "revisit all config option types and default
> values", which IIUC should also be a must-have
>
>
> There seem to be no changes needed for "Move Calcite rules from Scala to
> Java" as it's already nice-to-have.
>
>
> If there are no objections, I'll update the wiki page accordingly, and
> start a VOTE in the next couple of days.
>
>
> Best,
>
> Xintong
>
>
>
> On Fri, Jun 30, 2023 at 12:53 AM Teoh, Hong <lian...@amazon.co.uk.invalid>
> wrote:
>
> > Thanks Xintong for driving the effort.
> >
> > I’d add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
> > especially the types. We have various configs that encode Time /
> > MemorySize that are Long instead!
> >
> > Regards,
> > Hong
> >
> >
> >
> > > On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:
> > >
> > > Thanks for driving this effort, Xintong!
> > >
> > > To Chesnay
> > >> I'm curious as to why the "Disaggregated State Management" item is
> > >> marked as a must-have; will it require changes that break something?
> > >> What prevents it from being added in 2.1?
> > >
> > > As to "Disaggregated State Management".
> > >
> > > We plan to provide a new type of state backend to support DFS as
> > > primary storage.
> > > To achieve this, we need at least two sets of changes (not entirely
> > > sure yet, since we are still in the design and prototype phase):
> > >
> > > 1. State backend change
> > > 2. State access change
> > >
> > > Not all of the related interfaces are `@Internal`. Some of them,
> > > like `StateBackend`, are `@PublicEvolving`.
> > > So, you are right in the sense that "Disaggregated State Management"
> > > itself probably does not need to be a "Must Have".
> > >
> > > But I was hoping that the changes related to public APIs can be
> > > finalized and merged in Flink 2.0 (I will fix the wiki accordingly).
> > >
> > > I also agree with Jark that 2.0 is a good chance to rework the default
> > > value of configurations.
> > >
> > > Best
> > > Yuan
> > >
> > >
> > > On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org>
> > wrote:
> > >
> > >> Something else configuration-related is that there are a bunch of
> > >> options where the type isn't quite correct (e.g., a String where it
> > >> could be an enum, a string where it should be an int or something).
> > >> Could do a pass over those as well.
> > >>
> > >> On 29/06/2023 13:50, Jark Wu wrote:
> > >>> Hi,
> > >>>
> > >>> I think one more thing we should consider doing in 2.0 is changing
> > >>> the default values of configuration options to improve the
> > >>> out-of-the-box user experience.
> > >>>
> > >>> Currently, in order to run a Flink job, users may need to set
> > >>> a bunch of configurations, such as minibatch, checkpoint interval,
> > >>> exactly-once, incremental-checkpoint, etc. It's very verbose and
> > >>> hard to use for beginners.
> > >>> Most of them can have a universally applicable value. Because
> > >>> changing a default value is a breaking change, I think it's worth
> > >>> considering changing them in 2.0.
> > >>>
> > >>> What do you think?
> > >>>
> > >>> Best,
> > >>> Jark
> > >>>
> > >>>
> > >>> On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
> > >> wrote:
> > >>>
> > >>>> Hi Chesnay
> > >>>>
> > >>>>> "Move Calcite rules from Scala to Java": I would hope that this
> > >>>>> would be an entirely internal change, and could thus be an
> > >>>>> incremental process independent of major releases.
> > >>>>> What is the actual scale of this item; how much are we actually
> > >>>>> re-writing?
> > >>>>
> > >>>> Thanks for asking.
> > >>>> Yes, you're right, that should be an internal change.
> > >>>> Yeah, I was also thinking about an incremental change (rule by rule
> > >>>> or a reasonably small group of rules).
> > >>>> And yes, this could be an activity independent of major releases.
> > >>>>
> > >>>> The problem is actually for children of RelOptRule.
> > >>>> Currently I see 60+ such rules (in Scala) using the mentioned
> > >>>> deprecated API.
> > >>>> There are also children of ConverterRule (50+) which do not have
> > >>>> such issues.
> > >>>> Maybe it could be considered as the next step to have all the rules
> > >>>> in Java.
> > >>>>
> > >>>> On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com
> >
> > >>>> wrote:
> > >>>>
> > >>>>> Hi Alex & Gyula,
> > >>>>>
> > >>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > >>>>>> Introduce an API deprecation process" thread [1]?
> > >>>>>>
> > >>>>> Yes, I meant the FLIP-321 discussion. I just noticed I pasted the
> > >>>>> wrong URL in my previous email. Sorry for the mistake.
> > >>>>>
> > >>>>>> I am also curious to know if the rationale behind this new API has
> > >>>>>> been previously discussed on the mailing list. Do we have a list of
> > >>>>>> shortcomings in the current DataStream API that it tries to resolve?
> > >>>>>> How does the current ProcessFunction functionality fit into the
> > >>>>>> picture? Will it be kept as is or subsumed by new API?
> > >>>>>>
> > >>>>>> I don't think we should create a replacement for the DataStream API
> > >>>>>> unless we have a very good reason to do so and with a proper
> > >>>>>> discussion about this as Alex said.
> > >>>>>
> > >>>>> The ProcessFunction API, which is intended to replace the
> > >>>>> DataStream API, is still a proposal, not a decision. Sorry for the
> > >>>>> confusion. I should have been more careful with my words, so as not
> > >>>>> to give the impression that this is something we'll do anyway.
> > >>>>>
> > >>>>> There will be a FLIP describing the motivations and designs in
> > >>>>> detail, for the community to discuss and vote on. We are still
> > >>>>> working on it. TBH, this is not trivial and we would need more time
> > >>>>> on it.
> > >>>>>
> > >>>>> Just to quickly share some background:
> > >>>>>
> > >>>>>    - We see quite a few problems with the current DataStream APIs
> > >>>>>       - Users are working with concrete classes rather than
> > >>>>>       interfaces, which means
> > >>>>>          - Users can access methods that are designed to be used by
> > >>>>>          internal classes, even though they are annotated with
> > >>>>>          `@Internal`. E.g., `DataStream#getTransformation`.
> > >>>>>          - Changes to the non-API implementations (e.g.,
> > >>>>>          `Transformation`) would affect the API classes (e.g.,
> > >>>>>          `DataStream`), which makes it hard to provide binary
> > >>>>>          compatibility.
> > >>>>>       - Internal classes are used as parameters / return values of
> > >>>>>       public APIs. E.g., while `AbstractStreamOperator` is
> > >>>>>       PublicEvolving, `StreamTask`, which is returned from
> > >>>>>       `AbstractStreamOperator#getContainingTask`, is Internal.
> > >>>>>       - In many cases, users are asked to extend the API classes,
> > >>>>>       rather than implementing interfaces. E.g.,
> > >>>>>       `AbstractStreamOperator`.
> > >>>>>          - Any changes to the base classes, even the internal part,
> > >>>>>          may affect the behavior of the user-provided sub-classes
> > >>>>>          - Users can override the behavior of the base classes
> > >>>>>       - The API module `flink-streaming-java` contains non-API
> > >>>>>       classes, and depends on internal modules such as
> > >>>>>       `flink-runtime`, which means
> > >>>>>          - Changes to the internal modules may affect the API
> > >>>>>          modules, which requires users to re-build their
> > >>>>>          applications upon upgrading
> > >>>>>          - The artifact users need for building their applications
> > >>>>>          is larger than necessary.
> > >>>>>       - We probably should not expose operators (e.g.,
> > >>>>>       `AbstractStreamOperator`) to users. Functions should be enough
> > >>>>>       for users to define their data processing logic. Exposing
> > >>>>>       operator-level concepts (e.g., mailbox thread model,
> > >>>>>       checkpoint barrier alignment, etc.) is unnecessary, and
> > >>>>>       compatibility considerations then limit how much such exposed
> > >>>>>       mechanisms can be improved.
> > >>>>>       - The current DataStream API seems to be a mixture of many
> > >>>>>       things, making it hard to understand, especially for
> > >>>>>       newcomers. It might be better to re-organize it into several
> > >>>>>       parts (the taxonomy below is just an example; we are still
> > >>>>>       working on this):
> > >>>>>          - The most fundamental stateful stream processing: streams,
> > >>>>>          partitions / key, process functions, state,
> > >>>>>          timeline-service
> > >>>>>          - An extension for common batch-streaming unified
> > >>>>>          functions: map, flatmap, filter, agg, reduce, join, etc.
> > >>>>>          - An extension for windowing support: window, triggering
> > >>>>>          - An extension for event-time support: event time,
> > >>>>>          watermark
> > >>>>>          - The extensions are like short-cuts / sugar, without which
> > >>>>>          users can probably still achieve the same behavior by
> > >>>>>          working with the fundamental APIs, but it would be a lot
> > >>>>>          easier with the extensions
> > >>>>>    - The original plan was to do in-place refactors / changes on the
> > >>>>>    DataStream API. Some related items are listed in this doc [2]
> > >>>>>    attached to the kick-off email [3]. Not all of the above issues
> > >>>>>    are listed, because at that time we hadn't looked into this as
> > >>>>>    deeply as we have now.
> > >>>>>    - We proposed this as a new API rather than in-place refactors in
> > >>>>>    the 2.0 work item list, because we realized the changes might be
> > >>>>>    too big for an in-place change. First having a new API and then
> > >>>>>    gradually retiring the old one would help users to smoothly
> > >>>>>    migrate between them.
> > >>>>>
> > >>>>> A thorough discussion is definitely needed once the FLIP is out.
> > >>>>> And of course it's possible that the FLIP might be rejected. Given
> > >>>>> that we are planning for release 2.0, I just feel it would be
> > >>>>> better to bring this up early, even though the concrete plan is
> > >>>>> not yet ready.
> > >>>>>
> > >>>>> Best,
> > >>>>>
> > >>>>> Xintong
> > >>>>>
> > >>>>>
> > >>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > >>>>> [2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
> > >>>>> [3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c
> > >>>>>
> > >>>>> On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org>
> > wrote:
> > >>>>>
> > >>>>>> Hey!
> > >>>>>>
> > >>>>>> I share the same concerns mentioned above regarding the
> > >>>>>> "ProcessFunction API".
> > >>>>>>
> > >>>>>> I don't think we should create a replacement for the DataStream
> > >>>>>> API unless we have a very good reason to do so and with a proper
> > >>>>>> discussion about this as Alex said.
> > >>>>>>
> > >>>>>> Cheers,
> > >>>>>> Gyula
> > >>>>>>
> > >>>>>> On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
> > >>>>>> alexander.fedu...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> Hi Xintong,
> > >>>>>>>
> > >>>>>>> By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
> > >>>>>>> Introduce an API deprecation process" thread [1]?
> > >>>>>>>
> > >>>>>>> I am also curious to know if the rationale behind this new API
> > >>>>>>> has been previously discussed on the mailing list. Do we have a
> > >>>>>>> list of shortcomings in the current DataStream API that it tries
> > >>>>>>> to resolve? How does the current ProcessFunction functionality
> > >>>>>>> fit into the picture? Will it be kept as is or subsumed by new API?
> > >>>>>>>
> > >>>>>>> [1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
> > >>>>>>>
> > >>>>>>> Best,
> > >>>>>>> Alex
> > >>>>>>>
> > >>>>>>> On Mon, 26 Jun 2023 at 14:33, Xintong Song <
> tonysong...@gmail.com>
> > >>>>>> wrote:
> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches
> > >>>>>>>>> because it's very unclear what it actually entails; like is it
> > >>>>>>>>> an entirely separate API to DataStream (sounds like it is!) or
> > >>>>>>>>> an extension of DataStream. How much will it share the
> > >>>>>>>>> internals with DataStream etc.; how does it relate to the
> > >>>>>>>>> Table API (w.r.t. switching APIs / what Table API uses
> > >>>>>>>>> underneath).
> > >>>>>>>>
> > >>>>>>>> I totally understand your confusion. We started planning this
> > >>>>>>>> after kicking off the release 2.0, so there's still a lot to be
> > >>>>>>>> explored and the plan keeps changing.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>    - In the beginning, we planned to do an in-place refactor of
> > >>>>>>>>    the DataStream API, until the API migration period was proposed.
> > >>>>>>>>    - Then we wanted to make it an entirely separate API from
> > >>>>>>>>    DataStream, and listed it as a must-have for release 2.0 so
> > >>>>>>>>    that we can remove DataStream once it's ready.
> > >>>>>>>>    - However, depending on the outcome of the API compatibility
> > >>>>>>>>    discussion [1], we may not be able to remove DataStream in 2.0
> > >>>>>>>>    anyway, which means we might need to re-evaluate the necessity
> > >>>>>>>>    of this item for 2.0.
> > >>>>>>>>
> > >>>>>>>> I'd say we wait a bit longer for the compatibility discussion [1]
> > >>>>>>>> and decide the priority for this item afterwards.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> Best,
> > >>>>>>>>
> > >>>>>>>> Xintong
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> [1] https://lists.apache.org/list.html?dev@flink.apache.org
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <
> > >>>> ches...@apache.org
> > >>>>>>>> wrote:
> > >>>>>>>>
> > >>>>>>>>> By-and-large I'm quite happy with the list of items.
> > >>>>>>>>>
> > >>>>>>>>> I'm curious as to why the "Disaggregated State Management" item
> > >>>>>>>>> is marked as a must-have; will it require changes that break
> > >>>>>>>>> something? What prevents it from being added in 2.1?
> > >>>>>>>>>
> > >>>>>>>>> We may want to update the Java 17 item to "Make Java 17 the
> > >>>>>>>>> default, drop Java 8/11". Maybe even split it into a must-have
> > >>>>>>>>> "Drop Java 8" and a nice-to-have "Drop Java 11"?
> > >>>>>>>>>
> > >>>>>>>>> "Move Calcite rules from Scala to Java": I would hope that this
> > >>>>>>>>> would be an entirely internal change, and could thus be an
> > >>>>>>>>> incremental process independent of major releases.
> > >>>>>>>>> What is the actual scale of this item; how much are we actually
> > >>>>>>>>> re-writing?
> > >>>>>>>>>
> > >>>>>>>>> "Add MetricGroup#getLogicalScope": I'd raise this to a
> > >>>>>>>>> must-have; I think I marked it down as nice-to-have only because
> > >>>>>>>>> it depends on another item.
> > >>>>>>>>>
> > >>>>>>>>> The ProcessFunction API item is giving me the most headaches
> > >>>>>>>>> because it's very unclear what it actually entails; like is it
> > >>>>>>>>> an entirely separate API to DataStream (sounds like it is!) or
> > >>>>>>>>> an extension of DataStream. How much will it share the
> > >>>>>>>>> internals with DataStream etc.; how does it relate to the
> > >>>>>>>>> Table API (w.r.t. switching APIs / what Table API uses
> > >>>>>>>>> underneath).
> > >>>>>>>>>
> > >>>>>>>>> There are a few items I added as ideas which don't have a
> > >>>>>>>>> priority yet; would love to get some feedback on those.
> > >>>>>>>>>
> > >>>>>>>>> On 21/06/2023 08:41, Xintong Song wrote:
> > >>>>>>>>>
> > >>>>>>>>> Hi devs,
> > >>>>>>>>>
> > >>>>>>>>> As previously discussed in [1], we had been collecting work
> > >>>>>>>>> item proposals for the 2.0 release until June 15th, on the wiki
> > >>>>>>>>> page [2].
> > >>>>>>>>>
> > >>>>>>>>>    - As we have passed the due date, I'd like to kindly remind
> > >>>>>>>>>    everyone *not to add / remove items directly on the wiki
> > >>>>>>>>>    page*. If needed, please post in this thread or reach out to
> > >>>>>>>>>    the release managers instead.
> > >>>>>>>>>    - I've reached out to some folks for clarifications about
> > >>>>>>>>>    their proposals. Some of them mentioned that they cannot yet
> > >>>>>>>>>    tell whether we should do an item or not, and would need more
> > >>>>>>>>>    time / discussions to make the decision. So I added a new
> > >>>>>>>>>    symbol for items whose priorities are `TBD`.
> > >>>>>>>>>
> > >>>>>>>>> Now it's time to collaboratively decide a minimum set of
> > >>>>>>>>> must-have items. I've gone through the entire list of proposed
> > >>>>>>>>> items, and found that most of them make a lot of sense. So I
> > >>>>>>>>> think an online sync might not be necessary for this. I'd like
> > >>>>>>>>> to go with this DISCUSS thread, where everyone can comment on
> > >>>>>>>>> how they think the list can be improved, followed by a VOTE to
> > >>>>>>>>> formally make the decision.
> > >>>>>>>>>
> > >>>>>>>>> Any feedback and opinions, including but not limited to the
> > >>>>>>>>> following aspects, will be appreciated.
> > >>>>>>>>>
> > >>>>>>>>>    - Important items that are missing from the list
> > >>>>>>>>>    - Concerns regarding the listed items or their priorities
> > >>>>>>>>>
> > >>>>>>>>> Looking forward to your feedback.
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>>
> > >>>>>>>>> Xintong
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> [1] https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
> > >>>>>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/2.0+Release
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>
> > >>>> --
> > >>>> Best regards,
> > >>>> Sergey
> > >>>>
> > >>
> > >>
> >
> >
>
