At what point are the FLIP discussions coming into play?

I keep wondering if these shouldn't have started already.
It just seems that a lot of decisions are implicitly reliant on the items even being accepted. Estimates can only be provided if we actually know the scope of the change, but that's not always clear from the description in the doc.

What we need to ensure is that all breaking API changes are discussed/decided before 1.18 is released so we can deprecate affected APIs.

On 10/07/2023 11:32, Xintong Song wrote:
Hi Matthias,

The questions you asked are indeed very important. Here are some quick
responses, based on the plans I had in mind, which I have not aligned with
other release managers yet.

In the previous discussions between the RMs, we were not able to make
proposals on things like how to make a time plan, how to manage the release
branch, etc., due to the lack of inputs on, e.g., the work items that need
to be included (which transitively depends on the API compatibility to
provide between major versions) and the workloads / time needed for them.
With the recent discussions, we have collected at least the majority of
the inputs needed.

Here are things that I think we as the release managers would do next
(again, not aligned with other release managers yet):
- Create a time plan, by reaching out to people to understand the
estimated workloads, prerequisites and ETA of each work item.
- Make a proposal on how to manage the release branch, i.e., when to cut
the branch and whether to ship the milestone releases, etc.
- Set up regular release syncs (bi-weekly / monthly) to update the status
and draw attention to where help is needed.

So back to your questions.

There are still to-be-discussed items in the list of features. What's the
plan with those?
When collecting ETAs, for items whose completion time cannot yet be
estimated, we would like to have at least a time by which the estimation
can be made. I think the same applies to the to-be-discussed items. And if
the items should be included as must-haves, we would need another vote to
adjust the must-have item list.

Some of them don't have anyone assigned.
My concern is that they will be overlooked because nobody feels to be in
charge.
This is a tricky one. For must-have items without assignees, we as the
release managers should be responsible for raising them in the release
syncs and trying to find assignees for them. Hopefully, someone will step
up. But it is possible that nobody wants to work on a must-have item. If
that happens, which I don't think it will, it probably means the item is
not that critical and we may have to exclude it from the release. Either
way, they should not be overlooked, because IMHO release managers should
be responsible for trying to get someone to work on the unassigned items.

We'll have more discussions soon and keep the community updated.

Best,

Xintong



On Mon, Jul 10, 2023 at 3:53 PM Matthias Pohl
<matthias.p...@aiven.io.invalid> wrote:

Now that the vote is started on the must-have items: There are still
to-be-discussed items in the list of features. What's the plan with those?
Some of them don't have anyone assigned. Were these items discussed among
the release managers? So far, it looks like they are handled as
nice-to-have if someone volunteers to pick them up?

My concern is that they will be overlooked because nobody feels to be in
charge.

Best,
Matthias

On Fri, Jul 7, 2023 at 11:06 AM Xintong Song <tonysong...@gmail.com>
wrote:

Thanks all for the discussion.

The wiki has been updated as discussed. I'm starting a vote now.

Best,

Xintong



On Wed, Jul 5, 2023 at 9:52 AM Xintong Song <tonysong...@gmail.com>
wrote:
Hi ConradJam,

I think Chesnay has already put his name as the Contributor for the two
tasks you listed. Maybe you can reach out to him to see if you can
collaborate on this.

In general, I don't think contributing to a release 2.0 issue is much
different from contributing to a regular issue. We haven't yet created
JIRA tickets for all the listed tasks because many of them need further
discussions and / or FLIPs to decide whether and how they should be
performed.

Best,

Xintong



On Mon, Jul 3, 2023 at 10:37 PM ConradJam <jam.gz...@gmail.com> wrote:

Hi Community:
   I see some tasks in the 2.0 list that haven't been assigned yet. I want
to take the initiative to take on some tasks that I can complete. How do I
apply to the community for this part of the task? I am interested in the
following parts of FLINK-32377
<https://issues.apache.org/jira/browse/FLINK-32377>. Do I need to create
the issues myself and assign them to myself?

- the current timestamp, which is problematic w.r.t. caching and testing,
while providing no value.
- Remove JarRequestBody#programArgs in favor of #programArgsList.
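
One possible motivation for preferring a list of arguments over a single
string can be sketched in plain Java. This is only an illustration: the
class and method names below are hypothetical, and the naive whitespace
split is an assumption made for the example, not a description of how
Flink actually parses `programArgs`.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Flink.
final class ProgramArgsSketch {

    // Naive whitespace split, used here only to show the ambiguity of
    // passing all arguments as one string.
    static List<String> splitOnWhitespace(String programArgs) {
        return Arrays.asList(programArgs.trim().split("\\s+"));
    }

    // With a list, each element is one argument, so an argument that
    // contains spaces keeps its boundaries.
    static List<String> asList(String... programArgsList) {
        return Arrays.asList(programArgsList);
    }
}
```

With the single string `--name my job`, the naive split yields three
tokens, while the list form `["--name", "my job"]` keeps two arguments.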

[1] https://issues.apache.org/jira/browse/FLINK-32377

On Fri, Jun 30, 2023 at 00:53, Teoh, Hong <lian...@amazon.co.uk.invalid>
wrote:

Thanks Xintong for driving the effort.

I’d add a +1 to reworking configs, as suggested by @Jark and @Chesnay,
especially the types. We have various configs that encode a Time /
MemorySize value as a Long instead!
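
To make the concern concrete, here is a minimal sketch in plain Java.
The names are hypothetical and this is not Flink's `ConfigOption` API;
it only contrasts a bare `long`, whose unit is implied, with a
`java.time.Duration`, whose unit travels with the value.

```java
import java.time.Duration;

// Hypothetical, simplified option holder; not Flink's configuration API.
final class TypedOptionSketch {

    // Untyped: the unit is only implied by the key name or the docs.
    static final long TIMEOUT_AS_LONG = 10_000L; // milliseconds? seconds?

    // Typed: the unit is part of the value.
    static final Duration TIMEOUT_AS_DURATION = Duration.ofSeconds(10);

    // Parses "<amount> <unit>" strings such as "10 s" or "500 ms".
    // The set of supported units is an assumption made for this sketch.
    static Duration parseDuration(String text) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<amount> <unit>': " + text);
        }
        long amount = Long.parseLong(parts[0]);
        switch (parts[1]) {
            case "ms":
                return Duration.ofMillis(amount);
            case "s":
                return Duration.ofSeconds(amount);
            case "min":
                return Duration.ofMinutes(amount);
            default:
                throw new IllegalArgumentException("Unknown unit: " + parts[1]);
        }
    }
}
```

A caller that reads `TIMEOUT_AS_LONG` has to know the unit from
elsewhere, whereas `TIMEOUT_AS_DURATION.toMillis()` is unambiguous.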

Regards,
Hong



On 29 Jun 2023, at 16:19, Yuan Mei <yuanmei.w...@gmail.com> wrote:

Thanks for driving this effort, Xintong!

To Chesnay

I'm curious as to why the "Disaggregated State Management" item is
marked as a must-have; will it require changes that break something?
What prevents it from being added in 2.1?

As to "Disaggregated State Management":

We plan to provide a new type of state backend to support DFS as primary
storage.
To achieve this, we need at least two kinds of changes (not entirely
sure yet, since we are still in the design and prototype phase):
1. Statebackend Change
2. State Access Change

Not all of the related interfaces are `@Internal`. Some of them, like
`StateBackend`, are `@PublicEvolving`.
So, you are right in the sense that "Disaggregated State Management"
itself probably does not need to be a "Must Have".

But I was hoping that changes related to public APIs can be finalized
and merged in Flink 2.0 (I will fix the wiki accordingly).

I also agree with Jark that 2.0 is a good chance to rework the default
values of configurations.

Best
Yuan


On Thu, Jun 29, 2023 at 8:43 PM Chesnay Schepler <ches...@apache.org>
wrote:

Something else configuration-related is that there are a bunch of
options where the type isn't quite correct (e.g., a String where it
could be an enum, a string where it should be an int or something).
Could do a pass over those as well.
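
As a rough illustration of the String-versus-enum point, here is a
plain Java sketch. The enum and its constants are hypothetical and are
not taken from Flink's option definitions; the point is only that an
enum-typed option rejects a misspelled value when it is parsed, while a
raw String would carry the typo along until something interprets it.

```java
import java.util.Locale;

// Hypothetical enum standing in for an option that is currently a String.
enum DeliveryModeSketch {
    EXACTLY_ONCE,
    AT_LEAST_ONCE
}

final class EnumOptionSketch {

    // Accepts spellings such as "exactly-once" or "EXACTLY_ONCE".
    // Enum.valueOf throws IllegalArgumentException for unknown values.
    static DeliveryModeSketch parse(String raw) {
        String normalized = raw.trim().toUpperCase(Locale.ROOT).replace('-', '_');
        return DeliveryModeSketch.valueOf(normalized);
    }
}
```

Parsing "exactly-once" returns `EXACTLY_ONCE`, and a typo such as
"exactly-onec" fails immediately with an `IllegalArgumentException`.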

On 29/06/2023 13:50, Jark Wu wrote:
Hi,

I think one more thing we need to consider doing in 2.0 is changing the
default values of configurations to improve the out-of-box user
experience.

Currently, in order to run a Flink job, users may need to set a bunch of
configurations, such as minibatch, checkpoint interval, exactly-once,
incremental-checkpoint, etc. It's very verbose and hard to use for
beginners. Most of them can have a universally applicable value. Because
changing a default value is a breaking change, I think it's worth
considering changing them in 2.0.

What do you think?

Best,
Jark


On Wed, 28 Jun 2023 at 14:10, Sergey Nuyanzin <snuyan...@gmail.com>
wrote:

Hi Chesnay

"Move Calcite rules from Scala to Java": I would hope that this would be
an entirely internal change, and could thus be an incremental process
independent of major releases.
What is the actual scale of this item; how much are we actually
re-writing?

Thanks for asking.
Yes, you're right, that should be an internal change.
Yeah, I was also thinking about an incremental change (rule by rule or
by reasonably small groups of rules).
And yes, this could be an activity independent of major releases.
The problem is actually with the children of RelOptRule.
Currently I see 60+ such rules (in Scala) using the mentioned deprecated
API.
There are also children of ConverterRule (50+) which do not have such
issues.
Maybe it could be considered as the next step to have all the rules in
Java.

On Tue, Jun 27, 2023 at 1:34 PM Xintong Song <tonysong...@gmail.com>
wrote:

Hi Alex & Gyula,

By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
Introduce an API deprecation process" thread [1]?

Yes, I meant the FLIP-321 discussion. I just noticed I pasted the wrong
URL in my previous email. Sorry for the mistake.

I am also curious to know if the rationale behind this new API has been
previously discussed on the mailing list. Do we have a list of
shortcomings in the current DataStream API that it tries to resolve? How
does the current ProcessFunction functionality fit into the picture?
Will it be kept as is or subsumed by new API?

I don't think we should create a replacement for the DataStream API
unless we have a very good reason to do so and with a proper discussion
about this as Alex said.

The ProcessFunction API, which is intended to replace the DataStream
API, is still a proposal, not a decision. Sorry for the confusion, I
should have been more careful with my words, not giving the impression
that this is something we'll do anyway.

There will be a FLIP describing the motivations and designs in detail,
for the community to discuss and vote on. We are still working on it.
TBH, this is not trivial and we would need more time on it.

Just to quickly share some background:

    - We see quite some problems with the current DataStream APIs
       - Users are working with concrete classes rather than
       interfaces, which means
          - Users can access methods that are designed to be used by
          internal classes, even though they are annotated with
          `@Internal`. E.g., `DataStream#getTransformation`.
          - Changes to the non-API implementations (e.g.,
          `Transformation`) would affect the API classes (e.g.,
          `DataStream`), which makes it hard to provide binary
          compatibility.
       - Internal classes are used as parameters / return values of
       public APIs. E.g., while `AbstractStreamOperator` is
       PublicEvolving, `StreamTask`, which is returned from
       `AbstractStreamOperator#getContainingTask`, is Internal.
       - In many cases, users are asked to extend the API classes,
       rather than implementing interfaces. E.g.,
       `AbstractStreamOperator`.
          - Any changes to the base classes, even the internal part,
          may affect the behavior of the user-provided sub-classes
          - Users can override the behavior of the base classes
       - The API module `flink-streaming-java` contains non-API
       classes, and depends on internal modules such as
       `flink-runtime`, which means
          - Changes to the internal modules may affect the API modules,
          which requires users to re-build their applications upon
          upgrading
          - The artifact users need for building their applications is
          larger than necessary.
       - We probably should not expose operators (e.g.,
       `AbstractStreamOperator`) to users. Functions should be enough
       for users to define their data processing logic. Exposing
       operator-level concepts (e.g., mailbox thread model, checkpoint
       barrier alignment, etc.) is unnecessary and, because of
       compatibility considerations, limits how much the exposed
       mechanisms can be improved.
       - The current DataStream API seems to be a mixture of many
       things, making it hard to understand, especially for newcomers.
       It might be better to re-organize it into several parts (the
       taxonomy below is just an example; we are still working on
       this):
          - The most fundamental stateful stream processing: streams,
          partitions / key, process functions, state, timeline-service
          - An extension for common batch-streaming unified functions:
          map, flatmap, filter, agg, reduce, join, etc.
          - An extension for windowing support: window, triggering
          - An extension for event-time support: event time, watermark
          - The extensions are like short-cuts / sugars, without which
          users can probably still achieve the same behavior by working
          with the fundamental APIs, but it would be a lot easier with
          the extensions
    - The original plan was to do in-place refactors / changes on the
    DataStream API. Some related items are listed in this doc [2]
    attached to the kick-off email [3]. Not all of the above issues are
    listed, because at that time we hadn't looked into this as deeply
    as we have now.
    - We proposed this as a new API rather than in-place refactors in
    the 2.0 work item list, because we realized the changes might be
    too big for an in-place change. First having a new API and then
    gradually retiring the old one would help users to smoothly migrate
    between them.
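
The first problem in the list above (users working with concrete
classes rather than interfaces) can be sketched in plain Java. All names
below are hypothetical and this is not a proposed design for the new
API; it only shows how an interface keeps implementation-only methods
out of reach of user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// What user code would compile against: API methods only.
interface StreamSketch<T> {
    <R> StreamSketch<R> map(Function<T, R> fn);

    List<T> collect();
}

// The implementation, which would live in an internal module.
final class StreamSketchImpl<T> implements StreamSketch<T> {
    private final List<T> elements;

    StreamSketchImpl(List<T> elements) {
        this.elements = new ArrayList<>(elements);
    }

    // Internal helper: not visible through the StreamSketch interface,
    // so it can change without affecting user code.
    int internalElementCount() {
        return elements.size();
    }

    @Override
    public <R> StreamSketch<R> map(Function<T, R> fn) {
        List<R> mapped = new ArrayList<>();
        for (T element : elements) {
            mapped.add(fn.apply(element));
        }
        return new StreamSketchImpl<>(mapped);
    }

    @Override
    public List<T> collect() {
        return new ArrayList<>(elements);
    }
}

// Entry point that only ever hands out the interface type.
final class StreamsSketch {
    static <T> StreamSketch<T> of(List<T> elements) {
        return new StreamSketchImpl<>(elements);
    }
}
```

User code holding a `StreamSketch<Integer>` can call `map` and
`collect`, but has no compile-time access to `internalElementCount`
unless it casts to the implementation class.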

A thorough discussion is definitely needed once the FLIP is out. And of
course it's possible that the FLIP might be rejected. Given that we are
planning for release 2.0, I just feel it would be better to bring this
up early, even though the concrete plan is not yet ready.

Best,

Xintong


[1] https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
[2] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing
[3] https://lists.apache.org/thread/b8w5cx0qqbwzzklyn5xxf54vw9ymys1c

On Tue, Jun 27, 2023 at 5:15 PM Gyula Fóra <gyf...@apache.org> wrote:

Hey!

I share the same concerns mentioned above regarding the "ProcessFunction
API".

I don't think we should create a replacement for the DataStream API
unless we have a very good reason to do so and with a proper discussion
about this as Alex said.

Cheers,
Gyula

On Tue, Jun 27, 2023 at 11:03 AM Alexander Fedulov <
alexander.fedu...@gmail.com> wrote:

Hi Xintong,

By compatibility discussion do you mean the "[DISCUSS] FLIP-321:
Introduce an API deprecation process" thread [1]?

I am also curious to know if the rationale behind this new API has been
previously discussed on the mailing list. Do we have a list of
shortcomings in the current DataStream API that it tries to resolve? How
does the current ProcessFunction functionality fit into the picture?
Will it be kept as is or subsumed by new API?

[1]
https://lists.apache.org/thread/vmhzv8fcw2b33pqxp43486owrxbkd5x9
Best,
Alex

On Mon, 26 Jun 2023 at 14:33, Xintong Song <tonysong...@gmail.com>
wrote:

The ProcessFunction API item is giving me the most headaches because
it's very unclear what it actually entails; like is it an entirely
separate API to DataStream (sounds like it is!) or an extension of
DataStream. How much will it share the internals with DataStream etc.;
how does it relate to the Table API (w.r.t. switching APIs / what Table
API uses underneath).

I totally understand your confusion. We started planning this after
kicking off the release 2.0, so there's still a lot to be explored and
the plan keeps changing.


    - In the beginning, we planned to do an in-place refactor of the
    DataStream API, until the API migration period was proposed.
    - Then we wanted to make it an entirely separate API to DataStream,
    and listed it as a must-have for release 2.0 so that we can remove
    DataStream once it's ready.
    - However, depending on the outcome of the API compatibility
    discussion [1], we may not be able to remove DataStream in 2.0
    anyway, which means we might need to re-evaluate the necessity of
    this item for 2.0.

I'd say we wait a bit longer for the compatibility discussion [1] and
decide the priority for this item afterwards.


Best,

Xintong


[1]
https://lists.apache.org/list.html?dev@flink.apache.org

On Mon, Jun 26, 2023 at 6:00 PM Chesnay Schepler <ches...@apache.org>
wrote:

By and large I'm quite happy with the list of items.

I'm curious as to why the "Disaggregated State Management" item is
marked as a must-have; will it require changes that break something?
What prevents it from being added in 2.1?

We may want to update the Java 17 item to "Make Java 17 the default,
drop Java 8/11". Maybe even split it into a must-have "Drop Java 8" and
a nice-to-have "Drop Java 11"?

"Move Calcite rules from Scala to Java": I would hope that this would be
an entirely internal change, and could thus be an incremental process
independent of major releases.
What is the actual scale of this item; how much are we actually
re-writing?

"Add MetricGroup#getLogicalScope": I'd raise this to a must-have; I
think I marked it down as nice-to-have only because it depends on
another item.

The ProcessFunction API item is giving me the most headaches because
it's very unclear what it actually entails; like is it an entirely
separate API to DataStream (sounds like it is!) or an extension of
DataStream? How much will it share the internals with DataStream etc.;
how does it relate to the Table API (w.r.t. switching APIs / what Table
API uses underneath)?

There are a few items I added as ideas which don't have a priority yet;
would love to get some feedback on those.

On 21/06/2023 08:41, Xintong Song wrote:

Hi devs,

As previously discussed in [1], we had been collecting work item
proposals for the 2.0 release until June 15th, on the wiki page [2].

    - As we have passed the due date, I'd like to kindly remind everyone
    *not to add / remove items directly on the wiki page*. If needed,
    please post in this thread or reach out to the release managers
    instead.
    - I've reached out to some folks for clarifications about their
    proposals. Some of them mentioned that they cannot yet tell whether
    we should do an item or not, and would need more time / discussions
    to make the decision. So I added a new symbol for items whose
    priorities are `TBD`.

Now it's time to collaboratively decide a minimum set of must-have
items. I've gone through the entire list of proposed items, and found
that most of them make a lot of sense. So I think an online sync might
not be necessary for this. I'd like to go with this DISCUSS thread,
where everyone can comment on how they think the list can be improved,
followed by a VOTE to formally make the decision.

Any feedback and opinions, including but not limited to the following
aspects, will be appreciated.

    - Important items that are missing from the list
    - Concerns regarding the listed items or their priorities

Looking forward to your feedback.

Best,

Xintong


[1]
https://lists.apache.org/list?dev@flink.apache.org:lte=1M:release%202.0%20status%20updates
[2]
https://cwiki.apache.org/confluence/display/FLINK/2.0+Release

--
Best regards,
Sergey



--
Best

ConradJam

