Re: [DISCUSS] Planning Flink 2.0

Chesnay Schepler Wed, 26 Apr 2023 01:08:11 -0700

> /Instead of defining compatibility guarantees as "this API won't change in all 1.x/2.x series", what if we define it as "this API won't change in the next 2/3 years"./

I can see some benefits to this approach (all APIs having a fixed minimum lifetime) but it's just gonna be difficult to communicate. Technically this implies that every minor release may contain breaking changes, which is exactly what users don't want.


What problems do you see in creating major releases every N years?

> /IIUC, the milestone releases are a breakdown of the 2.0 release, while we are free to introduce breaking changes between them. And you suggest using longer-living feature branches to keep the master branch in a releasable state (in terms of milestone releases). Am I understanding it correctly?/

I think you got the general idea. There are a lot of details to be ironed out though (e.g., do we release connectors for each milestone? ...).

Conflicts in the long-lived branches are certainly a concern, but I think those will be inevitable. Right now I'm not _too_ worried about them, at least based on my personal wish-list.
Maybe the milestones could even help with that, as we could preemptively decide on an order for certain changes that have a high chance of conflicting with each other?

I guess we could do that anyway.

Maybe we should explicitly evaluate how invasive a change is (in relation to other breaking changes!) and manage things accordingly.



Other thoughts:

We need to figure out what this release means for connectors compatibility-wise. The current rules for which versions a connector must support don't cover major releases at all.
(This depends a bit on the scope of 2.0; if we add binary compatibility to Public APIs and promote a few Evolving ones then compatibility across minor releases becomes trivial)

What process are you thinking of for deciding what breaking changes to make? The obvious choice would be FLIPs, but I'm worried that this will overload the mailing list / wiki for lots of tiny changes.

Provided that we agree on doing 2.0, when would we cut the 2.0 branch? Would we wait a few months for people to prepare/agree on changes so we reduce the time we need to merge things into 2 branches?


On 26/04/2023 05:51, Xintong Song wrote:

Thanks all for the positive feedback.

@Martijn

> If we want to have that roadmap, should we consolidate this into a
> dedicated Confluence page over storing it in a Google doc?

Having a dedicated wiki page is definitely a good way for the roadmap
discussion. I haven't created one yet because it's still a proposal to have
such roadmap discussion. If the community agrees with our proposal, the
release manager team can decide how they want to drive and track the
roadmap discussion.

@Chesnay

> We should discuss how regularly we will ship major releases from now on.
> Let's avoid again making breaking changes because we "gotta do it now
> because 3.0 isn't happening anytime soon". (e.g., every 2 years or
> something)


I'm not entirely sure about shipping major releases regularly. But I do
agree that we may want to avoid the situation that "breaking changes can
only happen now, or no idea when". Instead of defining compatibility
guarantees as "this API won't change in all 1.x/2.x series", what if we
define it as "this API won't change in the next 2/3 years". That should
allow us to incrementally iterate the APIs.

E.g., in 2.a, all APIs annotated as `@Stable` will be guaranteed compatible
until 2 years after 2.a is shipped, and in 2.b if the API is still
annotated `@Stable` it extends the compatibility guarantee to 2 years after
2.b is shipped. To remove an API, we would need to mark it as `@Deprecated`
and wait for 2 years after the last release in which it was marked
`@Stable`.
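
As a rough illustration of the rule above, the compatibility window could be computed as sketched below. This is only a sketch: the class and method names, the fixed 2-year constant (the proposal mentions "2/3 years") and the ship dates for 2.a and 2.b are all assumptions made for the example, not existing Flink code.

```java
import java.time.LocalDate;
import java.util.List;

/** Sketch of a time-based compatibility window. All names are hypothetical. */
public class CompatibilityWindow {

    /** Assumed guarantee length in years; the proposal leaves this open (2 or 3). */
    public static final int GUARANTEE_YEARS = 2;

    /**
     * Given the ship dates of every release in which an API was still marked
     * stable, returns the date until which the API is guaranteed compatible:
     * the guarantee length counted from the last such release.
     */
    public static LocalDate guaranteedUntil(List<LocalDate> stableReleaseDates) {
        LocalDate lastStable = stableReleaseDates.stream()
                .max(LocalDate::compareTo)
                .orElseThrow(() -> new IllegalArgumentException("API was never stable"));
        return lastStable.plusYears(GUARANTEE_YEARS);
    }

    /** A release may drop the API only if it ships on or after the end of the window. */
    public static boolean mayRemoveIn(LocalDate releaseDate, List<LocalDate> stableReleaseDates) {
        return !releaseDate.isBefore(guaranteedUntil(stableReleaseDates));
    }

    public static void main(String[] args) {
        // Made-up ship dates for the hypothetical releases 2.a and 2.b.
        LocalDate releaseA = LocalDate.of(2025, 1, 15);
        LocalDate releaseB = LocalDate.of(2025, 7, 15);

        // Stable in 2.a only: guaranteed until two years after 2.a.
        System.out.println(guaranteedUntil(List.of(releaseA)));           // 2027-01-15
        // Still stable in 2.b: the guarantee extends to two years after 2.b.
        System.out.println(guaranteedUntil(List.of(releaseA, releaseB))); // 2027-07-15
        // A release shipping in early 2027 could not yet remove the API.
        System.out.println(mayRemoveIn(LocalDate.of(2027, 2, 1), List.of(releaseA, releaseB))); // false
    }
}
```

The point of the sketch is that the window is anchored to the last release in which the API was stable, so each release that keeps the annotation pushes the earliest removal date forward.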

> My thinking goes rather in the area of defining Milestone releases, each
> Milestone targeting specific changes.

I'm trying to understand what you are suggesting here. IIUC, the milestone
releases are a breakdown of the 2.0 release, while we are free to introduce
breaking changes between them. And you suggest using longer-living feature
branches to keep the master branch in a releasable state (in terms of
milestone releases). Am I understanding it correctly?

I haven't thought this through. My gut feeling is this might be a good
direction to go, in terms of keeping things organized. The risk is the cost
of merging feature branches and rebasing feature branches after other
features are merged. That depends on how closely the features are related to
each other. E.g., reorganization of the project modules and dependencies
may change the project structure a lot, which may significantly affect most
of the feature branches. Maybe we can identify such widely-affecting
changes and perform them at the beginning or end of the release cycle.

Best,

Xintong



On Wed, Apr 26, 2023 at 8:23 AM ConradJam<jam.gz...@gmail.com>  wrote:

Thanks Xintong and Jark for kicking off the great discussion!

I checked the list carefully. The plans are detailed and most of the
problems are covered.
As for some of the ideas Chesnay mentioned, I think we should iterate in
small steps and collect feedback in time.
Looking forward to the start of the work on Flink 2.0. I am willing to
provide assistance ~

On Tue, Apr 25, 2023 at 7:10 PM Xintong Song<tonysong...@gmail.com>  wrote:

Hi everyone,

I'd like to start a discussion on planning for a Flink 2.0 release.

AFAIK, in the past years this topic has been mentioned from time to time,
in mailing lists, jira tickets and offline discussions. However, few
concrete steps have been taken, due to the significant determination and
efforts it requires and distractions from other prioritized focuses. After
a series of offline discussions in the recent weeks, with folks mostly from
our team internally as well as a few from outside Alibaba / Ververica
(thanks for insights from Becket and Robert), we believe it's time to kick
this off in the community.

Below are some of our thoughts about the 2.0 release. Looking forward to
your opinions and feedback.


## Why plan for release 2.0?


Flink 1.0.0 was released in March 2016. In the past 7 years, many new
features have been added and the project has become different from what it
used to be. So what is Flink now? What will it become in the next 3-5
years? What do we think of Flink's position in the industry? We believe
it's time to rethink these questions, and draw a roadmap towards another
milestone, a milestone that is worth a new major release.


In addition, we are still providing backwards compatibility (maybe not
perfectly but largely) with APIs that we designed and claimed stable 7
years ago. While such backwards compatibility helps users to stick with the
latest Flink releases more easily, it sometimes, and more and more over
time, also becomes a burden for maintenance and a limitation for new
features and improvements. It's probably time to have a comprehensive
review and clean-up over all the public APIs.


Furthermore, next year is the 10th year for Flink as an Apache project.
Flink joined the Apache incubator in April 2014, and became a top-level
project in December 2014. That makes 2024 a perfect time for bringing out
the release 2.0 milestone. And for such a major release, we'd expect it
to take one year or even longer to prepare, which means we probably
should start now.


## What should we focus on in release 2.0?


    - Roadmap discussion - How do we define and position Flink for now and
    in future? This is probably something we lacked. I believe some people have
    thought about it, but at least it's not explicitly discussed and aligned in
    the community. Ideally, the 2.0 release should be a result of the roadmap.
    - Breaking changes - Important improvements, bugfixes, technical debts
    that involve breaking of API backwards compatibility, which can only be
    carried out in major releases.
       - With breaking API changes, we may need multiple 2.0-alpha/beta
       versions to collect feedback.
    - Key features - Significant features and improvements (e.g., new user
    stories, architectural upgrades) that may change how users use Flink and
    its position in the industry. Some items from this category may also
    involve API breaking changes or significant behavior changes.
       - There are also opinions that we should stay focused as much as
       possible on the breaking changes only. Incremental / non-breaking
       improvements and features, or anything that can be added in 2.x minor
       releases, should not block the 2.0 release.

It might be better to discuss the detailed technical items later in another
thread, to keep the current discussion focused on the overall proposal, and
to leave time for all parties to think about their technical plans. For
your reference, I've attached a preliminary list of work items proposed by
Alibaba / Ververica [1]. Note that the listed items are still being
carefully evaluated and prioritized, and may change in future.


## How do we manage the release?


#### Release Process


We'd expect the release process for Flink 2.0 to be different from the 1.x
releases.


A major difference is that we think the timeline-based release management
may not be suitable. The idea behind the timeline-based approach is that we
can have more frequent releases and deliver completed features to users
earlier, while incomplete features can be postponed to the next release,
which won't be too late given the short release cycle. However, for breaking
changes that can only take place in major releases, the price for missing a
release is too high.


Alternatively, we probably should discuss and agree on a list of must-have
work items. That doesn't mean we keep postponing the release because of a few
delayed features. In fact, we would need to closely monitor the progress of
the must-have items during the entire release cycle, making sure they are
taken care of by contributors with enough expertise and capacities.


#### Timeline


The release cycle should be decided according to the feature list,
especially the must-have items that we plan to do in the release. However,
a target feature freeze date would still be helpful when making the plan,
so that we don't pack too many things into the release. We propose to aim
for a feature freeze around mid 2024, so that in case must-have items are
delayed, we still have a good chance to make the release happen by the end
of 2024.


#### Branch


A longer release cycle also means we probably should keep shipping the 1.x
releases while preparing for the 2.0 release. We may cut a release-1 branch
from master, on which we can keep developing and shipping 1.x releases. The
version on the master branch will then become '2.0-SNAPSHOT'.


#### Release Manager


Given the new and to-be-explored release process, longer cycle and higher
synchronization requirements, we'd expect the 2.0 release to be more
challenging than previous 1.x releases. Therefore, we'd like to propose to
assemble a release management team with 4-5 experienced PMC members. Jark
and I would like to volunteer as 2 of the release managers.


Looking forward to your thoughts.


Best,

Jark & Xintong


[1] https://docs.google.com/document/d/1_PMGl5RuDQGlV99_gL3y7OiRsF0DgCk91Coua6hFXhE/edit?usp=sharing

--
Best

ConradJam
