Hi all,

@Xintong Song
Thanks for the reminder. I will contact Jark to update the wiki page.

Besides, I'd like to add more input by sharing our experience of
upgrading our internal version of Flink.

Flink has been widely used in production at our company since 2018. Our
internal version lags behind the community's latest stable version by
about one year. We upgraded our internal Flink version to 1.10 in March
last year, and we plan to upgrade directly to 1.13 next month (skipping
1.11 and 1.12). We would like to use the latest version as soon as
possible. In practice, however, we catch up with the community's latest
stable release only about once a year, because upgrading to a new version
is a time-consuming process.

The detailed work items are as follows.

a. Before releasing a new internal version
1) Required: Cherry-pick internal features to the new Flink branch. A few
features need to be redeveloped on top of the new code base.
    BTW, this cost keeps growing as we maintain more and more internal
features in our internal version.
2) Optional: Some internal connectors need to be adapted to the new API.
3) Required: Surrounding products need to be updated for the new API, for
example our internal Flink SQL web development platform.
4) Required: Regression tests

b. After the release, we encourage users to upgrade existing jobs
(thousands of jobs) to the new version. Users need some time to:
1) Repackage the jar for DataStream jobs.
2) For critical jobs, run the job on both versions at the same time for a
while, and migrate to the new job only after comparing the data
carefully.
3) Pure ETL SQL jobs are easy to bump up. But other Flink SQL jobs with
stateful operators need extra effort because Flink SQL jobs do not
support state compatibility across versions yet.

Best regards,
JING ZHANG

On Fri, Jun 4, 2021 at 2:27 PM Prasanna kumar <prasannakumarram...@gmail.com> wrote:

> Hi all,
>
> We are using Flink for our eventing system. Overall we are very happy with
> the tech, documentation, community support and quick replies on the
> mailing list.
>
> My experience with versions over the last year:
>
> We were working on 1.10 initially during our research phase, then we
> stabilised on 1.11 as we moved on, but by the time we were about to get
> into production 1.12 was released. As with all software and products,
> there were bugs reported. So we waited till 1.12.2 was released and then
> upgraded. Within a month of us doing it 1.13 got released.
>
> Going by past experience, we wait till at least a couple of minor
> versions (fixing bugs) get released before we move onto a newer version.
> The development happens at a rapid/good pace in Flink (which is good in
> terms of features) but adopting and moving the production code to a newer
> version 3/4 times a year is an onerous effort. For example, the memory
> model was changed in one of the releases (there is good documentation).
> But for a production user to adopt the newer version, at least a month of
> testing is required in a large-scale environment. We also do not want to
> be more than 2 versions behind at any point in time.
>
> I personally feel 2 major releases a year, or at most a release once every
> 5 months, is good.
>
> Thanks
> Prasanna.
>
> On Fri, Jun 4, 2021 at 9:38 AM Xintong Song <tonysong...@gmail.com> wrote:
>
> > Thanks everyone for the feedback.
> >
> > @Jing,
> > Thanks for the inputs. Could you please ask a committer who works together
> > with you on these items to fill them into the feature collecting wiki page
> > [1]? I assume Jark, who co-edited the FLIP wiki page, is working with you?
> >
> > @Kurt, @Till and @Seth,
> > First of all, a few things that potentially demotivate users from
> > upgrading, observed from users that I've been in touch with.
> > 1. It takes time for Flink major releases to get stabilized. Many users
> > tend to wait for the bugfix releases (x.y.1/2, or even x.y.3/4) rather
> > than upgrading to x.y.0 immediately. This could take months, sometimes
> > even until after the next major release.
> > 2. Many users maintain an internal version of Flink, with customized
> > features for their specific businesses. For them, upgrading Flink
> > requires significant effort to rebase those customized features. On the
> > other hand, the more versions they fall behind, the harder it is to
> > contribute those features to the community, which becomes a vicious cycle.
> >
> > I think the question to be answered is how do we prioritize between
> > stabilizing a previous major release and casting a new major release. So
> > far, it feels like the new release takes priority. I recall that we waited
> > for weeks to release 1.11.3 because people were busy stabilizing 1.12.0.
> > What if more resources were devoted to the bugfix releases? We could have
> > a more explicit schedule for the bugfix releases. E.g., try to always
> > release the first bugfix release 2 weeks after the major release, the
> > second bugfix release 4 weeks after that, and release on demand starting
> > from the third bugfix release. Or some other rules like this. Would that
> > help speed up the stabilization of a release and give users more
> > confidence to upgrade earlier?
> >
> > A related question is how do we prioritize between casting a release and
> > motivating more contributors. In my experience, what Kurt described, that
> > committers cannot help contributors due to "planned features", usually
> > happens during the release testing period or right before that (when
> > people are struggling to catch the feature freeze). This probably
> > indicates that casting a release on time is currently prioritized over
> > the contributor experience. Do we need to change that?
> >
> > If extending the release period does not simply mean that more features
> > are pushed into each release, but rather allows a longer period for the
> > release to get stabilized while leaving more capacity for bugfix releases
> > and helping contributors, it might be a good idea. To be specific,
> > currently we have a 4-month period: 3 months of feature development + 1
> > month of release testing. We might consider a 5-month period: 3 months of
> > feature development + 2 months of release testing.
> >
> > To sum up, I'm leaning towards extending the overall release period a bit,
> > while keeping the period before feature freeze. WDYT?
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
> >
> > On Thu, Jun 3, 2021 at 9:00 PM Seth Wiesman <sjwies...@gmail.com> wrote:
> >
> > > Hi Everyone,
> > >
> > > +1 for the Release Managers. Thank you all for volunteering.
> > >
> > > @Till Rohrmann <trohrm...@apache.org> A common sentiment that I have
> > > heard from many users is that upgrading off of 1.9 was very difficult.
> > > In particular, a lot of people struggled to understand the new memory
> > > model. Many users who required custom memory configurations in earlier
> > > versions assumed they should carry those configurations into later
> > > versions and then found themselves with OOM and instability issues. The
> > > good news is Flink did what it was supposed to do, and so for the
> > > majority, dropping their custom configurations and just setting total
> > > process memory was the correct solution; this was not an issue of a
> > > buggy release. The problem is people do not read the release notes or
> > > fully understand the implications of the change. Back to Kurt's point,
> > > this transition seems to have left a bad taste in many mouths, slowing
> > > some users' adoption of newer versions. I don't know that I have a
> > > solution to this problem. I think it is more communication than
> > > engineering, but I'm open to continuing the discussion.
> > >
> > > On Thu, Jun 3, 2021 at 5:04 AM Till Rohrmann <trohrm...@apache.org> wrote:
> > >
> > > > Thanks for volunteering as our release managers Xintong, Dawid and Joe!
> > > >
> > > > Thanks for starting the discussion about the release date, Kurt.
> > > > Personally, I generally prefer shorter release cycles, as they allow us
> > > > to deliver features faster and people feel less pressure to merge
> > > > half-done features at the last minute because they fear that they will
> > > > have to wait a long time if they miss the train. Also, it forces us to
> > > > make the release process less of a stop-the-world event and cut down
> > > > the costs of releases.
> > > >
> > > > On the other hand, if our users don't upgrade Flink fast enough, then
> > > > releasing more often won't have the effect of shipping features to our
> > > > users and getting feedback from them faster. What I believe we should
> > > > try to do is to understand why upgrading Flink is so difficult for
> > > > them. What are the things preventing a quick upgrade and how can we
> > > > improve the situation for our users? Are our APIs not stable enough?
> > > > Does Flink's behavior change too drastically between versions? Is the
> > > > tooling for upgrades lagging behind? Are they just cautious and don't
> > > > want to use bleeding edge software?
> > > >
> > > > If there is a problem that the majority of users are using an
> > > > unsupported version, then one solution could also be to extend the
> > > > list of supported Flink versions to the latest 3 versions, for example.
> > > >
> > > > About your point 2), I am a bit skeptical. I think that we will simply
> > > > plan more features and end up in the same situation wrt external
> > > > contributions. If that weren't the case, then it would also work with
> > > > shorter release cycles by simply planning less feature work and
> > > > including the external contributions, which could not be done in the
> > > > past release, in the next release. So in the end it is about what we
> > > > plan for a release and not so much how much time we have (assuming
> > > > that we plan less if we have less time and vice versa).
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Thu, Jun 3, 2021 at 5:08 AM Kurt Young <ykt...@gmail.com> wrote:
> > > >
> > > > > Thanks for bringing this up.
> > > > >
> > > > > > I have one thought about the release period. In short: shall we try
> > > > > > to extend the release period by 1 month?
> > > > >
> > > > > There are a couple of reasons why I want to bring up this proposal.
> > > > >
> > > > > > 1) I observed that lots of users are actually far behind the current
> > > > > > Flink version. For example, we are now actively developing 1.14 but
> > > > > > most users I know who have a migration or upgrade plan are planning
> > > > > > to upgrade to 1.12. This means we need to backport bug fixes to 1.12
> > > > > > and 1.13. If we extend the release period by 1 month, I think there
> > > > > > may be some chance that users can have a proper time frame to
> > > > > > upgrade to the previous released version. Then we can have a good
> > > > > > development cycle which looks like "actively developing the current
> > > > > > version and making the previous version stable, not 2 ~ 3 versions
> > > > > > before". Always being far behind Flink's latest version also
> > > > > > suppresses users' motivation to contribute to Flink.
> > > > >
> > > > > > 2) Increasing the release period also eases the workload of
> > > > > > committers, which I think can improve the contributor experience. I
> > > > > > have seen several times that when some contributors want to work on
> > > > > > new features or improvements, we have to respond with "sorry we are
> > > > > > right now focusing on implementing/stabilizing planned features for
> > > > > > this version", and the contributions are most likely stalled and
> > > > > > never brought up again.
> > > > >
> > > > > > BTW extending the release period also has downsides. It slows down
> > > > > > the delivery speed of new features. And I'm also not sure how much
> > > > > > it can improve the above 2 issues.
> > > > > >
> > > > > > Looking forward to hearing some feedback from the community, both
> > > > > > users and developers.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > > On Wed, Jun 2, 2021 at 8:39 PM JING ZHANG <beyond1...@gmail.com> wrote:
> > > > >
> > > > > > Hi Dawid, Joe & Xintong,
> > > > > >
> > > > > > Thanks for starting the discussion.
> > > > > >
> > > > > > > I would like to polish Window TVFs [1][2], a popular SQL feature
> > > > > > > introduced in 1.13.
> > > > > >
> > > > > > The detailed items are as follows.
> > > > > > 1. Add more computations based on Window TVF
> > > > > >     * Window Join (which is already merged in master branch)
> > > > > >     * Window Table Function
> > > > > >     * Window Deduplicate
> > > > > > > 2. Finish related JIRA issues to improve user experience
> > > > > > >    * Add offset support for TUMBLE, HOP and session windows
> > > > > > > 3. Complement the functions still missing compared to the group
> > > > > > > window, which is a precondition for deprecating the legacy Grouped
> > > > > > > Window Function in later versions.
> > > > > >    * Support Session windows
> > > > > >    * Support allow-lateness
> > > > > >    * Support retract input stream
> > > > > >    * Support window TVF in batch mode
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-19604
> > > > > > [2]
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-145%3A+Support+SQL+windowing+table-valued+function#FLIP145:SupportSQLwindowingtablevaluedfunction-CumulatingWindows
> > > > > >
> > > > > > Best regards,
> > > > > > JING ZHANG
> > > > > >
> > > > > > > On Wed, Jun 2, 2021 at 6:45 PM Xintong Song <xts...@apache.org> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > > As 1.13 has been released for a while, I think it is a good time to
> > > > > > > > start planning for the 1.14 release cycle.
> > > > > > >
> > > > > > > > - Release managers: This time we'd like to have a team of 3 release
> > > > > > > > managers. Dawid, Joe and I would like to volunteer for it. What do
> > > > > > > > you think about it?
> > > > > > > >
> > > > > > > > - Timeline: According to our approximate 4 months release period, we
> > > > > > > > propose to aim for a feature freeze roughly in early August (which
> > > > > > > > could mean something like early September for the 1.14 release).
> > > > > > > > Does it work for everyone?
> > > > > > >
> > > > > > > > - Collecting features: It would be helpful to have a rough overview
> > > > > > > > of the new features that will likely be included in this release. We
> > > > > > > > have created a wiki page [1] for collecting such information. We'd
> > > > > > > > like to kindly ask all committers to fill in the page with features
> > > > > > > > that they intend to work on.
> > > > > > >
> > > > > > > > We would also like to emphasize some aspects of the engineering
> > > > > > > > process:
> > > > > > > >
> > > > > > > > - Stability of master: This has been an issue during the 1.13
> > > > > > > > feature freeze phase and it is still going on. We encourage every
> > > > > > > > committer not to merge PRs through the GitHub button, but to do this
> > > > > > > > manually, paying attention to the commits merged after the CI was
> > > > > > > > triggered. It would be appreciated if you always build the project
> > > > > > > > before merging to master.
> > > > > > >
> > > > > > > > - Documentation: Please try to see documentation as an integrated
> > > > > > > > part of the engineering process and don't push it to the feature
> > > > > > > > freeze phase or even after. You might even think about going
> > > > > > > > documentation first. We, as the Flink community, are adding great
> > > > > > > > stuff that is pushing the limits of streaming data processors with
> > > > > > > > every release. We should also make this stuff usable for our users
> > > > > > > > by documenting it well.
> > > > > > >
> > > > > > > > - Promotion of 1.14: What applies to documentation also applies to
> > > > > > > > all the activity around the release. We encourage every contributor
> > > > > > > > to also think about, plan and prepare activities like blog posts and
> > > > > > > > talks that will promote and spread the release once it is done.
> > > > > > >
> > > > > > > Please let us know what you think.
> > > > > > >
> > > > > > > Thank you~
> > > > > > > Dawid, Joe & Xintong
> > > > > > >
> > > > > > > [1]
> > https://cwiki.apache.org/confluence/display/FLINK/1.14+Release
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
