Thanks for the feedback, guys!

For Jingsong's feedback:
>## Do we have the plan to upgrade calcite to 1.31?

I think we will upgrade Calcite to 1.31 only when Flink depends on
some significant features of Calcite.
 Such as: new syntax PTF (CALCITE-4865).

 >## Is Cherry-pick costly?
>From the experience of maintaining calcite with our company, the cost is small.
We only cherry-pick the bug fixes and needed minor features.
For a major feature, we can choose to upgrade the version.

>## Are the calcite repository costly to maintain?
>From the experience of @Dann y chen (One PMC of Calcite), publishing
is much easier.


For Chesnay's feedback:
I also totally agree that a fork repository will increase the cost of
maintenance.

Usually, the Calcite community releases a version three months or more.
I think it's hard to let Calcite change the release cycle
because Calcite supports many compute engines.


For Konstantin's feedback:
Some changes in Calcite may cause hundreds of plan changes in Flink,
such as: CALCITE-4173.
We must check whether the change is expected, whether there is
performance regression.
Some of the changes are very subtle, especially in the CBO planner.
This situation also occurs similarly within upgrading from 1.1x to 1.22.
If you are not familiar with Flink planner and Calcite, it will be
more difficult to upgrade.


For Xintong's feedback:
You are right, I will connect Yun for some help, Thanks for the suggestions.


For Martijn's feedback:
I'm also against cherry-pick many features code into the fock repository,
and I also totally agree we should collaborate closely with the
Calcite community.
I'm just trying to find an approach which can avoid frequent Calcite upgrades,
but easily support bug fix and minor new feature development.

As for the CALCITE-4865 case, I think we should upgrade the Calcite
version to support PTF.

@Jing zhang, can you share some 'feeling' for CALCITE-4865 ?

Best,
Godfrey

Martijn Visser <martijnvis...@apache.org> 于2022年4月22日周五 17:31写道:
>
> Hi everyone,
>
> Overall I'm against the idea of setting up a Calcite fork for the same
> reasons that Chesnay has mentioned. We've talked extensively about doing an
> upgrade of Calcite during the Flink 1.15 release period, but there was a
> lot of pushback by the maintainers against that because of the required
> efforts. Having our own fork will mean that there will be even more effort
> required, because not only do we need to perform the upgrade on Flink's
> end, we also need to maintain this Calcite fork.
>
> I think what we should do is have a closer collaboration with the Calcite
> community and see if we can also help out with reviewing/merging PRs and
> more frequent releases. What we're seeing is that already features that are
> proposed towards Calcite because we need them for Flink, are not getting
> picked up by the Calcite community. See
> https://issues.apache.org/jira/browse/CALCITE-4865 /
> https://github.com/apache/calcite/pull/2606 which is such an example.
>
> I would rather invest more in collaborating with the Calcite community
> instead of maintaining our own fork. I believe that would help us get new
> features and bug fixes sooner.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
>
> On Fri, 22 Apr 2022 at 10:46, Xintong Song <tonysong...@gmail.com> wrote:
>
> > BTW, I think this proposal sounds similar to FRocksDB, the Flink's custom
> > RocksDB build. Maybe folks maintaining FRocksDB can share some experiences.
> >
> > CC @Yun Tang
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com>
> > wrote:
> >
> > > Hi Godfrey,
> > >
> > >
> > >> 1. Where to put the code? https://github.com/flink-extended is a good
> > >> place.
> > >
> > >
> > > Please notice that `flink-extended` is not endorsed by the Apache Flink
> > > PMC. That means if the proposed new Calcite repository is hosted there,
> > the
> > > maintenance and release will not be guaranteed by the Apache Flink
> > project.
> > > I guess the question is do we consider another 3rd party Calcite
> > repository
> > > more reliable and convenient than the official Apache Calcite that we
> > want
> > > to depend on.
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org>
> > > wrote:
> > >
> > >> I'm overall against the idea of creating a fork.
> > >> It implies quite some maintenance overhead, like dealing with unstable
> > >> tests, CI, licensing etc. and the overall release overhead.
> > >>
> > >> Is there no alternative where we can collaborate more with the calcite
> > >> guys, like verifying new features so bugs are caught sooner?
> > >>
> > >> On 22/04/2022 09:31, godfrey he wrote:
> > >> > Dear devs,
> > >> >
> > >> > I would like to open a discussion on the fact that currently many
> > >> > Flink SQL function
> > >> >   development relies on Calcite releases, which seriously blocks some
> > >> > Flink SQL's features release.
> > >> > Therefore, I would like to discuss whether it is possible to solve
> > this
> > >> problem
> > >> > by creating Flink's own Calcite repository.
> > >> >
> > >> > Currently, Flink depends on Caclite-1.26, FLIP-204[1] relies on
> > >> Calcite-1.30,
> > >> > and we recently want to support fully join-hints functionatity in
> > >> Flink-1.16,
> > >> > which relies on Calcite-1.31 (maybe two or three months later will be
> > >> released).
> > >> >
> > >> > In order to support some new features or fix some bugs, we need to
> > >> upgrade
> > >> > the Calcite version, but every time we upgrade Calcite version
> > >> > (especially upgrades
> > >> > across multiple versions), the processing is very tough: I remember
> > >> clearly that
> > >> >   the Calcite upgrade from 1.22 to 1.26 took two weeks of full-time to
> > >> complete.
> > >> >
> > >> > Currently, in order to fix some bugs while not upgrading the Calcite
> > >> version,
> > >> > we copy the corresponding Calcite class directly into the Flink
> > project
> > >> > and then modify it accordingly.[2] This approach is rather hacky and
> > >> > hard for code maintenance and upgrades.
> > >> >
> > >> > So, I had an idea whether we could solve this problem by maintaining a
> > >> > Calcite repository
> > >> > in the Flink community. This approach has been practiced within my
> > >> > company for many years.
> > >> >   There are similar practices in the industry. For example, Apache
> > Dill
> > >> > also maintains
> > >> > a separate Calcite repository[3].
> > >> >
> > >> > The following is a brief analysis of the approach and the pros and
> > >> > cons of maintaining a separate repository.
> > >> >
> > >> > Approach:
> > >> > 1. Where to put the code? https://github.com/flink-extended is a good
> > >> place.
> > >> > 2. What extra code can be added to this repository? Only bug fixes and
> > >> features
> > >> > that are already merged into Calcite can be cherry-picked to this
> > >> repository.
> > >> > We also should try to push bug fixes to the Calcite community.
> > >> > Btw, the copied Calcite class in the Flink project can be removed.
> > >> > 3. How to upgrade the Calcite version? Check out the target Calcite
> > >> > release branch
> > >> > and rebase our bug fix code. (As we upgrade, we will maintain fewer
> > >> > and fewer older bug
> > >> > fixes code.) And then, verify all Calcte's tests and Flink's tests in
> > >> > the developer's local
> > >> >   environment. If all tests are OK, release the Calcite branch, or fix
> > >> > it in the branch and re-test.
> > >> >   After the branch is released, then the version of Calcite in Flink
> > >> > can be upgraded. For example:
> > >> >   checkout calcite-1.26.0-flink-v1-SNAPSHOT branch from
> > calcite-1.26.0,
> > >> > move all the copied
> > >> >   Calcite code in Flink to the branch, and pick all the hint related
> > >> > changes from Calcite-1.31 to
> > >> >   the branch. Then we can change the Calcite version in Flink to
> > >> > calcite-1.26.0-flink-v1-SNAPSHOT,
> > >> > and verify all tests in the locale. Release calcite-1.26.0-flink-v1
> > >> > after all tests are successful.
> > >> > At last upgrade the calcite version to
> > >> > calcite-1.26.0-flink-v10-flink-v1, and open a PR.
> > >> > 4. Who will maintain it? The maintenance workload is minimal, but the
> > >> > upgrade work is
> > >> >   laborious (actually, it's similar to before). I can maintain it in
> > >> > the early stage and standardise the processing.
> > >> >
> > >> > Pros.
> > >> > 1. The release of Flink is decoupled from the release of Calcite,
> > >> >   making feature development and bug fix quicker
> > >> > 2. Reduce the hassle of unnecessary calcite upgrades
> > >> > 3. No hacking in Flink to maintain the Calcite copied code
> > >> >
> > >> > cons.
> > >> > 1. Need to maintain an additional Calcite repository
> > >> > 2. The Upgrades are a little more complicated than before
> > >> >
> > >> > Any feedback is very welcome!
> > >> >
> > >> >
> > >> > [1]
> > >>
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > >> > [2]
> > >>
> > https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> > >> > [3] https://github.com/apache/drill/blob/master/pom.xml#L64
> > >> >
> > >> > Best,
> > >> > Godfrey
> > >>
> > >>
> > >>
> >

Reply via email to