Thanks Godfrey for driving this.

Sounds good to me.

I have a few questions:

## Do we have a plan to upgrade Calcite to 1.31?

It looks like a single upgrade to 1.31 later on would solve the current problem.

## Is cherry-picking costly?

If the conflicts are large, cherry-picking may require too much effort in
our Calcite repository.
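
To make the question concrete, here is a rough sketch of the cherry-pick step described in point 3 of the proposal quoted below. The function names are hypothetical, the upstream tag name `calcite-<version>` is an assumption, and the commit hashes are placeholders. The script only builds and prints the git commands; it does not run them.

```python
from typing import List


def fork_version(base: str, revision: int, snapshot: bool = False) -> str:
    """Build a fork version such as 'calcite-1.26.0-flink-v1' (naming from the proposal)."""
    name = f"calcite-{base}-flink-v{revision}"
    return f"{name}-SNAPSHOT" if snapshot else name


def cherry_pick_plan(base: str, revision: int, commits: List[str]) -> List[List[str]]:
    """Return the git commands for one fork revision, without executing them."""
    branch = fork_version(base, revision, snapshot=True)
    # Branch off the upstream release tag (assumed to be named 'calcite-<version>').
    plan = [["git", "checkout", "-b", branch, f"calcite-{base}"]]
    # One cherry-pick per upstream commit; -x records the original commit hash
    # in the new commit message.
    for sha in commits:
        plan.append(["git", "cherry-pick", "-x", sha])
    return plan


if __name__ == "__main__":
    # Placeholder hashes, not real Calcite commits.
    for cmd in cherry_pick_plan("1.26.0", 1, ["<sha-1>", "<sha-2>"]):
        print(" ".join(cmd))
```

Every cherry-pick in such a plan is a potential conflict, so I would expect the cost to grow with the number of picked commits and with the distance between the two Calcite versions.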

## Is the Calcite repository costly to maintain?

For example, is publishing our Calcite repository troublesome? I'm not
familiar with the Calcite community, but I know they use Gradle and their
release process is different from Maven's.

Best,
Jingsong

On Fri, Apr 22, 2022 at 3:32 PM godfrey he <godfre...@gmail.com> wrote:
>
> Dear devs,
>
> I would like to open a discussion on the fact that much Flink SQL
> function development currently relies on Calcite releases, which
> seriously blocks the release of some Flink SQL features.
> Therefore, I would like to discuss whether it is possible to solve this
> problem by creating Flink's own Calcite repository.
>
> Currently, Flink depends on Calcite-1.26, FLIP-204[1] relies on Calcite-1.30,
> and we also want to support full join-hint functionality in Flink-1.16,
> which relies on Calcite-1.31 (which may be released two or three months
> from now).
>
> In order to support new features or fix bugs, we need to upgrade
> the Calcite version, but every Calcite upgrade (especially an upgrade
> across multiple versions) is very tough: I clearly remember that
> the Calcite upgrade from 1.22 to 1.26 took two weeks of full-time work
> to complete.
>
> Currently, in order to fix some bugs without upgrading the Calcite version,
> we copy the corresponding Calcite class directly into the Flink project
> and then modify it accordingly.[2] This approach is rather hacky and
> makes code maintenance and upgrades hard.
>
> So, I wonder whether we could solve this problem by maintaining a
> Calcite repository in the Flink community. This approach has been
> practiced within my company for many years. There are similar practices
> in the industry. For example, Apache Drill also maintains
> a separate Calcite repository[3].
>
> The following is a brief analysis of the approach and the pros and
> cons of maintaining a separate repository.
>
> Approach:
> 1. Where to put the code? https://github.com/flink-extended is a good place.
> 2. What extra code can be added to this repository? Only bug fixes and
> features that are already merged into Calcite can be cherry-picked to this
> repository. We should also try to push bug fixes to the Calcite community.
> Btw, the copied Calcite classes in the Flink project can then be removed.
> 3. How to upgrade the Calcite version? Check out the target Calcite
> release branch and rebase our bug fix code onto it. (As we upgrade, we
> will maintain fewer and fewer old bug fixes.) Then verify all of Calcite's
> tests and Flink's tests in the developer's local environment. If all tests
> pass, release the Calcite branch; otherwise fix the problems in the branch
> and re-test. After the branch is released, the version of Calcite in Flink
> can be upgraded. For example: check out a calcite-1.26.0-flink-v1-SNAPSHOT
> branch from calcite-1.26.0, move all the copied Calcite code in Flink to
> the branch, and pick all the hint-related changes from Calcite-1.31 to
> the branch. Then we can change the Calcite version in Flink to
> calcite-1.26.0-flink-v1-SNAPSHOT and verify all tests locally.
> Release calcite-1.26.0-flink-v1 after all tests are successful.
> Finally, upgrade the Calcite version in Flink to
> calcite-1.26.0-flink-v1, and open a PR.
> 4. Who will maintain it? The maintenance workload is minimal, but the
> upgrade work is laborious (actually, it's similar to before). I can
> maintain it in the early stage and standardise the process.
>
> Pros:
> 1. The release of Flink is decoupled from the release of Calcite,
>  making feature development and bug fixes quicker
> 2. Reduce the hassle of unnecessary Calcite upgrades
> 3. No hacking in Flink to maintain the copied Calcite code
>
> Cons:
> 1. Need to maintain an additional Calcite repository
> 2. Upgrades are a little more complicated than before
>
> Any feedback is very welcome!
>
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> [2] https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> [3] https://github.com/apache/drill/blob/master/pom.xml#L64
>
> Best,
> Godfrey
