Thanks Godfrey for driving this. Sounds good to me.
I have a few questions:

1. Do we have a plan to upgrade Calcite to 1.31? It looks like a single
   upgrade later on would solve the current problem.
2. Is cherry-picking costly? If the conflicts are large, this may require
   too much effort in the Calcite repository.
3. Is the Calcite repository costly to maintain? For example, is publishing
   our own Calcite repository troublesome? I'm not familiar with the Calcite
   community, but I know they use Gradle and their release process is
   different from Maven's.

Best,
Jingsong

On Fri, Apr 22, 2022 at 3:32 PM godfrey he <godfre...@gmail.com> wrote:
>
> Dear devs,
>
> I would like to open a discussion on the fact that much of Flink SQL's
> feature development currently relies on Calcite releases, which seriously
> blocks the release of some Flink SQL features. Therefore, I would like to
> discuss whether it is possible to solve this problem by creating Flink's
> own Calcite repository.
>
> Currently, Flink depends on Calcite-1.26, FLIP-204[1] relies on
> Calcite-1.30, and we want to fully support join-hint functionality in
> Flink-1.16, which relies on Calcite-1.31 (which may be released two or
> three months from now).
>
> In order to support some new features or fix some bugs, we need to upgrade
> the Calcite version, but every time we upgrade the Calcite version
> (especially across multiple versions), the process is very tough: I
> remember clearly that the Calcite upgrade from 1.22 to 1.26 took two weeks
> of full-time work to complete.
>
> Currently, in order to fix some bugs without upgrading the Calcite
> version, we copy the corresponding Calcite class directly into the Flink
> project and then modify it accordingly.[2] This approach is rather hacky
> and makes code maintenance and upgrades hard.
>
> So I had an idea: could we solve this problem by maintaining a Calcite
> repository in the Flink community? This approach has been practiced within
> my company for many years. There are similar practices in the industry.
> For example, Apache Drill also maintains a separate Calcite repository[3].
>
> The following is a brief analysis of the approach and the pros and cons of
> maintaining a separate repository.
>
> Approach:
> 1. Where to put the code? https://github.com/flink-extended is a good
>    place.
> 2. What extra code can be added to this repository? Only bug fixes and
>    features that are already merged into Calcite can be cherry-picked to
>    this repository. We should also try to push bug fixes to the Calcite
>    community. Btw, the copied Calcite classes in the Flink project can
>    then be removed.
> 3. How to upgrade the Calcite version? Check out the target Calcite
>    release branch and rebase our bug-fix code onto it. (As we upgrade, we
>    will need to maintain fewer and fewer older bug fixes.) Then verify all
>    of Calcite's tests and Flink's tests in the developer's local
>    environment. If all tests pass, release the Calcite branch; otherwise,
>    fix the failures in the branch and re-test. After the branch is
>    released, the version of Calcite in Flink can be upgraded.
>    For example: check out a calcite-1.26.0-flink-v1-SNAPSHOT branch from
>    calcite-1.26.0, move all the copied Calcite code in Flink to the
>    branch, and pick all the hint-related changes from Calcite-1.31 to the
>    branch. Then we can change the Calcite version in Flink to
>    calcite-1.26.0-flink-v1-SNAPSHOT and verify all tests locally. Release
>    calcite-1.26.0-flink-v1 after all tests are successful. Finally,
>    upgrade the Calcite version in Flink to calcite-1.26.0-flink-v1 and
>    open a PR.
> 4. Who will maintain it? The maintenance workload is minimal, but the
>    upgrade work is laborious (actually, it's similar to before). I can
>    maintain it in the early stage and standardise the process.
>
> Pros:
> 1. The release of Flink is decoupled from the release of Calcite, making
>    feature development and bug fixes quicker
> 2. Reduces the hassle of unnecessary Calcite upgrades
> 3. No hacking in Flink to maintain the copied Calcite code
>
> Cons:
> 1. Need to maintain an additional Calcite repository
> 2. Upgrades are a little more complicated than before
>
> Any feedback is very welcome!
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> [2] https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> [3] https://github.com/apache/drill/blob/master/pom.xml#L64
>
> Best,
> Godfrey
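
For reference, the branch-and-cherry-pick flow described in step 3 of the quoted proposal might look roughly like the sketch below. It only builds the list of git commands and does not run anything. The helper names (`fork_version`, `upgrade_plan`), the commit placeholders, and the use of `git tag` to mark the release are my own assumptions for illustration and not part of the proposal. The test runs in Calcite and Flink that the proposal requires before a release are not modelled here.

```python
from typing import List


def fork_version(base: str, revision: int, snapshot: bool = False) -> str:
    """Build a fork version name such as 'calcite-1.26.0-flink-v1'.

    The naming scheme follows the example in the quoted mail; the helper
    itself is hypothetical.
    """
    suffix = "-SNAPSHOT" if snapshot else ""
    return f"{base}-flink-v{revision}{suffix}"


def upgrade_plan(base: str, revision: int, picks: List[str]) -> List[str]:
    """Return the git commands for one fork release, in order.

    Nothing is executed. Running Calcite's and Flink's tests between the
    cherry-picks and the release is left to the maintainer.
    """
    branch = fork_version(base, revision, snapshot=True)
    # Create the working branch from the upstream Calcite release tag.
    plan = [f"git checkout -b {branch} {base}"]
    # -x appends the original commit hash to each cherry-picked commit message.
    plan += [f"git cherry-pick -x {sha}" for sha in picks]
    # Assumption: the release is marked with a plain tag on the fork.
    plan.append(f"git tag {fork_version(base, revision)}")
    return plan


if __name__ == "__main__":
    # The commit identifiers below are placeholders, not real Calcite commits.
    for cmd in upgrade_plan("calcite-1.26.0", 1, ["<hint-commit-1>", "<hint-commit-2>"]):
        print(cmd)
```

Keeping the plan as data makes it easy to review the exact sequence before anyone touches the repository, which may help when estimating the cherry-pick cost raised in question 2.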