Hi Martijn,
This is really exciting news.
Thanks a lot for the effort to improve collaboration and communication with
the Calcite community.

> My take away from the discussion in the Flink community and the discussion
> in the Calcite community is that I believe we should do 3 things.

Agreed on these 3 points.
About keeping up with the Calcite updates: I would like to take on this work.
Is it too late to schedule it for Flink 1.16? If so, how about scheduling it
for Flink 1.17?

Best,
Jing Zhang


On Thu, 23 Jun 2022 at 02:01, Martijn Visser <martijnvis...@apache.org> wrote:

> Hi everyone,
>
> I've recently reached out to the Calcite community to see if we could
> somehow get something done with regards to the PR that Jing Zhang had
> opened a long time ago. In that thread, I also mentioned that we had a
> discussion in the Flink community on potentially forking Calcite. I would
> recommend reading up on the thread [1]. Specifically the replies from other
> projects/PMCs (Apache Drill, Dremio) are super interesting. These
> projects have forked Calcite in the past, regret that move, have reverted
> back to Calcite / are in the process of reverting and are elaborating on
> that. This thread also gained some traction on Twitter in case you're
> interested in more opinions. [2]
>
> My take away from the discussion in the Flink community and the discussion
> in the Calcite community is that I believe we should do 3 things:
>
> 1. We should not fork Calcite. There might be short term benefits but long
> term pain. I think we already are suffering from enough long term pain in
> the Flink codebase that we shouldn't take a step that will increase that
> pain even more, scattered over multiple places.
> 2. I think we should try to help out the Calcite community more. Not only
> by opening new PRs for new features, but we can also help by reviewing
> those PRs, reviewing other PRs that could be relevant for Flink, or proposing
> improvements given our experience at Flink. As you can see in the Calcite
> thread, Timo has already expressed interest in doing so. Part of the OSS
> community is also about helping each other; if we improve Calcite, we will
> also improve Flink.
> 3. I think we need to prioritise keeping up with the Calcite updates. They
> are currently working on releasing version 1.31, while Flink is still at
> 1.26.0. We don't necessarily need to stay in sync with the latest available
> version, but I definitely think we should be at most 2 versions (and
> preferably 1 version) behind (so currently that would be 1.28 and 1.29
> soonish). Not only are we increasing our own tech debt by not updating, we
> are also limiting ourselves in adding new features in the Table/SQL space.
> As you can also see in the 1.26 release notes, there's a warning to only
> use 1.26 for development since it can corrupt your data [3]. There are
> already multiple upgrade tickets for Calcite [4] [5] [6].
>
> [1] https://lists.apache.org/thread/3lkfhwjpqwy9pfhnvwmfkwmwlfyqs45z
> [2] https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> [3] https://calcite.apache.org/news/2020/10/06/release-1.26.0/
> [4] https://issues.apache.org/jira/browse/FLINK-20873
> [5] https://issues.apache.org/jira/browse/FLINK-21239
> [6] https://issues.apache.org/jira/browse/FLINK-27998
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> On Thu, 5 May 2022 at 10:34, godfrey he <godfre...@gmail.com> wrote:
>
> > Hi, Timo & Martijn,
> >
> > Sorry for the late reply, thanks for the feedback.
> >
> > I strongly agree that the best solution would be to cooperate more
> > with the Calcite community
> > and maintain all new features and bug fixes in the Calcite community,
> > without any forking.
> > It is a long-term process. I think it's difficult to change community
> > rules, because the Calcite
> > project is a neutral lib that serves multiple projects simultaneously.
> > I don't think forking Calcite is the perfect solution, but rather a
> > better balance within limited resources:
> > it's possible to introduce some necessary minor features and bug fixes
> > without having to
> > upgrade to the latest version.
> >
> >
> > I investigated other projects that use Calcite[1] and found that most of
> > them do not use
> > the latest version of Calcite. Even for the Kylin community, the
> > version based on
> > Calcite-1.16.0 has been updated to 70[2]. (Similar projects are Quark and
> > Drill.)
> > My guess is that these projects chose a stable version
> > (or even chose to maintain a forked project) to maintain stability.
> > When Flink does not need to introduce new syntax anymore,
> > I guess it's less expensive and more manageable to maintain a forked
> > Calcite.
> >
> >
> > Even if we don't end up going the fork calcite route,
> > I hope that we could discuss the options for subsequent calcite upgrades
> > here.
> > Just like Timo mentioned, how to balance feature development and code
> > maintenance.
> > There are a few realistic questions about the Calcite upgrade
> > situation now, such as:
> > 1. If we keep up with the latest version of Calcite, who is
> > responsible for each upgrade?
> > The current status is that no one has motivation to upgrade the version
> > unless he/she wants to drive new features.
> > 2. Do we have the resources/energy to upgrade each version?
> > 3. How do we ensure that the results of each upgrade are as expected?
> > It took a lot of effort to verify the correctness of the upgrade results.
> > The test set for uncommon SQL usage is not sufficient right now.
> >
> >
> > > I still don't quite understand why we want to avoid Calcite upgrades.
> > Not every feature in Calcite is a feature we really need, while some
> > refactorings can be very burdensome
> > (lots of bugs, plan changes, and a lot of effort to fix).
> > Just as mentioned above, the "SEARCH operator" refactoring in
> > CALCITE-4173 did cause a lot of bugs.
> >
> >
> > [1] https://calcite.apache.org/docs/powered_by.html
> > [2] https://github.com/Kyligence/calcite/commits/kycalcite-1.16.0.x-4.x
> >
> > Best,
> > Godfrey
> >
> > On Mon, 25 Apr 2022 at 22:11, Martijn Visser <martijnvis...@apache.org> wrote:
> >
> > >
> > > Hi all,
> > >
> > > Just a couple of remarks on some of things from this thread:
> > >
> > > > I think we will upgrade Calcite to 1.31 only when Flink depends on some
> > > > significant features of Calcite.
> > > > Such as: new syntax PTF (CALCITE-4865).
> > >
> > > Like Timo also mentions, I think this is a bad practice. Calcite is a key
> > > dependency for Flink. We should upgrade as often as possible, not as little
> > > as possible. Any fork in the beginning is easy, but it becomes a bigger
> > > pain as time progresses.
> > >
> > > > >## Is the Calcite repository costly to maintain?
> > > > From the experience of @Danny Chen (a Calcite PMC member), publishing
> > > > is much easier.
> > >
> > > Since Calcite is such a key dependency, I would really oppose forking it.
> > > There will only be very few maintainers of such a fork. The number of
> > > people that know and can maintain both Calcite and Flink will be even smaller.
> > >
> > > > I'm just trying to find an approach which can avoid frequent Calcite
> > > > upgrades,
> > > > but easily support bug fix and minor new feature development.
> > >
> > > I still don't quite understand why we want to avoid Calcite upgrades.
> > > Upgrading Calcite introduces new features, but it also resolves bugs that
> > > currently exist in Flink. Part of housekeeping is that we keep our codebase
> > > up-to-date and tidy, to avoid that it becomes a mess and unmaintainable. I
> > > understand that this is less preferred, because you can't spend this time
> > > working on new features. If I make a comparison with doing construction
> > > work on your house, you can't put in a new floor if you don't clean out the
> > > room first.
> > >
> > > > About Calcite version upgrading, we should try not to use the latest
> > > > Calcite version to avoid the bugs introduced by the new version if possible.
> > >
> > > I can fully agree on that. But right now we're running multiple versions
> > > behind.
> > >
> > > Have we reached out to the Calcite community first with our problems, or
> > > have we gone straight into "let's fork it"?
> > >
> > > I still haven't seen an argument that would make me in favor of setting up
> > > a fork.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > On Mon, 25 Apr 2022 at 15:55, Timo Walther <twal...@apache.org> wrote:
> > >
> > > > Hi Godfrey,
> > > >
> > > > I'm also strictly against maintaining a Calcite fork. We had similar
> > > > discussions during the merge of the Blink code base in the past and I'm
> > > > happy that we could prevent a fork until today. Let me elaborate a bit
> > > > on my strict opinion here:
> > > >
> > > > 1) Calcite does not offer bugfix releases
> > > >
> > > > In the end, Calcite is also an Apache community. I'm sure we could
> > > > improve our collaboration and help with releasing bugfix releases. So far we
> > > > were mostly leveraging all the stuff that the Calcite community has
> > > > built. It would be good to strengthen the relation and also give
> > > > something back.
> > > >
> > > > So far having no bugfix releases was not really a problem for the Flink
> > > > community. We simply copy over files from Calcite into Flink once a bug fix
> > > > has been merged in Calcite. Maven implicitly overwrites the original
> > > > Calcite classes during artifact building. Most `org.apache.calcite`
> > > > classes in the Flink code base are fixing bugs and wait for removal
> > > > during the next Calcite upgrade.
> > > >
> > > > 2) Slow feature reviewing
> > > >
> > > > Slow feature reviewing has a good and a bad side. One of the reasons why
> > > > it is so slow is because the maintainers pay a lot of attention to
> > > > standard compliance, long-term code quality, and
> > > > cross-downstream-projects usability. All of that is the reason why the
> > > > Calcite code base has lasted multiple decades already and is useful for
> > > > many parties.
> > > >
> > > > Relying on Calcite has protected the Flink code base from merging
> > > > non-standard SQL features and extending the SQL dialect too much. The 1.
> > > > windows in Calcite and aux functions such as TUMBLE_START have shown
> > > > that only standard compliant features should be merged. Now the Flink
> > > > community has the problem of maintaining this custom syntax.
> > > >
> > > > 3) No compatibility guaranteed from the Calcite community
> > > >
> > > > I disagree here. Many changes are protected by keeping deprecated
> > > > methods/constructors/classes around for years. And many refactorings are
> > > > also nice for the Flink community, e.g. easier optimizer rule definition.
> > > >
> > > > IMHO the core problem is rather that we don't update Calcite frequently
> > > > enough. Currently, we are lagging behind quite a bit because we don't
> > > > invest enough resources in code maintenance but only in new feature
> > > > development. We should spend some time on a better balance of the two.
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > > On 25.04.22 at 15:13, godfrey he wrote:
> > > > > Hi Jark,
> > > > >
> > > > > Agree with you, thanks for the feedback.
> > > > >
> > > > > Best,
> > > > > Godfrey
> > > > >
> > > > > On Mon, 25 Apr 2022 at 13:02, Jark Wu <imj...@gmail.com> wrote:
> > > > >> Thanks, Godfrey, for starting this discussion,
> > > > >>
> > > > >> I understand the motivation behind it.
> > > > >> No bugfix releases, slow feature reviewing, and no compatibility guarantees
> > > > >> are genuinely blocking the development of Flink SQL.
> > > > >>
> > > > >> I think a fork should be the last resort, to be considered only after trying
> > > > >> our best to cooperate with the Calcite community.
> > > > >> But we shouldn't stop here if there is no progress. Therefore, I'm okay
> > > > >> with maintaining a fork.
> > > > >>
> > > > >> However:
> > > > >> 1) It should be a temporary solution. We should have a plan to move back to
> > > > >> the latest Calcite version at some point (e.g., pushing them to resolve the
> > > > >> problems mentioned above).
> > > > >>
> > > > >> 2) If we maintain the fork in flink-extended, we should determine a groupId
> > > > >> for deploying to Maven Central. The community should have permission to
> > > > >> deploy under the groupId.
> > > > >>
> > > > >> Best,
> > > > >> Jark
> > > > >>
> > > > >>
> > > > >> On Sun, 24 Apr 2022 at 16:14, godfrey he <godfre...@gmail.com> wrote:
> > > > >>
> > > > >>> Hi, Jing
> > > > >>> Thanks for sharing the Calcite experiences.
> > > > >>> About Calcite version upgrading, we should try not to use the latest Calcite
> > > > >>> version to avoid the bugs introduced by the new version if possible.
> > > > >>> This may be a best practice.
> > > > >>>
> > > > >>>
> > > > >>> Hi, Yun
> > > > >>> Thanks for the detailed explanation of the experiences regarding FRocksDB.
> > > > >>> I agree with you that the situations with Calcite and RocksDB are a
> > > > >>> little different.
> > > > >>> The main pain point for Calcite is that we have to upgrade Calcite to the
> > > > >>> latest version
> > > > >>> to get bug fixes and new features, but the latest version may be
> > > > >>> unstable, which is a pain for us.
> > > > >>> If we all agree we should maintain a forked Calcite repo,
> > > > >>> there are many experiences we can learn from FRocksDB.
> > > > >>>
> > > > >>> Best,
> > > > >>> Godfrey
> > > > >>>
> > > > >>> On Sun, 24 Apr 2022 at 11:58, Yun Tang <myas...@live.com> wrote:
> > > > >>>> Hi all,
> > > > >>>>
> > > > >>>> I could share two cents here for how we maintain FRocksDB.
> > > > >>>>
> > > > >>>> First of all, we also would prefer not to maintain a customized RocksDB
> > > > >>>> version in Flink, as it brings additional overhead for the Flink community:
> > > > >>>>
> > > > >>>>    1.  The RocksDB community switched to CircleCI for the CI tests after
> > > > >>>>        RocksDB-6.x, which requires additional money to run all tests for
> > > > >>>>        reviewing each PR.
> > > > >>>>    2.  We need to compile and include all kinds of FRocksDB binaries for
> > > > >>>>        linux32/64, Windows, ppc64, ARM and macOS platforms, which is a really
> > > > >>>>        tough and boring experience.
> > > > >>>> The root reason why we have to maintain a forked RocksDB repo is that the
> > > > >>>> RocksDB community refuses to accept a plugin-like feature based on the
> > > > >>>> compaction filter, on which Flink's state TTL feature heavily depends [1].
> > > > >>>> Since RocksDB-7.0, the community has also moved several components to the
> > > > >>>> plugin repo [2]. Although this does not spare us from releasing all kinds of
> > > > >>>> binaries, it can at least reduce the energy we spend on maintaining the whole
> > > > >>>> set of tests if we follow this trend.
> > > > >>>> Last but not least, I don't think the current discussion on Apache Calcite
> > > > >>>> is in the same situation as FRocksDB. The Flink SQL folks currently complain
> > > > >>>> that Calcite is released too slowly, which blocks some feature development in
> > > > >>>> Flink. However, the RocksDB community itself actually releases new versions
> > > > >>>> more frequently, and we don't currently rely on its new versions for new
> > > > >>>> features. Moreover, we're often more careful about upgrading the underlying
> > > > >>>> storage component, as it could impact performance and data correctness.
> > > > >>>>
> > > > >>>> [1] https://github.com/ververica/frocksdb/commit/3da8249d50c8a3a6ea229f43890d37e098372786
> > > > >>>> [2] https://github.com/facebook/rocksdb/issues/9390
> > > > >>>>
> > > > >>>> Best
> > > > >>>> Yun Tang
> > > > >>>>
> > > > >>>> ________________________________
> > > > >>>> From: Jing Zhang <beyond1...@gmail.com>
> > > > >>>> Sent: Saturday, April 23, 2022 15:21
> > > > >>>> To: dev <dev@flink.apache.org>
> > > > >>>> Cc: Yun Tang <myas...@live.com>
> > > > >>>> Subject: Re: [DISCUSS] Maintain a Calcite repository for Flink to
> > > > >>>> accelerate the development for Flink SQL features
> > > > >>>> Hi Godfrey,
> > > > >>>> I would like to share some problems based on my past experience.
> > > > >>>> 1. It's not easy to push new features in the Calcite community.
> > > > >>>> As @Martijn mentioned, https://issues.apache.org/jira/browse/CALCITE-4865 /
> > > > >>>> https://github.com/apache/calcite/pull/2606 is such an example.
> > > > >>>> I tried many ways, for example sending review requests to the dev mailing
> > > > >>>> list and leaving comments in JIRA and on the pull request,
> > > > >>>> and finally had to give up. Sorry for that.
> > > > >>>> 2. However, some new features of Calcite are radical,
> > > > >>>> such as https://issues.apache.org/jira/browse/CALCITE-4173, which met
> > > > >>>> some strong opposition in the Calcite community.
> > > > >>>> It was merged in the end and caused unexpected problems, such as wrong
> > > > >>>> results (https://issues.apache.org/jira/browse/FLINK-24708)
> > > > >>>> and other related bugs.
> > > > >>>> 3. Every time we upgrade the Calcite version, we cross multiple
> > > > >>>> versions, resulting in a slow upgrade process and
> > > > >>>> uncontrolled results, often causing some unexpected problems.
> > > > >>>>
> > > > >>>> Thank @Godfrey for driving this discussion in a big scope.
> > > > >>>> I think it's a good chance to review these problems and find a solution.
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Jing Zhang
> > > > >>>>
> > > > >>>> On Fri, 22 Apr 2022 at 21:40, godfrey he <godfre...@gmail.com> wrote:
> > > > >>>> Hi Chesnay,
> > > > >>>>
> > > > >>>> So far there have been no bugfix releases.
> > > > >>>> You can find the releases at https://github.com/apache/calcite/tags
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Godfrey
> > > > >>>>
> > > > >>>> On Fri, 22 Apr 2022 at 18:48, Chesnay Schepler <ches...@apache.org> wrote:
> > > > >>>>> I find it a bit weird that the supposed only way to get a bug fix is to
> > > > >>>>> do a big version upgrade.
> > > > >>>>> Is Calcite not creating bugfix releases?
> > > > >>>>>
> > > > >>>>> On 22/04/2022 12:26, godfrey he wrote:
> > > > >>>>>> Thanks for the feedback, guys!
> > > > >>>>>>
> > > > >>>>>> For Jingsong's feedback:
> > > > >>>>>>> ## Do we have the plan to upgrade calcite to 1.31?
> > > > >>>>>> I think we will upgrade Calcite to 1.31 only when Flink depends on
> > > > >>>>>> some significant features of Calcite.
> > > > >>>>>>    Such as: new syntax PTF (CALCITE-4865).
> > > > >>>>>>
> > > > >>>>>>> ## Is Cherry-pick costly?
> > > > >>>>>> From the experience of maintaining Calcite within our company, the
> > > > >>>>>> cost is small.
> > > > >>>>>> We only cherry-pick the bug fixes and needed minor features.
> > > > >>>>>> For a major feature, we can choose to upgrade the version.
> > > > >>>>>>
> > > > >>>>>>> ## Is the Calcite repository costly to maintain?
> > > > >>>>>> From the experience of @Danny Chen (a Calcite PMC member), publishing
> > > > >>>>>> is much easier.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> For Chesnay's feedback:
> > > > >>>>>> I also totally agree that a fork repository will increase the cost of
> > > > >>>>>> maintenance.
> > > > >>>>>>
> > > > >>>>>> Usually, the Calcite community releases a version every three months or more.
> > > > >>>>>> I think it's hard to let Calcite change the release cycle
> > > > >>>>>> because Calcite supports many compute engines.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> For Konstantin's feedback:
> > > > >>>>>> Some changes in Calcite may cause hundreds of plan changes in Flink,
> > > > >>>>>> such as: CALCITE-4173.
> > > > >>>>>> We must check whether the change is expected, whether there is
> > > > >>>>>> performance regression.
> > > > >>>>>> Some of the changes are very subtle, especially in the CBO planner.
> > > > >>>>>> This situation also occurred when upgrading from 1.1x to 1.22.
> > > > >>>>>> If you are not familiar with Flink planner and Calcite, it will be
> > > > >>>>>> more difficult to upgrade.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> For Xintong's feedback:
> > > > >>>>>> You are right, I will contact Yun for some help. Thanks for the suggestions.
> > > > >>>>>>
> > > > >>>>>> For Martijn's feedback:
> > > > >>>>>> I'm also against cherry-picking a lot of feature code into the fork repository,
> > > > >>>>>> and I also totally agree we should collaborate closely with the
> > > > >>>>>> Calcite community.
> > > > >>>>>> I'm just trying to find an approach which can avoid frequent Calcite upgrades,
> > > > >>>>>> but easily support bug fixes and minor new feature development.
> > > > >>>>>>
> > > > >>>>>> As for the CALCITE-4865 case, I think we should upgrade the Calcite
> > > > >>>>>> version to support PTF.
> > > > >>>>>>
> > > > >>>>>> @Jing zhang, can you share some 'feeling' for CALCITE-4865 ?
> > > > >>>>>>
> > > > >>>>>> Best,
> > > > >>>>>> Godfrey
> > > > >>>>>>
> > > > >>>>>> On Fri, 22 Apr 2022 at 17:31, Martijn Visser <martijnvis...@apache.org> wrote:
> > > > >>>>>>> Hi everyone,
> > > > >>>>>>>
> > > > >>>>>>> Overall I'm against the idea of setting up a Calcite fork for the same
> > > > >>>>>>> reasons that Chesnay has mentioned. We've talked extensively about doing an
> > > > >>>>>>> upgrade of Calcite during the Flink 1.15 release period, but there was a
> > > > >>>>>>> lot of pushback by the maintainers against that because of the required
> > > > >>>>>>> efforts. Having our own fork will mean that there will be even more effort
> > > > >>>>>>> required, because not only do we need to perform the upgrade on Flink's
> > > > >>>>>>> end, we also need to maintain this Calcite fork.
> > > > >>>>>>>
> > > > >>>>>>> I think what we should do is have a closer collaboration with the Calcite
> > > > >>>>>>> community and see if we can also help out with reviewing/merging PRs and
> > > > >>>>>>> more frequent releases. What we're seeing is that already features that are
> > > > >>>>>>> proposed towards Calcite because we need them for Flink, are not getting
> > > > >>>>>>> picked up by the Calcite community. See
> > > > >>>>>>> https://issues.apache.org/jira/browse/CALCITE-4865 /
> > > > >>>>>>> https://github.com/apache/calcite/pull/2606 which is such an example.
> > > > >>>>>>> I would rather invest more in collaborating with the Calcite community
> > > > >>>>>>> instead of maintaining our own fork. I believe that would help us get new
> > > > >>>>>>> features and bug fixes sooner.
> > > > >>>>>>>
> > > > >>>>>>> Best regards,
> > > > >>>>>>>
> > > > >>>>>>> Martijn Visser
> > > > >>>>>>> https://twitter.com/MartijnVisser82
> > > > >>>>>>> https://github.com/MartijnVisser
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>> On Fri, 22 Apr 2022 at 10:46, Xintong Song <tonysong...@gmail.com> wrote:
> > > > >>>>>>>> BTW, I think this proposal sounds similar to FRocksDB, Flink's custom
> > > > >>>>>>>> RocksDB build. Maybe folks maintaining FRocksDB can share some experiences.
> > > > >>>>>>>> CC @Yun Tang
> > > > >>>>>>>>
> > > > >>>>>>>> Thank you~
> > > > >>>>>>>>
> > > > >>>>>>>> Xintong Song
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Fri, Apr 22, 2022 at 4:35 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> Hi Godfrey,
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a good
> > > > >>>>>>>>>> place.
> > > > >>>>>>>>> Please note that `flink-extended` is not endorsed by the Apache Flink
> > > > >>>>>>>>> PMC. That means if the proposed new Calcite repository is hosted there,
> > > > >>>>>>>>> the maintenance and release will not be guaranteed by the Apache Flink
> > > > >>>>>>>>> project. I guess the question is whether we consider another 3rd party
> > > > >>>>>>>>> Calcite repository more reliable and convenient than the official Apache
> > > > >>>>>>>>> Calcite that we want to depend on.
> > > > >>>>>>>>>
> > > > >>>>>>>>> Thank you~
> > > > >>>>>>>>>
> > > > >>>>>>>>> Xintong Song
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Apr 22, 2022 at 4:07 PM Chesnay Schepler <ches...@apache.org>
> > > > >>>>>>>>>
> > > > >>>>>>>>>> I'm overall against the idea of creating a fork.
> > > > >>>>>>>>>> It implies quite some maintenance overhead, like dealing with unstable
> > > > >>>>>>>>>> tests, CI, licensing etc. and the overall release overhead.
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Is there no alternative where we can collaborate more with the Calcite
> > > > >>>>>>>>>> guys, like verifying new features so bugs are caught sooner?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On 22/04/2022 09:31, godfrey he wrote:
> > > > >>>>>>>>>>> Dear devs,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I would like to open a discussion on the fact that currently much
> > > > >>>>>>>>>>> Flink SQL function development relies on Calcite releases, which seriously
> > > > >>>>>>>>>>> blocks the release of some Flink SQL features.
> > > > >>>>>>>>>>> Therefore, I would like to discuss whether it is possible to solve this
> > > > >>>>>>>>>>> problem by creating Flink's own Calcite repository.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Currently, Flink depends on Calcite-1.26, FLIP-204[1] relies on Calcite-1.30,
> > > > >>>>>>>>>>> and we recently want to fully support join-hints functionality in Flink-1.16,
> > > > >>>>>>>>>>> which relies on Calcite-1.31 (which may be released in two or three months).
> > > > >>>>>>>>>>> In order to support some new features or fix some bugs, we need to upgrade
> > > > >>>>>>>>>>> the Calcite version, but every time we upgrade the Calcite version
> > > > >>>>>>>>>>> (especially upgrades across multiple versions), the process is very tough:
> > > > >>>>>>>>>>> I remember clearly that the Calcite upgrade from 1.22 to 1.26 took two weeks
> > > > >>>>>>>>>>> of full-time work to complete.
> > > > >>>>>>>>>>> Currently, in order to fix some bugs while not upgrading the Calcite version,
> > > > >>>>>>>>>>> we copy the corresponding Calcite class directly into the Flink project
> > > > >>>>>>>>>>> and then modify it accordingly.[2] This approach is rather hacky and
> > > > >>>>>>>>>>> hard for code maintenance and upgrades.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> So, I had an idea whether we could solve this problem by maintaining a
> > > > >>>>>>>>>>> Calcite repository in the Flink community. This approach has been practiced
> > > > >>>>>>>>>>> within my company for many years.
> > > > >>>>>>>>>>> There are similar practices in the industry. For example, Apache Drill
> > > > >>>>>>>>>>> also maintains a separate Calcite repository[3].
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> The following is a brief analysis of the approach and the pros and
> > > > >>>>>>>>>>> cons of maintaining a separate repository.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Approach:
> > > > >>>>>>>>>>> 1. Where to put the code? https://github.com/flink-extended is a good place.
> > > > >>>>>>>>>>> 2. What extra code can be added to this repository? Only bug fixes and features
> > > > >>>>>>>>>>> that are already merged into Calcite can be cherry-picked to this repository.
> > > > >>>>>>>>>>> We also should try to push bug fixes to the Calcite community.
> > > > >>>>>>>>>>> Btw, the copied Calcite classes in the Flink project can be removed.
> > > > >>>>>>>>>>> 3. How to upgrade the Calcite version? Check out the target Calcite
> > > > >>>>>>>>>>> release branch and rebase our bug fix code. (As we upgrade, we will maintain
> > > > >>>>>>>>>>> fewer and fewer older bug fixes.) Then verify all of Calcite's tests and
> > > > >>>>>>>>>>> Flink's tests in the developer's local environment. If all tests pass,
> > > > >>>>>>>>>>> release the Calcite branch; otherwise fix the problems in the branch and re-test.
> > > > >>>>>>>>>>> After the branch is released, the version of Calcite in Flink
> > > > >>>>>>>>>>> can be upgraded. For example:
> > > > >>>>>>>>>>> check out a calcite-1.26.0-flink-v1-SNAPSHOT branch from calcite-1.26.0,
> > > > >>>>>>>>>>> move all the copied Calcite code in Flink to the branch, and pick all the
> > > > >>>>>>>>>>> hint-related changes from Calcite-1.31 to the branch. Then we can change the
> > > > >>>>>>>>>>> Calcite version in Flink to calcite-1.26.0-flink-v1-SNAPSHOT
> > > > >>>>>>>>>>> and verify all tests locally. Release calcite-1.26.0-flink-v1
> > > > >>>>>>>>>>> after all tests are successful.
> > > > >>>>>>>>>>> Finally, upgrade the Calcite version to
> > > > >>>>>>>>>>> calcite-1.26.0-flink-v10-flink-v1, and open a PR.
> > > > >>>>>>>>>>> 4. Who will maintain it? The maintenance workload is minimal, but the
> > > > >>>>>>>>>>> upgrade work is laborious (actually, it's similar to before). I can maintain
> > > > >>>>>>>>>>> it in the early stage and standardise the process.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Pros.
> > > > >>>>>>>>>>> 1. The release of Flink is decoupled from the release of Calcite,
> > > > >>>>>>>>>>>     making feature development and bug fixes quicker
> > > > >>>>>>>>>>> 2. Reduce the hassle of unnecessary Calcite upgrades
> > > > >>>>>>>>>>> 3. No hacking in Flink to maintain the copied Calcite code
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Cons.
> > > > >>>>>>>>>>> 1. Need to maintain an additional Calcite repository
> > > > >>>>>>>>>>> 2. Upgrades are a little more complicated than before
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Any feedback is very welcome!
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-204%3A+Introduce+Hash+Lookup+Join
> > > > >>>>>>>>>>> [2] https://github.com/apache/flink/tree/master/flink-table/flink-table-planner/src/main/java/org/apache/calcite
> > > > >>>>>>>>>>> [3] https://github.com/apache/drill/blob/master/pom.xml#L64
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Best,
> > > > >>>>>>>>>>> Godfrey
> > > > >>>>>>>>>>
> > > >
> > > >
> >
>
