Martijn, thanks for sharing that thread in the Flink community. I'm someone who has forked Calcite twice: once in Apache Drill and again in Dremio. In both cases, it was all about trading short-term benefits against long-term costs. In both cases, I think the net amount of work was probably 5x as much as what it would have been if we had just done a better job engaging the community. If I were to sketch the effort curve over six years, I'd guess that in both cases the numbers looked like this:

estimated effort doing high intensity integration with calcite (years 1-6)
fork: 1, 5, 10, 50, 100, 200, total = 366
non-fork: 10, 10, 10, 10, 10, total = 50

So yes, the first couple of years you're ahead. But you pay a massive technical debt premium long term. Early in a project's (Drill) or company's (Dremio) life, it can make sense to sacrifice the long term for the short term, but it's important people do it with their eyes open.

The reason this pain is so high is that as your codebases diverge, you start having to do everything the Calcite community does by yourself. Backports become harder, and things that you need (e.g. new SQL syntax) have to be reimplemented (even if someone else already implemented them in some post-fork Calcite version). Ultimately, at some point you realize that your path is untenable and you unfork. This becomes the biggest expense of them all, and I believe both of those teams are still trying to un-fork.

An even bigger problem is that your absence from the Calcite community means people may take the project or its APIs in directions that are in direct conflict with how you use the library. Since you're not active in the project, you fail to provide a counterpoint, and then you're basically just in a miserable place. The Hive project did this best: pre-release builds of Calcite were also run against Hive to make sure no major regressions occurred. Being active in the community is the best state from my pov. (It makes your project better and Calcite better.)

Two last notes:
- I'm not sure the rocks fork is comparable to forking Calcite. The API surface area and community models are very different.
- This is all based on a high intensity integration (using rules + planner or sql + rules + planner). Calcite is frustratingly monolithic, and if someone were only going to use a small component, my opinion would likely be very different.

I'd send this to the Flink list but I'm not subscribed.
It'd be great if you shared it with the people over there if you think they'd find it useful.

On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <martijnvis...@apache.org> wrote:

> Thanks Julian and Austin!
>
> Any reply to kick-off some sort of discussion is worthwhile :D
> I definitely know the feeling of having more PRs open than you would like,
> looking at https://github.com/apache/flink/pulls :)
>
> There have been discussions in the Flink community about forking Calcite
> [1]. My personal preference at the moment is to see if we can create a
> better collaboration and community. I believe that we can find people from
> the Flink community who can open / help reviewing Calcite PRs that are
> interesting for the Flink community. The question is if that will also help
> short term since in the end it still requires a Calcite maintainer to
> review/merge.
>
> Best regards,
>
> Martijn
>
> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
>
>
> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <whatwouldausti...@gmail.com> wrote:
>
> > From the peanut gallery :-) -->
> >
> > Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
> >
> > How can individuals from the Flink [sub-]community, and/or more general
> > calcite community help lighten this load? Is there much weight given to
> > reviews from non-committers; how to increase the # of people capable of
> > providing worthwhile reviews [ that are recognized as such ]?
> >
> >
> >
> > On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <jhyde.apa...@gmail.com>
> > wrote:
> >
> > > Martijn,
> > >
> > > Since you requested a reply, I am replying. To answer your question, I
> > > don’t know of a way to move this topic forward. We have more PRs than
> > > people to review them.
> > >
> > > Julian
> > >
> > >
> > > > On Jun 19, 2022, at 11:58 PM, Martijn Visser <martijnvis...@apache.org> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I just wanted to reach out to the Calcite community once more on this topic
> > > > since no reply was received. Would be great if someone could get back to us.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvis...@apache.org> wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I would like to follow-up on this email that was sent by Jing. So far, no
> > > >> progress has been made, despite reaching out to the mailing list, the
> > > >> original Jira ticket and reaching out to people directly. Is there a way
> > > >> that we can move this PR/topic forward?
> > > >>
> > > >> For context, in Apache Flink we're currently heavily using Calcite.
> > > >> However, we are now at the stage where Calcite is actually holding us back.
> > > >> It would be great if we can find a way to strengthen our bond and move both
> > > >> Calcite and Flink forward.
> > > >>
> > > >> Looking forward to your thoughts,
> > > >>
> > > >> Martijn
> > > >>
> > > >> On 2022/01/26 07:05:37 Jing Zhang wrote:
> > > >>> Hi community,
> > > >>> My apologies for interrupting.
> > > >>> Anyone could help to review the pr
> > > >>> https://github.com/apache/calcite/pull/2606?
> > > >>> Thanks a lot.
> > > >>>
> > > >>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> > > >>> extend existing Table function in order to support Polymorphic Table
> > > >>> Function which is introduced as the part of ANSI SQL 2016.
> > > >>>
> > > >>> The brief change logs of the PR are:
> > > >>>   - Update `Parser.jj` to support partition by clause and order by clause
> > > >>>     for input table with set semantics of PTF
> > > >>>   - Introduce `TableCharacteristics` which contains three characteristics
> > > >>>     of input table of table function
> > > >>>   - Update `SqlTableFunction` to add a method `tableCharacteristics`,
> > > >>>     the method returns the table characteristics for the ordinal-th argument
> > > >>>     to this table function. Default return value is Optional.empty which
> > > >>>     means the ordinal-th argument is not table.
> > > >>>   - Introduce `SqlSetSemanticsTable` which represents input table with
> > > >>>     set semantics of Table Function, its `SqlKind` is `SET_SEMANTICS_TABLE`
> > > >>>   - Updates `SqlValidatorImpl` to validate only set semantic table of
> > > >>>     Table Function could have partition by and order by clause
> > > >>>   - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery which
> > > >>>     represents set semantics table.
> > > >>>
> > > >>> PR: https://github.com/apache/calcite/pull/2606
> > > >>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> > > >>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> > > >>>
> > > >>> Best,
> > > >>> Jing Zhang
> > > >>>
> > > >>
> > > >
> > >