I completely agree with Julian. The problem cannot be solved unless we start investing more time in the project in the ways he already described.
What I outlined previously is an attempt to mitigate the current situation, not something that can solve the problem for good.

Nevertheless, to push this forward I created a PR [1] with an initial sketch of the process. Feel free to leave your comments there.

Best,
Stamatis

[1] https://github.com/apache/calcite/pull/2851

On Thu, Jun 23, 2022 at 8:34 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:

> +1 to Stamatis’ idea. It won’t make things worse. :)
>
> But to repeat what I said earlier. We need existing committers to pull their weight. If necessary, committers need to talk to their managers and get time allocated to contribute to “housekeeping”.
>
> One important kind of housekeeping is productization. That means not just getting features and bug fixes into Calcite, but adding sufficient documentation that users know they exist and how to use them. You may have noticed that I spend a lot of effort asking people to improve the subject and description of JIRA cases, and making sure that the commit message matches the JIRA subject. I do this because usually the only documentation of a feature is the line in the release notes and the JIRA case it links to.
>
> This effort is key to Calcite’s success, and quite a few committers don’t do it. If committers did a better job in this area, it would reduce the workload on me.
>
> Julian
>
> > On Jun 23, 2022, at 6:44 AM, Ruben Q L <rube...@gmail.com> wrote:
> >
> > +1 on Stamatis' idea, I think it could help with the current situation of lack of reviewers.
> >
> > Best,
> > Ruben
> >
> > On Thu, Jun 23, 2022 at 12:56 PM Charles Givre <cgi...@gmail.com> wrote:
> >
> >> Hello all,
> >> FWIW, If a committer/reviewer shortage is the issue, I'd second Stamatis's recommendation.
> >> Best,
> >> -- C
> >>
> >>> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis <zabe...@gmail.com> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> How about granting Calcite committership to people who are already ASF committers (in other projects) and have a proven record of working with Calcite?
> >>>
> >>> Usually the PMC invites people to become committers to the project after having a few successful code contributions in Calcite/Avatica repos.
> >>> This is to ensure that people are familiar with the codebase and understand how the ASF works.
> >>>
> >>> People who are already committers in an ASF project already know how the foundation works and how they should behave.
> >>> Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix, etc., may already be quite familiar with Calcite if they have worked on the query processing layer of the system.
> >>>
> >>> It might be difficult for the Calcite PMC to identify people familiar with Calcite if they don't contribute to the main Calcite/Avatica repos regularly, thus I would be open to considering people for committership on a per request basis.
> >>>
> >>> Example:
> >>> Bob is an ASF committer in Flink and he has pushed various contributions around Calcite in the Flink repo.
> >>> Bob feels confident about fixing trivial things in Calcite and he wants to help with reviewing and merging open PRs.
> >>> Bob sends an email to private@calcite list requesting to become a Calcite committer.
> >>> Bob explains in the email who he is and what he has done to demonstrate he is familiar with the Calcite code.
> >>> The Calcite PMC acknowledges the request and starts a vote for granting Calcite committership to Bob.
> >>> The Calcite PMC informs Bob about their decision and takes further actions if necessary.
> >>>
> >>> If we agree on the overall idea we can figure out the details and formalize the request process in our docs.
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang <beyond1...@gmail.com> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> This is an awesome discussion to improve collaboration between different projects.
> >>>> Thanks to Julian, Jacques, Austin, Martijn, and Timo for their effort to make it happen.
> >>>>
> >>>> Best,
> >>>> Jing Zhang
> >>>>
> >>>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>
> >>>>> Hi Jacques, Julian, Austin and everyone else,
> >>>>>
> >>>>> Thank you very much for sharing all your experiences and providing really valuable input. I'll definitely relay this back to the original discussion thread in the Flink community. Part of bringing this information back to the Flink community is also because I feel like the only way that different OSS solutions can help each other forward is by communicating and collaborating. As Timo already mentioned, he'll try to help out. Let's try to get some more involved.
> >>>>>
> >>>>> Side note: I also saw that this thread got some traction on Twitter [1] on the cost of forking.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Martijn
> >>>>>
> >>>>> [1] https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
> >>>>>
> >>>>> On Wed, Jun 22, 2022 at 09:29, Timo Walther <twal...@apache.org> wrote:
> >>>>>
> >>>>>> Hi everyone,
> >>>>>>
> >>>>>> This is a really great discussion. Thanks for starting it Martijn and your input Jacques! I have been fighting against forking Calcite in Flink for years already.
> >>>>>> Even when merging forks of Flink that transitively forked Calcite, in the end we were able to resolve conflicts / contribute blockers back into Calcite. And I strongly believe that this is the better approach for long-term success for both projects.
> >>>>>>
> >>>>>> I would like to get more involved in the Calcite community. I have been implementing and managing Flink SQL based on Calcite since 2016. Thus, I feel confident to say that I know the code base and some quirks in the stack very well.
> >>>>>>
> >>>>>> Capacity-wise I will try to reserve some time for helping the Calcite community. Happy to get some pointers where and how I can help.
> >>>>>>
> >>>>>> I will take a look at https://github.com/apache/calcite/pull/2606 this week to get the ball rolling, as this is an important addition and prepares for "customer SQL operators" in Flink SQL.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Timo
> >>>>>>
> >>>>>> On 21.06.22 22:18, Charles Givre wrote:
> >>>>>>> As the PMC for Apache Drill, I'd echo everyone's comments here.... Don't fork. Don't do it.
> >>>>>>>
> >>>>>>> Apache Drill forked Calcite several years ago when Calcite was on version 1.20 or 1.21. While this meant that some bugs were easily fixed, it also meant that as our fork diverged from "regular" Calcite, it became harder and harder to maintain. It also meant that we were chasing bugs that had since been fixed.
> >>>>>>>
> >>>>>>> Drill is in the process of "de-forking" Calcite, meaning that we're ditching our fork and re-integrating with standard Calcite. It has been A TON of work and we have contributed (and will continue to contribute) bug fixes and PRs to Calcite. In the long run, I think this will be beneficial for both communities.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>> -- C
> >>>>>>>
> >>>>>>>> On Jun 21, 2022, at 1:57 PM, Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Please don’t fork Calcite.
> >>>>>>>>
> >>>>>>>> Calcite suffers from the tragedy of the commons. Unlike many open source data projects, there is no commercial project that directly maps to Calcite (even though Calcite is an essential part of many projects). As a result no engineers work full-time on Calcite.
> >>>>>>>>
> >>>>>>>> It takes more than pull requests to keep a project going. We need reviewers, people to work on releases, people to fix bugs (such as security bugs) that are important to everyone but urgent to no one.
> >>>>>>>>
> >>>>>>>> We have plenty of committers in Calcite, and add several more per year. We rely on those committers taking on their share of the housework, but the burden falls on too few people.
> >>>>>>>>
> >>>>>>>> Engineering managers need to start paying a little more for the “free lunch” that they enjoy when Calcite “just works” in their project. Sadly, most engineering managers are not subscribed to this list.
> >>>>>>>>
> >>>>>>>> Julian
> >>>>>>>>
> >>>>>>>>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <jacq...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>> Martijn, thanks for sharing that thread in the Flink community.
> >>>>>>>>>
> >>>>>>>>> I'm someone who has forked Calcite twice: once in Apache Drill and again in Dremio. In both cases, it was all about trading short term benefits against long term costs. In both cases, I think the net amount of work was probably 5x as much as what it would have been if we had just done a better job engaging the community.
> >>>>>>>>> If I were to state the curve of behavior over six years, I'd guess that in both cases the numbers of effort looked like this:
> >>>>>>>>>
> >>>>>>>>> estimated effort doing high intensity integration with calcite (years 1-6)
> >>>>>>>>> fork: 1, 5, 10, 50, 100, 200, total = 366
> >>>>>>>>> non-fork: 10, 10, 10, 10, 10, total = 50
> >>>>>>>>>
> >>>>>>>>> So yes, the first couple years you're ahead. But you pay a massive technical debt premium long term. Early in a project (Drill) or company's life (Dremio), it can make sense to sacrifice long term for short term but it's important people do it with their eyes open.
> >>>>>>>>>
> >>>>>>>>> The reason that this pain is so high is that as your codebases diverge, you start having to do everything the Calcite community does by yourself. Backports become harder and things that you need (e.g. new sql syntax, etc) have to be reimplemented (even if someone else already implemented them in some post-fork Calcite version). Ultimately, at some point you realize that your path is untenable and you unfork. This becomes the biggest expense of them all and I believe both of those teams are still trying to un-fork. The additional thing that becomes an even bigger problem is your absence from the Calcite community means that people may take the project or APIs in ways that are in direct conflict to how you use the library. Since you're not active in the project, you fail to provide a counterpoint and then you're basically just in a miserable place.
> >>>>>>>>> The Hive project did this best by ensuring that releases of Calcite were also run pre-release against Hive to make sure no major regressions occurred. By being in the community and active, this is the best state from my pov. (It makes your project better and Calcite better.)
> >>>>>>>>>
> >>>>>>>>> Two last notes:
> >>>>>>>>> - I'm not sure the rocks fork is comparable to forking Calcite. The api surface area and community models are very different.
> >>>>>>>>> - This is all based on a high intensity integration (using rules + planner or sql + rules + planner). Calcite is frustratingly monolithic and if someone was only going to use a small component, my opinion would likely be very different.
> >>>>>>>>>
> >>>>>>>>> I'd send this to the Flink list but I'm not subscribed. It'd be great if you shared it with the people over there if you think they'd find it useful.
> >>>>>>>>>
> >>>>>>>>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>>>>>
> >>>>>>>>>> Thanks Julian and Austin!
> >>>>>>>>>>
> >>>>>>>>>> Any reply to kick-off some sort of discussion is worthwhile :D
> >>>>>>>>>> I definitely know the feeling of having more PRs open than you would like, looking at https://github.com/apache/flink/pulls :)
> >>>>>>>>>>
> >>>>>>>>>> There have been discussions in the Flink community about forking Calcite [1]. My personal preference at the moment is to see if we can create a better collaboration and community.
> >>>>>>>>>> I believe that we can find people from the Flink community who can open / help reviewing Calcite PRs that are interesting for the Flink community. The question is if that will also help short term since in the end it still requires a Calcite maintainer to review/merge.
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>>
> >>>>>>>>>> Martijn
> >>>>>>>>>>
> >>>>>>>>>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> >>>>>>>>>>
> >>>>>>>>>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <whatwouldausti...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> From the peanut gallery :-) -->
> >>>>>>>>>>>
> >>>>>>>>>>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
> >>>>>>>>>>>
> >>>>>>>>>>> How can individuals from the Flink [sub-]community, and/or more general calcite community help lighten this load? Is there much weight given to reviews from non-committers; how to increase the # of people capable of providing worthwhile reviews [ that are recognized as such ]?
> >>>>>>>>>>>
> >>>>>>>>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Martijn,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Since you requested a reply, I am replying. To answer your question, I don’t know of a way to move this topic forward. We have more PRs than people to review them.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Julian
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I just wanted to reach out to the Calcite community once more on this topic since no reply was received. Would be great if someone could get back to us.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvis...@apache.org> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hi everyone,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I would like to follow-up on this email that was sent by Jing. So far, no progress has been made, despite reaching out to the mailing list, the original Jira ticket and reaching out to people directly. Is there a way that we can move this PR/topic forward?
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> For context, in Apache Flink we're currently heavily using Calcite. However, we are now at the stage where Calcite is actually holding us back. It would be great if we can find a way to strengthen our bond and move both Calcite and Flink forward.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Looking forward to your thoughts,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Martijn
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> >>>>>>>>>>>>>>> Hi community,
> >>>>>>>>>>>>>>> My apologies for interrupting.
> >>>>>>>>>>>>>>> Could anyone help to review the PR https://github.com/apache/calcite/pull/2606?
> >>>>>>>>>>>>>>> Thanks a lot.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to extend the existing table function in order to support Polymorphic Table Functions (PTF), which were introduced as part of ANSI SQL 2016.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> The brief change logs of the PR are:
> >>>>>>>>>>>>>>> - Update `Parser.jj` to support the partition by clause and order by clause for an input table with set semantics of a PTF
> >>>>>>>>>>>>>>> - Introduce `TableCharacteristics`, which contains three characteristics of an input table of a table function
> >>>>>>>>>>>>>>> - Update `SqlTableFunction` to add a method `tableCharacteristics`; the method returns the table characteristics for the ordinal-th argument to this table function. The default return value is Optional.empty, which means the ordinal-th argument is not a table.
> >>>>>>>>>>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input table with set semantics of a table function; its `SqlKind` is `SET_SEMANTICS_TABLE`
> >>>>>>>>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics table of a table function can have a partition by or order by clause
> >>>>>>>>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse the subQuery which represents a set semantics table.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
> >>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> >>>>>>>>>>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>> Jing Zhang