Hi everyone,

This is an awesome discussion on improving collaboration between different
projects.
Thanks to Julian, Jacques, Austin, Martijn, and Timo for their efforts to
make it happen.

Best,
Jing Zhang

On Thu, Jun 23, 2022 at 01:43, Martijn Visser <martijnvis...@apache.org> wrote:

> Hi Jacques, Julian, Austin and everyone else,
>
> Thank you very much for sharing all your experiences and providing really
> valuable input. I'll definitely relay this back to the original discussion
> thread in the Flink community. Part of bringing this information back to
> the Flink community is also because I feel like the only way that different
> OSS solutions can help each other forward is by communicating and
> collaborating. As Timo already mentioned, he'll try to help out. Let's try
> to get some more people involved.
>
> Side note: I also saw that this thread got some traction on Twitter [1] on
> the cost of forking.
>
> Best regards,
>
> Martijn
>
> [1] https://twitter.com/gunnarmorling/status/1539499415337111553?s=21&t=8fGk3PxScOx4FJPJWE5UeA
>
> On Wed, Jun 22, 2022 at 09:29, Timo Walther <twal...@apache.org> wrote:
>
> > Hi everyone,
> >
> > This is a really great discussion. Thanks for starting it, Martijn, and
> > for your input, Jacques! I have been fighting against forking Calcite in
> > Flink for years already. Even when merging forks of Flink that
> > transitively forked Calcite, in the end we were able to resolve
> > conflicts / contribute blockers back into Calcite. And I strongly
> > believe that this is the better approach for long-term success for both
> > projects.
> >
> > I would like to get more involved in the Calcite community. I have been
> > implementing and managing Flink SQL based on Calcite since 2016. Thus, I
> > feel confident to say that I know the code base and some quirks in the
> > stack very well.
> >
> > Capacity-wise I will try to reserve some time for helping the Calcite
> > community. Happy to get some pointers where and how I can help.
> >
> > I will take a look at https://github.com/apache/calcite/pull/2606 this
> > week to get the ball rolling, as this is an important addition and
> > prepares for "customer SQL operators" in Flink SQL.
> >
> > Regards,
> > Timo
> >
> > On 21.06.22 22:18, Charles Givre wrote:
> > > As the PMC for Apache Drill, I'd echo everyone's comments here....
> > > Don't fork.   Don't do it.
> > >
> > > Apache Drill forked Calcite several years ago, when Calcite was on
> > > version 1.20 or 1.21.  While this meant that some bugs were easily
> > > fixed, it also meant that as our fork diverged from "regular" Calcite,
> > > it became harder and harder to maintain.  It also meant that we were
> > > chasing bugs that had since been fixed.
> > >
> > > Drill is in the process of "de-forking" Calcite, meaning that we're
> > > ditching our fork and re-integrating with standard Calcite.  It has
> > > been A TON of work and we have contributed (and will continue to
> > > contribute) bug fixes and PRs to Calcite. In the long run, I think
> > > this will be beneficial for both communities.
> > >
> > > Best,
> > > -- C
> > >
> > >
> > >> On Jun 21, 2022, at 1:57 PM, Julian Hyde <jhyde.apa...@gmail.com>
> > >> wrote:
> > >>
> > >> Please don’t fork Calcite.
> > >>
> > >> Calcite suffers from the tragedy of the commons. Unlike many open
> > >> source data projects, there is no commercial project that directly
> > >> maps to Calcite (even though Calcite is an essential part of many
> > >> projects). As a result no engineers work full-time on Calcite.
> > >>
> > >> It takes more than pull requests to keep a project going. We need
> > >> reviewers, people to work on releases, people to fix bugs (such as
> > >> security bugs) that are important to everyone but urgent to no one.
> > >>
> > >> We have plenty of committers in Calcite, and add several more per
> > >> year. We rely on those committers taking on their share of the
> > >> housework, but the burden falls on too few people.
> > >>
> > >> Engineering managers need to start paying a little more for the “free
> > >> lunch” that they enjoy when Calcite “just works” in their project.
> > >> Sadly, most engineering managers are not subscribed to this list.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau <jacq...@apache.org>
> > >>> wrote:
> > >>>
> > >>> Martijn, thanks for sharing that thread in the Flink community.
> > >>>
> > >>> I'm someone who has forked Calcite twice: once in Apache Drill and
> > >>> again in Dremio. In both cases, it was all about trading short term
> > >>> benefits against long term costs. In both cases, I think the net
> > >>> amount of work was probably 5x as much as what it would have been if
> > >>> we had just done a better job engaging the community. If I were to
> > >>> state the curve of effort over six years, I'd guess that in both
> > >>> cases the numbers looked like this:
> > >>>
> > >>> estimated effort doing high intensity integration with calcite
> > >>> (years 1-6)
> > >>> fork: 1, 5, 10, 50, 100, 200, total = 366
> > >>> non-fork: 10, 10, 10, 10, 10, total = 50
> > >>>
> > >>> So yes, the first couple years you're ahead. But you pay a massive
> > >>> technical debt premium long term. Early in a project (Drill) or
> > >>> company's life (Dremio), it can make sense to sacrifice long term for
> > >>> short term but it's important people do it with their eyes open.
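
As a quick check on the effort figures quoted above, here is a minimal sketch that accumulates both series year by year and finds where the fork becomes the more expensive option overall. It rests on one assumption that is not in the original message: the quoted non-fork line lists only five values for six years, so the sketch treats non-fork effort as a flat 10 per year for all six years, which gives a total of 60 rather than 50. The variable names are illustrative only.

```python
from itertools import accumulate

# Yearly effort figures as quoted in the message (arbitrary units, years 1-6).
fork = [1, 5, 10, 50, 100, 200]

# ASSUMPTION: a flat 10 per year for all six years. The quoted line lists
# only five values, so the sixth is filled in here for illustration.
non_fork = [10] * 6

# Running totals per year for each strategy.
fork_cumulative = list(accumulate(fork))
non_fork_cumulative = list(accumulate(non_fork))

# First year in which the fork's cumulative effort exceeds the non-fork's.
crossover_year = next(
    year
    for year, (f, n) in enumerate(zip(fork_cumulative, non_fork_cumulative), start=1)
    if f > n
)

for year, (f, n) in enumerate(zip(fork_cumulative, non_fork_cumulative), start=1):
    print(f"year {year}: fork={f:>3}  non-fork={n:>3}")
print(f"fork becomes more expensive overall in year {crossover_year}")
```

Under that assumption the fork is cheaper in total through year 3 and more expensive from year 4 onward, ending at 366 against 60, roughly six times the non-fork effort.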
> > >>>
> > >>> The reason that this pain is so high is that as your codebases
> > >>> diverge, you start having to do everything the Calcite community does
> > >>> by yourself. Backports become harder and things that you need (e.g.
> > >>> new sql syntax, etc) have to be reimplemented (even if someone else
> > >>> already implemented them in some post-fork Calcite version).
> > >>> Ultimately, at some point you realize that your path is untenable and
> > >>> you unfork. This becomes the biggest expense of them all and I
> > >>> believe both of those teams are still trying to un-fork. The
> > >>> additional thing that becomes an even bigger problem is that your
> > >>> absence from the Calcite community means that people may take the
> > >>> project or APIs in directions that are in direct conflict with how
> > >>> you use the library. Since you're not active in the project, you fail
> > >>> to provide a counterpoint and then you're basically just in a
> > >>> miserable place. The Hive project did this best by ensuring that
> > >>> releases of Calcite were also run pre-release against Hive to make
> > >>> sure no major regressions occurred. Being in the community and active
> > >>> is the best state from my pov. (It makes your project better and
> > >>> Calcite better.)
> > >>>
> > >>> Two last notes:
> > >>> - I'm not sure the rocks fork is comparable to forking Calcite. The
> > >>> api surface area and community models are very different.
> > >>> - This is all based on a high intensity integration (using rules +
> > >>> planner or sql + rules + planner). Calcite is frustratingly monolithic
> > >>> and if someone was only going to use a small component, my opinion
> > >>> would likely be very different.
> > >>>
> > >>> I'd send this to the Flink list but I'm not subscribed. It'd be great
> > >>> if you shared it with the people over there if you think they'd find
> > >>> it useful.
> > >>>
> > >>>
> > >>>
> > >>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <
> > >>> martijnvis...@apache.org> wrote:
> > >>>
> > >>>> Thanks Julian and Austin!
> > >>>>
> > >>>> Any reply to kick off some sort of discussion is worthwhile :D
> > >>>> I definitely know the feeling of having more PRs open than you would
> > >>>> like, looking at https://github.com/apache/flink/pulls :)
> > >>>>
> > >>>> There have been discussions in the Flink community about forking
> > >>>> Calcite [1]. My personal preference at the moment is to see if we can
> > >>>> create a better collaboration and community. I believe that we can
> > >>>> find people from the Flink community who can open / help review
> > >>>> Calcite PRs that are interesting for the Flink community. The
> > >>>> question is if that will also help short term since in the end it
> > >>>> still requires a Calcite maintainer to review/merge.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Martijn
> > >>>>
> > >>>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
> > >>>>
> > >>>>
> > >>>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <
> > >>>> whatwouldausti...@gmail.com> wrote:
> > >>>>
> > >>>>>  From the peanut gallery :-)  -->
> > >>>>>
> > >>>>> Wow; yes, lots of open PRs. https://github.com/apache/calcite/pulls
> > >>>>>
> > >>>>> How can individuals from the Flink [sub-]community, and/or more
> > >>>>> general calcite community help lighten this load?  Is there much
> > >>>>> weight given to reviews from non-committers; how to increase the #
> > >>>>> of people capable of providing worthwhile reviews [ that are
> > >>>>> recognized as such ]?
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde <
> > >>>>> jhyde.apa...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Martijn,
> > >>>>>>
> > >>>>>> Since you requested a reply, I am replying. To answer your
> > >>>>>> question, I don’t know of a way to move this topic forward. We
> > >>>>>> have more PRs than people to review them.
> > >>>>>>
> > >>>>>> Julian
> > >>>>>>
> > >>>>>>
> > >>>>>>> On Jun 19, 2022, at 11:58 PM, Martijn Visser <
> > >>>>>>> martijnvis...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>> Hi everyone,
> > >>>>>>>
> > >>>>>>> I just wanted to reach out to the Calcite community once more on
> > >>>>>>> this topic since no reply was received. Would be great if someone
> > >>>>>>> could get back to us.
> > >>>>>>>
> > >>>>>>> Best regards,
> > >>>>>>>
> > >>>>>>> Martijn
> > >>>>>>>
> > >>>>>>> On Wed, Jun 8, 2022 at 11:24, Martijn Visser <
> > >>>>>>> martijnvis...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> Hi everyone,
> > >>>>>>>>
> > >>>>>>>> I would like to follow up on this email that was sent by Jing.
> > >>>>>>>> So far, no progress has been made, despite reaching out to the
> > >>>>>>>> mailing list, the original Jira ticket and reaching out to people
> > >>>>>>>> directly. Is there a way that we can move this PR/topic forward?
> > >>>>>>>>
> > >>>>>>>> For context, in Apache Flink we're currently heavily using
> > >>>>>>>> Calcite. However, we are now at the stage where Calcite is
> > >>>>>>>> actually holding us back. It would be great if we can find a way
> > >>>>>>>> to strengthen our bond and move both Calcite and Flink forward.
> > >>>>>>>>
> > >>>>>>>> Looking forward to your thoughts,
> > >>>>>>>>
> > >>>>>>>> Martijn
> > >>>>>>>>
> > >>>>>>>> On 2022/01/26 07:05:37 Jing Zhang wrote:
> > >>>>>>>>> Hi community,
> > >>>>>>>>> My apologies for interrupting.
> > >>>>>>>>> Could anyone help review the PR
> > >>>>>>>>> https://github.com/apache/calcite/pull/2606?
> > >>>>>>>>> Thanks a lot.
> > >>>>>>>>>
> > >>>>>>>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira
> > >>>>>>>>> aims to extend the existing table function in order to support
> > >>>>>>>>> Polymorphic Table Functions (PTF), which were introduced as part
> > >>>>>>>>> of ANSI SQL 2016.
> > >>>>>>>>>
> > >>>>>>>>> The brief change log of the PR is:
> > >>>>>>>>> - Update `Parser.jj` to support the partition by clause and
> > >>>>>>>>> order by clause for an input table with set semantics of a PTF
> > >>>>>>>>> - Introduce `TableCharacteristics`, which contains three
> > >>>>>>>>> characteristics of an input table of a table function
> > >>>>>>>>> - Update `SqlTableFunction` to add a method
> > >>>>>>>>> `tableCharacteristics`, which returns the table characteristics
> > >>>>>>>>> for the ordinal-th argument to this table function. The default
> > >>>>>>>>> return value is Optional.empty, which means the ordinal-th
> > >>>>>>>>> argument is not a table.
> > >>>>>>>>> - Introduce `SqlSetSemanticsTable`, which represents an input
> > >>>>>>>>> table with set semantics of a table function; its `SqlKind` is
> > >>>>>>>>> `SET_SEMANTICS_TABLE`
> > >>>>>>>>> - Update `SqlValidatorImpl` to validate that only a set semantics
> > >>>>>>>>> table of a table function can have partition by and order by
> > >>>>>>>>> clauses
> > >>>>>>>>> - Update `SqlToRelConverter#substituteSubQuery` to parse the
> > >>>>>>>>> subQuery which represents a set semantics table.
> > >>>>>>>>>
> > >>>>>>>>> PR: https://github.com/apache/calcite/pull/2606
> > >>>>>>>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> > >>>>>>>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> > >>>>>>>>>
> > >>>>>>>>> Best,
> > >>>>>>>>> Jing Zhang
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>
> > >
> >
> >
>
