Re: PR Review Request

2022-07-05 Thread Stamatis Zampetakis
I completely agree with Julian. The problem cannot be solved unless we
start investing more time in the project in the ways he already described.

What I outlined previously is an attempt to mitigate the current situation,
not something that can solve the problem for good. Nevertheless, to push
this forward I created a PR [1] with an initial sketch of the process. Feel
free to leave your comments there.

Best,
Stamatis

[1] https://github.com/apache/calcite/pull/2851

On Thu, Jun 23, 2022 at 8:34 PM Julian Hyde  wrote:

> +1 to Stamatis’ idea. It won’t make things worse. :)
>
> But to repeat what I said earlier. We need existing committers to pull
> their weight. If necessary, committers need to talk to their managers and
> get time allocated to contribute to “housekeeping”.
>
> One important kind of housekeeping is productization. That means not just
> getting features and bug fixes into Calcite, but adding sufficient
> documentation that users know they exist and how to use them. You may have
> noticed that I spend a lot of effort asking people to improve the subject
> and description of JIRA cases, and making sure that the commit message
> matches the JIRA subject. I do this because usually the only documentation
> of a feature is the line in the release notes and the JIRA case it links to.
>
> This effort is key to Calcite’s success, and quite a few committers don’t
> do it. If committers did a better job in this area, it would reduce the
> workload on me.
>
> Julian
>
>
>
> > On Jun 23, 2022, at 6:44 AM, Ruben Q L  wrote:
> >
> > +1 on Stamatis' idea, I think it could help with the current situation of
> > lack of reviewers.
> >
> > Best,
> > Ruben
> >
> >
> > On Thu, Jun 23, 2022 at 12:56 PM Charles Givre  wrote:
> >
> >> Hello all,
> >> FWIW, If a committer/reviewer shortage is the issue, I'd second
> Stamatis's
> >> recommendation.
> >> Best,
> >> -- C
> >>
> >>> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis 
> >> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> How about granting Calcite committership to people who are already ASF
> >>> committers (in other projects) and they have a proven record of working
> >>> with Calcite?
> >>>
> >>> Usually the PMC invites people to become committers to the project
> after
> >>> having a few successful code contributions in Calcite/Avatica repos.
> >>> This is to ensure that people are familiar with the codebase and
> >> understand
> >>> how the ASF works.
> >>>
> >>> People who are already committers in an ASF project already know how
> the
> >>> foundation works and how they should behave.
> >>> Also people working in projects like Drill, Flink, Hive, Ignite,
> Phoenix,
> >>> etc., may already be quite familiar with Calcite if they have worked on
> >> the
> >>> query processing layer of the system.
> >>>
> >>> It might be difficult for the Calcite PMC to identify people familiar
> >> with
> >>> Calcite if they don't contribute to the main Calcite/Avatica repos
> >>> regularly thus I would be open to consider people for committers on a
> per
> >>> request basis.
> >>>
> >>> Example:
> >>> Bob is an ASF committer in Flink and he has pushed various
> contributions
> >>> around Calcite in the Flink repo.
> >>> Bob feels confident about fixing trivial things in Calcite and he wants
> >> to
> >>> help with reviewing and merging open PRs.
> >>> Bob sends an email to private@calcite list requesting to become a
> >> Calcite
> >>> committer.
> >>> Bob explains in the email who he is and what he has done to demonstrate
> >> he
> >>> is familiar with the Calcite code.
> >>> The Calcite PMC acknowledges the request and starts a vote for granting
> > >>> Calcite committership to Bob.
> >>> The Calcite PMC informs Bob about their decision and takes further
> >> actions
> >>> if necessary.
> >>>
> >>> If we agree on the overall idea we can figure out the details and
> >> formalize
> >>> the request process in our docs.
> >>>
> >>> Best,
> >>> Stamatis
> >>>
> >>> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang 
> wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> This is an awesome discussion to improve collaborating between different
> >>>> projects.
> >>>> Thanks Julian, Jacques, Austin, Martijn, Timo's effort to make it happen.
> >>>>
> >>>> Best,
> >>>> Jing Zhang
> >>>>
> >>>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
> >>>>
> >>>>> Hi Jacques, Julian, Austin and everyone else,
> >>>>>
> >>>>> Thank you very much for sharing all your experiences and providing really
> >>>>> valuable input. I'll definitely relay this back to the original discussion
> >>>>> thread in the Flink community. Part of bringing this information back to
> >>>>> the Flink community is also because I feel like the only way that different
> >>>>> OSS solutions can help each other forward is by communicating and
> >>>>> collaborating. As Timo already mentioned, he'll try to help out. Let's try
> >>>>> to get some more involved.
> >>>>>
> >>>>> Side note: I also saw that this

Re: PR Review Request

2022-06-23 Thread Julian Hyde
+1 to Stamatis’ idea. It won’t make things worse. :)

But to repeat what I said earlier: we need existing committers to pull their
weight. If necessary, committers need to talk to their managers and get time
allocated to contribute to “housekeeping”.

One important kind of housekeeping is productization. That means not just 
getting features and bug fixes into Calcite, but adding sufficient 
documentation that users know they exist and how to use them. You may have 
noticed that I spend a lot of effort asking people to improve the subject and 
description of JIRA cases, and making sure that the commit message matches the 
JIRA subject. I do this because usually the only documentation of a feature is 
the line in the release notes and the JIRA case it links to.

This effort is key to Calcite’s success, and quite a few committers don’t do 
it. If committers did a better job in this area, it would reduce the workload 
on me.

Julian



> On Jun 23, 2022, at 6:44 AM, Ruben Q L  wrote:
> 
> +1 on Stamatis' idea, I think it could help with the current situation of
> lack of reviewers.
> 
> Best,
> Ruben
> 
> 
> On Thu, Jun 23, 2022 at 12:56 PM Charles Givre  wrote:
> 
>> Hello all,
>> FWIW, If a committer/reviewer shortage is the issue, I'd second Stamatis's
>> recommendation.
>> Best,
>> -- C
>> 
>>> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis 
>> wrote:
>>> 
>>> Hi all,
>>> 
>>> How about granting Calcite committership to people who are already ASF
>>> committers (in other projects) and they have a proven record of working
>>> with Calcite?
>>> 
>>> Usually the PMC invites people to become committers to the project after
>>> having a few successful code contributions in Calcite/Avatica repos.
>>> This is to ensure that people are familiar with the codebase and
>> understand
>>> how the ASF works.
>>> 
>>> People who are already committers in an ASF project already know how the
>>> foundation works and how they should behave.
>>> Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
>>> etc., may already be quite familiar with Calcite if they have worked on
>> the
>>> query processing layer of the system.
>>> 
>>> It might be difficult for the Calcite PMC to identify people familiar
>> with
>>> Calcite if they don't contribute to the main Calcite/Avatica repos
>>> regularly thus I would be open to consider people for committers on a per
>>> request basis.
>>> 
>>> Example:
>>> Bob is an ASF committer in Flink and he has pushed various contributions
>>> around Calcite in the Flink repo.
>>> Bob feels confident about fixing trivial things in Calcite and he wants
>> to
>>> help with reviewing and merging open PRs.
>>> Bob sends an email to private@calcite list requesting to become a
>> Calcite
>>> committer.
>>> Bob explains in the email who he is and what he has done to demonstrate
>> he
>>> is familiar with the Calcite code.
>>> The Calcite PMC acknowledges the request and starts a vote for granting
>>> Calcite committership to Bob.
>>> The Calcite PMC informs Bob about their decision and takes further
>> actions
>>> if necessary.
>>> 
>>> If we agree on the overall idea we can figure out the details and
>> formalize
>>> the request process in our docs.
>>> 
>>> Best,
>>> Stamatis
>>> 
>>> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang  wrote:
>>> 
>>>> Hi everyone,
>>>>
>>>> This is an awesome discussion to improve collaborating between different
>>>> projects.
>>>> Thanks Julian, Jacques, Austin, Martijn, Timo's effort to make it happen.
>>>>
>>>> Best,
>>>> Jing Zhang
>>>>
>>>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
>>>>
>>>>> Hi Jacques, Julian, Austin and everyone else,
>>>>>
>>>>> Thank you very much for sharing all your experiences and providing really
>>>>> valuable input. I'll definitely relay this back to the original discussion
>>>>> thread in the Flink community. Part of bringing this information back to
>>>>> the Flink community is also because I feel like the only way that different
>>>>> OSS solutions can help each other forward is by communicating and
>>>>> collaborating. As Timo already mentioned, he'll try to help out. Let's try
>>>>> to get some more involved.
>>>>>
>>>>> Side note: I also saw that this thread got some traction on Twitter [1] on
>>>>> the cost of forking.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Martijn
>>>>>
>>>>> [1]
>>>>> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA
>>>>>
>>>>> On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> This is a really great discussion. Thanks for starting it Martijn and
>>>>>> your input Jacques! I have been fighting against forking Calcite in
>>>>>> Flink for years already. Even when merging forks of Flink that
>>>>>> transitively forked Calcite, in the end we were able to resolve
>>>>>> conflicts / contribute blockers back into Calcite. And I strongly
>>>>>> believe that this is the

Re: PR Review Request

2022-06-23 Thread Ruben Q L
+1 on Stamatis' idea, I think it could help with the current situation of
lack of reviewers.

Best,
Ruben


On Thu, Jun 23, 2022 at 12:56 PM Charles Givre  wrote:

> Hello all,
> FWIW, If a committer/reviewer shortage is the issue, I'd second Stamatis's
> recommendation.
> Best,
> -- C
>
> > On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis 
> wrote:
> >
> > Hi all,
> >
> > How about granting Calcite committership to people who are already ASF
> > committers (in other projects) and they have a proven record of working
> > with Calcite?
> >
> > Usually the PMC invites people to become committers to the project after
> > having a few successful code contributions in Calcite/Avatica repos.
> > This is to ensure that people are familiar with the codebase and
> understand
> > how the ASF works.
> >
> > People who are already committers in an ASF project already know how the
> > foundation works and how they should behave.
> > Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
> > etc., may already be quite familiar with Calcite if they have worked on
> the
> > query processing layer of the system.
> >
> > It might be difficult for the Calcite PMC to identify people familiar
> with
> > Calcite if they don't contribute to the main Calcite/Avatica repos
> > regularly thus I would be open to consider people for committers on a per
> > request basis.
> >
> > Example:
> > Bob is an ASF committer in Flink and he has pushed various contributions
> > around Calcite in the Flink repo.
> > Bob feels confident about fixing trivial things in Calcite and he wants
> to
> > help with reviewing and merging open PRs.
> > Bob sends an email to private@calcite list requesting to become a
> Calcite
> > committer.
> > Bob explains in the email who he is and what he has done to demonstrate
> he
> > is familiar with the Calcite code.
> > The Calcite PMC acknowledges the request and starts a vote for granting
> > Calcite committership to Bob.
> > The Calcite PMC informs Bob about their decision and takes further
> actions
> > if necessary.
> >
> > If we agree on the overall idea we can figure out the details and
> formalize
> > the request process in our docs.
> >
> > Best,
> > Stamatis
> >
> > On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang  wrote:
> >
> >> Hi everyone,
> >>
> >> This is an awesome discussion to improve collaborating between different
> >> projects.
> >> Thanks Julian, Jacques, Austin, Martijn, Timo's effort to make it
> happen.
> >>
> >> Best,
> >> Jing Zhang
> >>
> >> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
> >>
> >>> Hi Jacques, Julian, Austin and everyone else,
> >>>
> >>> Thank you very much for sharing all your experiences and providing
> really
> >>> valuable input. I'll definitely relay this back to the original
> >> discussion
> >>> thread in the Flink community. Part of bringing this information back
> to
> >>> the Flink community is also because I feel like the only way that
> >> different
> >>> OSS solutions can help each other forward is by communicating and
> >>> collaborating. As Timo already mentioned, he'll try to help out. Let's
> >> try
> >>> to get some more involved.
> >>>
> >>> Side note: I also saw that this thread got some traction on Twitter [1]
> >> on
> >>> the cost of forking.
> >>>
> >>> Best regards,
> >>>
> >>> Martijn
> >>>
> >>> [1]
> >>>
> >>>
> >>
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA
> >>>
> >>> On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
> >>>
> >>>> Hi everyone,
> >>>>
> >>>> This is a really great discussion. Thanks for starting it Martijn and
> >>>> your input Jacques! I have been fighting against forking Calcite in
> >>>> Flink for years already. Even when merging forks of Flink that
> >>>> transitively forked Calcite, in the end we were able to resolve
> >>>> conflicts / contribute blockers back into Calcite. And I strongly
> >>>> believe that this is the better approach for long-term success for both
> >>>> projects.
> >>>>
> >>>> I would like to get more involved in the Calcite community. I have been
> >>>> implementing and managing Flink SQL based on Calcite since 2016. Thus, I
> >>>> feel confident to say that I know the code base and some quirks in the
> >>>> stack very well.
> >>>>
> >>>> Capacity-wise I will try to reserve some time for helping the Calcite
> >>>> community. Happy to get some pointers where and how I can help.
> >>>>
> >>>> I will take a look at https://github.com/apache/calcite/pull/2606 this
> >>>> week to get the ball rolling. As this is an important addition and
> >>>> prepares for "customer SQL operators" in Flink SQL.
> >>>>
> >>>> Regards,
> >>>> Timo
> >>>>
> >>>> On 21.06.22 22:18, Charles Givre wrote:
> >>>>> As the PMC for Apache Drill, I'd echo everyone's comments here Don't
> >>>>> fork.   Don't do it.
> >>>>>
> >>>>> Apache Drill forked Calcite several years ago which Calcite was on
> >>>>> version 1.20 or 1.21.  While this meant that some bugs

Re: PR Review Request

2022-06-23 Thread Charles Givre
Hello all, 
FWIW, If a committer/reviewer shortage is the issue, I'd second Stamatis's 
recommendation.
Best,
-- C

> On Jun 23, 2022, at 7:02 AM, Stamatis Zampetakis  wrote:
> 
> Hi all,
> 
> How about granting Calcite committership to people who are already ASF
> committers (in other projects) and they have a proven record of working
> with Calcite?
> 
> Usually the PMC invites people to become committers to the project after
> having a few successful code contributions in Calcite/Avatica repos.
> This is to ensure that people are familiar with the codebase and understand
> how the ASF works.
> 
> People who are already committers in an ASF project already know how the
> foundation works and how they should behave.
> Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
> etc., may already be quite familiar with Calcite if they have worked on the
> query processing layer of the system.
> 
> It might be difficult for the Calcite PMC to identify people familiar with
> Calcite if they don't contribute to the main Calcite/Avatica repos
> regularly thus I would be open to consider people for committers on a per
> request basis.
> 
> Example:
> Bob is an ASF committer in Flink and he has pushed various contributions
> around Calcite in the Flink repo.
> Bob feels confident about fixing trivial things in Calcite and he wants to
> help with reviewing and merging open PRs.
> Bob sends an email to private@calcite list requesting to become a Calcite
> committer.
> Bob explains in the email who he is and what he has done to demonstrate he
> is familiar with the Calcite code.
> The Calcite PMC acknowledges the request and starts a vote for granting
> Calcite committership to Bob.
> The Calcite PMC informs Bob about their decision and takes further actions
> if necessary.
> 
> If we agree on the overall idea we can figure out the details and formalize
> the request process in our docs.
> 
> Best,
> Stamatis
> 
> On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang  wrote:
> 
>> Hi everyone,
>> 
>> This is an awesome discussion to improve collaborating between different
>> projects.
>> Thanks Julian, Jacques, Austin, Martijn, Timo's effort to make it happen.
>> 
>> Best,
>> Jing Zhang
>> 
>> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
>> 
>>> Hi Jacques, Julian, Austin and everyone else,
>>> 
>>> Thank you very much for sharing all your experiences and providing really
>>> valuable input. I'll definitely relay this back to the original
>> discussion
>>> thread in the Flink community. Part of bringing this information back to
>>> the Flink community is also because I feel like the only way that
>> different
>>> OSS solutions can help each other forward is by communicating and
>>> collaborating. As Timo already mentioned, he'll try to help out. Let's
>> try
>>> to get some more involved.
>>> 
>>> Side note: I also saw that this thread got some traction on Twitter [1]
>> on
>>> the cost of forking.
>>> 
>>> Best regards,
>>> 
>>> Martijn
>>> 
>>> [1]
>>> 
>>> 
>> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA
>>> 
>>> On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
>>> 
>>>> Hi everyone,
>>>>
>>>> This is a really great discussion. Thanks for starting it Martijn and
>>>> your input Jacques! I have been fighting against forking Calcite in
>>>> Flink for years already. Even when merging forks of Flink that
>>>> transitively forked Calcite, in the end we were able to resolve
>>>> conflicts / contribute blockers back into Calcite. And I strongly
>>>> believe that this is the better approach for long-term success for both
>>>> projects.
>>>>
>>>> I would like to get more involved in the Calcite community. I have been
>>>> implementing and managing Flink SQL based on Calcite since 2016. Thus, I
>>>> feel confident to say that I know the code base and some quirks in the
>>>> stack very well.
>>>>
>>>> Capacity-wise I will try to reserve some time for helping the Calcite
>>>> community. Happy to get some pointers where and how I can help.
>>>>
>>>> I will take a look at https://github.com/apache/calcite/pull/2606 this
>>>> week to get the ball rolling. As this is an important addition and
>>>> prepares for "customer SQL operators" in Flink SQL.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 21.06.22 22:18, Charles Givre wrote:
>>>>> As the PMC for Apache Drill, I'd echo everyone's comments here Don't
>>>>> fork.   Don't do it.
>>>>>
>>>>> Apache Drill forked Calcite several years ago which Calcite was on
>>>>> version 1.20 or 1.21.  While this meant that some bugs were easily fixed,
>>>>> what it also meant that as our fork diverged from "regular" Calcite, it
>>>>> became harder and harder to maintain.  It also meant that we were chasing
>>>>> bugs that had since been fixed.
>>>>>
>>>>> Drill is in the process of "de-forking" Calcite, meaning that we're
>>>>> ditching our fork and re-integrating with standard Calcite.  It has been A
>>>>> TON of work and we

Re: PR Review Request

2022-06-23 Thread Stamatis Zampetakis
Hi all,

How about granting Calcite committership to people who are already ASF
committers (in other projects) and have a proven record of working
with Calcite?

Usually the PMC invites people to become committers to the project after
they have made a few successful code contributions to the Calcite/Avatica repos.
This is to ensure that people are familiar with the codebase and understand
how the ASF works.

People who are already committers in an ASF project already know how the
foundation works and how they should behave.
Also people working in projects like Drill, Flink, Hive, Ignite, Phoenix,
etc., may already be quite familiar with Calcite if they have worked on the
query processing layer of the system.

It might be difficult for the Calcite PMC to identify people familiar with
Calcite if they don't contribute to the main Calcite/Avatica repos
regularly, so I would be open to considering people for committership on a
per-request basis.

Example:
Bob is an ASF committer in Flink and he has pushed various contributions
around Calcite in the Flink repo.
Bob feels confident about fixing trivial things in Calcite and he wants to
help with reviewing and merging open PRs.
Bob sends an email to the private@calcite list requesting to become a Calcite
committer.
Bob explains in the email who he is and what he has done to demonstrate he
is familiar with the Calcite code.
The Calcite PMC acknowledges the request and starts a vote for granting
Calcite committership to Bob.
The Calcite PMC informs Bob about their decision and takes further actions
if necessary.

If we agree on the overall idea we can figure out the details and formalize
the request process in our docs.

Best,
Stamatis

On Thu, Jun 23, 2022 at 6:06 AM Jing Zhang  wrote:

> Hi everyone,
>
> This is an awesome discussion to improve collaborating between different
> projects.
> Thanks Julian, Jacques, Austin, Martijn, Timo's effort to make it happen.
>
> Best,
> Jing Zhang
>
> On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:
>
> > Hi Jacques, Julian, Austin and everyone else,
> >
> > Thank you very much for sharing all your experiences and providing really
> > valuable input. I'll definitely relay this back to the original
> discussion
> > thread in the Flink community. Part of bringing this information back to
> > the Flink community is also because I feel like the only way that
> different
> > OSS solutions can help each other forward is by communicating and
> > collaborating. As Timo already mentioned, he'll try to help out. Let's
> try
> > to get some more involved.
> >
> > Side note: I also saw that this thread got some traction on Twitter [1]
> on
> > the cost of forking.
> >
> > Best regards,
> >
> > Martijn
> >
> > [1]
> >
> >
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA
> >
> > On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
> >
> > > Hi everyone,
> > >
> > > This is a really great discussion. Thanks for starting it Martijn and
> > > your input Jacques! I have been fighting against forking Calcite in
> > > Flink for years already. Even when merging forks of Flink that
> > > transitively forked Calcite, in the end we were able to resolve
> > > conflicts / contribute blockers back into Calcite. And I strongly
> > > believe that this is the better approach for long-term success for both
> > > projects.
> > >
> > > I would like to get more involved in the Calcite community. I have been
> > > implementing and managing Flink SQL based on Calcite since 2016. Thus,
> I
> > > feel confident to say that I know the code base and some quirks in the
> > > stack very well.
> > >
> > > Capacity-wise I will try to reserve some time for helping the Calcite
> > > community. Happy to get some pointers where and how I can help.
> > >
> > > I will take a look at https://github.com/apache/calcite/pull/2606 this
> > > week to get the ball rolling. As this is an important addition and
> > > prepares for "customer SQL operators" in Flink SQL.
> > >
> > > Regards,
> > > Timo
> > >
> > > On 21.06.22 22:18, Charles Givre wrote:
> > > > As the PMC for Apache Drill, I'd echo everyone's comments here
> > Don't
> > > fork.   Don't do it.
> > > >
> > > > Apache Drill forked Calcite several years ago which Calcite was on
> > > version 1.20 or 1.21.  While this meant that some bugs were easily
> fixed,
> > > what it also meant that as our fork diverged from "regular" Calcite, it
> > > became harder and harder to maintain.  It also meant that we were
> chasing
> > > bugs that had since been fixed.
> > > >
> > > > Drill is in the process of "de-forking" Calcite, meaning that we're
> > > ditching our fork and re-integrating with standard Calcite.  It has
> been
> > A
> > > TON of work and we have contributed (and will continue to contribute)
> bug
> > > fixes and PRs to Calcite. In the long run, I think this will be
> > beneficial
> > > for both communities.
> > > >
> > > > Best,
> > > > -- C
> > > >
> > > >
> > > >> On Jun 21, 2022, at 1:57 PM, Julian 

Re: PR Review Request

2022-06-22 Thread Jing Zhang
Hi everyone,

This is an awesome discussion on improving collaboration between different
projects.
Thanks to Julian, Jacques, Austin, Martijn, and Timo for their efforts to
make it happen.

Best,
Jing Zhang

On Thu, Jun 23, 2022 at 01:43, Martijn Visser wrote:

> Hi Jacques, Julian, Austin and everyone else,
>
> Thank you very much for sharing all your experiences and providing really
> valuable input. I'll definitely relay this back to the original discussion
> thread in the Flink community. Part of bringing this information back to
> the Flink community is also because I feel like the only way that different
> OSS solutions can help each other forward is by communicating and
> collaborating. As Timo already mentioned, he'll try to help out. Let's try
> to get some more involved.
>
> Side note: I also saw that this thread got some traction on Twitter [1] on
> the cost of forking.
>
> Best regards,
>
> Martijn
>
> [1]
>
> https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA
>
> On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:
>
> > Hi everyone,
> >
> > This is a really great discussion. Thanks for starting it Martijn and
> > your input Jacques! I have been fighting against forking Calcite in
> > Flink for years already. Even when merging forks of Flink that
> > transitively forked Calcite, in the end we were able to resolve
> > conflicts / contribute blockers back into Calcite. And I strongly
> > believe that this is the better approach for long-term success for both
> > projects.
> >
> > I would like to get more involved in the Calcite community. I have been
> > implementing and managing Flink SQL based on Calcite since 2016. Thus, I
> > feel confident to say that I know the code base and some quirks in the
> > stack very well.
> >
> > Capacity-wise I will try to reserve some time for helping the Calcite
> > community. Happy to get some pointers where and how I can help.
> >
> > I will take a look at https://github.com/apache/calcite/pull/2606 this
> > week to get the ball rolling. As this is an important addition and
> > prepares for "customer SQL operators" in Flink SQL.
> >
> > Regards,
> > Timo
> >
> > On 21.06.22 22:18, Charles Givre wrote:
> > > As the PMC for Apache Drill, I'd echo everyone's comments here
> Don't
> > fork.   Don't do it.
> > >
> > > Apache Drill forked Calcite several years ago which Calcite was on
> > version 1.20 or 1.21.  While this meant that some bugs were easily fixed,
> > what it also meant that as our fork diverged from "regular" Calcite, it
> > became harder and harder to maintain.  It also meant that we were chasing
> > bugs that had since been fixed.
> > >
> > > Drill is in the process of "de-forking" Calcite, meaning that we're
> > ditching our fork and re-integrating with standard Calcite.  It has been
> A
> > TON of work and we have contributed (and will continue to contribute) bug
> > fixes and PRs to Calcite. In the long run, I think this will be
> beneficial
> > for both communities.
> > >
> > > Best,
> > > -- C
> > >
> > >
> > >> On Jun 21, 2022, at 1:57 PM, Julian Hyde 
> > wrote:
> > >>
> > >> Please don’t fork Calcite.
> > >>
> > >> Calcite suffers from the tragedy of the commons. Unlike many open
> > source data projects, there is no commercial project that directly maps
> to
> > Calcite (even though Calcite is an essential part of many projects). As a
> > result no engineers work full-time on Calcite.
> > >>
> > >> It takes more than pull requests to keep a project going. We need
> > reviewers, people to work on releases, people to fix bugs (such as
> security
> > bugs) that are important to everyone but urgent to no one.
> > >>
> > >> We have plenty of committers in Calcite, and add several more per
> year.
> > We rely on those committers taking on their share of the housework, but
> the
> > burden falls on too few people.
> > >>
> > >> Engineering managers need to start paying a little more for the “free
> > lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> > most engineering managers are not subscribed to this list.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau 
> > wrote:
> > >>>
> > >>> Martijn, thanks for sharing that thread in the Flink community.
> > >>>
> > >>> I'm someone who has forked Calcite twice: once in Apache Drill and
> > again in
> > >>> Dremio. In both cases, it was all about trading short term benefits
> > against
> > >>> long term costs. In both cases, I think the net amount of work was
> > probably
> > >>> 5x as much as what it would have been if we had just done a better
> job
> > >>> engaging the community. If I were to state the curve of behavior over
> > six
> > >>> years, I'd guess that in both cases the numbers of effort looked like
> > this:
> > >>>
> > >>> estimated effort doing high intensity integration with calcite (years
> > 1-6)
> > >>> fork: 1, 5, 10, 50, 100, 200, total = 366
> > >>> non-fork: 10, 10, 10, 10, 10, total = 50
> > >>>
> > >>> So 

Re: PR Review Request

2022-06-22 Thread Martijn Visser
Hi Jacques, Julian, Austin and everyone else,

Thank you very much for sharing all your experiences and providing really
valuable input. I'll definitely relay this back to the original discussion
thread in the Flink community. Part of bringing this information back to
the Flink community is also because I feel like the only way that different
OSS solutions can help each other forward is by communicating and
collaborating. As Timo already mentioned, he'll try to help out. Let's try
to get some more people involved.

Side note: I also saw that this thread got some traction on Twitter [1] on
the cost of forking.

Best regards,

Martijn

[1]
https://twitter.com/gunnarmorling/status/1539499415337111553?s=21=8fGk3PxScOx4FJPJWE5UeA

On Wed, Jun 22, 2022 at 09:29, Timo Walther wrote:

> Hi everyone,
>
> This is a really great discussion. Thanks for starting it Martijn and
> your input Jacques! I have been fighting against forking Calcite in
> Flink for years already. Even when merging forks of Flink that
> transitively forked Calcite, in the end we were able to resolve
> conflicts / contribute blockers back into Calcite. And I strongly
> believe that this is the better approach for long-term success for both
> projects.
>
> I would like to get more involved in the Calcite community. I have been
> implementing and managing Flink SQL based on Calcite since 2016. Thus, I
> feel confident to say that I know the code base and some quirks in the
> stack very well.
>
> Capacity-wise I will try to reserve some time for helping the Calcite
> community. Happy to get some pointers where and how I can help.
>
> I will take a look at https://github.com/apache/calcite/pull/2606 this
> week to get the ball rolling. As this is an important addition and
> prepares for "customer SQL operators" in Flink SQL.
>
> Regards,
> Timo
>
> On 21.06.22 22:18, Charles Givre wrote:
> > As the PMC for Apache Drill, I'd echo everyone's comments here: Don't
> fork.   Don't do it.
> >
> > Apache Drill forked Calcite several years ago, when Calcite was on
> version 1.20 or 1.21.  While this meant that some bugs were easily fixed,
> it also meant that as our fork diverged from "regular" Calcite, it
> became harder and harder to maintain.  It also meant that we were chasing
> bugs that had since been fixed.
> >
> > Drill is in the process of "de-forking" Calcite, meaning that we're
> ditching our fork and re-integrating with standard Calcite.  It has been A
> TON of work and we have contributed (and will continue to contribute) bug
> fixes and PRs to Calcite. In the long run, I think this will be beneficial
> for both communities.
> >
> > Best,
> > -- C
> >
> >
> >> On Jun 21, 2022, at 1:57 PM, Julian Hyde 
> wrote:
> >>
> >> Please don’t fork Calcite.
> >>
> >> Calcite suffers from the tragedy of the commons. Unlike many open
> source data projects, there is no commercial project that directly maps to
> Calcite (even though Calcite is an essential part of many projects). As a
> result no engineers work full-time on Calcite.
> >>
> >> It takes more than pull requests to keep a project going. We need
> reviewers, people to work on releases, people to fix bugs (such as security
> bugs) that are important to everyone but urgent to no one.
> >>
> >> We have plenty of committers in Calcite, and add several more per year.
> We rely on those committers taking on their share of the housework, but the
> burden falls on too few people.
> >>
> >> Engineering managers need to start paying a little more for the “free
> lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> most engineering managers are not subscribed to this list.
> >>
> >> Julian
> >>
> >>
> >>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau 
> wrote:
> >>>
> >>> Martijn, thanks for sharing that thread in the Flink community.
> >>>
> >>> I'm someone who has forked Calcite twice: once in Apache Drill and
> again in
> >>> Dremio. In both cases, it was all about trading short term benefits
> against
> >>> long term costs. In both cases, I think the net amount of work was
> probably
> >>> 5x as much as what it would have been if we had just done a better job
> >>> engaging the community. If I were to state the curve of behavior over
> six
> >>> years, I'd guess that in both cases the numbers of effort looked like
> this:
> >>>
> >>> estimated effort doing high intensity integration with calcite (years
> 1-6)
> >>> fork: 1, 5, 10, 50, 100, 200, total = 366
> >>> non-fork: 10, 10, 10, 10, 10, total = 50
> >>>
> >>> So yes, the first couple years you're ahead. But you pay a massive
> >>> technical debt premium long term. Early in a project (Drill) or
> company's
> >>> life (Dremio), it can make sense to sacrifice long term for short term
> but
> >>> it's important people do it with their eyes open.
> >>>
> >>> The reason that this pain is so high is that as your codebases
> diverge, you
> >>> start having to do everything the Calcite community does by yourself.
> >>> Backports 

Re: PR Review Request

2022-06-22 Thread Timo Walther

Hi everyone,

This is a really great discussion. Thanks for starting it, Martijn, and for
your input, Jacques! I have been fighting against forking Calcite in
Flink for years already. Even when merging forks of Flink that 
transitively forked Calcite, in the end we were able to resolve 
conflicts / contribute blockers back into Calcite. And I strongly 
believe that this is the better approach for long-term success for both 
projects.


I would like to get more involved in the Calcite community. I have been 
implementing and managing Flink SQL based on Calcite since 2016. Thus, I 
feel confident to say that I know the code base and some quirks in the 
stack very well.


Capacity-wise I will try to reserve some time for helping the Calcite 
community. Happy to get some pointers where and how I can help.


I will take a look at https://github.com/apache/calcite/pull/2606 this
week to get the ball rolling, as this is an important addition and
prepares for "custom SQL operators" in Flink SQL.


Regards,
Timo

On 21.06.22 22:18, Charles Givre wrote:

As the PMC for Apache Drill, I'd echo everyone's comments here: don't fork.
Don't do it.

Apache Drill forked Calcite several years ago, when Calcite was on version 1.20 or 1.21.
While this meant that some bugs were easily fixed, it also meant that as our fork
diverged from "regular" Calcite, it became harder and harder to maintain. It
also meant that we were chasing bugs that had since been fixed.

Drill is in the process of "de-forking" Calcite, meaning that we're ditching 
our fork and re-integrating with standard Calcite.  It has been A TON of work and we have 
contributed (and will continue to contribute) bug fixes and PRs to Calcite. In the long 
run, I think this will be beneficial for both communities.

Best,
-- C



On Jun 21, 2022, at 1:57 PM, Julian Hyde  wrote:

Please don’t fork Calcite.

Calcite suffers from the tragedy of the commons. Unlike many open source data 
projects, there is no commercial project that directly maps to Calcite (even 
though Calcite is an essential part of many projects). As a result no engineers 
work full-time on Calcite.

It takes more than pull requests to keep a project going. We need reviewers, 
people to work on releases, people to fix bugs (such as security bugs) that are 
important to everyone but urgent to no one.

We have plenty of committers in Calcite, and add several more per year. We rely 
on those committers taking on their share of the housework, but the burden 
falls on too few people.

Engineering managers need to start paying a little more for the “free lunch” 
that they enjoy when Calcite “just works” in their project. Sadly, most 
engineering managers are not subscribed to this list.

Julian



On Jun 21, 2022, at 9:49 AM, Jacques Nadeau  wrote:

Martijn, thanks for sharing that thread in the Flink community.

I'm someone who has forked Calcite twice: once in Apache Drill and again in
Dremio. In both cases, it was all about trading short term benefits against
long term costs. In both cases, I think the net amount of work was probably
5x as much as what it would have been if we had just done a better job
engaging the community. If I were to state the curve of behavior over six
years, I'd guess that in both cases the numbers of effort looked like this:

estimated effort doing high intensity integration with calcite (years 1-6)
fork: 1, 5, 10, 50, 100, 200, total = 366
non-fork: 10, 10, 10, 10, 10, total = 50

So yes, the first couple years you're ahead. But you pay a massive
technical debt premium long term. Early in a project (Drill) or company's
life (Dremio), it can make sense to sacrifice long term for short term but
it's important people do it with their eyes open.

The reason that this pain is so high is that as your codebases diverge, you
start having to do everything the Calcite community does by yourself.
Backports become harder and things that you need (e.g. new sql syntax, etc)
have to be reimplemented (even if someone else already implemented them in
some post-fork Calcite version). Ultimately, at some point you realize that
your path is untenable and you unfork. This becomes the biggest expense of
them all and I believe both of those teams are still trying to un-fork. The
additional thing that becomes an even bigger problem is your absence from
the Calcite community means that people may take the project or APIs in
ways that are in direct conflict to how you use the library. Since you're
not active in the project, you fail to provide a counterpoint and then
you're basically just in a miserable place. The Hive project did this best
by ensuring that releases of Calcite were also run pre-release against Hive
to make sure no major regressions occurred. By being in the community and
active, this is the best state from my pov. (It makes your project better
and Calcite better.)

Two last notes:
- I'm not sure the rocks fork is comparable to forking Calcite. The api
surface area and 

Re: PR Review Request

2022-06-21 Thread Charles Givre
As the PMC for Apache Drill, I'd echo everyone's comments here: don't fork.
Don't do it.

Apache Drill forked Calcite several years ago, when Calcite was on version 1.20
or 1.21. While this meant that some bugs were easily fixed, it also meant
that as our fork diverged from "regular" Calcite, it became harder and harder
to maintain. It also meant that we were chasing bugs that had since been fixed.

Drill is in the process of "de-forking" Calcite, meaning that we're ditching 
our fork and re-integrating with standard Calcite.  It has been A TON of work 
and we have contributed (and will continue to contribute) bug fixes and PRs to 
Calcite. In the long run, I think this will be beneficial for both communities. 

Best,
-- C


> On Jun 21, 2022, at 1:57 PM, Julian Hyde  wrote:
> 
> Please don’t fork Calcite.
> 
> Calcite suffers from the tragedy of the commons. Unlike many open source data 
> projects, there is no commercial project that directly maps to Calcite (even 
> though Calcite is an essential part of many projects). As a result no 
> engineers work full-time on Calcite.
> 
> It takes more than pull requests to keep a project going. We need reviewers, 
> people to work on releases, people to fix bugs (such as security bugs) that 
> are important to everyone but urgent to no one.
> 
> We have plenty of committers in Calcite, and add several more per year. We 
> rely on those committers taking on their share of the housework, but the 
> burden falls on too few people.
> 
> Engineering managers need to start paying a little more for the “free lunch” 
> that they enjoy when Calcite “just works” in their project. Sadly, most 
> engineering managers are not subscribed to this list.
> 
> Julian
> 
> 
>> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau  wrote:
>> 
>> Martijn, thanks for sharing that thread in the Flink community.
>> 
>> I'm someone who has forked Calcite twice: once in Apache Drill and again in
>> Dremio. In both cases, it was all about trading short term benefits against
>> long term costs. In both cases, I think the net amount of work was probably
>> 5x as much as what it would have been if we had just done a better job
>> engaging the community. If I were to state the curve of behavior over six
>> years, I'd guess that in both cases the numbers of effort looked like this:
>> 
>> estimated effort doing high intensity integration with calcite (years 1-6)
>> fork: 1, 5, 10, 50, 100, 200, total = 366
>> non-fork: 10, 10, 10, 10, 10, total = 50
>> 
>> So yes, the first couple years you're ahead. But you pay a massive
>> technical debt premium long term. Early in a project (Drill) or company's
>> life (Dremio), it can make sense to sacrifice long term for short term but
>> it's important people do it with their eyes open.
>> 
>> The reason that this pain is so high is that as your codebases diverge, you
>> start having to do everything the Calcite community does by yourself.
>> Backports become harder and things that you need (e.g. new sql syntax, etc)
>> have to be reimplemented (even if someone else already implemented them in
>> some post-fork Calcite version). Ultimately, at some point you realize that
>> your path is untenable and you unfork. This becomes the biggest expense of
>> them all and I believe both of those teams are still trying to un-fork. The
>> additional thing that becomes an even bigger problem is your absence from
>> the Calcite community means that people may take the project or APIs in
>> ways that are in direct conflict to how you use the library. Since you're
>> not active in the project, you fail to provide a counterpoint and then
>> you're basically just in a miserable place. The Hive project did this best
>> by ensuring that releases of Calcite were also run pre-release against Hive
>> to make sure no major regressions occurred. By being in the community and
>> active, this is the best state from my pov. (It makes your project better
>> and Calcite better.)
>> 
>> Two last notes:
>> - I'm not sure the rocks fork is comparable to forking Calcite. The api
>> surface area and community models are very different.
>> - This is all based on a high intensity integration (using rules + planner
>> or sql + rules + planner). Calcite is frustratingly monolithic and if
>> someone was only going to use a small component, my opinion would likely be
>> very different.
>> 
>> I'd send this to the Flink list but I'm not subscribed. It'd be great if
>> you shared it with the people over there if you think they'd find it useful.
>> 
>> 
>> 
>> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser 
>> wrote:
>> 
>>> Thanks Julian and Austin!
>>> 
>>> Any reply to kick off some sort of discussion is worthwhile :D
>>> I definitely know the feeling of having more PRs open than you would like,
>>> looking at https://github.com/apache/flink/pulls :)
>>> 
>>> There have been discussions in the Flink community about forking Calcite
>>> [1]. My personal preference at the moment is to see 

Re: PR Review Request

2022-06-21 Thread Austin Bennett
Martijn:

I'd interpret Julian's response as welcoming you to contribute to
Calcite :-)

Sounds like there is concretely room for:
* reviewing ( ex: test, comment in PR, but not actually merge -- might make
it easier/quicker for the current committers to then allow them to address
other/more things )
* security bug fixes ( ex: if addressed, then committers are freed up for
additional reviewing/other )
* the usual laundry list of open source projects welcoming helping hands [
seems you know the drill via Flink ].

Cheers,
Austin


On Tue, Jun 21, 2022 at 10:57 AM Julian Hyde  wrote:

> Please don’t fork Calcite.
>
> Calcite suffers from the tragedy of the commons. Unlike many open source
> data projects, there is no commercial project that directly maps to Calcite
> (even though Calcite is an essential part of many projects). As a result no
> engineers work full-time on Calcite.
>
> It takes more than pull requests to keep a project going. We need
> reviewers, people to work on releases, people to fix bugs (such as security
> bugs) that are important to everyone but urgent to no one.
>
> We have plenty of committers in Calcite, and add several more per year. We
> rely on those committers taking on their share of the housework, but the
> burden falls on too few people.
>
> Engineering managers need to start paying a little more for the “free
> lunch” that they enjoy when Calcite “just works” in their project. Sadly,
> most engineering managers are not subscribed to this list.
>
> Julian
>
>
> > On Jun 21, 2022, at 9:49 AM, Jacques Nadeau  wrote:
> >
> > Martijn, thanks for sharing that thread in the Flink community.
> >
> > I'm someone who has forked Calcite twice: once in Apache Drill and again
> in
> > Dremio. In both cases, it was all about trading short term benefits
> against
> > long term costs. In both cases, I think the net amount of work was
> probably
> > 5x as much as what it would have been if we had just done a better job
> > engaging the community. If I were to state the curve of behavior over six
> > years, I'd guess that in both cases the numbers of effort looked like
> this:
> >
> > estimated effort doing high intensity integration with calcite (years
> 1-6)
> > fork: 1, 5, 10, 50, 100, 200, total = 366
> > non-fork: 10, 10, 10, 10, 10, total = 50
> >
> > So yes, the first couple years you're ahead. But you pay a massive
> > technical debt premium long term. Early in a project (Drill) or company's
> > life (Dremio), it can make sense to sacrifice long term for short term
> but
> > it's important people do it with their eyes open.
> >
> > The reason that this pain is so high is that as your codebases diverge,
> you
> > start having to do everything the Calcite community does by yourself.
> > Backports become harder and things that you need (e.g. new sql syntax,
> etc)
> > have to be reimplemented (even if someone else already implemented them
> in
> > some post-fork Calcite version). Ultimately, at some point you realize
> that
> > your path is untenable and you unfork. This becomes the biggest expense
> of
> > them all and I believe both of those teams are still trying to un-fork.
> The
> > additional thing that becomes an even bigger problem is your absence from
> > the Calcite community means that people may take the project or APIs in
> > ways that are in direct conflict to how you use the library. Since you're
> > not active in the project, you fail to provide a counterpoint and then
> > you're basically just in a miserable place. The Hive project did this
> best
> > by ensuring that releases of Calcite were also run pre-release against
> Hive
> > to make sure no major regressions occurred. By being in the community and
> > active, this is the best state from my pov. (It makes your project better
> > and Calcite better.)
> >
> > Two last notes:
> > - I'm not sure the rocks fork is comparable to forking Calcite. The api
> > surface area and community models are very different.
> > - This is all based on a high intensity integration (using rules +
> planner
> > or sql + rules + planner). Calcite is frustratingly monolithic and if
> > someone was only going to use a small component, my opinion would likely
> be
> > very different.
> >
> > I'd send this to the Flink list but I'm not subscribed. It'd be great if
> > you shared it with the people over there if you think they'd find it
> useful.
> >
> >
> >
> > On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser <
> martijnvis...@apache.org>
> > wrote:
> >
> >> Thanks Julian and Austin!
> >>
> >> Any reply to kick off some sort of discussion is worthwhile :D
> >> I definitely know the feeling of having more PRs open than you would
> like,
> >> looking at https://github.com/apache/flink/pulls :)
> >>
> >> There have been discussions in the Flink community about forking Calcite
> >> [1]. My personal preference at the moment is to see if we can create a
> >> better collaboration and community. I believe that we can find people
> from
> >> the 

Re: PR Review Request

2022-06-21 Thread Julian Hyde
Please don’t fork Calcite.

Calcite suffers from the tragedy of the commons. Unlike many open source data 
projects, there is no commercial project that directly maps to Calcite (even 
though Calcite is an essential part of many projects). As a result no engineers 
work full-time on Calcite.

It takes more than pull requests to keep a project going. We need reviewers, 
people to work on releases, people to fix bugs (such as security bugs) that are 
important to everyone but urgent to no one.

We have plenty of committers in Calcite, and add several more per year. We rely 
on those committers taking on their share of the housework, but the burden 
falls on too few people.

Engineering managers need to start paying a little more for the “free lunch” 
that they enjoy when Calcite “just works” in their project. Sadly, most 
engineering managers are not subscribed to this list.

Julian


> On Jun 21, 2022, at 9:49 AM, Jacques Nadeau  wrote:
> 
> Martijn, thanks for sharing that thread in the Flink community.
> 
> I'm someone who has forked Calcite twice: once in Apache Drill and again in
> Dremio. In both cases, it was all about trading short term benefits against
> long term costs. In both cases, I think the net amount of work was probably
> 5x as much as what it would have been if we had just done a better job
> engaging the community. If I were to state the curve of behavior over six
> years, I'd guess that in both cases the numbers of effort looked like this:
> 
> estimated effort doing high intensity integration with calcite (years 1-6)
> fork: 1, 5, 10, 50, 100, 200, total = 366
> non-fork: 10, 10, 10, 10, 10, total = 50
> 
> So yes, the first couple years you're ahead. But you pay a massive
> technical debt premium long term. Early in a project (Drill) or company's
> life (Dremio), it can make sense to sacrifice long term for short term but
> it's important people do it with their eyes open.
> 
> The reason that this pain is so high is that as your codebases diverge, you
> start having to do everything the Calcite community does by yourself.
> Backports become harder and things that you need (e.g. new sql syntax, etc)
> have to be reimplemented (even if someone else already implemented them in
> some post-fork Calcite version). Ultimately, at some point you realize that
> your path is untenable and you unfork. This becomes the biggest expense of
> them all and I believe both of those teams are still trying to un-fork. The
> additional thing that becomes an even bigger problem is your absence from
> the Calcite community means that people may take the project or APIs in
> ways that are in direct conflict to how you use the library. Since you're
> not active in the project, you fail to provide a counterpoint and then
> you're basically just in a miserable place. The Hive project did this best
> by ensuring that releases of Calcite were also run pre-release against Hive
> to make sure no major regressions occurred. By being in the community and
> active, this is the best state from my pov. (It makes your project better
> and Calcite better.)
> 
> Two last notes:
> - I'm not sure the rocks fork is comparable to forking Calcite. The api
> surface area and community models are very different.
> - This is all based on a high intensity integration (using rules + planner
> or sql + rules + planner). Calcite is frustratingly monolithic and if
> someone was only going to use a small component, my opinion would likely be
> very different.
> 
> I'd send this to the Flink list but I'm not subscribed. It'd be great if
> you shared it with the people over there if you think they'd find it useful.
> 
> 
> 
> On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser 
> wrote:
> 
>> Thanks Julian and Austin!
>> 
>> Any reply to kick off some sort of discussion is worthwhile :D
>> I definitely know the feeling of having more PRs open than you would like,
>> looking at https://github.com/apache/flink/pulls :)
>> 
>> There have been discussions in the Flink community about forking Calcite
>> [1]. My personal preference at the moment is to see if we can create a
>> better collaboration and community. I believe that we can find people from
>> the Flink community who can open / help reviewing Calcite PRs that are
>> interesting for the Flink community. The question is if that will also help
>> short term since in the end it still requires a Calcite maintainer to
>> review/merge.
>> 
>> Best regards,
>> 
>> Martijn
>> 
>> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
>> 
>> 
>> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <
>> whatwouldausti...@gmail.com> wrote:
>> 
>>> From the peanut gallery :-)  -->
>>> 
>>> Wow; yes, lots of open PRs.  https://github.com/apache/calcite/pulls
>>> 
>>> How can individuals from the Flink [sub-]community, and/or more general
>>> calcite community help lighten this load?  Is there much weight given to
>>> reviews from non-committers; how to increase the # of people capable of
>>> 

Re: PR Review Request

2022-06-21 Thread Jacques Nadeau
Martijn, thanks for sharing that thread in the Flink community.

I'm someone who has forked Calcite twice: once in Apache Drill and again in
Dremio. In both cases, it was all about trading short term benefits against
long term costs. In both cases, I think the net amount of work was probably
5x as much as what it would have been if we had just done a better job
engaging the community. If I were to state the curve of behavior over six
years, I'd guess that in both cases the numbers of effort looked like this:

estimated effort doing high intensity integration with calcite (years 1-6)
fork: 1, 5, 10, 50, 100, 200, total = 366
non-fork: 10, 10, 10, 10, 10, total = 50

So yes, the first couple years you're ahead. But you pay a massive
technical debt premium long term. Early in a project (Drill) or company's
life (Dremio), it can make sense to sacrifice long term for short term but
it's important people do it with their eyes open.
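
The cumulative picture behind those rough numbers can be sketched in a few
lines of Python. This is only an illustration of the estimate above, not a
measurement. The list above gives five non-fork values for a six-year span, so
the sketch assumes the missing sixth year is also 10; the variable names are
made up for the example.

```python
from itertools import accumulate

# Yearly effort estimates quoted above (arbitrary units, years 1-6).
fork = [1, 5, 10, 50, 100, 200]
# Assumption: the message lists only five non-fork values, so a sixth
# year of 10 is added here to cover the same six-year span.
non_fork = [10, 10, 10, 10, 10, 10]

# Running totals of effort spent by the end of each year.
fork_cumulative = list(accumulate(fork))
non_fork_cumulative = list(accumulate(non_fork))

# First year (1-based) in which the fork has cost more in total
# than staying on upstream Calcite.
crossover_year = next(
    year
    for year, (f, n) in enumerate(
        zip(fork_cumulative, non_fork_cumulative), start=1
    )
    if f > n
)

print("fork cumulative:    ", fork_cumulative)
print("non-fork cumulative:", non_fork_cumulative)
print("fork becomes more expensive overall in year", crossover_year)
```

Under that assumption the fork is cheaper in total for the first three years
and more expensive from year four onward, which is the "ahead early, pay
later" shape described here.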

The reason that this pain is so high is that as your codebases diverge, you
start having to do everything the Calcite community does by yourself.
Backports become harder and things that you need (e.g. new sql syntax, etc)
have to be reimplemented (even if someone else already implemented them in
some post-fork Calcite version). Ultimately, at some point you realize that
your path is untenable and you unfork. This becomes the biggest expense of
them all and I believe both of those teams are still trying to un-fork. The
additional thing that becomes an even bigger problem is your absence from
the Calcite community means that people may take the project or APIs in
ways that are in direct conflict to how you use the library. Since you're
not active in the project, you fail to provide a counterpoint and then
you're basically just in a miserable place. The Hive project did this best
by ensuring that releases of Calcite were also run pre-release against Hive
to make sure no major regressions occurred. By being in the community and
active, this is the best state from my pov. (It makes your project better
and Calcite better.)

Two last notes:
- I'm not sure the rocks fork is comparable to forking Calcite. The api
surface area and community models are very different.
- This is all based on a high intensity integration (using rules + planner
or sql + rules + planner). Calcite is frustratingly monolithic and if
someone was only going to use a small component, my opinion would likely be
very different.

I'd send this to the Flink list but I'm not subscribed. It'd be great if
you shared it with the people over there if you think they'd find it useful.



On Tue, Jun 21, 2022 at 12:31 AM Martijn Visser 
wrote:

> Thanks Julian and Austin!
>
> Any reply to kick off some sort of discussion is worthwhile :D
> I definitely know the feeling of having more PRs open than you would like,
> looking at https://github.com/apache/flink/pulls :)
>
> There have been discussions in the Flink community about forking Calcite
> [1]. My personal preference at the moment is to see if we can create a
> better collaboration and community. I believe that we can find people from
> the Flink community who can open / help reviewing Calcite PRs that are
> interesting for the Flink community. The question is if that will also help
> short term since in the end it still requires a Calcite maintainer to
> review/merge.
>
> Best regards,
>
> Martijn
>
> [1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4
>
>
> On Mon, Jun 20, 2022 at 23:51, Austin Bennett <
> whatwouldausti...@gmail.com> wrote:
>
> > From the peanut gallery :-)  -->
> >
> > Wow; yes, lots of open PRs.  https://github.com/apache/calcite/pulls
> >
> > How can individuals from the Flink [sub-]community, and/or more general
> > calcite community help lighten this load?  Is there much weight given to
> > reviews from non-committers; how to increase the # of people capable of
> > providing worthwhile reviews [ that are recognized as such ]?
> >
> >
> >
> > On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde 
> > wrote:
> >
> > > Martijn,
> > >
> > > Since you requested a reply, I am replying. To answer your question, I
> > > don’t know of a way to move this topic forward. We have more PRs than
> > > people to review them.
> > >
> > > Julian
> > >
> > >
> > > > On Jun 19, 2022, at 11:58 PM, Martijn Visser <
> martijnvis...@apache.org
> > >
> > > wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > I just wanted to reach out to the Calcite community once more on this
> > > topic
> > > > since no reply was received. Would be great if someone could get back
> > to
> > > us.
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > > On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvis...@apache.org> wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> I would like to follow-up on this email that was sent by Jing. So
> far,
> > > no
> > > >> progress has been made, despite reaching out to the mailing list,
> the
> > > >> original Jira 

Re: PR Review Request

2022-06-21 Thread Martijn Visser
Thanks Julian and Austin!

Any reply to kick off some sort of discussion is worthwhile :D
I definitely know the feeling of having more PRs open than you would like,
looking at https://github.com/apache/flink/pulls :)

There have been discussions in the Flink community about forking Calcite
[1]. My personal preference at the moment is to see if we can create a
better collaboration and community. I believe that we can find people from
the Flink community who can open / help reviewing Calcite PRs that are
interesting for the Flink community. The question is if that will also help
short term since in the end it still requires a Calcite maintainer to
review/merge.

Best regards,

Martijn

[1] https://lists.apache.org/thread/1oqydpsm4mc55bkk440gx9lr9gf2rvf4


On Mon, Jun 20, 2022 at 23:51, Austin Bennett <
whatwouldausti...@gmail.com> wrote:

> From the peanut gallery :-)  -->
>
> Wow; yes, lots of open PRs.  https://github.com/apache/calcite/pulls
>
> How can individuals from the Flink [sub-]community, and/or more general
> calcite community help lighten this load?  Is there much weight given to
> reviews from non-committers; how to increase the # of people capable of
> providing worthwhile reviews [ that are recognized as such ]?
>
>
>
> On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde 
> wrote:
>
> > Martijn,
> >
> > Since you requested a reply, I am replying. To answer your question, I
> > don’t know of a way to move this topic forward. We have more PRs than
> > people to review them.
> >
> > Julian
> >
> >
> > > On Jun 19, 2022, at 11:58 PM, Martijn Visser wrote:
> > >
> > > Hi everyone,
> > >
> > > I just wanted to reach out to the Calcite community once more on this
> > topic
> > > since no reply was received. Would be great if someone could get back
> to
> > us.
> > >
> > > Best regards,
> > >
> > > Martijn
> > >
> > > > On Wed, Jun 8, 2022 at 11:24, Martijn Visser <martijnvis...@apache.org> wrote:
> > >
> > >> Hi everyone,
> > >>
> > >> I would like to follow-up on this email that was sent by Jing. So far,
> > no
> > >> progress has been made, despite reaching out to the mailing list, the
> > >> original Jira ticket and reaching out to people directly. Is there a
> way
> > >> that we can move this PR/topic forward?
> > >>
> > >> For context, in Apache Flink we're currently heavily using Calcite.
> > >> However, we are now at the stage where Calcite is actually holding us
> > back.
> > >> It would be great if we can find a way to strengthen our bond and move
> > both
> > >> Calcite and Flink forward.
> > >>
> > >> Looking forward to your thoughts,
> > >>
> > >> Martijn
> > >>
> > >> On 2022/01/26 07:05:37 Jing Zhang wrote:
> > >>> Hi community,
> > >>> My apologies for interrupting.
> > > >>> Could anyone help to review the PR
> > > >>> https://github.com/apache/calcite/pull/2606?
> > >>> Thanks a lot.
> > >>>
> > > >>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> > > >>> extend the existing table function to support Polymorphic Table
> > > >>> Functions (PTFs), which were introduced as part of ANSI SQL 2016.
> > >>>
> > > >>> In brief, the PR makes the following changes:
> > > >>>  - Update `Parser.jj` to support the partition by and order by
> > > >>> clauses for an input table with set semantics of a PTF.
> > > >>>  - Introduce `TableCharacteristics`, which contains three
> > > >>> characteristics of an input table of a table function.
> > > >>>  - Update `SqlTableFunction` to add a method `tableCharacteristics`,
> > > >>> which returns the table characteristics for the ordinal-th argument
> > > >>> to this table function. The default return value is Optional.empty,
> > > >>> which means the ordinal-th argument is not a table.
> > > >>>  - Introduce `SqlSetSemanticsTable`, which represents an input table
> > > >>> with set semantics of a table function; its `SqlKind` is
> > > >>> `SET_SEMANTICS_TABLE`.
> > > >>>  - Update `SqlValidatorImpl` to validate that only a set semantics
> > > >>> table of a table function can have partition by and order by clauses.
> > > >>>  - Update `SqlToRelConverter#substituteSubQuery` to parse a sub-query
> > > >>> that represents a set semantics table.
> > >>>
> > >>> PR: https://github.com/apache/calcite/pull/2606
> > >>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> > >>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> > >>>
> > >>> Best,
> > >>> Jing Zhang
> > >>>
> > >>
> >
> >
>


Re: PR Review Request

2022-06-20 Thread Austin Bennett
From the peanut gallery :-)  -->

Wow; yes, lots of open PRs.  https://github.com/apache/calcite/pulls

How can individuals from the Flink [sub-]community, and/or the more general
Calcite community, help lighten this load?  Is much weight given to reviews
from non-committers? How can we increase the number of people capable of
providing worthwhile reviews [ that are recognized as such ]?



On Mon, Jun 20, 2022 at 11:47 AM Julian Hyde  wrote:

> Martijn,
>
> Since you requested a reply, I am replying. To answer your question, I
> don’t know of a way to move this topic forward. We have more PRs than
> people to review them.
>
> Julian
>
>
> > On Jun 19, 2022, at 11:58 PM, Martijn Visser wrote:
> >
> > Hi everyone,
> >
> > I just wanted to reach out to the Calcite community once more on this topic
> > since no reply was received. Would be great if someone could get back to us.
> >
> > Best regards,
> >
> > Martijn
> >
> > On Wed, Jun 8, 2022 at 11:24 AM Martijn Visser <martijnvis...@apache.org> wrote:
> >
> >> Hi everyone,
> >>
> >> I would like to follow-up on this email that was sent by Jing. So far, no
> >> progress has been made, despite reaching out to the mailing list, the
> >> original Jira ticket and reaching out to people directly. Is there a way
> >> that we can move this PR/topic forward?
> >>
> >> For context, in Apache Flink we're currently heavily using Calcite.
> >> However, we are now at the stage where Calcite is actually holding us back.
> >> It would be great if we can find a way to strengthen our bond and move both
> >> Calcite and Flink forward.
> >>
> >> Looking forward to your thoughts,
> >>
> >> Martijn
> >>
> >> On 2022/01/26 07:05:37 Jing Zhang wrote:
> >>> Hi community,
> >>> My apologies for interrupting.
> >>> Anyone could help to review the pr
> >>> https://github.com/apache/calcite/pull/2606?
> >>> Thanks a lot.
> >>>
> >>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> >>> extend existing Table function in order to support Polymorphic Table
> >>> Function which is introduced as the part of ANSI SQL 2016.
> >>>
> >>> The brief change logs of the PR are:
> >>>  - Update `Parser.jj` to support partition by clause and order by clause
> >>> for input table with set semantics of PTF
> >>>  - Introduce `TableCharacteristics` which contains three characteristics
> >>> of input table of table function
> >>>  - Update `SqlTableFunction` to add a method `tableCharacteristics`, the
> >>> method returns the table characteristics for the ordinal-th argument to
> >>> this table function. Default return value is Optional.empty which means the
> >>> ordinal-th argument is not table.
> >>>  - Introduce `SqlSetSemanticsTable` which represents input table with set
> >>> semantics of Table Function, its `SqlKind` is `SET_SEMANTICS_TABLE`
> >>>  - Updates `SqlValidatorImpl` to validate only set semantic table of Table
> >>> Function could have partition by and order by clause
> >>>  - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery which
> >>> represents set semantics table.
> >>>
> >>> PR: https://github.com/apache/calcite/pull/2606
> >>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> >>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> >>>
> >>> Best,
> >>> Jing Zhang
> >>>
> >>
>
>


Re: PR Review Request

2022-06-20 Thread Julian Hyde
Martijn,

Since you requested a reply, I am replying. To answer your question, I don’t 
know of a way to move this topic forward. We have more PRs than people to 
review them.

Julian


> On Jun 19, 2022, at 11:58 PM, Martijn Visser  wrote:
> 
> Hi everyone,
> 
> I just wanted to reach out to the Calcite community once more on this topic
> since no reply was received. Would be great if someone could get back to us.
> 
> Best regards,
> 
> Martijn
> 
> On Wed, Jun 8, 2022 at 11:24 AM Martijn Visser wrote:
> 
>> Hi everyone,
>> 
>> I would like to follow-up on this email that was sent by Jing. So far, no
>> progress has been made, despite reaching out to the mailing list, the
>> original Jira ticket and reaching out to people directly. Is there a way
>> that we can move this PR/topic forward?
>> 
>> For context, in Apache Flink we're currently heavily using Calcite.
>> However, we are now at the stage where Calcite is actually holding us back.
>> It would be great if we can find a way to strengthen our bond and move both
>> Calcite and Flink forward.
>> 
>> Looking forward to your thoughts,
>> 
>> Martijn
>> 
>> On 2022/01/26 07:05:37 Jing Zhang wrote:
>>> Hi community,
>>> My apologies for interrupting.
>>> Anyone could help to review the pr
>>> https://github.com/apache/calcite/pull/2606?
>>> Thanks a lot.
>>> 
>>> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
>>> extend existing Table function in order to support Polymorphic Table
>>> Function which is introduced as the part of ANSI SQL 2016.
>>> 
>>> The brief change logs of the PR are:
>>>  - Update `Parser.jj` to support partition by clause and order by clause
>>> for input table with set semantics of PTF
>>>  - Introduce `TableCharacteristics` which contains three characteristics
>>> of input table of table function
>>>  - Update `SqlTableFunction` to add a method `tableCharacteristics`, the
>>> method returns the table characteristics for the ordinal-th argument to
>>> this table function. Default return value is Optional.empty which means the
>>> ordinal-th argument is not table.
>>>  - Introduce `SqlSetSemanticsTable` which represents input table with set
>>> semantics of Table Function, its `SqlKind` is `SET_SEMANTICS_TABLE`
>>>  - Updates `SqlValidatorImpl` to validate only set semantic table of Table
>>> Function could have partition by and order by clause
>>>  - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery which
>>> represents set semantics table.
>>> 
>>> PR: https://github.com/apache/calcite/pull/2606
>>> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
>>> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
>>> 
>>> Best,
>>> Jing Zhang
>>> 
>> 



Re: PR Review Request

2022-06-20 Thread Martijn Visser
Hi everyone,

I just wanted to reach out to the Calcite community once more on this topic
since no reply was received. Would be great if someone could get back to us.

Best regards,

Martijn

On Wed, Jun 8, 2022 at 11:24 AM Martijn Visser wrote:

> Hi everyone,
>
> I would like to follow-up on this email that was sent by Jing. So far, no
> progress has been made, despite reaching out to the mailing list, the
> original Jira ticket and reaching out to people directly. Is there a way
> that we can move this PR/topic forward?
>
> For context, in Apache Flink we're currently heavily using Calcite.
> However, we are now at the stage where Calcite is actually holding us back.
> It would be great if we can find a way to strengthen our bond and move both
> Calcite and Flink forward.
>
> Looking forward to your thoughts,
>
> Martijn
>
> On 2022/01/26 07:05:37 Jing Zhang wrote:
> > Hi community,
> > My apologies for interrupting.
> > Anyone could help to review the pr
> > https://github.com/apache/calcite/pull/2606?
> > Thanks a lot.
> >
> > CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> > extend existing Table function in order to support Polymorphic Table
> > Function which is introduced as the part of ANSI SQL 2016.
> >
> > The brief change logs of the PR are:
> >   - Update `Parser.jj` to support partition by clause and order by clause
> > for input table with set semantics of PTF
> >   - Introduce `TableCharacteristics` which contains three characteristics
> > of input table of table function
> >   - Update `SqlTableFunction` to add a method `tableCharacteristics`, the
> > method returns the table characteristics for the ordinal-th argument to
> > this table function. Default return value is Optional.empty which means the
> > ordinal-th argument is not table.
> >   - Introduce `SqlSetSemanticsTable` which represents input table with set
> > semantics of Table Function, its `SqlKind` is `SET_SEMANTICS_TABLE`
> >   - Updates `SqlValidatorImpl` to validate only set semantic table of Table
> > Function could have partition by and order by clause
> >   - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery which
> > represents set semantics table.
> >
> > PR: https://github.com/apache/calcite/pull/2606
> > JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> > Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> >
> > Best,
> > Jing Zhang
> >
>


Re: PR Review Request

2022-06-08 Thread Martijn Visser
Hi everyone,

I would like to follow-up on this email that was sent by Jing. So far, no 
progress has been made, despite reaching out to the mailing list, the original 
Jira ticket and reaching out to people directly. Is there a way that we can 
move this PR/topic forward? 

For context, in Apache Flink we're currently heavily using Calcite. However, we 
are now at the stage where Calcite is actually holding us back. It would be 
great if we can find a way to strengthen our bond and move both Calcite and 
Flink forward. 

Looking forward to your thoughts,

Martijn

On 2022/01/26 07:05:37 Jing Zhang wrote:
> Hi community,
> My apologies for interrupting.
> Anyone could help to review the pr
> https://github.com/apache/calcite/pull/2606?
> Thanks a lot.
> 
> CALCITE-4865 is the first sub-task of CALCITE-4864. This Jira aims to
> extend existing Table function in order to support Polymorphic Table
> Function which is introduced as the part of ANSI SQL 2016.
> 
> The brief change logs of the PR are:
>   - Update `Parser.jj` to support partition by clause and order by clause
> for input table with set semantics of PTF
>   - Introduce `TableCharacteristics` which contains three characteristics
> of input table of table function
>   - Update `SqlTableFunction` to add a method `tableCharacteristics`,  the
> method returns the table characteristics for the ordinal-th argument to
> this table function. Default return value is Optional.empty which means the
> ordinal-th argument is not table.
>   - Introduce `SqlSetSemanticsTable` which represents input table with set
> semantics of Table Function, its `SqlKind` is `SET_SEMANTICS_TABLE`
>   - Updates `SqlValidatorImpl` to validate only set semantic table of Table
> Function could have partition by and order by clause
>   - Update `SqlToRelConverter#substituteSubQuery` to parse subQuery which
> represents set semantics table.
> 
> PR: https://github.com/apache/calcite/pull/2606
> JIRA: https://issues.apache.org/jira/browse/CALCITE-4865
> Parent JIRA: https://issues.apache.org/jira/browse/CALCITE-4864
> 
> Best,
> Jing Zhang
>
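
For readers who have not opened the PR, here is a rough, self-contained
sketch of the `tableCharacteristics` idea from the change log quoted above:
a default that returns Optional.empty (the argument is not a table), and a
check that only a set-semantics input table may carry PARTITION BY / ORDER BY.
This is not code from the PR. The class `PtfSketch`, the interface
`TableFunctionSketch`, the function `SessionWindowSketch`, and the field names
`pruneIfEmpty` and `passColumnsThrough` are all assumptions made for
illustration; the change log only says there are three characteristics.

```java
import java.util.Optional;

/** Illustrative sketch only; every name here is hypothetical, not Calcite's API. */
public class PtfSketch {

  /** Whether an input table is consumed row by row or per partition. */
  public enum Semantics { ROW, SET }

  /** Hypothetical holder for the three characteristics of an input table. */
  public static final class TableCharacteristics {
    public final Semantics semantics;
    public final boolean pruneIfEmpty;        // assumed name
    public final boolean passColumnsThrough;  // assumed name

    public TableCharacteristics(Semantics semantics, boolean pruneIfEmpty,
        boolean passColumnsThrough) {
      this.semantics = semantics;
      this.pruneIfEmpty = pruneIfEmpty;
      this.passColumnsThrough = passColumnsThrough;
    }
  }

  /** Stand-in for a table function; not the real SqlTableFunction. */
  public interface TableFunctionSketch {
    /** Default: the ordinal-th argument is not a table. */
    default Optional<TableCharacteristics> tableCharacteristics(int ordinal) {
      return Optional.empty();
    }
  }

  /** Hypothetical function whose first argument is a set-semantics table. */
  public static final class SessionWindowSketch implements TableFunctionSketch {
    @Override
    public Optional<TableCharacteristics> tableCharacteristics(int ordinal) {
      if (ordinal == 0) {
        return Optional.of(new TableCharacteristics(Semantics.SET, true, false));
      }
      return Optional.empty();
    }
  }

  /** Only a set-semantics input table may have PARTITION BY / ORDER BY. */
  public static boolean allowsPartitionOrOrderBy(TableFunctionSketch fn, int ordinal) {
    return fn.tableCharacteristics(ordinal)
        .map(c -> c.semantics == Semantics.SET)
        .orElse(false);
  }

  public static void main(String[] args) {
    TableFunctionSketch fn = new SessionWindowSketch();
    System.out.println(allowsPartitionOrOrderBy(fn, 0)); // table argument
    System.out.println(allowsPartitionOrOrderBy(fn, 1)); // non-table argument
  }
}
```

Returning an Optional from a default method means existing table functions
would not need to change; only functions that take table arguments override it.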