Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-17 Thread Julian Hyde
Several of the constructors of Enumerable RelNodes have the comment "Use {@link 
#create} unless you know what you're doing”. I think it’s pretty clear - the 
class and constructor had to be public for various reasons, but people should 
not treat it as a blank check.

If your main motivation is consistency, you might have mentioned that comment.

But I suspect many people are motivated by maximizing freedom for downstream 
projects. That’s fine, but that freedom is a burden on Calcite, and it is not 
clear-cut that more freedom for downstream projects is a net benefit for 
Calcite. As committers we should be thinking solely of the last consideration.

Julian




> On Jul 17, 2020, at 8:37 AM, Haisheng Yuan  wrote:
> 
> Agree with Ruben.
> 
> I don't see any convincing reasons to make EnumerableMergeJoin special.
> 
> On 2020/07/17 07:46:35, Ruben Q L  wrote: 
>> Regarding CALCITE-4123, I can provide a bit more of context.
>> 
>> I have been working on some improvements in EnumerableMergeJoin (support
>> new join types). I found that the easiest way to do so was extending the
>> class as MyOwnEnumerableMergeJoin and override the "implement" method, so
>> that instead of calling EnumerableDefault's mergeJoin, it calls my own
>> mergeJoin method, where I can develop and test these evolutions.
>> Eventually, once this code is satisfactory, I will push the improvements
>> upstream, so that they will be part of the following release (if the
>> community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
>> the default one instead. But this is a process that takes some time, so for
>> a certain period I have the improvements in my project via
>> MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
>> because EnumerableMergeJoin is not extensible (due to the package-private
>> constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
>> (extending Join) and copy-pasting most of the code from
>> EnumerableMergeJoin. So it is possible to achieve my goal, but this is
>> (IMHO) not optimal. Moreover, I find it hard to understand why one could
>> use this extension-approach on EnumerableHashJoin or
>> EnumerableNestedLoopJoin (with protected constructors), but not with
>> EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
>> special?
>> 
>> Recently I found a different example of extending enumerable operators. I
>> started to test the topDown mechanism in Volcano Planner (great feature
>> btw), and I found myself in a position where I had to extend EnumerableCalc
>> to exploit this feature as much as possible. Basically I had to create
>> MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
>> passThroughTraits method. The reason is that in my project I can produce an
>> IndexScan with some "special" collations (sub-fields on a struct field),
>> which are not supported by Calcite's standard collation mechanism (and
>> therefore were not passed through by EnumerableCalc#passThroughTraits to a
>> potential underlying Scan). The solution for that was simple, create
>> MyOwnEnumerableCalc and override passThroughTraits to provide my own
>> implementation considering also the specificities of my special collations,
>> and everything works perfectly. This is a very particular scenario that
>> will not be pushed upstream into standard Calcite, so probably I will have
>> to live with MyOwnEnumerableCalc, which I think is totally fine. And in
>> this case I had no problem with extension, because EnumerableCalc provides
>> a public constructor.
>> 
>> Honesty, I never thought this change would cause such a big discussion, I
>> will be more careful in the future and provide arguments in modifications
>> like this one, before going on with the change.
>> I do not want to block 1.24 on such a minor thing, if the community does
>> not agree with my change, or we need more time for discussion, I can revert
>> the change and re-consider it for 1.25. Personally, with the reasons that I
>> have provided, I think we can keep the change and release it in 1.24.
>> 
>> Best,
>> Ruben
>> 
>> 
>> 
>> Le ven. 17 juil. 2020 à 08:24, Julian Hyde  a écrit :
>> 
>>> When I write an API and do not include the word 'protected' on a
>>> constructor I am saying 'this is not a public API'. So someone coming
>>> along and adding the word 'protected' is actually making a big change.
>>> 
>>> There are other ways to extend code than sub-classing. I don't know
>>> what use case Ruben has for extending EnumerableMergeJoin. I never
>>> will, because I can't see his code. EnumerableMergeJoin, like all
>>> RelNodes, is extensible by new RexNode operators, new types, new kinds
>>> of collation, and by composing with other RelNodes. With any of those
>>> other extension mechanisms, Calcite's language would have been richer.
>>> There is a lot to be said for keeping the building blocks fixed, and
>>> creating a new extension point only when we're tried everything else.
>>> 
>>> 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-17 Thread Haisheng Yuan
Agree with Ruben.

I don't see any convincing reasons to make EnumerableMergeJoin special.

On 2020/07/17 07:46:35, Ruben Q L  wrote: 
> Regarding CALCITE-4123, I can provide a bit more of context.
> 
> I have been working on some improvements in EnumerableMergeJoin (support
> new join types). I found that the easiest way to do so was extending the
> class as MyOwnEnumerableMergeJoin and override the "implement" method, so
> that instead of calling EnumerableDefault's mergeJoin, it calls my own
> mergeJoin method, where I can develop and test these evolutions.
> Eventually, once this code is satisfactory, I will push the improvements
> upstream, so that they will be part of the following release (if the
> community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
> the default one instead. But this is a process that takes some time, so for
> a certain period I have the improvements in my project via
> MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
> because EnumerableMergeJoin is not extensible (due to the package-private
> constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
> (extending Join) and copy-pasting most of the code from
> EnumerableMergeJoin. So it is possible to achieve my goal, but this is
> (IMHO) not optimal. Moreover, I find it hard to understand why one could
> use this extension-approach on EnumerableHashJoin or
> EnumerableNestedLoopJoin (with protected constructors), but not with
> EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
> special?
> 
> Recently I found a different example of extending enumerable operators. I
> started to test the topDown mechanism in Volcano Planner (great feature
> btw), and I found myself in a position where I had to extend EnumerableCalc
> to exploit this feature as much as possible. Basically I had to create
> MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
> passThroughTraits method. The reason is that in my project I can produce an
> IndexScan with some "special" collations (sub-fields on a struct field),
> which are not supported by Calcite's standard collation mechanism (and
> therefore were not passed through by EnumerableCalc#passThroughTraits to a
> potential underlying Scan). The solution for that was simple, create
> MyOwnEnumerableCalc and override passThroughTraits to provide my own
> implementation considering also the specificities of my special collations,
> and everything works perfectly. This is a very particular scenario that
> will not be pushed upstream into standard Calcite, so probably I will have
> to live with MyOwnEnumerableCalc, which I think is totally fine. And in
> this case I had no problem with extension, because EnumerableCalc provides
> a public constructor.
> 
> Honesty, I never thought this change would cause such a big discussion, I
> will be more careful in the future and provide arguments in modifications
> like this one, before going on with the change.
> I do not want to block 1.24 on such a minor thing, if the community does
> not agree with my change, or we need more time for discussion, I can revert
> the change and re-consider it for 1.25. Personally, with the reasons that I
> have provided, I think we can keep the change and release it in 1.24.
> 
> Best,
> Ruben
> 
> 
> 
> Le ven. 17 juil. 2020 à 08:24, Julian Hyde  a écrit :
> 
> > When I write an API and do not include the word 'protected' on a
> > constructor I am saying 'this is not a public API'. So someone coming
> > along and adding the word 'protected' is actually making a big change.
> >
> > There are other ways to extend code than sub-classing. I don't know
> > what use case Ruben has for extending EnumerableMergeJoin. I never
> > will, because I can't see his code. EnumerableMergeJoin, like all
> > RelNodes, is extensible by new RexNode operators, new types, new kinds
> > of collation, and by composing with other RelNodes. With any of those
> > other extension mechanisms, Calcite's language would have been richer.
> > There is a lot to be said for keeping the building blocks fixed, and
> > creating a new extension point only when we're tried everything else.
> >
> > On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis 
> > wrote:
> > >
> > > +1 for removing the final modifier for all the reasons mentioned so far.
> > >
> > > Regarding CALCITE-4123, I find Ruben's justification convincing.
> > > The majority of Enumerable operators (not only joins) are extensible
> > (with
> > > public/protected constructors) so it seems that EnumerableMergeJoin was
> > the
> > > only exception to this pattern.
> > > Now if we want to keep these APIs internal it is another discussion and I
> > > think it should consider all Enumerable operators so for the moment I am
> > > fine with the current change.
> > > The only question from my side is why it is necessary to extend
> > > EnumerableMergeJoin in the first place but this is mostly from curiosity,
> > > and to see 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-17 Thread Ruben Q L
Regarding CALCITE-4123, I can provide a bit more of context.

I have been working on some improvements in EnumerableMergeJoin (support
new join types). I found that the easiest way to do so was extending the
class as MyOwnEnumerableMergeJoin and override the "implement" method, so
that instead of calling EnumerableDefault's mergeJoin, it calls my own
mergeJoin method, where I can develop and test these evolutions.
Eventually, once this code is satisfactory, I will push the improvements
upstream, so that they will be part of the following release (if the
community agrees), and I can get rid of MyOwnEnumerableMergeJoin and use
the default one instead. But this is a process that takes some time, so for
a certain period I have the improvements in my project via
MyOwnEnumerableMergeJoin, which is fine. Today this is not possible,
because EnumerableMergeJoin is not extensible (due to the package-private
constructor) so I need to create MyOwnEnumerableMergeJoin "from scratch"
(extending Join) and copy-pasting most of the code from
EnumerableMergeJoin. So it is possible to achieve my goal, but this is
(IMHO) not optimal. Moreover, I find it hard to understand why one could
use this extension-approach on EnumerableHashJoin or
EnumerableNestedLoopJoin (with protected constructors), but not with
EnumerableMergeJoin (with package-private constructor). Why is MergeJoin so
special?

Recently I found a different example of extending enumerable operators. I
started to test the topDown mechanism in Volcano Planner (great feature
btw), and I found myself in a position where I had to extend EnumerableCalc
to exploit this feature as much as possible. Basically I had to create
MyOwnEnumerableCalc (which extends EnumerableCalc) to override the
passThroughTraits method. The reason is that in my project I can produce an
IndexScan with some "special" collations (sub-fields on a struct field),
which are not supported by Calcite's standard collation mechanism (and
therefore were not passed through by EnumerableCalc#passThroughTraits to a
potential underlying Scan). The solution for that was simple, create
MyOwnEnumerableCalc and override passThroughTraits to provide my own
implementation considering also the specificities of my special collations,
and everything works perfectly. This is a very particular scenario that
will not be pushed upstream into standard Calcite, so probably I will have
to live with MyOwnEnumerableCalc, which I think is totally fine. And in
this case I had no problem with extension, because EnumerableCalc provides
a public constructor.

Honesty, I never thought this change would cause such a big discussion, I
will be more careful in the future and provide arguments in modifications
like this one, before going on with the change.
I do not want to block 1.24 on such a minor thing, if the community does
not agree with my change, or we need more time for discussion, I can revert
the change and re-consider it for 1.25. Personally, with the reasons that I
have provided, I think we can keep the change and release it in 1.24.

Best,
Ruben



Le ven. 17 juil. 2020 à 08:24, Julian Hyde  a écrit :

> When I write an API and do not include the word 'protected' on a
> constructor I am saying 'this is not a public API'. So someone coming
> along and adding the word 'protected' is actually making a big change.
>
> There are other ways to extend code than sub-classing. I don't know
> what use case Ruben has for extending EnumerableMergeJoin. I never
> will, because I can't see his code. EnumerableMergeJoin, like all
> RelNodes, is extensible by new RexNode operators, new types, new kinds
> of collation, and by composing with other RelNodes. With any of those
> other extension mechanisms, Calcite's language would have been richer.
> There is a lot to be said for keeping the building blocks fixed, and
> creating a new extension point only when we're tried everything else.
>
> On Thu, Jul 16, 2020 at 3:49 PM Stamatis Zampetakis 
> wrote:
> >
> > +1 for removing the final modifier for all the reasons mentioned so far.
> >
> > Regarding CALCITE-4123, I find Ruben's justification convincing.
> > The majority of Enumerable operators (not only joins) are extensible
> (with
> > public/protected constructors) so it seems that EnumerableMergeJoin was
> the
> > only exception to this pattern.
> > Now if we want to keep these APIs internal it is another discussion and I
> > think it should consider all Enumerable operators so for the moment I am
> > fine with the current change.
> > The only question from my side is why it is necessary to extend
> > EnumerableMergeJoin in the first place but this is mostly from curiosity,
> > and to see if there are things that should be changed in the operator to
> > avoid such extensions.
> > Let's continue the discussion under the respective JIRA if necessary.
> >
> > Best,
> > Stamatis
> >
> >
> > On Fri, Jul 17, 2020 at 1:30 AM Rui Wang  wrote:
> >
> > > Since the concern on CALCLITE-4123 is brought 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Stamatis Zampetakis
+1 for removing the final modifier for all the reasons mentioned so far.

Regarding CALCITE-4123, I find Ruben's justification convincing.
The majority of Enumerable operators (not only joins) are extensible (with
public/protected constructors) so it seems that EnumerableMergeJoin was the
only exception to this pattern.
Now if we want to keep these APIs internal it is another discussion and I
think it should consider all Enumerable operators so for the moment I am
fine with the current change.
The only question from my side is why it is necessary to extend
EnumerableMergeJoin in the first place but this is mostly from curiosity,
and to see if there are things that should be changed in the operator to
avoid such extensions.
Let's continue the discussion under the respective JIRA if necessary.

Best,
Stamatis


On Fri, Jul 17, 2020 at 1:30 AM Rui Wang  wrote:

> Since the concern on CALCLITE-4123 is brought up, can we make a quick
> decision about whether to make it in the 1.24.0 release or not?
>
> Having the public interface increased in 1.24.0 and then later there might
> be a decision to revert it doesn't sound right. If we are not sure for now,
> maybe have a quick revert to unblock 1.24.0 release first?
>
>
> -Rui
>
> On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L  wrote:
>
> > Thanks for the feedback, I will create a Jira ticket to carry out the
> > change and remove the "final" from AbstractRelNode#getRelTypeName.
> >
> > @Julian, I am sorry I committed that change (
> > https://issues.apache.org/jira/browse/CALCITE-4123) without further
> > discussion. I honestly thought it was a minor detail. We can discuss it,
> > and if the community does not agree with the change, we can revert it.
> > Basically, the package-private constructor in EnumerableMergeJoin
> prevents
> > it from being extended in downstream projects. This is not consistent
> with
> > other Join operators, such as EnumerableHashJoin,
> EnumerableNestedLoopJoin
> > or EnumerbleBatchNestedLoopJoin, all of them providing a protected
> > constructor. So I do not see a reason why a downstream project can extend
> > e.g. EnumerableHashJoin, but is unable to extend EnumerableMergeJoin. I
> > think there is no issue in extending these clases (otherwise they would
> be
> > final). I believed changing the constructor from package-private to
> > protected would solve this issue, without causing any
> > problematic side-effect.
> >
> > Best,
> > Ruben
> >
> >
> > Le jeu. 16 juil. 2020 à 21:32, Julian Hyde  a écrit :
> >
> > > I don't have a problem with this change.
> > >
> > > I do have a problem with changes, such as the one that Ruben recently
> > > committed [1] to make the constructor of EnumerableMergeJoin, that
> > > increase the surface of our public API. The larger our public API, the
> > > more work it is to add features to Calcite while adhering to semantic
> > > versioning.
> > >
> > > 'final' is sometimes used intentionally to limit the API surface area.
> > >
> > > Julian
> > >
> > > [1]
> > >
> >
> https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
> > >
> > > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang 
> wrote:
> > > >
> > > > +1 to make getRelTypeName overridable.
> > > >
> > > > Not a java language expert, to me when there are cases that a public
> > API
> > > > with final cannot solve, it's a good sign to remove that final to
> allow
> > > > customization to solve those specific use cases.
> > > >
> > > >
> > > > -Rui
> > > >
> > > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan 
> > wrote:
> > > >
> > > > > +1 to remove final.
> > > > >
> > > > > The method returns the logical/physical operator name. When we
> > explain
> > > the
> > > > > plan (and compute the digest in 1.23.0 and before), getRelTypeName
> > may
> > > be
> > > > > called multiple times for a single operator, every time it will
> start
> > > from
> > > > > scratch to figure out what is correct operator name. Making it
> > > overridable
> > > > > will not only solve the case Ruben mentioned below, but also remove
> > the
> > > > > duplicate computation by just returning the literal string for the
> > > operator
> > > > > name, even if the gain might be negligible.
> > > > >
> > > > > The downside is when downstream projects who override the method
> > > refactor
> > > > > the operator class name, they have to manually update the method to
> > > return
> > > > > correct names. We get a flexibility by losing another one.  But I
> > > think it
> > > > > is OK to add a caveat that use at your own risk.
> > > > >
> > > > > On 2020/07/16 14:32:31, Ruben Q L  wrote:
> > > > > > Hello everyone,
> > > > > >
> > > > > > I have a small issue regarding RelNode#getRelTypeName [1]. This
> > > method is
> > > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and
> it
> > > > > "returns
> > > > > > the name of this relational expression's class, sans package
> name,
> > > for
> > > > > use
> > > > > > in explain".
> > > > > > This 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Rui Wang
Since the concern on CALCLITE-4123 is brought up, can we make a quick
decision about whether to make it in the 1.24.0 release or not?

Having the public interface increased in 1.24.0 and then later there might
be a decision to revert it doesn't sound right. If we are not sure for now,
maybe have a quick revert to unblock 1.24.0 release first?


-Rui

On Thu, Jul 16, 2020 at 3:17 PM Ruben Q L  wrote:

> Thanks for the feedback, I will create a Jira ticket to carry out the
> change and remove the "final" from AbstractRelNode#getRelTypeName.
>
> @Julian, I am sorry I committed that change (
> https://issues.apache.org/jira/browse/CALCITE-4123) without further
> discussion. I honestly thought it was a minor detail. We can discuss it,
> and if the community does not agree with the change, we can revert it.
> Basically, the package-private constructor in EnumerableMergeJoin prevents
> it from being extended in downstream projects. This is not consistent with
> other Join operators, such as EnumerableHashJoin, EnumerableNestedLoopJoin
> or EnumerbleBatchNestedLoopJoin, all of them providing a protected
> constructor. So I do not see a reason why a downstream project can extend
> e.g. EnumerableHashJoin, but is unable to extend EnumerableMergeJoin. I
> think there is no issue in extending these clases (otherwise they would be
> final). I believed changing the constructor from package-private to
> protected would solve this issue, without causing any
> problematic side-effect.
>
> Best,
> Ruben
>
>
> Le jeu. 16 juil. 2020 à 21:32, Julian Hyde  a écrit :
>
> > I don't have a problem with this change.
> >
> > I do have a problem with changes, such as the one that Ruben recently
> > committed [1] to make the constructor of EnumerableMergeJoin, that
> > increase the surface of our public API. The larger our public API, the
> > more work it is to add features to Calcite while adhering to semantic
> > versioning.
> >
> > 'final' is sometimes used intentionally to limit the API surface area.
> >
> > Julian
> >
> > [1]
> >
> https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
> >
> > On Thu, Jul 16, 2020 at 10:33 AM Rui Wang  wrote:
> > >
> > > +1 to make getRelTypeName overridable.
> > >
> > > Not a java language expert, to me when there are cases that a public
> API
> > > with final cannot solve, it's a good sign to remove that final to allow
> > > customization to solve those specific use cases.
> > >
> > >
> > > -Rui
> > >
> > > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan 
> wrote:
> > >
> > > > +1 to remove final.
> > > >
> > > > The method returns the logical/physical operator name. When we
> explain
> > the
> > > > plan (and compute the digest in 1.23.0 and before), getRelTypeName
> may
> > be
> > > > called multiple times for a single operator, every time it will start
> > from
> > > > scratch to figure out what is correct operator name. Making it
> > overridable
> > > > will not only solve the case Ruben mentioned below, but also remove
> the
> > > > duplicate computation by just returning the literal string for the
> > operator
> > > > name, even if the gain might be negligible.
> > > >
> > > > The downside is when downstream projects who override the method
> > refactor
> > > > the operator class name, they have to manually update the method to
> > return
> > > > correct names. We get a flexibility by losing another one.  But I
> > think it
> > > > is OK to add a caveat that use at your own risk.
> > > >
> > > > On 2020/07/16 14:32:31, Ruben Q L  wrote:
> > > > > Hello everyone,
> > > > >
> > > > > I have a small issue regarding RelNode#getRelTypeName [1]. This
> > method is
> > > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and it
> > > > "returns
> > > > > the name of this relational expression's class, sans package name,
> > for
> > > > use
> > > > > in explain".
> > > > > This method has a *final* implementation in AbstractRelNode [2],
> > which
> > > > > returns a String based on getClass().getName(). Normally, this
> > > > > implementation should work fine for all RelNode implementations,
> but
> > > > > currently I am working on a project that uses obfuscation, so this
> > > > > implementation returns a non-meaningful name on our RelNode
> > > > > implementations (e.g. our TableScan operators), which will lead to
> > > > > incomprehensible query plans in our logs. I would like to override
> > the
> > > > > method on our RelNode  implementations, so that they can return a
> > "clear"
> > > > > relTypeName, instead of the default one based on
> > getClass().getName(),
> > > > but
> > > > > at the moment I cannot do that, since the implementation in
> > > > AbstractRelNode
> > > > > is final.
> > > > >
> > > > > What would you think about removing the "final" from
> > > > > AbstractRelNode#getRelTypeName and allow downstream projects to
> > provide
> > > > > their own implementation? Of course, these projects shall be
> > responsible
> > > > > for providing 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Ruben Q L
Thanks for the feedback, I will create a Jira ticket to carry out the
change and remove the "final" from AbstractRelNode#getRelTypeName.

@Julian, I am sorry I committed that change (
https://issues.apache.org/jira/browse/CALCITE-4123) without further
discussion. I honestly thought it was a minor detail. We can discuss it,
and if the community does not agree with the change, we can revert it.
Basically, the package-private constructor in EnumerableMergeJoin prevents
it from being extended in downstream projects. This is not consistent with
other Join operators, such as EnumerableHashJoin, EnumerableNestedLoopJoin
or EnumerbleBatchNestedLoopJoin, all of them providing a protected
constructor. So I do not see a reason why a downstream project can extend
e.g. EnumerableHashJoin, but is unable to extend EnumerableMergeJoin. I
think there is no issue in extending these clases (otherwise they would be
final). I believed changing the constructor from package-private to
protected would solve this issue, without causing any
problematic side-effect.

Best,
Ruben


Le jeu. 16 juil. 2020 à 21:32, Julian Hyde  a écrit :

> I don't have a problem with this change.
>
> I do have a problem with changes, such as the one that Ruben recently
> committed [1] to make the constructor of EnumerableMergeJoin, that
> increase the surface of our public API. The larger our public API, the
> more work it is to add features to Calcite while adhering to semantic
> versioning.
>
> 'final' is sometimes used intentionally to limit the API surface area.
>
> Julian
>
> [1]
> https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5
>
> On Thu, Jul 16, 2020 at 10:33 AM Rui Wang  wrote:
> >
> > +1 to make getRelTypeName overridable.
> >
> > Not a java language expert, to me when there are cases that a public API
> > with final cannot solve, it's a good sign to remove that final to allow
> > customization to solve those specific use cases.
> >
> >
> > -Rui
> >
> > On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan  wrote:
> >
> > > +1 to remove final.
> > >
> > > The method returns the logical/physical operator name. When we explain
> the
> > > plan (and compute the digest in 1.23.0 and before), getRelTypeName may
> be
> > > called multiple times for a single operator, every time it will start
> from
> > > scratch to figure out what is correct operator name. Making it
> overridable
> > > will not only solve the case Ruben mentioned below, but also remove the
> > > duplicate computation by just returning the literal string for the
> operator
> > > name, even if the gain might be negligible.
> > >
> > > The downside is when downstream projects who override the method
> refactor
> > > the operator class name, they have to manually update the method to
> return
> > > correct names. We get a flexibility by losing another one.  But I
> think it
> > > is OK to add a caveat that use at your own risk.
> > >
> > > On 2020/07/16 14:32:31, Ruben Q L  wrote:
> > > > Hello everyone,
> > > >
> > > > I have a small issue regarding RelNode#getRelTypeName [1]. This
> method is
> > > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and it
> > > "returns
> > > > the name of this relational expression's class, sans package name,
> for
> > > use
> > > > in explain".
> > > > This method has a *final* implementation in AbstractRelNode [2],
> which
> > > > returns a String based on getClass().getName(). Normally, this
> > > > implementation should work fine for all RelNode implementations, but
> > > > currently I am working on a project that uses obfuscation, so this
> > > > implementation returns a non-meaningful name on our RelNode
> > > > implementations (e.g. our TableScan operators), which will lead to
> > > > incomprehensible query plans in our logs. I would like to override
> the
> > > > method on our RelNode  implementations, so that they can return a
> "clear"
> > > > relTypeName, instead of the default one based on
> getClass().getName(),
> > > but
> > > > at the moment I cannot do that, since the implementation in
> > > AbstractRelNode
> > > > is final.
> > > >
> > > > What would you think about removing the "final" from
> > > > AbstractRelNode#getRelTypeName and allow downstream projects to
> provide
> > > > their own implementation? Of course, these projects shall be
> responsible
> > > > for providing a valid implementation at their own risk, we could even
> > > add a
> > > > warning message in AbstractRelNode#getRelTypeName saying something
> like
> > > "do
> > > > not override this method unless you know what you are doing" or "it
> is
> > > > strongly discouraged to override this method".
> > > >
> > > > Of course, I know there are other alternatives for my problem, like
> > > > implementing my own dumpPlan and RelWriter mechanisms, but I would
> like
> > > to
> > > > explore the getRelTypeName possibility before doing that, because it
> > > would
> > > > be a more straightforward solution, and it might be useful for 

Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Julian Hyde
I don't have a problem with this change.

I do have a problem with changes, such as the one that Ruben recently
committed [1] to make the constructor of EnumerableMergeJoin, that
increase the surface of our public API. The larger our public API, the
more work it is to add features to Calcite while adhering to semantic
versioning.

'final' is sometimes used intentionally to limit the API surface area.

Julian

[1] 
https://github.com/apache/calcite/commit/6bb7e2d0b65ec3c7a0d82c92ca0564f8caec4af5

On Thu, Jul 16, 2020 at 10:33 AM Rui Wang  wrote:
>
> +1 to make getRelTypeName overridable.
>
> Not a java language expert, to me when there are cases that a public API
> with final cannot solve, it's a good sign to remove that final to allow
> customization to solve those specific use cases.
>
>
> -Rui
>
> On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan  wrote:
>
> > +1 to remove final.
> >
> > The method returns the logical/physical operator name. When we explain the
> > plan (and compute the digest in 1.23.0 and before), getRelTypeName may be
> > called multiple times for a single operator, every time it will start from
> > scratch to figure out what is correct operator name. Making it overridable
> > will not only solve the case Ruben mentioned below, but also remove the
> > duplicate computation by just returning the literal string for the operator
> > name, even if the gain might be negligible.
> >
> > The downside is when downstream projects who override the method refactor
> > the operator class name, they have to manually update the method to return
> > correct names. We get a flexibility by losing another one.  But I think it
> > is OK to add a caveat that use at your own risk.
> >
> > On 2020/07/16 14:32:31, Ruben Q L  wrote:
> > > Hello everyone,
> > >
> > > I have a small issue regarding RelNode#getRelTypeName [1]. This method is
> > > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and it
> > "returns
> > > the name of this relational expression's class, sans package name, for
> > use
> > > in explain".
> > > This method has a *final* implementation in AbstractRelNode [2], which
> > > returns a String based on getClass().getName(). Normally, this
> > > implementation should work fine for all RelNode implementations, but
> > > currently I am working on a project that uses obfuscation, so this
> > > implementation returns a non-meaningful name on our RelNode
> > > implementations (e.g. our TableScan operators), which will lead to
> > > incomprehensible query plans in our logs. I would like to override the
> > > method on our RelNode  implementations, so that they can return a "clear"
> > > relTypeName, instead of the default one based on getClass().getName(),
> > but
> > > at the moment I cannot do that, since the implementation in
> > AbstractRelNode
> > > is final.
> > >
> > > What would you think about removing the "final" from
> > > AbstractRelNode#getRelTypeName and allow downstream projects to provide
> > > their own implementation? Of course, these projects shall be responsible
> > > for providing a valid implementation at their own risk, we could even
> > add a
> > > warning message in AbstractRelNode#getRelTypeName saying something like
> > "do
> > > not override this method unless you know what you are doing" or "it is
> > > strongly discouraged to override this method".
> > >
> > > Of course, I know there are other alternatives for my problem, like
> > > implementing my own dumpPlan and RelWriter mechanisms, but I would like
> > to
> > > explore the getRelTypeName possibility before doing that, because it
> > would
> > > be a more straightforward solution, and it might be useful for other
> > people
> > > as well.
> > >
> > > Thanks and best regards,
> > > Ruben
> > >
> > > [1]
> > >
> > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
> > > [2]
> > >
> > https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
> > >
> >


Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Rui Wang
+1 to make getRelTypeName overridable.

Not a java language expert, to me when there are cases that a public API
with final cannot solve, it's a good sign to remove that final to allow
customization to solve those specific use cases.


-Rui

On Thu, Jul 16, 2020 at 8:58 AM Haisheng Yuan  wrote:

> +1 to remove final.
>
> The method returns the logical/physical operator name. When we explain the
> plan (and compute the digest in 1.23.0 and before), getRelTypeName may be
> called multiple times for a single operator, every time it will start from
> scratch to figure out what is correct operator name. Making it overridable
> will not only solve the case Ruben mentioned below, but also remove the
> duplicate computation by just returning the literal string for the operator
> name, even if the gain might be negligible.
>
> The downside is when downstream projects who override the method refactor
> the operator class name, they have to manually update the method to return
> correct names. We get a flexibility by losing another one.  But I think it
> is OK to add a caveat that use at your own risk.
>
> On 2020/07/16 14:32:31, Ruben Q L  wrote:
> > Hello everyone,
> >
> > I have a small issue regarding RelNode#getRelTypeName [1]. This method is
> > used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and it
> "returns
> > the name of this relational expression's class, sans package name, for
> use
> > in explain".
> > This method has a *final* implementation in AbstractRelNode [2], which
> > returns a String based on getClass().getName(). Normally, this
> > implementation should work fine for all RelNode implementations, but
> > currently I am working on a project that uses obfuscation, so this
> > implementation returns a non-meaningful name on our RelNode
> > implementations (e.g. our TableScan operators), which will lead to
> > incomprehensible query plans in our logs. I would like to override the
> > method on our RelNode  implementations, so that they can return a "clear"
> > relTypeName, instead of the default one based on getClass().getName(),
> but
> > at the moment I cannot do that, since the implementation in
> AbstractRelNode
> > is final.
> >
> > What would you think about removing the "final" from
> > AbstractRelNode#getRelTypeName and allow downstream projects to provide
> > their own implementation? Of course, these projects shall be responsible
> > for providing a valid implementation at their own risk, we could even
> add a
> > warning message in AbstractRelNode#getRelTypeName saying something like
> "do
> > not override this method unless you know what you are doing" or "it is
> > strongly discouraged to override this method".
> >
> > Of course, I know there are other alternatives for my problem, like
> > implementing my own dumpPlan and RelWriter mechanisms, but I would like
> to
> > explore the getRelTypeName possibility before doing that, because it
> would
> > be a more straightforward solution, and it might be useful for other
> people
> > as well.
> >
> > Thanks and best regards,
> > Ruben
> >
> > [1]
> >
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
> > [2]
> >
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
> >
>


Re: [DISCUSS] Remove "final" from AbstractRelNode#getRelTypeName

2020-07-16 Thread Haisheng Yuan
+1 to remove final.

The method returns the logical/physical operator name. When we explain the plan 
(and compute the digest in 1.23.0 and before), getRelTypeName may be called 
multiple times for a single operator, every time it will start from scratch to 
figure out what is correct operator name. Making it overridable will not only 
solve the case Ruben mentioned below, but also remove the duplicate computation 
by just returning the literal string for the operator name, even if the gain 
might be negligible.

The downside is when downstream projects who override the method refactor the 
operator class name, they have to manually update the method to return correct 
names. We get a flexibility by losing another one.  But I think it is OK to add 
a caveat that use at your own risk.

On 2020/07/16 14:32:31, Ruben Q L  wrote: 
> Hello everyone,
> 
> I have a small issue regarding RelNode#getRelTypeName [1]. This method is
> used when explaining a plan (e.g. via RelOptUtil#dumpPlan), and it "returns
> the name of this relational expression's class, sans package name, for use
> in explain".
> This method has a *final* implementation in AbstractRelNode [2], which
> returns a String based on getClass().getName(). Normally, this
> implementation should work fine for all RelNode implementations, but
> currently I am working on a project that uses obfuscation, so this
> implementation returns a non-meaningful name on our RelNode
> implementations (e.g. our TableScan operators), which will lead to
> incomprehensible query plans in our logs. I would like to override the
> method on our RelNode  implementations, so that they can return a "clear"
> relTypeName, instead of the default one based on getClass().getName(), but
> at the moment I cannot do that, since the implementation in AbstractRelNode
> is final.
> 
> What would you think about removing the "final" from
> AbstractRelNode#getRelTypeName and allow downstream projects to provide
> their own implementation? Of course, these projects shall be responsible
> for providing a valid implementation at their own risk, we could even add a
> warning message in AbstractRelNode#getRelTypeName saying something like "do
> not override this method unless you know what you are doing" or "it is
> strongly discouraged to override this method".
> 
> Of course, I know there are other alternatives for my problem, like
> implementing my own dumpPlan and RelWriter mechanisms, but I would like to
> explore the getRelTypeName possibility before doing that, because it would
> be a more straightforward solution, and it might be useful for other people
> as well.
> 
> Thanks and best regards,
> Ruben
> 
> [1]
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/RelNode.java#L367
> [2]
> https://github.com/apache/calcite/blob/551e3f5a182a48e20586e40c378224bb63e4bfb3/core/src/main/java/org/apache/calcite/rel/AbstractRelNode.java#L182
>