Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-24 Thread Haisheng Yuan
Sounds like we have reached a consensus, I guess.
Can we create a JIRA to allow users to create their own sub-class of 
RelMetadataQuery?

- Haisheng

--
发件人:Julian Hyde
日 期:2019年10月22日 09:58:34
收件人:dev
主 题:Re: [DISCUSSION] Extension of Metadata Query

Yes, the division of labor between MetadataFactory and RelMetadataQuery is a 
good one, and we should keep them intact. One provides the raw metadata, and 
the other provides a typed interface and transactions/caching.

It might be allowable for a user to create their own sub-class of 
RelMetadataQuery that adds only handler fields and metadata methods, provided 
that it follows the existing pattern. We could add a new thread-local in 
RelMetadataQuery.instance() that is a factory to create the required sub-class 
of RelMetadataQuery.

The process by which metadata factories re-generate themselves is delicate and 
subtle:

 public Double getMaxRowCount(RelNode rel) {
 for (;;) {
 try {
 return maxRowCountHandler.getMaxRowCount(rel, this);
 } catch (JaninoRelMetadataProvider.NoHandler e) {
 maxRowCountHandler =
 revise(e.relClass, BuiltInMetadata.MaxRowCount.DEF);
 }
 }
 }

(Note that “revise” generates a new class, creating bytecode via janino, and 
the generated code throws “NoHandler" when it needs to extend itself.) I don’t 
trust the typical Calcite user to be able to write a sub-class that works 
correctly and efficiently.

Julian



> On Oct 18, 2019, at 3:55 AM, XING JIN  wrote:
> 
> +1 on Danny's comment.
> If we use MedataFactory to customize and use RelMetadataQuery for
> convenience, that will make user confused.
> 
> Danny Chan  于2019年10月18日周五 下午12:33写道:
> 
>> That is the point, we should supply a way to extend the RelMetadataQuery
>> conveniently for Calcite, because in most of the RelOptRules, user would
>> use the code like:
>> 
>> RelOptRuleCall.getMetadataQuery
>> 
>> To get a RMQ instead of using AbstractRelNode.metadata() to fetch a
>> MedataFactory.
>> 
>> We should at lest unity the metadata query entrance/interfaces, or it
>> would confuse a lot.
>> 
>> Best,
>> Danny Chan
>> 在 2019年10月18日 +0800 AM12:23,Seliverstov Igor ,写道:
>>> At least in our project (Apache Ignite) we use
>> AbstractRelNode.metadata().
>>> 
>>> But it so because there is no way to put our metadata type into
>>> RelMetadataQuery without changes in Calcite.
>>> 
>>> Regards,
>>> Igor
>>> 
>>> чт, 17 окт. 2019 г., 19:16 Xiening Dai :
>>> 
>>>> MetadataFactory is still useful. It provides a way to access Metadata
>>>> directly. If someone creates a new type of Metadata class, it can be
>>>> accessed through AbstractRelNode.metadata(). This way you don’t need to
>>>> update RelMetadataQuery interface to include the getter for this new
>> meta.
>>>> Although I don’t see this pattern being used often, but I do think it
>> is
>>>> still useful and shouldn’t be removed.
>>>> 
>>>> 
>>>> For your second point, I think you would still need a way to keep
>>>> RelMetadataQuery object during a rule call. If you choose to create new
>>>> instance, you will have to pass it around while applying the rule. That
>>>> actually complicates things a lot.
>>>> 
>>>> 
>>>>> On Oct 17, 2019, at 12:49 AM, XING JIN 
>> wrote:
>>>>> 
>>>>> 1. RelMetadataQuery covers the functionality of MetadataFactory, why
>>>> should
>>>>> we keep/maintain both of them ? shall we just deprecate
>> MetadataFactory.
>>>> I
>>>>> see MetadataFactory is rarely used in current code. Also I
>>>>> think MetadataFactory is not good place to offering customized
>> metadata,
>>>>> which will make user confused for the difference between
>> RelMetadataQuery
>>>>> and MetadataFactory.
>>>>> 
>>>>>> Customized RelMetadataQuery with code generated meta handler for
>>>>> customized metadata, also can provide convenient way to get metadata.
>>>>> It makes sense for me.
>>>>> 
>>>>> 2. If the natural lifespan of a RelMetadataQuery is a RelOptCall,
>> shall
>>>> we
>>>>> deprecate RelOptCluster#getMetadataQuery ? If a user wants to get the
>>>>> metadata but without a RelOptCall, he/she will need to create a new
>>>>> instance of RelMetadataQuery.
>>>>> 
>>>>> Xiening Dai  于2019年10月17日周四 上午2:27写道:
>>&g

Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-17 Thread Haisheng Yuan
MetadataFactory's implementation provides a unified, reflective approach to 
retrieve metadata, no matter the metadata is builtin or extended type. 
RelMetadataQuery provides convenient methods (in codegen) to get builtin 
metadata, but not for costomized metadata. Unless we recommend sub-classing 
RelMetadataQuery as the way to add costomized metadata, deprecating 
MetadataFactory is not a wise choice.

- Haisheng

--
发件人:XING JIN
日 期:2019年10月17日 15:55:31
收件人:
主 题:Re: [DISCUSSION] Extension of Metadata Query

BTW, I think one JIRA number discussed in the thread would be
https://issues.apache.org/jira/browse/CALCITE-2855 not CALCITE-2885

Best,
Jin

XING JIN  于2019年10月17日周四 下午3:49写道:

> 1. RelMetadataQuery covers the functionality of MetadataFactory, why
> should we keep/maintain both of them ? shall we just
> deprecate MetadataFactory. I see MetadataFactory is rarely used in current
> code. Also I think MetadataFactory is not good place to offering customized
> metadata, which will make user confused for the difference
> between RelMetadataQuery and MetadataFactory.
>
> > Customized RelMetadataQuery with code generated meta handler for
> customized metadata, also can provide convenient way to get metadata.
> It makes sense for me.
>
> 2. If the natural lifespan of a RelMetadataQuery is a RelOptCall, shall we
> deprecate RelOptCluster#getMetadataQuery ? If a user wants to get the
> metadata but without a RelOptCall, he/she will need to create a new
> instance of RelMetadataQuery.
>
> Xiening Dai  于2019年10月17日周四 上午2:27写道:
>
>> I have seen both patterns in current code base. In most places, for
>> example SubQueryRemoveRule, AggregateUnionTrasposeRule
>> SortJoinTransposeRule, etc., RelOptCluster.getMetadataQuery() is used. And
>> there are a few other places where new RelMetadataQuery instance is
>> created, which Haisheng attempts to fix.
>>
>> Currently RelOptCluster.invalidateMetadataQuery() is called at the end of
>> RelOptRuleCall.transformTo(). So the lifespan of RelMetadataQuery is
>> guaranteed to be within a RelOptCall. I think Haisheng’s fix is safe.
>>
>>
>> > On Oct 16, 2019, at 1:53 AM, Danny Chan  wrote:
>> >
>> > This is the reason I was struggling for the discussion.
>> >
>> > Best,
>> > Danny Chan
>> > 在 2019年10月16日 +0800 AM11:23,dev@calcite.apache.org,写道:
>> >>
>> >> RelMetadataQuery
>>
>>



Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-16 Thread Danny Chan
This is the reason I was struggling for the discussion.

Best,
Danny Chan
在 2019年10月16日 +0800 AM11:23,dev@calcite.apache.org,写道:
>
> RelMetadataQuery


Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-15 Thread Haisheng Yuan
That's a good question.
Yes. And the method getMetadataQuery from RelOptRuleCall also retrieves 
MetadataQuery from RelOptCluster.
Any other better places?

- Haisheng

--
发件人:Julian Hyde
日 期:2019年10月16日 11:02:33
收件人:dev
主 题:Re: [DISCUSSION] Extension of Metadata Query

Yes, but why RelOptCluster? Besides the fact that there happened to be a method 
there.


> On Oct 15, 2019, at 7:30 PM, Haisheng Yuan  wrote:
> 
> > Is there a good rationale for attaching one to RelOptCluster
> Customized RelMetadataQuery with code generated meta handler for customized 
> metadata, also can provide convenient way to get metadata.
> 
> - Haisheng
> 
> --
> 发件人:Julian Hyde
> 日 期:2019年10月16日 06:52:39
> 收件人:
> 主 题:Re: [DISCUSSION] Extension of Metadata Query
> 
> Having reviewed https://issues.apache.org/jira/browse/CALCITE-3403 
> <https://issues.apache.org/jira/browse/CALCITE-3403> and 
> https://issues.apache.org/jira/browse/CALCITE-2018 
> <https://issues.apache.org/jira/browse/CALCITE-2018> today I am worried about 
> the direction RelMetadataQuery is taking. Is there a good rationale for 
> attaching one to RelOptCluster? (Besides “it seems to work”?)
> 
> The natural lifespan of a RelMetadataQuery is a RelOptCall (because the 
> structure of the tree is constant for the duration of a call).
> 
> As for sub-classing RelMetadataQuery, I really don’t know. It’s probably 
> harmless, possibly disastrous, and it takes a lot of effort to determine 
> which. We have to be really cautious with RelMetadataQuery because it is one 
> of the few components in Calcite that mutate.
> 
> Julian
> 
> 
> 
> 
> > On Oct 15, 2019, at 1:42 PM, Haisheng Yuan  wrote:
> > 
> > I would like to bring this discussion up again.
> > 
> > We created our own metadata provider, and set it to RelMetadataQuery by 
> > calling
> > RelMetadataQuery.THREAD_PROVIDERS.set()
> > If we just use all the types of metadata provided by Calcite, it should be 
> > fine. But given that we have created new types of metadata, and we want to 
> > get them during rule transformation, we want to have the rule call return 
> > our own RelMetadataQuery. In order to achieve that, we sub-classed 
> > RelMetadataQuery, and sub-classed RelOptCluster to make metadata query 
> > overridable, and return the instance of subclass of RelMetadataQuery when 
> > the mq is null. Indeed, we can achieve that goal by just call 
> > rel.metadata(), but we want a efficient and convenient way to do that. That 
> > is why we want RelMetadataQuery in RelOptCluster customizable. 
> > Is it a good practice? If not, any other better approach to suggest?
> > Thanks.
> > Haisheng
> > 
> > --
> > 发件人:Stamatis Zampetakis
> > 日 期:2019年06月15日 15:47:58
> > 收件人:
> > 主 题:Re: [DISCUSSION] Extension of Metadata Query
> > 
> > I had a look on CALCITE-1147 (which added support for custom providers and
> > metadata) and based on the added tests [1], it seems that the main
> > alternative to RelMetadataQuery is indeed through the MetadataFactory. One
> > other alternative I can think of, is to pass a custom RelMetadataQuery (or
> > a query factory) through the context of the planner [2].
> > 
> > Given that we have already methods to obtain a RelMetadataQuery in
> > RelOptRuleCall and RelOptCluster it may make sense to allow the type of
> > query to be pluggable. However, I have not really used this mechanism in
> > the past so I cannot be sure what's the best way to proceed.
> > 
> > Best,
> > Stamatis
> > 
> > [1]
> > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001
> > [2]
> > https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256
> > 
> > On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen  wrote:
> > 
> >> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1],
> >> which I believe is the most common routine to query metadata during
> >> planning, I’m a little confused for your saying
> >> 
> >>> The methods in RelMetadataQuery are for convenience only.
> >> 
> >> If these methods in RelMetadataQuery are only for convenience, what is the
> >> other way I can query a metadata ? The MedataFactory from the RelOptCluster
> >> ? (Users w

Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-15 Thread Haisheng Yuan
> Is there a good rationale for attaching one to RelOptCluster
Customized RelMetadataQuery with code generated meta handler for customized 
metadata, also can provide convenient way to get metadata.

- Haisheng

--
发件人:Julian Hyde
日 期:2019年10月16日 06:52:39
收件人:
主 题:Re: [DISCUSSION] Extension of Metadata Query

Having reviewed https://issues.apache.org/jira/browse/CALCITE-3403 
<https://issues.apache.org/jira/browse/CALCITE-3403> and 
https://issues.apache.org/jira/browse/CALCITE-2018 
<https://issues.apache.org/jira/browse/CALCITE-2018> today I am worried about 
the direction RelMetadataQuery is taking. Is there a good rationale for 
attaching one to RelOptCluster? (Besides “it seems to work”?)

The natural lifespan of a RelMetadataQuery is a RelOptCall (because the 
structure of the tree is constant for the duration of a call).

As for sub-classing RelMetadataQuery, I really don’t know. It’s probably 
harmless, possibly disastrous, and it takes a lot of effort to determine which. 
We have to be really cautious with RelMetadataQuery because it is one of the 
few components in Calcite that mutate.

Julian




> On Oct 15, 2019, at 1:42 PM, Haisheng Yuan  wrote:
> 
> I would like to bring this discussion up again.
> 
> We created our own metadata provider, and set it to RelMetadataQuery by 
> calling
> RelMetadataQuery.THREAD_PROVIDERS.set()
> If we just use all the types of metadata provided by Calcite, it should be 
> fine. But given that we have created new types of metadata, and we want to 
> get them during rule transformation, we want to have the rule call return our 
> own RelMetadataQuery. In order to achieve that, we sub-classed 
> RelMetadataQuery, and sub-classed RelOptCluster to make metadata query 
> overridable, and return the instance of subclass of RelMetadataQuery when the 
> mq is null. Indeed, we can achieve that goal by just call rel.metadata(), but 
> we want a efficient and convenient way to do that. That is why we want 
> RelMetadataQuery in RelOptCluster customizable. 
> Is it a good practice? If not, any other better approach to suggest?
> Thanks.
> Haisheng
> 
> ----------
> 发件人:Stamatis Zampetakis
> 日 期:2019年06月15日 15:47:58
> 收件人:
> 主 题:Re: [DISCUSSION] Extension of Metadata Query
> 
> I had a look on CALCITE-1147 (which added support for custom providers and
> metadata) and based on the added tests [1], it seems that the main
> alternative to RelMetadataQuery is indeed through the MetadataFactory. One
> other alternative I can think of, is to pass a custom RelMetadataQuery (or
> a query factory) through the context of the planner [2].
> 
> Given that we have already methods to obtain a RelMetadataQuery in
> RelOptRuleCall and RelOptCluster it may make sense to allow the type of
> query to be pluggable. However, I have not really used this mechanism in
> the past so I cannot be sure what's the best way to proceed.
> 
> Best,
> Stamatis
> 
> [1]
> https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001
> [2]
> https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256
> 
> On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen  wrote:
> 
>> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1],
>> which I believe is the most common routine to query metadata during
>> planning, I’m a little confused for your saying
>> 
>>> The methods in RelMetadataQuery are for convenience only.
>> 
>> If these methods in RelMetadataQuery are only for convenience, what is the
>> other way I can query a metadata ? The MedataFactory from the RelOptCluster
>> ? (Users would never expect to use this because it’s based on Java
>> Reflection and has performance problem)
>> 
>> [1]
>> https://github.com/apache/calcite/blob/a3c56be7bccc58859524ba39e5b30b7078f97d00/core/src/main/java/org/apache/calcite/plan/RelOptRuleCall.java#L200
>> 
>> Best,
>> Danny Chan
>> 在 2019年6月12日 +0800 AM6:27,Julian Hyde ,写道:
>>> The goal is that it should be possible to add a new kind of metadata.
>> The methods in RelMetadataQuery are for convenience only. So you should be
>> able to use your new kind of metadata, and existing ones, without modifying
>> RelMetadataQuery.
>>> 
>>> If that is not possible, it’s a bug, and you should log a JIRA case.
>>> 
>>> Given how the methods re-generate handlers (by catching exceptions and
>> modifying mutable private members) I can see that new kinds of

Re: Re: [DISCUSSION] Extension of Metadata Query

2019-10-15 Thread Haisheng Yuan
I would like to bring this discussion up again.

We created our own metadata provider, and set it to RelMetadataQuery by calling
RelMetadataQuery.THREAD_PROVIDERS.set()
If we just use all the types of metadata provided by Calcite, it should be 
fine. But given that we have created new types of metadata, and we want to get 
them during rule transformation, we want to have the rule call return our own 
RelMetadataQuery. In order to achieve that, we sub-classed RelMetadataQuery, 
and sub-classed RelOptCluster to make metadata query overridable, and return 
the instance of subclass of RelMetadataQuery when the mq is null.  Indeed, we 
can achieve that goal by just call rel.metadata(), but we want a efficient and 
convenient way to do that. That is why we want RelMetadataQuery in 
RelOptCluster customizable. 
Is it a good practice? If not, any other better approach to suggest?
Thanks.
Haisheng

--
发件人:Stamatis Zampetakis
日 期:2019年06月15日 15:47:58
收件人:
主 题:Re: [DISCUSSION] Extension of Metadata Query

I had a look on CALCITE-1147 (which added support for custom providers and
metadata) and based on the added tests [1], it seems that the main
alternative to RelMetadataQuery is indeed through the MetadataFactory. One
other alternative I can think of, is to pass a custom RelMetadataQuery (or
a query factory) through the context of the planner [2].

Given that we have already methods to obtain a RelMetadataQuery in
RelOptRuleCall and RelOptCluster it may make sense to allow the type of
query to be pluggable. However, I have not really used this mechanism in
the past so I cannot be sure what's the best way to proceed.

Best,
Stamatis

[1]
https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001
[2]
https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256

On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen  wrote:

> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1],
> which I believe is the most common routine to query metadata during
> planning, I’m a little confused for your saying
>
> >The methods in RelMetadataQuery are for convenience only.
>
> If these methods in RelMetadataQuery are only for convenience, what is the
> other way I can query a metadata ? The MedataFactory from the RelOptCluster
> ? (Users would never expect to use this because it’s based on Java
> Reflection and has performance problem)
>
> [1]
> https://github.com/apache/calcite/blob/a3c56be7bccc58859524ba39e5b30b7078f97d00/core/src/main/java/org/apache/calcite/plan/RelOptRuleCall.java#L200
>
> Best,
> Danny Chan
> 在 2019年6月12日 +0800 AM6:27,Julian Hyde ,写道:
> > The goal is that it should be possible to add a new kind of metadata.
> The methods in RelMetadataQuery are for convenience only. So you should be
> able to use your new kind of metadata, and existing ones, without modifying
> RelMetadataQuery.
> >
> > If that is not possible, it’s a bug, and you should log a JIRA case.
> >
> > Given how the methods re-generate handlers (by catching exceptions and
> modifying mutable private members) I can see that new kinds of metadata
> might be difficult to add.
> >
> > Julian
> >
> >
> > > On Jun 9, 2019, at 7:54 PM, Yuzhao Chen  wrote:
> > >
> > > Thanks, Stamatis
> > >
> > > You are right, this discussion came up due to CALCITE-2885, it is not
> about the performance problem, it is the extension of RelMetadataQuery,
> because we add all kinds of top interfaces in RelMetadataQuery, e.g. the
> row count query [1].
> > >
> > > When we add a new RelMetadataProvider, a corresponding
> interface/method should be added into RelMetadataQuery, but for current
> RelOpeCluster impl, we could not do that(with a always default instance)
> > >
> > > As for the methods in RelMetadataProvider, i only saw the usage of
> #handlers in RelMetadataQuery, and saw the reference you pase, it seems not
> that relevant with this topic. What do you think ?
> > >
> > > [1]
> https://github.com/apache/calcite/blob/941cd4e9540e3ef9b7c15daee42831a0c63da8b9/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L222
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年6月5日 +0800 AM6:44,Stamatis Zampetakis ,写道:
> > > > Thanks for bringing this up Danny.
> > > >
> > > > I guess the discussion came up due to CALCITE-2885 [1]. Looking back
> into
> > > > it, I am not sure that the intention there is to make the
> RelMetadataQuery
> > > > pluggable. We co