> Is there a good rationale for attaching one to RelOptCluster
Customized RelMetadataQuery with code generated meta handler for customized 
metadata, also can provide convenient way to get metadata.

- Haisheng

------------------------------------------------------------------
发件人:Julian Hyde<jh...@apache.org>
日 期:2019年10月16日 06:52:39
收件人:<dev@calcite.apache.org>
主 题:Re: [DISCUSSION] Extension of Metadata Query

Having reviewed https://issues.apache.org/jira/browse/CALCITE-3403 
<https://issues.apache.org/jira/browse/CALCITE-3403> and 
https://issues.apache.org/jira/browse/CALCITE-2018 
<https://issues.apache.org/jira/browse/CALCITE-2018> today I am worried about 
the direction RelMetadataQuery is taking. Is there a good rationale for 
attaching one to RelOptCluster? (Besides “it seems to work”?)

The natural lifespan of a RelMetadataQuery is a RelOptCall (because the 
structure of the tree is constant for the duration of a call).

As for sub-classing RelMetadataQuery, I really don’t know. It’s probably 
harmless, possibly disastrous, and it takes a lot of effort to determine which. 
We have to be really cautious with RelMetadataQuery because it is one of the 
few components in Calcite that mutate.

Julian




> On Oct 15, 2019, at 1:42 PM, Haisheng Yuan <h.y...@alibaba-inc.com> wrote:
> 
> I would like to bring this discussion up again.
> 
> We created our own metadata provider, and set it to RelMetadataQuery by 
> calling
> RelMetadataQuery.THREAD_PROVIDERS.set()
> If we just use all the types of metadata provided by Calcite, it should be 
> fine. But given that we have created new types of metadata, and we want to 
> get them during rule transformation, we want to have the rule call return our 
> own RelMetadataQuery. In order to achieve that, we sub-classed 
> RelMetadataQuery, and sub-classed RelOptCluster to make metadata query 
> overridable, and return the instance of subclass of RelMetadataQuery when the 
> mq is null. Indeed, we can achieve that goal by just call rel.metadata(), but 
> we want a efficient and convenient way to do that. That is why we want 
> RelMetadataQuery in RelOptCluster customizable. 
> Is it a good practice? If not, any other better approach to suggest?
> Thanks.
> Haisheng
> 
> ------------------------------------------------------------------
> 发件人:Stamatis Zampetakis<zabe...@gmail.com>
> 日 期:2019年06月15日 15:47:58
> 收件人:<dev@calcite.apache.org>
> 主 题:Re: [DISCUSSION] Extension of Metadata Query
> 
> I had a look on CALCITE-1147 (which added support for custom providers and
> metadata) and based on the added tests [1], it seems that the main
> alternative to RelMetadataQuery is indeed through the MetadataFactory. One
> other alternative I can think of, is to pass a custom RelMetadataQuery (or
> a query factory) through the context of the planner [2].
> 
> Given that we have already methods to obtain a RelMetadataQuery in
> RelOptRuleCall and RelOptCluster it may make sense to allow the type of
> query to be pluggable. However, I have not really used this mechanism in
> the past so I cannot be sure what's the best way to proceed.
> 
> Best,
> Stamatis
> 
> [1]
> https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/test/java/org/apache/calcite/test/RelMetadataTest.java#L1001
> [2]
> https://github.com/apache/calcite/blob/4e89fddab415a1e04b82c7d69960e399f608949f/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L256
> 
> On Wed, Jun 12, 2019 at 6:07 AM Yuzhao Chen <yuzhao....@gmail.com> wrote:
> 
>> We have an interface to fetch the RelMetadataQuery in RelOptRuleCall [1],
>> which I believe is the most common routine to query metadata during
>> planning, I’m a little confused for your saying
>> 
>>> The methods in RelMetadataQuery are for convenience only.
>> 
>> If these methods in RelMetadataQuery are only for convenience, what is the
>> other way I can query a metadata ? The MedataFactory from the RelOptCluster
>> ? (Users would never expect to use this because it’s based on Java
>> Reflection and has performance problem)
>> 
>> [1]
>> https://github.com/apache/calcite/blob/a3c56be7bccc58859524ba39e5b30b7078f97d00/core/src/main/java/org/apache/calcite/plan/RelOptRuleCall.java#L200
>> 
>> Best,
>> Danny Chan
>> 在 2019年6月12日 +0800 AM6:27,Julian Hyde <jh...@apache.org>,写道:
>>> The goal is that it should be possible to add a new kind of metadata.
>> The methods in RelMetadataQuery are for convenience only. So you should be
>> able to use your new kind of metadata, and existing ones, without modifying
>> RelMetadataQuery.
>>> 
>>> If that is not possible, it’s a bug, and you should log a JIRA case.
>>> 
>>> Given how the methods re-generate handlers (by catching exceptions and
>> modifying mutable private members) I can see that new kinds of metadata
>> might be difficult to add.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Jun 9, 2019, at 7:54 PM, Yuzhao Chen <yuzhao....@gmail.com> wrote:
>>>> 
>>>> Thanks, Stamatis
>>>> 
>>>> You are right, this discussion came up due to CALCITE-2885, it is not
>> about the performance problem, it is the extension of RelMetadataQuery,
>> because we add all kinds of top interfaces in RelMetadataQuery, e.g. the
>> row count query [1].
>>>> 
>>>> When we add a new RelMetadataProvider, a corresponding
>> interface/method should be added into RelMetadataQuery, but for current
>> RelOpeCluster impl, we could not do that(with a always default instance)
>>>> 
>>>> As for the methods in RelMetadataProvider, i only saw the usage of
>> #handlers in RelMetadataQuery, and saw the reference you pase, it seems not
>> that relevant with this topic. What do you think ?
>>>> 
>>>> [1]
>> https://github.com/apache/calcite/blob/941cd4e9540e3ef9b7c15daee42831a0c63da8b9/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataQuery.java#L222
>>>> 
>>>> Best,
>>>> Danny Chan
>>>> 在 2019年6月5日 +0800 AM6:44,Stamatis Zampetakis <zabe...@gmail.com>,写道:
>>>>> Thanks for bringing this up Danny.
>>>>> 
>>>>> I guess the discussion came up due to CALCITE-2885 [1]. Looking back
>> into
>>>>> it, I am not sure that the intention there is to make the
>> RelMetadataQuery
>>>>> pluggable. We could possibly solve the performance problem without
>>>>> extending the RelMetadataQuery. We have to look again into this case.
>>>>> 
>>>>> For more details regarding the existence of the two methods in
>>>>> RelMetadataProvider have a look in CALCITE-604 [2]. More general for
>> the
>>>>> design of RelMetadataProvider you may find useful the description in
>> [3].
>>>>> 
>>>>> Best,
>>>>> Stamatis
>>>>> 
>>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2885
>>>>> [2] https://issues.apache.org/jira/browse/CALCITE-604
>>>>> [3]
>>>>> 
>> https://web.archive.org/web/20140624040836/www.hydromatic.net/wiki/RelationalExpressionMetadata/
>>>>> 
>>>>> On Tue, Jun 4, 2019 at 7:48 PM Julian Hyde <jh...@apache.org> wrote:
>>>>> 
>>>>>>> 1. Why we have 2 methods in RelMetadataProvider?
>>>>>> 
>>>>>> The metadata system is complicated. We need to allow multiple
>> handlers
>>>>>> for any given call. So, making a metadata call involves multiple
>>>>>> dispatch [1] based on the metadata method being called, the type of
>>>>>> RelNode, and the handlers that are registered. Also it needs to
>> cache
>>>>>> results, and detect cycles. And the dispatch needs to be
>> efficient, so
>>>>>> we generate janino code to do the dispatch, and re-generate when
>> new
>>>>>> handlers or sub-classes of RelNode are added.
>>>>>> 
>>>>>> I forget details, the two methods are basically required to allow
>> us
>>>>>> to generate code to do the dispatch.
>>>>>> 
>>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster
>> pluggable.
>>>>>> 
>>>>>> I disagree. RelMetadataQuery is not an extension point. Its sole
>>>>>> purpose is to to keep state (the cache and cycle-checking).
>>>>>> RelMetadataProvider is the extension point. (By analogy, if you are
>>>>>> un-parsing an AST, you let each AST sub-class handle unparsing
>> itself,
>>>>>> but the unparsed text goes to a simple StringBuilder.
>> RelMetadataQuery
>>>>>> is in the role of the StringBuilder. In a complex system, it is
>> nice
>>>>>> to keep some of the components simple, or at least keep them to
>>>>>> prescribed roles.)
>>>>>> 
>>>>>> Julian
>>>>>> 
>>>>>> [1] https://en.wikipedia.org/wiki/Multiple_dispatch
>>>>>> 
>>>>>> On Sun, Jun 2, 2019 at 11:19 PM Yuzhao Chen <yuzhao....@gmail.com>
>> wrote:
>>>>>>> 
>>>>>>> Currently we provide answer to metadata query through
>>>>>> RelMetadataProvider [1], there are some sub-classes of it:
>>>>>>> 
>>>>>>> RelMetadataProvider
>>>>>>> |
>>>>>>> |- VolcanoRelMetadataProvider
>>>>>>> |- ChainedRelMetadataProvider/DefaultRelMetadataProvider
>>>>>>> |- HepRelMetadataProvider
>>>>>>> |- CachingRelMetadataProvider
>>>>>>> |- ReflectiveRelMetadataProvider
>>>>>>> |- JaninoRelMetadataProvider
>>>>>>> 
>>>>>>> The RelMetadataProvider has two methods: #apply and #handlers,
>> the
>>>>>> #apply method seems a programming interface and there is a demo
>> code how we
>>>>>> can use it:
>>>>>>> 
>>>>>>> RelMetadataProvider provider;
>>>>>>> LogicalFilter filter;
>>>>>>> RexNode predicate;
>>>>>>> Function<RelNode, Metadata> function =
>>>>>>> provider.apply(LogicalFilter.class, Selectivity.class};
>>>>>>> Selectivity selectivity = function.apply(filter);
>>>>>>> Double d = selectivity.selectivity(predicate);
>>>>>>> 
>>>>>>> But let's see our RelOptCluster's member variables[2], there are
>>>>>> MetadataFactory and RelMetadataQuery which all can be used to
>> query the
>>>>>> metadata, for MetadataFactory, there is a default impl named
>>>>>> MetadataFactoryImpl which will invoke RelMetadataProvider#apply
>> internally,
>>>>>> for RelMetadataQuery, it will invoke RelMetadataProvider#handlers
>> (finally
>>>>>> composed and codeden by JaninoRelMetadataProvider).
>>>>>>> 
>>>>>>> In our planning phrase, we can invoke
>> RelOptRuleCall#getMetadataQuery to
>>>>>> get the MQ and query the metadata.
>>>>>>> 
>>>>>>> For extension of metadata handlers, we can set our customized
>>>>>> RelMetadataProvider in RelOptCluster[3]. But for RelMetadataQuery,
>> we have
>>>>>> no way to extend it now, because the RelOptCluster always has a
>> singleton
>>>>>> instance [4] which is only the default implementation.
>>>>>>> 
>>>>>>> 
>>>>>>> My question is as follows:
>>>>>>> 
>>>>>>> 1. Why we have 2 methods in RelMetadataProvider, and why we need
>> the
>>>>>> MetadataFactory and RelMetadataProvider#apply ? It seems that it's
>> function
>>>>>> is already been overriden by RelMetadataQuery(The difference is
>> that
>>>>>> MetadataFactory use Reflection and RelMetadataQuery use gened
>> bytes code).
>>>>>>> 2. We should make the RelMetadataQuery in RelOptCluster
>> pluggable.
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>> 
>> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/rel/metadata/RelMetadataProvider.java#L38
>>>>>>> [2]
>>>>>> 
>> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L49
>>>>>>> [3]
>>>>>> 
>> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L135
>>>>>>> [4]
>>>>>> 
>> https://github.com/apache/calcite/blob/b0e83c469ff57257c1ea621ff943ca76f626a9b7/core/src/main/java/org/apache/calcite/plan/RelOptCluster.java#L151
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Danny Chan
>>>>>> 
>>> 
>> 
> 


Reply via email to