Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Андрей Цвелодуб
Hello Haisheng,

> To keep backward compatibility, all the un-marked rules will be treated
as logical rules, except rules that use AbstractConverter as a rule operand;
these rules still need to be applied top-down, or in random order.
Obviously, from what is written here, I could guess that this would require
me to change my physical planning rules, even if only by implementing a
marker interface. I am not saying this is a bad thing, but this is a thing
that should be communicated and planned ahead in case the VolcanoPlanner is
modified.
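To make the marking concrete: the empty interface being discussed could be as small as the sketch below. `PhysicalRule` and the rule class are invented names for illustration only; no such API has been agreed in this thread.

```java
// Hypothetical sketch of the rule-marking idea from the thread.
// "PhysicalRule" and "MyPhysicalJoinRule" are invented names, not Calcite API.
interface PhysicalRule {}  // empty marker: "this rule produces physical rels"

class MyPhysicalJoinRule implements PhysicalRule {
    // the existing rule body would stay unchanged; only the marker is added
}

public class MarkerDemo {
    // The planner could classify rules with a simple instanceof check.
    static boolean isPhysical(Object rule) {
        return rule instanceof PhysicalRule;
    }

    public static void main(String[] args) {
        System.out.println(isPhysical(new MyPhysicalJoinRule())); // prints "true"
        System.out.println(isPhysical(new Object()));             // prints "false"
    }
}
```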

> Looks like I have to revert the changes in CALCITE-2970 and CALCITE-3753,
because they will cause another ton of plan changes.
I see you are still bitter due to all the discussions on this list lately,
I'm sorry. I don't want you to think that I somehow resent the changes you
are pushing, au contraire I support them and would be happy to help if I
can. I just want the process of these changes to be executed in the best
possible way.
As I can see, there are already several opinions in this thread that basically
align with what I am saying, so I guess I am not the crazy guy running
around and yelling "the end is nigh!".

Thank you for taking these mumbled thoughts into account.

Bestest Regards,
Andrii Tsvielodub

On Tue, 21 Apr 2020 at 21:08, Haisheng Yuan  wrote:

> Hi Andrii,
>
> > I guess changing the planner would lead to changes in tons of rules and
> even more tests.
> Obviously you didn't read through my email. You are not required to make any
> changes to your rules if you don't want to, but if you do, you just need to
> mark each rule to tell the planner whether it is a physical rule, simply by
> implementing an empty interface.
>
> > many on this list already experienced problems with upgrading even
> between the minor versions of Calcite.
> Sorry to see the problem you have experienced when upgrading Calcite.
> Looks like I have to revert the changes in CALCITE-2970 and CALCITE-3753,
> because they will cause another ton of plan changes.
>
> But I will see if I can add a setting to use the old search strategy,
> which can be left untouched.
>
> Haisheng
>
> On 2020/04/21 06:33:08, Андрей Цвелодуб  wrote:
> > Hello everyone,
> >
> > First of all, thanks for this great effort of improving the core parts of
> > the framework we all are using,
> > I believe this is long overdue and hope this will have benefits both for
> > the maintainers and users of the library.
> >
> > I don't have anything to say about the general idea at the moment,
> > but I want to make a point that maintaining the old implementation of
> > VolcanoPlanner during
> > the initial stages of implementing the new planner is absolutely
> CRITICAL.
> > As a lot of users of Calcite do various customizations to the engine, to
> > the rules
> > and all that is there in between, I believe changing the implementation
> of
> > the core component
> > would have a huge impact on most users of the library. I think many on
> this
> > list
> > already experienced problems with upgrading even between the minor
> versions
> > of Calcite,
> > so I guess changing the planner would lead to changes in tons of rules
> and
> > even more tests.
> >
> > I don't have anything against replacing VolcanoPlanner as a final goal of
> > this effort,
> > but I don't think that modifying it directly and merging it to master is
> a
> > viable development approach.
> > While I understand how burdensome it is to maintain several parallel core
> > components at once
> > (we did this while moving the engine of our product to Calcite), we
> should
> > still respect those who depend
> > on it and not introduce the risks related to the development of a new
> > component into existing processing flows.
> >
> > A good model to try to follow would be the way new Garbage Collectors are
> > introduced in Java.
> > First, add it as an experimental option, then make it generally
> available,
> > then after everyone agrees
> > this is the best option - make it the default one.
> > With this approach, everyone can then move to the new planner at their
> own
> > pace, guaranteeing a smooth transition overall.
> > Yes, this could take some time, maybe even a year, but this is the price
> of
> > doing major changes in a popular framework.
> >
> > Again, thank you for initiating this discussion and leading this effort.
> >
> > Best Regards,
> > Andrii Tsvielodub
> >
> > On Tue, 21 Apr 2020 at 07:51, Jinpeng Wu  wrote:
> >
> > > Hi, Xiening.
> > >
> > > Regarding calculating the logical cost, here are some ways I thought of:
> > 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-21 Thread Андрей Цвелодуб
Hello everyone,

First of all, thanks for this great effort of improving the core parts of
the framework we all are using,
I believe this is long overdue and hope this will have benefits both for
the maintainers and users of the library.

I don't have anything to say about the general idea at the moment,
but I want to make a point that maintaining the old implementation of
VolcanoPlanner during
the initial stages of implementing the new planner is absolutely CRITICAL.
As a lot of users of Calcite do various customizations to the engine, to
the rules
and all that is there in between, I believe changing the implementation of
the core component
would have a huge impact on most users of the library. I think many on this
list
already experienced problems with upgrading even between the minor versions
of Calcite,
so I guess changing the planner would lead to changes in tons of rules and
even more tests.

I don't have anything against replacing VolcanoPlanner as a final goal of
this effort,
but I don't think that modifying it directly and merging it to master is a
viable development approach.
While I understand how burdensome it is to maintain several parallel core
components at once
(we did this while moving the engine of our product to Calcite), we should
still respect those who depend
on it and not introduce the risks related to the development of a new
component into existing processing flows.

A good model to try to follow would be the way new Garbage Collectors are
introduced in Java.
First, add it as an experimental option, then make it generally available,
then after everyone agrees
this is the best option - make it the default one.
With this approach, everyone can then move to the new planner at their own
pace, guaranteeing a smooth transition overall.
Yes, this could take some time, maybe even a year, but this is the price of
doing major changes in a popular framework.

Again, thank you for initiating this discussion and leading this effort.

Best Regards,
Andrii Tsvielodub

On Tue, 21 Apr 2020 at 07:51, Jinpeng Wu  wrote:

> Hi, Xiening.
>
> Regarding calculating the logical cost, here are some ways I thought of:
> 1. Logical rels may implement their own computeSelfCost method. Some rels
> can provide such information; for example, LogicalProject/LogicalFilter
> contain nearly the same information as their physical implementations. If we
> don't have enough confidence, just returning zeroCost is also OK, as it only
> affects pruning.
> 2. A logical rel tells its parents what its physical input could be after
> implementation. Then the problem comes back to calculating the lower bound
> of a physical rel.
> There should always be ways. The only problem is how to find a pretty one.
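Option 1 above can be sketched with toy stand-in classes. These are illustrative only, not Calcite's RelNode/RelOptCost API, and the unit cost constant is an arbitrary assumption:

```java
// Toy stand-ins for a cost and a logical rel; not Calcite classes.
class Cost {
    final double value;
    Cost(double value) { this.value = value; }
}

abstract class LogicalRel {
    // Safe fallback: zero cost only weakens pruning; it can never
    // prune away the true optimum.
    Cost computeSelfCost(double rowCount) { return new Cost(0.0); }
}

// A projection copies each row once, so cardinality * unit cost is a
// reasonable self-cost estimate (the unit value here is an assumption).
class ToyLogicalProject extends LogicalRel {
    static final double UNIT_COPY_COST = 0.5;
    @Override Cost computeSelfCost(double rowCount) {
        return new Cost(rowCount * UNIT_COPY_COST);
    }
}

public class LogicalCostDemo {
    public static void main(String[] args) {
        System.out.println(new ToyLogicalProject().computeSelfCost(100).value); // 50.0
        System.out.println(new LogicalRel() {}.computeSelfCost(100).value);     // 0.0
    }
}
```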
>
> Regarding the risk, the new planner does carry a different kind of risk. It
> is not that the new planner stops us from doing something wrong, but that we
> can decide when to start using it. Some scenarios:
> 1. If we modify VolcanoPlanner directly, the only way users can control the
> risk is to not upgrade their Calcite version until it is considered stable.
> That is quite different from keeping Calcite updated and switching to the
> new planner at a time of their choosing.
> 2. It is very important for SLA control. For important business jobs we may
> keep using the old, stable planner, and use the new one only for jobs that
> can tolerate faults. This also helps test the new planner against real
> scenarios.
> 3. It is helpful when upgrading online services. If the new planner happens
> to have bugs, we can switch back to the old planner directly, without
> rolling back the whole service.
> 4. With all these ways to keep issues from becoming disasters, we are less
> vulnerable to mistakes. This not only enables faster iteration but also
> gives us enough time to resolve big bugs properly, for example by analyzing
> them in detail and applying a time-consuming refactoring. Working around a
> critical bug in tricky ways usually introduces more issues.
>
> Thanks,
> Jinpeng
>
> On Tue, Apr 21, 2020 at 2:04 AM Xiening Dai  wrote:
>
> > Hi Jinpeng,
> >
> > Regarding this comment - I believe there are ways to calculate the
> > logical cost, but I think it's not as simple as "cardinality *
> > unit_copy_cost". Could you provide more details on the other ways? Just an
> > algorithm description or pseudocode would help us understand. Thanks.
> >
> > Regarding the approach of creating a new planner, I don't think a new
> > planner would lower the risk. We don't know what we don't know. If we
> > introduced an issue while modifying the planner, most likely we would do
> > the same in a new planner class. A new planner doesn't necessarily prevent
> > the issue from happening; it just delays its surfacing, which is worse
> > IMO.
> >
> > One obvious benefit of a new planner is that we can provide some sort of
> > isolation so the change won't cause test baseline updates, which can be
> > painful at times.  We should see if we 

Re: Rel as a service

2020-04-07 Thread Андрей Цвелодуб
Hi Tal,

Although our case is not really similar to yours, we had to do a conversion
from our custom API to Calcite RelNodes. We just used RelBuilder to build
the tree, and then optimized and executed it.
I would say there is nothing really difficult, most of that logic could be
found in Calcite itself, so you can copy it and customize it to your needs.

One thing to consider is there are some optimizations that happen during
SQL-to-Rel conversion, so you won't get those if you use RelBuilder
directly. And there are some corner-cases like IN filters or
implementations of some SQL functions that are also handled in
SqlToRelConverter (e.g. CASE is used to implement COALESCE). Most of those
could be reimplemented, there's usually nothing really complex there.
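The RelBuilder approach described above has roughly this shape. A minimal sketch, assuming a FrameworkConfig whose default schema exposes an "EMP" table with DEPTNO and ENAME columns (the schema and config setup are elided, and the custom-API request being translated is hypothetical):

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.RelBuilder;

public class CustomApiToRel {
    // Translate a (hypothetical) custom filter+project request into a
    // RelNode tree, ready for optimization and execution.
    static RelNode toRel(FrameworkConfig config) {
        final RelBuilder b = RelBuilder.create(config);
        return b.scan("EMP")                                        // FROM EMP
                .filter(b.equals(b.field("DEPTNO"), b.literal(10))) // WHERE DEPTNO = 10
                .project(b.field("ENAME"))                          // SELECT ENAME
                .build();
    }
}
```

This requires Calcite on the classpath; the equivalent of the SqlToRelConverter rewrites mentioned above (IN, COALESCE-as-CASE, etc.) would still have to be applied by hand.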

Best Regards,
Andrii Tsvielodub

On Sun, 5 Apr 2020 at 17:47, Stamatis Zampetakis  wrote:

> Hi Tal,
>
> I'm sure there are valid reasons that you are using your own SQL parser and
> not the one provided by Calcite but can you briefly explain why? This is to
> understand if there are things that we could do to improve Calcite.
>
> I don't have any particular experience for the use case you mentioned but
> one thing that comes first to my mind is to serialize, send, and
> deserialize the entire plan [1, 2, 3].
>
> Best,
> Stamatis
>
> [1]
>
> https://github.com/apache/calcite/blob/d0180d160120e5d775b000549b7edac30250a353/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L491
> [2]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/externalize/RelJsonReader.java
> [3]
>
> https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/externalize/RelJsonWriter.java
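The writer side of [2, 3] is only a few lines. A hedged sketch (the reader side additionally needs a RelOptCluster and a RelOptSchema to rebuild the tree, so it is omitted here):

```java
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.externalize.RelJsonWriter;

public class PlanSerializer {
    // Serialize a RelNode tree to Calcite's JSON form, suitable for
    // sending over the wire and rebuilding with RelJsonReader.
    static String toJson(RelNode rel) {
        final RelJsonWriter writer = new RelJsonWriter();
        rel.explain(writer);      // walks the tree into the writer
        return writer.asString(); // the JSON document
    }
}
```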
>
> On Sun, Apr 5, 2020, 11:44 AM Tal Glanzman  wrote:
>
> > Hi,
> >
> > I'm looking for some guidance / references on how to approach the
> > following scenario.
> >
> > I have 2 separate software modules:
> > 1. Cpp-Codebase, which parses an SQL query and constructs a plan
> > 2. a Calcite adapter to my specific domain
> >
> > My goal is to provide a server such that the Cpp-Codebase, as the client,
> > can construct (in the server) a corresponding Calcite plan. The client
> > shouldn't receive the constructed plan back.
> >
> > Basically, it's an integration problem, i want to convert
> NativePlan@Client
> > to RelNode@Server.
> >
> > *I am looking for references, based on your experience, on how I would go
> > about this.*
> >
> > My initial intuition is to provide a gRPC service, with messages from
> > which the server will construct the plan:
> >   - the client will convert its model to the gRPC model
> >   - the server will convert the gRPC model to the RelNode
> >
> > This also includes a manual construction of the RelNode, which I'm not
> > sure how to go about...
> >
> > Thanks in advance!
> > Tal
> >
>


Re: [VOTE] Release apache-calcite-1.22.0 (release candidate 0)

2020-02-24 Thread Андрей Цвелодуб
Greetings everyone,

Quick question - it looks like Calcite still depends on Avatica version
1.15.0 (if that is what "calcite.avatica.version" in gradle.properties in
the root specifies; I'm not that familiar with Gradle), while there is a
1.16.0 release already.
Is this intentional or just an oversight? I remember there were some issues
with the Dockerfiles in the 1.16 release. Is this the reason?
We are particularly interested in one fix[1] in Avatica in a downstream
project.

[1]
https://github.com/apache/calcite-avatica/commit/72bbbfc964e7805e1b09bfdb47f5472d37050c39

Best Regards,
Andrew

On Mon, 24 Feb 2020 at 15:55, Enrico Olivelli  wrote:

> Danny,
> We are testing HerdDB with 1.22.0rc0 tag and we are seeing problems with
> Joins.
>
> We were still on 1.19.0, and in December we created a test branch
> against Calcite's then-current master.
> Unfortunately, during the past few weeks we stopped continuously checking
> that branch, so we missed the commit id in Calcite that
> introduced these failures.
>
> All failures are about JOIN conditions that seem not to be applied
> correctly.
>
> This is my test branch with the upgrade from 1.19 to 1.22.0rc0:
> https://github.com/diennea/herddb/pull/563
>
> We are investigating; hopefully it is only a bug in our changes.
> Since 1.19.0 the handling of JOINs has changed a lot in Calcite,
> so we probably missed something.
>
> I also had to implement QueryableTable#getExpression, which wasn't
> required before; I have implemented it with a "return null"
>
> This was the error:
> java.lang.RuntimeException: Error while applying rule
> EnumerableTableScanRule(in:NONE,out:ENUMERABLE), args
> [rel#26:LogicalTableScan.NONE.[](table=[tblspace1, tsql])]
> at
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:244)
> at
> org.apache.calcite.plan.volcano.VolcanoPlanner.findBestExp(VolcanoPlanner.java:636)
> at herddb.sql.CalcitePlanner.runPlanner(CalcitePlanner.java:523)
> at herddb.sql.CalcitePlanner.translate(CalcitePlanner.java:291)
> at herddb.core.RawSQLTest.cacheStatement(RawSQLTest.java:96)
> at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
> Method)
> at
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> at
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> at
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> at
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> at
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at
> org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at
> org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> Caused by: java.lang.RuntimeException: getExpression is not implemented
> at
> herddb.sql.CalcitePlanner$TableImpl.getExpression(CalcitePlanner.java:1377)
> at
> org.apache.calcite.prepare.RelOptTableImpl.lambda$getClassExpressionFunction$2(RelOptTableImpl.java:165)
> at
> org.apache.calcite.prepare.RelOptTableImpl.getExpression(RelOptTableImpl.java:214)
> at
> org.apache.calcite.adapter.enumerable.EnumerableTableScanRule.convert(EnumerableTableScanRule.java:63)
> at
> org.apache.calcite.rel.convert.ConverterRule.onMatch(ConverterRule.java:144)
> at
> org.apache.calcite.plan.volcano.VolcanoRuleCall.onMatch(VolcanoRuleCall.java:217)
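The trace bottoms out in getExpression, so "return null" (or a throw) is not enough for the enumerable convention. One plausible implementation delegates to Calcite's Schemas helper; treat this as a sketch under the assumption that the table is a QueryableTable, not a verified fix for HerdDB:

```java
import org.apache.calcite.linq4j.tree.Expression;
import org.apache.calcite.schema.QueryableTable;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.schema.Schemas;

// Only getExpression is shown; the remaining QueryableTable methods
// (getRowType, asQueryable, ...) are left abstract in this sketch.
public abstract class TableImplSketch implements QueryableTable {
    @Override
    public Expression getExpression(SchemaPlus schema, String tableName, Class clazz) {
        // Build a linq4j expression that locates this table in the schema,
        // which is what EnumerableTableScanRule needs to generate code.
        return Schemas.tableExpression(schema, getElementType(), tableName, clazz);
    }
}
```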
>
> Best regards
>
> Enrico
>
> Il giorno lun 24 feb 2020 alle ore 11:38 Danny Chan
>  ha scritto:
> >
> > Just to note that, I have updated the release 

Re: How to traverse RelNode’s parent conviniently?

2019-04-23 Thread Андрей Цвелодуб
Hi Danny,

I would also agree with Julian on his position. I've tried to get around
this limitation in several different ways, but none of them ended well :)

For your idea with hints: if you have custom RelNode classes, you can add
the hint as an additional field of the class and write a simple rule that
propagates the hint downwards, step by step. You can also include the hint
in your cost estimation, so that nodes with hints are more attractive
to the planner. I'm not sure this is the most correct way to use the
cost mechanism, but at least it is straightforward and it works.
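The cost-nudging idea above can be modeled with toy classes. These are illustrative only, not Calcite's RelNode or cost API, and the discount factor is arbitrary:

```java
// Toy model of "include the hint in cost estimation": nodes that carry
// the hint get a discounted cost, so the planner prefers plans that
// kept the hint alive during propagation.
class ToyRel {
    final boolean hasHint;
    final double baseCost;

    ToyRel(boolean hasHint, double baseCost) {
        this.hasHint = hasHint;
        this.baseCost = baseCost;
    }

    double cost() {
        return hasHint ? baseCost * 0.5 : baseCost; // discount is arbitrary
    }
}

public class HintCostDemo {
    public static void main(String[] args) {
        System.out.println(new ToyRel(true, 100).cost());  // 50.0
        System.out.println(new ToyRel(false, 100).cost()); // 100.0
    }
}
```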

Best Regards,
Andrew Tsvelodub

On Tue, 23 Apr 2019 at 08:44, Yuzhao Chen  wrote:

> Julian,
>
> I want to add hint support for Calcite. The initial idea was to tag a
> RelNode (transformed from a SqlNode with a hint) with a hint attribute (or
> trait); then I hope that the children (inputs) of it can see this hint, so
> as to decide whether to consume or propagate the hint.
>
> The problem I have here is that traits propagate up from the inputs, which
> is the opposite of what I need. Can you give some suggestions? If I use a
> MetadataHandler to cache and propagate the hints, how do I propagate from
> parents to children?
>
> Best,
> Danny Chan
> 在 2019年4月23日 +0800 AM3:14,Julian Hyde ,写道:
> > TL;DR: RelNodes don’t really have parents. Be careful if you are relying
> on the parent concept too much. Rely on rules instead.
> >
> > In the Volcano model, a RelNode doesn’t really have a parent. It might
> be used in several places. (RelSet has a field ‘List parents’ that
> is kept up to date as planning progresses. But it’s really for Volcano’s
> internal use.)
> >
> > Even if you are not using Volcano, there are reasons to want the RelNode
> graph to be a dag, so again, a RelNode doesn’t have a unique parent.
> >
> > RelShuttleImpl has a stack. You can use that to find the parent. But the
> “parent” is just “where we came from as we traversed the RelNode graph”.
> There may be other “parents” that you do not know about.
> >
> > If you have a Project and want to find all parents that are Filters,
> don’t even think about “iterating over the parents” of the Project. Just
> write a rule that matches a Filter on a Project, and trust Volcano to do
> its job.
> >
> > Julian
> >
> >
> >
> >
> > > On Apr 22, 2019, at 6:15 AM, Yuzhao Chen  wrote:
> > >
> > > Thx, Stamatis, that somehow makes sense if I pass around the parent
> > > node every time I visit a RelNode and keep the parents in the cache, but
> > > it is still not that intuitive. Actually, I want to add a new RelTrait
> > > which binds to a specific scope, for example:
> > >
> > >     join-rel(trait1)
> > >      /         \
> > >   join2       join3
> > >
> > > The join-rel has trait trait1, and I want all the children of join-rel
> > > to see this trait. With Calcite’s default metadata handlers I can only
> > > see traits from the children nodes (traits propagate up from the
> > > inputs), and I have no idea how to propagate a trait in the reverse
> > > direction.
> > >
> > >
> > > Best,
> > > Danny Chan
> > > 在 2019年4月22日 +0800 PM8:44,Stamatis Zampetakis ,写道:
> > > > Hi Danny,
> > > >
> > > > Apart from RelShuttle there is also RelVisitor which has a visit
> method
> > > > that provides the parent [1]. Not sure, if it suits your needs.
> > > >
> > > > Best,
> > > > Stamatis
> > > >
> > > > [1]
> > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelVisitor.java#L43
> > > >
> > > >
> > > > On Mon, Apr 22, 2019 at 2:14 PM Yuzhao Chen 
> wrote:
> > > >
> > > > > Now for RelNode, we have the method getInput() [1] to fetch the
> > > > > input RelNodes, but how do we fetch the parent?
> > > > >
> > > > > For example, we have plan:
> > > > >
> > > > >     join-rel
> > > > >     /      \
> > > > >  scan1    scan2
> > > > >
> > > > >
> > > > > We can get scan1 and scan2 from the join-rel directly with the
> > > > > getInput method, but how can we get the join rel from scan1 and
> > > > > scan2?
> > > > >
> > > > > I know that there is a RelShuttle that can visit every RelNode, and
> > > > > if I build a cache of the input mappings I can finally get the
> > > > > ‘parents’ from the cache, but this is boring code and not that
> > > > > intuitive.
> > > > >
> > > > > Do you guys have any good ideas ?
> > > > >
> > > > > [1]
> > > > >
> https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/RelNode.java#L132
> > > > >
> > > > >
> > > > > Best,
> > > > > Danny Chan
> > > > >
> >
>


Re: Question about complex rule operands and Rel tree in general

2018-11-08 Thread Андрей Цвелодуб
Yes, I have already come to appreciate the value and flexibility that
predefined rules provide, and would not want to lose it :)
I hadn't looked into the query metadata much before, but that looks promising!
Thanks!

On Thu, Nov 8, 2018 at 11:05 PM Julian Hyde  wrote:

> For validation (i.e. that doesn’t modify the tree), I would use a visitor.
> RelVisitor may suffice.
>
> There are also a few “whole tree” transformations, e.g. column pruning.
> Use sparingly.
>
> You are correct that rules and their operands do not “scale” to match
> large sections of the tree. We could in principle extend matching a little
> (e.g. better handling of Union with many inputs), but the locality is mostly
> a good thing. In a Volcano graph, there are multiple nodes in each
> equivalence set, therefore huge numbers of paths through the graph. Deep
> matches would quickly become intractable.
>
> I strongly recommend using traits, and in particular predicates
> (RelMdPredicates / RelOptPredicateList). Let’s suppose you want to know
> whether a particular input column is always equal to 5. You could write a
> rule that looks for a Project several layers down whose expression is the
> literal 5. But much better is to look at the predicates. Predicates are
> propagated up the tree, which means you don’t need to look at the
> structure, and you can reason and act locally.
>
> Similar arguments apply for sort and distribution (which are also traits).
>
> If are able to package your logic into a RelOptRule you will be pleased
> with the results. It composes beautifully and efficiently with the hundreds
> of other rules, and with all the flavors of metadata.
>
> Julian
>
>
> > On Nov 8, 2018, at 12:50 PM, Андрей Цвелодуб 
> wrote:
> >
> > Hello everyone!
> >
> > I have a question that I can't find an answer to, so maybe someone could
> > help me.
> > As part of Rel Rules, there is always an operand that matches a part of
> > the tree and says whether the rule should be executed.
> > The operand can be complex, so I can say, for example: match an Aggregate
> > on top of a Project on top of a Filter. AFAIU, this operand will only
> > match if exactly these three nodes are somewhere in the tree.
> > But here is my question - what if I want a rule that will match a more
> > generic structure, like this
> >> Aggregate
> >> -...
> >> --* any number of any nodes in any levels
> >> ---...
> >>   Project
> > Is there an official way to do that?
> >
> > My first approach was to match any Aggregate and then try to inspect the
> > underlying tree in matches()/onMatch(), but this turned out to be quite
> > unreliable since it involves inspecting RelSubsets (and this shouldn't be
> > done, as follows from
> >
> https://lists.apache.org/thread.html/ee2349272e9d344228595c0940820b2fc525cc6115388c48e99495a6@%3Cdev.calcite.apache.org%3E
> ).
> >
> >
> > In case I'm doing it all wrong, I can formulate my question even broader
> -
> > is there a mechanism to perform validation of the execution tree during
> the
> > planning process, i.e. skip some plans as unimplementable based on their
> > internal structure. As an example imagine I want to say that in
> > JdbcConvention, all plans that have a Filter node, over a Project node
> that
> > has more than three fields, should not be implemented. (Modifying cost
> > calculation is also not an option since the plan still has RelSubsets)
> >
> > I hope this makes sense, and thanks in advance!
> >
> > Best Regards,
> > Andrew Tsvielodub
>
>


Question about complex rule operands and Rel tree in general

2018-11-08 Thread Андрей Цвелодуб
Hello everyone!

I have a question that I can't find an answer to, so maybe someone could
help me.
As part of Rel Rules, there is always an operand that matches a part of
the tree and says whether the rule should be executed.
The operand can be complex, so I can say, for example: match an Aggregate
on top of a Project on top of a Filter. AFAIU, this operand will only match
if exactly these three nodes are somewhere in the tree.
But here is my question - what if I want a rule that will match a more
generic structure, like this
>Aggregate
>-...
>--* any number of any nodes in any levels
>---...
>  Project
Is there an official way to do that?
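For reference, the per-node matching described above is expressed in a rule constructor like the following (classic RelOptRule style; the rule name is invented, and the operand tree can only name direct parent-child positions, not "any number of nodes in between"):

```java
import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.rel.core.Aggregate;
import org.apache.calcite.rel.core.Project;

public class AggregateOnProjectRule extends RelOptRule {
    public AggregateOnProjectRule() {
        // Matches an Aggregate whose DIRECT input is a Project; there is
        // no operand that skips an arbitrary number of intermediate nodes.
        super(operand(Aggregate.class, operand(Project.class, any())));
    }

    @Override public void onMatch(RelOptRuleCall call) {
        final Aggregate aggregate = call.rel(0); // outer operand
        final Project project = call.rel(1);     // inner operand
        // the transformation would go here
    }
}
```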

My first approach was to match any Aggregate and then try to inspect the
underlying tree in matches()/onMatch(), but this turned out to be quite
unreliable since it involves inspecting RelSubsets (and this shouldn't be
done, as follows from
https://lists.apache.org/thread.html/ee2349272e9d344228595c0940820b2fc525cc6115388c48e99495a6@%3Cdev.calcite.apache.org%3E).


In case I'm doing it all wrong, I can formulate my question even broader -
is there a mechanism to perform validation of the execution tree during the
planning process, i.e. skip some plans as unimplementable based on their
internal structure. As an example imagine I want to say that in
JdbcConvention, all plans that have a Filter node, over a Project node that
has more than three fields, should not be implemented. (Modifying cost
calculation is also not an option since the plan still has RelSubsets)

I hope this makes sense, and thanks in advance!

Best Regards,
Andrew Tsvielodub