[DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Vladimir Ozerov
Hi colleagues,

I would like to discuss with the community the possibility of adding a new
public method to VolcanoPlanner which would forcefully re-trigger the rules
for a specific rel. This is a follow-up of a discussion started in the
other thread [1].

**Problem statement**
When converting between conventions during optimization, VolcanoPlanner
prefers a top-down approach, so that nodes are converted starting from the
root. But in some cases, an intermediate node must be converted after its
children. This is especially true for distributed SQL engines, which rely
on distribution traits during the optimization process: it is not possible
to efficiently choose a proper physical implementation of a parent node
unless the physical representation of a child node is known.

It seems that presently VolcanoPlanner cannot address such cases by
default. Consider that we have two nodes and associated rules which convert
them to physical counterparts:
[LogicalParent <- LogicalChild]
The parent should be converted after the child. When the child is
converted, the physical node is created:
[LogicalParent <- {LogicalChild, PhysicalChild}]
In order to finish the optimization process, we need to convert the parent.
But parent rules are not fired, because PhysicalChild has traits
incompatible with LogicalParent.

Presently the problem can be solved in two ways:
1) Always produce conversions when going top-down. In this case, I first
create a physical parent, then a physical child. The problem is that the
created parent is not optimal because it didn't know the child's
distribution at the time of creation. So the full flow would be: create a
bad parent, create a good child, create a good parent.
1.1) [LogicalParent <- LogicalChild]
1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild, PhysicalChild}]
1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
{LogicalChild, PhysicalChild}]
What is worse, the creation of a non-optimal parent will trigger rules on
its parent, which in turn may create a non-optimal parent-parent node, etc.

2) Make sure that your convention returns true for
Convention.canConvertConvention. In this case, every time a new rel is
added to a RelSet, its trait set is compared to the trait sets of all other
rels in the set. For every pair of trait sets we may ask the engine to
create a relevant AbstractConverter. The net effect is that a
"physical-to-logical" converter is created, which re-triggers the rules on
the logical parent since their conventions are compatible:
2.1) [LogicalParent  <- LogicalChild]
2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
2.3) [LogicalParent <- {LogicalChild, PhysicalChild,
PhysicalToLogicalConverter}]
2.4) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild,
PhysicalToLogicalConverter}]

The problem with this approach is that it is too coarse-grained, since we
operate on trait sets rather than rels. As a result, additional memory and
CPU pressure is introduced because usually too many converters are
created, which triggers the rules over and over again. A minimal sketch of
this workaround follows.
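For illustration, here is a minimal sketch of workaround 2, assuming a
hypothetical PHYSICAL convention and a PhysicalRel marker interface (both
names are mine, not Calcite's):

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;

public class PhysicalConvention extends Convention.Impl {
  public static final PhysicalConvention INSTANCE = new PhysicalConvention();

  private PhysicalConvention() {
    super("PHYSICAL", PhysicalRel.class); // PhysicalRel: assumed marker interface
  }

  // Opt in to conversions between conventions, so that the planner
  // registers AbstractConverters between subsets of the same RelSet.
  @Override public boolean canConvertConvention(Convention toConvention) {
    return true;
  }

  @Override public boolean useAbstractConvertersForConversion(
      RelTraitSet fromTraits, RelTraitSet toTraits) {
    return true;
  }
}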

**Affected products**
At the moment, two distributed engines are being developed for Hazelcast and
Apache Ignite. Both require bottom-up optimization and currently rely on
the second workaround.
Another example is Apache Drill. I do not know whether its community is
concerned with the issue, but it also uses bottom-up optimization for many
rules and employs both of the aforementioned workarounds. As a result, I
guess that Drill's optimizer also creates too many rels during optimization
and suffers from a huge search space. Please correct me if I am wrong.

**Proposal**
The key problem is that there is no way to re-fire rules on a specific
node. The original problem could be solved if it were possible
to re-fire rules on a LogicalParent without creating any additional rels.
That would lead to a clear optimization flow:
2.1) [LogicalParent  <- LogicalChild]
2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
2.3) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild}]

We can add the following method to the VolcanoPlanner (or RelOptPlanner?)
interface:
void fireRules(RelNode rel)

This method will fire the rules on the passed node in a deferred mode, as if
it were a new node just added to the optimizer. This would require slight
changes to the RuleQueue.addMatch method, and possibly some other places.

Usage example:
class PhysicalChildRule extends RelOptRule {
  @Override public void onMatch(RelOptRuleCall call) {
    LogicalChild logicalRel = call.rel(0);
    PhysicalChild physicalRel = ...;

    call.transformTo(physicalRel);
    optimizer.fireRules(logicalRel); // proposed API: re-fire without new rels
  }
}

What does the community think about such a method? Does it make sense
to you? If not, are you aware of any other ways to organize bottom-up
optimization with VolcanoPlanner without creating additional rels?

If the community is OK in general, I can creat

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Seliverstov Igor
Not only is the "fireRules" method needed, but also a way to get all parents
of a set containing a subset produced by the transformation, or a way to
fire rules without the trait-satisfaction check.

Thu, Oct 31, 2019 at 10:56, Vladimir Ozerov:

> Hi colleagues,
>
> I would like to discuss with the community the possibility of adding a new
> public method to VolcanoPlanner which will forcefully re-trigger the rules
> for the specific rel. This is a follow up of a discussion started in the
> other thread [1].
>
> **Problem statement**
> When converting between conventions during optimization VolcanoPlanner
> prefers the top-bottom approach, so that the nodes are converted from the
> root. But in some cases, the intermediate node must be converted after its
> children. This is especially true for distributed SQL engines, which rely
> on distribution traits during the optimization process: it is not possible
> to efficiently choose a proper physical implementation of a parent node
> unless the physical representation of a child node is known.
>
> It seems that presently VolcanoPlanner cannot address such cases by
> default. Consider that we have two nodes and associated rules which convert
> them to physical counterparts:
> [LogicalParent <- LogicalChild]
> The parent should be converted after the child. When the child is
> converted, the physical node is created:
> [LogicalParent <- {LogicalChild, PhysicalChild}]
> In order to finish the optimization process, we need to convert the parent.
> But parent rules are not fired, because PhysicalChild has traits
> incompatible with LogicalParent.
>
> Presently the problem could be solved in two ways:
> 1) Always produce conversions when going top-down. In this case, I first
> create a physical parent, then a physical child. The problem is that
> created parent is not optimal because it didn't know child distribution at
> the time of creation. So the full flow would be: create a bad parent,
> create a good child, create a good parent.
> 1.1) [LogicalParent <- LogicalChild]
> 1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
> 1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild, PhysicalChild}]
> 1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
> {LogicalChild, PhysicalChild}]
> What is worse, the creation of a not optimal parent will trigger rules on
> its parent, which in turn may create a not optimal parent-parent node, etc.
>
> 2) Make sure that your convention returns true for
> Convention.canConvertConvention. In this case, every time a new rel is
> added to a RelSet, its traitset will be compared to traitsets of all other
> rels in the set. For every pair of traitset we may ask the engine to create
> a relevant AbstractConverter. The net effect is that "physical-to-logical"
> converter is created, which re-triggers the rule on the logical parent
> since their conventions are compatible:
> 2.1) [LogicalParent  <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [LogicalParent <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
> 2.4) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
>
> The problem with that approach is that it is too coarse-grained since we
> operate on traitsets rather than rels. As a result, additional memory and
> CPU pressure are introduced because usually too many converters are
> created, which triggers the rules over and over again.
>
> **Affected products**
> At the moment two distributed engines are being developed for Hazelcast and
> Apache Ignite. Both require bottom-up optimization and currently rely on
> the second workaround.
> Another example is Apache Drill. I do not know whether its community is
> concerned with the issue, but it also uses bottom-up optimization for many
> rules and employs both the aforementioned workarounds. As a result, I guess
> that Drill's optimizer also creates too many rels during optimization and
> suffer from huge search space. Please correct me if I am wrong.
>
> **Proposal**
> The key problem is that there is no way to re-fire rules on a specific
> node. The original problem could have been solved, if it would be possible
> to re-fire rules on a LogicalParent without creating any additional rels.
> That would lead to a clear optimization flow:
> 2.1) [LogicalParent  <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild}]
>
> We can add the following method to VolcanoPlanner (RelOptPlanner?)
> interface:
> void fireRules(RelNode rel)
>
> This method will fire the rules on a passed node in a deferred mode as if
> it was the new node just added to the optimizer. This would require slight
> changes to RuleQueue.addMatch method, and possibly some other places.
>
> Usage example:
> class PhysicalChildRule extends RelOptRule {
> void onMatch(RelOptRuleCall call) {
> LogicalChild logicalRel = call.ge

[jira] [Created] (CALCITE-3463) Parse SQL Query - Non-query expression encountered in illegal context

2019-10-31 Thread Yunpeng Niu (Jira)
Yunpeng Niu created CALCITE-3463:


 Summary: Parse SQL Query - Non-query expression encountered in 
illegal context
 Key: CALCITE-3463
 URL: https://issues.apache.org/jira/browse/CALCITE-3463
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0
 Environment: OS: macOS Catalina 10.15

Java: java 12.0.2 2019-07-16
Reporter: Yunpeng Niu


I tried to ask calcite-core to parse the following SQL query:

{code:java}
final SqlParser.Config parserConfig = SqlParser.configBuilder()
    .setConformance(SqlConformanceEnum.MYSQL_5)
    .setLex(Lex.MYSQL)
    .build();
String query = "SELECT * FROM departments d INNER JOIN (dept_manager m INNER"
    + " JOIN employees e ON m.emp_no = e.emp_no) ON d.dept_no = m.dept_no";
SqlParser.create(query, parserConfig).parseQuery();
{code}
But there is the following exception:
{noformat}
Exception in thread "main" org.apache.calcite.sql.parser.SqlParseException: Non-query expression encountered in illegal context
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(SqlParserImpl.java:362)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.normalizeException(SqlParserImpl.java:147)
	at org.apache.calcite.sql.parser.SqlParser.handleException(SqlParser.java:148)
	at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:163)
	at org.apache.calcite.Runner.main(Runner.java:60)
Caused by: org.apache.calcite.runtime.CalciteException: Non-query expression encountered in illegal context
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
	at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
	at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:463)
	at org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:572)
	at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:835)
	at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:820)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.checkNonQueryExpression(SqlParserImpl.java:306)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression3(SqlParserImpl.java:13780)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression2b(SqlParserImpl.java:13449)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression2(SqlParserImpl.java:13490)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression(SqlParserImpl.java:13421)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.LeafQueryOrExpr(SqlParserImpl.java:13398)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.QueryOrExpr(SqlParserImpl.java:12874)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.OrderedQueryOrExpr(SqlParserImpl.java:478)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.ParenthesizedExpression(SqlParserImpl.java:639)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.TableRef2(SqlParserImpl.java:8145)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.TableRef(SqlParserImpl.java:8068)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.FromClause(SqlParserImpl.java:7969)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlSelect(SqlParserImpl.java:3722)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.LeafQuery(SqlParserImpl.java:604)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.LeafQueryOrExpr(SqlParserImpl.java:13404)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.QueryOrExpr(SqlParserImpl.java:12874)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.OrderedQueryOrExpr(SqlParserImpl.java:478)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlStmt(SqlParserImpl.java:3626)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlStmtEof(SqlParserImpl.java:3664)
	at org.apache.calcite.sql.parser.impl.SqlParserImpl.parseSqlStmtEof(SqlParserImpl.java:194)
	at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:161)
	... 1 more
Process finished with exit code 1
{noformat}
 

I have tried to run the above query in a MySQL database and I am sure it is
valid syntax. Can anyone help explain why the above exception is thrown, and
how I should fix it? Thanks.





Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Danny Chan
Thanks Vladimir for bringing up this discussion !

Did you notice that there is a JIRA issue about this problem? [1] There is
also a discussion about how to propagate the traits [2].

[1] https://issues.apache.org/jira/browse/CALCITE-2970
[2] 
https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E

Best,
Danny Chan
On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
> Hi colleagues,
>
> I would like to discuss with the community the possibility of adding a new
> public method to VolcanoPlanner which will forcefully re-trigger the rules
> for the specific rel. This is a follow up of a discussion started in the
> other thread [1].
>
> **Problem statement**
> When converting between conventions during optimization VolcanoPlanner
> prefers the top-bottom approach, so that the nodes are converted from the
> root. But in some cases, the intermediate node must be converted after its
> children. This is especially true for distributed SQL engines, which rely
> on distribution traits during the optimization process: it is not possible
> to efficiently choose a proper physical implementation of a parent node
> unless the physical representation of a child node is known.
>
> It seems that presently VolcanoPlanner cannot address such cases by
> default. Consider that we have two nodes and associated rules which convert
> them to physical counterparts:
> [LogicalParent <- LogicalChild]
> The parent should be converted after the child. When the child is
> converted, the physical node is created:
> [LogicalParent <- {LogicalChild, PhysicalChild}]
> In order to finish the optimization process, we need to convert the parent.
> But parent rules are not fired, because PhysicalChild has traits
> incompatible with LogicalParent.
>
> Presently the problem could be solved in two ways:
> 1) Always produce conversions when going top-down. In this case, I first
> create a physical parent, then a physical child. The problem is that
> created parent is not optimal because it didn't know child distribution at
> the time of creation. So the full flow would be: create a bad parent,
> create a good child, create a good parent.
> 1.1) [LogicalParent <- LogicalChild]
> 1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
> 1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild, PhysicalChild}]
> 1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
> {LogicalChild, PhysicalChild}]
> What is worse, the creation of a not optimal parent will trigger rules on
> its parent, which in turn may create a not optimal parent-parent node, etc.
>
> 2) Make sure that your convention returns true for
> Convention.canConvertConvention. In this case, every time a new rel is
> added to a RelSet, its traitset will be compared to traitsets of all other
> rels in the set. For every pair of traitset we may ask the engine to create
> a relevant AbstractConverter. The net effect is that "physical-to-logical"
> converter is created, which re-triggers the rule on the logical parent
> since their conventions are compatible:
> 2.1) [LogicalParent <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [LogicalParent <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
> 2.4) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild,
> PhysicalToLogicalConverter}]
>
> The problem with that approach is that it is too coarse-grained since we
> operate on traitsets rather than rels. As a result, additional memory and
> CPU pressure are introduced because usually too many converters are
> created, which triggers the rules over and over again.
>
> **Affected products**
> At the moment two distributed engines are being developed for Hazelcast and
> Apache Ignite. Both require bottom-up optimization and currently rely on
> the second workaround.
> Another example is Apache Drill. I do not know whether its community is
> concerned with the issue, but it also uses bottom-up optimization for many
> rules and employs both the aforementioned workarounds. As a result, I guess
> that Drill's optimizer also creates too many rels during optimization and
> suffer from huge search space. Please correct me if I am wrong.
>
> **Proposal**
> The key problem is that there is no way to re-fire rules on a specific
> node. The original problem could have been solved, if it would be possible
> to re-fire rules on a LogicalParent without creating any additional rels.
> That would lead to a clear optimization flow:
> 2.1) [LogicalParent <- LogicalChild]
> 2.2) [LogicalParent <- {LogicalChild, PhysicalChild}]
> 2.3) [{LogicalParent, PhysicalParent} <- {LogicalChild, PhysicalChild}]
>
> We can add the following method to VolcanoPlanner (RelOptPlanner?)
> interface:
> void fireRules(RelNode rel)
>
> This method will fire the rules on a passed node in a deferred mode as if
> it was the new node just added to the optimizer. This would require slight
> changes to RuleQueue.ad

[jira] [Created] (CALCITE-3464) RexSimplify simplifies plan having filter with NULL to empty values

2019-10-31 Thread Wang Yanlin (Jira)
Wang Yanlin created CALCITE-3464:


 Summary: RexSimplify simplifies plan having filter with NULL to 
empty values
 Key: CALCITE-3464
 URL: https://issues.apache.org/jira/browse/CALCITE-3464
 Project: Calcite
  Issue Type: Bug
Reporter: Wang Yanlin


When filtering by comparing to NULL in SQL, the optimized plan produces an empty result.

{code:java}
@Test public void testSimplifyItemEqualNull() {
  String query = "select * from sales.customer as t1 where name = NULL";
  sql(query)
      .withTester(t -> createDynamicTester())
      .withRule(ReduceExpressionsRule.FILTER_INSTANCE)
      .check();
}
{code}

The plan after optimization looks like this:
{code:java}
LogicalProject(**=[$1])
  LogicalValues(tuples=[[]])
{code}

The optimized plan returns an empty result; is this the result we want?






[jira] [Created] (CALCITE-3465) Cover Cassandra 3.x data types

2019-10-31 Thread Alessandro Solimando (Jira)
Alessandro Solimando created CALCITE-3465:

 Summary: Cover Cassandra 3.x data types
 Key: CALCITE-3465
 URL: https://issues.apache.org/jira/browse/CALCITE-3465
 Project: Calcite
  Issue Type: Improvement
  Components: cassandra-adapter
Affects Versions: 1.21.0
Reporter: Alessandro Solimando


Currently cassandra-adapter covers only part of the [data types available in
Cassandra 3.x|https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cql_data_types_c.html];
the scope of the ticket is to extend the coverage and support all data types.

Current status:
||CQL Data Type||SQL Data Type||Java Class||Supported||
|uuid|CHAR|java.lang.String|Y|
|timeuuid|CHAR|java.lang.String|Y|
|ascii|VARCHAR|java.lang.String|Y|
|text|VARCHAR|java.lang.String|Y|
|varchar|VARCHAR|java.lang.String|Y|
|int (cint)|INTEGER|int|Y|
|varint|INTEGER|int|Y|
|bigint|BIGINT|long|Y|
|double (cdouble)|DOUBLE|double|Y|
|float (cfloat)|DOUBLE (should be REAL?)|float|Y|
|decimal|DOUBLE|N/A|Y|
|blob|VARBINARY|N/A|N|
|boolean|BOOLEAN|N/A|N|
|counter|INTEGER|N/A|N|
|date|DATE|N/A|N|
|frozen|?|N/A|N|
|inet|?|N/A|N|
|list|ARRAY?|N/A|N|
|map|MAP|N/A|N|
|set|?|N/A|N|
|smallint|SMALLINT|N/A|N|
|time|TIME?|N/A|N|
|timestamp|TIMESTAMP|N/A|N|
|tinyint|TINYINT|N/A|N|
|tuple|ARRAY?|N/A|N|

The second column is derived from
_org.apache.calcite.adapter.cassandra.CassandraSchema.getRelDataType(...)_,
the third column from _org.apache.calcite.adapter.cassandra.currentRowField(...)_.
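For reference, a minimal sketch of the mapping encoded in the "Supported" rows
above (the helper class and the fallback to ANY are my assumptions, not the
adapter's actual code):
{code:java}
import org.apache.calcite.sql.type.SqlTypeName;

/** Sketch: CQL type name to Calcite SQL type, per the table above. */
class CqlTypeMapping {
  static SqlTypeName toSqlType(String cqlType) {
    switch (cqlType) {
    case "uuid": case "timeuuid":
      return SqlTypeName.CHAR;
    case "ascii": case "text": case "varchar":
      return SqlTypeName.VARCHAR;
    case "int": case "varint":
      return SqlTypeName.INTEGER;
    case "bigint":
      return SqlTypeName.BIGINT;
    case "double": case "float": case "decimal":
      return SqlTypeName.DOUBLE;
    default:
      return SqlTypeName.ANY; // not yet covered by the adapter
    }
  }
}
{code}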





Re: CassandraAdapter (Add Type) and WHERE statement.

2019-10-31 Thread Alessandro Solimando
Hello,
I have logged a Jira ticket for this:
https://issues.apache.org/jira/browse/CALCITE-3465

I have listed all the data types for Cassandra 3.x, and I have tried to
compile a table with the current status, but there are a few entries about
which I am not sure.

If you have time to contribute to the table, or material I can use to fill
in the blanks, that would be great.

Best regards,
Alessandro

On Wed, 16 Oct 2019 at 17:08, Michael Mior  wrote:

> You're right that there are several types which are not supported by
> the Cassandra adapter. We would happily accept pull requests to add
> support for new types.
>
> You're also correct that Cassandra cannot efficiently execute queries
> which do not specify the partition key. Calcite won't make those
> queries more efficient, but it can make it easier to execute queries
> that CQL does not directly support. Ultimately data is still stored
> based on the partition key, so if your query does not specify a
> partition key, Calcite will still need to issue an expensive
> cross-partition query to Cassandra.
> --
> Michael Mior
> mm...@apache.org
>
> On Wed, Oct 16, 2019 at 07:57, Yanna elina wrote:
> >
> > Hi guys,
> >
> > I am studying the benefits that a Cassandra-Calcite adapter can bring,
> > for example the possibility of joins.
> >
> > The problem is that the types defined in CassandraSchema.getRelDataType(...)
> > are very limited; some important types are missing (boolean, array, etc.).
> >
> > I thought of inheriting from the class CassandraSchema to override this
> > method and add more types, but this method is used inside CassandraTable
> > too.
> >
> > I would like to avoid fully rewriting this adapter :)
> >
> > do you have suggestions?
> >
> > My second question is: Cassandra is not optimized for WHERE clauses on
> > keys not defined as clustering/partition keys. I was wondering whether
> > Calcite could play a role in improving performance here.
> >
> >
> > Thanks!
> >
> > Yanna
>


Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Vladimir Ozerov
Hi Danny,

Thank you very much for the links. What is described there is pretty much
similar to the problem I describe, especially the discussion about trait
propagation, as this is basically what I need - to explore the potential
traits of children before optimizing parents. And this is basically what
Drill already does with its SubsetTransformer:
1) There is a SubsetTransformer interface, which iterates over the physical
relations of the given subset [1]
2) If you want to make a physical project, you iterate over the physical
relations of the input subset and create possible physical projects [2]
(a sketch of this iteration follows the links below)
3) But if you cannot find any physical input, then you trigger the creation
of a "bad" physical project, which is very likely to have a poor cost
because it cannot take advantage of the input's distribution information [3]
So, step 2 is the trait set propagation which is needed by many
distributed engines. Step 3 is an attempt to work around the current
VolcanoPlanner behavior, where a parent rule is fired only if the produced
child node has a compatible trait set.

I do not know Calcite's architecture that well, but at first glance the
proposed ability to re-fire rules on a specific rel seems good enough. It
doesn't expand the search space, because no new nodes are created, and it
seems to be relatively simple to implement.

[1]
https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
[2]
https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
[3]
https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
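For illustration, a minimal sketch of the iteration step from (2), assuming a
hypothetical physical convention (the helper and its name are mine; Drill's
actual SubsetTransformer is linked above):

import java.util.ArrayList;
import java.util.List;

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.volcano.RelSubset;
import org.apache.calcite.rel.RelNode;

/** Collects the physical rels registered so far in an input subset,
 * mirroring the iteration step of Drill's SubsetTransformer. */
final class PhysicalInputs {
  private PhysicalInputs() {}

  static List<RelNode> of(RelNode input, Convention physical) {
    List<RelNode> result = new ArrayList<>();
    if (input instanceof RelSubset) {
      for (RelNode rel : ((RelSubset) input).getRelList()) {
        if (rel.getConvention() == physical) {
          result.add(rel); // a physical child whose traits we can use
        }
      }
    }
    return result;
  }
}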

Thu, Oct 31, 2019 at 12:21, Danny Chan:

> Thanks Vladimir for bringing up this discussion !
>
> Did you notice that there is a JIRA issue about this problem ? [1] Also a
> discussion about how to propagate the traits [2]
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> [2]
> https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
>
> Best,
> Danny Chan
> On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
> > Hi colleagues,
> >
> > I would like to discuss with the community the possibility of adding a
> new
> > public method to VolcanoPlanner which will forcefully re-trigger the
> rules
> > for the specific rel. This is a follow up of a discussion started in the
> > other thread [1].
> >
> > **Problem statement**
> > When converting between conventions during optimization VolcanoPlanner
> > prefers the top-bottom approach, so that the nodes are converted from the
> > root. But in some cases, the intermediate node must be converted after
> its
> > children. This is especially true for distributed SQL engines, which rely
> > on distribution traits during the optimization process: it is not
> possible
> > to efficiently choose a proper physical implementation of a parent node
> > unless the physical representation of a child node is known.
> >
> > It seems that presently VolcanoPlanner cannot address such cases by
> > default. Consider that we have two nodes and associated rules which
> convert
> > them to physical counterparts:
> > [LogicalParent <- LogicalChild]
> > The parent should be converted after the child. When the child is
> > converted, the physical node is created:
> > [LogicalParent <- {LogicalChild, PhysicalChild}]
> > In order to finish the optimization process, we need to convert the
> parent.
> > But parent rules are not fired, because PhysicalChild has traits
> > incompatible with LogicalParent.
> >
> > Presently the problem could be solved in two ways:
> > 1) Always produce conversions when going top-down. In this case, I first
> > create a physical parent, then a physical child. The problem is that
> > created parent is not optimal because it didn't know child distribution
> at
> > the time of creation. So the full flow would be: create a bad parent,
> > create a good child, create a good parent.
> > 1.1) [LogicalParent <- LogicalChild]
> > 1.2) [{LogicalParent, PhysicalParentBad} <- LogicalChild]
> > 1.3) [{LogicalParent, PhysicalParentBad} <- {LogicalChild,
> PhysicalChild}]
> > 1.4) [{LogicalParent, PhysicalParentBad, PhysicalParentGood} <-
> > {LogicalChild, PhysicalChild}]
> > What is worse, the creation of a not optimal parent will trigger rules on
> > its parent, which in turn may create a not optimal parent-parent node,
> etc.
> >
> > 2) Make sure that your convention returns true for
> > Convention.canConvertConvention. In this case, every time a new rel is
> > added to a RelSet, its traitset will be compared to traitsets of all
> other
> > rels in the set. For every pair of traitset we may ask the engine to
> create
> > a relevant AbstractConverter. The net effect is that
> "physical-to-logical"
> > converter is created, which re-triggers the rule on the logical parent
> > since their conventions are compa

[jira] [Created] (CALCITE-3466) JDBC adapter dropped group by statement in subquery

2019-10-31 Thread Wang Weidong (Jira)
Wang Weidong created CALCITE-3466:

 Summary: JDBC adapter dropped group by statement in subquery
 Key: CALCITE-3466
 URL: https://issues.apache.org/jira/browse/CALCITE-3466
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.21.0, 1.22.0
Reporter: Wang Weidong
 Fix For: 1.22.0


While converting a RelNode to a SqlNode, the "group by" statement in the
subquery is dropped unexpectedly. For example, the original SQL is:
_select a, avg(c1) from (select a, sum(d), b as c1 from t_testProjectMerge
group by b, a) as t group by a_
The RelNode converted from the SqlNode is:
{noformat}
LogicalAggregate(group=[{0}], EXPR$1=[AVG($1)])
  LogicalProject(A=[$1], C1=[$0])
    LogicalAggregate(group=[{0, 1}], EXPR$1=[SUM($2)])
      LogicalProject(B=[$1], A=[$0], D=[$3])
        EnumerableTableScan(table=[[T_TESTPROJECTMERGE]])
{noformat}





[jira] [Created] (CALCITE-3467) fix PartialMapping mistake and Cosmetic Change

2019-10-31 Thread xzh_dz (Jira)
xzh_dz created CALCITE-3467:

 Summary: fix PartialMapping mistake and Cosmetic Change
 Key: CALCITE-3467
 URL: https://issues.apache.org/jira/browse/CALCITE-3467
 Project: Calcite
  Issue Type: Wish
Reporter: xzh_dz


fix PartialMapping mistake and Cosmetic Change





Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Xiening Dai
Hi Vladimir,

I think for the short/mid term, way #2 (using AbstractConverter) should work
after we fix CALCITE-2970. We already understand the root cause, and we are
now looking at the best way to fix it. If you cannot wait, you can also
create your own converter rule so it won't generate logical nodes (a sketch
follows below), and the performance should be much better. As for your
concern regarding the overhead of creating AbstractConverter objects, I think
they are just minor overheads compared to the rest of the work the framework
is doing (rule matching, set merging, etc.).

Regarding the comment "to explore potential traits of children before
optimizing parents", I totally agree, and want to point out that we should
also consider parent requirements. Currently such a mechanism is missing in
Calcite, and we are having a discussion on how to add it as part of the
Volcano planner. Danny shared the mail archive previously. Welcome to join
the discussion.
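For illustration, a minimal sketch of such a custom converter rule, assuming
hypothetical PhysicalFilter and PhysicalConvention classes (these names are
assumptions, not Calcite API):

import org.apache.calcite.plan.Convention;
import org.apache.calcite.plan.RelTraitSet;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.convert.ConverterRule;
import org.apache.calcite.rel.logical.LogicalFilter;

/** Converts LogicalFilter straight to a physical filter; no intermediate
 * logical node is generated along the way. */
public class PhysicalFilterRule extends ConverterRule {
  public static final PhysicalFilterRule INSTANCE = new PhysicalFilterRule();

  private PhysicalFilterRule() {
    super(LogicalFilter.class, Convention.NONE, PhysicalConvention.INSTANCE,
        "PhysicalFilterRule");
  }

  @Override public RelNode convert(RelNode rel) {
    final LogicalFilter filter = (LogicalFilter) rel;
    final RelTraitSet traits = filter.getTraitSet()
        .replace(PhysicalConvention.INSTANCE);
    return new PhysicalFilter(filter.getCluster(), traits,
        convert(filter.getInput(), traits), filter.getCondition());
  }
}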


> On Oct 31, 2019, at 3:37 AM, Vladimir Ozerov  wrote:
> 
> Hi Danny,
> 
> Thank you very much for the links. What is described here is pretty much
> similar to the problem I describe. Especially the discussion about trait
> propagation, as this is basically what I need - to explore potential traits
> of children before optimizing parents. And this is basically what Drill
> already does with it's SubsetTransformer:
> 1) There is a SubsetTransformer interface, which iterates over physical
> relations of the given subset [1]
> 2) If you want to make a physical project, you iterate over physical
> relations of the input subset and create possible physical projects [2]
> 3) But if you cannot find any physical input, then you trigger creation of
> a "bad" physical project, which is very likely to have poor cost because it
> cannot take advantage of input's distribution information [3]
> So, step 2 - is a trait set propagation which is needed by many
> distributed engines. Step 3 - an attempt to workaround current
> VolcanoPlanner behavior, when a parent rule is fired only if produced child
> node has compatible trait set.
> 
> I do not know Calcite's architecture that good but at first glance, the
> proposed ability to re-fire rules of a specific Rel seems good enough. It
> doesn't expand search space, because no new nodes are created, and it seems
> to be relatively simple to implement.
> 
> [1]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> [2]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> [3]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
> 
> Thu, Oct 31, 2019 at 12:21, Danny Chan:
> 
>> Thanks Vladimir for bringing up this discussion !
>> 
>> Did you notice that there is a JIRA issue about this problem ? [1] Also a
>> discussion about how to propagate the traits [2]
>> 
>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>> [2]
>> https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
>> 
>> Best,
>> Danny Chan
>> On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
>>> Hi colleagues,
>>> 
>>> I would like to discuss with the community the possibility of adding a
>> new
>>> public method to VolcanoPlanner which will forcefully re-trigger the
>> rules
>>> for the specific rel. This is a follow up of a discussion started in the
>>> other thread [1].
>>> 
>>> **Problem statement**
>>> When converting between conventions during optimization VolcanoPlanner
>>> prefers the top-bottom approach, so that the nodes are converted from the
>>> root. But in some cases, the intermediate node must be converted after
>> its
>>> children. This is especially true for distributed SQL engines, which rely
>>> on distribution traits during the optimization process: it is not
>> possible
>>> to efficiently choose a proper physical implementation of a parent node
>>> unless the physical representation of a child node is known.
>>> 
>>> It seems that presently VolcanoPlanner cannot address such cases by
>>> default. Consider that we have two nodes and associated rules which
>> convert
>>> them to physical counterparts:
>>> [LogicalParent <- LogicalChild]
>>> The parent should be converted after the child. When the child is
>>> converted, the physical node is created:
>>> [LogicalParent <- {LogicalChild, PhysicalChild}]
>>> In order to finish the optimization process, we need to convert the
>> parent.
>>> But parent rules are not fired, because PhysicalChild has traits
>>> incompatible with LogicalParent.
>>> 
>>> Presently the problem could be solved in two ways:
>>> 1) Always produce conversions when going top-down. In this case, I first
>>> create a physical parent, then a physical child. The problem is that
>>> created parent is not optimal because it di

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Vladimir Ozerov
Hi Xiening,

Yes, I think that the manual creation of converters to trigger parent rules
should be enough at the moment. I'll try to explore this possibility. Thank
you.

Regards,
Vladimir

Thu, Oct 31, 2019 at 20:11, Xiening Dai:

> Hi Vladimir,
>
> I think for short/mid term, #2 way (using AbstractConverter) should work
> after we fix CALCITE-2970. We already understand the root cause, now are
> looking at the best way to fix it. If you cannot wait, you can also create
> your own converter rule so it won’t generate logical node, and the
> performance should be much better. And to your concern regarding the
> overhead of creating AbstractConverter objects, I think they are just minor
> overheads compared to the rest of part of the work framework’s doing (rule
> matching, merging set, etc).
>
> Regarding the comment "to explore potential traits of children before
> optimizing parents”, I totally agree and want to point out we should also
> consider parent requirements. Currently such mechanism is missing in
> Calcite, and we are having discussion on how to add it as part of the
> Volcano planner. Danny'd shared the mail archive previously. Welcome to
> join the discussion.
>
>
> > On Oct 31, 2019, at 3:37 AM, Vladimir Ozerov  wrote:
> >
> > Hi Danny,
> >
> > Thank you very much for the links. What is described here is pretty much
> > similar to the problem I describe. Especially the discussion about trait
> > propagation, as this is basically what I need - to explore potential
> traits
> > of children before optimizing parents. And this is basically what Drill
> > already does with it's SubsetTransformer:
> > 1) There is a SubsetTransformer interface, which iterates over physical
> > relations of the given subset [1]
> > 2) If you want to make a physical project, you iterate over physical
> > relations of the input subset and create possible physical projects [2]
> > 3) But if you cannot find any physical input, then you trigger creation
> of
> > a "bad" physical project, which is very likely to have poor cost because
> it
> > cannot take advantage of input's distribution information [3]
> > So, step 2 - is a trait set propagation which is needed by many
> > distributed engines. Step 3 - an attempt to workaround current
> > VolcanoPlanner behavior, when a parent rule is fired only if produced
> child
> > node has compatible trait set.
> >
> > I do not know Calcite's architecture that good but at first glance, the
> > proposed ability to re-fire rules of a specific Rel seems good enough. It
> > doesn't expand search space, because no new nodes are created, and it
> seems
> > to be relatively simple to implement.
> >
> > [1]
> >
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> > [2]
> >
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> > [3]
> >
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
> >
> > Thu, Oct 31, 2019 at 12:21, Danny Chan:
> >
> >> Thanks Vladimir for bringing up this discussion !
> >>
> >> Did you notice that there is a JIRA issue about this problem ? [1] Also
> a
> >> discussion about how to propagate the traits [2]
> >>
> >> [1] https://issues.apache.org/jira/browse/CALCITE-2970
> >> [2]
> >>
> https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> >>
> >> Best,
> >> Danny Chan
> >> On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
> >>> Hi colleagues,
> >>>
> >>> I would like to discuss with the community the possibility of adding a
> >> new
> >>> public method to VolcanoPlanner which will forcefully re-trigger the
> >> rules
> >>> for the specific rel. This is a follow up of a discussion started in
> the
> >>> other thread [1].
> >>>
> >>> **Problem statement**
> >>> When converting between conventions during optimization VolcanoPlanner
> >>> prefers the top-bottom approach, so that the nodes are converted from
> the
> >>> root. But in some cases, the intermediate node must be converted after
> >> its
> >>> children. This is especially true for distributed SQL engines, which
> rely
> >>> on distribution traits during the optimization process: it is not
> >> possible
> >>> to efficiently choose a proper physical implementation of a parent node
> >>> unless the physical representation of a child node is known.
> >>>
> >>> It seems that presently VolcanoPlanner cannot address such cases by
> >>> default. Consider that we have two nodes and associated rules which
> >> convert
> >>> them to physical counterparts:
> >>> [LogicalParent <- LogicalChild]
> >>> The parent should be converted after the child. When the child is
> >>> converted, the physical node is created:
> >>> [LogicalParent <- {LogicalChild, PhysicalChild}]
> >>> In order to finish th

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Jinfeng Ni
Hi Vladimir,

The SubsetTransformer interface and the iteration over the RelNodes
within a RelSubset in Drill are implemented exactly to do this trait
propagation. We also had to rely on AbstractConverter to fire the
necessary rules and avoid the CanNotPlan issue. At some point, the Calcite
community chose to remove AbstractConverter and Drill had to add it
back, which is probably one of the main reasons for us to continue
using a Calcite fork. I still remember we constantly had to deal with
the dilemma between "CanNotPlan" and long planning time due to the
explored search space.

Glad to see more people are joining the effort to solve this long
overdue issue, something missing in Calcite's core optimizer framework
"since before Calcite was Calcite" (Jacques's words).

Jinfeng


On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov  wrote:
>
> Hi Danny,
>
> Thank you very much for the links. What is described here is pretty much
> similar to the problem I describe. Especially the discussion about trait
> propagation, as this is basically what I need - to explore potential traits
> of children before optimizing parents. And this is basically what Drill
> already does with it's SubsetTransformer:
> 1) There is a SubsetTransformer interface, which iterates over physical
> relations of the given subset [1]
> 2) If you want to make a physical project, you iterate over physical
> relations of the input subset and create possible physical projects [2]
> 3) But if you cannot find any physical input, then you trigger creation of
> a "bad" physical project, which is very likely to have poor cost because it
> cannot take advantage of input's distribution information [3]
> So, step 2 - is a trait set propagation which is needed by many
> distributed engines. Step 3 - an attempt to workaround current
> VolcanoPlanner behavior, when a parent rule is fired only if produced child
> node has compatible trait set.
>
> I do not know Calcite's architecture that good but at first glance, the
> proposed ability to re-fire rules of a specific Rel seems good enough. It
> doesn't expand search space, because no new nodes are created, and it seems
> to be relatively simple to implement.
>
> [1]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> [2]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> [3]
> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
>
> Thu, Oct 31, 2019 at 12:21, Danny Chan:
>
> > Thanks Vladimir for bringing up this discussion !
> >
> > Did you notice that there is a JIRA issue about this problem ? [1] Also a
> > discussion about how to propagate the traits [2]
> >
> > [1] https://issues.apache.org/jira/browse/CALCITE-2970
> > [2]
> > https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
> >
> > Best,
> > Danny Chan
> > On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
> > > Hi colleagues,
> > >
> > > I would like to discuss with the community the possibility of adding a
> > new
> > > public method to VolcanoPlanner which will forcefully re-trigger the
> > rules
> > > for the specific rel. This is a follow up of a discussion started in the
> > > other thread [1].
> > >
> > > **Problem statement**
> > > When converting between conventions during optimization VolcanoPlanner
> > > prefers the top-bottom approach, so that the nodes are converted from the
> > > root. But in some cases, the intermediate node must be converted after
> > its
> > > children. This is especially true for distributed SQL engines, which rely
> > > on distribution traits during the optimization process: it is not
> > possible
> > > to efficiently choose a proper physical implementation of a parent node
> > > unless the physical representation of a child node is known.
> > >
> > > It seems that presently VolcanoPlanner cannot address such cases by
> > > default. Consider that we have two nodes and associated rules which
> > convert
> > > them to physical counterparts:
> > > [LogicalParent <- LogicalChild]
> > > The parent should be converted after the child. When the child is
> > > converted, the physical node is created:
> > > [LogicalParent <- {LogicalChild, PhysicalChild}]
> > > In order to finish the optimization process, we need to convert the
> > parent.
> > > But parent rules are not fired, because PhysicalChild has traits
> > > incompatible with LogicalParent.
> > >
> > > Presently the problem could be solved in two ways:
> > > 1) Always produce conversions when going top-down. In this case, I first
> > > create a physical parent, then a physical child. The problem is that
> > > created parent is not optimal because it didn't know child distribution
> > at
> > > the time of creation. So the full flow would be: create 

[jira] [Created] (CALCITE-3468) Oracle cast to varchar does not use precision when it should

2019-10-31 Thread Lindsey Meyer (Jira)
Lindsey Meyer created CALCITE-3468:

 Summary: Oracle cast to varchar does not use precision when it 
should
 Key: CALCITE-3468
 URL: https://issues.apache.org/jira/browse/CALCITE-3468
 Project: Calcite
  Issue Type: Improvement
Reporter: Lindsey Meyer


We're trying to cast a node to a varchar with something like this:
{noformat}
relBuilder.cast(node, SqlTypeName.VARCHAR)
{noformat}
And for Oracle, this generates SQL like
{noformat}
CAST(node AS VARCHAR){noformat}
which is incorrect. Oracle can only have VARCHAR types with an explicit 
precision.

Looking at the code, it _seems_ like this should work with Oracle, so I'm not
really sure what's going on here.
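In the meantime, a possible workaround sketch (the precision 100 is an
arbitrary assumption): build the target type with an explicit precision and
cast to it directly.
{noformat}
final RelDataType varchar100 = relBuilder.getTypeFactory()
    .createSqlType(SqlTypeName.VARCHAR, 100); // explicit precision
final RexNode cast = relBuilder.getRexBuilder().makeCast(varchar100, node);
{noformat}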





Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Xiening Dai
Actually, we solved this problem in our setup using a mechanism called
"Pull-Up Traits", which explores the possible trait set of a child's input to
decide the parent's physical properties. In order to determine the child's
input trait, you would have to look at the child's children, and so on all
the way down to the leaf nodes or a barrier. A barrier is a rel node which
cannot derive any traits regardless of its input. A good example would be a
user-defined function which would throw off any distribution or collation.
Then we realized just pulling up is not enough; sometimes we would need to
look at the parent's requirement as well. So we try to solve this in a
unified framework, which we call "On Demand Trait", and implement it as part
of the framework so anyone can benefit. I hope Haisheng can share a design
doc once we have more concrete ideas. A rough sketch of the pull-up step
follows.
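To illustrate only the pull-up step (this is my own rough sketch, not the
framework described above; the barrier test is a stub):

import org.apache.calcite.rel.RelCollation;
import org.apache.calcite.rel.RelCollations;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.core.Sort;

/** Rough sketch: pull a collation up from the children, stopping at barriers. */
final class TraitPullUp {
  private TraitPullUp() {}

  static RelCollation pullUpCollation(RelNode node) {
    if (isBarrier(node)) {
      return RelCollations.EMPTY; // nothing can be derived past a barrier
    }
    if (node instanceof Sort) {
      return ((Sort) node).getCollation(); // a sort establishes its collation
    }
    if (node.getInputs().isEmpty()) {
      return RelCollations.EMPTY; // leaf with no known ordering
    }
    // Single-input pass-through nodes keep the child's collation; a real
    // implementation must remap columns per operator and handle joins.
    return pullUpCollation(node.getInput(0));
  }

  static boolean isBarrier(RelNode node) {
    // Stub: e.g. a project computing a user-defined function.
    return false;
  }
}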


> On Oct 31, 2019, at 11:37 AM, Jinfeng Ni  wrote:
> 
> Hi Vladimir,
> 
> The SubsetTransformer interface and the iterating over the RelNodes
> within a RelSubset in Drill  is exactly implemented to do the trait
> propagation. We also had to rely on AbstractConverter to fire
> necessary rule to avoid the CanNotPlan issue. At some point, Calcite
> community chooses to remove AbstractConverter and Drill had to add it
> back, which is probably one of the main reasons for us to continue
> using a Calcite fork.  I still remember we constantly had to deal with
> the dilemma between "CanNotPlan" and long planing time due to explored
> search space.
> 
> Glad to see more people are joining the effort to solve this long
> overdue issue, something missing in Calcite's core optimizer framework
> "since before Calcite was Calcite" (Jacques's words).
> 
> Jinfeng
> 
> 
> On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov  wrote:
>> 
>> Hi Danny,
>> 
>> Thank you very much for the links. What is described here is pretty much
>> similar to the problem I describe. Especially the discussion about trait
>> propagation, as this is basically what I need - to explore potential traits
>> of children before optimizing parents. And this is basically what Drill
>> already does with it's SubsetTransformer:
>> 1) There is a SubsetTransformer interface, which iterates over physical
>> relations of the given subset [1]
>> 2) If you want to make a physical project, you iterate over physical
>> relations of the input subset and create possible physical projects [2]
>> 3) But if you cannot find any physical input, then you trigger creation of
>> a "bad" physical project, which is very likely to have poor cost because it
>> cannot take advantage of input's distribution information [3]
>> So, step 2 - is a trait set propagation which is needed by many
>> distributed engines. Step 3 - an attempt to workaround current
>> VolcanoPlanner behavior, when a parent rule is fired only if produced child
>> node has compatible trait set.
>> 
>> I do not know Calcite's architecture that good but at first glance, the
>> proposed ability to re-fire rules of a specific Rel seems good enough. It
>> doesn't expand search space, because no new nodes are created, and it seems
>> to be relatively simple to implement.
>> 
>> [1]
>> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
>> [2]
>> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
>> [3]
>> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
>> 
>> Thu, Oct 31, 2019 at 12:21, Danny Chan:
>> 
>>> Thanks Vladimir for bringing up this discussion !
>>> 
>>> Did you notice that there is a JIRA issue about this problem ? [1] Also a
>>> discussion about how to propagate the traits [2]
>>> 
>>> [1] https://issues.apache.org/jira/browse/CALCITE-2970
>>> [2]
>>> https://ponymail-vm.apache.org/_GUI_/thread.html/79dac47ea50b5dfbd3f234e368ed61d247fb0eb989f87fe01aedaf25@%3Cdev.calcite.apache.org%3E
>>> 
>>> Best,
>>> Danny Chan
>>> On Oct 31, 2019 at 3:56 PM +0800, Vladimir Ozerov wrote:
 Hi colleagues,
 
 I would like to discuss with the community the possibility of adding a
>>> new
 public method to VolcanoPlanner which will forcefully re-trigger the
>>> rules
 for the specific rel. This is a follow up of a discussion started in the
 other thread [1].
 
 **Problem statement**
 When converting between conventions during optimization VolcanoPlanner
 prefers the top-bottom approach, so that the nodes are converted from the
 root. But in some cases, the intermediate node must be converted after
>>> its
 children. This is especially true for distributed SQL engines, which rely
 on distribution traits during the optimization process: it is not
>>> possible
 to efficiently choose a proper physical implementation of a parent node
 unless the physical representation of a child node is known.

Re: [DISCUSS] On-demand traitset request

2019-10-31 Thread Jinfeng Ni
Hi Xiening,

"Let say if R and S doesn’t have sorting properties at all. In your
case, we would end up adding enforcers for LHS and RHS to get
collation (a, b, c). Then we would need another enforcer to get
collation (b, c). This is a sub optimal plan as we could have use (b,
c, a) for join."

In such a case, for step 2, when MergeJoin requests a permutation match of
(a, b, c) on both of its inputs, it is not necessary to end up with
collation (a, b, c) only. Since it requests a "permutation", MJ could ask
for all possible satisfying collations, which include (b, c, a); see the
enumeration sketch at the end of this message. In other words, the steps I
described did not exclude such a plan.

You may argue that this would increase the search space. However, by
limiting the search space without exploring all possible choices, we may
lose the chance of getting the 'optimal' plan we want. For instance, in the
above example, the idea of passing the "on demand" trait request (b, c)
from Agg to MJ is to avoid an unnecessary sort on (b, c), and in cases
where the join condition has good filtering, such a sort of the join output
could be quite cheap. Yet in the plan enumeration, since we use the "on
demand" trait request from the parent to guide the actions of MJ, I'm not
sure whether we may restrict the choices we consider in the legs of the
join, whose cardinality could be larger and play a bigger role in the
overall cost.

In other words, by using the "on demand" trait request, we may restrict
the choices of plans, possibly in some operators with larger data sizes.

In the current implementation of VolcanoPlanner, I feel the root issue
behind long planning times is not exploring all possible satisfying traits;
it is actually the unnecessary AbstractConverters added to the
equivalence class.
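For illustration, a sketch of the permutation enumeration mentioned above
(purely my own; field ordinals and ascending direction are assumptions):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.calcite.rel.RelCollation;
import org.apache.calcite.rel.RelCollations;
import org.apache.calcite.rel.RelFieldCollation;

/** Enumerates all permutations of join-key ordinals as candidate collations. */
final class PermutationCollations {
  private PermutationCollations() {}

  static List<RelCollation> of(List<Integer> keys) {
    List<RelCollation> result = new ArrayList<>();
    permute(new ArrayList<>(keys), 0, result);
    return result;
  }

  private static void permute(List<Integer> keys, int i, List<RelCollation> out) {
    if (i == keys.size()) {
      List<RelFieldCollation> fields = new ArrayList<>();
      for (int key : keys) {
        fields.add(new RelFieldCollation(key)); // default: ascending
      }
      out.add(RelCollations.of(fields));
      return;
    }
    for (int j = i; j < keys.size(); j++) {
      Collections.swap(keys, i, j);
      permute(keys, i + 1, out);
      Collections.swap(keys, i, j); // backtrack
    }
  }
}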


On Fri, Oct 18, 2019 at 10:39 PM Xiening Dai  wrote:
>
> Thanks for the sharing. I like the way you model this problem, Jinfeng.
>
> There’s one minor issue with your example. Let say if R and S doesn’t have 
> sorting properties at all. In your case, we would end up adding enforcers for 
> LHS and RHS to get collation (a, b, c). Then we would need another enforcer 
> to get collation (b, c). This is a sub optimal plan as we could have use (b, 
> c, a) for join.
>
> I think in step #2, the join operator would need to take the agg trait 
> requirement into account. Then it would have two options -
>
> 1) require *exact/super* match of  (b, c, a) or (c, b, a); this is to 
> guarantee the join output would deliver the collation agg needs.
> 2) require permutation match of (a, b, c); in such case, an enforcer might be 
> needed for aggregation.
>
> Eventually the cost model decides who is the winner.
>
> There’s a fundamental difference between your model and Haisheng’s proposal. 
> In Haisheng’s case, a rel node not only looks at its parent’s requirement, 
> but also tries to get the potential traits its input could deliver. It would 
> try to align them to eliminate unnecessary alternatives.
>
> In above example, assuming R is (b, c, a) and S is (a, b, c), to implement 
> option 1), we would generate two alternatives -
>
> MergeJoin (b, c, a)
>   TableScan R
>   Sort(b, c, a)
>     TableScan S
>
> MergeJoin(c, b, a)
>   Sort(c, b, a)
>     TableScan R
>   Sort(c, b, a)
>     TableScan S
>
> But if we look at the input traits and has the insight that R already 
> delivers (b, c, a), we could decide to require (b, c, a) only and avoid 
> generating the 2nd plan, which is definitely worse, and reduce the search 
> space.
>
>
> > On Oct 18, 2019, at 4:57 PM, Jinfeng Ni  wrote:
> >
> > A little bit of history.  In Drill,  when we first implemented
> > Distribution trait's definition,  we allows both exact match and
> > partial match in satisfy() method. This works fine for single-input
> > operator such aggregation, however it leads to incorrect plan for join
> > query, i.e LHS shuffle with (a, b),  RHS shuffle with (a) .  At that
> > time, we removed partial match, and use exact match only. Yet this
> > changes leads to unnecessary additional exchange.  To mitigate this
> > problem, in join physical operator, for a join key (a, b, c),  we
> > enumerate different distribution requests, yet this lead to more space
> > to explore and significantly increase planning time (which is probably
> > what Haisheng also experienced).  When I look back, I feel probably
> > what we miss is the "coordination" step in the join operator, because
> > if we relax the requirement of satisfy(), for multi-input operators,
> > we have to enforce some "coordination", to make sure multiple input's
> > trait could work together properly.
> >
> >
> >
> > On Fri, Oct 18, 2019 at 4:38 PM Jinfeng Ni  wrote:
> >>
> >> This is an interesting topic. Thanks for bringing up this issue.
> >>
> >> My understanding of Volcano planner is it works in a top-down search
> >> mode (the parent asks for certain trait of its child), while the trait
> >> propagates in a bottom-up way, as Stamatis explained.
> >>
> >> IMHO, the issue comes

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2019-10-31 Thread Jinfeng Ni
I think "pull-up traits" is necessary, since the physical traits are
propagated  upwards. However, I'm not fully convinced "On Demand
Trait" is the best solution, as I feel it may restrict the choices the
planner may consider.  Maybe after the proposed design doc is ready,
we may look into the detail and reevaluate.

One question that I have always had, ever since I started using Calcite a
couple of years ago, is about the concept of Convention being a trait, and
the introduction of AbstractConverter in Calcite's VolcanoPlanner.
Quickly looking through the original Volcano paper [1], or the newer one
[2], I did not see any mention of such concepts. The original paper has a
classification of "transformation rules", which operate on logical
relational expressions, and "implementation rules", which provide the
mapping to physical operators.


1. https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
2. 
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/volcano-prasan.pdf

On Thu, Oct 31, 2019 at 2:40 PM Xiening Dai  wrote:
>
> Actually we solved this problem in our setup using a mechanism called 
> “Pull-Up Traits”, which explores the possible trait set of children’s input 
> to decide parent’s physical properties. In order to determine child input 
> trait, you would have to look at child’s children, and all the way to the 
> leaves nodes or a barrier. A barrier is a rel node which cannot derive any 
> traits regardless the input. A good example would be a user define function 
> which would throw off any distribution or collation. Then we realize just 
> pulling up is not enough, sometimes we would need to look at parent’s 
> requirement as well. So we try to solve this in a unified framework, which we 
> call “On Demand Trait” and implement it as part of the framework so anyone 
> can be benefited. I hope Haisheng can share a design doc once we have more 
> concrete ideas.
>
>
> > On Oct 31, 2019, at 11:37 AM, Jinfeng Ni  wrote:
> >
> > Hi Vladimir,
> >
> > The SubsetTransformer interface and the iterating over the RelNodes
> > within a RelSubset in Drill  is exactly implemented to do the trait
> > propagation. We also had to rely on AbstractConverter to fire
> > necessary rule to avoid the CanNotPlan issue. At some point, Calcite
> > community chooses to remove AbstractConverter and Drill had to add it
> > back, which is probably one of the main reasons for us to continue
> > using a Calcite fork.  I still remember we constantly had to deal with
> > the dilemma between "CanNotPlan" and long planing time due to explored
> > search space.
> >
> > Glad to see more people are joining the effort to solve this long
> > overdue issue, something missing in Calcite's core optimizer framework
> > "since before Calcite was Calcite" (Jacques's words).
> >
> > Jinfeng
> >
> >
> > On Thu, Oct 31, 2019 at 3:38 AM Vladimir Ozerov  wrote:
> >>
> >> Hi Danny,
> >>
> >> Thank you very much for the links. What is described here is pretty much
> >> similar to the problem I describe. Especially the discussion about trait
> >> propagation, as this is basically what I need - to explore potential traits
> >> of children before optimizing parents. And this is basically what Drill
> >> already does with it's SubsetTransformer:
> >> 1) There is a SubsetTransformer interface, which iterates over physical
> >> relations of the given subset [1]
> >> 2) If you want to make a physical project, you iterate over physical
> >> relations of the input subset and create possible physical projects [2]
> >> 3) But if you cannot find any physical input, then you trigger creation of
> >> a "bad" physical project, which is very likely to have poor cost because it
> >> cannot take advantage of input's distribution information [3]
> >> So, step 2 - is a trait set propagation which is needed by many
> >> distributed engines. Step 3 - an attempt to workaround current
> >> VolcanoPlanner behavior, when a parent rule is fired only if produced child
> >> node has compatible trait set.
> >>
> >> I do not know Calcite's architecture that good but at first glance, the
> >> proposed ability to re-fire rules of a specific Rel seems good enough. It
> >> doesn't expand search space, because no new nodes are created, and it seems
> >> to be relatively simple to implement.
> >>
> >> [1]
> >> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SubsetTransformer.java
> >> [2]
> >> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L66
> >> [3]
> >> https://github.com/apache/drill/blob/1.16.0/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L69
> >>
> >> Thu, Oct 31, 2019 at 12:21, Danny Chan:
> >>
> >>> Thanks Vladimir for bringing up this discussion !
> >>>
> >>> Did you notice that there is a JIRA issue about this problem ? [1] Also a
> >>> discussion about how to propagate th