Re: Query optimization time with Calcite

2023-08-23 Thread Roman Kondakov

Hi Abhijit,

there are actually a lot of places where time is spent. I think the most
time-consuming ones are JVM lazy class loading and code generation for
the metadata framework.


Anyway, in the long run the planning time will be very small.

Thanks.

--

Roman

On 24.08.2023 00:51, Abhijit Subramanya wrote:

Thanks a lot for your help.
Would you happen to have any insights into where exactly the time is spent
during the warm up?

On Tue, Aug 22, 2023 at 11:58 PM Roman Kondakov 
wrote:


Hi Abhijit,

Usually the very first query takes a while to optimize in Calcite. But
after some warm-up the subsequent queries will be planned faster and faster.

If you use Calcite as part of a server application, then the problem
of long query planning will disappear after a few query runs.

Thanks.

--

Roman

On 23.08.2023 01:33, Abhijit Subramanya wrote:

Hi,

I am currently working with Apache Calcite and using the
VolcanoPlanner to optimize queries. I've noticed that the planner.setRoot()
function can take approximately 100ms to complete. After some profiling, it
appears that the JaninoRelMetadataProvider.compile function might be the
bottleneck.


  Is this level of optimization time expected for a simple query like:


SELECT * FROM table1

LEFT JOIN table2 ON table1.a = table2.a

WHERE d >= '2023-08-01'

LIMIT 1000;


I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and
10 CPU cores. I am using the latest version of Calcite from the GitHub repo.


Are there specific settings or configurations I should be aware of for
optimizing query compilation times in Calcite?


Thanks



Re: Query optimization time with Calcite

2023-08-22 Thread Roman Kondakov

Hi Abhijit,

Usually the very first query takes a while to optimize in Calcite. But
after some warm-up the subsequent queries will be planned faster and faster.


If you use Calcite as part of a server application, then the problem
of long query planning will disappear after a few query runs.


Thanks.

--

Roman

On 23.08.2023 01:33, Abhijit Subramanya wrote:

Hi,

   I am currently working with Apache Calcite and using the
VolcanoPlanner to optimize queries. I've noticed that the planner.setRoot()
function can take approximately 100ms to complete. After some profiling, it
appears that the JaninoRelMetadataProvider.compile function might be the
bottleneck.


 Is this level of optimization time expected for a simple query like:


SELECT * FROM table1

LEFT JOIN table2 ON table1.a = table2.a

WHERE d >= '2023-08-01'

LIMIT 1000;


I'm running this on my MacBook, which has macOS Ventura, 16GB of RAM, and
10 CPU cores. I am using the latest version of Calcite from the GitHub repo.


Are there specific settings or configurations I should be aware of for
optimizing query compilation times in Calcite?


Thanks



Re: Key-Based Scans and Joins

2023-07-28 Thread Roman Kondakov

Hi Minseok,

I'm not sure I fully understand your requirements, but I think that
Correlate/EnumerableCorrelate may be useful in your case. It is similar
to a nested loop join, but it adds an additional filter on the right side
with the correlation variable ($cor0):


  EnumerableCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{7}])
    EnumerableTableScan(table=[[CATALOG, SALES, EMP]])
    EnumerableFilter(condition=[=($cor0.DEPTNO, $0)])
      EnumerableTableScan(table=[[CATALOG, SALES, DEPT]])

Thanks,

Roman

On 18.07.2023 01:10, Kim, Min-Seok wrote:

Hello All,

I'm creating a query system using Apache Calcite. My storage method can
only look up and scan by key, and doesn't support full scans without a key.
I've made the storage table using FilterableTable, which requires a
specific key in the filter.

I can do single queries like

*SELECT * FROM emps as e WHERE e.empno = 100*

where '*emps*' is stored with the key 'empno'. But, joins are more
difficult. For example, in

*SELECT * FROM emps as e JOIN depts as d ON e.deptno = d.deptno WHERE
e.empno = 100*

I can use 'e.empno = 100' to scan '*emps*'. Ideally, I'd use the result of
that scan when scanning '*depts*'.

But, the current plan is:

EnumerableHashJoin
  ...
  EnumerableInterpreter
    BindableTableScan(table=[[SALES, EMPS]], filters=[[=($1, 100)]]) <- *FilterableTable* is used; it is feasible.
  EnumerableCalc
    EnumerableTableScan(table=[[SALES, EMPS]]) <- FilterableTable is used, but the filters in the scan are empty, so it is infeasible.

Instead, I want to do:

*EnumerableCustomHashJoin* <- custom HashJoin
  EnumerableCalc
    *EnumerableCustomTableScan*(table=[[SALES, EMPS]]) <- custom Scan
  EnumerableInterpreter
    BindableTableScan(table=[[SALES, EMPS]], filters=[[=($1, 100)]])

I need advice on how to make this happen. Any advice or insights you could
provide would be greatly appreciated. Looking forward to hearing from you.

Thanks
Minseok



Re: Optimal way to organize Joins in Calcite

2023-07-06 Thread Roman Kondakov

Hi Jonathan,

if you are using a custom RelOptCost with a custom cost model, you can apply
your cost model to the logical nodes as well.


1. You need to initialize the planner with your implementation of the
RelOptCostFactory (a small sketch follows below).


2. You can also override the default cost formula for any node (logical 
or physical) using this metadata handler [1]


[1] 
https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/rel/metadata/RelMdPercentageOriginalRows.java#L186
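
For illustration, a minimal sketch of point 1; MyCostFactory is a hypothetical
name for your own RelOptCostFactory implementation:

import org.apache.calcite.plan.Contexts;
import org.apache.calcite.plan.RelOptCostFactory;
import org.apache.calcite.plan.volcano.VolcanoPlanner;

// The factory produces your RelOptCost objects; the planner will use it
// for every cost comparison, including join-order decisions.
RelOptCostFactory costFactory = new MyCostFactory();
VolcanoPlanner planner = new VolcanoPlanner(costFactory, Contexts.empty());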


Thanks.

Roman.

On 07.07.2023 02:46, Jonathan Sternberg wrote:

Thanks.

For MultiJoin, I'm trying to get it to work with a custom cost model
and custom output convention. We have our costs for operations as part of
the implementation of the physical nodes. Since MultiJoin uses the costs to
determine the join ordering, I'm a bit concerned that it is using the costs
from the logical plans rather than our custom ones. Is it possible to
utilize the physical nodes with MultiJoin or does it have to be utilized
with logical nodes only?

--Jonathan Sternberg

On Tue, Jul 4, 2023 at 1:43 AM Roman Kondakov 
wrote:


Hi Jonathan,

1. As Julian mentioned, it's better to use heuristic join ordering for a
large number of joins.

2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always
produce a tree of joins, not a MultiJoin.

3. Yes, your understanding is correct. You can check the default join
order program [1]

[1]

https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186

Thanks,

Roman.

On 03.07.2023 22:48, Julian Hyde wrote:

The reason that there are two strategies is because of large joins. If
your query joins 10 tables, the number of possible join orders is large
(bounded by 10 factorial I believe) and therefore would overwhelm the
Volcano planner, which must construct each possibility.

Therefore we have a heuristic algorithm that you should use for large
joins. We gather the entire FROM clause into a data structure called
MultiJoin, and a single rule call applies heuristics and spits out a join
order that is probably close to optimal.

When you are optimizing a query, you need to know whether you are in
danger of being swallowed by the monster that is the complexity of large
joins. If your query only joins 2 or 3 tables (and in some other situations
too) you are not in danger and can safely exhaustively enumerate plans.

On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg  wrote:

Hi,

I'm presently working on optimizing the ordering of joins for queries and
had a few questions about the optimal way to do that with Calcite.

I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
and the rules related to MultiJoins. We're utilizing the volcano planner
for optimization at the present moment but also have the freedom to
customize the order and phases for the planner phases.

1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
rules? Or are JoinAssociate and JoinCommute still recommended as the
standard way to handle reordering of joins?
2. Our system only supports performing the join over two inputs and we
can't support MultiJoin as a physical operation. My understanding is that
the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
join but will still produce a MultiJoin. What's the appropriate way to
convert a MultiJoin back to a set of joins?
3. My understanding is that MultiJoin rules aren't compatible with the
volcano planner and should be run as part of a stage using the heuristic
planner. Is this understanding correct?

Thank you for any help.

--Jonathan Sternberg


Re: Optimal way to organize Joins in Calcite

2023-07-03 Thread Roman Kondakov

Hi Jonathan,

1. As Julian mentioned, it's better to use heuristic join ordering for
a large number of joins.


2. LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule AFAIK always
produce a tree of joins, not a MultiJoin.


3. Yes, your understanding is correct. You can check the default join 
order program [1]


[1] 
https://github.com/apache/calcite/blob/2dba40e7a0a5651eac5a30d9e0a72f178bd9bff2/core/src/main/java/org/apache/calcite/tools/Programs.java#L186


Thanks,

Roman.

On 03.07.2023 22:48, Julian Hyde wrote:

The reason that there are two strategies is because of large joins. If your 
query joins 10 tables, the number of possible join orders is large (bounded by 
10 factorial I believe) and therefore would overwhelm the Volcano planner, 
which must construct each possibility.

Therefore we have a heuristic algorithm that you should use for large joins. We 
gather the entire FROM clause into a data structure called MultiJoin, and a 
single rule call applies heuristics and spits out a join order that is probably 
close to optimal.

When you are optimizing a query, you need to know whether you are in danger of 
being swallowed by the monster that is the complexity of large joins. If your 
query only joins 2 or 3 tables (and in some other situations too) you are not 
in danger and can safely exhaustively enumerate plans.


On Jul 3, 2023, at 7:58 AM, Jonathan Sternberg  wrote:

Hi,

I'm presently working on optimizing the ordering of joins for queries and
had a few questions about the optimal way to do that with Calcite.

I watched this meetup video (https://www.youtube.com/watch?v=5wQojihyJDs)
and spent some time experimenting with JoinAssociateRule, JoinCommuteRule,
and the rules related to MultiJoins. We're utilizing the volcano planner
for optimization at the present moment but also have the freedom to
customize the order and phases for the planner phases.

1. Is MultiJoin generally suggested over JoinAssociate and JoinCommute
rules? Or are JoinAssociate and JoinCommute still recommended as the
standard way to handle reordering of joins?
2. Our system only supports performing the join over two inputs and we
can't support MultiJoin as a physical operation. My understanding is that
the LoptOptimizeJoinRule and MultiJoinOptimizeBushyRule will rearrange the
join but will still produce a MultiJoin. What's the appropriate way to
convert a MultiJoin back to a set of joins?
3. My understanding is that MultiJoin rules aren't compatible with the
volcano planner and should be run as part of a stage using the heuristic
planner. Is this understanding correct?

Thank you for any help.

--Jonathan Sternberg


Re: Share intermediate data between nodes during plan execution

2023-05-23 Thread Roman Kondakov

Hi Vladislav,

Can you share more details regarding your use case?

If you want to share the same object between join inputs, you can use
RelShuttleImpl [1] to traverse the RelNode tree: every time you visit a
Join relation you push the object onto a stack and pop it when you leave
the Join. In this case the shared object will be available on top of the
stack for both sides of the Join.


[1] 
https://github.com/apache/calcite/blob/b0420947ddeb2c9e18afe896dbf7adaf49a715a2/core/src/main/java/org/apache/calcite/rel/RelShuttleImpl.java#L49
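
A minimal sketch of this stack-based approach (the class name
SharedObjectShuttle and the Map payload are just illustrative assumptions):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.RelShuttleImpl;
import org.apache.calcite.rel.logical.LogicalJoin;

class SharedObjectShuttle extends RelShuttleImpl {
  private final Deque<Map<String, Object>> stack = new ArrayDeque<>();

  @Override public RelNode visit(LogicalJoin join) {
    stack.push(new HashMap<>());   // shared object for both join inputs
    try {
      return super.visit(join);    // visits the left and right children
    } finally {
      stack.pop();                 // remove it when leaving the Join
    }
  }

  @Override public RelNode visit(RelNode other) {
    // Any node below a Join can read the object of the nearest enclosing Join.
    Map<String, Object> shared = stack.peek();
    // ... use `shared` here ...
    return super.visit(other);
  }
}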


Thanks,

Roman

On 24.05.2023 05:44, Vladislav Matyushevski wrote:

Greetings Calcite team!

I`m using the Calcite + Avatica and curious to know:
Is there a way how I can share an Object (e.g. Map) between left and right
parts of join?
Sharing is needed due to the fact that the left part is producing data that
must be re-used in the right part.
I`m thinking of the following solution:
There is a getProvider() method, which returns me CalciteJDBC41Connection.
And if I have my own implementation of this class then I`ll be able to
extend it and enrich the class with methods I need. However, it
doesn't feel like a good idea. I believe there might be a proper solution
for this.


Thanks in advance,
Vladislav



[jira] [Created] (CALCITE-5670) Assertion error in SemiJoinJoinTransposeRule

2023-04-23 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-5670:
---

 Summary: Assertion error in SemiJoinJoinTransposeRule
 Key: CALCITE-5670
 URL: https://issues.apache.org/jira/browse/CALCITE-5670
 Project: Calcite
  Issue Type: Bug
Affects Versions: 1.34.0
Reporter: Roman Kondakov
Assignee: Roman Kondakov
 Fix For: 1.35.0


The following test will raise the assertion error:

{code:java}
  @Test void testPushSemiJoinPastJoinRuleNotHappensJoinKeysDifferentOrigin() {
    // tests the case where the semijoin is pushed to the right
    final String sql = "select e.ename from emp e, dept d, bonus b\n"
        + "where e.deptno = d.deptno and e.ename = b.ename and d.name = b.job";
    sql(sql)
        .withRule(CoreRules.FILTER_INTO_JOIN,
            CoreRules.JOIN_ADD_REDUNDANT_SEMI_JOIN,
            CoreRules.SEMI_JOIN_JOIN_TRANSPOSE)
        .check();
  }
{code}


{noformat}
java.lang.AssertionError
at 
org.apache.calcite.rel.rules.SemiJoinJoinTransposeRule.onMatch(SemiJoinJoinTransposeRule.java:112)
at 
org.apache.calcite.plan.AbstractRelOptPlanner.fireRule(AbstractRelOptPlanner.java:337)
at org.apache.calcite.plan.hep.HepPlanner.applyRule(HepPlanner.java:556)
at 
org.apache.calcite.plan.hep.HepPlanner.applyRules(HepPlanner.java:420)
at 
org.apache.calcite.plan.hep.HepPlanner.executeRuleInstance(HepPlanner.java:243)
at 
org.apache.calcite.plan.hep.HepInstruction$RuleInstance$State.execute(HepInstruction.java:178)
at 
org.apache.calcite.plan.hep.HepPlanner.lambda$executeProgram$0(HepPlanner.java:211)
at 
com.google.common.collect.ImmutableList.forEach(ImmutableList.java:422)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:210)
at 
org.apache.calcite.plan.hep.HepProgram$State.execute(HepProgram.java:118)
at 
org.apache.calcite.plan.hep.HepPlanner.executeProgram(HepPlanner.java:205)
at 
org.apache.calcite.plan.hep.HepPlanner.findBestExp(HepPlanner.java:191)
at 
org.apache.calcite.test.RelOptFixture.checkPlanning(RelOptFixture.java:379)
at org.apache.calcite.test.RelOptFixture.check(RelOptFixture.java:330)
at org.apache.calcite.test.RelOptFixture.check(RelOptFixture.java:314)
at 
org.apache.calcite.test.RelOptRulesTest.testPushSemiJoinPastJoinRuleNotHappensJoinKeysDifferentOrigin(RelOptRulesTest.java:2650)
{noformat}

The problem here is that the case when the top-most Semi-Join has join keys from
both tables of the bottom Join is considered impossible, though it happens in
practice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Question Regarding Volcano Planner

2022-08-28 Thread Roman Kondakov

Hello Pranav,

you can also split your optimization process into several steps. For
example, you can add a heuristic phase (using the HEP planner) with filter
pushdown before Volcano planning. It is almost always a beneficial
operation, so you can avoid polluting the search space with plans that are
known in advance to be suboptimal.


You can find examples of optimization phases here [1]

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/PlannerPhase.java
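
For instance, a heuristic filter-pushdown pre-pass could look roughly like
this (a sketch; the chosen CoreRules are just an example set):

import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;

RelNode pushDownFilters(RelNode logicalPlan) {
  HepProgram program = new HepProgramBuilder()
      .addRuleInstance(CoreRules.FILTER_INTO_JOIN)
      .addRuleInstance(CoreRules.FILTER_PROJECT_TRANSPOSE)
      .addRuleInstance(CoreRules.FILTER_MERGE)
      .build();
  HepPlanner hepPlanner = new HepPlanner(program);
  hepPlanner.setRoot(logicalPlan);
  return hepPlanner.findBestExp();   // hand this tree to the Volcano phase
}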


Thanks,

Roman

On 29.08.2022 03:54, Haisheng Yuan wrote:

You need to first investigate whether the transformation rules or operators are
correctly implemented or not. Typically there is a bug if you see that the planner
fails to terminate.

There is a cancelFlag variable and a checkCancel() method in VolcanoPlanner.
You can pass your own CancelFlag implementation to the Context when constructing the
planner [1] so that you can interrupt the planning when it times out or the
cost is greater or lower than some threshold.

[1] 
https://github.com/apache/calcite/blob/e2f949d5d6cff79cbe8565bc3e85f437dee9fd7e/core/src/main/java/org/apache/calcite/plan/AbstractRelOptPlanner.java#L102
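
For illustration, a minimal sketch of wiring a CancelFlag into the planner
through its Context (the watchdog thread is a hypothetical example, not
Calcite API):

import java.util.concurrent.atomic.AtomicBoolean;
import org.apache.calcite.plan.Context;
import org.apache.calcite.plan.Contexts;
import org.apache.calcite.plan.volcano.VolcanoPlanner;
import org.apache.calcite.util.CancelFlag;

AtomicBoolean cancel = new AtomicBoolean(false);
Context context = Contexts.of(new CancelFlag(cancel));
VolcanoPlanner planner = new VolcanoPlanner(null, context);

// Hypothetical watchdog: request cancellation after a deadline; the planner
// periodically calls checkCancel() and aborts planning with an exception.
new Thread(() -> {
  try {
    Thread.sleep(60_000);
    cancel.set(true);
  } catch (InterruptedException ignored) {
    // no-op
  }
}).start();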

On 2022/08/26 20:29:15 Pranav Deshpande wrote:

Dear Apache Calcite Dev Team,
Sometimes my volcano planner fails to terminate because there are many
plans to search for (millions of them).

Is there a way to prune the search if a specific amount of time has elapsed
or if the same net cost is obtained for the Xth time, etc.?

Requesting the community's guidance on this.

Thanks & Regards,
Pranav



[jira] [Created] (CALCITE-5170) Assertion error on range distribution creation

2022-05-29 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-5170:
---

 Summary: Assertion error on range distribution creation
 Key: CALCITE-5170
 URL: https://issues.apache.org/jira/browse/CALCITE-5170
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.30.0
Reporter: Roman Kondakov
Assignee: Roman Kondakov
 Fix For: 1.31.0


An assertion error occurs on range distribution creation. This test fails:
{code:java}
  @Test void testRangeRelDistributionKeys() {
    RelDistributions.range(Arrays.asList(0, 1));
  }
{code}
This happens because there is an incorrect assertion in {{RelDistributionImpl}} 
constructor:

{code:java}
  assert type == Type.HASH_DISTRIBUTED
  || type == Type.RANDOM_DISTRIBUTED
  || keys.isEmpty();
{code}
It should be  {{RANGE_DISTRIBUTED}} instead of {{RANDOM_DISTRIBUTED}}





--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (CALCITE-5166) Method accept(RelShuttle) is not overridden in LogicalCalc and LogicalTableModify

2022-05-22 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-5166:
---

 Summary: Method accept(RelShuttle) is not overridden in 
LogicalCalc and LogicalTableModify
 Key: CALCITE-5166
 URL: https://issues.apache.org/jira/browse/CALCITE-5166
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.30.0
Reporter: Roman Kondakov
Assignee: Roman Kondakov
 Fix For: 1.31.0


Method {{RelNode#accept(RelShuttle)}} is not overridden for {{LogicalCalc}} and
{{LogicalTableModify}}. This leads to a bug where logic implemented in
{{RelShuttle#visit(LogicalCalc)}} is never applied, because the {{visit()}} method
is never called due to incorrect dispatching.

This test will fail unless the {{accept(RelShuttle)}} method is overridden in Calc:
{code:java}
  @Test void testRelShuttleForLogicalCalc() {
    final String sql = "select ename from emp";
    final RelNode rel = sql(sql).toRel();
    final HepProgramBuilder programBuilder = HepProgram.builder();
    programBuilder.addRuleInstance(CoreRules.PROJECT_TO_CALC);
    final HepPlanner planner = new HepPlanner(programBuilder.build());
    planner.setRoot(rel);
    final RelNode calc = planner.findBestExp();
    final List<RelNode> rels = new ArrayList<>();
    final RelShuttleImpl visitor = new RelShuttleImpl() {
      @Override public RelNode visit(LogicalCalc calc) {
        RelNode visitedRel = super.visit(calc);
        rels.add(visitedRel);
        return visitedRel;
      }
    };
    visitor.visit(calc);
    assertThat(rels.size(), is(1));
    assertThat(rels.get(0), isA(LogicalCalc.class));
  }
{code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: Allow Cascades driver invoking "derive" on the nodes produced by "passThrough"

2022-02-13 Thread Roman Kondakov

Hi Alessandro,

this problem was already discussed on the dev list [1] and we have a ticket
for it [2].


My concern is that many projects use Calcite as a Lego kit: they take
internal components of Calcite and combine them to build a custom
planning and execution pipeline. And sometimes downstream projects need
to change the default behavior of internal components to fit their
requirements or to overcome a bug. So the idea of keeping even internal
components of Calcite "more public" is a good thing rather than a bad
one, from my point of view.


Thank you.

[1] https://lists.apache.org/thread/cykl74dcphgow4790fwoc8frsjglz7n1

[2] https://issues.apache.org/jira/browse/CALCITE-4542

--
Roman Kondakov


On 11.02.2022 19:15, Alessandro Solimando wrote:

Hello everyone,
@Vladimir, +1 on the change introducing "enforceDerive()".

@Roman, could you walk us through the limitations you found that forced you
to copy-paste the whole class?

Maybe there is some middle ground for your problem(s) too, similar in
spirit to what Vladimir proposed for the other limitation.

I am not against making the class more public if necessary, but it would be
nice to have a discussion here before going down that path.
If the discussion leads to a better design of the original class, all
projects would benefit from that.

Best regards,
Alessandro

On Fri, 11 Feb 2022 at 04:14, Roman Kondakov 
wrote:


Hi Vladimir,

+1 for making the rule driver more public. We've faced similar problems
in the downstream project. The solution was to copy and paste the
TopDownRuleDriver code with small fixes since it was not possible to
override the default behavior.

--
Roman Kondakov


On 11.02.2022 02:50, Vladimir Ozerov wrote:

Hi,

In the Cascades driver, it is possible to propagate the requests top-down
using the "passThrough" method and then notify parents bottom-up about the
concrete physical implementations of inputs using the "derive" method.

In some optimizers, the valid parent node cannot be created before the
trait sets of inputs are known. An example is a custom distribution trait
that includes the number of shards in the system. The parent operator alone
may guess the distribution keys, but cannot know the number of input
shards. To mitigate this, you may create a "template" node with an infinite
cost from within the optimization rule that will propagate the
passThrough/derive calls but would never participate in the final plan.

Currently, the top-down driver is designed in a way that the nodes created from
the "passThrough" method are not notified on the "derive" stage. This leads
to the incomplete exploration of the search space. For example, the rule
may produce the node "A1.template" that will be converted into a normal
"A1" node in the derive phase. However, if the parent operator produced
"A2.template" from "A1.template" using pass-through mechanics, the
"A2.template" will never be notified about the concrete input traits,
possibly losing the optimal plan. This is especially painful in distributed
engines, where the number of shards is important for the placement of
Shuffle operators.

It seems that the problem could be solved with relatively low effort. The
"derive" is not invoked on the nodes created from the "passThrough" method,
because such nodes are placed in the "passThroughCache" collection. Instead
of doing this unconditionally, we may introduce an additional predicate
that would selectively enforce "derive" on such nodes. For example, this
could be a default method in the PhysicalNode interface, like:

interface PhysicalNode {
  default boolean enforceDerive() { return false; }
}

If there are no objections, I'll proceed with this change.

Alternatively, we may make the TopDownRuleDriver more "public", so that the
user can extend it and decide within the driver whether to cache a
particular node or not.

I would appreciate your feedback on the matter.

Regards,
Vladimir.



Re: Allow Cascades driver invoking "derive" on the nodes produced by "passThrough"

2022-02-10 Thread Roman Kondakov

Hi Vladimir,

+1 for making the rule driver more public. We've faced similar problems
in the downstream project. The solution was to copy and paste the
TopDownRuleDriver code with small fixes since it was not possible to
override the default behavior.


--
Roman Kondakov


On 11.02.2022 02:50, Vladimir Ozerov wrote:

Hi,

In the Cascades driver, it is possible to propagate the requests top-down
using the "passThrough", method and then notify parents bottom-up about the
concrete physical implementations of inputs using the "derive" method.

In some optimizers, the valid parent node cannot be created before the
trait sets of inputs are known. An example is a custom distribution trait
that includes the number of shards in the system. The parent operator alone
may guess the distribution keys, but cannot know the number of input
shards. To mitigate this, you may create a "template" node with an infinite
cost from within the optimization rule that will propagate the
passThrough/derive calls but would never participate in the final plan.

Currently, the top-down driver is designed in a way that the nodes created from
the "passThrough" method are not notified on the "derive" stage. This leads
to the incomplete exploration of the search space. For example, the rule
may produce the node "A1.template" that will be converted into a normal
"A1" node in the derive phase. However, if the parent operator produced
"A2.template" from "A1.template" using pass-through mechanics, the
"A2.template" will never be notified about the concrete input traits,
possibly losing the optimal plan. This is especially painful in distributed
engines, where the number of shards is important for the placement of
Shuffle operators.

It seems that the problem could be solved with relatively low effort. The
"derive" is not invoked on the nodes created from the "passThrough" method,
because such nodes are placed in the "passThroughCache" collection. Instead
of doing this unconditionally, we may introduce an additional predicate
that would selectively enforce "derive" on such nodes. For example, this
could be a default method in the PhysicalNode interface, like:

interface PhysicalNode {
   default boolean enforceDerive() { return false; }
}

If there are no objections, I'll proceed with this change.

Alternatively, we may make the TopDownRuleDriver more "public", so that the
user can extend it and decide within the driver whether to cache a
particular node or not.

I would appreciate your feedback on the matter.

Regards,
Vladimir.



Re: List of tables & Columns

2022-02-06 Thread Roman Kondakov

Hi, Yogendra

there is a utility method RelOptUtil#findAllTables that looks for all
tables in the rel tree. It doesn't collect column names, but you can
adapt it for this purpose.
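
A small sketch of that adaptation (assuming you already have the query
converted to a RelNode tree named `rel`):

import java.util.List;
import org.apache.calcite.plan.RelOptTable;
import org.apache.calcite.plan.RelOptUtil;

List<RelOptTable> tables = RelOptUtil.findAllTables(rel);
for (RelOptTable table : tables) {
  // Fully qualified name, e.g. [CATALOG, SALES, EMP].
  System.out.println(table.getQualifiedName());
  // Column names come from the table's row type.
  System.out.println(table.getRowType().getFieldNames());
}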


It seems to me that the solution with a visitor should also work fine.

--
Roman Kondakov


On 07.02.2022 12:45, Zhe Hu wrote:

Hi, Sharma.
As far as I can tell, Calcite doesn't provide a simple way to get table names
or column names for a query (or maybe I missed something).
But you can take a look at SqlSelect, SqlOrderBy and SqlWith if you want to do
such things. Based on my own experience, a SqlVisitor seems to be the only way to get
table/column names or aliases by visiting the SqlNode; at least that's how I
extract tables/columns, distinguish SQL types or rewrite SQL in practice.
Hope it's helpful for you.


Best,
ZheHu




On 02/5/2022 18:39, Yogendra Sharma wrote:
Hi Community,

I had to gather the list of all fully qualified table names and their column
names used (referenced) in a SELECT query.

I simply parsed the query and wrote a visitor to visit the SqlNode recursively.

While it's working (maybe with a few bugs here and there), I am wondering if
Calcite would already be doing it somewhere or if there is an easier way to
achieve this?


Thanks
Y





Re: Issues with some TPC-DS Queries

2021-07-20 Thread Roman Kondakov

Hi Rana,

there should also be some debugging information included in the
exception: a long list of sets, subsets and relational nodes. Could you
please send it?


Roman Kondakov

On 21.07.2021 09:27, Rana Alotaibi wrote:

Hi Calcite Dev, Thanks for Calcite!

I'm currently using Calcite Core and Plus 1.27.0.  I got the following 
exception when I tried to get the logical plans of 18 out of 99 TPC-DS 
queries:


"There are not enough rules to produce a node with desired properties:"

I attached a sample code that I used when I tried to get the logical 
plan of TPC -DS query 6. The exact exception for that query is:


"Exception in thread "main" 
org.apache.calcite.plan.RelOptPlanner$CannotPlanException: There are not 
enough rules to produce a node with desired properties: 
convention=ENUMERABLE, sort=[0, 1].
Missing conversions are LogicalAggregate[convention: NONE -> ENUMERABLE, 
sort: [] -> [0]], LogicalAggregate[convention: NONE -> ENUMERABLE], 
LogicalProject[convention: NONE -> ENUMERABLE, sort: [] -> [94]]"


I was wondering if there is some planner setup or configuration that I
missed.


Many thanks,
Rana






Re: [ANNOUNCE] New committer: Vladimir Ozerov

2021-06-23 Thread Roman Kondakov

Congratulations, Vladimir!

Roman Kondakov

On 24.06.2021 12:23, 段雄 wrote:

Congratulations!

XING JIN  于2021年6月24日周四 上午10:21写道:


Congratulations ~

Best,
Jin

guangyuan wang  于2021年6月24日周四 上午9:50写道:


Congratulations!

Francis Chuang  于2021年6月24日周四 上午6:39写道:


Congrats, Vladimir!

Francis

On 24/06/2021 7:48 am, Haisheng Yuan wrote:

Congratulations and thanks for your contributions, Vladimir!

Regards,
Haisheng

On 2021/06/23 21:34:40, Stamatis Zampetakis wrote:

Apache Calcite's Project Management Committee (PMC) has invited Vladimir
Ozerov to become a committer, and we are pleased to announce that he has
accepted.

Vladimir is among the few people who know very well the internal workings
of the Calcite optimizer. He started and participated in many discussions
about the core engine and contributed ideas and code for making it better.
Moreover, Vladimir has blogged and talked about Calcite in various
conferences and meetups giving publicity and showcasing the capabilities of
the project.

Vladimir, welcome, thank you for your contributions, and we look forward to
your further interactions with the community! If you wish, please feel free
to tell us more about yourself and what you are working on.

Stamatis (on behalf of the Apache Calcite PMC)











Re: Using Calcite as a Distributed Optimizer

2021-05-13 Thread Roman Kondakov

Hi Atri,

you can refer to this list [1]. I am also aware that Apache Ignite and
Hazelcast have made some efforts to employ Calcite as a distributed query
planner.



[1] https://calcite.apache.org/docs/powered_by.html

--
Roman Kondakov

On 14.05.2021 14:31, Atri Sharma wrote:

Is there an example I can refer to?

On Fri, 14 May 2021, 04:59 Haisheng Yuan,  wrote:


Yes, definitely. Many distributed big data systems use Apache Calcite to
optimize queries and generate distributed plans.

On 2021/05/13 23:16:10, Atri Sharma  wrote:

Thank you.

So my use case is such that I wish to use Calcite as a two phase optimizer
-- Get a SQL query, compile it and optimize it and convert it to a SQL
fragment.

Then run the query on worker nodes, get results on master and merge
results.

This question spans both Calcite and Avatica, but I wanted to understand if
achieving the above is possible with Calcite today.

Atri

On Fri, 14 May 2021, 01:20 Julian Hyde,  wrote:


Calcite has no user@ list.  So, ask away here!


On May 13, 2021, at 10:52 AM, Atri Sharma  wrote:

Sorry, didn't realize I had sent to the dev list. I will send to the user
list.

On Thu, 13 May 2021, 15:57 Atri Sharma,  wrote:

Hi All,

Are there examples of using Calcite to compile and optimize queries to
be run on a set of nodes, and then merge partial results back?

Atri

--
Regards,

Atri
l'apprenant












[jira] [Created] (CALCITE-4590) Incorrect query result with fixed-length string

2021-04-25 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-4590:
---

 Summary: Incorrect query result with fixed-length string
 Key: CALCITE-4590
 URL: https://issues.apache.org/jira/browse/CALCITE-4590
 Project: Calcite
  Issue Type: Bug
  Components: core
Affects Versions: 1.26.0
Reporter: Roman Kondakov


A query may return a wrong result when fixed-length strings (CHAR(N)) are used in
an OR/IN clause.


{code:java}
  @Test void test() {
    // Passed.
    CalciteAssert.that()
        .query("select * from (values (1, 'a'), (2, 'abc')) where EXPR$1 = 'a'")
        .returns("EXPR$0=1; EXPR$1=a  \n");

    // Failed. Only "EXPR$0=2; EXPR$1=abc\n" is returned
    CalciteAssert.that()
        .query("select * from (values (1, 'a'), (2, 'abc')) where EXPR$1 = 'a' or EXPR$1 = 'abc'")
        .returns("EXPR$0=1; EXPR$1=a  \n"
            + "EXPR$0=2; EXPR$1=abc\n");
  }
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Top down planner

2021-03-04 Thread Roman Kondakov

Hello,

Apache Ignite currently employs the top-down approach. Take a look at
this package [1].


The only documentation about the top-down approach I have seen so far can be
found in this javadoc [2].


You can also read several discussions on the dev list about this topic
[3], [4].


[1] 
https://github.com/apache/ignite/tree/sql-calcite/modules/calcite/src/main/java/org/apache/ignite/internal/processors/query/calcite
[2] 
https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/PhysicalNode.java
[3] 
https://lists.apache.org/thread.html/r38ea71968c069f465921e7197488329c15413b46831c90ad4d48f87e%40%3Cdev.calcite.apache.org%3E
[4] 
https://lists.apache.org/thread.html/d1ce6e5201ba1a91a28da968f3294d7c09b1649fbfd3fb131231fede%40%3Cdev.calcite.apache.org%3E


--
Roman Kondakov

On 04.03.2021 10:42, Priyendra Deshwal wrote:

Hello friends,

What is the best resource for learning how to use the top down planner flag
in Calcite? Are there any open source projects that use that capability
that I can study to learn more? I would like to see examples of projects
that use this capability to "distribute" a plan over multiple nodes using
traits like distribution / collation etc.

Regards!



Re: Apache Calicte learning curve

2020-08-12 Thread Roman Kondakov

Hi Abhishek,

IMO the first thing you need to do is to read the whitepaper [1] to 
understand the big picture.


Then you can read the Calcite presentation [2] where all main API's are 
very well described.


And the last step, which is my main source of knowledge about Calcite, 
is to explore the source code of systems that already use Calcite for 
query processing. I would recommend Apache Ignite [3], Apache Drill [4], 
Hazelcast [5] and many others that you can find here [6].




[1] https://arxiv.org/pdf/1802.10233.pdf
[2] 
https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite

[3] https://github.com/apache/ignite
[4] https://github.com/apache/drill
[5] https://github.com/hazelcast/hazelcast
[6] https://calcite.apache.org/docs/powered_by.html

--
Roman Kondakov

On 11.08.2020 20:09, Abhishek Dasgupta wrote:

Hi Team,
I am working as a Software Developer in the data migration sector. We use the
Apache Calcite framework. I am trying to learn how to use it and apparently
it has a steep learning curve. I went through the documentation but as a
beginner it is difficult to grasp. Can you provide me some useful links on
where to start learning Apache Calcite apart from the documentation? Can
you provide a chronological order of what to understand first, i.e. a
step-by-step list?

Thanks,
Abhishek Dasgupta



Re: [DISCUSS] Make RexNode serializable

2020-07-07 Thread Roman Kondakov

Hi Rui,

AFAIK, RelNodes can be serialized to and deserialized from JSON format.
See test [1] as an example. If I understand it correctly, RelNodes are
serialized along with the enclosed RexNodes, so you can transfer them over
the network as plain strings.


[1] 
https://github.com/apache/calcite/blob/f64cdcbb9f6535650f0227da19640e736496a9c3/core/src/test/java/org/apache/calcite/plan/RelWriterTest.java#L88
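
A hedged sketch of that JSON round trip, based on the classes used in the
test above (the method name and the null Schema argument are assumptions):

import java.io.IOException;
import org.apache.calcite.plan.RelOptCluster;
import org.apache.calcite.plan.RelOptSchema;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.externalize.RelJsonReader;
import org.apache.calcite.rel.externalize.RelJsonWriter;

static RelNode roundTrip(RelNode rel, RelOptCluster cluster, RelOptSchema schema)
    throws IOException {
  // Serialize: the plan, RexNodes included, becomes a plain JSON string.
  RelJsonWriter writer = new RelJsonWriter();
  rel.explain(writer);
  String json = writer.asString();

  // Deserialize on the other side of the wire.
  RelJsonReader reader = new RelJsonReader(cluster, schema, null);
  return reader.read(json);
}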


--
Roman Kondakov

On 07.07.2020 22:13, Enrico Olivelli wrote:

Rui

On Tue, 7 Jul 2020, 20:30 Rui Wang  wrote:


Hi Community,

In Apache Beam we are facing a use case where we need to keep RexNode in
our distributed primitives. Because of the nature of distributed computing,
Beam requires the usage of those primitives be serializable (thus those
primitives can be sent over the network to backend/workers for
further execution).

In the Java world this requirement means to make RexNode implement the Java
Serializable interface.

A workaround right now is to create a bunch of classes to "clone" RexNode
while making those classes implement the Serializable interface.



Did you evaluate using some framework like Kryo that allows you to
serialize non-serializable classes?

I think that in general Java serialisation is not efficient as it is too
general purpose.
It also brings in a few security issues.

Maybe an alternative idea is to add some ad-hoc serialisation mechanism to
RexNode.
We should also ensure that every RexNode will be able to be serialized and
deserialized.

Enrico



So what do you think of the idea that makes RexNode implement the
Serializable interface?


-Rui





Re: Doubts about VolcanoPlannerPhase

2020-07-07 Thread Roman Kondakov

Hi Aron,

there was a small discussion about this several months ago [1].

[1] https://github.com/apache/calcite/pull/1840#discussion_r387967624

--
Roman Kondakov

On 07.07.2020 09:52, JiaTao Tao wrote:

It seems only the OPTIMIZE phase is used; can we remove this mechanism?


Regards!

Aron Tao



Re: Using indexes rather than table scans with Calcite

2020-06-03 Thread Roman Kondakov
Hi Haisheng,

> If I didn't get it wrong, joining with scan_A means it has to read all the 
> tuples from table A, this will make index meaningless, because the purpose of 
> using index in that example is to avoid reading all the tuples from table A.

I meant 'rowId' as a physical address (ctid in PG or rowId in Oracle).
We can combine those rowIds in some way (by analogy with a bitmap index)
and then use the output rowIds to fetch only those rows which we actually
need. Logically it is equivalent to a join and can be implemented as a
nested loop join: we can iterate over the resulting rowIds and fetch
corresponding rows from the row store.

> Perhaps you have different definition of Join, but the Join operator in 
> Calcite doesn't have correlation variable, only Correlate operator has, which 
> is Calcite's version of Apply. You still need the rule to transform Join to 
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I 
> don't recommend using JoinToCorrelateRule, because it will transform Join to 
> Correlate unconditionally, in case of multi-way joins with join reordering 
> enabled, it will generate a lot of useless alternatives, unless you want to 
> support multi-level correlation, like:

We are going to use Correlate, I just skipped this detail in the
previous mail. You are right that it can inflate the search space. Maybe
we'll need to think of some pruning techniques later if it becomes a
problem. As I see it, this approach has some advantages because it
seems more flexible to me. In addition to the ability you mentioned to
use a multi-level correlated join, it provides some kind of freedom
because we are not restricted to the particular pattern of the
Join2IndexApply rule. In other words: we need to match the Join2IndexApply
rule on some concrete pattern like

Join
  RelNode
  TableScan

But what if there is a filter or project between join and scan? We can
add specific patterns for this rule

Join
  RelNode
  Project
TableScan

Join
  RelNode
  Filter
TableScan

but still it's not enough to cover all possible cases. What if there are
some other RelNodes, like an aggregate, there? We cannot foresee all possible
cases. With the correlation approach we just try to push down the filter
with the correlation variable as much as possible. If this filter is able
to reach the IndexScan, it becomes a very efficient index join
with low cost. Otherwise the optimizer throws this alternative away. It can
be helpful in some queries like

SELECT * FROM
deps d
JOIN
(SELECT depId, COUNT(*) FROM emps GROUP BY depId) AS e
ON d.depId = e.depId
WHERE d.name = "AAA"

if table deps is relatively small (after filtering) and emps is big, it
will be more efficient to do a correlated nested loop join:

CorrelatedJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
TableScan(deps)
  Agg(depId, COUNT(*))
   IndexScan(emps, lookup: e.depId = d.depId[correlatedVar])

because in this case we don't have to do a full aggregation over the
'emps' table before doing a join:

HashJoin(d.depId = e.depId)
  Filter(d.name = "AAA")
TableScan(deps)
  Agg(depId, COUNT(*)) <- full aggregation
   TableScan(emps)

So, the tradeoff between the rule-based approach and the MV approach is the
classic one: search space inflation vs more plan alternatives.

Thank you!

-- 
Kind Regards
Roman Kondakov


On 03.06.2020 05:42, Haisheng Yuan wrote:
>> using this approach, bitmap indexes may be represented as set operations
>> (union, intersect) over idx_a and idx_b followed by joining with scan_A
>> using 'rowId'.
> 
> If I didn't get it wrong, joining with scan_A means it has to read all the 
> tuples from table A, this will make index meaningless, because the purpose of 
> using index in that example is to avoid reading all the tuples from table A.
> 
>> and it looks like it can be implemented without special
>> Join2IndexApplyRule: we'll use correlation variable from join condition
>> as filter condition.
> 
> Perhaps you have different definition of Join, but the Join operator in 
> Calcite doesn't have correlation variable, only Correlate operator has, which 
> is Calcite's version of Apply. You still need the rule to transform Join to 
> Correlate, aka, Apply. In Calcite, the rule is JoinToCorrelateRule. However I 
> don't recommend using JoinToCorrelateRule, because it will transform Join to 
> Correlate unconditionally, in case of multi-way joins with join reordering 
> enabled, it will generate a lot of useless alternatives, unless you want to 
> support multi-level correlation, like:
> 
> NestedLoopOuterJoin
> -> Table Scan on A
> ->  HashOuterJoin
>  Join Condition: B.Y = C.W
> -> Table Scan on B
> -> Index Scan usi

Re: Using indexes rather than table scans with Calcite

2020-06-02 Thread Roman Kondakov
Vladimir, Haisheng,

thank you for sharing your thoughts. And special thanks to Haisheng for
awesome examples.

I found Julian's reasoning about representing bitmap indexes as joins
very deep and interesting. If I understand this idea right, for a table
A(a,b,c), indexes over columns, say, 'a' and 'b' can be represented as
relations sorted by 'a' and 'b':

idx_a('rowId', 'a') [collation=[a]]
idx_b('rowId', 'b') [collation=[b]]

Table A itself may be represented as a relation

scan_A('rowId', 'a', 'b', 'c') [collation=[]]

using this approach, bitmap indexes may be represented as set operations
(union, intersect) over idx_a and idx_b followed by joining with scan_A
using 'rowId'.

Index-only scans (aka covered indexes), can be modeled as scans over
idx_a or idx_b without consequent join with the main table.

@Julian, did I get your idea right?

If one tries to apply this approach directly, then the search space will
be quickly polluted by those index relations. But if the Lattice-like
framework from Julian's mail is implemented, then it will be a great
feature, I guess.

@Haisheng, BTW we are working on the indexed nested loop join right now,
and it looks like it can be implemented without a special
Join2IndexApplyRule: we'll use the correlation variable from the join
condition as a filter condition. If there is no index there, it will be a
simple correlated nested loop join (or Apply). If an index is present, the
cost will be lower because the index lookup will speed up this join, and
this index scan will win.

> 
> There is another useful feature, that is IndexApply.
> 
> The rule of Join2IndexApply will transform a logical join to logical index 
> apply, then transform to physical indexed nested loop join.
> select * from foo join bar on foo.a = bar.f;
> There is an index on foo.a, but no index on bar.f.


Thank you!


-- 
Kind Regards
Roman Kondakov


On 02.06.2020 19:55, Julian Hyde wrote:

> Vladimir,
> 
> I feel the same way. MVs are more powerful and general, and with some effort 
> they could be just as efficient as other approaches. 
> 
> One problem that needs to be solved is the “registration problem”. If you 
> have a lot of MVs they all have to be registered in the planner’s search 
> space, otherwise the planner will not find them.
> 
> I would like to see a benchmark for this. Say we have 1,000 MVs and we plan 
> 10,000 queries that potentially use them. The goal is that the planning cost 
> Is comparable to only having 1 MV. We could achieve the goal by re-using the 
> MVs across planner instances (thereby amortizing the cost of creating them) 
> and “index” the planner’s search space.
> 
> The Lattice structure essentially does this for families for join-aggregate 
> MVs (summary tables). We need something similar for project-sort MVs 
> (indexes). 
> 
> Julian






On 02.06.2020 10:05, Vladimir Ozerov wrote:
> Hi Roman,
> 
> To me, there is no principal difference between materialized views and
> rule-based approaches - they are both "rules". In the materialized views
> approach the rule is to create all alternative access paths
> unconditionally, this rule is always fired first during the optimization
> process. A rule-based approach gives you flexibility. You may do exactly
> the same as with materialized views - just enumerate all access paths and
> add them to search space. But at the same time, you may develop more
> complicated things, such as pruning, bitmap index joins, etc. This is not
> possible with materialized views.
> 
> In this sense, the difference in complexity between materialized views
> approach and rule-based approach with similar features (no pruning, etc) is
> close to zero - in both cases you just iterate over existing indexes and
> add them to search space. Rule-based approach becomes more complex when you
> add more features to it.
> 
> вт, 2 июн. 2020 г. в 00:37, Haisheng Yuan :
> 
>> Hi Roman,
>>
>> There are some potential advantages.
>>
>> Let's still use this as example.
>> SELECT * FROM foo WHERE a > 100 or b < 1000;
>> Both column a and b have B-Tree indexes, note that it doesn't have to be
>> bitmap index.
>>
>>  Bitmap Table Scan on foo
>>->  BitmapOr
>>  ->  Bitmap Index Scan on foo_index_a
>>Index Cond: (a > 100)
>>  ->  Bitmap Index Scan on foo_index_b
>>Index Cond: (b < 1000)
>>
>> filter a:  
>> filter b:---
>>
>> The bitmap scan just needs to read the tuples of satisfying the condition
>> once, because the Bit

Re: Using indexes rather than table scans with Calcite

2020-06-01 Thread Roman Kondakov
Hi Xiening,

the example was synthetic. What is still not clear to me is how to
exploit index sortedness with a rule-based approach. As far as I
understand, with this approach we need to write complex rules (for
example [1]) that should decide which index is useful and which is not
useful. These rules should take into account both statistics and
collations, so they do some part of the work that should be done by a cost
model. And it makes writing such rules quite a difficult task.

With a materialized views approach we can register all indexes as scans
and push filters to them if needed. And the cost model, not a rule, will
decide which index is better based on its cost and output collation.

So the benefits of the rule-based approach are not so obvious to me. I would
really appreciate it if you could tell me in what cases the rule-based approach
is better. I understand that it's definitely better in scenarios where the
number of indexes is very high. But maybe there are some other advantages?

Thank you!

[1]
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/DbScanToIndexScanPrule.java

-- 
Kind Regards
Roman Kondakov


On 01.06.2020 21:00, Xiening Dai wrote:
> Hi Roman,
> 
> The example you mentioned is an advanced scenario. Note that there are 
> different types of index, such as clustered index, secondary index, covered 
> and non-covered index. In your case, typical OLTP/OLAP optimizer would create 
> an index-based join on top of the range table scan (or FilteredTableScan in 
> your term). And these transformation can definitely be based on rules. But 
> the difficult part is actually the statistics estimation and cost 
> calculation. You could end up with higher runtime cost with index based join 
> when join cardinality is high.
> 
> But back to the original question, if we’d like to leverage index on table 
> scan, I think simple rule would serve the purpose. In fact, we have 
> FilterTableScanPredicatePushdownRule in our system which does exactly the 
> same thing.
> 
>> On May 31, 2020, at 12:45 PM, Roman Kondakov  
>> wrote:
>>
>> Hi Vladimir,
>>
>> thank you for sharing your point. Could you please clarify some details
>> with a rule-based index selection? You said
>>
>>> the fundamental problem with "indexes as materialized
>>> views" approach is that you have to register them beforehand, instead of
>>> using them only when needed.
>>
>> I agree, it's kind of a problem. What is not clear for me with
>> IndexScanRule-based approach is how to decide when and which index we
>> need? I understand that it is pretty easy to do in a case like this:
>>
>> Filter
>>  Scan
>>
>> we can match the IndexScanRule on this pattern and do an index lookup
>> using filter condition. But what to do in the more complex scenarios?
>> Let's consider an example
>>
>> SELECT * FROM A JOIN B ON A.a=B.b WHERE A.c > 100
>>
>> where A.a, A.c and B.b are indexed fields. The logical plan for this
>> query might look like this:
>>
>> LogicalJoin(A.a=B.b)
>>  LogicalFilter(A.c > 100)
>>LogicalScan(A)
>>  LogicalScan(B)
>>
>> as I understand (please correct me if I'm wrong), with the rule-based
>> approach, after applying IndexScanRule the plan will look like this:
>>
>> LogicalJoin(A.a=B.b)
>>  PhysicalIndexScan(A.c, lower bound = 100)
>>  PhysicalTableScan(B)
>>
>> But in this case we lose the possibility of using index scans over A.a
>> and B.b and joining them with MergeJoin, which can be more efficient
>> plan in terms of the cost.
>>
>> My question is: how rule-based approach handle this scenario? Will it
>> re-apply IndexScanRule once again to produce PhysicalIndexScan(A.a) and
>> PhysicalIndexScan(B.b)? Or am I missing the crucial point of a rule-base
>> approach?
>>
>> Thank you in advance!
>>
>>
>> -- 
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 31.05.2020 21:39, Vladimir Ozerov wrote:
>>> As already mentioned, the fundamental problem with "indexes as materialized
>>> views" approach is that you have to register them beforehand, instead of
>>> using them only when needed. On the other hand, the complexity of index
>>> planning comes from cost estimation and predicate splitting.
>>> Materializations cannot help you with that anyhow. This is why I call this
>>> approach (not materialized views per se) "hacky" - you reuse several simple
>>> parts of the Calcite infrastructure at the cost of loss in the flexibility
>>> of the planner, while the mo

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Roman Kondakov
Hi Vladimir,

thank you for sharing your point. Could you please clarify some details
with a rule-based index selection? You said

> the fundamental problem with "indexes as materialized
> views" approach is that you have to register them beforehand, instead of
> using them only when needed.

I agree, it's kind of a problem. What is not clear for me with
IndexScanRule-based approach is how to decide when and which index we
need? I understand that it is pretty easy to do in a case like this:

Filter
  Scan

we can match the IndexScanRule on this pattern and do an index lookup
using filter condition. But what to do in the more complex scenarios?
Let's consider an example

SELECT * FROM A JOIN B ON A.a=B.b WHERE A.c > 100

where A.a, A.c and B.b are indexed fields. The logical plan for this
query might look like this:

LogicalJoin(A.a=B.b)
  LogicalFilter(A.c > 100)
LogicalScan(A)
  LogicalScan(B)

as I understand (please correct me if I'm wrong), with the rule-based
approach, after applying IndexScanRule the plan will look like this:

LogicalJoin(A.a=B.b)
  PhysicalIndexScan(A.c, lower bound = 100)
  PhysicalTableScan(B)

But in this case we lose the possibility of using index scans over A.a
and B.b and joining them with MergeJoin, which can be more efficient
plan in terms of the cost.

My question is: how rule-based approach handle this scenario? Will it
re-apply IndexScanRule once again to produce PhysicalIndexScan(A.a) and
PhysicalIndexScan(B.b)? Or am I missing the crucial point of a rule-based
approach?

Thank you in advance!


-- 
Kind Regards
Roman Kondakov


On 31.05.2020 21:39, Vladimir Ozerov wrote:
> As already mentioned, the fundamental problem with "indexes as materialized
> views" approach is that you have to register them beforehand, instead of
> using them only when needed. On the other hand, the complexity of index
> planning comes from cost estimation and predicate splitting.
> Materializations cannot help you with that anyhow. This is why I call this
> approach (not materialized views per se) "hacky" - you reuse several simple
> parts of the Calcite infrastructure at the cost of loss in the flexibility
> of the planner, while the most complicated parts still need to be
> implemented by hand.
> 
> Materialized views could be a good fit e.g for partial indexes because in
> this case, Calcite could help you with complex subsumption mechanics. But
> for standard indexes, the pros/cons balance is not that obvious.
> 
> вс, 31 мая 2020 г. в 19:28, xu :
> 
>> Hi Tim,
>>
>> I am working on MySQL InnoDB adapter and trying to introduce this to
>> Calcite, currently it is only in early stage, and not approved/reviewed by
>> committers yet. Anyway, we are facing the same problem like what index to
>> use, how to push down order by operation, etc. I have developed a simple
>> rule based adapter to be "index aware" and being able to leverage a MySQL
>> InnoDB storage engine written in Java. Hope this will help you to explore
>> more options.
>>
>> https://issues.apache.org/jira/browse/CALCITE-4034
>>
>> Thanks,
>> Xu
>>
>> Haisheng Yuan  于2020年5月31日周日 下午10:06写道:
>>
>>> Hi Roman,
>>>
>>> Thank you for sharing your thoughts.
>>>
>>>> It can be very tricky because the rule should consider not
>>>> only filters, but also collations. This leads to increasing the
>>>> complexity of such rules.
>>>
>>> Logical transformation rules like FilterTableScan2IndexTableScanRule
>>> should not consider physical properties, like collation, distribution.
>> You
>>> forgot that we just reached consensus in CALCITE-3972. :)
>>>
>>> Regarding the 2nd option that uses index b, it is indeed not that easy
>> for
>>> Calcite 1.22.0. In latest version, it now becomes possible. After rule
>>> transformation, during top-down trait request, the collation is passed
>>> through Filter, down to physical TableScan, which accepts the trait
>> request
>>> with collation on b, find there is an index on b, and return a new
>> RelNode
>>> IndexScan. This process can be done in
>>> EnumerableTableScan#passThrough(required).
>>>
>>>> I hope, when the Cascades-style optimizer become a part of Calcite, the
>>> search space
>>>> pollution will not be a serious issue anymore.
>>>
>>> I hope so too. Top-down style can help alleviate space pollution by space
>>> pruning. But the space pollution caused by LogicalProject operator and
>> its
>>> related rules still can't be avoided. :)
>>>
>>> Again, thanks for sharing

Re: Using indexes rather than table scans with Calcite

2020-05-31 Thread Roman Kondakov
Hi Haisheng,

The basic rationale behind using materialized views for secondary
index representation, instead of special rules like the mentioned
FilterTableScan2IndexTableScanRule, is the simplicity of implementation.

You are absolutely right that the materialized views approach has an obvious
drawback: it has to register all indexes as materialized views in
the optimizer's search space. But we expect our users will not overuse
indexes because in general having too many indexes is a bad practice: on
each table update we should also update its indexes, and this can cause
some performance degradation of the system as a whole, not only the
optimizer. So we expect the number of indexes will be relatively small.

Ignite uses indexes not only for index lookups, but also for exploiting
their sortedness. In this case the materialized views approach can show some
advantages. Let's consider this example:

SELECT * FROM foo WHERE a > 100 ORDER BY b;

where both fields 'a' and 'b' are indexed. In this case we will have two
alternatives for the query execution:

1. Use index 'a' for index conditioned scan and then sort rows by 'b'
2. Use index scan over 'b' and then apply filter over 'a' - here we can
avoid sorting, because index over 'b' is already sorted.
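
For illustration, the two candidate physical plans could look roughly like
this (the index names 'a_idx' and 'b_idx' are made up here):

1. Sort(collation=[b])
     IndexScan('a_idx', collation=[a], filter=[a > 100])

2. Filter(condition=[a > 100])
     IndexScan('b_idx', collation=[b])

The cost model then decides whether the explicit sort in the first plan is
cheaper than the residual filter over the 'b' index scan in the second one.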

If I understand the approach with FilterTableScan2IndexTableScanRule
correctly, at this step the rule should make a decision about which
index to use. It can be very tricky because the rule should consider not
only filters, but also collations. This leads to increasing the
complexity of such rules. With the materialized views approach we just
register all indexes with their collations and let the cost model do its
job. Our rules are very simple. The complexity is encapsulated in the
index scan cost estimation.

As for your example with disjunctive predicate:

> SELECT * FROM foo WHERE a > 100 or b < 1000;

I think with the materialized views approach we can handle complex logic as
well. For example, we may implement a special rule for such cases:
BitmapOrRule. Or, even better, we will rewrite such predicates to a
UNION ALL clause. A very good explanation and comparison of both
approaches can be found in this Oracle blog post [1].

As a conclusion: choosing between the materialized views approach and the
IndexRule-based approach is a trade-off between implementation complexity
and search space pollution. I hope that when the Cascades-style optimizer
becomes a part of Calcite, search space pollution will no longer be a
serious issue.


Thanks!


[1]
https://blogs.oracle.com/optimizer/optimizer-transformations:-or-expansion

-- 
Kind Regards
Roman Kondakov


On 31.05.2020 07:31, Haisheng Yuan wrote:
> Thanks Julian and Roman for sharing the experiences for modeling indexes.
> 
> Besides using materialized views, which is already proven by Phoenix and 
> Ignite, there is another approach, as mentioned by Vladimir, define your own 
> rules and indexscan operators.
> 
> FilterTableScan2IndexScanRule and its variances, which match Filter over 
> TableScan, create an IndexScan if the table has corresponding index on the 
> filter column. You don't have to register different tablescans for all the 
> indexes, in case you have nearly thousands indexes for a table, say 999, 
> which is allowed in SQL Server 2016. It can also support more complex 
> scenario, e.g.:
> 
> SELECT * FROM foo WHERE a > 100 or b < 1000;
> If there is an index on column a, and another index on column b, we may need 
> to utilize both indexes, through Bitmap TableScan.
> 
>  Bitmap Table Scan on foo
>->  BitmapOR
>  ->  Bitmap Index Scan on foo_index_a
>Index Condition: (a > 100)
>  ->  Bitmap Index Scan on foo_index_b
>Index Condition: (b < 1000)
> 
> But still, this approach requires some non-trivial work to do.
> 
> Hi Roman, I believe you definitely have consulted both approaches for Apache 
> Ignite to work with indexes, you decided to go with materialized views, there 
> are some reasons and tradeoffs to consider behind your decision. Do you mind 
> sharing with us?
> 
> Thanks,
> Haisheng
> 
> 
> On 2020/05/29 17:34:27, Julian Hyde  wrote: 
>> Materialized views are not a hack, as Vladimir claims. Materialized views 
>> are a fundamental concept in relational algebra, and they are an elegant way 
>> - in my opinion the correct way -  to model indexes (and many other 
>> structures).
>>
>> In Calcite materialized views are a feature in the planner that allows you 
>> to declare a table as equivalent to a query. (You do not need to issue a 
>> CREATE MATERIALIZED VIEW statement. They are internal.) Then you can use 
>> algebra to substitute one for the other, and apply

Re: Using indexes rather than table scans with Calcite

2020-05-29 Thread Roman Kondakov
Hi Tim,

In Apache Ignite we've already faced this challenge. We solved it using
materialized views and FilterableTable. Let's consider your example:

> select * from users where country='UK' and some_other_column='foo';

with a primary index and a sorted secondary index (B+Tree?) over the
'country' field.

As a first step, Ignite registers all indexes that might be helpful for
the query plan as materialized views in the VolcanoPlanner. For now we
register all indexes of all tables that participate in the query.
Registering all indexes might be excessive, but maybe we will apply some
pruning later. So it's ok for now to register all indexes.

The index materialized view is very simple: it's just a sorted table scan:

'SELECT * from tbl ORDER BY idx_fields'
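
For illustration only, registering such a view could look roughly like this
(the RelBuilder usage is simplified, and 'country_idx', 'indexScanRel' and
the surrounding setup are assumptions rather than Ignite's actual code):

// Build the "index view" query: a scan of the table sorted by the index key.
RelBuilder relBuilder = RelBuilder.create(frameworkConfig);
RelNode indexViewQuery = relBuilder
    .scan("tbl")
    .sort(relBuilder.field("country"))
    .build();

// Register it as a materialized view whose "table" is the index itself
// (indexScanRel is a scan over the index structure).
RelOptMaterialization materialization = new RelOptMaterialization(
    indexScanRel, indexViewQuery, null, ImmutableList.of("country_idx"));
planner.addMaterialization(materialization);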

After registering indexes as materialized views, the optimizer's search
space will look like this:

Project([*])
  Filter[country='UK' and some_other_column='foo']
Set0

where Set0 consists of table and index scans:

TableScan('tbl', collation=[])
IndexScan('PK', collation=[PK_field])
IndexScan('country_idx', collation=[country])

At this step we have our index scans registered in the optimizer within
the same equivalence set as a TableScan.

The next step is pushing filters down to the scans. We do it with a
rule similar to 'FilterTableScanRule' (a sketch of such a rule is shown
after the listing below). After applying this rule we have a search space
that looks like this:

Project([*])
  TableScan('tbl', collation=[], filter=[country='UK' and
some_other_column='foo'])
  IndexScan('PK', collation=[PK_field], filter=[country='UK' and
some_other_column='foo'])
  IndexScan('country_idx', collation=[country], filter=[country='UK' and
some_other_column='foo'])
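
For illustration, such a push-down rule could be sketched like this
(IgniteTableScan#withFilter() is a hypothetical helper used only to show the
shape of the rule):

public class FilterIndexScanRule extends RelOptRule {
  public FilterIndexScanRule() {
    super(operand(LogicalFilter.class, operand(IgniteTableScan.class, none())));
  }

  @Override public void onMatch(RelOptRuleCall call) {
    LogicalFilter filter = call.rel(0);
    IgniteTableScan scan = call.rel(1);
    // Re-create the scan with the filter pushed into it; the scan later uses
    // the condition both for index lookups and for cost estimation.
    call.transformTo(scan.withFilter(filter.getCondition()));
  }
}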

And the final step is adjusting the cost model so that it selects the
scan with the lowest cost, which depends on the filter conditions within
the scan. For example, a full table scan with filters

TableScan('tbl', collation=[], filter=[country='UK' and
some_other_column='foo'])

will cost, say, 100, because it has to scan all rows and then filter
out some of them. On the other hand, an index scan that can do an index
lookup instead of a full scan will have a lower cost. For example

IndexScan('country_idx', collation=[country], filter=[country='UK' and
some_other_column='foo'])

will have a cost of about 10, because it has a good index condition
`country='UK'` which can be used for an index lookup that returns only
10% of the rows. Therefore this IndexScan should be chosen by the
optimizer as the best plan.
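
For illustration, the cost estimation hook in such a scan could be sketched
roughly like this (the 10% selectivity, the condition field and the
matchesIndexCondition() helper are made-up placeholders, not Ignite's actual
logic):

@Override public RelOptCost computeSelfCost(RelOptPlanner planner,
    RelMetadataQuery mq) {
  double rowCount = table.getRowCount();
  // A scan whose pushed-down filter can be turned into an index lookup
  // touches only a fraction of the rows and is therefore cheaper.
  double selectivity = matchesIndexCondition(condition) ? 0.1 : 1.0;
  double rows = rowCount * selectivity;
  return planner.getCostFactory().makeCost(rows, rows, 0);
}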

We've recently implemented this approach in Apache Ignite and it works
well for us. You can find it in [1]. This PR has many changes that are
unrelated to the main topic, so in particular you can look at the
`IgnitePlanner.materializations()` method, which registers indexes as
materialized views, and `IgniteTableScan`, which performs filter
condition assessment and index scan cost estimation.


[1] https://github.com/apache/ignite/pull/7813

-- 
Kind Regards
Roman Kondakov


On 29.05.2020 11:44, Tim Fox wrote:
> Hi,
> 
> I'm building a query engine with Calcite - really enjoying working with
> Calcite so far!
> 
> When creating a plan, it seems Calcite always creates a plan where the
> sources are table scans, however in my implementation the tables can have
> indexes on them so a table scan is not always the right choice.
> 
> I was wondering if there was any way of making Calcite "index aware" - e.g.
> perhaps providing hints to the table scan instance that, actually, an index
> scan or a primary key lookup should be used instead of actually scanning
> the table. E.g. On the table meta-data if we provided information about any
> indexes on the table, then Calcite could figure out what parts of the query
> to push to the table scan and which to keep in the rest of the plan.
> 
> There are two specific cases I really care about:
> 
> 1. Queries that contain a primary key lookup:
> 
> select * from some_table where key_column=23 AND some_other_column='foo';
> 
> In the above case the 'select * from some_table where key_column=23' can be
> implemented as a simple PK lookup in the source table, not requiring a
> scan, thus leaving just the filter corresponding to
> 'some_other_column='foo'' in the rest of the plan
> 
> 2. Queries with expressions on a column which has a secondary index
> 
> select * from users where country='UK' and some_other_column='foo';
> 
> We have many users, and let's say 10% of them are from UK (still a lot). We
> have a secondary index in the country column

Re: [DISCUSS] Towards Cascades Optimizer

2020-05-11 Thread Roman Kondakov
Hi Jinpeng,

The Apache Calcite community has entered an interesting situation: we
have several concurrent efforts to build a new Cascades-style
optimizer. I can see at least three activities (correct me if I'm wrong):

1. Haisheng's gradual rebuilding of VolcanoPlanner [1]
2. Jinpeng's new CascadesPlanner based on VolcanoPlanner [2]
3. My brand new CascadesPlanner [3]

All these activities are independent and may interfere with each other.
It is not good. We still don't know which approach is better and which
optimizer will be the most efficient while keeping backward compatibility.

In my opinion we must refrain from making breaking changes in the master branch
right now, because they can break the other optimizers.

I think we need to develop a further strategy for activities related to
the new optimizers:
- find out how we will choose the new default Cascades optimizer
for Calcite: it should be a stable, fast and backward-compatible
optimizer. If all optimizers prove to be stable and backward
compatible, maybe we should choose the fastest one?
- think about the possibility of using plug-in optimizers: it would be a
great feature for users if they want to use their custom optimizer for
specific tasks or research purposes. It can be configured with
FrameworkConfig.setPlanner(new CustomPlanner()).

What do you think?

[1] https://github.com/apache/calcite/pull/1953
[2] https://github.com/apache/calcite/pull/1950
[3] https://github.com/apache/calcite/pull/1948


-- 
Kind Regards
Roman Kondakov


On 11.05.2020 11:12, Jinpeng Wu wrote:
> Hi, Roman.
> 
> Thanks to Julian's tips, I added a CoreCascadeQuidemTest to the code. It
> runs iq files that CoreQuidemTest would runs and uses CascadePlanner
> instead of VolcanoPlanner to generate physical plans. Currently all tests
> have passed. There are some plan changes but they are actually equivalent
> plans in different forms. I've also committed those plan changes as '.cas'
> files.
> Moreover, if you would like to set -Dcalcite.cascade=true during tests, all
> test cases will use the new planner for planning. There would be some
> failures, typically because there's no common ways across all tests to tell
> whether a RelNode is physical or logical.
> 
> 
> When I was running the tests, I found that Haisheng's latest changes has a
> conflict with my PR. The new planner relies on AbstractConverter to tell
> whether a subset canConvert/needConvert to another subset. Haisheng's
> change will remove AbstractConverter and break this. Currently, I switch
> off top-down trait propagation in new planner to work around. I will merge
> these two top-down processing together later on to completely address this
> problem.
> One thing to recall is that, while I am writing a new planner, Haisheng
> modified the VolcanoPlanner directly and used a flag to switch on/off the
> new feature. I need to decide how to merge them: to keep a separated
> planner and invoke the new trait propagation interfaces or to modified the
> volcano planner directly. I am trying the first way now. But of course, if
> keeping one planner is preferred, I can also take the second way.
> 
> 
> Thanks,
> Jinpeng
> 
> 
> On Fri, May 1, 2020 at 6:40 AM Haisheng Yuan  wrote:
> 
>> Hi all,
>>
>> As planned in my proposal, I opened the pull request [1] for CALCITE-3896
>> to achieve:
>> 1. Top-down trait request
>> 2. Bottom-up trait derivation
>> 3. Trait enforcement without AbstractConverter
>>
>> The feature can be turned on or off by a flag, either through property
>> config file or VolcanoPlanner set method. Since Calcite doesn't turn on
>> AbstractConverter until latest master, I just disabled AbstractConverter by
>> turning on top-down trait request, now all tests passed.
>>
>> In our system, 99 tpcds queries' test results show almost no plan diff,
>> but the number of relnodes created during optimization is reduced by 10~15%
>> average (even without space pruning). I believe for other systems using
>> VolcanoPlanner, more than 20% reduction can be expected.
>>
>> It also has top-down rule apply in mind, later can be evolved to top-down
>> rule apply and space pruning, e.g. integrating code from Jingpeng and
>> Roman's. But the interface that is exposed to user, as described in the
>> proposal, can remain the same.
>>
>> Haisheng
>>
>> [1] https://github.com/apache/calcite/pull/1953
>>
>>
>> On 2020/04/30 18:01:26, Julian Hyde  wrote:
>>> If your test cases are SQL scripts, it might be fairly straightforward
>> to port them to Quidem (.iq) files. Plenty of examples in
>> https://github.com/apache/calcite/tree/master/core/src/test/resour

Re: [DISCUSS] Towards Calcite 1.23.0

2020-05-05 Thread Roman Kondakov
Hi Haisheng,

I would like to review CALCITE-3896. Do you have a PR?


-- 
Kind Regards
Roman Kondakov


On 05.05.2020 06:37, Haisheng Yuan wrote:
> IIRC, I am the release manager for 1.23.0.
> 
> I think CALCITE-3896 (top-down trait request) is in good shape, it will be 
> nice if it can go into 1.23.0. Can someone help review?
> 
> On 2020/05/04 22:58:25, Stamatis Zampetakis  wrote: 
>> Hello,
>>
>> Calcite 1.22.0 was released about 2 months ago (on March 5, 2020). During
>> the vote we said to try to have a release sooner than usual so I am
>> launching the discussion now hoping we could get to 1.23.0 rather soon.
>>
>> I was checking the status of the current release [1] and we have already
>> over 100 resolved issues with 91 fixes so it is already pretty big.
>>
>> I know that there quite a few discussions in progress so it would be good
>> to see how many of the ongoing cases should be fixed for 1.23.0.
>>
>> More importantly do we have a release manager for 1.23.0?
>>
>> Best,
>> Stamatis
>>
>> [1]
>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12333950
>>


[jira] [Created] (CALCITE-3969) Method RelTrait.apply(Mappings.Mapping) throws exception when mapping doesn't cover collation or distribution keys

2020-05-03 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-3969:
---

 Summary: Method RelTrait.apply(Mappings.Mapping) throws exception 
when mapping doesn't cover collation or distribution keys
 Key: CALCITE-3969
 URL: https://issues.apache.org/jira/browse/CALCITE-3969
 Project: Calcite
  Issue Type: Bug
  Components: core
Reporter: Roman Kondakov


Let's say we have an input {{(id, name)}} ordered by {{id}} (i.e. 
collation == {{[0]}}). If we have a {{Project("name")}} on top of this 
input and we apply the project's mapping to the collation, we'll end up with 
an exception:
{noformat}
java.lang.NullPointerException: at index 0
at 
com.google.common.collect.ObjectArrays.checkElementNotNull(ObjectArrays.java:239)
at 
com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:230)
at 
com.google.common.collect.ObjectArrays.checkElementsNotNull(ObjectArrays.java:225)
at 
com.google.common.collect.ImmutableList.construct(ImmutableList.java:281)
at 
com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:239)
at org.apache.calcite.rel.RelCollations.of(RelCollations.java:69)
at org.apache.calcite.rex.RexUtil.apply(RexUtil.java:1271)
at 
org.apache.calcite.rel.RelCollationImpl.apply(RelCollationImpl.java:122)
at 
org.apache.calcite.rel.RelCollationImpl.apply(RelCollationImpl.java:40)
{noformat}
This happens because the collation field {{id}} is not a part of the mapping. 
The same problem exists with the distribution trait when the distribution keys are not 
covered by the mapping.

Calcite should handle such situations gracefully. If it is not possible to 
deduce the collation/distribution after the mapping application, we should 
return:
 * {{EMPTY}} collation for {{RelCollation}} trait.
 * {{RANDOM_DISTRIBUTED}} distribution for {{RelDistribution}} trait.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Towards Cascades Optimizer

2020-04-28 Thread Roman Kondakov
Hi Xiening,

thank you for your feedback.

> 1. Do we really need RelGroup and RelSubGroup? I believe the memo structure 
> would be largely the same even if we move towards a Cascade planner. I think 
> we can reuse most of the RelSet/RelSubset today. RelSubset is a tricky one. 
> There’s no corresponding concept in the Cascade framework. But we do need a 
> way to track the RelNode physical traits anyway. A RelSubset is more like a 
> logical grouping of the RelNodes with same trait set, which I think is still 
> valuable.

No, we don't really need RelGroup and RelSubGroup. My initial approach
was to use only RelGroup as in the original Cascades framework, but later I
realized that RelSubGroup is a convenient concept and introduced it. I'm
going to replace both with RelSet/RelSubset later, even though
RelSet/RelSubset have many things that are not needed for Cascades
(parents, propagateCostImprovements, etc.).

> Keeping backward compatibility is a key goal. We have to think through this 
> at the very beginning of the design. The community has invested a lot with 
> Volcano planner over the years. We cannot ask people to rewrite their rules 
> just to make them work with the new planner. We should expect current rules 
> would just work, or would work with “minimal” changes.

I agree with you. My next step is to bring backward compatibility to the
Cascades planner.

> There are some building blocks that are currently missing which will prevent 
> us from using the Cascade top down search strategy. For example, the way how 
> rule is matched, the lack of transformation phase, the lack of the ability to 
> create physical nodes by the framework, etc. If we don’t care about current 
> rules, and start from scratch, we probably don’t need to fix these issues. 
> But with #2 in mind, we have to work out solutions for these so we can carry 
> over existing rules.

I think we can reuse existing rules and at the same time achieve a correct
top-down Cascades search strategy. For example, I've changed the rule
matching strategy in the prototype: rules are matched on demand in a
top-down way (see the org.apache.calcite.plan.cascades.ApplyRule.Bindery
class). And I believe that all other things (like creating physical
nodes) can be fixed in a similar way: just by changing the planner code
while maintaining backward compatibility.


-- 
Kind Regards
Roman Kondakov


On 28.04.2020 03:12, Xiening Dai wrote:
> Hi Roman,
> 
> First, thank you for sharing your design and prototype code. I took a quick 
> look at your design and have some high level feedback -
> 
> 1. Do we really need RelGroup and RelSubGroup? I believe the memo structure 
> would be largely the same even if we move towards a Cascade planner. I think 
> we can reuse most of the RelSet/RelSubset today. RelSubset is a tricky one. 
> There’s no corresponding concept in the Cascade framework. But we do need a 
> way to track the RelNode physical traits anyway. A RelSubset is more like a 
> logical grouping of the RelNodes with same trait set, which I think is still 
> valuable.
> 
> 2. Keeping backward compatibility is a key goal. We have to think through 
> this at the very beginning of the design. The community has invested a lot 
> with Volcano planner over the years. We cannot ask people to rewrite their 
> rules just to make them work with the new planner. We should expect current 
> rules would just work, or would work with “minimal” changes.
> 
> 3. There are some building blocks that are currently missing which will 
> prevent us from using the Cascade top down search strategy. For example, the 
> way how rule is matched, the lack of transformation phase, the lack of the 
> ability to create physical nodes by the framework, etc. If we don’t care 
> about current rules, and start from scratch, we probably don’t need to fix 
> these issues. But with #2 in mind, we have to work out solutions for these so 
> we can carry over existing rules.
> 
> I think your execution plan looks good overall. We can iterate on the design 
> while working out a document.
> 
> For the purpose of transparency, I’ve been working with Haisheng in the last 
> few months regarding this topics. I agree with him on most parts of his 
> roadmap, which I think it’s a tangible plan to evolve current Volcano planner.
> 
> 
>> On Apr 27, 2020, at 11:29 AM, Roman Kondakov  
>> wrote:
>>
>> Hi all,
>>
>> Stamatis, Haisheng thank you very much for your feedback! I really
>> appreciate it.
>>
>>> If in the new planner we end up copy-pasting code then I guess it will be a 
>>> bad idea.
>>
>> Yes, there are some code duplication between Volcano planner and
>> Cascades planner. I think I'll move it to some common superclass: either
>> to existing Ab

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-28 Thread Roman Kondakov
Hi Jinpeng,

I went through your PR and it seemed very impressive to me. It is very
similar to what I did, but you've reused much of the existing logic from the
Volcano planner. We should definitely stay in sync in our experiments. I
believe the future Cascades planner will be some kind of combination of our
works.

Is there any way to run tests that are close to real system query
execution? Maybe with the Enumerable convention, or, better, with a
convention that supports the distribution trait? I just want to look through
your planner's optimization steps more thoroughly. I've found some tests
in the org.apache.calcite.plan.volcano package, but they use synthetic
conventions and nodes. Maybe I missed something.

Thank you for sharing your work!

-- 
Kind Regards
Roman Kondakov


On 28.04.2020 15:19, Jinpeng Wu wrote:
> Hi, Roman. It's great to see your proposal. Actually my team has also been
> working on a cascade planner based on calcite.  And we already have some
> outcome as well.  Maybe we can combine our works.
> 
> I've pushed my code as https://github.com/apache/calcite/pull/1950 .
> 
> Our works have many places in common. We both developed a new
> CascadePlanner and avoid modifying the old VolcanoPlanner directly. We
> both implemented the top-down search strategy according to the
> Columbia optimizer
> generator
> <https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf>. But
> we also have some differences.
> 
> The first difference is that I try to reuse the existing VolcanoPlanner as
> much as possible. My CascadePlanner inherits from the existing
> VolcanoPlanner. Except that it overwrites ruleQueue and findBestPlan method
> to rearrange rule applies, most logic generally inherit from
> VolcanoPlanner. For example,
>   - It reuses the RelSet and RelSubset class and the register method
>   - Rules are fired as soon as a RelNode is registered (In the
> Columbia optimizer generator, rules are not fired until exploring). The
> ApplyRule task controls when to invoke the onMatch method of a RuleMatch.
> This design have a benefit that we do not need to worry about missing a
> rule or firing a rule multiple times.
>   - It leverages AbstractConverter to pass traits requirements down.  So
> currently AC is still essential in my code.
> This makes the new planner highly compatible with the old VolcanoPlanner.
> Features like MV and Hints can apply to it directly.  And I tried to change
> VolcanoPlanner to the new CascadePlanner in tests. Most tests passed.
> Several cases did fail. I know the reason and how to fix them. But I am
> still thinking about making them as "won't fix" as the ruleset violates
> some basic principles of top-down trait requests.
> 
> The second difference is that our design have the ability for space
> pruning. Currently it contains a simply LowerBoundCost metadata to compute
> the lower bound of a RelNdoe. Because logical properties like cardinality
> of a RelSet is not stable across exploring, it is required that a group to
> be fully explored (implementation rules and enforcement rules should never
> modify the logical properties) before it can provide a valid lower bound
> cost. Because of that, logical search space pruning is not supported now.
> It can only pruned out implementation rules and enforcement rules. Testing
> with cases in our own product, the new planner saves about 10% rule
> applies. I am still considering how to support logical space pruning,
> looking forwards to have more improvements.
> 
> Hope my code will help.
> 
> Thanks,
> Jinpeng
> 
> 
> On Tue, Apr 28, 2020 at 11:22 AM Xiening Dai  wrote:
> 
>> For #1, aside from that we need to be able to build physical nodes based
>> on a convention. For example, if we merge two EnumerableProject, we would
>> want to create an EnumerableProject as a result, instead of LogicalProject.
>> The RelBuilder change I work on would help this case.
>>
>> For #2, I don’t think it’s just a bug. If the physical cost cannot be
>> reliable before transformation is finished, we should probably delay the
>> physical cost calculation, or we risk doing it over again. The other way is
>> to complete RelSet transformation before implementing it - which is a
>> common practice in industry, including Orca.
>>
>> The multi-convention is a key scenario, and I agree we should support. My
>> thinking is more about seperating logical one (Conventions.NONE) from
>> others.
>>
>>
>>> On Apr 27, 2020, at 4:59 PM, Julian Hyde  wrote:
>>>
>>> Re 1. By all means have multiple instances of a rule (e.g. one instance
>> that matches LogicalFilter and another that matches FooFilter) and enable

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-27 Thread Roman Kondakov
Hi all,

Stamatis, Haisheng thank you very much for your feedback! I really
appreciate it.

> If in the new planner we end up copy-pasting code then I guess it will be a 
> bad idea.

Yes, there is some code duplication between the Volcano planner and the
Cascades planner. I think I'll move it to some common superclass: either
to the existing AbstractRelOptPlanner or a new AbstractCostBasedPlanner.

> Like that it will be easier for everybody to test and see if the changes make 
> this better or worse. From a backward compatibility perspective it seems 
> feasible to keep the new eatures configurable for a certain amount of time.

I was thinking about backward compatibility in this way: what if we could
switch the planner in tests by setting some flag (a system property?)
somewhere and check what happens if we replace Volcano with the new
planner. If it ever turns out that the new planner passes all of
Volcano's tests and at the same time works more efficiently, we can
safely replace the Volcano planner with the Cascades planner.
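
For illustration, the switch could be as trivial as something like this (the
property name and the CascadesPlanner constructor are assumptions based on
the prototype):

RelOptPlanner planner = Boolean.getBoolean("calcite.cascades")
    ? new CascadesPlanner(costFactory, context)
    : new VolcanoPlanner(costFactory, context);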

> A design doc would definitely help, especially if it has a few end-to-end 
> (from logical to physical plan) examples showing how the optimizer works
at each step before/after the changes. This is actually what is usually
missing in research papers that makes them hard to understand.
> I am thinking some similar to the examples that Haisheng send in the first  
> email but possibly a bit more detailed.

I agree, I'll add some examples to the design doc very soon.

> I looked very briefly in the PR by Roman but I think I didn't see tests where 
> the final plan contains operators from multiple conventions. Multiple 
> conventions is among the choices that complicate certain parts of he existing 
> planner so we should make sure that we take this into account.

It's on my radar. I'm going to add these tests.

I would like to ask the community about my next steps for the
transition of the Cascades planner from prototype status to becoming
a part of the project. I see these steps like this:

1. Create a jira ticket.
2. Update design document with examples.
3. Do some research on backward compatibility with the Volcano
planner, to be able to replace the Volcano planner with the Cascades
planner in tests and to understand the current problems with the planner.
4. Solve known problems
  - materialized views
  - hints
  - multiple conventions
  - listener hooks
  - problems from p.3.
5. New PR, review and merge.
6. Replace the Volcano planner with Cascades after several releases.

What do you think about this roadmap?


-- 
Kind Regards
Roman Kondakov


On 27.04.2020 01:55, Stamatis Zampetakis wrote:
> Hi all,
> 
> I am very excited about the ideas discussed so far and especially by the
> enthusiasm of many people that are ready to help for pulling this out.
> I wouldn't except that we could have a prototype so quickly.
> Thanks a lot everyone!
> 
> In the debate between creating new planner or patching the existing one, I
> don't have a clear preference.
> I think the answer depends on how many things can we reuse.
> If in the new planner we end up copy-pasting code then I guess it will be a
> bad idea.
> On the other hand, if the new and old planner do not have many things in
> common then I guess the answer is obvious.
> 
> From Haisheng's description, I was thinking that many of the proposed
> changes could go in the existing planner.
> Like that it will be easier for everybody to test and see if the changes
> make this better or worse.
> From a backward compatibility perspective it seems feasible to keep the new
> features configurable for a certain amount of time.
> 
> From the wish-list, I think we should focus initially on points:
> 1. Top-down trait request
> 2. Convert traits without Abstract converters
> 4. Bottom-up trait derivation
> 
> I know that 3, and 5, are also important but I have the feeling they can
> wait a bit longer.
> 
> A design doc would definitely help, especially if it has a few end-to-end
> (from logical to physical plan) examples showing how the optimizer works at
> each step before/after the changes.
> This is actually what is usually missing in research papers that makes them
> hard to understand.
> I am thinking some similar to the examples that Haisheng send in the first
> email but possibly a bit more detailed.
> 
> I looked very briefly in the PR by Roman but I think I didn't see tests
> where the final plan contains operators from multiple conventions.
> Multiple conventions is among the choices that complicate certain parts of
> the existing planner so we should make sure that we take this into account.
> 
> Hoping to find some time to think over all this more quietly. Very
> interesting stuff :)
> 
> Best,
> Stamatis
> 
> On 

Re: [DISCUSS] Towards Cascades Optimizer

2020-04-26 Thread Roman Kondakov
Hi everyone!

Haisheng, thank you for bringing this subject up. A new Cascades-style
optimizer should definitely be the next step for Apache Calcite. Many
projects suffer from the lack of this kind of optimizer.

That was the reason why several weeks ago I started working on a
prototype of a Cascades optimizer for Apache Calcite. I was not sure that
I would be able to build something useful without much experience in this area. But
now I'd like to share my work with the community. You can find the
Cascades prototype in PR [1]. This prototype is based on the well-known
paper [2].

What prototype can do:
- Top-down trait request
- Convert traits without Abstract converters
- Top-down rule apply
- Bottom-up trait derivation

What is not supported yet:
- Search space pruning (but I'm going to fix it in the next commits)
- Materialized views
- Planner hooks
- Hints
- Backward compatibility with Volcano planner (some research needed)

I prepared a design doc for this planner [3], you can find many details
there. I also opened it for comments.

I've written several basic test cases in
org.apache.calcite.plan.cascades.CascadesPlannerTest including that was
discussed in this thread previously in the context of trait requests:

>  MergeJoin hash[a] 
>   | TableScan R hash[a] (RelSubset)
>   + TableScan S hash[a] (RelSubset)


Haisheng, this is very similar to what you propose. Answering your question:

> There are 2 ways:
> a) Modify on current VolcanoPlanner.
>   Pros: code reuse, existing numerous test cases and infrastructure, fast 
> integration
>   Cons: changing code always brings risk
> 
> b) Add a new Planner
>   Pros: no risk, no tech debt, no need to worry about backward compatability
>   Cons: separate test cases for new planner, one more planner to maintain

I've chosen the second approach, because I don't have a clear
understanding of how to fix the Volcano planner gradually.

This new planner is very far from perfect, but I think it can be a good
starting point for the community.

Please, share your thoughts about this planner.


[1] PR: https://github.com/apache/calcite/pull/1948
[2] Paper:
https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/xu-columbia-thesis1998.pdf
[3] Design doc:
https://docs.google.com/document/d/1qaV3eSKTw4gfLuBR3XB_LVIc51hxIng9k1J4VD6qnRg/edit?usp=sharing



-- 
Kind Regards
Roman Kondakov


On 22.04.2020 09:52, Danny Chan wrote:
>> Is there any recommended approach to make that happen smoothly besides
> coding and testing work? We need to be aware that the new planner might be
> co-exist with VolcanoPlanner for 5 or more years, or even never replace
> VolcanoPlanner.
> 
> If that is true, i might say the new planner is probably with a not that
> good design, we expect to see in advance for what cases/reasons user has
> the reason to keep the old VolcanoPlanner and we *must* give a solution for
> those problems in the new design.
> 
> I was expecting that migrating to a new planner would at least take 1 year
> for developing, if that is true, modifying directly based on current
> planner means for the near future 3~4 versions Calcite, there would bring
> in huge plan changes/bugs for each release which i believe all the users of
> Calcite don't want to see. And on one can guarantee that modifying directly
> can keep good stability and compatibility, only the test set do.
> 
> From the experience of Alibaba Blink planner which has contributed to
> Apache Flink, yes, the old/new planner would co-exist at least for 2 years.
> For the reasons that the new and old planner has different ability in some
> corner cases.
> 
> From my point of view, we should at least:
> - Give a convincing test set for the new planner that makes us believe the
> new planner is stable and powerful enough. I mean obviously the current
> rule tests are far away from enough to support the new planner
> - We should give a more detailed design doc about the new planner,
> especially about the interfaces changes and any change that would bring in
> the compatibility problem. Then we can make more accurate decision how much
> work the new planner would bring in, until then, we can decide if switch to
> a pure new planner development is a good idea or modify the existing one.
> 
> 
> Haisheng Yuan  于2020年4月22日周三 上午9:45写道:
> 
>> Hi Andrii,
>>
>>> Obviously, from what is written here, I could guess that this would
>> require me to change my physical planning rules, even if only by
>> implementing a marker interface.
>> You don't need to change your physical rules, it will be treated as equal
>> as logical rules and be applied together with the real logical rules, no
>> more logical/physical rules difference. This is also how current
>> VolcanoPlanner works.
>&g

Re: Cannot create physical plan with join

2020-04-11 Thread Roman Kondakov
Hi Tim,

it looks like your physical converter rule for the Join node does not
convert its inputs to your custom FLOWDB convention, and because of that
the PhysicalJoin is trying to get input rows from the LogicalTableScan.
You have:

PhysicalJoin[FLOWDB]
  LogicalTableScan[NONE] <- logical rels have infinite cost
  LogicalTableScan[NONE] <- logical rels have infinite cost

but it should be

PhysicalJoin[FLOWDB]
  PhysicalTableScan[FLOWDB]
  PhysicalTableScan[FLOWDB]

In order to achieve this you need to convert both inputs of the
PhysicalJoin node to the FLOWDB convention using the RelOptRule#convert()
and RelTraitSet#replace(FLOWDBConvention.INSTANCE) methods. You can find
examples in any join converter rule, e.g. BindableJoinRule#convert [1]
(a rough sketch follows below).


[1]
https://github.com/apache/calcite/blob/3755eb5871860f1fd5dc51990129784caa8ac0a4/core/src/main/java/org/apache/calcite/interpreter/Bindables.java#L476
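
For illustration, a minimal sketch of such a converter rule might look like
this (FLOWDBConvention and PhysicalJoin are your own classes, so their exact
constructors are assumptions here):

public class PhysicalJoinRule extends ConverterRule {
  public PhysicalJoinRule() {
    super(LogicalJoin.class, Convention.NONE, FLOWDBConvention.INSTANCE,
        "PhysicalJoinRule");
  }

  @Override public RelNode convert(RelNode rel) {
    LogicalJoin join = (LogicalJoin) rel;
    RelTraitSet traits = join.getTraitSet().replace(FLOWDBConvention.INSTANCE);
    // Convert both inputs to the FLOWDB convention, otherwise the planner is
    // left with the infinitely expensive LogicalTableScan inputs.
    RelNode left = convert(join.getLeft(), traits);
    RelNode right = convert(join.getRight(), traits);
    return new PhysicalJoin(join.getCluster(), traits, left, right,
        join.getCondition(), join.getJoinType());
  }
}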

-- 
Kind Regards
Roman Kondakov


On 11.04.2020 14:22, Tim Fox wrote:
> Hi All,
> 
> I have recently started using Calcite as the query parser/planner for a
> side project. I have created a set of RelNodes corresponding to my physical
> nodes, and a set of rules. I have created my own convention.
> 
> All works well for queries without a join - my physical nodes are
> created fine (aggregates, projections, filters, table scans, all ok).
> 
> When I try and transform to my physical plan where the query contains a
> join, I get the following error:
> 
> "There are not enough rules to produce a node with desired properties:
> convention=FLOWDB, sort=[]. All the inputs have relevant nodes, however the
> cost is still infinite."
> 
> (full error output at bottom of the post)
> 
> I stumbled upon this post when googling this:
> 
> https://issues.apache.org/jira/browse/CALCITE-3255
> 
> I have checked and I am specifying my convention when transforming to the
> physical plan, and my rules seem to be set up ok.
> 
> There is one comment in the above linked issue that is perhaps relevant
> 
> "You should also supply the metadata in you convention nodes, so that our
> metadata system can compute the cumulative cost correctly."
> 
> But I don't really understand what this means. Can someone explain to a
> newb like me what metadata is required and how I provide it?
> 
> Many thanks,
> 
> full error report:
> 
> _INITIAL: There are not enough rules to produce a node with desired
> properties: convention=FLOWDB, sort=[]. All the inputs have relevant nodes,
> however the cost is still infinite.
> 
> Root: rel#76:Subset#3.FLOWDB.[]
> 
> Original rel:
> 
> LogicalProject(subset=[rel#76:Subset#3.FLOWDB.[]], sensor_id=[$0],
> temp=[$2], name=[$4], country=[$5]): rowcount = 1500.0, cumulative cost =
> {1500.0 rows, 6000.0 cpu, 0.0 io}, id = 74
> 
>   LogicalJoin(subset=[rel#73:Subset#2.NONE.[]], condition=[=($1, $3)],
> joinType=[left]): rowcount = 1500.0, cumulative cost = {1500.0 rows, 0.0
> cpu, 0.0 io}, id = 72
> 
> LogicalTableScan(subset=[rel#70:Subset#0.NONE.[]],
> table=[[latest_sensor_readings]]): rowcount = 100.0, cumulative cost =
> {100.0 rows, 101.0 cpu, 0.0 io}, id = 65
> 
> LogicalTableScan(subset=[rel#71:Subset#1.NONE.[]],
> table=[[current_locations]]): rowcount = 100.0, cumulative cost = {100.0
> rows, 101.0 cpu, 0.0 io}, id = 66
> 
> Sets:
> 
> Set#0, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp)
> 
> rel#70:Subset#0.NONE.[], best=null, importance=0.7291
> 
> rel#65:LogicalTableScan.NONE.[](table=[latest_sensor_readings]),
> rowcount=100.0, cumulative cost={inf}
> 
> rel#84:Subset#0.FLOWDB.[], best=rel#83, importance=0.36455
> 
> rel#83:PhysicalTableScan.FLOWDB.[](table=[latest_sensor_readings]),
> rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> 
> Set#1, type: RecordType(BIGINT location_id, VARCHAR name, VARCHAR country)
> 
> rel#71:Subset#1.NONE.[], best=null, importance=0.7291
> 
> rel#66:LogicalTableScan.NONE.[](table=[current_locations]), rowcount=100.0,
> cumulative cost={inf}
> 
> rel#82:Subset#1.FLOWDB.[], best=rel#81, importance=0.36455
> 
> rel#81:PhysicalTableScan.FLOWDB.[](table=[current_locations]),
> rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 io}
> 
> Set#2, type: RecordType(VARCHAR sensor_id, BIGINT location_id, DOUBLE temp,
> BIGINT location_id0, VARCHAR name, VARCHAR country)
> 
> rel#73:Subset#2.NONE.[], best=null, importance=0.81
> 
> rel#72:LogicalJoin.NONE.[](left=RelSubset#70,right=RelSubset#71,condition==($1,
> $3),joinType=left), rowcount=1500.0, cumulative cost={inf}
> 
> rel#78:Subset#2.FLOWDB.[], best=null, importance=0.9
> 
> rel#80:PhysicalJoin.FLOWDB.[](le

[jira] [Created] (CALCITE-3885) Reestablish trace logging for rules queue and Volcano planner

2020-03-28 Thread Roman Kondakov (Jira)
Roman Kondakov created CALCITE-3885:
---

 Summary: Reestablish trace logging for rules queue and Volcano 
planner
 Key: CALCITE-3885
 URL: https://issues.apache.org/jira/browse/CALCITE-3885
 Project: Calcite
  Issue Type: Improvement
  Components: core
Reporter: Roman Kondakov
 Fix For: next


After fixing CALCITE-3753 the ability to trace the rule queue and the 
{{VolcanoPlanner}} search space state has disappeared. We should restore this 
ability.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Join the community

2020-03-24 Thread Roman Kondakov
Hi everybody!

My name is Roman Kondakov. I use Calcite for building a SQL layer on
top of Apache Ignite.

I would like to join the Calcite community. I will start with minor
tasks (I have a couple in mind) to understand your processes better.
I would also appreciate any help.

My Jira id is rkondakov.

Thank you in advance!

-- 
Kind Regards
Roman Kondakov



Re: When will the exchange node(Distribution) be added to the execution plan

2020-02-04 Thread Roman Kondakov
Hi Aron,

> 1. It seems in Calcite's main query process(via Prepare#prepareSql)
> there's no code to `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
> and even no config, anyone know why?

AFAIK distributed systems that use Calcite as a query optimizer (like
Drill, Flink, Ignite, etc.) usually build their own planning
infrastructure. They don't use Prepare#prepareSql; instead the Parser,
SqlToRelConverter and Volcano planner are used directly. See the
example in [1]. In this case you have more flexibility and you are
absolutely free to add any traits to the planner.
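
For illustration, this "direct" usage typically looks roughly like this (the
SQL parsing/validation part is omitted, and logicalRel is assumed to be the
output of SqlToRelConverter):

VolcanoPlanner planner = new VolcanoPlanner();
planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
// ... add rules, then parse/validate SQL and convert it to logicalRel ...

RelNode root = planner.changeTraits(logicalRel,
    logicalRel.getTraitSet().replace(EnumerableConvention.INSTANCE));
planner.setRoot(root);
RelNode best = planner.findBestExp();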


> 2. I enable `useAbstractConvertersForConversion` and only register SMJ
> rule, the table has no collation when optimizing, it occurs error:
> 
> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)

Could you show the full stack trace of the error, as well as all the rules
that you supplied to the planner?


Thanks.


[1]
https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite


-- 
Kind Regards
Roman Kondakov


On 03.02.2020 16:38, JiaTao Tao wrote:
> The detail message is as follows, and I can see LogicalSort and
> LogicalExchange has been generated though ExpandConversionRule.
> 
> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
> There are 2 empty subsets:
> Empty subset 0: rel#47:Subset#0.ENUMERABLE.[0].hash[0], the relevant
> part of the original plan is as follows
> 7:EnumerableTableScan(table=[[USERS]])
> 
> Empty subset 1: rel#49:Subset#1.ENUMERABLE.[0].hash[0], the relevant
> part of the original plan is as follows
> 8:EnumerableTableScan(table=[[JOBS]])
> 
> My table has no collation.
> 
> Regards!
> 
> Aron Tao
> 
> 
> JiaTao Tao  于2020年2月3日周一 下午8:38写道:
> 
>> Thank you very much, now I can see distribution in RelTrait, and I still
>> have some doubts:
>> 1. It seems in Calcite's main query process(via Prepare#prepareSql)
>> there's no code to `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
>> and even no config, anyone know why?
>> 2. I enable `useAbstractConvertersForConversion` and only register SMJ
>> rule, the table has no collation when optimizing, it occurs error:
>>
>> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
>>
>>
>> And when the table exposes collation, it just fine. How to make calcite
>> automatically add sort nodes, like Spark's ensure requirements.
>>
>> Regards!
>>
>> Aron Tao
>>
>>
>> Roman Kondakov  于2020年2月2日周日 下午7:26写道:
>>
>>> Hi
>>>
>>> If you want the distribution trait to be taken into account by
>>> optimizer, you need to register it:
>>>
>>> VolcanoPlanner planner = ...;
>>> planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
>>>
>>> See example in [1].
>>>
>>> [1]
>>>
>>> https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377
>>>
>>>
>>> --
>>> Kind Regards
>>> Roman Kondakov
>>>
>>>
>>> On 02.02.2020 08:01, JiaTao Tao wrote:
>>>> Hi
>>>> I wonder when will the exchange node be added to the execution plan. For
>>>> example, In Spark, if a join is SMJ(SortMergeJoin), it will add an
>>>> exchange and a sort node to the execution plan:
>>>>
>>>> [image: Spark execution plan with Exchange and Sort nodes]
>>>>
>>>> In Calcite, Let me use CsvTest#testReadme for example and I can find a
>>>> sorting trait if the join is SMJ, but I can not find an exchange.
>>>>
>>>> The SQL:
>>>>
>>>> SELECT d.name <http://d.name>, COUNT(*) cnt
>>>> FROM emps AS e
>>>> JOIN depts AS d ON e.deptno = d.deptno
>>>> GROUP BY d.name <http://d.name>;
>>>>
>>>> The plan in volcano planner, see
>>>> `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]`, we can see the
>>>> conversion and the Collation, but no distribution.
>>>>
>>>> appendix
>>>>
>>>> Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
>>>> rel#51:Subset#6.NONE.[], best=null, importance=0.6561
>>>>
>>>>
>>> rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2,
>>>> $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>>>>
>>>>
>>> rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0),
>>>> rowcount=1500.0, cumulative cost={inf}
>>>> rel#55:Subset#6.ENUMERABLE.[], best=rel#78,
>>>> importance=0.7291
>>>>
>>>>
>>> rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0),
>>>> rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu,
>>>> 0.0 io}
>>>> rel#76:EnumerableMergeJoin.ENUMERABLE.[[0],
>>>> [2]](left=RelSubset#74,right=RelSubset#75,condition==($2,
>>>> $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>>>>
>>>>
>>> rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0,
>>>> $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809
>>>> rows, 126.25 cpu, 0.0 io}
>>>>
>>>> --
>>>> Regards!
>>>>
>>>> Aron Tao
>>>>
>>>>
>>>> --
>>>>
>>>> Regards!
>>>>
>>>> Aron Tao
>>>>
>>>
>>
> 


Re: When will the exchange node(Distribution) be added to the execution plan

2020-02-02 Thread Roman Kondakov
Hi

If you want the distribution trait to be taken into account by the
optimizer, you need to register it:

VolcanoPlanner planner = ...;
planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);

See example in [1].

[1]
https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377
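
Note that an Exchange node typically shows up only when some operator
actually requires a non-default distribution. For illustration, you could
request one for the root like this (a rough sketch):

RelTraitSet desired = rel.getTraitSet()
    .replace(EnumerableConvention.INSTANCE)
    .replace(RelDistributions.hash(ImmutableList.of(0)));
RelNode newRoot = planner.changeTraits(rel, desired);
planner.setRoot(newRoot);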


-- 
Kind Regards
Roman Kondakov


On 02.02.2020 08:01, JiaTao Tao wrote:
> Hi
> I wonder when will the exchange node be added to the execution plan. For
> example, In Spark, if a join is SMJ(SortMergeJoin), it will add an
> exchange and a sort node to the execution plan:
> 
> [image: Spark execution plan with Exchange and Sort nodes]
> 
> In Calcite, Let me use CsvTest#testReadme for example and I can find a
> sorting trait if the join is SMJ, but I can not find an exchange.
> 
> The SQL:
> 
> SELECT d.name <http://d.name>, COUNT(*) cnt
> FROM emps AS e
> JOIN depts AS d ON e.deptno = d.deptno
> GROUP BY d.name <http://d.name>;
> 
> The plan in volcano planner, see
> `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]`, we can see the
> conversion and the Collation, but no distribution.
> 
> appendix
> 
> Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
>     rel#51:Subset#6.NONE.[], best=null, importance=0.6561
>        
> rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2,
> $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>        
> rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0),
> rowcount=1500.0, cumulative cost={inf}
>     rel#55:Subset#6.ENUMERABLE.[], best=rel#78,
> importance=0.7291
>        
> rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0),
> rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu,
> 0.0 io}
>         rel#76:EnumerableMergeJoin.ENUMERABLE.[[0],
> [2]](left=RelSubset#74,right=RelSubset#75,condition==($2,
> $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>        
> rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0,
> $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809
> rows, 126.25 cpu, 0.0 io}
> 
> --
> Regards!
> 
> Aron Tao
> 
> 
> -- 
> 
> Regards!
> 
> Aron Tao
> 


Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-12 Thread Roman Kondakov
@Haisheng

> Calcite uses Project operator and all kinds of ProjectXXXTranposeRule to 
> prune unused columns.

I also noticed that in most cases Project-related rules took a significant
part of the planning time. But I haven't explored this problem yet.

> MEMO group (RelSet) represents logically equivalent expressions. All the 
> expressions in one group should share the same logical properties, e.g. 
> functional dependency, constraint properties etc. But they are not sharing 
> it. Computation is done for each individual operator.

I thought the equivalence of logical properties within groups (RelSets)
was implicit. For example, in RelSet#addInternal it is always verified
that the newly added node has the same type as the other members of the set.

Anyway, I absolutely agree with you that the problems with trait
propagation, rule matching and the others that you mentioned in the previous
e-mails should be solved first. We first need to make the
Volcano optimizer work right and only then make improvements like
search space pruning.

I would love to join this work to improve the Volcano planner. Looking
forward to the design doc.


-- 
Kind Regards
Roman Kondakov


On 11.01.2020 11:42, Haisheng Yuan wrote:
> Currently, Calcite uses Project operator and all kinds of 
> ProjectXXXTranposeRule to prune unused columns. Every operator's output 
> columns use index to reference child operators' columns. If there is a 
> Project operator with child operator of a Filter, if we push project down 
> under Filter, we will have Project-Filter-Project-FilterInput. All the newly 
> generated relnodes will trigger rule matches. e.g. If we already did 
> ReduceExpressionRule on filter, but due to the new filter RexCall's input ref 
> index changed, we have to apply ReduceExpressionRule on the new filter again, 
> even there is nothing can be reduced. Similarly new operator transpose/merge 
> rule will be triggered. This can trigger a lot of rule matches.
> 
> MEMO group (RelSet) represents logically equivalent expressions. All the 
> expressions in one group should share the same logical properties, e.g. 
> functional dependency, constraint properties etc. But they are not sharing 
> it. Computation is done for each individual operator.
> 
> Without resolving those issue, space pruning won't help much.
> 
> There are a lot of room for improvement. Hope the community can join the 
> effort to make Calcite better. 
> 
> - Haisheng
> 
> --
> 发件人:Roman Kondakov
> 日 期:2020年01月10日 19:39:51
> 收件人:
> 主 题:Re: [DISCUSS] Proposal to add API to force rules matching specific rels
> 
> @Haisheng, could you please clarify what you mean by these points?
> 
>> - the poor-design of column pruning, 
>> - lack of group properties etc.
> 
> I guess I'm not aware of these problems.
> 


Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-10 Thread Roman Kondakov
@Haisheng, could you please clarify what you mean by these points?

> - the poor-design of column pruning, 
> - lack of group properties etc.

I guess I'm not aware of these problems.

-- 
Kind Regards
Roman Kondakov


On 08.01.2020 02:21, Haisheng Yuan wrote:
>> @Haisheng, are you doing something like that?
> Kind of, but not exactly. It is about on-demand trait propagation.
> 
> @Roman seems to be keen on space pruning for Calcite. But IMHO, for now, the 
> main reason of Calcite's poor performance is not lack of branch & bound space 
> puning, but 
> - rule applying on physical nodes, 
> - randomness of rule matching,
> - the poor-design of column pruning, 
> - lack of on-demand trait propagation, 
> - lack of group properties etc.
> 
> We tried a similar change with Roman's on our product. We totally removed 
> rule match importance and its comparison, split it into exploration, 
> implementation, enforcement 3 phases with specific top-down/bottom-up order, 
> it achieved almost 100% speedup.
> Even @vlsi's RexNode normalization can improve it to some degree.
> 
> Calcite currently generates only 1 join-order alternative for 6-way joins in 
> testJoinManyWay, not even top 10, 100  or N! ordering alternatives, but it 
> still can't finish within reasonable amount of time when abstract converter 
> is allowed. If there is only 1 join order alternative, the query optimizer 
> should finish the optimization quickly even for clique or chain queries with 
> 20 way joins, without space pruning. But this is not the case for Calcite.
> 
> Simply put it, space pruning is important for optimization, especially for 
> join-reordering, but not an urgent issue for Calcite.
> 
> - Haisheng
> 
> --
> 发件人:Roman Kondakov
> 日 期:2020年01月08日 02:39:19
> 收件人:
> 主 题:Re: [DISCUSS] Proposal to add API to force rules matching specific rels
> 
> I forgot to mention that this approach was inspired by Stamatis's idea [1]
> 
> [1]
> https://ponymail-vm.apache.org/_GUI_/thread.html/d8f8bc0efd091c0750534ca5cd224f4dfe8940c9d0a99ce486516fd5@%3Cdev.calcite.apache.org%3E
> 
> 


Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-07 Thread Roman Kondakov
I forgot to mention that this approach was inspired by Stamatis's idea [1]

[1]
https://ponymail-vm.apache.org/_GUI_/thread.html/d8f8bc0efd091c0750534ca5cd224f4dfe8940c9d0a99ce486516fd5@%3Cdev.calcite.apache.org%3E


-- 
Kind Regards
Roman Kondakov


On 07.01.2020 21:14, Roman Kondakov wrote:
> Hello!
> 
> As Vladimir Ozerov mentioned, the general problem here is that
> VolcanoPlanner applies both logical and physical rules in a top-down
> manner. It's more like the original volcano paper approach [1]:
> 
>> In the Volcano search strategy, a first phase applied all
>> transformation rules to create all possible logical expressions for a query 
>> and all its subtrees. The second phase,
>> which performed the actual optimization, navigated within that network of 
>> equivalence classes and expressions,
>> applied implementation rules to obtain plans, and determined the best plan.
> The Cascades paper [2] says
> 
>> In the Cascades optimizer, this separation into two phases is abolished, 
>> because it is not useful to derive all
>> logically equivalent forms of all expressions, e.g., of a predicate. A group 
>> is explored using transformation rules
>> only on demand
> 
> Unfortunately, it is not clear from this paper what the actual "on
> demand" phrase means. But there is a great explanation of the Cascades
> optimization process can be found in [3]:
> 
>> It begins with the input query and, ..., 
>> proceeds top-down, using recursion on the inputs
>> of the current multiexpression MExpr. However, plan
>> costing actually proceeds bottom-up, based on the order
>> of the returns from the top-down recursive calls. 
> 
> and this is what VolcanoPlanner lacks for: when a logical rule is
> applied in a top-down way and a new LogicalRelNode is created, we need
> to understand whether the overall plan cost was improved by this move or
> not. In order to understand it we need to know the actual cost of the
> new plan. As [3] says:
> 
>> A plan is an expression made up entirely of physical operators.
> 
> Hence, to evaluate the plan cost we need to have fully implemented
> physical tree of operators for our new LogicalRelNode which was created
> during applying the logical rule above.
> 
> This process can be described with the pseudo-code:
> 
> for (LogicalRuleMatch rule : LogicalRuleMatchesSortedTopDown) {
>   LogicalNode newNode = rule.apply();
>   implementRecursivelyBottomUp(newNode);
> }
> 
> we first apply logical rules in a top-down fashion and then after each
> logical rule invocation we need to implement newly created nodes with
> the physical rules recursively in a bottom-up way.
> 
> The outcome here is that when the physical converter rule is applied to a
> RelNode, all children of this node are already implemented, so their
> traits can easily be derived and the precise cost of them can be calculated.
> 
> I tried to simulate this behavior with minor changes in
> RuleQueue.RuleMatchImportanceComparator (see PR [4]) and it worked well
> for me. Tests started to perform better. I counted planner ticks (see
> VolcanoPlanner#findBestExp) to evaluate performance improvements. For
> example, the number of ticks in JdbcTest#testJoinManyWay before
> comparator change:
> 
> ticks=11
> ticks=1041
> ticks=31362
> 
> and after:
> 
> ticks=11
> ticks=678
> ticks=26231
> 
> In some Apache Ignite test scenarios the number of ticks was reduced by
> half.
> 
> The essence of this change is the minor adjustment of
> RuleQueue.RuleMatchImportanceComparator:
> 
> - converter rules (physical rules) always have priority over the logical
> ones (if there are physical rules in the queue, they will be applied first)
> - Physical rules are ordered in a bottom-up fashion
> - Logical rules are ordered in a top-down fashion
> 
> For example, if we have the initial rel:
> 
> LogicalProject
>   LogicalFilter
> 
> the rule queue for this RelNode would be ordered like this:
> 
> 1. PhysicalFilterRule <- child physical rules go first
> 2. PhysicalProjectRule <- then come parent physical rules
> 3. FilterProjectTransposeRule <- logical rules go last
> 
> This minor improvement might help in some cases:
> - slightly reducing the planning time.
> - simplifying the bottom-up trait propagation, because when it's
> time to physically implement a parent node, all its children are
> already physically implemented.
> 
> But in general this approach is not the actual Cascades implementation
> because it is still missing one of the most useful Cascades features: group
> pruning. Details can be found in [3]. Long story short: we can skip
> opt

Re: [DISCUSS] Proposal to add API to force rules matching specific rels

2020-01-07 Thread Roman Kondakov
Hello!

As Vladimir Ozerov mentioned, the general problem here is that
VolcanoPlanner applies both logical and physical rules in a top-down
manner. It's more like the original volcano paper approach [1]:

> In the Volcano search strategy, a first phase applied all
> transformation rules to create all possible logical expressions for a query 
> and all its subtrees. The second phase,
> which performed the actual optimization, navigated within that network of 
> equivalence classes and expressions,
> applied implementation rules to obtain plans, and determined the best plan.
The Cascades paper [2] says

> In the Cascades optimizer, this separation into two phases is abolished, 
> because it is not useful to derive all
> logically equivalent forms of all expressions, e.g., of a predicate. A group 
> is explored using transformation rules
> only on demand

Unfortunately, it is not clear from this paper what the actual "on
demand" phrase means. But there is a great explanation of the Cascades
optimization process can be found in [3]:

> It begins with the input query and, ..., 
> proceeds top-down, using recursion on the inputs
> of the current multiexpression MExpr. However, plan
> costing actually proceeds bottom-up, based on the order
> of the returns from the top-down recursive calls. 

and this is what VolcanoPlanner lacks: when a logical rule is
applied in a top-down way and a new LogicalRelNode is created, we need
to understand whether the overall plan cost was improved by this move or
not. In order to understand it we need to know the actual cost of the
new plan. As [3] says:

> A plan is an expression made up entirely of physical operators.

Hence, to evaluate the plan cost we need to have a fully implemented
physical tree of operators for our new LogicalRelNode, which was created
by applying the logical rule above.

This process can be described with the pseudo-code:

for (LogicalRuleMatch rule : LogicalRuleMatchesSortedTopDown) {
  LogicalNode newNode = rule.apply();
  implementRecursivelyBottomUp(newNode);
}

we first apply logical rules in a top-down fashion and then after each
logical rule invocation we need to implement newly created nodes with
the physical rules recursively in a bottom-up way.
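
To make the intent a bit more concrete, here is a purely illustrative
expansion of the pseudo-code above; all of the types and methods
(LogicalRuleMatch, LogicalNode, applyPhysicalRules, ...) are made up and do
not exist in Calcite:

void optimize(List<LogicalRuleMatch> logicalMatchesTopDown) {
  for (LogicalRuleMatch match : logicalMatchesTopDown) {  // top-down logical pass
    LogicalNode newNode = match.apply();
    implement(newNode);                                   // physical pass for the new node
  }
}

void implement(LogicalNode node) {
  for (LogicalNode input : node.getInputs()) {
    implement(input);             // recurse into children first (bottom-up implementation)
  }
  applyPhysicalRules(node);       // every input already has a physical alternative here,
                                  // so traits can be derived and costs computed precisely
}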

The outcome here is that when the physical converter rule is applied to a
RelNode, all children of this node are already implemented, so their
traits can easily be derived and their precise cost can be calculated.

I tried to simulate this behavior with minor changes in
RuleQueue.RuleMatchImportanceComparator (see PR [4]) and it worked well
for me. Tests started to perform better. I counted planner ticks (see
VolcanoPlanner#findBestExp) to evaluate performance improvements. For
example, the number of ticks in JdbcTest#testJoinManyWay before
comparator change:

ticks=11
ticks=1041
ticks=31362

and after:

ticks=11
ticks=678
ticks=26231

In some Apache Ignite test scenarios the number of ticks was reduced by
half.

The essence of this change is the minor adjustment of
RuleQueue.RuleMatchImportanceComparator:

- converter rules (physical rules) always have priority over the logical
ones (if there are physical rules in the queue, they will be applied first)
- Physical rules are ordered in a bottom-up fashion
- Logical rules are ordered in a top-down fashion

For example, if we have the initial rel:

LogicalProject
  LogicalFilter

the rule queue for this RelNode would be ordered like this:

1. PhysicalFilterRule <- child physical rules go first
2. PhysicalProjectRule <- then come parent physical rules
3. FilterProjectTransposeRule <- logical rules go last
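
For illustration only, a hypothetical comparator expressing this ordering
(this is not the actual RuleQueue.RuleMatchImportanceComparator code; the
Match type with its isPhysical() and depth() methods is made up, where
depth() is the distance of the matched rel from the root of the plan):

import java.util.Comparator;

/** Hypothetical view of a rule match: is it a physical (implementation) rule,
 * and how deep does the matched rel sit in the plan tree? */
interface Match {
  boolean isPhysical();
  int depth();
}

class ImportanceOrdering {
  static final Comparator<Match> IMPORTANCE = (m1, m2) -> {
    if (m1.isPhysical() != m2.isPhysical()) {
      return m1.isPhysical() ? -1 : 1;             // physical rules before logical ones
    }
    return m1.isPhysical()
        ? Integer.compare(m2.depth(), m1.depth())  // physical: deepest rels first (bottom-up)
        : Integer.compare(m1.depth(), m2.depth()); // logical: shallowest rels first (top-down)
  };
}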

This minor improvement might help in some cases:
- slightly reducing the planning time.
- simplifying the bottom-up trait propagation, because when it's
time to physically implement a parent node, all its children are
already physically implemented.

But in general this approach is not the actual Cascades implementation
because it is still missing one of the most useful Cascades features: group
pruning. Details can be found in [3]. Long story short: we can skip
optimization of some RelNodes if it is known that their children have a
bigger cost than the best RelNode in the same Subset. More
fundamental changes are required to fix it.
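
As a minimal illustration of such a pruning check (the method and its callers
are made up; only RelOptCost#isLe is an existing Calcite API), assuming we
know a cost lower bound for a rel and the best cost found so far in its
subset:

import org.apache.calcite.plan.RelOptCost;

/** Returns true if the rel can be skipped: the lower bound of its cost is
 * already worse than the best plan known for its subset. */
static boolean canPrune(RelOptCost relCostLowerBound, RelOptCost bestCostInSubset) {
  return bestCostInSubset != null && !relCostLowerBound.isLe(bestCostInSubset);
}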

@Haisheng, are you doing something like that?


[1] https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b186061.pdf
[2]
https://www.cse.iitb.ac.in/infolab/Data/Courses/CS632/Papers/Cascades-graefe.pdf
[3]
https://15721.courses.cs.cmu.edu/spring2019/papers/22-optimizer1/shapiro-ideas2001.pdf
[4] https://github.com/apache/calcite/pull/1732


-- 
Kind Regards
Roman Kondakov


On 07.01.2020 16:54, Haisheng Yuan wrote:
> It is still on my radar.
> 
> I have been busy these days, but will send out a design doc in a few days. 
> But just a heads up, the change would be larger than everyone expected.
>

Question about Volcano's planner root converters

2020-01-07 Thread Roman Kondakov
Hello!

I have a couple of questions about the method
VolcanoPlanner#ensureRootConverters [1]. In this method we first
calculate the difference of traits between the root subset and other subsets
which belong to the same set as the root:

ImmutableList<RelTrait> difference =
  root.getTraitSet().difference(subset.getTraitSet());

and then, if the size of the difference list equals 1, we register a new
AbstractConverter from the given subset's trait to the root trait.

The problems I can see here are caused by the strict condition that
difference.size() should equal 1 in order to create converters to
the root traits. This requirement may often lead to CannotPlanException.
Let's consider an example: in the root set we have a root node with
traits of distribution SINGLETON and empty sortedness [] (I omitted the
convention trait for simplicity) and a Subset of physical nodes with
distribution HASH and sortedness [1]. The set dump would look like this in
the given case:

Set#1 <-- root set
  rel#29:Subset#1.SINGLETON.[] <-- root subset
[empty subset]
  rel#20:Subset#2.HASH.[1]
rel#27:IgniteTableScan.HASH.[1]

Root converters were not created here because the difference size
between the trait sets was 2: HASH != SINGLETON and [] != [1]. But it looks
strange because an empty collation [] includes [1] as well as all
possible collations. So, the actual trait difference size is 1, and a
converter from HASH to SINGLETON might be created here.

My questions are:
1. Should we change the condition for root converter creation from the
direct trait set difference to a difference that takes the trait hierarchy
into account (like [] includes [1])? See the sketch below.
2. Should we cover the case when the actual trait difference size is
more than 1? E.g. the root traits are SINGLETON.[2] and the neighbor traits
are HASH.[1]. In this case we can create a chain of converters:

Subset.HASH.[1] ->
AbstractConverter[HASH->SINGLETON] ->
AbstractConverter[[1]->[2]] ->
RootSubset[SINGLETON.[2]]

This may help us to avoid CannotPlanException in many cases at the cost
of creating N! chains of abstract converters. I guess N should be
relatively small here (N <= 3) because trait dimensionality is usually
small.
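
Regarding question 1, a minimal sketch of what a hierarchy-aware difference
could look like (illustrative only, not the actual VolcanoPlanner code; the
helper method name is made up, while RelTrait#satisfies and
RelTraitSet#getTrait are existing Calcite APIs):

import java.util.ArrayList;
import java.util.List;
import org.apache.calcite.plan.RelTrait;
import org.apache.calcite.plan.RelTraitSet;

/** Returns the root traits that are not already satisfied by the subset,
 * so that e.g. collation [1] is treated as satisfying the empty collation []. */
static List<RelTrait> hierarchyAwareDifference(RelTraitSet rootTraits,
    RelTraitSet subsetTraits) {
  List<RelTrait> difference = new ArrayList<>();
  for (int i = 0; i < rootTraits.size(); i++) {
    RelTrait rootTrait = rootTraits.getTrait(i);
    RelTrait subsetTrait = subsetTraits.getTrait(i);
    if (!subsetTrait.satisfies(rootTrait)) {
      difference.add(rootTrait);
    }
  }
  return difference;
}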

What do you think?


[1]
https://github.com/apache/calcite/blob/963a266d7b3859d141964217f4d16cae350b10e1/core/src/main/java/org/apache/calcite/plan/volcano/VolcanoPlanner.java#L675

-- 
Kind Regards
Roman Kondakov



Re: Volcano's problem with trait propagation: current state and future

2019-12-05 Thread Roman Kondakov

Vladimir,

thank you for bringing it up. We are facing the same problems in the Apache
Ignite project, and it would be great if the Apache Calcite community could
propose a solution for this issue.

From my point of view the approach with abstract converters looks more
promising, but as you mentioned it suffers from polluting the search space.
The latter can be mitigated by splitting the planning stage into several
phases: you shouldn't register all logical rules in the planner
simultaneously - it looks better to have several iterations of the planning
stage with different sets of rules, as Drill does.
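
As an illustration (not Drill's actual code), assuming a recent Calcite
version, such phases could be expressed with Calcite's Program/Programs
utilities, each phase carrying only its own rule set:

import org.apache.calcite.adapter.enumerable.EnumerableRules;
import org.apache.calcite.rel.rules.CoreRules;
import org.apache.calcite.tools.Program;
import org.apache.calcite.tools.Programs;

/** Builds a two-phase program: logical rewrites first, physical rules second. */
static Program phasedProgram() {
  return Programs.sequence(
      // Phase 1: purely logical rewrites.
      Programs.ofRules(
          CoreRules.FILTER_INTO_JOIN,
          CoreRules.PROJECT_MERGE),
      // Phase 2: physical implementation rules only.
      Programs.ofRules(
          EnumerableRules.ENUMERABLE_FILTER_RULE,
          EnumerableRules.ENUMERABLE_PROJECT_RULE,
          EnumerableRules.ENUMERABLE_JOIN_RULE));
}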


Also I'd like to mention that decoupling the logical planning from the
physical one looks a bit weird to me because it violates the idea of the
Cascades framework. Possibly this decoupling is the consequence of some
performance issues.



--
Kind Regards
Roman Kondakov

On 05.12.2019 14:24, Vladimir Ozerov wrote:

Hi,

As I mentioned before, we are building a distributed SQL engine that uses
Apache Calcite for query optimization. The key problem we faced is the
inability to pull the physical traits of child relations efficiently. I'd
like to outline my understanding of the problem (I guess it was already
discussed multiple times) and ask the community to prove or disprove the
existence of that problem and its severity for the products which use
Apache Calcite, and to ask for ideas on how it could be improved in the future.

I'll start with a simplified problem description and mention more
complex use cases later. Consider that we have a logical tree and a set of
implementation rules. Our goal is to find the optimal physical tree by
applying these rules. The classical Cascades-based approach directs the
optimization process from the top to the bottom (hence "top-down").
However, the actual implementation of tree nodes still happens bottom-up.
For the tree L1 <- L2, we enter "optimize(L1)", which recursively delegates
to "optimize(L2)". We then implement children nodes L1 <- [P2', P2''], and
return back to the parent, which is now able to pick promising
implementations of the children nodes and reject bad ones with the
branch-and-bound approach. AFAIK Pivotal's Orca works this way.

Apache Calcite is very different because it doesn't allow such recursion,
so we lose the context of which node requested the child
transformation. This loss of context leads to the following problems:
1) The parent node cannot deduce its physical properties during the
execution of the implementation rule, because Calcite expects the
transformation to be applied before the child nodes are implemented. That is,
if we are optimizing LogicalProject <- LogicalScan, we cannot set the proper
distribution and collation for the to-be-created PhysicalProject, because
they depend on the distribution and collation of the children, which are yet
to be resolved.
2) Branch-and-bound cannot be used because it requires at least one
fully-built physical subtree.

As a result of this limitation, products which rely on Apache Calcite for
query optimization use one or several workarounds:

*1) Guess the physical properties of parent nodes before logical children
are implemented*
*Apache Flink* uses this strategy. The strategy is bad because the
number of combinations of traits grows exponentially with the number of
attributes in the given RelNode, so you either explode the search space or
give up optimization opportunities. Consider the following tree:
LogicalSort[a ASC] <- LogicalFilter <- LogicalScan
The optimal implementation of the LogicalFilter is PhysicalFilter[collation=a
ASC] because it may eliminate the parent sort. But such an optimization should
happen only if we know that there is a physical implementation of the scan
allowing for this sort order, e.g. PhysicalIndexScan[collation=a ASC]. I.e.
we need to know the child physical properties first. Otherwise we fall back
to speculative approaches. With the *optimistic* approach, we emit all
possible combinations of physical properties, with the hope that the child
will satisfy some of them, thus expanding the search space exponentially.
With the *pessimistic* approach, we just miss this optimization opportunity
even if the index exists. Apache Flink uses the pessimistic approach.

*2) Use AbstractConverters*
*Apache Drill* uses this strategy. The idea is to "glue" logical and
physical operators, so that the implementation of a physical child re-triggers
the implementation rule of a logical parent. The flow is as follows:
- Invoke the parent implementation rule - it either doesn't produce new
physical nodes or produces non-optimized physical nodes (as in the Apache
Flink case)
- Invoke the children's implementation rules and create physical children
- Then converters kick in and re-trigger the parent implementation rule through
the creation of an abstract converter
- Finally, the parent implementati

Using secondary indexes for accessing tables

2019-05-11 Thread Roman Kondakov

Hi,

I am investigating the possibility of using Apache Calcite as a
cost-based optimizer for the Apache Ignite project [1]. Apache Calcite looks
like a very powerful and versatile tool for this kind of task. I
started this investigation a couple of days ago - read the docs, played with
examples, and explored some existing integrations (e.g. Hive, Drill,
Phoenix, etc.).


As I understand from [2], Calcite doesn't support secondary indexes, does it?
If so, are there any workarounds to handle this, or integrations where this
feature is implemented? Apache Ignite allows users to create B+tree-based
indexes over tables but lacks a good query optimizer. It
would be great if the execution plan generated by Calcite contained not
only full table scans but also index scans when appropriate.


Thank you in advance!

[1] https://ignite.apache.org/
[2] http://mail-archives.apache.org/mod_mbox/calcite-dev/201505.mbox/%3CCAAqfHNoQnr%2BeUVoH9nFxhB7FEDwdYAjV_Smq%3DY75fxsqGs66Yw%40mail.gmail.com%3E
--
Kind Regards
Roman Kondakov