Re: When will the exchange node(Distribution) be added to the execution plan

Roman Kondakov Tue, 04 Feb 2020 01:22:01 -0800

Hi Aron,

> 1. It seems that in Calcite's main query path (via Prepare#prepareSql)
> there's no code that calls `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
> and there isn't even a config option for it. Does anyone know why?


AFAIK, distributed systems that use Calcite as a query optimizer (like
Drill, Flink, Ignite, etc.) usually build their own planning
infrastructure. They don't use Prepare#prepareSql. Instead, they use the
parser, SqlToRelConverter and the Volcano planner directly. See the
example in [1]. This gives you more flexibility, and you are free to add
any traits you need to the planner.
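For illustration, here is a minimal sketch of the planner half of such a setup. It assumes a Calcite version from around this time (1.21/1.22) is on the classpath, and the class name `DistributedPlannerSetup` is hypothetical. It builds a `VolcanoPlanner` directly and registers the distribution trait alongside convention and collation. Per the question above, `Prepare#prepareSql` does not do this:

```java
import org.apache.calcite.plan.ConventionTraitDef;
import org.apache.calcite.plan.RelTraitDef;
import org.apache.calcite.plan.volcano.VolcanoPlanner;
import org.apache.calcite.rel.RelCollationTraitDef;
import org.apache.calcite.rel.RelDistributionTraitDef;

/** Hypothetical helper: a Volcano planner that tracks distribution too. */
public class DistributedPlannerSetup {

  public static VolcanoPlanner createPlanner() {
    VolcanoPlanner planner = new VolcanoPlanner();
    // Trait defs must be registered before any RelNode is given to the planner.
    planner.addRelTraitDef(ConventionTraitDef.INSTANCE);
    planner.addRelTraitDef(RelCollationTraitDef.INSTANCE);
    // Without this, the planner ignores the distribution trait entirely.
    planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
    return planner;
  }

  public static void main(String[] args) {
    VolcanoPlanner planner = createPlanner();
    // Print which trait defs the planner now knows about.
    for (RelTraitDef<?> def : planner.getRelTraitDefs()) {
      System.out.println(def.getClass().getSimpleName());
    }
  }
}
```

The `RelNode` tree produced by SqlToRelConverter would then be handed to this planner via `setRoot`, with the desired output traits requested through `changeTraits`. That wiring, and the rule set, are omitted here.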


> 2. I enabled `useAbstractConvertersForConversion` and registered only the SMJ
> rule. The table has no collation, and optimization fails with this error:
> 
> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)

Could you share the full stack trace of the error, as well as all the
rules that you supplied to the planner?


Thanks.


[1]
https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite


-- 
Kind Regards
Roman Kondakov


On 03.02.2020 16:38, JiaTao Tao wrote:
> The detailed message is below. I can see that LogicalSort and
> LogicalExchange have been generated through ExpandConversionRule.
> 
> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
> There are 2 empty subsets:
> Empty subset 0: rel#47:Subset#0.ENUMERABLE.[0].hash[0], the relevant
> part of the original plan is as follows
> 7:EnumerableTableScan(table=[[USERS]])
> 
> Empty subset 1: rel#49:Subset#1.ENUMERABLE.[0].hash[0], the relevant
> part of the original plan is as follows
> 8:EnumerableTableScan(table=[[JOBS]])
> 
> My table has no collation.
> 
> Regards!
> 
> Aron Tao
> 
> 
> On Mon, 3 Feb 2020 at 20:38, JiaTao Tao <taojia...@gmail.com> wrote:
> 
>> Thank you very much, now I can see the distribution in RelTrait. I still
>> have some doubts:
>> 1. It seems that in Calcite's main query path (via Prepare#prepareSql)
>> there's no code that calls `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
>> and there isn't even a config option for it. Does anyone know why?
>> 2. I enabled `useAbstractConvertersForConversion` and registered only the SMJ
>> rule. The table has no collation, and optimization fails with this error:
>>
>> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
>>
>>
>> When the table exposes a collation, it works fine. How can I make Calcite
>> add sort nodes automatically, like Spark's EnsureRequirements rule?
>>
>> Regards!
>>
>> Aron Tao
>>
>>
>> On Sun, 2 Feb 2020 at 19:26, Roman Kondakov <kondako...@mail.ru.invalid> wrote:
>>
>>> Hi
>>>
>>> If you want the distribution trait to be taken into account by the
>>> optimizer, you need to register it:
>>>
>>> VolcanoPlanner planner = ...;
>>> planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
>>>
>>> See the example in [1].
>>>
>>> [1]
>>>
>>> https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377
>>>
>>>
>>> --
>>> Kind Regards
>>> Roman Kondakov
>>>
>>>
>>> On 02.02.2020 08:01, JiaTao Tao wrote:
>>>> Hi
>>>> I wonder when the exchange node is added to the execution plan. For
>>>> example, in Spark, if a join is an SMJ (SortMergeJoin), Spark adds an
>>>> exchange and a sort node to the execution plan:
>>>>
>>>> [image not preserved]
>>>>
>>>> In Calcite, taking CsvTest#testReadme as an example, I can find a
>>>> sorting trait if the join is an SMJ, but I cannot find an exchange.
>>>>
>>>> The SQL:
>>>>
>>>> SELECT d.name, COUNT(*) cnt
>>>> FROM emps AS e
>>>> JOIN depts AS d ON e.deptno = d.deptno
>>>> GROUP BY d.name;
>>>>
>>>> Here is the plan in the Volcano planner. In
>>>> `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]` we can see the
>>>> convention and the collation, but no distribution.
>>>>
>>>> Appendix:
>>>>
>>>> Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
>>>>     rel#51:Subset#6.NONE.[], best=null, importance=0.6561
>>>>         rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>>>>         rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={inf}
>>>>     rel#55:Subset#6.ENUMERABLE.[], best=rel#78, importance=0.7290000000000001
>>>>         rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu, 0.0 io}
>>>>         rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]](left=RelSubset#74,right=RelSubset#75,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>>>>         rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0, $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809 rows, 126.25 cpu, 0.0 io}
>>>>
>>>> --
>>>> Regards!
>>>>
>>>> Aron Tao
>>>>
>>>>
>>>
>>
>
