Re: When will the exchange node(Distribution) be added to the execution plan

JiaTao Tao Tue, 04 Feb 2020 06:57:32 -0800

Thank you for your patience, really appreciate it!


Regards!

Aron Tao


On Tue, Feb 4, 2020 at 5:21 PM, Roman Kondakov <kondako...@mail.ru.invalid> wrote:

> Hi Aron,
>
> > 1. It seems that Calcite's main query path (via Prepare#prepareSql)
> > contains no call to `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
> > and no config option for it either. Does anyone know why?
>
> AFAIK distributed systems that use Calcite as a query optimizer (like
> Drill, Flink, Ignite, etc.) usually build their own planning
> infrastructure. They don't use Prepare#prepareSql. Instead, they use the
> parser, SqlToRelConverter and the Volcano planner directly. See the
> example in [1]. This approach gives you more flexibility, and you are
> free to add any traits to the planner.
>
>
> > 2. I enabled `useAbstractConvertersForConversion` and registered only the
> > SMJ rule. When the table has no collation, optimization fails with this error:
> >
> > Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
>
> Could you show the full stack trace of the error, as well as all the rules
> that you supplied to the planner?
>
>
> Thanks.
>
>
> [1]
> https://www.slideshare.net/JordanHalterman/introduction-to-apache-calcite
>
>
> --
> Kind Regards
> Roman Kondakov
>
>
> On 03.02.2020 16:38, JiaTao Tao wrote:
> > The detailed message is as follows, and I can see that LogicalSort and
> > LogicalExchange have been generated through ExpandConversionRule.
> >
> > Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
> > There are 2 empty subsets:
> > Empty subset 0: rel#47:Subset#0.ENUMERABLE.[0].hash[0], the relevant
> > part of the original plan is as follows
> > 7:EnumerableTableScan(table=[[USERS]])
> >
> > Empty subset 1: rel#49:Subset#1.ENUMERABLE.[0].hash[0], the relevant
> > part of the original plan is as follows
> > 8:EnumerableTableScan(table=[[JOBS]])
> >
> > My table has no collation.
> >
> > Regards!
> >
> > Aron Tao
> >
> >
> > On Mon, Feb 3, 2020 at 8:38 PM, JiaTao Tao <taojia...@gmail.com> wrote:
> >
> >> Thank you very much. Now I can see the distribution in RelTrait, but I
> >> still have some questions:
> >> 1. It seems that Calcite's main query path (via Prepare#prepareSql)
> >> contains no call to `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`,
> >> and no config option for it either. Does anyone know why?
> >> 2. I enabled `useAbstractConvertersForConversion` and registered only the
> >> SMJ rule. When the table has no collation, optimization fails with this error:
> >>
> >> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
> >>
> >>
> >> When the table exposes a collation, it works fine. How can I make Calcite
> >> automatically add sort nodes, like Spark's EnsureRequirements?
> >>
> >> Regards!
> >>
> >> Aron Tao
> >>
> >>
> >> On Sun, Feb 2, 2020 at 7:26 PM, Roman Kondakov <kondako...@mail.ru.invalid> wrote:
> >>
> >>> Hi
> >>>
> >>> If you want the distribution trait to be taken into account by the
> >>> optimizer, you need to register it:
> >>>
> >>> VolcanoPlanner planner = ...;
> >>> planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
> >>>
> >>> See example in [1].
> >>>
> >>> [1]
> >>> https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377
> >>>
> >>> --
> >>> Kind Regards
> >>> Roman Kondakov
> >>>
> >>>
> >>> On 02.02.2020 08:01, JiaTao Tao wrote:
> >>>> Hi
> >>>> I wonder when the exchange node will be added to the execution plan.
> >>>> For example, in Spark, if a join is SMJ (SortMergeJoin), it will add an
> >>>> exchange and a sort node to the execution plan:
> >>>>
> >>>> [attached image: 3631580619602_.pic.jpg]
> >>>>
> >>>> In Calcite, taking CsvTest#testReadme as an example, I can find a
> >>>> sort trait when the join is SMJ, but I cannot find an exchange.
> >>>>
> >>>> The SQL:
> >>>>
> >>>> SELECT d.name, COUNT(*) cnt
> >>>> FROM emps AS e
> >>>> JOIN depts AS d ON e.deptno = d.deptno
> >>>> GROUP BY d.name;
> >>>>
> >>>> Below is the plan in the Volcano planner. In
> >>>> `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]` we can see the
> >>>> conversion and the collation, but no distribution.
> >>>>
> >>>> appendix
> >>>>
> >>>> Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
> >>>>     rel#51:Subset#6.NONE.[], best=null, importance=0.6561
> >>>>         rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
> >>>>         rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={inf}
> >>>>     rel#55:Subset#6.ENUMERABLE.[], best=rel#78, importance=0.7290000000000001
> >>>>         rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu, 0.0 io}
> >>>>         rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]](left=RelSubset#74,right=RelSubset#75,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
> >>>>         rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0, $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809 rows, 126.25 cpu, 0.0 io}
> >>>>
> >>>> --
> >>>> Regards!
> >>>>
> >>>> Aron Tao
> >>>>
> >>>>
> >>>
> >>
> >
>
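
The enforcement step discussed in this thread (adding an exchange and a sort below a sort-merge join only when the input does not already deliver the required distribution and collation) can be sketched as a small toy model. This is an illustration of the idea only: `EnforcerSketch`, `Node`, `enforce` and `mergeJoin` are hypothetical names, not Calcite or Spark classes, and the sketch makes no claim about how either planner implements this internally.

```java
import java.util.List;

/** Toy model of trait enforcement. None of these types are Calcite or Spark classes. */
final class EnforcerSketch {

    /** A physical operator together with the properties (traits) its output delivers. */
    static final class Node {
        final String name;
        final List<Integer> hashKeys; // hash-distribution columns; empty means "none known"
        final List<Integer> sortKeys; // sort columns; empty means "unsorted"
        final List<Node> inputs;

        Node(String name, List<Integer> hashKeys, List<Integer> sortKeys, List<Node> inputs) {
            this.name = name;
            this.hashKeys = hashKeys;
            this.sortKeys = sortKeys;
            this.inputs = inputs;
        }

        /** Renders the subtree on one line, e.g. Sort[0](Exchange[0](Scan[USERS])). */
        String explain() {
            if (inputs.isEmpty()) {
                return name;
            }
            StringBuilder sb = new StringBuilder(name).append('(');
            for (int i = 0; i < inputs.size(); i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(inputs.get(i).explain());
            }
            return sb.append(')').toString();
        }
    }

    /** A scan that delivers no particular distribution or collation. */
    static Node scan(String table) {
        return new Node("Scan[" + table + "]", List.of(), List.of(), List.of());
    }

    /** Wraps the child in Exchange and/or Sort only where its delivered properties fall short. */
    static Node enforce(Node child, List<Integer> hashKeys, List<Integer> sortKeys) {
        Node result = child;
        if (!result.hashKeys.equals(hashKeys)) {
            // Assumption of this toy model: repartitioning discards any existing order.
            result = new Node("Exchange" + hashKeys, hashKeys, List.of(), List.of(result));
        }
        if (!result.sortKeys.equals(sortKeys)) {
            // A local sort keeps whatever distribution the input already has.
            result = new Node("Sort" + sortKeys, result.hashKeys, sortKeys, List.of(result));
        }
        return result;
    }

    /** A merge join requires both inputs hashed and sorted on their join key. */
    static Node mergeJoin(Node left, int leftKey, Node right, int rightKey) {
        Node l = enforce(left, List.of(leftKey), List.of(leftKey));
        Node r = enforce(right, List.of(rightKey), List.of(rightKey));
        return new Node("MergeJoin", l.hashKeys, l.sortKeys, List.of(l, r));
    }

    public static void main(String[] args) {
        Node users = scan("USERS");
        // A table that already exposes hash[0] distribution and collation [0].
        Node jobs = new Node("Scan[JOBS]", List.of(0), List.of(0), List.of());
        System.out.println(mergeJoin(users, 0, jobs, 0).explain());
    }
}
```

In this model the USERS scan, which exposes neither property, gets both an exchange and a sort, while the JOBS scan, which already exposes them, is left untouched. That mirrors the observation in the thread that planning works when the table exposes a collation.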
