[jira] [Created] (CALCITE-3766) Add a Builder to RelHint

2020-02-03 Thread Danny Chen (Jira)
Danny Chen created CALCITE-3766:
---

 Summary: Add a Builder to RelHint
 Key: CALCITE-3766
 URL: https://issues.apache.org/jira/browse/CALCITE-3766
 Project: Calcite
  Issue Type: Sub-task
  Components: core
Affects Versions: 1.21.0
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 1.22.0


Add a builder to RelHint so that it can be constructed conveniently.
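
For illustration only, the builder pattern requested here could look roughly like the sketch below. SketchHint, its fields, and the method names are hypothetical stand-ins chosen for this example; they are not taken from the issue and should not be read as Calcite's actual RelHint API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for a hint; NOT Calcite's actual RelHint class. */
final class SketchHint {
  final String name;
  final List<Integer> inheritPath;
  final List<String> options;

  // Private: instances are created only through the builder.
  private SketchHint(String name, List<Integer> inheritPath, List<String> options) {
    this.name = name;
    this.inheritPath = Collections.unmodifiableList(new ArrayList<>(inheritPath));
    this.options = Collections.unmodifiableList(new ArrayList<>(options));
  }

  /** Entry point: the hint name is the only mandatory field in this sketch. */
  static Builder builder(String name) {
    return new Builder(name);
  }

  @Override public String toString() {
    return name + options;
  }

  /** Collects the optional parts step by step, then builds an immutable hint. */
  static final class Builder {
    private final String name;
    private final List<Integer> inheritPath = new ArrayList<>();
    private final List<String> options = new ArrayList<>();

    private Builder(String name) {
      this.name = Objects.requireNonNull(name, "name");
    }

    /** Appends path elements (assumed here to be plain integers). */
    Builder inheritPath(int... path) {
      for (int p : path) {
        inheritPath.add(p);
      }
      return this;
    }

    /** Appends a single option (assumed here to be a plain string). */
    Builder option(String option) {
      options.add(Objects.requireNonNull(option, "option"));
      return this;
    }

    SketchHint build() {
      return new SketchHint(name, inheritPath, options);
    }
  }

  public static void main(String[] args) {
    // The hint name and options are illustrative values only.
    SketchHint hint = SketchHint.builder("NO_HASH_JOIN")
        .option("EMP")
        .option("DEPT")
        .build();
    System.out.println(hint);
  }
}
```

The point of the pattern is that callers name only the parts they need, instead of passing every field to one long constructor.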



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Calcite-Master - Build # 1590 - Failure

2020-02-03 Thread Apache Jenkins Server
The Apache Jenkins build system has built Calcite-Master (build #1590)

Status: Failure

Check console output at https://builds.apache.org/job/Calcite-Master/1590/ to 
view the results.

Re: Suitability of Avatica for Apache Arrow Flight JDBC Driver

2020-02-03 Thread Jacques Nadeau
Hey Andy,

I totally forgot that Avatica added all that extra stuff. I didn't
originally have it back in the day :)

Drill and Dremio both use Avatica without using the protocol/server/etc.

On Mon, Feb 3, 2020 at 2:45 PM Andy Grove wrote:

> Hi,
>
> I have started building a JDBC driver for Apache Arrow Flight [1] and it
> has been suggested that I use Avatica instead of building from scratch.
> However, I'm not sure if Avatica is really designed for this use case since
> I would not require the Avatica wire protocol or server process. The Flight
> JDBC driver needs to use the Flight protocol [2] to interact with servers
> implementing that protocol.
>
> I could definitely see value in extending Avatica base classes to get
> things like all the ResultSet type conversion logic and DatabaseMetaData
> functionality since that is tedious to implement but it wasn't clear from
> the documentation if that was possible. I also have a slight concern
> (possibly unfounded) about basing the driver on the Avatica type system
> rather than the Arrow type system in case there are concepts that don't map
> cleanly.
>
> I'd appreciate any advice on the best path here.
>
> Thanks,
>
> Andy.
>
> [1] https://github.com/apache/arrow/pull/6343
> [2] https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/
>


Suitability of Avatica for Apache Arrow Flight JDBC Driver

2020-02-03 Thread Andy Grove
Hi,

I have started building a JDBC driver for Apache Arrow Flight [1] and it
has been suggested that I use Avatica instead of building from scratch.
However, I'm not sure if Avatica is really designed for this use case since
I would not require the Avatica wire protocol or server process. The Flight
JDBC driver needs to use the Flight protocol [2] to interact with servers
implementing that protocol.

I could definitely see value in extending Avatica base classes to get
things like all the ResultSet type conversion logic and DatabaseMetaData
functionality since that is tedious to implement but it wasn't clear from
the documentation if that was possible. I also have a slight concern
(possibly unfounded) about basing the driver on the Avatica type system
rather than the Arrow type system in case there are concepts that don't map
cleanly.

I'd appreciate any advice on the best path here.

Thanks,

Andy.

[1] https://github.com/apache/arrow/pull/6343
[2] https://arrow.apache.org/blog/2019/10/13/introducing-arrow-flight/


Question about HASH_DISTRIBUTED

2020-02-03 Thread JiaTao Tao
Hi

When I look at the examples of HASH_DISTRIBUTED,
e.g. RelDistributions.hash(ImmutableList.of(0)), or at
RelDistributionImpl#satisfies,
I can't find any information about the number of hash partitions or the
hash function. If we don't know these, how can we perform a Bucket Join
(Collocated Join)?
Am I missing something?
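
To make the concern concrete, here is a minimal sketch of why the bucket count and hash function matter for a collocated join. BucketSketch and its methods are hypothetical names invented for this example and are not part of Calcite. It only shows that equal keys on both sides land in the same bucket when both sides agree on the hash function and the number of buckets.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical helper, NOT part of Calcite: models hash bucketing of join keys. */
final class BucketSketch {
  private BucketSketch() {}

  /** Maps a hash value to a bucket index in the range [0, numBuckets). */
  static int bucket(int hash, int numBuckets) {
    return Math.floorMod(hash, numBuckets);
  }

  /**
   * Returns true if every key is assigned the same bucket index on both sides,
   * which is what a bucket join relies on to avoid a shuffle.
   */
  static boolean coLocated(int[] keys,
      IntUnaryOperator leftHash, int leftBuckets,
      IntUnaryOperator rightHash, int rightBuckets) {
    for (int key : keys) {
      int left = bucket(leftHash.applyAsInt(key), leftBuckets);
      int right = bucket(rightHash.applyAsInt(key), rightBuckets);
      if (left != right) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    int[] keys = {1, 2, 3, 4, 5, 6, 7};
    IntUnaryOperator identity = k -> k;
    IntUnaryOperator times31 = k -> k * 31;
    // Same function, same bucket count: matching keys share a bucket.
    System.out.println(coLocated(keys, identity, 4, identity, 4));
    // Same function, different bucket count: key 4 goes to bucket 0 vs 4.
    System.out.println(coLocated(keys, identity, 4, identity, 8));
    // Different function, same bucket count: key 1 goes to bucket 1 vs 3.
    System.out.println(coLocated(keys, identity, 4, times31, 4));
  }
}
```

Knowing only the key columns of a hash distribution is therefore not enough to decide that two inputs are collocated; the function and bucket count have to match as well.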

Regards!

Aron Tao


Re: When will the exchange node(Distribution) be added to the execution plan

2020-02-03 Thread JiaTao Tao
The detailed message is as follows, and I can see that LogicalSort and
LogicalExchange have been generated through ExpandConversionRule.

Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
There are 2 empty subsets:
Empty subset 0: rel#47:Subset#0.ENUMERABLE.[0].hash[0], the relevant
part of the original plan is as follows
7:EnumerableTableScan(table=[[USERS]])

Empty subset 1: rel#49:Subset#1.ENUMERABLE.[0].hash[0], the relevant
part of the original plan is as follows
8:EnumerableTableScan(table=[[JOBS]])

My table has no collation.

Regards!

Aron Tao


On Mon, Feb 3, 2020 at 8:38 PM JiaTao Tao wrote:

> Thank you very much. Now I can see the distribution in RelTrait, but I
> still have some questions:
> 1. It seems that in Calcite's main query path (via Prepare#prepareSql)
> there is no code that calls
> `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`, and no config option
> for it either. Does anyone know why?
> 2. I enabled `useAbstractConvertersForConversion` and registered only the
> SMJ rule. When the table has no collation, optimization fails with this
> error:
>
> Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)
>
>
> When the table exposes a collation, it works fine. How can I make Calcite
> automatically add sort nodes, like Spark's EnsureRequirements rule?
>
> Regards!
>
> Aron Tao
>
>
> On Sun, Feb 2, 2020 at 7:26 PM Roman Kondakov wrote:
>
>> Hi
>>
>> If you want the distribution trait to be taken into account by
>> optimizer, you need to register it:
>>
>> VolcanoPlanner planner = ...;
>> planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);
>>
>> See example in [1].
>>
>> [1]
>>
>> https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377
>>
>>
>> --
>> Kind Regards
>> Roman Kondakov
>>
>>
>> On 02.02.2020 08:01, JiaTao Tao wrote:
>> > Hi
>> > I wonder when the exchange node is added to the execution plan. For
>> > example, in Spark, if a join is an SMJ (SortMergeJoin), an exchange
>> > and a sort node are added to the execution plan:
>> >
>> > 3631580619602_.pic.jpg
>> >
>> > In Calcite, taking CsvTest#testReadme as an example, I can find a
>> > sorting trait if the join is an SMJ, but I cannot find an exchange.
>> >
>> > The SQL:
>> >
>> > SELECT d.name, COUNT(*) cnt
>> > FROM emps AS e
>> > JOIN depts AS d ON e.deptno = d.deptno
>> > GROUP BY d.name;
>> >
>> > In the plan from the Volcano planner, see
>> > `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]`: we can see the
>> > conversion and the collation, but no distribution.
>> >
>> > appendix
>> >
>> > Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
>> > rel#51:Subset#6.NONE.[], best=null, importance=0.6561
>> > rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>> > rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={inf}
>> > rel#55:Subset#6.ENUMERABLE.[], best=rel#78, importance=0.7291
>> > rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu, 0.0 io}
>> > rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]](left=RelSubset#74,right=RelSubset#75,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
>> > rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0, $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809 rows, 126.25 cpu, 0.0 io}
>> >
>> > --
>> > Regards!
>> >
>> > Aron Tao
>> >
>> >
>> > --
>> >
>> > Regards!
>> >
>> > Aron Tao
>> >
>>
>


Re: When will the exchange node(Distribution) be added to the execution plan

2020-02-03 Thread JiaTao Tao
Thank you very much. Now I can see the distribution in RelTrait, but I still
have some questions:
1. It seems that in Calcite's main query path (via Prepare#prepareSql) there
is no code that calls `addRelTraitDef(RelDistributionTraitDef.INSTANCE)`, and
no config option for it either. Does anyone know why?
2. I enabled `useAbstractConvertersForConversion` and registered only the SMJ
rule. When the table has no collation, optimization fails with this error:

Missing conversions are EnumerableTableScan[sort: [] -> [0]] (2 cases)


When the table exposes a collation, it works fine. How can I make Calcite
automatically add sort nodes, like Spark's EnsureRequirements rule?

Regards!

Aron Tao


On Sun, Feb 2, 2020 at 7:26 PM Roman Kondakov wrote:

Hi

If you want the distribution trait to be taken into account by
optimizer, you need to register it:

VolcanoPlanner planner = ...;
planner.addRelTraitDef(RelDistributionTraitDef.INSTANCE);

See example in [1].

[1]
https://github.com/apache/calcite/blob/a6f544eb48a87f4f71f76ed422584398c0c9baa3/core/src/test/java/org/apache/calcite/test/RelOptRulesTest.java#L6377


-- 
Kind Regards
Roman Kondakov


On 02.02.2020 08:01, JiaTao Tao wrote:
> Hi
> I wonder when the exchange node is added to the execution plan. For
> example, in Spark, if a join is an SMJ (SortMergeJoin), an exchange and a
> sort node are added to the execution plan:
>
> 3631580619602_.pic.jpg
>
> In Calcite, taking CsvTest#testReadme as an example, I can find a
> sorting trait if the join is an SMJ, but I cannot find an exchange.
>
> The SQL:
>
> SELECT d.name, COUNT(*) cnt
> FROM emps AS e
> JOIN depts AS d ON e.deptno = d.deptno
> GROUP BY d.name;
>
> In the plan from the Volcano planner, see
> `rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]]`: we can see the
> conversion and the collation, but no distribution.
>
> appendix
>
> Set#6, type: RecordType(INTEGER DEPTNO, VARCHAR NAME, INTEGER DEPTNO0)
> rel#51:Subset#6.NONE.[], best=null, importance=0.6561
> rel#49:LogicalJoin.NONE.[](left=RelSubset#30,right=RelSubset#29,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
> rel#60:LogicalProject.NONE.[](input=RelSubset#32,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={inf}
> rel#55:Subset#6.ENUMERABLE.[], best=rel#78, importance=0.7291
> rel#70:EnumerableProject.ENUMERABLE.[](input=RelSubset#46,DEPTNO=$1,NAME=$2,DEPTNO0=$0), rowcount=1500.0, cumulative cost={3686.517018598809 rows, 4626.25 cpu, 0.0 io}
> rel#76:EnumerableMergeJoin.ENUMERABLE.[[0], [2]](left=RelSubset#74,right=RelSubset#75,condition==($2, $0),joinType=inner), rowcount=1500.0, cumulative cost={inf}
> rel#78:EnumerableHashJoin.ENUMERABLE.[](left=RelSubset#30,right=RelSubset#69,condition==($0, $2),joinType=inner), rowcount=1500.0, cumulative cost={2185.517018598809 rows, 126.25 cpu, 0.0 io}
>
> --
> Regards!
>
> Aron Tao
>
>
> --
>
> Regards!
>
> Aron Tao
>