[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815943#comment-16815943
 ] 

Lai Zhou edited comment on CALCITE-2973 at 4/18/19 3:02 AM:
------------------------------------------------------------

[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

A join rel node is now converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

See 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

EnumerableJoin can now handle a per-row condition: I introduced a 
remainCondition that is used to generate the predicate for the join.

See

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new algorithm to support joins with a predicate.

See

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]
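For readers who do not want to open the link, here is a minimal, self-contained sketch of the general idea: hash on the equi keys, then test the remaining (non-equi) condition on each candidate pair. It is not the code from the PR. The class name HashJoinWithPredicate, the join signature and the choice to skip null keys are assumptions made for illustration only, and it covers inner joins only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Illustrative only: inner hash join with an extra per-row predicate. */
public class HashJoinWithPredicate {

  public static <L, R, K, T> List<T> join(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> remaining,
      BiFunction<L, R, T> resultSelector) {
    // Build phase: group the right input by its equi-join key.
    Map<K, List<R>> table = new HashMap<>();
    for (R r : right) {
      K key = rightKey.apply(r);
      if (key == null) {
        continue; // assumption: a null key never matches, as in SQL
      }
      table.computeIfAbsent(key, k -> new ArrayList<>()).add(r);
    }
    // Probe phase: look up each left row, then apply the remaining
    // (non-equi) condition to every candidate pair.
    List<T> out = new ArrayList<>();
    for (L l : left) {
      K key = leftKey.apply(l);
      List<R> candidates = key == null ? null : table.get(key);
      if (candidates == null) {
        continue;
      }
      for (R r : candidates) {
        if (remaining.test(l, r)) {
          out.add(resultSelector.apply(l, r));
        }
      }
    }
    return out;
  }
}
```

For a condition such as "l.k = r.k AND l.v < r.v", the key functions would extract k and the remaining predicate would compare v. The non-equi test then runs only on rows that already share a key, instead of on every pair as in a nested-loop join.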


was (Author: hhlai1990):
[~julianhyde],[~zabetak],[~hyuan]

I made a PR to improve the EnumerableJoin.

Since EnumerableMergeJoin is never chosen, I changed the summary to "Allow theta 
joins that have equi conditions to be executed using a hash join algorithm."

A join rel node is now converted to an EnumerableJoin if it has mixed 
equi and non-equi conditions.

See 
[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoinRule.java#L62]

EnumerableJoin can now handle a per-row condition: I introduced a 
remainCondition that is used to generate the predicate for the join.

See

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableJoin.java#L250]

I also introduced a new method to support joins with a predicate; it doesn't 
affect the old join method.

See

[https://github.com/apache/calcite/blob/16098ab6ff68797b4eaad90718dcae8e83047e2b/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061|https://github.com/apache/calcite/blob/2251c82f209612d8ae31e2e7a42acdb2bcb15d55/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L1061]


> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, EnumerableMergeJoinRule only supports inner equi joins.
> If users run a theta-join query on a large dataset (such as 10000*10000), 
> the nested-loop join takes dozens of times longer than a sort-merge 
> join.
> So if we can apply a merge-join or hash-join rule to a theta join, it will 
> greatly improve performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
