[ 
https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840194#comment-16840194
 ] 

Ruben Quesada Lopez edited comment on CALCITE-2973 at 5/15/19 9:02 AM:
-----------------------------------------------------------------------

Digging into my previous comment, I believe that there is an alternative that 
may do the job without breaking things (I think):
- EnumerableJoin will no longer extend EquiJoin.
- It will continue having only the original "condition" field (no need to add 
remainCondition as a new field).
- The "condition" will now be any type of condition (equi / non-equi)
- EnumerableJoinRule will generate EnumerableThetaJoin (i.e. NestedLoopJoin) 
for "pure non equi-joins"
- EnumerableJoinRule will generate EnumerableJoin (i.e. HashJoin, with possibly 
extra predicate) for "pure and partial equi-joins"
- Inside EnumerableJoin#implement method, the "remainCondition" will be 
calculated on the fly, using {{Join#analyzecondition}} (or {{JoinInfo#of}}) 
method. If the condition is pure equi, the remainCondition will be null; if the 
condition is not pure equi, the remaining condition will be taken from 
{{NonEquiJoinInfo.remaining}} and will be passed to the new 
{{EnumerableJoin#generatePredicate}} method to create the extra predicate to be 
passed to {{BuiltInMethod.HASH_JOIN.method}}, which can remain as it is right 
now in the PR.

[~hhlai1990], I'm not sure if my explanation above is clear, let me know if you 
have any questions or you see any issues on the logic behind.

[UPDATE after seeing [~hhlai1990] last comment] 
If the only problem with EnumerableJoin not extending EquiJoin is the 
FilterJoinRule, maybe we could change it. The problematic line that you pasted 
in a previous comment:
https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165
could be changed, instead of:
{code:java}
...
!(join instanceof EquiJoin),
...
{code}

We could use:
{code:java}
...
!join.analyzeCondition().isEqui(),
...
{code}

Which I think is safer, because IMHO the {{instaceof EquiJoin}} is deceiving, 
because right now it may be possible to create an EquiJoin instance with a 
non-equi condition (e.g. via {{RelBuilder#semiJoin}}), so I'd say that as a 
general rule it is safer to analyze the condition, rather that using the 
{{instaceof EquiJoin}}. With the modification I'm proposing, we will have 
EnumerableJoins (which do not extend EquiJoin) but which may have pure 
equi-join conditions, so the analyzeCondition should work fine in all cases.


was (Author: rubenql):
Digging into my previous comment, I believe that there is an alternative that 
may do the job without breaking things (I think):
- EnumerableJoin will no longer extend EquiJoin.
- It will continue having only the original "condition" field (no need to add 
remainCondition as a new field).
- The "condition" will now be any type of condition (equi / non-equi)
- EnumerableJoinRule will generate EnumerableThetaJoin (i.e. NestedLoopJoin) 
for "pure non equi-joins"
- EnumerableJoinRule will generate EnumerableJoin (i.e. HashJoin, with possibly 
extra predicate) for "pure and partial equi-joins"
- Inside EnumerableJoin#implement method, the "remainCondition" will be 
calculated on the fly, using {{Join#analyzecondition}} (or {{JoinInfo#of}}) 
method. If the condition is pure equi, the remainCondition will be null; if the 
condition is not pure equi, the remaining condition will be taken from 
{{NonEquiJoinInfo.remaining}} and will be passed to the new 
{{EnumerableJoin#generatePredicate}} method to create the extra predicate to be 
passed to {{BuiltInMethod.HASH_JOIN.method}}, which can remain as it is right 
now in the PR.

[~hhlai1990], I'm not sure if my explanation above is clear, let me know if you 
have any questions or you see any issues on the logic behind.

[UPDATE after seeing [~hhlai1990] last comment] 
If the only problem with EnumerableJoin not extending EquiJoin is the 
FilterJoinRule, maybe we could change it. The problematic line that you pasted 
in a previous comment:
https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165
could be changed, instead of:
{code:java}
...
!(join instanceof EquiJoin),
...
{code}

We could use:
{code:java}
...
!join.analyzeCondition().isEqui(),
...
{code}

Which I think is safer, because IMHO the {{instaceof EquiJoin}} is deceiving, 
because right now it may be possible to create an EquiJoin instance with a 
non-equi condition, so I'd say that as a general rule it is safer to analyze 
the condition, rather that using the {{instaceof EquiJoin}}. With the 
modification I'm proposing, we will have EnumerableJoins (which are not 
EquiJoins) but which may have equi-join conditions, so the analyzeCondition 
should work fine in all cases.

> Allow theta joins that have equi conditions to be executed using a hash join 
> algorithm
> --------------------------------------------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 1.20.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Now the EnumerableMergeJoinRule only supports an inner and equi join.
> If users make a theta-join query  for a large dataset (such as 10000*10000), 
> the nested-loop join process will take dozens of time than the sort-merge 
> join process .
> So if we can apply merge-join or hash-join rule for a theta join, it will 
> improve the performance greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to