[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840194#comment-16840194 ]
Ruben Quesada Lopez edited comment on CALCITE-2973 at 5/15/19 9:02 AM: ----------------------------------------------------------------------- Digging into my previous comment, I believe that there is an alternative that may do the job without breaking things (I think): - EnumerableJoin will no longer extend EquiJoin. - It will continue having only the original "condition" field (no need to add remainCondition as a new field). - The "condition" will now be any type of condition (equi / non-equi) - EnumerableJoinRule will generate EnumerableThetaJoin (i.e. NestedLoopJoin) for "pure non equi-joins" - EnumerableJoinRule will generate EnumerableJoin (i.e. HashJoin, with possibly extra predicate) for "pure and partial equi-joins" - Inside EnumerableJoin#implement method, the "remainCondition" will be calculated on the fly, using {{Join#analyzecondition}} (or {{JoinInfo#of}}) method. If the condition is pure equi, the remainCondition will be null; if the condition is not pure equi, the remaining condition will be taken from {{NonEquiJoinInfo.remaining}} and will be passed to the new {{EnumerableJoin#generatePredicate}} method to create the extra predicate to be passed to {{BuiltInMethod.HASH_JOIN.method}}, which can remain as it is right now in the PR. [~hhlai1990], I'm not sure if my explanation above is clear, let me know if you have any questions or you see any issues on the logic behind. [UPDATE after seeing [~hhlai1990] last comment] If the only problem with EnumerableJoin not extending EquiJoin is the FilterJoinRule, maybe we could change it. The problematic line that you pasted in a previous comment: https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165 could be changed, instead of: {code:java} ... !(join instanceof EquiJoin), ... {code} We could use: {code:java} ... !join.analyzeCondition().isEqui(), ... {code} Which I think is safer, because IMHO the {{instaceof EquiJoin}} is deceiving, because right now it may be possible to create an EquiJoin instance with a non-equi condition (e.g. via {{RelBuilder#semiJoin}}), so I'd say that as a general rule it is safer to analyze the condition, rather that using the {{instaceof EquiJoin}}. With the modification I'm proposing, we will have EnumerableJoins (which do not extend EquiJoin) but which may have pure equi-join conditions, so the analyzeCondition should work fine in all cases. was (Author: rubenql): Digging into my previous comment, I believe that there is an alternative that may do the job without breaking things (I think): - EnumerableJoin will no longer extend EquiJoin. - It will continue having only the original "condition" field (no need to add remainCondition as a new field). - The "condition" will now be any type of condition (equi / non-equi) - EnumerableJoinRule will generate EnumerableThetaJoin (i.e. NestedLoopJoin) for "pure non equi-joins" - EnumerableJoinRule will generate EnumerableJoin (i.e. HashJoin, with possibly extra predicate) for "pure and partial equi-joins" - Inside EnumerableJoin#implement method, the "remainCondition" will be calculated on the fly, using {{Join#analyzecondition}} (or {{JoinInfo#of}}) method. If the condition is pure equi, the remainCondition will be null; if the condition is not pure equi, the remaining condition will be taken from {{NonEquiJoinInfo.remaining}} and will be passed to the new {{EnumerableJoin#generatePredicate}} method to create the extra predicate to be passed to {{BuiltInMethod.HASH_JOIN.method}}, which can remain as it is right now in the PR. [~hhlai1990], I'm not sure if my explanation above is clear, let me know if you have any questions or you see any issues on the logic behind. [UPDATE after seeing [~hhlai1990] last comment] If the only problem with EnumerableJoin not extending EquiJoin is the FilterJoinRule, maybe we could change it. The problematic line that you pasted in a previous comment: https://github.com/apache/calcite/blob/ee83efd360793ef4201f4cdfc2af8d837b76ca69/core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java#L165 could be changed, instead of: {code:java} ... !(join instanceof EquiJoin), ... {code} We could use: {code:java} ... !join.analyzeCondition().isEqui(), ... {code} Which I think is safer, because IMHO the {{instaceof EquiJoin}} is deceiving, because right now it may be possible to create an EquiJoin instance with a non-equi condition, so I'd say that as a general rule it is safer to analyze the condition, rather that using the {{instaceof EquiJoin}}. With the modification I'm proposing, we will have EnumerableJoins (which are not EquiJoins) but which may have equi-join conditions, so the analyzeCondition should work fine in all cases. > Allow theta joins that have equi conditions to be executed using a hash join > algorithm > -------------------------------------------------------------------------------------- > > Key: CALCITE-2973 > URL: https://issues.apache.org/jira/browse/CALCITE-2973 > Project: Calcite > Issue Type: New Feature > Components: core > Affects Versions: 1.19.0 > Reporter: Lai Zhou > Priority: Minor > Labels: pull-request-available > Fix For: 1.20.0 > > Time Spent: 3h 50m > Remaining Estimate: 0h > > Now the EnumerableMergeJoinRule only supports an inner and equi join. > If users make a theta-join query for a large dataset (such as 10000*10000), > the nested-loop join process will take dozens of time than the sort-merge > join process . > So if we can apply merge-join or hash-join rule for a theta join, it will > improve the performance greatly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)