[ https://issues.apache.org/jira/browse/CALCITE-2973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16807393#comment-16807393 ]
Lai Zhou edited comment on CALCITE-2973 at 4/2/19 9:28 AM:
-----------------------------------------------------------

[~julianhyde], consider another query whose join condition contains both an equi condition and a non-equi condition:
{code:sql}
SELECT t1.i_item_desc
FROM item t1
LEFT OUTER JOIN item_1 t2
  ON t1.i_item_sk = t2.i_item_sk AND t2.i_item_sk < 10000
{code}
A merge join would also suit this query, but it is currently converted to a nested-loop join. I tried replacing the default ENUMERABLE_JOIN_RULE with a customized rule:
{code:java}
final JoinInfo info = JoinInfo.of(left, right, join.getCondition());
if (!info.isEqui() && join.getJoinType() != JoinRelType.INNER) {
  // EnumerableJoinRel only supports equi-join. We can put a filter on top
  // if it is an inner join.
  try {
    boolean hasEquiKeys = !info.leftKeys.isEmpty() && !info.rightKeys.isEmpty();
    if (hasEquiKeys) {
      // Merge on the equi keys; the remaining condition becomes a predicate.
      return convertToThetaMergeJoin(rel);
    } else {
      // No equi keys at all: fall back to a nested-loop theta join.
      return new EnumerableThetaJoin(cluster, traitSet, left, right,
          join.getCondition(), join.getVariablesSet(), join.getJoinType());
    }
  } catch (Exception e) {
    EnumerableRules.LOGGER.debug(e.toString());
    return null;
  }
}
{code}
If the join has equi keys, it is converted to an EnumerableThetaMergeJoin:
{code:java}
new EnumerableThetaMergeJoin(cluster, traits, left, right,
    info.getEquiCondition(left, right, cluster.getRexBuilder()),
    info.getRemaining(cluster.getRexBuilder()),
    info.leftKeys, info.rightKeys,
    join.getVariablesSet(), join.getJoinType());
{code}
I implemented EnumerableThetaMergeJoin to handle a theta join that has equi keys. The key difference between EnumerableThetaMergeJoin and EnumerableMergeJoin is that EnumerableThetaMergeJoin builds a predicate from the remaining (non-equi) part of the JoinInfo and applies it to the Cartesian product that the merge join produces for each group of rows with equal keys.
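The rule depends on the join condition being split into equi keys (leftKeys/rightKeys) and a remaining non-equi part. The following toy Java model illustrates that split and the operator choice it drives. It is not Calcite's JoinInfo API. The names JoinSplitSketch, Conjunct, Choice and choose are hypothetical, and conjuncts are simplified to "left column, operator, right column" triples in which null stands for a literal. The inner-join branch follows the "put a filter on top" comment in the rule rather than code shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (hypothetical, not Calcite's JoinInfo): split AND-ed conjuncts, pick an operator. */
final class JoinSplitSketch {
  /** Hypothetical conjunct "leftCol op rightCol"; a null column stands for a literal. */
  static final class Conjunct {
    final Integer leftCol;  // column of the left input, or null if not referenced
    final String op;        // "=", "<", ...
    final Integer rightCol; // column of the right input, or null if not referenced

    Conjunct(Integer leftCol, String op, Integer rightCol) {
      this.leftCol = leftCol;
      this.op = op;
      this.rightCol = rightCol;
    }

    /** An equi key compares one left column with one right column using "=". */
    boolean isEquiKey() {
      return "=".equals(op) && leftCol != null && rightCol != null;
    }
  }

  enum JoinType { INNER, LEFT }

  /** Hypothetical labels for the physical operator each branch of the rule produces. */
  enum Choice { EQUI_JOIN, JOIN_WITH_FILTER_ON_TOP, THETA_MERGE_JOIN, THETA_NESTED_LOOP_JOIN }

  static Choice choose(List<Conjunct> condition, JoinType type) {
    List<Integer> leftKeys = new ArrayList<>();  // would become the merge keys
    List<Integer> rightKeys = new ArrayList<>();
    List<Conjunct> remaining = new ArrayList<>(); // would become the residual predicate
    for (Conjunct c : condition) {
      if (c.isEquiKey()) {
        leftKeys.add(c.leftCol);
        rightKeys.add(c.rightCol);
      } else {
        remaining.add(c);
      }
    }
    if (remaining.isEmpty()) {
      return Choice.EQUI_JOIN; // pure equi-join: handled by the default rule
    }
    if (type == JoinType.INNER) {
      return Choice.JOIN_WITH_FILTER_ON_TOP; // remaining condition applied as a filter
    }
    // Outer join with a non-equi part: the two branches of the customized rule.
    boolean hasEquiKeys = !leftKeys.isEmpty() && !rightKeys.isEmpty();
    return hasEquiKeys ? Choice.THETA_MERGE_JOIN : Choice.THETA_NESTED_LOOP_JOIN;
  }
}
```

For the query above, t1.i_item_sk = t2.i_item_sk is an equi key and t2.i_item_sk < 10000 is the remaining part, so the LEFT join maps to THETA_MERGE_JOIN; without the equality it would map to THETA_NESTED_LOOP_JOIN.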
See [https://github.com/apache/calcite/blob/27d883983e76691f9294e5edd9e264b978dfa7e9/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3298]. I made the following changes to current():
{code:java}
public TResult current() {
  final List<Object> list = cartesians.current();
  @SuppressWarnings("unchecked") final TSource left = (TSource) list.get(0);
  @SuppressWarnings("unchecked") final TInner right = (TInner) list.get(1);
  // Apply the non-equi predicate to the current pair from the Cartesian product.
  boolean isNonEquiPredicateSatisfied = predicate.apply(left, right);
  if (!isNonEquiPredicateSatisfied) {
    if (generateNullsOnLeft) {
      return resultSelector.apply(null, right);
    }
    if (generateNullsOnRight) {
      return resultSelector.apply(left, null);
    }
  }
  return resultSelector.apply(left, right);
}
{code}

> Make EnumerableMergeJoinRule to support a theta join
> ----------------------------------------------------
>
>                 Key: CALCITE-2973
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2973
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Lai Zhou
>            Priority: Minor
>
> Currently EnumerableMergeJoinRule only supports inner equi-joins.
> If users run a theta-join query on a large dataset (such as 10000*10000 rows),
> the nested-loop join can take dozens of times longer than a sort-merge join.
> So if merge-join or hash-join rules could be applied to a theta join, performance would improve greatly.
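The proposal pairs a merge on the equi keys with a residual predicate. The following standalone Java sketch models that idea on plain in-memory lists. It is not Calcite or linq4j code; the names ThetaMergeJoinSketch, Pair and leftMergeJoin are hypothetical, and it assumes both inputs are already sorted by non-null keys. It also shows one subtlety of a LEFT join: a left row should be null-padded only when no right row in its key group passes the predicate, and then only once. Deciding pair by pair, as the current() change does, would emit (left, null) for every failing pair, even alongside a genuine match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Function;

/** Hypothetical sketch: sort-merge LEFT OUTER join with a residual (non-equi) predicate. */
final class ThetaMergeJoinSketch {
  /** One output row; right is null when the left row found no partner. */
  static final class Pair<L, R> {
    final L left;
    final R right;

    Pair(L left, R right) {
      this.left = left;
      this.right = right;
    }

    @Override
    public String toString() {
      return "(" + left + ", " + right + ")";
    }
  }

  /** Assumes both inputs are sorted ascending by key and that keys are non-null. */
  static <L, R, K extends Comparable<K>> List<Pair<L, R>> leftMergeJoin(
      List<L> left, List<R> right,
      Function<L, K> leftKey, Function<R, K> rightKey,
      BiPredicate<L, R> residual) {
    List<Pair<L, R>> out = new ArrayList<>();
    int j = 0; // start of the candidate right group
    int i = 0;
    while (i < left.size()) {
      K key = leftKey.apply(left.get(i));
      // Skip right rows whose key is smaller than the current left key.
      while (j < right.size() && rightKey.apply(right.get(j)).compareTo(key) < 0) {
        j++;
      }
      // The right group with an equal key is right[j, groupEnd).
      int groupEnd = j;
      while (groupEnd < right.size() && rightKey.apply(right.get(groupEnd)).compareTo(key) == 0) {
        groupEnd++;
      }
      // Pair every left row of this key with every right row of the group
      // (the per-key Cartesian product), keeping pairs that pass the predicate.
      while (i < left.size() && leftKey.apply(left.get(i)).compareTo(key) == 0) {
        L l = left.get(i);
        boolean matched = false;
        for (int k = j; k < groupEnd; k++) {
          R r = right.get(k);
          if (residual.test(l, r)) {
            out.add(new Pair<>(l, r));
            matched = true;
          }
        }
        if (!matched) {
          out.add(new Pair<>(l, null)); // null-pad once, only if nothing matched
        }
        i++;
      }
      j = groupEnd; // later left keys are larger, so this group is finished
    }
    return out;
  }
}
```

For example, joining left keys [1, 2, 3] with right rows [10, 20, 21, 40] keyed by r / 10, using the predicate r % 10 == 0, yields [(1, 10), (2, 20), (3, null)]: left row 2 keeps its one passing partner and gets no extra (2, null) for the failing row 21.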