[ 
https://issues.apache.org/jira/browse/CALCITE-3667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010288#comment-17010288
 ] 

Jin Xing commented on CALCITE-3667:
-----------------------------------

Hi, Julian ~

I don't think I made myself clear ~

What I wanted to do in PR-1717 is actually what Enumerables.where does (I missed
this method and introduced ConditionalEnumerator instead).

In MergeJoinEnumerator#advance, we generate a Cartesian-product enumerator for
each matched key pair from LHS and RHS, and then filter that enumerator by the
predicate. That's why I said "the complexity is decided by the duplication of the keys":

1. If many duplicate keys exist in LHS or RHS, a single Cartesian enumerator
can contain a large number of rows.

2. If most of the keys from LHS and RHS are distinct, each Cartesian enumerator
contains only a small number of rows.
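A minimal sketch of that scheme, assuming both inputs are already sorted by join key and that a row is an `int[]` whose first slot holds the key. The class `MergeJoinSketch` and its signature are hypothetical, and it builds an eager list rather than a lazy enumerator like Calcite's `MergeJoinEnumerator`. For each key present on both sides it walks the Cartesian product of the two runs, so the work is roughly the sum over matched keys k of |LHS_k| × |RHS_k|: three copies of one key on each side already cost 9 pair checks, while distinct keys cost one check each.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/**
 * Hypothetical sketch of a merge join over two key-sorted inputs; not Calcite's
 * MergeJoinEnumerator. A row is an int[] whose slot 0 holds the join key.
 */
final class MergeJoinSketch {
  private MergeJoinSketch() {}

  static List<int[][]> mergeJoin(List<int[]> left, List<int[]> right,
      BiPredicate<int[], int[]> predicate) {
    List<int[][]> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < left.size() && j < right.size()) {
      int lk = left.get(i)[0];
      int rk = right.get(j)[0];
      if (lk < rk) {
        i++; // advance the side with the smaller key
      } else if (lk > rk) {
        j++;
      } else {
        // Find the run of rows sharing this key on each side.
        int iEnd = i;
        while (iEnd < left.size() && left.get(iEnd)[0] == lk) {
          iEnd++;
        }
        int jEnd = j;
        while (jEnd < right.size() && right.get(jEnd)[0] == rk) {
          jEnd++;
        }
        // Cartesian product of the two runs: (iEnd - i) * (jEnd - j) candidate pairs.
        // A pair failing the predicate says nothing about later pairs, so skip it
        // and keep going instead of stopping (filter, not take-while).
        for (int a = i; a < iEnd; a++) {
          for (int b = j; b < jEnd; b++) {
            int[] l = left.get(a);
            int[] r = right.get(b);
            if (predicate.test(l, r)) {
              out.add(new int[][] {l, r});
            }
          }
        }
        i = iEnd;
        j = jEnd;
      }
    }
    return out;
  }
}
```

With three LHS rows and three RHS rows for key 1 and an always-true predicate, the key-1 block alone yields 9 pairs; a real predicate only shrinks the output, not the number of pairs examined.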

We cannot use TakeWhileEnumerator, and we shouldn't stop when the predicate
evaluates to false, because the Cartesian enumerator cannot guarantee that the
rows satisfying the predicate are consecutive and start at the beginning.
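To make the difference concrete, here is a minimal sketch over a hypothetical two-method `SimpleEnumerator` interface, loosely modelled on the `moveNext()`/`current()` protocol of linq4j's `Enumerator`. The names `SimpleEnumerator`, `EnumeratorSketch`, `takeWhile` and `where` are illustrative only, not Calcite's classes. The take-while wrapper ends the whole enumeration at the first failing row; the where wrapper skips failing rows and keeps pulling until the source is exhausted.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a linq4j-style enumerator (moveNext/current only). */
interface SimpleEnumerator<T> {
  boolean moveNext();

  T current();
}

final class EnumeratorSketch {
  private EnumeratorSketch() {}

  /** Adapts a list to the enumerator protocol. */
  static <T> SimpleEnumerator<T> of(List<T> items) {
    Iterator<T> it = items.iterator();
    return new SimpleEnumerator<T>() {
      private T current;

      public boolean moveNext() {
        if (!it.hasNext()) {
          return false;
        }
        current = it.next();
        return true;
      }

      public T current() {
        return current;
      }
    };
  }

  /** Take-while: the first row that fails the predicate ends the enumeration. */
  static <T> SimpleEnumerator<T> takeWhile(SimpleEnumerator<T> src, Predicate<T> p) {
    return new SimpleEnumerator<T>() {
      private boolean done = false;

      public boolean moveNext() {
        if (done || !src.moveNext() || !p.test(src.current())) {
          done = true;
          return false;
        }
        return true;
      }

      public T current() {
        return src.current();
      }
    };
  }

  /** Where: failing rows are skipped; only exhaustion of the source ends it. */
  static <T> SimpleEnumerator<T> where(SimpleEnumerator<T> src, Predicate<T> p) {
    return new SimpleEnumerator<T>() {
      public boolean moveNext() {
        while (src.moveNext()) {
          if (p.test(src.current())) {
            return true;
          }
        }
        return false;
      }

      public T current() {
        return src.current();
      }
    };
  }

  /** Drains an enumerator into a list. */
  static <T> List<T> toList(SimpleEnumerator<T> e) {
    List<T> out = new ArrayList<>();
    while (e.moveNext()) {
      out.add(e.current());
    }
    return out;
  }
}
```

On a block such as [3, 8, 1, 9] with the predicate x > 5, the take-while version yields nothing because the very first row fails, while the where version yields 8 and 9.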

PR-1702 fixes this Jira with *mergeJoin(...).where(...)*. It looks good to me,
so I closed PR-1717.

 

Thanks a lot for your help, Julian :):)

 

Best,

Jin

> EnumerableMergeJoin should not use take-while enumerator
> --------------------------------------------------------
>
>                 Key: CALCITE-3667
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3667
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: Jin Xing
>            Assignee: Jin Xing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently EnumerableMergeJoin uses a take-while enumerator [1] to emit values
> that satisfy the predicate. However, the take-while enumerator stops the
> enumeration as soon as the predicate fails, which is not correct: we should
> finish the enumeration and extract all the qualifying values.
> [1] 
> https://github.com/apache/calcite/blob/master/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L3896



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
