[jira] [Comment Edited] (CALCITE-3221) Add a sort-merge union algorithm

Julian Hyde (Jira) Sun, 24 May 2020 12:52:48 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17115432#comment-17115432
 ]


Julian Hyde edited comment on CALCITE-3221 at 5/24/20, 7:51 PM:
----------------------------------------------------------------

I was thinking of {{UnionToDistinctRule}}.  I suspect HerdDB is already using 
it, in which case their {{EnumerableUnion}} will always have {{all=true}}.

I agree with you that allowing {{EnumerableUnion(all=false)}} is not great. We 
may have allowed it because {{EnumerableIntersect(all=false)}} and 
{{EnumerableMinus(all=false)}} are a bit tricky to implement as separate 
operations.
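
For illustration, the rewrite that {{UnionToDistinctRule}} performs can be sketched with plain Java collections: a distinct union is expressed as a duplicate-eliminating step on top of a {{UNION ALL}}. The class and method names below are hypothetical stand-ins for the relational operators, not Calcite API, and the sketch assumes rows compare with {{equals}}/{{hashCode}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical sketch of "distinct union = distinct over union all"; not Calcite code. */
final class UnionToDistinctSketch {
  /** Stand-in for Union(all=true): concatenates the inputs, keeping duplicates. */
  static <T> List<T> unionAll(List<T> a, List<T> b) {
    List<T> out = new ArrayList<>(a);
    out.addAll(b);
    return out;
  }

  /** Stand-in for an Aggregate that groups on every column: removes duplicate rows. */
  static <T> List<T> distinct(List<T> rows) {
    return new ArrayList<>(new LinkedHashSet<>(rows));
  }

  /** Union(all=false) expressed as distinct(unionAll(...)). */
  static <T> List<T> unionDistinct(List<T> a, List<T> b) {
    return distinct(unionAll(a, b));
  }
}
```

Under this rewrite the union operator itself only ever needs to handle the {{all=true}} case; duplicate elimination is left to the operator above it.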


was (Author: julianhyde):
I was thinking of {{UnionToDistinctRule}}.  I suspect HerdDB is already using 
it, in which case their {{EnumerableUnion}} will always have {{all=true}}.

> Add a sort-merge union algorithm
> --------------------------------
>
>                 Key: CALCITE-3221
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3221
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.19.0
>            Reporter: Stamatis Zampetakis
>            Priority: Minor
>         Attachments: screenshot-1.png
>
>
> Currently, the union operation offered by Calcite is based on a {{HashSet}} 
> (see 
> [EnumerableDefaults.union|https://github.com/apache/calcite/blob/d98856bf1a5f5c151d004b769e14bdd368a67234/linq4j/src/main/java/org/apache/calcite/linq4j/EnumerableDefaults.java#L2747])
>  and requires reading all rows into memory before returning a single 
> result.
> Apart from the increased memory consumption, the operator is blocking and 
> also destroys the order of its inputs.
> The goal of this issue is to add a new union algorithm 
> ({{EnumerableMergeUnion}}?) that exploits the fact that the inputs are 
> sorted, and so consumes less memory and retains the order of its inputs.
> Most likely the existing merge join implementation will be useful here.
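
A minimal sketch of the sort-merge union idea described in the issue, written against plain {{java.util.Iterator}} rather than Calcite's linq4j types. Everything here ({{MergeUnionSketch}}, {{mergeUnion}}, {{toList}}) is a hypothetical name for illustration and is not the implementation that the issue proposes. It assumes both inputs are already sorted by the same comparator and, for the distinct case, that the comparator returns 0 only for rows that are duplicates of each other.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a streaming merge union over two sorted inputs. */
final class MergeUnionSketch {
  /**
   * Lazily merges two inputs that are already sorted by cmp.
   * If all is false, a row equal (per cmp) to the last emitted row is skipped.
   */
  static <T> Iterator<T> mergeUnion(Iterator<T> left, Iterator<T> right,
      Comparator<? super T> cmp, boolean all) {
    return new Iterator<T>() {
      private T l, r, last, pending;
      private boolean hasL, hasR, hasLast, hasPending;

      { pullLeft(); pullRight(); }  // buffer one row from each input

      private void pullLeft() { hasL = left.hasNext(); if (hasL) { l = left.next(); } }
      private void pullRight() { hasR = right.hasNext(); if (hasR) { r = right.next(); } }

      @Override public boolean hasNext() {
        while (!hasPending && (hasL || hasR)) {
          T candidate;
          // Take the smaller buffered head; ties go to the left input.
          if (hasL && (!hasR || cmp.compare(l, r) <= 0)) {
            candidate = l;
            pullLeft();
          } else {
            candidate = r;
            pullRight();
          }
          // Output is sorted, so duplicates are adjacent: compare with the last emitted row.
          if (all || !hasLast || cmp.compare(last, candidate) != 0) {
            pending = candidate;
            hasPending = true;
          }
        }
        return hasPending;
      }

      @Override public T next() {
        if (!hasNext()) { throw new NoSuchElementException(); }
        hasPending = false;
        last = pending;
        hasLast = true;
        return pending;
      }
    };
  }

  /** Drains an iterator into a list; only used to inspect results. */
  static <T> List<T> toList(Iterator<T> it) {
    List<T> out = new ArrayList<>();
    while (it.hasNext()) { out.add(it.next()); }
    return out;
  }
}
```

The iterator holds only one buffered row per input plus the last emitted row, so memory use is constant, the first result is available without reading either input to the end, and the output keeps the sort order of the inputs.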



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
