[jira] [Updated] (DRILL-173) Join operator should reuse ValueVectors when duplicate keys are present

Jacques Nadeau (JIRA) Sun, 04 Jan 2015 13:29:22 -0800

     [ 
https://issues.apache.org/jira/browse/DRILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jacques Nadeau updated DRILL-173:
---------------------------------
    Component/s: Execution - Operators

> Join operator should reuse ValueVectors when duplicate keys are present
> -----------------------------------------------------------------------
>
>                 Key: DRILL-173
>                 URL: https://issues.apache.org/jira/browse/DRILL-173
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Operators
>    Affects Versions: m1
>            Reporter: Ben Becker
>              Labels: optimization
>             Fix For: Future
>
>
> There are cases where joining two record batches can result in redundant 
> work.  Consider a merge join performed on two tables (*t1* and *t2*) with 
> duplicate keys on both sides:
> h5. t1
> || key || value ||
> | 2 | 'a' |
> | 2 | 'b' |
> h5. t2
> || key || value ||
> | 2 | 'A' |
> | 2 | 'B' |
> | 2 | 'C' |
> The resulting table will contain the cross product of all rows with key value 2:
> || key || t1.value || t2.value ||
> | 2 | 'a' | 'A' |
> | 2 | 'a' | 'B' |
> | 2 | 'a' | 'C' |
> | 2 | 'b' | 'A' |
> | 2 | 'b' | 'B' |
> | 2 | 'b' | 'C' |
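> The current implementation copies t2.value from the incoming vectors one 
> value at a time, repeating that work for every matching t1 row.  Ideally, 
> the t2.value output would be constructed value by value only on the first 
> pass (for the first t1 row with that key); for each later t1 row, the 
> already-constructed run can be copied in bulk.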
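
The proposal above can be illustrated with a minimal sketch. It is not Drill's
join code and does not use the ValueVector API: plain String arrays stand in
for vectors, and the class and method names (DuplicateKeyJoinSketch,
rightColumnForRun, leftColumnForRun) are hypothetical. It assumes the runs of
equal keys on each side have already been identified.

```java
import java.util.Arrays;

/** Hypothetical sketch; plain arrays stand in for Drill's ValueVectors. */
final class DuplicateKeyJoinSketch {

    /**
     * Builds the right-side output column for one run of equal keys.
     * leftRunLength is the number of left rows sharing the key;
     * rightRunValues holds the right-side values sharing the same key.
     */
    static String[] rightColumnForRun(int leftRunLength, String[] rightRunValues) {
        int n = rightRunValues.length;
        String[] out = new String[leftRunLength * n];
        if (leftRunLength == 0 || n == 0) {
            return out; // an empty run on either side yields no rows
        }
        // First pass: construct the right-side segment value by value.
        for (int j = 0; j < n; j++) {
            out[j] = rightRunValues[j];
        }
        // Later passes: reuse the segment with one bulk copy per left row.
        for (int i = 1; i < leftRunLength; i++) {
            System.arraycopy(out, 0, out, i * n, n);
        }
        return out;
    }

    /** Builds the left-side output column: each left value repeated once per right row. */
    static String[] leftColumnForRun(String[] leftRunValues, int rightRunLength) {
        String[] out = new String[leftRunValues.length * rightRunLength];
        for (int i = 0; i < leftRunValues.length; i++) {
            Arrays.fill(out, i * rightRunLength, (i + 1) * rightRunLength, leftRunValues[i]);
        }
        return out;
    }
}
```

For the t1 and t2 tables in the description, rightColumnForRun(2, {"A", "B", "C"})
fills the first three slots individually and the remaining three with a single
System.arraycopy call, while leftColumnForRun({"a", "b"}, 3) produces the
matching t1.value column.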



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
