[ https://issues.apache.org/jira/browse/DRILL-173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau updated DRILL-173:
---------------------------------
    Component/s: Execution - Operators

> Join operator should reuse ValueVectors when duplicate keys are present
> -----------------------------------------------------------------------
>
>                 Key: DRILL-173
>                 URL: https://issues.apache.org/jira/browse/DRILL-173
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Operators
>    Affects Versions: m1
>            Reporter: Ben Becker
>              Labels: optimization
>             Fix For: Future
>
>
> There are cases where joining two record batches can result in redundant
> work. Consider a merge join performed on two tables (*t1* and *t2*) with
> duplicate keys on both sides:
> h5. t1
> || key || value ||
> | 2 | 'a' |
> | 2 | 'b' |
> h5. t2
> || key || value ||
> | 2 | 'A' |
> | 2 | 'B' |
> | 2 | 'C' |
> The resulting table contains the cross product of all rows with key '2':
> || key || t1.value || t2.value ||
> | 2 | 'a' | 'A' |
> | 2 | 'a' | 'B' |
> | 2 | 'a' | 'C' |
> | 2 | 'b' | 'A' |
> | 2 | 'b' | 'B' |
> | 2 | 'b' | 'C' |
> The current implementation iteratively copies t2.value from the incoming
> vectors for every matching t1 row. Ideally, the t2.value vector would be
> constructed iteratively only on the first pass; on later passes the values
> already built can simply be copied.
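The proposed reuse can be illustrated with a minimal Java sketch. It does not use Drill's ValueVector API: plain Java lists stand in for vectors, and every name in it (DuplicateKeyJoinSketch, Output, runEnd, mergeJoin) is hypothetical. It assumes both inputs are already sorted by an integer key. For each run of duplicate keys, the t2.value segment is built element by element once, then appended in bulk for each matching t1 row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DuplicateKeyJoinSketch {

    /** Column-oriented join output; plain lists stand in for value vectors. */
    static final class Output {
        final List<Integer> key = new ArrayList<>();
        final List<String> leftValue = new ArrayList<>();
        final List<String> rightValue = new ArrayList<>();
    }

    /** Returns the index just past the run of equal keys starting at start. */
    static int runEnd(int[] keys, int start) {
        int end = start + 1;
        while (end < keys.length && keys[end] == keys[start]) {
            end++;
        }
        return end;
    }

    /** Inner merge join of two inputs that are already sorted by key. */
    static Output mergeJoin(int[] lk, String[] lv, int[] rk, String[] rv) {
        Output out = new Output();
        int i = 0;
        int j = 0;
        while (i < lk.length && j < rk.length) {
            if (lk[i] < rk[j]) { i++; continue; }
            if (lk[i] > rk[j]) { j++; continue; }

            // Keys match: find the run of duplicates on each side.
            int iEnd = runEnd(lk, i);
            int jEnd = runEnd(rk, j);
            int n = jEnd - j;

            // Build the right-side segment element by element, exactly once.
            List<String> segment = new ArrayList<>(n);
            for (int k = j; k < jEnd; k++) {
                segment.add(rv[k]);
            }

            // For each left row in the run, reuse the segment with a bulk copy
            // instead of walking the right-side input again.
            for (int a = i; a < iEnd; a++) {
                out.key.addAll(Collections.nCopies(n, Integer.valueOf(lk[a])));
                out.leftValue.addAll(Collections.nCopies(n, lv[a]));
                out.rightValue.addAll(segment);
            }
            i = iEnd;
            j = jEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // The t1 and t2 tables from the issue description.
        Output out = mergeJoin(
            new int[] {2, 2}, new String[] {"a", "b"},
            new int[] {2, 2, 2}, new String[] {"A", "B", "C"});
        for (int r = 0; r < out.key.size(); r++) {
            System.out.println(out.key.get(r) + " "
                + out.leftValue.get(r) + " " + out.rightValue.get(r));
        }
    }
}
```

With the t1 and t2 tables from the description, the per-element work on the t2 side happens three times rather than six; the second t1 row reuses the segment through a single addAll call. In a real operator the segment would presumably live in a reusable vector rather than a freshly allocated list, which is an assumption beyond what the issue states.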