[ 
https://issues.apache.org/jira/browse/HIVE-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-18908:
--------------------------------
    Attachment: HIVE-18908.09996.patch

> FULL OUTER JOIN to MapJoin
> --------------------------
>
>                 Key: HIVE-18908
>                 URL: https://issues.apache.org/jira/browse/HIVE-18908
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: FULL OUTER MapJoin Code Changes.docx, 
> HIVE-18908.01.patch, HIVE-18908.02.patch, HIVE-18908.03.patch, 
> HIVE-18908.04.patch, HIVE-18908.05.patch, HIVE-18908.06.patch, 
> HIVE-18908.08.patch, HIVE-18908.09.patch, HIVE-18908.091.patch, 
> HIVE-18908.092.patch, HIVE-18908.093.patch, HIVE-18908.096.patch, 
> HIVE-18908.097.patch, HIVE-18908.098.patch, HIVE-18908.099.patch, 
> HIVE-18908.0991.patch, HIVE-18908.0992.patch, HIVE-18908.0993.patch, 
> HIVE-18908.0994.patch, HIVE-18908.0995.patch, HIVE-18908.0996.patch, 
> HIVE-18908.0997.patch, HIVE-18908.0998.patch, HIVE-18908.0999.patch, 
> HIVE-18908.09991.patch, HIVE-18908.09992.patch, HIVE-18908.09993.patch, 
> HIVE-18908.09994.patch, HIVE-18908.09995.patch, HIVE-18908.09996.patch, JOIN 
> to MAPJOIN Transformation.pdf, SHARED-MEMORY FULL OUTER MapJoin.pdf
>
>
> Currently, we do not support FULL OUTER JOIN in MapJoin.
> Rough TPC-DS timings, run on a laptop (times are minutes:seconds).
> With FULL OUTER MapJoin OFF, the join runs as a MergeJoin.
> (NOTE: Query 51 has PTF as a bigger serial portion, so Amdahl's law limits 
> its gain.)
> Query 51:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 4:30 minutes
>       • FULL OUTER MapJoin ON: 4:37 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 2:35 minutes
>       • FULL OUTER MapJoin ON: 1:47 minutes
> Query 97:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 2:37 minutes
>       • FULL OUTER MapJoin ON: 2:42 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 1:17 minutes
>       • FULL OUTER MapJoin ON: 0:06 minutes
> FULL OUTER Join of 10,000,000 rows against 323,910 small table keys:
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 14:56 minutes
>       • FULL OUTER MapJoin ON: 1:45 minutes
> FULL OUTER Join of 10,000,000 rows against 1,000 small table keys:
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 12:37 minutes
>       • FULL OUTER MapJoin ON: 1:38 minutes
> Hopefully, someone will do large-scale cluster testing.  
> [DynamicPartitionedHashJoin] MapJoin should scale dramatically better than 
> the reduce-shuffle of [Sort] MergeJoin.
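
For readers unfamiliar with why FULL OUTER is harder for a hash-based MapJoin than INNER or LEFT OUTER, the following is a minimal single-process sketch in Python. It is not Hive's implementation and is not taken from the attached patches; the function name `full_outer_hash_join` and the (key, value) row shape are assumptions made for illustration. It shows the extra bookkeeping the join needs: the small-table keys that were matched during the probe must be remembered, so that the unmatched small-table rows can be emitted afterwards.

```python
from collections import defaultdict


def full_outer_hash_join(big_rows, small_rows):
    """Join two lists of (key, value) pairs; None marks the missing side.

    Hypothetical helper for illustration only, not a Hive API.
    """
    # Build phase: hash the small table by join key.
    small_table = defaultdict(list)
    for key, value in small_rows:
        small_table[key].append(value)

    matched_keys = set()  # small-table keys hit by at least one big-table row
    result = []

    # Probe phase: stream the big table once.
    for key, big_value in big_rows:
        # A NULL (None) key never matches anything, as in SQL.
        if key is not None and key in small_table:
            matched_keys.add(key)
            for small_value in small_table[key]:
                result.append((key, big_value, small_value))
        else:
            # LEFT OUTER part: big-table row without a match.
            result.append((key, big_value, None))

    # Extra pass that FULL OUTER needs: small-table rows never matched.
    for key, small_values in small_table.items():
        if key not in matched_keys:
            for small_value in small_values:
                result.append((key, None, small_value))
    return result


rows = full_outer_hash_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y")])
# rows == [(1, "a", None), (2, "b", "x"), (3, None, "y")]
```

In a single process the final pass is straightforward. When the big table is split across several tasks, no one task can decide by itself that a small-table key went unmatched, which is presumably why the description points to partitioning by key ([DynamicPartitionedHashJoin]).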



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
