[ https://issues.apache.org/jira/browse/HIVE-18908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-18908:
--------------------------------
    Attachment: HIVE-18908.09996.patch

> FULL OUTER JOIN to MapJoin
> --------------------------
>
>                 Key: HIVE-18908
>                 URL: https://issues.apache.org/jira/browse/HIVE-18908
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: FULL OUTER MapJoin Code Changes.docx,
> HIVE-18908.01.patch, HIVE-18908.02.patch, HIVE-18908.03.patch,
> HIVE-18908.04.patch, HIVE-18908.05.patch, HIVE-18908.06.patch,
> HIVE-18908.08.patch, HIVE-18908.09.patch, HIVE-18908.091.patch,
> HIVE-18908.092.patch, HIVE-18908.093.patch, HIVE-18908.096.patch,
> HIVE-18908.097.patch, HIVE-18908.098.patch, HIVE-18908.099.patch,
> HIVE-18908.0991.patch, HIVE-18908.0992.patch, HIVE-18908.0993.patch,
> HIVE-18908.0994.patch, HIVE-18908.0995.patch, HIVE-18908.0996.patch,
> HIVE-18908.0997.patch, HIVE-18908.0998.patch, HIVE-18908.0999.patch,
> HIVE-18908.09991.patch, HIVE-18908.09992.patch, HIVE-18908.09993.patch,
> HIVE-18908.09994.patch, HIVE-18908.09995.patch, HIVE-18908.09996.patch,
> JOIN to MAPJOIN Transformation.pdf, SHARED-MEMORY FULL OUTER MapJoin.pdf
>
>
> Currently, we do not support FULL OUTER JOIN in MapJoin.
> Rough TPC-DS timings run on laptop:
> (NOTE: Query 51 has PTF as a bigger serial portion -- Amdahl's law at play)
> FULL OUTER MapJoin OFF = MergeJoin
> Query 51:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 4:30 minutes
>       • FULL OUTER MapJoin ON: 4:37 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 2:35 minutes
>       • FULL OUTER MapJoin ON: 1:47 minutes
> Query 97:
>   o Vectorization OFF
>       • FULL OUTER MapJoin OFF: 2:37 minutes
>       • FULL OUTER MapJoin ON: 2:42 minutes
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 1:17 minutes
>       • FULL OUTER MapJoin ON: 0:06 minutes
> FULL OUTER Join 10,000,000 rows against 323,910 small table keys
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 14:56 minutes
>       • FULL OUTER MapJoin ON: 1:45 minutes
> FULL OUTER Join 10,000,000 rows against 1,000 small table keys
>   o Vectorization ON
>       • FULL OUTER MapJoin OFF: 12:37 minutes
>       • FULL OUTER MapJoin ON: 1:38 minutes
> Hopefully, someone will do large scale cluster testing.
> [DynamicPartitionedHashJoin] MapJoin should scale dramatically better than
> [Sort] MergeJoin reduce-shuffle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
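
The laptop timings quoted in the issue description are easier to compare as speedup ratios (MapJoin OFF time divided by MapJoin ON time). The following is a small sketch of that arithmetic, not part of the patch; the helper names `to_seconds` and `speedup` are made up for illustration, and the only inputs are the `m:ss` figures copied from the description.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'm:ss' timing such as '14:56' into whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(off: str, on: str) -> float:
    """Ratio of MapJoin OFF (MergeJoin) time to MapJoin ON time.

    A value above 1 means FULL OUTER MapJoin was faster.
    """
    return to_seconds(off) / to_seconds(on)


# (workload, vectorization, MapJoin OFF, MapJoin ON), copied from the
# timings in the issue description; nothing here is newly measured.
TIMINGS = [
    ("Query 51", "OFF", "4:30", "4:37"),
    ("Query 51", "ON", "2:35", "1:47"),
    ("Query 97", "OFF", "2:37", "2:42"),
    ("Query 97", "ON", "1:17", "0:06"),
    ("10,000,000 rows vs 323,910 keys", "ON", "14:56", "1:45"),
    ("10,000,000 rows vs 1,000 keys", "ON", "12:37", "1:38"),
]

for workload, vectorization, off, on in TIMINGS:
    ratio = speedup(off, on)
    print(f"{workload:<32} vectorization {vectorization:<3} {ratio:5.2f}x")
```

On these numbers, both queries come out marginally slower with FULL OUTER MapJoin when vectorization is off (ratios just under 1), while the vectorized runs range from roughly 1.4x on Query 51 to roughly 12.8x on Query 97. These are single laptop runs, so the ratios are indicative only.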