[ 
https://issues.apache.org/jira/browse/HIVE-11306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-11306:
-----------------------------
    Attachment: HIVE-11306.6.patch

The failure in vector_leftsemi_mapjoin.q was due to an n-way left outer join 
issue: for one small table we decide to spill, whereas for the second small 
table we exit early via the Bloom filter. The reverse case is also problematic.

Fixed in patch 6.

> Add a bloom-1 filter for Hybrid MapJoin spills
> ----------------------------------------------
>
>                 Key: HIVE-11306
>                 URL: https://issues.apache.org/jira/browse/HIVE-11306
>             Project: Hive
>          Issue Type: Improvement
>          Components: Hive
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Gopal V
>            Assignee: Wei Zheng
>         Attachments: HIVE-11306.1.patch, HIVE-11306.2.patch, 
> HIVE-11306.3.patch, HIVE-11306.5.patch, HIVE-11306.6.patch
>
>
> HIVE-9277 implemented spillable joins for Tez, which suffer from a 
> corner-case performance issue when joining wide small tables against a narrow 
> big table (such as a user info table joined to an events stream).
> Spilling the wide table causes extra IO, even though the number of distinct 
> values (nDV) of the join key might only be in the thousands.
> A cheap bloom-1 filter would give a large performance gain for such queries 
> by cutting down the spill IO costs for the big-table spills.
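
To make the idea in the quoted description concrete, here is a minimal sketch 
of filtering big-table rows before they are spilled. It is not the code from 
the attached patches: the class SpillFilterSketch, its methods 
addSmallTableKey and mightMatch, the long-typed join key, and the single-hash 
bit set are all assumptions chosen for illustration. It also covers only a 
single inner join; the n-way outer join case from the comment above needs 
extra care and is not modeled here.

```java
import java.util.BitSet;

// Hypothetical sketch: a one-probe Bloom-style filter built from the small
// table's join keys, consulted before a big-table row is written to a spill.
final class SpillFilterSketch {
    private final BitSet bits;
    private final int numBits;

    public SpillFilterSketch(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    // Map a key to one bit position (a single hash, so one probe per lookup).
    private int slot(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative mixing constant
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) numBits);
    }

    // Called for every small-table key, including keys in spilled partitions.
    public void addSmallTableKey(long key) {
        bits.set(slot(key));
    }

    // false: the key is definitely absent, so the row need not be spilled.
    // true: the key may be present (false positives are possible).
    public boolean mightMatch(long key) {
        return bits.get(slot(key));
    }

    public static void main(String[] args) {
        SpillFilterSketch filter = new SpillFilterSketch(1 << 16);

        // Small table: a few thousand distinct join keys.
        for (long key = 0; key < 2000; key++) {
            filter.addSmallTableKey(key);
        }

        // Big table: most rows carry keys that never appear in the small table.
        long spilled = 0;
        long skipped = 0;
        for (long key = 0; key < 100_000; key++) {
            if (filter.mightMatch(key)) {
                spilled++; // would be written to the big-table spill file
            } else {
                skipped++; // dropped early, no spill IO
            }
        }
        System.out.println("rows spilled: " + spilled + ", rows skipped: " + skipped);
    }
}
```

Because the filter never returns false for a key that was added, skipping a 
row on a negative answer is safe for an inner join; false positives only cost 
some unnecessary spill IO.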



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
