[
https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289035#comment-14289035
]
Lefty Leverenz commented on HIVE-9277:
--------------------------------------
[~wzheng] put the design doc on the wiki here: [Hybrid Hybrid Grace Hash Join,
v1.0 |
https://cwiki.apache.org/confluence/display/Hive/Hybrid+Hybrid+Grace+Hash+Join,+v1.0].
_Review comment:_ The final graphic in "Recursive Hashing and Spilling" says
...
bq. Now we probe using Matchfile 1 against HT 3. Matching values go into
result. Non-matching values go to Matchfile 4.
... but it shows non-matching values from HT4, not HT3, going to Matchfile 4. A
dashed line from HT3 to Matchfile 4 is missing. And should the text say "probe
using Matchfile 1 against HT3 and HT4 (if it fits in memory)"?
> Hybrid Hybrid Grace Hash Join
> -----------------------------
>
> Key: HIVE-9277
> URL: https://issues.apache.org/jira/browse/HIVE-9277
> Project: Hive
> Issue Type: New Feature
> Components: Physical Optimizer
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Labels: join
> Attachments: High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called “hybrid hybrid grace
> hash join”. This feature offers the following benefits:
> o The query will not fail even if the estimated memory requirement is
> slightly wrong
> o Expensive garbage collection overhead can be avoided when the hash table
> grows
> o The join can still run in a Map join operator even when the small table
> doesn't fit in memory, because spilling some data from the build and probe
> sides is still cheaper than shuffling the large fact table
> The design is based on Hadoop’s parallel processing capability and the
> significant amount of memory available.
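
The spilling idea in the description can be illustrated with a minimal sketch. This is not Hive's implementation and does not follow the design doc step by step. It assumes rows are (key, value) pairs, uses plain Python lists to stand in for spill files, and omits the recursive re-partitioning case. The function name and both tuning parameters are hypothetical.

```python
from collections import defaultdict


def hybrid_grace_hash_join(build_rows, probe_rows,
                           num_partitions=4, max_rows_in_memory=4):
    """Inner-join two iterables of (key, value) pairs.

    Build-side partitions stay in memory until the row budget is exceeded;
    then the largest partition is spilled. Spill "files" are lists here;
    a real engine would write them to disk.
    """
    in_memory = {p: defaultdict(list) for p in range(num_partitions)}
    sizes = {p: 0 for p in range(num_partitions)}  # rows per in-memory partition
    spilled_build = {}  # partition -> list of (key, value) build rows
    spilled_probe = {}  # partition -> list of (key, value) probe rows

    def partition_of(key):
        return hash(key) % num_partitions

    # Build phase: hash each small-table row into its partition.
    for key, value in build_rows:
        p = partition_of(key)
        if p in spilled_build:
            spilled_build[p].append((key, value))
            continue
        in_memory[p][key].append(value)
        sizes[p] += 1
        if sum(sizes.values()) > max_rows_in_memory:
            # Over budget: spill the largest in-memory partition.
            victim = max(sizes, key=sizes.get)
            table = in_memory.pop(victim)
            spilled_build[victim] = [(k, v) for k, vs in table.items() for v in vs]
            spilled_probe[victim] = []
            del sizes[victim]

    # Probe phase: join immediately when the partition is in memory,
    # otherwise set the probe row aside for the second pass.
    results = []
    for key, value in probe_rows:
        p = partition_of(key)
        if p in spilled_build:
            spilled_probe[p].append((key, value))
        else:
            for build_value in in_memory[p].get(key, ()):
                results.append((key, build_value, value))

    # Second pass: reload each spilled partition and join it on its own.
    for p, rows in spilled_build.items():
        table = defaultdict(list)
        for key, value in rows:
            table[key].append(value)
        for key, value in spilled_probe[p]:
            for build_value in table.get(key, ()):
                results.append((key, build_value, value))
    return results
```

In this sketch a low memory budget only moves more work into the second pass; the join still completes, which mirrors the first benefit listed above.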