[jira] [Commented] (HIVE-6430) MapJoin hash table has large memory overhead

Sergey Shelukhin (JIRA) Tue, 15 Apr 2014 17:16:57 -0700

    [ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970244#comment-13970244 ]


Sergey Shelukhin commented on HIVE-6430:
----------------------------------------

Resize has an epic bug: it cannot rely on the slot being part of the hash, 
because probing can move an entry away from the slot its hash maps to... that 
was pretty silly.
I think this also causes some of the perf degradation, because the table does 
get rehashed and the bug may screw it up completely (I ran a query that returns 
no results so it wouldn't clutter my shell, good thinking there).
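
To illustrate the class of bug described here, below is a minimal sketch of an 
open-addressing table with linear probing. It is not the code from the 
HIVE-6430 patches: the class name ProbingTable, the EMPTY sentinel, the hash 
function and the 3/4 load factor are all assumptions made for the example. The 
point is in resize(): an entry's slot index cannot be treated as the low bits 
of its hash, because probing may have placed the entry some slots past its 
home slot, so the home slot has to be recomputed from the key's full hash.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Hive's code): long keys -> int row offsets, linear probing. */
final class ProbingTable {
    // Assumed sentinel for an empty slot; real keys must never equal it.
    private static final long EMPTY = Long.MIN_VALUE;

    private long[] keys;
    private int[] values;
    private int size;

    /** Capacity is assumed to be a power of two, at least 2. */
    ProbingTable(int capacityPow2) {
        keys = new long[capacityPow2];
        values = new int[capacityPow2];
        Arrays.fill(keys, EMPTY);
    }

    // Arbitrary mixing function chosen for the example.
    private static int hash(long key) {
        long h = key * 0x9E3779B97F4A7C15L;
        return (int) (h ^ (h >>> 32));
    }

    void put(long key, int value) {
        // Grow before the load factor would exceed 3/4, so probing always finds a free slot.
        if ((size + 1) * 4 > keys.length * 3) {
            resize();
        }
        if (insert(keys, values, key, value)) {
            size++;
        }
    }

    /** Returns the stored value, or {@code missing} if the key is absent. */
    int get(long key, int missing) {
        int mask = keys.length - 1;
        int slot = hash(key) & mask;
        while (keys[slot] != EMPTY) {
            if (keys[slot] == key) {
                return values[slot];
            }
            slot = (slot + 1) & mask; // linear probing
        }
        return missing;
    }

    /** Inserts or overwrites; returns true if the key was not present before. */
    private static boolean insert(long[] ks, int[] vs, long key, int value) {
        int mask = ks.length - 1;
        int slot = hash(key) & mask; // home slot, derived from the full hash
        while (ks[slot] != EMPTY && ks[slot] != key) {
            slot = (slot + 1) & mask;
        }
        boolean isNew = ks[slot] == EMPTY;
        ks[slot] = key;
        vs[slot] = value;
        return isNew;
    }

    private void resize() {
        long[] oldKeys = keys;
        int[] oldValues = values;
        long[] newKeys = new long[oldKeys.length * 2];
        int[] newValues = new int[newKeys.length];
        Arrays.fill(newKeys, EMPTY);
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] == EMPTY) {
                continue;
            }
            // Buggy shortcut: assume i == hash & oldMask and place the entry at
            // i or i + oldKeys.length. Probing may have shifted the entry, so i
            // says nothing reliable about the hash bits.
            // Safe version: rehash from the key itself.
            insert(newKeys, newValues, oldKeys[i], oldValues[i]);
        }
        keys = newKeys;
        values = newValues;
    }
}
```

A quick check is to insert enough keys to force several resizes and then look 
every key up again. A query that returns no results would not expose a broken 
resize, since a corrupted table simply fails to find matches.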

> MapJoin hash table has large memory overhead
> --------------------------------------------
>
>                 Key: HIVE-6430
>                 URL: https://issues.apache.org/jira/browse/HIVE-6430
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, 
> HIVE-6430.03.patch, HIVE-6430.04.patch, HIVE-6430.05.patch, 
> HIVE-6430.06.patch, HIVE-6430.patch
>
>
> Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 
> for row) can take several hundred bytes, which is ridiculous. I am reducing 
> the size of MJKey and MJRowContainer in other JIRAs, but in general we don't 
> need to have a Java hash table there. We can either use a primitive-friendly 
> hash table like the one from HPPC (Apache-licensed), or some variation, to map 
> primitive keys to a single row storage structure without an object per row 
> (similar to vectorization).



--
This message was sent by Atlassian JIRA
(v6.2#6252)
