[ 
https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294246#comment-15294246
 ] 

Wei Zheng commented on HIVE-13809:
----------------------------------

[~gopalv] Sure, thanks for your input. Here's a snippet from the application 
log. In this case we got about 266 million keys, so the bloom filter takes 
about 207 MB.
{code}
2016-05-20 11:29:56,600 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Total available memory: 2115483632
2016-05-20 11:29:56,601 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Estimated small table size: 1600000000
2016-05-20 11:29:56,601 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-05-20 11:29:56,614 [INFO] [TezChild] |vector.VectorGroupByOperator|: VectorGroupByOperator is vector output false
2016-05-20 11:29:56,617 [INFO] [TezChild] |exec.ReduceSinkOperator|: Initializing operator RS[44]
2016-05-20 11:29:56,620 [INFO] [TezChild] |exec.ReduceSinkOperator|: Using tag = -1
2016-05-20 11:29:56,780 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Using a bloom-1 filter 266666672 keys of size 207840816 bytes
{code}
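
As a rough cross-check of the last log line: 207840816 bytes for 266666672 keys 
is about 6.2 bits per key, which is what the textbook bloom filter sizing 
formula m = -n * ln(p) / (ln 2)^2 gives for a false positive rate of p = 0.05. 
The sketch below is only an illustration of that arithmetic. The 0.05 rate is 
inferred from the ratio, not read from the log, and the class and method names 
are hypothetical; the actual Hive implementation may round or pad differently.

```java
class BloomFilterSizeSketch {
    // Textbook sizing formula: m = -n * ln(p) / (ln 2)^2 bits.
    static long optimalNumBits(long expectedKeys, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedKeys * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    // Round the bit count up to whole 64-bit words and return the size in bytes.
    static long sizeInBytes(long numBits) {
        long words = (numBits + 63) / 64;
        return words * 8;
    }

    public static void main(String[] args) {
        long keys = 266_666_672L;   // key count from the log line above
        double fpp = 0.05;          // assumption: inferred, not taken from the log
        long bytes = sizeInBytes(optimalNumBits(keys, fpp));
        // Lands within a few bytes of the logged 207840816.
        System.out.println(bytes + " bytes");
    }
}
```

Under these assumptions the result is roughly 207.8 MB, which matches the size 
reported in the log to within a few bytes.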

> Hybrid Grace Hash Join memory usage estimation didn't take into account the 
> bloom filter size
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13809
>                 URL: https://issues.apache.org/jira/browse/HIVE-13809
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>
> Memory estimation is important during hash table loading, because we need to 
> decide whether to load the next hash partition in memory or spill it. If we 
> assume there is enough memory but that turns out not to be the case, we will 
> run into an OOM error.
> Currently, the hybrid grace hash join memory usage estimation does not take 
> the bloom filter size into account. In large test cases (TB scale) the bloom 
> filter grows to hundreds of MB, large enough to cause an estimation error.
> The solution is to include the bloom filter size in the memory estimation.
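
The load-or-spill decision described above can be sketched as a simple check. 
This is a minimal illustration, not the actual HybridHashTableContainer code: 
the class, method, and parameter names are hypothetical, the in-use figure is 
made up, and the per-partition size assumes the estimated small table is split 
evenly across the 16 partitions.

```java
class HashPartitionMemoryCheck {
    // Returns true if the next hash partition can be loaded in memory.
    // The bloom filter size is counted as part of the memory already committed.
    static boolean fitsInMemory(long memoryThreshold, long memoryUsed,
                                long bloomFilterBytes, long nextPartitionBytes) {
        return memoryUsed + bloomFilterBytes + nextPartitionBytes <= memoryThreshold;
    }

    public static void main(String[] args) {
        long threshold = 2_115_483_632L;       // "Total available memory" from the log
        long bloomFilter = 207_840_816L;       // bloom filter size from the log
        long partition = 1_600_000_000L / 16;  // assumption: even split over 16 partitions
        long used = 1_850_000_000L;            // hypothetical memory already in use

        // Ignoring the bloom filter, the partition appears to fit.
        System.out.println(fitsInMemory(threshold, used, 0L, partition));
        // Counting the bloom filter, it does not, so the partition should be spilled.
        System.out.println(fitsInMemory(threshold, used, bloomFilter, partition));
    }
}
```

With these numbers the first check passes and the second fails, which is the 
kind of estimation error that can lead to an OOM when the bloom filter is left 
out.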



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
