[ https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15294246#comment-15294246 ]
Wei Zheng commented on HIVE-13809:
----------------------------------

[~gopalv] Sure, thanks for your input. Here's the snippet from the application log. In this case we got about 266 million keys, and thus about 207 MB for the bloom filter.

{code}
2016-05-20 11:29:56,600 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Total available memory: 2115483632
2016-05-20 11:29:56,601 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Estimated small table size: 1600000000
2016-05-20 11:29:56,601 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Number of hash partitions to be created: 16
2016-05-20 11:29:56,614 [INFO] [TezChild] |vector.VectorGroupByOperator|: VectorGroupByOperator is vector output false
2016-05-20 11:29:56,617 [INFO] [TezChild] |exec.ReduceSinkOperator|: Initializing operator RS[44]
2016-05-20 11:29:56,620 [INFO] [TezChild] |exec.ReduceSinkOperator|: Using tag = -1
2016-05-20 11:29:56,780 [INFO] [pool-17-thread-2] |persistence.HybridHashTableContainer|: Using a bloom-1 filter 266666672 keys of size 207840816 bytes
{code}

> Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13809
>                 URL: https://issues.apache.org/jira/browse/HIVE-13809
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>
> Memory estimation is important during hash table loading, because we need to
> decide whether to load the next hash partition into memory or spill it. If we
> assume there is enough memory but that turns out not to be the case, we will
> run into an OOM problem.
> Currently the hybrid grace hash join memory usage estimation doesn't take into
> account the bloom filter size. In large test cases (TB scale) the bloom
> filter grows as big as hundreds of MB, big enough to cause estimation error.
> The solution is to include the bloom filter size in the memory estimation.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
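
For reference, the relationship between the key count and the bloom filter size in the log can be checked with the textbook sizing formula m = -n * ln(p) / (ln 2)^2. The sketch below is not taken from the Hive source: the class and method names are hypothetical, the false positive probability of 0.05 is an assumption (the log does not state it), and the 64-bit word layout is assumed as well. Under those assumptions the formula gives the 207840816 bytes reported above, and the last helper shows the kind of adjustment the issue proposes, namely reserving the filter's bytes before budgeting memory for hash partitions.

```java
/** Sketch only: textbook bloom filter sizing, not copied from Hive. */
final class BloomFilterSizeSketch {

    /** Optimal bit count m = -n * ln(p) / (ln 2)^2, truncated to a whole bit. */
    static long optimalBits(long expectedKeys, double fpp) {
        double ln2 = Math.log(2.0);
        return (long) (-expectedKeys * Math.log(fpp) / (ln2 * ln2));
    }

    /** Bytes used when the bits are stored in 64-bit words (assumed layout). */
    static long sizeInBytes(long expectedKeys, double fpp) {
        long bits = optimalBits(expectedKeys, fpp);
        long words = (bits + Long.SIZE - 1) / Long.SIZE; // round up to whole longs
        return words * Long.BYTES;
    }

    /** Memory left for hash partitions once the bloom filter is reserved. */
    static long memoryForPartitions(long totalAvailable, long bloomFilterBytes) {
        return Math.max(0L, totalAvailable - bloomFilterBytes);
    }

    public static void main(String[] args) {
        long keys = 266_666_672L;             // key count from the log
        long totalAvailable = 2_115_483_632L; // "Total available memory" from the log
        double assumedFpp = 0.05;             // assumption, not stated in the log

        long bloomBytes = sizeInBytes(keys, assumedFpp);
        System.out.println("bloom filter bytes: " + bloomBytes);
        System.out.println("memory left for partitions: "
                + memoryForPartitions(totalAvailable, bloomBytes));
    }
}
```

With the logged numbers, reserving the filter leaves roughly 1.9 GB of the 2.1 GB budget for hash partitions, which is the difference the current estimation ignores.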