[ https://issues.apache.org/jira/browse/HIVE-19823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507801#comment-16507801 ]
Zoltan Haindrich commented on HIVE-19823:
-----------------------------------------

oh; I've just found it :) but that calculation somehow ended up being incorrect:
{code}
threshold = (int)Math.ceil(keyCount / (keyCountAdj * loadFactor));
{code}
but fixing it there would not make this problem entirely go away - I think the sizing strategy belongs to the data structure

> BytesBytesMultiHashMap estimation should account for loadFactor
> ---------------------------------------------------------------
>
>                 Key: HIVE-19823
>                 URL: https://issues.apache.org/jira/browse/HIVE-19823
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Zoltan Haindrich
>            Assignee: Zoltan Haindrich
>            Priority: Major
>         Attachments: HIVE-19823.01.patch
>
>
> It can happen that the capacity is known beforehand and the estimated size of the hashtable is accurate, yet a rehash still occurs later because the element count comes to violate the load-factor ratio.
> With the default settings this can happen with probability {{1 - loadFactor = 25%}}.
> This rehashing takes around 2 seconds on my system for 6.5M entries.
> https://github.com/apache/hive/blob/cfd57348c1ac188e0ba131d5636a62ff7b7c27be/ql/src/java/org/apache/hadoop/hive/ql/exec/persistence/BytesBytesMultiHashMap.java#L176-L187

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
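The sizing issue discussed in the comment can be sketched as follows. This is a hypothetical illustration, not the Hive implementation: the class {{HashSizing}} and method {{capacityFor}} are made-up names. The point is that a table rehashes once {{size > capacity * loadFactor}}, so to hold {{keyCount}} entries without rehashing, capacity must be at least {{keyCount / loadFactor}} before rounding up to a power of two.

```java
// Hypothetical sketch (not the actual BytesBytesMultiHashMap code):
// sizing a power-of-two hash table so that keyCount entries fit
// without triggering a loadFactor-based rehash.
public class HashSizing {

    static int capacityFor(int keyCount, float loadFactor) {
        // A table rehashes once size > capacity * loadFactor, so the
        // capacity must be at least keyCount / loadFactor slots.
        int needed = (int) Math.ceil(keyCount / loadFactor);
        // Round up to the next power of two, as open-addressing
        // tables typically require.
        int cap = Integer.highestOneBit(needed);
        if (cap < needed) {
            cap <<= 1;
        }
        return cap;
    }

    public static void main(String[] args) {
        // With loadFactor = 0.75, 6.5M keys need ceil(6_500_000 / 0.75)
        // = 8_666_667 slots, i.e. a power-of-two capacity of 16_777_216.
        // Sizing only to the key count (8_388_608 slots) would rehash
        // once the table reaches 75% occupancy.
        System.out.println(capacityFor(6_500_000, 0.75f));
    }
}
```

Dividing by loadFactor up front trades memory for avoiding the one-time rehash cost (around 2 seconds for 6.5M entries per the report above).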