[ https://issues.apache.org/jira/browse/HIVE-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263918#comment-16263918 ]
靳峥 commented on HIVE-13665:
---------------------------

Besides, all five of the HashCache objects have the same problem. It must be beautiful when all the threads are fighting over these five locks : )

> HS2 memory leak when multiple queries are running with get_json_object
> ----------------------------------------------------------------------
>
>                 Key: HIVE-13665
>                 URL: https://issues.apache.org/jira/browse/HIVE-13665
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 2.0.0
>            Reporter: JinsuKim
>         Attachments: patch.lst.txt
>
>
> The extractObjectCache in UDFJson grows past its limit (CACHE_SIZE = 16) when multiple queries using get_json_object or get_json_tuple run concurrently in HS2 local mode (not mr/tez).
> {code:java|title=HS2 heap_dump}
> Object at 0x515ab18f8
> instance of org.apache.hadoop.hive.ql.udf.UDFJson$HashCache@0x515ab18f8 (77 bytes)
> Class:
>     class org.apache.hadoop.hive.ql.udf.UDFJson$HashCache
> Instance data members:
>     accessOrder (Z) : false
>     entrySet (L) : <null>
>     hashSeed (I) : 0
>     header (L) : java.util.LinkedHashMap$Entry@0x515a577d0 (60 bytes)
>     keySet (L) : <null>
>     loadFactor (F) : 0.6
>     modCount (I) : 4741146
>     size (I) : 2733158 <========== here!!
>     table (L) : [Ljava.util.HashMap$Entry;@0x7163d8b70 (67108880 bytes)
>     threshold (I) : 5033165
>     values (L) : <null>
> References to this object:
> {code}
> I think this problem is caused by the LinkedHashMap not being thread-safe:
> {code}
>  * <p><strong>Note that this implementation is not synchronized.</strong>
>  * If multiple threads access a linked hash map concurrently, and at least
>  * one of the threads modifies the map structurally, it <em>must</em> be
>  * synchronized externally. This is typically accomplished by
>  * synchronizing on some object that naturally encapsulates the map.
> {code}
> Reproduce:
> # Run multiple queries with get_json_object on small input data (so they execute in HS2 local mode, not mr/tez)
> # Take a JVM heap dump and analyze it
> {code:title=test scenario}
> 1.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040105'
>
> 2.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040106'
>
> 3.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040107'
>
> 4.hql :
> SELECT get_json_object(body, '$.fileSize'), get_json_object(body, '$.ps_totalTimeSeconds'), get_json_object(body, '$.totalTimeSeconds') FROM xxx.tttt WHERE part_hour='2016040108'
>
> run.sh :
> t_cnt=0
> while true
> do
>     echo "query executing..."
>     for i in 1 2 3 4
>     do
>         beeline -u jdbc:hive2://localhost:10000 -n hive --silent=true -f $i.hql > $i.log 2>&1 &
>     done
>     wait
>     t_cnt=`expr $t_cnt + 1`
>     echo "query count : $t_cnt"
>     sleep 2
> done
>
> jvm heap dump & analyze :
> jmap -dump:format=b,file=hive.dmp $PID
> jhat -J-mx48000m -port 8080 hive.dmp &
> {code}
> Finally, I have attached our patch.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
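The attached patch (patch.lst.txt) is not reproduced in this comment. As context for the bug, here is a minimal hypothetical sketch of the kind of fix that makes an access-ordered LinkedHashMap-based LRU cache, shaped like UDFJson's HashCache, both bounded and safe under concurrent access. The class name and wiring are assumptions for illustration only; synchronizing the map is one option, and a per-thread cache would be another. Without external synchronization, concurrent structural modification can corrupt the map so that removeEldestEntry never evicts, which is exactly the unbounded growth seen in the heap dump:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the attached patch: a bounded LRU cache whose
// eviction works under concurrency because every access goes through a
// synchronized wrapper. Note that on an access-ordered LinkedHashMap even
// get() is a structural modification, so reads must be synchronized too.
public class BoundedLruCache {
    static final int CACHE_SIZE = 16; // same limit as UDFJson's HashCache

    private final Map<String, Object> cache = Collections.synchronizedMap(
        // accessOrder = true gives LRU (not insertion-order) eviction
        new LinkedHashMap<String, Object>(CACHE_SIZE * 4 / 3, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
                // Evict the least-recently-used entry once the cap is exceeded
                return size() > CACHE_SIZE;
            }
        });

    public Object get(String key) { return cache.get(key); }
    public void put(String key, Object value) { cache.put(key, value); }
    public int size() { return cache.size(); }
}
```

With this shape, hammering the cache from many threads keeps its size at CACHE_SIZE instead of millions of entries, at the cost of contention on the single lock (the point of the comment above: five such caches means five hot locks).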