[ https://issues.apache.org/jira/browse/HIVE-25472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeongdae Kim updated HIVE-25472:
--------------------------------
    Summary: Prevent hive-server2 from getting OOM(Compressed class space) (Backport HIVE-18920)  (was: Prevent hive-server2 from getting OOM(Compressed space size) (Backport HIVE-18920))

> Prevent hive-server2 from getting OOM(Compressed class space) (Backport HIVE-18920)
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-25472
>                 URL: https://issues.apache.org/jira/browse/HIVE-25472
>             Project: Hive
>          Issue Type: Bug
>   Affects Versions: 2.3.8
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Major
>         Attachments: image-2021-08-11-22-03-07-523.jpg, screenshot-5.png, screenshot-6.png
>
>
> Our Hive servers are regularly shut down by OOM errors.
> {code:java}
> Terminating due to java.lang.OutOfMemoryError: Compressed class space {code}
>
> A heap dump shows that a large number of the loaded classes were generated
> by the Janino compiler (about 98% of all loaded classes).
> !screenshot-5.png|width=418,height=280!
>
> Those generated classes are cached in JaninoRelMetadataProvider.
> !screenshot-6.png|width=424,height=594!
>
> This cache has no expiration, and the Hive server creates a new metadata
> provider, which is one of the cache keys, for every query. As a result, the
> server generates metadata classes at runtime for every query, the cache is
> never hit, and eventually those classes can no longer be loaded due to lack
> of metaspace.
>
> Because of this issue, the Hive servers also slow down until the OOM occurs,
> since too much time is spent loading classes, as the flame graph below shows
> (48% of samples are in class loading).
> !image-2021-08-11-22-03-07-523.jpg|width=405,height=209!
>
> I think we can fix this issue by either
> a) maintaining a static metadata provider (HIVE-18920)
> or
> b) making the caches a constant size
> (https://issues.apache.org/jira/browse/CALCITE-1808)
>
> To apply b), we would need to upgrade Calcite to 1.15, which includes a lot
> of changes, so it may be inappropriate for a patch release (it is also the
> less efficient solution).
>
> In our production clusters, a) has proven to prevent both the OOM and the
> performance degradation.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
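
The caching behaviour described in the issue can be illustrated with a small, self-contained sketch. It is not Hive or Calcite code: Provider, MetadataCacheSketch, runQueries and boundedCache are hypothetical names, and a plain map entry stands in for a class generated by Janino. The sketch only models how many cache entries accumulate when the key is a per-query provider, a single shared provider (in the spirit of option a), or when the cache is size-bounded (in the spirit of option b). It does not model class loading or unloading itself.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a metadata provider; not a Calcite or Hive class. */
final class Provider {
    // equals/hashCode are inherited from Object (identity-based),
    // so every new instance is a distinct cache key.
}

final class MetadataCacheSketch {

    /**
     * Simulates running queries against a cache and returns the number of
     * cached entries afterwards. Passing shared == null models "a new
     * provider for every query".
     */
    static int runQueries(Map<Provider, String> cache, int queries, Provider shared) {
        for (int i = 0; i < queries; i++) {
            Provider key = (shared != null) ? shared : new Provider();
            // Stand-in for generating and loading a handler class at runtime.
            cache.computeIfAbsent(key, k -> "generated-handler");
        }
        return cache.size();
    }

    /** A size-bounded, access-ordered (LRU) map as one way to model a constant-size cache. */
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        // Unbounded cache, new provider per query: one entry per query.
        Map<Provider, String> unbounded = new HashMap<>();
        System.out.println(runQueries(unbounded, 1000, null));

        // Unbounded cache, one shared provider: a single entry is reused.
        Map<Provider, String> withShared = new HashMap<>();
        System.out.println(runQueries(withShared, 1000, new Provider()));

        // Bounded cache, new provider per query: entries are capped,
        // but every query still misses the cache.
        Map<Provider, String> bounded = boundedCache(100);
        System.out.println(runQueries(bounded, 1000, null));
    }
}
```

In this model the shared key is the only variant in which repeated queries actually hit the cache. The bounded cache limits growth but still regenerates an entry on every query, which is consistent with the issue calling option b) the less efficient solution.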