[ 
https://issues.apache.org/jira/browse/HIVE-25472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeongdae Kim updated HIVE-25472:
--------------------------------
    Description: 
Our Hive servers are regularly shut down by OOM:
{code:java}
Terminating due to java.lang.OutOfMemoryError: Compressed class space {code}
 

A heap dump shows that about 98% of all loaded classes were generated by the Janino compiler:

!screenshot-5.png|width=418,height=280!

These generated classes are cached in Calcite's JaninoRelMetadataProvider:

!screenshot-6.png|width=424,height=594!
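
The class counts seen in the heap dump can also be watched on a live JVM. The following is a minimal sketch, not part of any patch: the class name `ClassSpaceProbe` is hypothetical, and the pool names "Metaspace" and "Compressed Class Space" are the ones HotSpot reports on Java 8 and later, so other JVMs may use different names. It reads the JVM it runs in, so to observe HiveServer2 it would have to run inside that process or be adapted to a remote JMX connection.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical probe: prints class-loading and class-space usage of the current JVM. */
final class ClassSpaceProbe {

    /** Returns the used bytes of the named memory pool, or -1 if no such pool exists. */
    static long usedBytes(String poolName) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().equals(poolName)) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        System.out.println("currently loaded classes: " + classes.getLoadedClassCount());
        System.out.println("total loaded classes:     " + classes.getTotalLoadedClassCount());
        System.out.println("unloaded classes:         " + classes.getUnloadedClassCount());
        // Pool names are HotSpot-specific assumptions; -1 means the pool was not found.
        System.out.println("Metaspace used bytes:     " + usedBytes("Metaspace"));
        System.out.println("Compressed Class Space:   " + usedBytes("Compressed Class Space"));
    }
}
```

A loaded-class count that keeps rising while the unloaded count stays flat would be consistent with the leak described here.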

 

This cache has no expiration. Whenever a query is compiled, Hive server creates 
new metadata providers, which are one of the cache keys. As a result, Hive 
servers generate metadata classes at runtime for every query and never get a 
cache hit, while the cache keeps growing until the server is terminated by OOM 
for lack of metaspace.
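
The mechanism can be illustrated with a toy model. This is a sketch under assumptions, not Calcite or Hive code: `Provider` and `GeneratedHandler` are hypothetical stand-ins for a metadata provider and a runtime-generated class, and a plain `HashMap` stands in for the cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of an unbounded cache keyed by per-query provider instances. */
final class ProviderCacheLeakDemo {

    /** Hypothetical stand-in for a metadata provider; equality is by identity. */
    static final class Provider {}

    /** Hypothetical stand-in for a class generated at runtime for one provider. */
    static final class GeneratedHandler {}

    // No eviction and no size limit, like the cache described in this report.
    private final Map<Provider, GeneratedHandler> cache = new HashMap<>();
    private int generated;

    GeneratedHandler handlerFor(Provider provider) {
        return cache.computeIfAbsent(provider, p -> {
            generated++; // models compiling and loading a new class
            return new GeneratedHandler();
        });
    }

    /** Compiles the given number of queries, each with a fresh provider. */
    static ProviderCacheLeakDemo simulate(int queries) {
        ProviderCacheLeakDemo demo = new ProviderCacheLeakDemo();
        for (int i = 0; i < queries; i++) {
            demo.handlerFor(new Provider()); // new key every time: always a miss
        }
        return demo;
    }

    int cacheSize() { return cache.size(); }

    int generatedCount() { return generated; }

    public static void main(String[] args) {
        ProviderCacheLeakDemo demo = simulate(1000);
        System.out.println("cache entries: " + demo.cacheSize());
        System.out.println("handlers generated: " + demo.generatedCount());
    }
}
```

In this model both numbers equal the number of queries: every lookup misses, and nothing is ever evicted.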

 

Because of this issue, Hive servers also slow down until the OOM occurs, since 
loading classes takes too much time. In the flame graph below, class loading 
accounts for 48% of the samples:

  !image-2021-08-11-22-03-07-523.jpg|width=405,height=209!

 

I think we can fix this issue in one of two ways:

a) maintain a static metadata provider (HIVE-18920)

or 

b) use constant-size caches 
(https://issues.apache.org/jira/browse/CALCITE-1808)
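
The idea behind a) can be sketched with a toy model. This is an illustration under assumptions, not the HIVE-18920 patch: `Provider`, `GeneratedHandler` and `SHARED_PROVIDER` are hypothetical names, and a `HashMap` stands in for the unbounded cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of option a): every query reuses one static provider instance. */
final class StaticProviderDemo {

    /** Hypothetical stand-in for a metadata provider; equality is by identity. */
    static final class Provider {}

    /** Hypothetical stand-in for a class generated at runtime for one provider. */
    static final class GeneratedHandler {}

    // Created once and shared, so the cache key is the same for every query.
    private static final Provider SHARED_PROVIDER = new Provider();

    // Still unbounded, but it can only ever hold one entry for the shared key.
    private final Map<Provider, GeneratedHandler> cache = new HashMap<>();
    private int generated;

    GeneratedHandler handlerFor(Provider provider) {
        return cache.computeIfAbsent(provider, p -> {
            generated++; // models compiling and loading a new class
            return new GeneratedHandler();
        });
    }

    /** Compiles the given number of queries, all with the shared provider. */
    static StaticProviderDemo simulate(int queries) {
        StaticProviderDemo demo = new StaticProviderDemo();
        for (int i = 0; i < queries; i++) {
            demo.handlerFor(SHARED_PROVIDER); // same key: a hit after the first query
        }
        return demo;
    }

    int cacheSize() { return cache.size(); }

    int generatedCount() { return generated; }

    public static void main(String[] args) {
        StaticProviderDemo demo = simulate(1000);
        System.out.println("cache entries: " + demo.cacheSize());
        System.out.println("handlers generated: " + demo.generatedCount());
    }
}
```

Here the cache holds a single entry and only one handler is generated, no matter how many queries are compiled.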

 

To apply b), we need to upgrade Calcite to version 1.15, which brings in a lot 
of changes, so it may be inappropriate for a patch release. It is also the less 
efficient solution.
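
For comparison, the constant-size cache idea in b) can be sketched as a small LRU map. This is a toy model under assumptions and does not claim to show how CALCITE-1808 is implemented: `Provider`, `GeneratedHandler` and `maxEntries` are hypothetical, and the LRU behavior comes from `LinkedHashMap` in access order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model of option b): a size-bounded LRU cache, still fed a fresh provider per query. */
final class BoundedProviderCacheDemo {

    /** Hypothetical stand-in for a metadata provider; equality is by identity. */
    static final class Provider {}

    /** Hypothetical stand-in for a class generated at runtime for one provider. */
    static final class GeneratedHandler {}

    private final Map<Provider, GeneratedHandler> cache;
    private int generated;

    BoundedProviderCacheDemo(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.cache = new LinkedHashMap<Provider, GeneratedHandler>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Provider, GeneratedHandler> eldest) {
                return size() > maxEntries; // evict once the bound is exceeded
            }
        };
    }

    GeneratedHandler handlerFor(Provider provider) {
        GeneratedHandler handler = cache.get(provider);
        if (handler == null) {
            generated++; // models compiling and loading a new class
            handler = new GeneratedHandler();
            cache.put(provider, handler);
        }
        return handler;
    }

    /** Compiles the given number of queries, each with a fresh provider. */
    static BoundedProviderCacheDemo simulate(int queries, int maxEntries) {
        BoundedProviderCacheDemo demo = new BoundedProviderCacheDemo(maxEntries);
        for (int i = 0; i < queries; i++) {
            demo.handlerFor(new Provider());
        }
        return demo;
    }

    int cacheSize() { return cache.size(); }

    int generatedCount() { return generated; }

    public static void main(String[] args) {
        BoundedProviderCacheDemo demo = simulate(1000, 16);
        System.out.println("cache entries: " + demo.cacheSize());
        System.out.println("handlers generated: " + demo.generatedCount());
    }
}
```

With 1000 queries and a bound of 16, the cache stays at 16 entries but 1000 handlers are still generated, because a fresh provider per query never hits. In a real JVM, evicting an entry frees class space only once the generated class and its class loader become unreachable.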

 

In our production clusters, a) has proven to prevent both the OOM and the 
performance degradation.

  was:
Our Hive servers are regularly shut down by OOM:
{code:java}
Terminating due to java.lang.OutOfMemoryError: Compressed class space {code}

A heap dump shows that about 98% of all loaded classes were generated by the Janino compiler:

!screenshot-5.png|width=418,height=280!

These generated classes are cached (in JaninoRelMetadataProvider):

!screenshot-6.png|width=424,height=594!

This cache has no expiration, and Hive server creates new metadata providers, 
which are one of the cache keys, for every query. As a result, Hive servers 
generate metadata classes at runtime for every query and we can't make use of 
the cache, until finally those classes can't be loaded for lack of metaspace.

Because of this issue, Hive servers also slow down until the OOM occurs, since 
loading classes takes too much time. In the flame graph below, class loading 
accounts for 48% of the samples:

  !image-2021-08-11-22-03-07-523.jpg|width=405,height=209!

I think we can fix this issue in one of two ways:

a) maintain a static metadata provider (HIVE-18920)

or 

b) use constant-size caches 
(https://issues.apache.org/jira/browse/CALCITE-1808)

To apply b), we need to upgrade Calcite to version 1.15, which brings in a lot 
of changes, so it may be inappropriate for a patch release. It is also the less 
efficient solution.

In our production clusters, a) has proven to prevent both the OOM and the 
performance degradation.


> Prevent hive-server2 from getting OOM(Compressed class space) (Backport 
> HIVE-18920)
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-25472
>                 URL: https://issues.apache.org/jira/browse/HIVE-25472
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.8
>            Reporter: Jeongdae Kim
>            Assignee: Jeongdae Kim
>            Priority: Major
>         Attachments: image-2021-08-11-22-03-07-523.jpg, screenshot-5.png, 
> screenshot-6.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
