[ 
https://issues.apache.org/jira/browse/HIVE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214816#comment-15214816
 ] 

Prasanth Jayachandran commented on HIVE-13345:
----------------------------------------------

IMO we should store the serialized representation of the metadata. The 
deserialized representation (proto objects) is supposed to be short-lived. We 
already have POJO equivalents for the protobuf types (BloomFilter, 
ColumnStatistics, StripeInformation, etc.), each of which is built from the 
corresponding proto object. If we do cache a deserialized representation, we 
should cache those POJOs and not the proto objects.
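
To illustrate why a POJO can be much smaller than the proto object, here is a 
minimal sketch, assuming the bloom filter bits arrive as a List<Long> (which is 
what a generated protobuf getter for a repeated 64-bit field returns). The 
class CompactBloomBits and its methods are hypothetical names for this example 
and are not Hive's or ORC's actual BloomFilter class.

```java
import java.util.List;

/** Hypothetical compact holder; NOT the actual Hive/ORC BloomFilter class. */
final class CompactBloomBits {
    // A primitive array costs 8 bytes per word; a List<Long> additionally pays
    // for a reference and a boxed Long object per element.
    private final long[] bits;
    private final int numHashFunctions;

    private CompactBloomBits(long[] bits, int numHashFunctions) {
        this.bits = bits;
        this.numHashFunctions = numHashFunctions;
    }

    /** Copies the boxed words into a primitive array so the list can be dropped. */
    static CompactBloomBits fromBoxed(List<Long> boxed, int numHashFunctions) {
        long[] bits = new long[boxed.size()];
        for (int i = 0; i < bits.length; i++) {
            bits[i] = boxed.get(i);
        }
        return new CompactBloomBits(bits, numHashFunctions);
    }

    /** Returns true if the bit at the given position is set. */
    boolean testBit(int index) {
        return (bits[index >>> 6] & (1L << (index & 63))) != 0;
    }

    int wordCount() {
        return bits.length;
    }

    int numHashFunctions() {
        return numHashFunctions;
    }
}
```

Once the POJO is built, the proto object and its boxed list can be garbage 
collected, which is the point of keeping proto objects short-lived.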

> LLAP: metadata cache takes too much space, esp. with bloom filters, due to 
> Java/protobuf overhead
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13345
>                 URL: https://issues.apache.org/jira/browse/HIVE-13345
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> We currently cache Java objects, which have high overhead: average stripe 
> metadata takes 200-500 KB on real files, and up to 5 MB per stripe with 
> bloom filters, which blow up more than 5x because they are stored as a list 
> of Longs. That is undesirable.
> We should either create better objects for ORC (might be good in general) or 
> store serialized metadata and deserialize when needed.
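
The second option in the quoted description, storing serialized metadata and 
deserializing when needed, could look roughly like the sketch below. 
SerializedMetadataCache, its key type, and the deserializer function are all 
hypothetical; this is not the LLAP metadata cache API, only an illustration of 
the idea under those assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache that keeps only serialized bytes; NOT the LLAP cache API. */
final class SerializedMetadataCache<K, V> {
    private final Map<K, byte[]> serialized = new ConcurrentHashMap<>();
    private final Function<byte[], V> deserializer;

    SerializedMetadataCache(Function<byte[], V> deserializer) {
        this.deserializer = deserializer;
    }

    /** Stores a private copy of the serialized form. */
    void put(K key, byte[] bytes) {
        serialized.put(key, bytes.clone());
    }

    /** Deserializes on every call; the result is meant to be short-lived. */
    V get(K key) {
        byte[] bytes = serialized.get(key);
        return bytes == null ? null : deserializer.apply(bytes);
    }

    /** Total payload bytes held, excluding map and array overhead. */
    long cachedBytes() {
        long total = 0;
        for (byte[] b : serialized.values()) {
            total += b.length;
        }
        return total;
    }
}
```

The trade-off is CPU for memory: each lookup pays a deserialization cost, but 
the cached footprint stays close to the on-disk size of the metadata.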



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
