[ https://issues.apache.org/jira/browse/HIVE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214816#comment-15214816 ]
Prasanth Jayachandran commented on HIVE-13345:
----------------------------------------------

IMO we should store the serialized representation of the metadata. The deserialized representation (proto objects) is supposed to be short-lived. We already have POJOs for all the protobuf equivalents (BloomFilter, ColumnStatistics, StripeInformation, etc.), and these POJOs are created from the proto objects. If we do cache the deserialized representation, then we should cache the equivalent POJOs and not the proto objects.

> LLAP: metadata cache takes too much space, esp. with bloom filters, due to Java/protobuf overhead
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13345
>                 URL: https://issues.apache.org/jira/browse/HIVE-13345
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> We cache java objects currently; these have high overhead, average stripe
> metadata takes 200-500Kb on real files, and with bloom filters blowing up
> more than x5 due to being stored as list of Long-s, up to 5Mb per stripe.
> That is undesirable.
> We should either create better objects for ORC (might be good in general) or
> store serialized metadata and deserialize when needed.
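
To make the "store serialized metadata and deserialize when needed" option concrete, here is a minimal sketch in plain Java. It is an illustration under stated assumptions, not the Hive or ORC implementation: the class `SerializedMetadataCache`, its methods, and the stripe-id key are all hypothetical, and a packed `byte[]` of 8-byte words stands in for the real protobuf-encoded bytes. The cache keeps only the compact bytes; each lookup produces a short-lived `long[]` for the caller.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Hive or ORC APIs.
final class SerializedMetadataCache {
    // Only compact serialized bytes are retained, keyed by an assumed stripe id.
    private final Map<Long, byte[]> cache = new HashMap<>();

    // Pack a bloom filter bitset held as boxed Longs into 8 bytes per word.
    static byte[] serialize(List<Long> bitset) {
        ByteBuffer buf = ByteBuffer.allocate(bitset.size() * Long.BYTES);
        for (long word : bitset) {
            buf.putLong(word);
        }
        return buf.array();
    }

    // Rebuild a primitive long[]; the result is meant to be short-lived.
    static long[] deserialize(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long[] words = new long[bytes.length / Long.BYTES];
        for (int i = 0; i < words.length; i++) {
            words[i] = buf.getLong();
        }
        return words;
    }

    void put(long stripeId, List<Long> bitset) {
        cache.put(stripeId, serialize(bitset));
    }

    // Deserialize on demand; returns null when the stripe is not cached.
    long[] get(long stripeId) {
        byte[] bytes = cache.get(stripeId);
        return bytes == null ? null : deserialize(bytes);
    }

    // Number of bytes actually retained for a stripe (0 if absent).
    int cachedBytes(long stripeId) {
        byte[] bytes = cache.get(stripeId);
        return bytes == null ? 0 : bytes.length;
    }
}
```

The trade-off in this sketch is the one the issue describes: the retained footprint is the payload size (8 bytes per word here) rather than one boxed object per word, at the cost of a deserialization pass on every read.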