[ https://issues.apache.org/jira/browse/HIVE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214728#comment-15214728 ]
Sergey Shelukhin edited comment on HIVE-13345 at 3/28/16 7:27 PM:
------------------------------------------------------------------

[~gopalv] [~prasanth_j] [~owen.omalley] opinions on the best approach?
I am leaning towards changing ORC to use POJOs instead of the OrcProto classes, but as an alternative we can change the metadata cache in LLAP to store serialized metadata.
The trade-off is the cost of deserializing every time in LLAP vs. the cost of copying fields and converting some things (e.g. OrcProto stores bloom filters as List<Long>, which, aside from being horrible on purely practical grounds, offends my engineering sensibilities, so I might be biased here).


was (Author: sershe):
[~gopalv] [~prasanth_j] [~owen.omalley] opinions on the best approach?
I am leaning towards changing ORC to use POJOs instead of the OrcProto classes, but as an alternative we can change the metadata cache in LLAP to store serialized metadata.
The trade-off is the cost of deserializing every time in LLAP vs. the cost of copying fields and converting some things (e.g. OrcProto stores bloom filters as List<Long>, which, aside from being horrible on pure merits, offends my engineering sensibilities, so I might be biased here).

> LLAP: metadata cache takes too much space, esp. with bloom filters, due to Java/protobuf overhead
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13345
>                 URL: https://issues.apache.org/jira/browse/HIVE-13345
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> We currently cache Java objects, which have high overhead: average stripe
> metadata takes 200-500 KB on real files, and with bloom filters, which blow up
> by more than 5x because they are stored as lists of Longs, up to 5 MB per stripe.
> That is undesirable.
> We should either create better objects for ORC (which might be good in general) or
> store serialized metadata and deserialize it when needed.
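To make the two options in the issue concrete, here is a minimal sketch of both: copying a List<Long> of bloom filter words into a primitive long[], and keeping the words as a serialized byte[] that is decoded only when needed. The class BloomBitsCodec and its methods are hypothetical names used for illustration. They are not part of ORC, OrcProto, or LLAP, and the byte layout is a plain big-endian dump rather than the protobuf wire format.

```java
import java.nio.ByteBuffer;
import java.util.List;

// Hypothetical helper for illustration only; not an ORC or LLAP class.
final class BloomBitsCodec {
    private BloomBitsCodec() {}

    // Option 1: copy boxed values into a primitive array. The result costs
    // 8 bytes per word plus one array header, instead of one Long object
    // and one reference per word.
    static long[] toPrimitive(List<Long> boxed) {
        long[] out = new long[boxed.size()];
        int i = 0;
        for (long v : boxed) { // auto-unboxing; assumes no null elements
            out[i++] = v;
        }
        return out;
    }

    // Option 2: keep a serialized form in the cache, exactly 8 bytes per word.
    static byte[] serialize(long[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * Long.BYTES);
        buf.asLongBuffer().put(words); // writes through to buf's backing array
        return buf.array();
    }

    // Decode on demand; this is the per-read cost the comment refers to.
    static long[] deserialize(byte[] bytes) {
        long[] out = new long[bytes.length / Long.BYTES];
        ByteBuffer.wrap(bytes).asLongBuffer().get(out);
        return out;
    }
}
```

Either form removes the per-element object overhead of List<Long>. The serialized form additionally pays a decode on every access, which is the trade-off raised in the comment.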