[ 
https://issues.apache.org/jira/browse/HIVE-13345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214728#comment-15214728
 ] 

Sergey Shelukhin edited comment on HIVE-13345 at 3/28/16 7:27 PM:
------------------------------------------------------------------

[~gopalv] [~prasanth_j] [~owen.omalley] opinions on the best approach? I am 
leaning towards changing ORC to use POJOs instead of OrcProto stuff, but as an 
alternative we can change the metadata cache in LLAP to store serialized 
metadata. The trade-off is the cost of deserializing every time in LLAP vs. 
the cost of copying fields/converting some things (e.g. OrcProto stores bloom filters as 
List<Long>, which aside from being horrible on purely practical grounds, 
offends my engineering sensibilities, so I might be biased here).
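
To make the List<Long> point concrete, here is a minimal sketch of the kind of field conversion a POJO-based approach might do: copying the boxed bitset words into a primitive long[]. The class and method names are hypothetical (they are not ORC, OrcProto, or LLAP APIs), and the byte counts are rough assumptions for a typical 64-bit JVM, not measurements.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of ORC, OrcProto, or LLAP.
final class BloomFilterBits {
    // Assumed sizes on a typical 64-bit JVM: about 24 bytes per boxed Long,
    // plus a 4-byte compressed reference held by the list.
    static final long ASSUMED_BOXED_BYTES_PER_WORD = 24 + 4;
    // Assumed array header size; each primitive long takes 8 bytes.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;

    private BloomFilterBits() {}

    /** Copies boxed bitset words into a compact primitive array. */
    static long[] toPrimitive(List<Long> boxed) {
        long[] words = new long[boxed.size()];
        for (int i = 0; i < words.length; i++) {
            words[i] = boxed.get(i); // auto-unboxing
        }
        return words;
    }

    /** Rough estimate of the boxed representation, under the assumptions above. */
    static long estimateBoxedBytes(int wordCount) {
        return ASSUMED_BOXED_BYTES_PER_WORD * wordCount;
    }

    /** Rough estimate of the primitive representation, under the assumptions above. */
    static long estimatePrimitiveBytes(int wordCount) {
        return ASSUMED_ARRAY_HEADER_BYTES + 8L * wordCount;
    }
}
```

The copy is a one-time cost paid when the metadata is loaded, which is the "copying fields/converting" side of the trade-off above.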



was (Author: sershe):
[~gopalv] [~prasanth_j] [~owen.omalley] opinions on the best approach? I am 
leaning towards changing ORC to use POJOs instead of OrcProto stuff, but as an 
alternative we can change metadata cache in LLAP to store serialized metadata. 
The cost of deserializing every time in LLAP vs the cost of copying 
fields/converting some things (e.g. OrcProto stores bloom filters as 
List<Long>, which aside from being horrible on pure merits, offends my 
engineering sensibilities, so I might be biased here).


> LLAP: metadata cache takes too much space, esp. with bloom filters, due to 
> Java/protobuf overhead
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13345
>                 URL: https://issues.apache.org/jira/browse/HIVE-13345
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>
> We currently cache Java objects, which have high overhead. Average stripe 
> metadata takes 200-500 KB on real files, and with bloom filters blowing up 
> by more than 5x because they are stored as a list of Longs, it can reach up 
> to 5 MB per stripe. That is undesirable.
> We should either create better objects for ORC (might be good in general) or 
> store serialized metadata and deserialize when needed.
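
The second option in the description (store serialized metadata and deserialize when needed) could look roughly like the sketch below. Everything here is hypothetical: the class name, the generic key/value types, and the deserializer function are assumptions for illustration, and the actual LLAP metadata cache is not shown in this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch; not the LLAP metadata cache.
final class SerializedMetadataCache<K, V> {
    private final Map<K, byte[]> bytesByKey = new ConcurrentHashMap<>();
    private final Function<byte[], V> deserializer;

    SerializedMetadataCache(Function<byte[], V> deserializer) {
        this.deserializer = deserializer;
    }

    /** Stores only the compact serialized form (defensively copied). */
    void put(K key, byte[] serialized) {
        bytesByKey.put(key, serialized.clone());
    }

    /** Deserializes on every read; returns null on a cache miss. */
    V get(K key) {
        byte[] bytes = bytesByKey.get(key);
        return bytes == null ? null : deserializer.apply(bytes);
    }

    /** Total payload bytes held, excluding map and array header overhead. */
    long cachedBytes() {
        long total = 0;
        for (byte[] b : bytesByKey.values()) {
            total += b.length;
        }
        return total;
    }
}
```

With this shape the cache footprint tracks the serialized size, at the price of running the deserializer on each lookup.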



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
