Nemon Lou created HIVE-12847:
--------------------------------
Summary: ORC file footer cache should be memory sensitive
Key: HIVE-12847
URL: https://issues.apache.org/jira/browse/HIVE-12847
Project: Hive
Issue Type: Improvement
Components: File Formats, ORC
Affects Versions: 1.2.1
Reporter: Nemon Lou
The size-based footer cache cannot control memory usage properly.
I have seen HiveServer2 hang because the ORC file footer cache took up too much
heap memory.
A simple query like "select * from orc_table limit 1" can make HiveServer2 hang.
The input table has about 1000 ORC files, and each ORC file has about 2500
stripes.
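One way to make a cache "memory sensitive" is to hold the cached values through soft references, so the garbage collector may reclaim them when the heap runs low. The following is only a minimal sketch of that idea, not Hive's actual footer cache: the class name SoftFooterCache, its methods, and the loader callback are all hypothetical and use only the JDK.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache whose values the GC may reclaim under memory pressure. */
class SoftFooterCache<K, V> {

    /** Soft reference that remembers its key so a cleared entry can be removed. */
    private static final class Entry<K, V> extends SoftReference<V> {
        final K key;

        Entry(K key, V value, ReferenceQueue<V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final ConcurrentHashMap<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Returns the cached value, reloading it if it was never cached or was reclaimed. */
    V get(K key, Function<K, V> loader) {
        expunge();
        Entry<K, V> entry = map.get(key);
        V value = (entry == null) ? null : entry.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new Entry<>(key, value, queue));
        }
        return value;
    }

    /** Number of entries currently tracked, after dropping reclaimed ones. */
    int size() {
        expunge();
        return map.size();
    }

    /** Removes map entries whose values the garbage collector has already cleared. */
    @SuppressWarnings("unchecked")
    private void expunge() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            Entry<K, V> entry = (Entry<K, V>) ref;
            // Two-argument remove: only drop the mapping if it still points at this entry.
            map.remove(entry.key, entry);
        }
    }
}
```

With this approach the number of cached footers is not bounded up front; instead the JVM decides when to drop them, which trades predictable hit rates for protection against the kind of heap exhaustion described above. A caller would use it roughly as cache.get(path, p -> readFooter(p)), where readFooter is a hypothetical loader.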
{noformat}
 num    #instances        #bytes  class name
----------------------------------------------
   1:    214653601   25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
   3:    122233301    8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
   5:     89439001    6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
   7:      2981300     262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
   9:      2981300     143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
  12:      2983691      71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
  15:        80929       7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
  17:       103282       5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
  20:        51641       3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
  21:        51641       3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
  31:            1        413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
 100:         1122         26928  org.apache.hadoop.hive.ql.io.orc.Metadata
{noformat}
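To put the histogram in perspective, the arithmetic below sums the three largest rows and derives per-instance and per-stripe ratios. It is only a sketch: the class name FooterHistogramMath and its helpers are hypothetical, the constants are copied from the histogram above, and the byte counts are the shallow sizes the histogram reports, not retained sizes.

```java
/** Hypothetical helper that redoes the arithmetic on the heap histogram rows. */
class FooterHistogramMath {
    // Instance and byte counts copied from the histogram rows.
    static final long COLUMN_STATS_INSTANCES = 214_653_601L;
    static final long COLUMN_STATS_BYTES = 25_758_432_120L;
    static final long STRING_STATS_BYTES = 8_800_797_672L;
    static final long INTEGER_STATS_BYTES = 6_439_608_072L;
    static final long STRIPE_INFO_INSTANCES = 2_981_300L;

    /** Average shallow size of one object of a class, in bytes. */
    static long bytesPerInstance(long bytes, long instances) {
        return bytes / instances;
    }

    /** Combined shallow size of the three largest statistics classes, in bytes. */
    static long topThreeBytes() {
        return COLUMN_STATS_BYTES + STRING_STATS_BYTES + INTEGER_STATS_BYTES;
    }

    /** Average number of ColumnStatistics objects held per StripeInformation object. */
    static double columnStatsPerStripe() {
        return (double) COLUMN_STATS_INSTANCES / STRIPE_INFO_INSTANCES;
    }

    public static void main(String[] args) {
        System.out.println("bytes per ColumnStatistics: "
                + bytesPerInstance(COLUMN_STATS_BYTES, COLUMN_STATS_INSTANCES));
        System.out.println("top three classes, bytes: " + topThreeBytes());
        System.out.printf("top three classes, GiB: %.1f%n",
                topThreeBytes() / (1024.0 * 1024.0 * 1024.0));
        System.out.printf("ColumnStatistics per stripe: %.1f%n", columnStatsPerStripe());
    }
}
```

The three statistics classes alone account for roughly 41 GB of heap (40,998,837,864 bytes), at 120 bytes per ColumnStatistics object and about 72 ColumnStatistics objects per stripe.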