[ https://issues.apache.org/jira/browse/HIVE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nemon Lou updated HIVE-12847:
-----------------------------
    Description:
The size based footer cache can not control memory usage properly.
Having seen a HiveServer2 hang (full GC all the time) due to ORC file footer cache taking up too much heap memory.
A simple query like "select * from orc_table limit 1" can make HiveServer2 hang.
The input table has about 1000 ORC files and each ORC file owns about 2500 stripes.
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     214653601    25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
   3:     122233301     8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
   5:      89439001     6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
   7:       2981300      262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
   9:       2981300      143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
  12:       2983691       71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
  15:         80929        7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
  17:        103282        5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
  20:         51641        3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
  21:         51641        3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
  31:             1         413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
 100:          1122          26928  org.apache.hadoop.hive.ql.io.orc.Metadata
{noformat}

  was:
The size based footer cache can not control memory usage properly.
Having seen a HiveServer2 hang due to ORC file footer cache taking up too much heap memory.
A simple query like "select * from orc_table limit 1" can make HiveServer2 hang.
The input table has about 1000 ORC files and each ORC file owns about 2500 stripes.
{noformat}
 num     #instances         #bytes  class name
----------------------------------------------
   1:     214653601    25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
   3:     122233301     8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
   5:      89439001     6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
   7:       2981300      262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
   9:       2981300      143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
  12:       2983691       71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
  15:         80929        7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
  17:        103282        5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
  20:         51641        3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
  21:         51641        3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
  31:             1         413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
 100:          1122          26928  org.apache.hadoop.hive.ql.io.orc.Metadata
{noformat}


> ORC file footer cache should be memory sensitive
> ------------------------------------------------
>
>                 Key: HIVE-12847
>                 URL: https://issues.apache.org/jira/browse/HIVE-12847
>             Project: Hive
>          Issue Type: Improvement
>          Components: File Formats, ORC
>    Affects Versions: 1.2.1
>            Reporter: Nemon Lou
>            Assignee: Nemon Lou
>         Attachments: HIVE-12847.patch
>
>
> The size based footer cache can not control memory usage properly.
> Having seen a HiveServer2 hang (full GC all the time) due to ORC file footer cache taking up too much heap memory.
> A simple query like "select * from orc_table limit 1" can make HiveServer2 hang.
> The input table has about 1000 ORC files and each ORC file owns about 2500 stripes.
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     214653601    25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
>    3:     122233301     8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
>    5:      89439001     6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
>    7:       2981300      262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
>    9:       2981300      143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
>   12:       2983691       71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
>   15:         80929        7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
>   17:        103282        5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
>   20:         51641        3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
>   21:         51641        3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
>   31:             1         413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
>  100:          1122          26928  org.apache.hadoop.hive.ql.io.orc.Metadata
> {noformat}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
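
The histogram in the description can be read as per-stripe cost with a little arithmetic. The sketch below is only an illustration: the class and method names (FooterHistogramMath, perStripe, totalStatsBytes, statsBytesPerStripe) are hypothetical, the numbers are copied from the histogram, and it assumes "size based" means the cache is bounded by number of entries rather than by bytes.

```java
public class FooterHistogramMath {
    // Instance counts and shallow byte totals copied from the histogram in the description.
    static final long STRIPES = 2_981_300L;                   // OrcProto$StripeInformation instances
    static final long COLUMN_STATS = 214_653_601L;            // OrcProto$ColumnStatistics instances
    static final long COLUMN_STATS_BYTES = 25_758_432_120L;
    static final long STRING_STATS = 122_233_301L;            // OrcProto$StringStatistics instances
    static final long STRING_STATS_BYTES = 8_800_797_672L;
    static final long INTEGER_STATS = 89_439_001L;            // OrcProto$IntegerStatistics instances
    static final long INTEGER_STATS_BYTES = 6_439_608_072L;

    /** Whole number of instances of one class per stripe (integer division). */
    static long perStripe(long instances) {
        return instances / STRIPES;
    }

    /** Shallow bytes held by the three statistics classes combined. */
    static long totalStatsBytes() {
        return COLUMN_STATS_BYTES + STRING_STATS_BYTES + INTEGER_STATS_BYTES;
    }

    /** Shallow statistics bytes per stripe (integer division). */
    static long statsBytesPerStripe() {
        return totalStatsBytes() / STRIPES;
    }

    public static void main(String[] args) {
        System.out.println("ColumnStatistics per stripe:  " + perStripe(COLUMN_STATS));   // 72
        System.out.println("StringStatistics per stripe:  " + perStripe(STRING_STATS));   // 41
        System.out.println("IntegerStatistics per stripe: " + perStripe(INTEGER_STATS));  // 30
        System.out.println("Statistics bytes in total:    " + totalStatsBytes());         // 40998837864
        System.out.println("Statistics bytes per stripe:  " + statsBytesPerStripe());     // 13752
    }
}
```

On these numbers the three statistics classes alone account for roughly 41 GB of shallow heap, about 13.7 KB per stripe, so the footprint of one cached footer grows with stripes times columns. Under the stated assumption, a limit on the number of cached entries says little about the bytes retained, which is consistent with the behavior reported here. The figures are shallow sizes from the histogram; retained heap may well be larger.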