[
https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mostafa Mokhtar updated HIVE-8291:
----------------------------------
Attachment: 2014_09_28_16_48_48.jfr
Hot function profile.
Use Java Mission Control (jmc) to open the file; JMC is part of Java 7.
> ACID : Reading from partitioned bucketed tables has high overhead, 50% of
> time is spent in OrcInputFormat.getReader
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.14.0
> Environment: cn105
> Reporter: Mostafa Mokhtar
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
> Attachments: 2014_09_28_16_48_48.jfr
>
>
> Reading from bucketed partitioned tables has significantly higher overhead
> compared to non-bucketed non-partitioned files.
> About 50% of the time is spent in OrcInputFormat.getReader(), most of it in
> these two lines of code:
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
> Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace                                                                                   Sample Count  Percentage(%)
> hive.ql.exec.tez.MapRecordSource.pushRecord()                                                        2,981  87.215
> org.apache.tez.mapreduce.lib.MRReaderMapred.next()                                                   2,002  58.572
> mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object)           2,002  58.572
> mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader()         1,984  58.046
> hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                            1,983  58.016
> hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter)                         1,891  55.325
> hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options)                         1,723  50.41
> hive.common.ValidTxnListImpl.<init>(String)                                                            934  27.326
> conf.Configuration.get(String, String)                                                                 621  18.169
> {code}
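
The two lines quoted above re-parse the valid-transaction string every time a
reader is created. One possible way to avoid the repeated parsing is to
memoize the parsed list by its string form. The following is only a minimal
sketch under stated assumptions, not the fix applied for this issue:
TxnListCache and ParsedTxnList are hypothetical names, and ParsedTxnList is a
simplified stand-in for ValidTxnListImpl that assumes the string has the form
"highWatermark:exception1:exception2...".

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper (not part of Hive): caches parsed transaction lists
// so that identical strings are parsed only once per JVM.
final class TxnListCache {

  // Simplified stand-in for ValidTxnListImpl. Assumes the format
  // "highWatermark:exc1:exc2..."; an empty string is not handled here.
  static final class ParsedTxnList {
    final long highWatermark;
    final long[] exceptions; // kept sorted so binary search can be used

    ParsedTxnList(String src) {
      String[] parts = src.split(":");
      highWatermark = Long.parseLong(parts[0]);
      exceptions = new long[parts.length - 1];
      for (int i = 1; i < parts.length; i++) {
        exceptions[i - 1] = Long.parseLong(parts[i]);
      }
      Arrays.sort(exceptions);
    }

    // A transaction is treated as valid if it is at or below the high
    // watermark and is not listed as an exception.
    boolean isTxnValid(long txnId) {
      return txnId <= highWatermark
          && Arrays.binarySearch(exceptions, txnId) < 0;
    }
  }

  private final ConcurrentMap<String, ParsedTxnList> cache =
      new ConcurrentHashMap<String, ParsedTxnList>();
  private final AtomicInteger parses = new AtomicInteger();

  // Returns the cached parse result, parsing only on a cache miss.
  ParsedTxnList get(String txnString) {
    ParsedTxnList cached = cache.get(txnString);
    if (cached == null) {
      ParsedTxnList fresh = new ParsedTxnList(txnString);
      parses.incrementAndGet();
      // If another thread inserted first, reuse its instance.
      ParsedTxnList raced = cache.putIfAbsent(txnString, fresh);
      cached = (raced != null) ? raced : fresh;
    }
    return cached;
  }

  // Number of times a string was actually parsed (for illustration only).
  int parseCount() {
    return parses.get();
  }
}
```

This sketch only targets the constructor cost; the Configuration.get lookup
shown in the profile would still run on every call unless the string is also
fetched once and reused. The cache is unbounded, so a real implementation
would need some eviction policy.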
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)