[ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar updated HIVE-8292:
----------------------------------
Description:
Reading from bucketed, partitioned tables has significantly higher overhead
than reading from non-bucketed, non-partitioned files.
50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 45% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                 Sample Count  Percentage(%)
hive.ql.exec.tez.MapRecordSource.processRow(Object)         5,327         62.348
hive.ql.exec.vector.VectorMapOperator.process(Writable)     5,326         62.336
hive.ql.exec.Operator.cleanUpInputFileChanged()             4,851         56.777
hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()        4,849         56.753
java.net.URI.relativize(URI)                                3,903         45.681
java.net.URI.relativize(URI, URI)                           3,903         45.681
java.net.URI.normalize(String)                              2,169         25.386
java.net.URI.equal(String, String)                          526           6.156
java.net.URI.equalIgnoringCase(String, String)              1             0.012
java.lang.String.substring(int)                             1             0.012
hive.ql.exec.MapOperator.normalizePath(String)              506           5.922
org.apache.commons.logging.impl.Log4JLogger.info(Object)    32            0.375
java.net.URI.equals(Object)                                 12            0.14
java.util.HashMap$KeySet.iterator()                         5             0.059
java.util.HashMap.get(Object)                               4             0.047
java.util.LinkedHashMap.get(Object)                         3             0.035
hive.ql.exec.Operator.cleanUpInputFileChanged()             1             0.012
hive.ql.exec.Operator.forward(Object, ObjectInspector)      473           5.536
hive.ql.exec.mr.ExecMapperContext.inputFileChanged()        1             0.012
{code}
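For context on the hot expression quoted above: java.net.URI.relativize returns its argument unchanged when that argument does not sit under the base URI, so the equals(...) comparison is true when fpath is not under onepath. The following is a minimal standalone sketch of that check, not the Hive code; the class name, the method name, and the hdfs:// paths are hypothetical and chosen only for illustration.

```java
import java.net.URI;

// Hypothetical demo class; it only mirrors the shape of the expression
// quoted in the description and is not part of Hive.
class RelativizeCheck {

    // True when 'file' is NOT located under 'dir': relativize() hands back
    // 'file' itself in that case, so the equals() comparison succeeds.
    static boolean isOutside(URI dir, URI file) {
        return dir.relativize(file).equals(file);
    }

    public static void main(String[] args) {
        // Made-up partition directory and bucket files.
        URI dir = URI.create("hdfs://nn:8020/warehouse/t/part=1");
        URI inside = URI.create("hdfs://nn:8020/warehouse/t/part=1/000000_0");
        URI other = URI.create("hdfs://nn:8020/warehouse/t/part=2/000000_0");

        System.out.println(isOutside(dir, inside)); // prints false
        System.out.println(isOutside(dir, other));  // prints true
    }
}
```

Each relativize call normalizes both URI paths before comparing them, which is consistent with the java.net.URI.normalize(String) frames that appear under relativize in the profile.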
was:
Reading from bucketed, partitioned tables has significantly higher overhead
than reading from non-bucketed, non-partitioned files.
50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
5% of the CPU in
{code}
Path onepath = normalizePath(onefile);
{code}
and 45% of the CPU in
{code}
onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
{code}
From the profiler:
{code}
Stack Trace                                                                 Sample Count  Percentage(%)
org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object)       978           28.613
org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable)   978           28.613
org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged()           866           25.336
org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp()      866           25.336
java.net.URI.relativize(URI)                                                655           19.163
java.net.URI.relativize(URI, URI)                                           655           19.163
java.net.URI.normalize(String)                                              517           15.126
java.net.URI.needsNormalization(String)                                     372           10.884
java.lang.String.charAt(int)                                                235           6.875
java.net.URI.equal(String, String)                                          27            0.79
java.lang.StringBuilder.toString()                                          1             0.029
java.lang.StringBuilder.<init>()                                            1             0.029
java.lang.StringBuilder.append(String)                                      1             0.029
org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String)            167           4.886
org.apache.hadoop.fs.Path.<init>(String)                                    162           4.74
org.apache.hadoop.fs.Path.initialize(String, String, String, String)        162           4.74
org.apache.hadoop.fs.Path.normalizePath(String, String)                     97            2.838
org.apache.commons.lang.StringUtils.replace(String, String, String)         97            2.838
org.apache.commons.lang.StringUtils.replace(String, String, String, int)    97            2.838
java.lang.String.indexOf(String, int)                                       97            2.838
java.net.URI.<init>(String, String, String, String, String)                 65            1.902
{code}
> Reading from partitioned bucketed tables has high overhead in
> MapOperator.cleanUpInputFileChangedOp
> ---------------------------------------------------------------------------------------------------
>
> Key: HIVE-8292
> URL: https://issues.apache.org/jira/browse/HIVE-8292
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.14.0
> Environment: cn105
> Reporter: Mostafa Mokhtar
> Assignee: Prasanth J
> Fix For: 0.14.0
>
> Attachments: 2014_09_29_14_46_04.jfr
--
This message was sent by Atlassian JIRA (v6.3.4#6332)