[jira] [Commented] (HIVE-8292) Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp

Hive QA (JIRA) Fri, 10 Oct 2014 08:03:12 -0700

    [ https://issues.apache.org/jira/browse/HIVE-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14166941#comment-14166941 ]


Hive QA commented on HIVE-8292:
-------------------------------



{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12673717/HIVE-8292.2.patch

{color:green}SUCCESS:{color} +1 4119 tests passed

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1197/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/1197/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-1197/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12673717

> Reading from partitioned bucketed tables has high overhead in MapOperator.cleanUpInputFileChangedOp
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-8292
>                 URL: https://issues.apache.org/jira/browse/HIVE-8292
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.14.0
>         Environment: cn105
>            Reporter: Mostafa Mokhtar
>            Assignee: Gopal V
>             Fix For: 0.14.0
>
>         Attachments: 2014_09_29_14_46_04.jfr, HIVE-8292.1.patch, HIVE-8292.2.patch
>
>
> Reading from bucketed partitioned tables has significantly higher overhead than reading from non-bucketed, non-partitioned files.
> 50% of the profile is spent in MapOperator.cleanUpInputFileChangedOp:
> 5% of the CPU is in
> {code}
>  Path onepath = normalizePath(onefile);
> {code}
> and 45% of the CPU is in
> {code}
>  onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace   Sample Count    Percentage(%)
> hive.ql.exec.tez.MapRecordSource.processRow(Object)   5,327   62.348
>    hive.ql.exec.vector.VectorMapOperator.process(Writable)    5,326   62.336
>       hive.ql.exec.Operator.cleanUpInputFileChanged() 4,851   56.777
>          hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() 4,849   56.753
>                                  java.net.URI.relativize(URI) 3,903   45.681
>                                     java.net.URI.relativize(URI, URI) 3,903   45.681
>                                        java.net.URI.normalize(String) 2,169   25.386
>                                        java.net.URI.equal(String, String)     526     6.156
>                                        java.net.URI.equalIgnoringCase(String, String) 1       0.012
>                                        java.lang.String.substring(int)        1       0.012
>             hive.ql.exec.MapOperator.normalizePath(String)    506     5.922
>             org.apache.commons.logging.impl.Log4JLogger.info(Object)  32      0.375
>                                  java.net.URI.equals(Object)  12      0.14
>                                  java.util.HashMap$KeySet.iterator()  5       0.059
>                                  java.util.HashMap.get(Object)        4       0.047
>                                  java.util.LinkedHashMap.get(Object)  3       0.035
>          hive.ql.exec.Operator.cleanUpInputFileChanged()      1       0.012
>       hive.ql.exec.Operator.forward(Object, ObjectInspector)  473     5.536
>       hive.ql.exec.mr.ExecMapperContext.inputFileChanged()    1       0.012
> {code}
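
A note on the second hot spot: `java.net.URI.relativize` returns its argument unchanged when the base URI's path is not a prefix of the argument's path (or when scheme or authority differ), so comparing the result with `fpath.toUri()` is an indirect "is this file outside that directory?" test. The standalone sketch below reproduces that idiom outside Hive; the class name `RelativizeCheck` and the `hdfs://nn:8020/...` paths are made up for illustration. Each call normalizes both paths and, on a match, builds a new `URI`, which is consistent with the `URI.normalize` and `String.substring` frames under `URI.relativize` in the profile.

```java
import java.net.URI;

// Hypothetical standalone reproduction of the containment idiom quoted above.
public class RelativizeCheck {
    // relativize() hands back 'file' itself when 'dir' is not a path prefix of it,
    // so "result equals file" means "not contained"; negate to get "contained".
    public static boolean isUnder(URI dir, URI file) {
        return !dir.relativize(file).equals(file);
    }

    public static void main(String[] args) {
        // Illustrative partition directory and files (not taken from the report).
        URI dir = URI.create("hdfs://nn:8020/warehouse/t/ds=1");
        URI inside = URI.create("hdfs://nn:8020/warehouse/t/ds=1/000000_0");
        URI sibling = URI.create("hdfs://nn:8020/warehouse/t/ds=10/000000_0");
        URI other = URI.create("hdfs://nn:8020/warehouse/t/ds=2/000000_0");

        System.out.println(isUnder(dir, inside));   // true
        System.out.println(isUnder(dir, sibling));  // false: "ds=1/" is not a prefix of "ds=10/..."
        System.out.println(isUnder(dir, other));    // false
    }
}
```

The `ds=10` case shows why a bare `startsWith` on the directory string would not be a drop-in replacement: the comparison has to be made against the directory path followed by a separator.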

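Both hot spots above sit inside MapOperator.cleanUpInputFileChangedOp and are repeated for every configured path each time it runs: `normalizePath(onefile)` and the `relativize` test. One way to avoid that repetition, sketched below as a hypothetical illustration and not as what HIVE-8292.2.patch actually does, is to normalize the configured directories once and memoize matches per parent directory, since every file in the same partition directory resolves to the same set. `AliasLookup` and its methods are invented names; the sketch assumes the configured entries are directories supplied as already-normalized, scheme-qualified strings, so plain string prefixes can stand in for `URI` comparisons.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: configured directories are normalized once,
// and matches are memoized per parent directory of the input file.
public class AliasLookup {
    private final List<String> dirs = new ArrayList<>();      // directory without trailing '/'
    private final List<String> prefixes = new ArrayList<>();  // directory + '/'
    private final Map<String, List<String>> byParent = new HashMap<>();

    public AliasLookup(List<String> configuredDirs) {
        for (String d : configuredDirs) {
            // Assumes entries are already normalized; only a trailing '/' is stripped.
            String dir = d.endsWith("/") ? d.substring(0, d.length() - 1) : d;
            dirs.add(dir);
            prefixes.add(dir + "/");
        }
    }

    // Returns the configured directories that contain 'file'.
    public List<String> matches(String file) {
        int slash = file.lastIndexOf('/');
        String parent = slash >= 0 ? file.substring(0, slash) : file;
        // Only the first file seen in a given directory pays for the linear scan.
        return byParent.computeIfAbsent(parent, this::scan);
    }

    private List<String> scan(String parent) {
        String parentWithSlash = parent + "/";
        List<String> out = new ArrayList<>();
        for (int i = 0; i < dirs.size(); i++) {
            // A directory contains the file iff the parent equals it or lies below it.
            if (parentWithSlash.startsWith(prefixes.get(i))) {
                out.add(dirs.get(i));
            }
        }
        return Collections.unmodifiableList(out);
    }

    public static void main(String[] args) {
        AliasLookup lookup = new AliasLookup(Arrays.asList(
                "hdfs://nn:8020/warehouse/t/ds=1",
                "hdfs://nn:8020/warehouse/t/ds=2"));
        System.out.println(lookup.matches("hdfs://nn:8020/warehouse/t/ds=1/000000_0")); // [hdfs://nn:8020/warehouse/t/ds=1]
        System.out.println(lookup.matches("hdfs://nn:8020/warehouse/t/ds=1/000001_0")); // same result, served from the cache
        System.out.println(lookup.matches("hdfs://nn:8020/warehouse/t/ds=10/000000_0")); // []
    }
}
```

The design trades a small map for skipping the scan on every later file in the same partition. It does not attempt to reproduce Hive's exact alias semantics, for example configured entries that are files rather than directories.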


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)