[ 
https://issues.apache.org/jira/browse/MAPREDUCE-7317?focusedWorklogId=537261&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-537261
 ]

ASF GitHub Bot logged work on MAPREDUCE-7317:
---------------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Jan/21 07:56
            Start Date: 18/Jan/21 07:56
    Worklog Time Spent: 10m 
      Work Description: HeartSaVioR opened a new pull request #2624:
URL: https://github.com/apache/hadoop/pull/2624


   This PR proposes to add latency information in 
FileOutputCommitter.mergePaths, so that we can trace how long a specific 
directory takes to merge.
   
   This information would help investigation when the commit in 
FileOutputCommitter takes much longer than expected. The class already logs 
the call with its from/to params at debug level, but that is insufficient to 
trace the latency of a specific directory because the call is recursive.
   
   No test added, as there is no behavior change to test. Tested manually by 
adding the line below to log4j.properties
   
   ```
   log4j.logger.org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter=DEBUG
   ```
   
   and running the tests in TestFileOutputCommitter.
   
   ```
   2021-01-18 16:14:03,475 DEBUG [main] output.FileOutputCommitter 
(FileOutputCommitter.java:mergePaths(461)) - Merging data from 
DeprecatedRawLocalFileStatus{path=file:/Users/jlim/WorkArea/JavaProjects/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter/_temporary/0/task_200707121733_0001_m_000000;
 isDirectory=true; modification_time=1610954043000; access_time=1610954043000; 
owner=; group=; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; 
isEncrypted=false; isErasureCoded=false} to 
file:/Users/jlim/WorkArea/JavaProjects/hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
   ...
   2021-01-18 16:14:03,476 DEBUG [main] output.FileOutputCommitter 
(FileOutputCommitter.java:mergePaths(502)) - Merged data from 
file:/.../hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter/_temporary/0/task_200707121733_0001_m_000000
 to 
file:/.../hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter
 in 1 ms
   ```
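
   The "Merged data from ... to ... in N ms" line above comes from timing each 
recursive call. A minimal sketch of that pattern follows, assuming a local 
filesystem and java.util.logging in place of Hadoop's FileSystem API and 
logger. The class MergeTimingSketch is hypothetical, is not the code in this 
PR, and does not handle every case the real mergePaths does (for example, a 
destination of a different kind than the source).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for illustration; not Hadoop's FileOutputCommitter. */
class MergeTimingSketch {
    private static final Logger LOG =
        Logger.getLogger(MergeTimingSketch.class.getName());

    /**
     * Recursively merges 'from' into 'to'. Latency is measured and logged per
     * call, and only when debug (FINE) logging is enabled.
     * Assumption: 'to' is either absent or of the same kind as 'from'.
     */
    public static void mergePaths(Path from, Path to) throws IOException {
        boolean debug = LOG.isLoggable(Level.FINE);
        long startNanos = debug ? System.nanoTime() : 0L;
        if (debug) {
            LOG.fine("Merging data from " + from + " to " + to);
        }

        if (Files.isDirectory(from)) {
            Files.createDirectories(to);
            // Snapshot the children first so moving them does not disturb iteration.
            List<Path> children;
            try (Stream<Path> listing = Files.list(from)) {
                children = listing.collect(Collectors.toList());
            }
            for (Path child : children) {
                // Each nested call logs its own latency, so slow subdirectories stand out.
                mergePaths(child, to.resolve(child.getFileName().toString()));
            }
            Files.delete(from); // the source directory is empty at this point
        } else {
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
        }

        if (debug) {
            long elapsedMs =
                TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            LOG.fine("Merged data from " + from + " to " + to
                + " in " + elapsedMs + " ms");
        }
    }
}
```

   Because the elapsed time is logged at every level of the recursion, the 
time reported for a parent directory includes the time of its children, so the 
slow directory is the deepest one whose own line reports a large value.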


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 537261)
    Remaining Estimate: 0h
            Time Spent: 10m

> Add latency information in FileOutputCommitter.mergePaths
> ---------------------------------------------------------
>
>                 Key: MAPREDUCE-7317
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7317
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>            Reporter: Jungtaek Lim
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> We have observed some occurrences of huge delays from file output 
> committer V1, where file output committer V2 is not an option.
> While the root cause should be investigated on our side, there's another 
> issue: there's insufficient information to debug. Most likely the huge 
> delay comes from mergePaths, but the class only provides a "debug" log 
> message that logs the call itself with its parameters, nothing else. 
> mergePaths is called recursively, which makes it harder to trace how long a 
> specific directory takes to merge.
> It would be nice and not intrusive to add latency info in mergePaths, so that 
> we can see how long a specific directory takes to merge, only when 
> debug logging is enabled.
> (Ideally it'd be nice if we could log a warn message when the call takes a 
> huge time to process, but I don't have a proper threshold for the "huge 
> time", so I'd avoid dealing with it altogether here.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
