Prabhu Joseph created MAPREDUCE-6981:
----------------------------------------
Summary: Map Progress is misleading for Distcp job
Key: MAPREDUCE-6981
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6981
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: distcp
Affects Versions: 2.7.3
Reporter: Prabhu Joseph
Priority: Minor
The progress displayed by the client when running a Distcp job is misleading: map
progress reaches 100% before the map tasks have finished. The issue can be
reproduced by simply running Distcp with multiple large files.
JobImpl returns a progress of 1.0 for a task when either the task has finished
or its reported progress is 1.0.
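A minimal sketch of the rule above, showing how unfinished tasks can still drive map progress to 100%. The class `MapProgressSketch`, the `TaskState` snapshot and the `mapProgress` method are hypothetical names, not JobImpl's API, and averaging per-task progress over all map tasks is an assumption made for illustration:

```java
import java.util.List;

public class MapProgressSketch {
    // Hypothetical per-task snapshot: whether the task has finished and the
    // progress most recently reported for it (for a Distcp map, by its reader).
    static final class TaskState {
        final boolean finished;
        final float reportedProgress;

        TaskState(boolean finished, float reportedProgress) {
            this.finished = finished;
            this.reportedProgress = reportedProgress;
        }
    }

    // A task counts as 1.0 when it has finished OR its reported progress is
    // already 1.0; otherwise its reported value is used. Averaging over all
    // map tasks is an assumption made for this illustration.
    static float mapProgress(List<TaskState> tasks) {
        if (tasks.isEmpty()) {
            return 0f;
        }
        float sum = 0f;
        for (TaskState t : tasks) {
            boolean done = t.finished || t.reportedProgress >= 1.0f;
            sum += done ? 1.0f : t.reportedProgress;
        }
        return sum / tasks.size();
    }

    public static void main(String[] args) {
        // Three map tasks, none finished, each still copying a large file but
        // already reporting 1.0 because its file listing has been fully read.
        List<TaskState> running = List.of(
                new TaskState(false, 1.0f),
                new TaskState(false, 1.0f),
                new TaskState(false, 1.0f));
        System.out.printf("map %.0f%%%n", mapProgress(running) * 100); // map 100%
    }
}
```

Under this model, once every task reports 1.0 the aggregate is 1.0 even though no task has finished, which matches the client printing `map 100%` well before the job completes.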
A Distcp MapTask gets its progress from SequenceFileRecordReader, which appears
to update progress based only on reading the list of files and does not account
for the time taken to copy those files to the destination.
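To illustrate why reader-based progress runs ahead of the copy, here is a hypothetical model, not Distcp or SequenceFileRecordReader code. `listingProgress` reports the fraction of listing entries handed to the mapper (a simplification that counts entries rather than whatever offset the real reader tracks), so it hits 1.0 as soon as the last entry is read, before that entry's file is copied. `bytesCopiedProgress` is an equally hypothetical contrast that weights progress by bytes copied:

```java
import java.util.List;

public class ListingProgressSketch {
    // Hypothetical listing entry: a source file and its size in bytes.
    static final class Entry {
        final String path;
        final long sizeBytes;

        Entry(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    // Simplified reader-style progress: fraction of listing entries already
    // read, ignoring how much copy work remains for those entries.
    static float listingProgress(int entriesRead, int totalEntries) {
        return totalEntries == 0 ? 1.0f : (float) entriesRead / totalEntries;
    }

    // Hypothetical copy-aware progress: the first `fullyCopied` entries are
    // done, plus `bytesOfCurrentFile` bytes of the entry being copied now.
    static float bytesCopiedProgress(List<Entry> entries, int fullyCopied, long bytesOfCurrentFile) {
        long total = 0;
        long done = bytesOfCurrentFile;
        for (int i = 0; i < entries.size(); i++) {
            total += entries.get(i).sizeBytes;
            if (i < fullyCopied) {
                done += entries.get(i).sizeBytes;
            }
        }
        return total == 0 ? 1.0f : (float) done / total;
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        List<Entry> listing = List.of(new Entry("/src/a", gib), new Entry("/src/b", 4 * gib));
        // Both entries have been read; /src/a is copied and 1 GiB of the
        // 4 GiB /src/b has been written so far.
        System.out.println(listingProgress(2, listing.size()));  // 1.0
        System.out.println(bytesCopiedProgress(listing, 1, gib)); // 0.4
    }
}
```

With two large files in the listing, the entry-based figure is already 1.0 while only 2 GiB of 5 GiB have been copied.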
{code}
17/10/11 13:33:29 INFO mapreduce.Job: map 100% reduce 0%
17/10/11 13:34:47 INFO mapreduce.Job: Job job_1506610341926_0016 completed successfully
{code}
The client reports map 100% at 17/10/11 13:33:29, whereas the last map task
finishes at 2017-10-11 13:34:45, about 76 seconds later:
{code}
2017-10-11 13:34:45,159 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1506610341926_0016_m_000002 Task Transitioned from RUNNING to SUCCEEDED
{code}
Attaching the client and application logs.