[ 
https://issues.apache.org/jira/browse/MAPREDUCE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu updated MAPREDUCE-1635:
-----------------------------------------------

    Attachment: patch-1635.txt

I think the solution is to move the calculation of the task output size into 
Task, instead of having TaskTracker try to construct the output file path and 
fail. Task already has all the MapOutputFile information it needs, so it can 
set the output size in its last update, before calling umbilical.done(). 
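
To make the proposed ordering concrete, here is a minimal sketch. It is not 
the code from the patch: Status, Tracker and finish() are hypothetical 
stand-ins for TaskStatus, the TaskTracker side of the umbilical, and the end 
of a task run. It only shows that when the task puts the size into its final 
status update before done(), the tracker sees the real value when it logs.

```java
import java.util.ArrayList;
import java.util.List;

public class ReportOrderSketch {
    /** Hypothetical stand-in for TaskStatus; -1 means the size is unknown. */
    public static final class Status {
        public long outputSize = -1L;
    }

    /** Hypothetical stand-in for the tracker end of the umbilical. */
    public static final class Tracker {
        public final List<String> log = new ArrayList<>();
        private Status lastStatus = new Status();

        /** Remembers the most recent status sent by the task. */
        public void statusUpdate(Status status) {
            lastStatus = status;
        }

        /** Logs whatever size the last status update carried. */
        public void done() {
            log.add("reported output size " + lastStatus.outputSize);
        }
    }

    /** Task side: set the size in the final update, then report done. */
    public static void finish(Tracker tracker, long measuredSize) {
        Status status = new Status();
        status.outputSize = measuredSize;
        tracker.statusUpdate(status);
        tracker.done();
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        finish(tracker, 1024L);
        System.out.println(tracker.log.get(0));
    }
}
```

If done() were reached without a preceding update, the sketch would log the 
initial -1 instead of a stale or misleading size.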

The attached patch implements this fix. I added a MiniMR test that checks task 
output sizes for a map-only job, a map-reduce job and a failed job.

In trunk, the log message " reported output size..." in 
TaskTracker.TaskInProgress.reportDone() does not make sense, because 
setOutputSize() is called after reportDone(). 
With the attached patch it does make sense. I verified that the log prints the 
proper value with the patch.

The patch removes the following null checks from the code:
{code}
-      Path tmp_output =  mapOutputFile.getOutputFile();
-      if(tmp_output == null)
-        return 0;
-      FileSystem localFS = FileSystem.getLocal(conf);
-      FileStatus stat = localFS.getFileStatus(tmp_output);
-      if(stat == null)
-        return 0;
{code}
These checks are removed because neither mapOutputFile.getOutputFile() nor 
localFS.getFileStatus(tmp_output) ever returns null. Each call either returns 
a proper value or throws an Exception, and the method already handles the 
Exception properly, so the checks are essentially unreachable code. Moreover, 
their return value of 0 deviates from the documentation, which says the output 
size should be -1 if it cannot be calculated.
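
The "return a value or throw, and map the failure to -1" behaviour can be 
sketched as follows. This is an illustration under assumptions, not the 
patched Hadoop method: it uses java.nio.file as a stand-in for the local 
FileSystem, and computeOutputSize() is a hypothetical name. Like the calls 
discussed above, Files.size() never signals a missing file through its return 
value; it throws, so a single catch block is the only failure path needed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OutputSizeSketch {
    /**
     * Returns the size of the output file in bytes, or -1 if it cannot be
     * calculated. No null checks are needed: the lookup either succeeds or
     * throws an IOException.
     */
    public static long computeOutputSize(Path outputFile) {
        try {
            return Files.size(outputFile);
        } catch (IOException e) {
            // Missing or unreadable file: report "unknown", not 0.
            return -1L;
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("file", ".out");
        Files.write(out, new byte[42]);
        System.out.println(computeOutputSize(out));
        Files.delete(out);
        System.out.println(computeOutputSize(out));
        System.out.println(computeOutputSize(Paths.get("no-such-dir", "file.out")));
    }
}
```

Returning -1 rather than 0 keeps "could not be measured" distinguishable from 
a task that genuinely produced no output.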

Also, TaskStatus.outputSize is initialized to -1 to take care of task failures.


> ResourceEstimator does not work after MAPREDUCE-842
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: patch-1635.txt
>
>
> MAPREDUCE-842 changed Child's mapred.local.dir to have attemptDir as the base 
> local directory. The assumption is also that 
> org.apache.hadoop.mapred.MapOutputFile always gets Child's mapred.local.dir. 
> But MapOutputFile.getOutputFile() is called with TaskTracker's conf, which 
> does not find the output file. Thus TaskTracker.tryToGetOutputSize() always 
> returns zero.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
