[ https://issues.apache.org/jira/browse/MAPREDUCE-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Amareshwari Sriramadasu updated MAPREDUCE-1635:
-----------------------------------------------

    Attachment: patch-1635.txt

I think the solution is to move the calculation of the task output size into Task, instead of having TaskTracker try to construct the output file and fail. Task already has all the MapOutputFile information, so it can set the output size in its last update, before sending umbilical.done().

The attached patch makes this fix. I added a MiniMR test that checks task output sizes for a map-only job, a map-reduce job, and a failed job.

In trunk, the " reported output size..." log message in TaskTracker.TaskInProgress.reportDone() does not make sense, because setOutputSize() happens after the reportDone() call. With the attached patch it does make sense; I validated that the log prints the proper value with the patch applied.

The patch removes the following null checks:
{code}
-      Path tmp_output = mapOutputFile.getOutputFile();
-      if(tmp_output == null)
-        return 0;
-      FileSystem localFS = FileSystem.getLocal(conf);
-      FileStatus stat = localFS.getFileStatus(tmp_output);
-      if(stat == null)
-        return 0;
{code}
They are removed because mapOutputFile.getOutputFile() and localFS.getFileStatus(tmp_output) never return null: each call either returns a proper value or throws an Exception, and the method handles the Exception properly. The checks are therefore unreachable code. Moreover, returning 0 deviates from the documentation, which says the output size should be -1 if it cannot be calculated. TaskStatus.outputSize is also initialized to -1 to take care of task failures.
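
To illustrate the -1 convention described above, here is a minimal, self-contained sketch. It is not the contents of patch-1635.txt: it uses java.nio.file instead of Hadoop's FileSystem and MapOutputFile, and the names OutputSizeSketch, calculateOutputSize and UNKNOWN_SIZE are hypothetical. It only shows the shape of the idea: the size lookup either returns a value or throws, so no null checks are needed, and a failure maps to -1 rather than 0.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for the size calculation discussed above.
// Plain java.nio is used so the example runs without Hadoop on the classpath.
final class OutputSizeSketch {

    // Assumed sentinel, mirroring the documented "-1 if it cannot be calculated".
    static final long UNKNOWN_SIZE = -1L;

    // Returns the size of the output file in bytes, or UNKNOWN_SIZE on failure.
    // Files.size() never returns null: it yields a value or throws IOException,
    // so the only failure path to handle is the exception.
    static long calculateOutputSize(Path outputFile) {
        try {
            return Files.size(outputFile);
        } catch (IOException e) {
            // Covers a missing file (NoSuchFileException) and other I/O errors.
            return UNKNOWN_SIZE;
        }
    }

    public static void main(String[] args) throws IOException {
        Path output = Files.createTempFile("map-output", ".out");
        Files.write(output, new byte[128]);

        // Existing file: the real size is reported.
        System.out.println("existing file: " + calculateOutputSize(output));

        Files.delete(output);

        // Missing file: the sentinel is reported instead of 0.
        System.out.println("missing file:  " + calculateOutputSize(output));
    }
}
```

Returning 0 for the missing-file case would be indistinguishable from a task that legitimately produced no output, which is why the sentinel matters to anything that consumes the reported sizes.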
> ResourceEstimator does not work after MAPREDUCE-842
> ---------------------------------------------------
>
>                 Key: MAPREDUCE-1635
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1635
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Amareshwari Sriramadasu
>             Fix For: 0.22.0
>
>         Attachments: patch-1635.txt
>
>
> MAPREDUCE-842 changed Child's mapred.local.dir to have attemptDir as the base
> local directory. The assumption is also that
> org.apache.hadoop.mapred.MapOutputFile always gets Child's mapred.local.dir.
> But MapOutputFile.getOutputFile() is called using TaskTracker's conf, so it
> does not find the output file. Thus TaskTracker.tryToGetOutputSize() always
> returns zero.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.