[
https://issues.apache.org/jira/browse/HADOOP-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12634369#action_12634369
]
Jothi Padmanabhan commented on HADOOP-4246:
-------------------------------------------
The patch looks good. A few minor comments:
* Since MAX_FAILED_UNIQUE_FETCHES is no longer a constant, it should be renamed to
maxFailedUniqueFetches
* getClosestPowerOf2 never returns a negative number. So, this piece of code
{code}
if (this.maxFetchRetriesPerMap < 1) {
this.maxFetchRetriesPerMap = 1;
}
{code}
should be modified to
{code}
if (this.maxFetchRetriesPerMap == 0) {
this.maxFetchRetriesPerMap = 1;
}
{code}
for better clarity (see the first sketch below)
* For the backoff value for a GENERIC_ERROR, should we just back off by a fixed
amount and retry? The concern here is that if we are hitting a
'disk-out-of-space' exception, we are better off identifying it early rather
than late. If the map_run_time is high, we might actually spend a lot of time
before the jobtracker gets notified. Thoughts? (A rough sketch of what a fixed
backoff could look like follows below.)
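To make the first point concrete, here is a minimal, self-contained sketch. It is not the actual ReduceTask.java code: the stand-in getClosestPowerOf2, the backoffInit value, and the division used to derive the retry count are assumptions based on this discussion. It only illustrates that rounding to a power of two never produces a negative value, so zero is the only degenerate case to guard against:
{code}
// Sketch only -- NOT the code from patch-4246.txt. The helper below is a
// plausible stand-in for getClosestPowerOf2 as discussed in this issue.
public class FetchRetrySketch {

  // Rounds a positive value down to a power of 2; returns 0 for inputs < 1,
  // but never a negative number.
  static int getClosestPowerOf2(int value) {
    if (value < 1) {
      return 0;
    }
    return Integer.highestOneBit(value);
  }

  public static void main(String[] args) {
    int maxMapRuntime = 3000;  // milliseconds (hypothetical value)
    int backoffInit   = 4000;  // milliseconds (hypothetical value)

    // With a short map runtime the division rounds to 0, so the computed
    // retry count is 0 -- copy failures would then never accumulate.
    int maxFetchRetriesPerMap = getClosestPowerOf2(maxMapRuntime / backoffInit);

    // Since the helper cannot return a negative value, '== 0' states the
    // intent more clearly than '< 1'.
    if (maxFetchRetriesPerMap == 0) {
      maxFetchRetriesPerMap = 1;
    }
    System.out.println("maxFetchRetriesPerMap = " + maxFetchRetriesPerMap);
  }
}
{code}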
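On the last point, a fixed backoff for a GENERIC_ERROR could look roughly like the following. Again, this is only a sketch: the error flag, the FIXED_BACKOFF_MS constant, and the nextRetryTime helper are hypothetical, not the patch itself.
{code}
// Sketch only: illustrates backing off by a fixed amount for a generic fetch
// error instead of scaling the delay with the map's run time.
public class BackoffSketch {

  // Hypothetical fixed delay before retrying a fetch that failed with a
  // generic (possibly non-transient, e.g. disk-out-of-space) error.
  static final long FIXED_BACKOFF_MS = 60000L;

  // Returns the earliest time (ms) at which the failed fetch should be retried.
  static long nextRetryTime(long now, boolean genericError, long mapRunTimeMs) {
    if (genericError) {
      // Fixed, bounded delay: repeated failures surface to the jobtracker
      // sooner than if the delay scaled with the map's run time.
      return now + FIXED_BACKOFF_MS;
    }
    // For other errors keep a runtime-proportional delay (illustrative only).
    return now + mapRunTimeMs;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    System.out.println(nextRetryTime(now, true, 600000L) - now);   // 60000
    System.out.println(nextRetryTime(now, false, 600000L) - now);  // 600000
  }
}
{code}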
> Reduce task copy errors may not kill it eventually
> --------------------------------------------------
>
> Key: HADOOP-4246
> URL: https://issues.apache.org/jira/browse/HADOOP-4246
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.19.0
> Reporter: Amareshwari Sriramadasu
> Assignee: Amareshwari Sriramadasu
> Priority: Blocker
> Fix For: 0.19.0
>
> Attachments: patch-4246.txt
>
>
> maxFetchRetriesPerMap in the reduce task can sometimes be zero (when
> maxMapRunTime is less than 4 seconds or mapred.reduce.copy.backoff is less
> than 4). In that case, reduce task copy errors are not counted, so they never
> kill the task.