[
https://issues.apache.org/jira/browse/HADOOP-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488773
]
Koji Noguchi commented on HADOOP-1087:
--------------------------------------
I tried a random mapOutput link on 0.10.1 and 0.12.3.
In 0.10.1, it returned 500.
In 0.12.3, it returned 410.
On the web (in 0.10.1)
==============================
HTTP ERROR: 500
/hadoop1/mapred/local/task_0198_m_000251_0/file.out.index
RequestURI=/mapOutput
Powered by Jetty://
==============================
When the original error happened, I found file.out.index in a different
directory: /hadoop4/mapred/local/task_0198_m_000251_0/file.out.index instead
of /hadoop1. That's why I thought it had something to do with the full drive.
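To make the suspected mismatch concrete, here is a minimal sketch. It is not
Hadoop code. It assumes the local directory is chosen by probing the
mapred.local.dir entries in a hash-derived order and taking the first one that
is usable at that moment, with nothing recording the choice. The class name and
the helpers probeOrder and resolve are hypothetical; the directory names are
borrowed from the error page above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LocalDirMismatchSketch {

    // Hypothetical helper: the order in which local dirs are probed for a
    // relative path, starting at an index derived from the path's hash.
    // (The hash-based start is an assumption, not confirmed Hadoop behavior.)
    static List<String> probeOrder(List<String> localDirs, String relPath) {
        List<String> order = new ArrayList<>();
        int start = (relPath.hashCode() & Integer.MAX_VALUE) % localDirs.size();
        for (int i = 0; i < localDirs.size(); i++) {
            order.add(localDirs.get((start + i) % localDirs.size()));
        }
        return order;
    }

    // Hypothetical helper: return the path under the first probed dir that is
    // usable right now. The choice is not remembered, so a later call made
    // with a different set of usable dirs can return a different path.
    static String resolve(List<String> localDirs, Set<String> unusable, String relPath) {
        for (String dir : probeOrder(localDirs, relPath)) {
            if (!unusable.contains(dir)) {
                return dir + "/" + relPath;
            }
        }
        throw new IllegalStateException("no usable local dir for " + relPath);
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList("/hadoop1/mapred/local", "/hadoop4/mapred/local");
        String rel = "task_0198_m_000251_0/file.out.index";

        // Map time: the dir the hash prefers is full, so the writer falls
        // through to the other entry.
        String preferred = probeOrder(dirs, rel).get(0);
        Set<String> unusable = new HashSet<>(Arrays.asList(preferred));
        String written = resolve(dirs, unusable, rel);

        // Reduce time: the preferred dir is usable again, so the reader
        // resolves to it, although the file was never written there.
        unusable.clear();
        String served = resolve(dirs, unusable, rel);

        System.out.println("written: " + written);
        System.out.println("served:  " + served);
        System.out.println("same path? " + written.equals(served));
    }
}
```

Under these assumptions the writer and the reader end up with different paths
for the same relative name, which would match what I saw on disk.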
> Another thing pointing to this direction is that the map output will be
> declared as 'lost' in the same exception handler code in the doGet method,
> and the JobTracker will reexecute the map. So the job should not hang.
>
Was this fixed after 0.10.1?
If so, we can change this to 'won't fix'.
> Reducer hangs pulling from incorrect file.out.index path. (when one of the
> mapred.local.dir is not accessible but becomes available later at reduce time)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-1087
> URL: https://issues.apache.org/jira/browse/HADOOP-1087
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.10.1
> Reporter: Koji Noguchi
>
> 2007-03-07 23:14:23,431 WARN org.apache.hadoop.mapred.TaskRunner:
> java.io.IOException: Server returned HTTP response code: 500 for URL:
> http://____:____/mapOutput?map=task_7810_m_000897_0&reduce=397
>     at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1149)
>     at org.apache.hadoop.mapred.MapOutputLocation.getFile(MapOutputLocation.java:121)
>     at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:236)
>     at org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:199)
> 2007-03-07 23:14:23,431 WARN org.apache.hadoop.mapred.TaskRunner:
> task_7810_r_000397_0 adding host ____.com to penalty box, next contact in 279
> seconds
> This happened when one of the drives was full and not accessible at map time.
> One mapper, in
>     public void mergeParts() throws IOException {
>     ...
>     Path finalIndexFile = mapOutputFile.getOutputIndexFile(getTaskId());
> failed on the first hash entry in mapred.local.dir and used the second entry.
> Afterwards, the first dir entry became available, and when the reducer tried
> to pull through
>     public static class MapOutputServlet extends HttpServlet {
>     ...
>     Path indexFileName = conf.getLocalPath(mapId+"/file.out.index");
> it used the first entry.
> As a result, the directory was empty, and the reducer kept trying to pull from
> the incorrect path and hung.
> (I wasn't sure if this is a duplicate of HADOOP-895, since it is not
> reproducible unless I get a disk failure.)