Disk problems should be handled better by the MR framework
----------------------------------------------------------
Key: HADOOP-1252
URL: https://issues.apache.org/jira/browse/HADOOP-1252
Project: Hadoop
Issue Type: Bug
Components: mapred
Affects Versions: 0.12.3
Reporter: Devaraj Das
Assigned To: Devaraj Das
Fix For: 0.13.0
The MR framework should recover from disk failures without causing jobs to
hang. Note that this issue covers only a short-term solution: reviewing the
code and improving the exception handling so that faulty disks and missing
files are detected more reliably. The long-term approach might be to have an
FS layer that takes care of failed disks and makes the failures transparent to
the tasks. That will be a separate issue.
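
To make the short-term idea concrete, here is a minimal sketch of the kind of
check that improved exception handling could rely on: probe each configured
local directory with a small write and skip the ones that fail, instead of
letting the task block on a bad disk. The class LocalDirProbe and its methods
isUsable and firstUsable are hypothetical names used only for illustration.
They are not existing Hadoop code and not necessarily how the fix will be
implemented.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): picks a working local directory. */
final class LocalDirProbe {

    private LocalDirProbe() {}

    /** Returns true if a small file can be created, written, and deleted in dir. */
    static boolean isUsable(File dir) {
        // A missing directory is acceptable only if it can be created.
        if (!dir.isDirectory() && !dir.mkdirs()) {
            return false;
        }
        File probe = null;
        try {
            probe = File.createTempFile("probe", ".tmp", dir);
            try (FileOutputStream out = new FileOutputStream(probe)) {
                out.write(0); // force an actual write to the disk
            }
            return true;
        } catch (IOException e) {
            // Treat any I/O error as a sign of a faulty or read-only disk.
            return false;
        } finally {
            if (probe != null) {
                probe.delete();
            }
        }
    }

    /**
     * Returns the first usable directory, or throws an IOException that names
     * every rejected directory so the caller can fail fast instead of hanging.
     */
    static File firstUsable(List<File> candidates) throws IOException {
        List<String> rejected = new ArrayList<>();
        for (File dir : candidates) {
            if (isUsable(dir)) {
                return dir;
            }
            rejected.add(dir.getPath());
        }
        throw new IOException("No usable local directory among " + rejected);
    }
}
```

The point of the sketch is that a bad disk turns into an explicit IOException
listing the rejected directories, which the framework can then report or use
to fail the task, rather than a silent hang.
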
Some of the problems already reported are HADOOP-1087 and a comment by Koji on
HADOOP-1200 (there may be others). Please add to this issue as many details as
possible on disk failures leading to hung jobs (relevant exception traces,
steps to reproduce, etc.).