[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravi Gummadi updated MAPREDUCE-2413:
------------------------------------

    Attachment: MR-2413.v0.3.patch

Attaching a new patch that removes an unused field in the LocalStorage class.

> TaskTracker should handle disk failures at both startup and runtime
> -------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2413
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2413
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Bharath Mundlapudi
>            Assignee: Ravi Gummadi
>             Fix For: 0.20.204.0
>
>         Attachments: MR-2413.v0.1.patch, MR-2413.v0.2.patch, 
> MR-2413.v0.3.patch, MR-2413.v0.patch
>
>
> At present, TaskTracker doesn't handle disk failures properly, either at
> startup or at runtime.
> (1) Currently TaskTracker doesn't come up if any of the mapred-local-dirs is 
> on a bad disk. TaskTracker should ignore that particular mapred-local-dir and 
> start up and use only the remaining good mapred-local-dirs.
> (2) If a disk goes bad while TaskTracker is running, TaskTracker currently
> doesn't do anything special. This results in either
>    (a) TaskTracker continuing to try to use the bad disk, which causes many
> task failures and possibly job failures (because multiple TTs have bad
> disks), and eventually these TTs get graylisted for all jobs. Recovery
> needs a manual restart of the TT with mapred-local-dirs reconfigured to
> avoid the bad disk; OR
>    (b) the health check script identifying the disk as bad and the TT
> getting blacklisted. This also needs a manual restart of the TT with
> mapred-local-dirs reconfigured to avoid the bad disk.
> This JIRA is to make TaskTracker more tolerant of disk failures by solving
> (1) and (2). That is, the TT should start as long as at least one of the
> mapred-local-dirs is on a good disk, and the TT should adjust its in-memory
> list of mapred-local-dirs to avoid using bad mapred-local-dirs.
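
The behaviour requested in (1) and (2) can be illustrated with a minimal
sketch. This is not the code in the attached patches: the class
LocalDirFilter and its methods isUsable and selectGoodDirs are hypothetical
names, and the notion of a "good" directory used here (it exists or can be
created, and is readable, writable and executable) is an assumption, not the
check the patch performs.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the MR-2413 patches. */
final class LocalDirFilter {

    /**
     * Assumed definition of a usable local dir: it is a directory (or can be
     * created) and the process can read, write and list it.
     */
    static boolean isUsable(File dir) {
        if (!dir.isDirectory() && !dir.mkdirs()) {
            return false;
        }
        return dir.canRead() && dir.canWrite() && dir.canExecute();
    }

    /**
     * Returns the configured dirs that pass isUsable, in their original
     * order. Fails only when no configured dir is usable.
     */
    static List<String> selectGoodDirs(List<String> configuredDirs) {
        List<String> good = new ArrayList<String>();
        for (String path : configuredDirs) {
            if (isUsable(new File(path))) {
                good.add(path);
            }
        }
        if (good.isEmpty()) {
            throw new IllegalStateException(
                "No usable local dirs among: " + configuredDirs);
        }
        return good;
    }
}
```

Under these assumptions, calling selectGoodDirs once at startup covers (1),
and calling it again periodically on the current list and replacing the
in-memory list with the result covers (2) without a manual restart.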

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
