[ https://issues.apache.org/jira/browse/HADOOP-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530413#comment-16530413 ]

Arpit Agarwal commented on HADOOP-15493:
----------------------------------------

{quote}I think we have to rely on the system to detect a failed 
controller/drive. Maybe we should just attempt to provoke the disk to go 
read-only. Have the DN periodically write a file to its storages every n-many 
mins – but take no action upon failure. Instead rely on the normal disk check 
to subsequently discover the disk is read-only.
{quote}
When you say 'we have to rely on the system', do you mean the OS?

We saw disk failures (and, more rarely, controller failures) go undetected 
indefinitely. Application requests would fail and trigger the disk checker, 
which always succeeded. Customers hit data loss after multiple disk failures 
went undetected over a few days.
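
For reference, the probe described in the quote above might look roughly like 
the sketch below. The class name, probe file name, and scheduling details are 
my assumptions, not the actual DataNode wiring:
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically write a small file to each storage
// directory, but take no action on failure. The write merely provokes a
// failing disk into going read-only so the regular disk check discovers
// it on its next pass.
public class PeriodicWriteProbe {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  public void start(List<File> storageDirs, long intervalMinutes) {
    scheduler.scheduleAtFixedRate(() -> {
      for (File dir : storageDirs) {
        File probe = new File(dir, ".probe");
        try (FileOutputStream out = new FileOutputStream(probe)) {
          out.write(0);
          out.getFD().sync(); // force the write through to the device
        } catch (IOException ignored) {
          // Deliberately no action here; rely on the normal disk
          // checker to subsequently notice the read-only volume.
        } finally {
          probe.delete();
        }
      }
    }, 0, intervalMinutes, TimeUnit.MINUTES);
  }
}
{code}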

 
{quote}I don't think this disk-is-writable check should be in common.
{quote}
We can make the write check HDFS-internal, but we still need a disk-full 
check. Perhaps the safest option is a free-space threshold that avoids false 
positives (a full but healthy disk marked as failed) while allowing false 
negatives.
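
As a sketch of that threshold idea (the constant and class name below are 
hypothetical, not from the patch): if the probe write fails but the volume 
reports very little usable space, treat it as disk-full rather than 
disk-failed.
{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteCheckSketch {
  // Hypothetical cutoff: below this much usable space, assume a failed
  // write means the disk is full, not broken.
  private static final long FULL_THRESHOLD_BYTES = 100L * 1024 * 1024;

  public static void checkWritable(File dir) throws IOException {
    File probe = new File(dir, ".diskcheck");
    try (FileOutputStream out = new FileOutputStream(probe)) {
      out.write(0);
    } catch (IOException e) {
      if (dir.getUsableSpace() < FULL_THRESHOLD_BYTES) {
        return; // likely ENOSPC: do not mark the volume failed
      }
      throw e; // genuine write failure: surface it
    } finally {
      probe.delete();
    }
  }
}
{code}
A low threshold keeps false positives out (healthy-but-full disks marked 
failed) at the cost of false negatives (a broken disk that also happens to 
be nearly full goes unflagged), which matches the trade-off above.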

> DiskChecker should handle disk full situation
> ---------------------------------------------
>
>                 Key: HADOOP-15493
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15493
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>            Priority: Critical
>         Attachments: HADOOP-15493.01.patch, HADOOP-15493.02.patch
>
>
> DiskChecker#checkDirWithDiskIo creates a file to verify that the disk is 
> writable.
> However, the check should not fail when file creation fails because the 
> disk is full. This avoids marking full disks as _failed_.
> Reported by [~kihwal] and [~daryn] in HADOOP-15450. 


