[ 
https://issues.apache.org/jira/browse/HADOOP-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616800#comment-15616800
 ] 

Arpit Agarwal commented on HADOOP-13738:
----------------------------------------

Thanks for the feedback, all! I've incorporated most of the comments.

bq. so I am not sure why we need random at all here
Changed the file naming scheme to use a fixed name. If we hit two successive 
failures, we'll try once more with a randomized file name.
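As a rough illustration of that scheme (a minimal sketch only, not the attached patch; the class, constant, and method names here are assumptions, and the write-sync-delete body is just the "write some data and flush it to the disk" idea from the issue description):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.UUID;

public class DiskCheckSketch {
  // Fixed name tried first; a randomized name is only a last resort.
  static final String FIXED_NAME = ".diskcheck";

  static void checkDirWithDiskIo(File dir) throws IOException {
    IOException last = null;
    // Two attempts with the fixed file name.
    for (int i = 0; i < 2; i++) {
      try {
        doDiskIo(new File(dir, FIXED_NAME));
        return;
      } catch (IOException e) {
        last = e;
      }
    }
    // After two successive failures, one final attempt with a randomized name.
    try {
      doDiskIo(new File(dir, FIXED_NAME + "." + UUID.randomUUID()));
    } catch (IOException e) {
      e.addSuppressed(last);
      throw e;
    }
  }

  static void doDiskIo(File file) throws IOException {
    try (FileOutputStream fos = new FileOutputStream(file)) {
      fos.write(new byte[]{1, 2, 3}); // actually touch the disk
      fos.getFD().sync();             // flush through to the device
    } finally {
      // A failed delete must surface as a check failure, not be swallowed.
      if (file.exists() && !file.delete()) {
        throw new IOException("Failed to delete " + file);
      }
    }
  }
}
```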

bq. shouldn't diskchecker be able to understand the delete operation failed
Fixed.

bq. Can we have some timer/threshold (in ms level) for the expected execution 
time of each diskIoCheckWithoutNativeIo() test to break out of the retry loop
Hi [~xyao], that would require spawning a thread, so DiskChecker would have to 
maintain a thread pool. We could end up with many threads stalled on a slow 
disk while checks of healthy disks wait for thread availability. It is easier 
to solve this in the caller. Let me know if you're okay with deferring this 
particular problem for now.
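To make the concern concrete, here is a minimal sketch (my own illustration, not anything in the patch; the class and method names are assumptions) of what a per-check timeout would require. The IO has to run on a worker thread so the caller can give up after the deadline, but a timed-out worker is still blocked on the slow disk, so repeated checks of bad disks can exhaust the pool:

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimedDiskCheck {
  // Bounded pool: threads stuck on dead disks are not returned on timeout,
  // so healthy-disk checks eventually queue behind them.
  static final ExecutorService POOL = Executors.newFixedThreadPool(4);

  static void checkWithTimeout(Runnable diskIoCheck, long timeoutMs)
      throws IOException {
    Future<?> f = POOL.submit(diskIoCheck);
    try {
      f.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      f.cancel(true); // interrupts the worker, but blocking IO may ignore it
      throw new IOException("disk check timed out after " + timeoutMs + "ms");
    } catch (InterruptedException | ExecutionException e) {
      throw new IOException("disk check failed", e);
    }
  }
}
```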

> DiskChecker should perform some disk IO
> ---------------------------------------
>
>                 Key: HADOOP-13738
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13738
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: HADOOP-13738.01.patch, HADOOP-13738.02.patch, 
> HADOOP-13738.03.patch, HADOOP-13738.04.patch
>
>
> DiskChecker can fail to detect total disk/controller failures indefinitely; 
> we have seen this in real clusters. DiskChecker performs simple 
> permissions-based checks on directories, which do not guarantee that any disk 
> IO will be attempted.
> A simple improvement is to write some data and flush it to the disk.



