[ 
https://issues.apache.org/jira/browse/HADOOP-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hairong Kuang updated HADOOP-4679:
----------------------------------

    Attachment: diskError2.patch

This patch incorporates Raghu's comments except for comment 3. The unit test 
does not create files in a tight loop: it waits until all replicas are 
created before moving on to the next iteration. I tried a few other ways of 
writing this test, and the current one seems to be the most efficient.

In addition, I made a change to BlockReceiver. If the BlockReceiver 
constructor fails, it checks whether the failure was caused by a read-only 
disk. Since checking for read-only disks is an expensive operation, the check 
is performed only when creating the temporary block file fails.
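The idea of deferring the expensive check can be sketched as follows. This is a hypothetical illustration, not the actual Hadoop code; the class name `BlockFileCreator` and method `createTmpBlockFile` are invented for the example:

```java
import java.io.File;
import java.io.IOException;

// Sketch: pay for the expensive read-only disk check only after
// creating the temporary block file has already failed.
public class BlockFileCreator {
    private final File volumeDir;

    public BlockFileCreator(File volumeDir) {
        this.volumeDir = volumeDir;
    }

    public File createTmpBlockFile(String blockName) throws IOException {
        File tmp = new File(volumeDir, blockName + ".tmp");
        try {
            if (!tmp.createNewFile()) {
                throw new IOException("could not create " + tmp);
            }
            return tmp;
        } catch (IOException e) {
            // Only on failure do we run the costlier volume check,
            // so the common (successful) path stays cheap.
            if (!volumeDir.canWrite()) {
                throw new IOException("read-only volume: " + volumeDir, e);
            }
            throw e;
        }
    }
}
```

On the happy path no extra work is done; the disk inspection runs only when file creation has already thrown.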

> Datanode prints tons of log messages: Waiting for threadgroup to exit, active 
> threads is XX
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4679
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4679
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: diskError.patch, diskError1.patch, diskError2.patch
>
>
> When a data receiver thread sees a disk error, it immediately calls shutdown 
> to shutdown DataNode. But the shutdown method does not return before all data 
> receiver threads exit, which will never happen. Therefore the DataNode gets 
> into a dead/live lock state, emitting tons of log messages: Waiting for 
> threadgroup to exit, active threads is XX.
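The reported livelock can be sketched as follows. This is a hypothetical, bounded illustration (class `ShutdownLoop` and method `waitForGroup` are invented for the example, and the real loop spins forever): a thread that waits for its own ThreadGroup to drain can never see the active count reach zero, because it counts itself.

```java
// Sketch of the livelock: a receiver thread calls a shutdown routine
// that waits for every thread in its own ThreadGroup to exit --
// including the caller, so activeCount() never drops to zero.
public class ShutdownLoop {
    // Bounded by maxIterations so this sketch terminates; returns the
    // number of threads still active when the loop gives up.
    static int waitForGroup(ThreadGroup group, int maxIterations)
            throws InterruptedException {
        for (int i = 0; group.activeCount() > 0 && i < maxIterations; i++) {
            System.out.println(
                "Waiting for threadgroup to exit, active threads is "
                + group.activeCount());
            Thread.sleep(10);
        }
        return group.activeCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadGroup group = new ThreadGroup("dataXceivers");
        Thread receiver = new Thread(group, () -> {
            try {
                // The caller is itself a member of the group it waits on,
                // so the count stays at least 1.
                int remaining = waitForGroup(
                    Thread.currentThread().getThreadGroup(), 5);
                System.out.println("still active: " + remaining);
            } catch (InterruptedException ignored) {
            }
        });
        receiver.start();
        receiver.join();
    }
}
```

The fix direction implied by the patch is to avoid having a receiver thread block on the exit of its own thread group.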

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.