[
https://issues.apache.org/jira/browse/HADOOP-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hairong Kuang updated HADOOP-4679:
----------------------------------
Attachment: diskError2.patch
This patch incorporates Raghu's comments except for comment 3. The unit test
does not create files in a tight loop. It waits until all replicas have been
created before moving on to the next iteration. I tried a few other ways of
writing this test; the current one seems to be the most efficient.
In addition, I made a change to BlockReceiver. If the BlockReceiver
constructor fails, it checks whether the failure was caused by a read-only
disk. Since checking for read-only disks is an expensive operation, the check
is performed only when creating the temporary block file fails.
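A minimal sketch of that lazy-check pattern, assuming hypothetical names
(isDiskReadOnly, createTmpBlockFile) rather than the actual BlockReceiver API:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LazyDiskCheckSketch {

    // Stand-in for the expensive read-only check; a real DataNode would
    // scan its configured storage volumes here.
    static boolean isDiskReadOnly(File dir) {
        return !dir.canWrite();
    }

    static File createTmpBlockFile(File tmpDir, String blockName) throws IOException {
        File f = new File(tmpDir, blockName);
        try {
            if (!f.createNewFile()) {
                throw new IOException("could not create " + f);
            }
            return f;
        } catch (IOException ioe) {
            // Only pay for the expensive check on the failure path.
            if (isDiskReadOnly(tmpDir)) {
                throw new IOException("read-only disk: " + tmpDir, ioe);
            }
            throw ioe;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmpDir = Files.createTempDirectory("dn-tmp").toFile();
        File blk = createTmpBlockFile(tmpDir, "blk_1");
        System.out.println("created: " + blk.exists());
    }
}
```

On the happy path the read-only probe is never run; it is reached only after
file creation has already thrown.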
> Datanode prints tons of log messages: Waiting for threadgroup to exit, active
> threads is XX
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-4679
> URL: https://issues.apache.org/jira/browse/HADOOP-4679
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: Hairong Kuang
> Assignee: Hairong Kuang
> Attachments: diskError.patch, diskError1.patch, diskError2.patch
>
>
> When a data receiver thread sees a disk error, it immediately calls shutdown
> to shut down the DataNode. But the shutdown method does not return until all
> data receiver threads have exited, which never happens because the calling
> thread is itself a data receiver. The DataNode therefore ends up in a
> livelock, emitting tons of log messages: Waiting for threadgroup to exit,
> active threads is XX.
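The self-wait above can be reproduced with a small standalone sketch (not
Hadoop code): a worker thread belonging to a ThreadGroup runs the same kind
of wait loop as DataNode.shutdown(). Because activeCount() always counts the
waiting thread itself, the count never reaches zero; the loop is bounded here
only so the sketch terminates.

```java
public class ThreadGroupLivelockSketch {
    static volatile int loops = 0;

    // Bounded version of the shutdown wait loop; the real code loops
    // until the group's active count drops to zero.
    static void shutdownLoop(ThreadGroup group, int maxLoops) throws InterruptedException {
        while (group.activeCount() > 0 && loops < maxLoops) {
            // The real loop logs:
            // "Waiting for threadgroup to exit, active threads is XX"
            Thread.sleep(5);
            loops++;
        }
    }

    public static void main(String[] args) throws Exception {
        ThreadGroup group = new ThreadGroup("dataXceiverServer");
        Thread receiver = new Thread(group, () -> {
            try {
                shutdownLoop(group, 20); // waiting on its own group
            } catch (InterruptedException ignored) { }
        });
        receiver.start();
        receiver.join();
        // activeCount() never drops below 1 while the caller runs,
        // so the bounded loop always exhausts its budget.
        System.out.println("loops = " + loops);
    }
}
```
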