[ 
https://issues.apache.org/jira/browse/HADOOP-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650704#action_12650704
 ] 

Raghu Angadi commented on HADOOP-4679:
--------------------------------------

After talking to Hairong:

  # DataXceiverServer should handle SocketTimeoutException. Right now an idle 
DN prints an exception every 10 seconds.
  # The timeout for the server socket could be lower, so that the test 
finishes faster.
  # The unit test need not create files in a tight loop.
  # immedateShutdown is not really necessary. The way shutdown() works, it 
should only be called from the offerService() thread. I think the Javadoc 
should state this explicitly.
  # The reason the log was printed in a tight infinite loop (without any 
sleep) is that the thread interrupts itself before calling sleep(), so 
sleep() returns immediately!
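
To illustrate the last point, here is a minimal standalone sketch (not the DataNode code; the class and method names are made up for the example). It relies only on the documented behaviour of Thread.sleep(): if the calling thread's interrupt flag is already set, sleep() throws InterruptedException right away instead of waiting.

```java
public class SelfInterruptSleep {

    /**
     * Hypothetical helper: interrupts the current thread, then tries to
     * sleep. Returns how many milliseconds the sleep attempt really took.
     */
    static long sleepAfterSelfInterrupt(long requestedMillis) {
        // Set the interrupt flag on the calling thread itself.
        Thread.currentThread().interrupt();

        long start = System.nanoTime();
        try {
            Thread.sleep(requestedMillis);
        } catch (InterruptedException e) {
            // Thrown immediately because the flag was already set.
            // Swallowing it here means a surrounding loop would spin.
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long elapsed = sleepAfterSelfInterrupt(1000);
        System.out.println("requested 1000 ms, waited " + elapsed + " ms");
    }
}
```

A loop that interrupts its own thread on every iteration and ignores the exception therefore never pauses, which would match the flood of log lines reported here.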

I think this should go into 0.18. No one likes disks filling up with these log 
messages.
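
On the first point above, a rough sketch of an accept loop that treats SocketTimeoutException as the normal idle case rather than something to log. This is only an illustration under assumed names (QuietAcceptLoop, acceptUntilIdle, the idle limit); it is not the DataXceiverServer code or the attached patch.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class QuietAcceptLoop {

    /**
     * Hypothetical helper: accepts connections until maxIdleTimeouts
     * consecutive accept() calls have timed out. Returns the number of
     * connections accepted.
     */
    static int acceptUntilIdle(ServerSocket server, int maxIdleTimeouts)
            throws IOException {
        int accepted = 0;
        int idle = 0;
        while (idle < maxIdleTimeouts) {
            try (Socket s = server.accept()) {
                accepted++;   // a real server would hand the socket off here
                idle = 0;
            } catch (SocketTimeoutException e) {
                // Expected when nobody connects within the timeout:
                // count it and continue, without printing a stack trace.
                idle++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            server.setSoTimeout(100);  // accept() gives up after 100 ms
            System.out.println("accepted " + acceptUntilIdle(server, 3));
        }
    }
}
```

Other IOExceptions still propagate, so real failures remain visible while an idle server stays silent.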
  

> Datanode prints tons of log messages: Waiting for threadgroup to exit, active 
> theads is XX
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4679
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4679
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>         Attachments: diskError.patch, diskError1.patch
>
>
> When a data receiver thread sees a disk error, it immediately calls shutdown 
> to shut down the DataNode. But the shutdown method does not return until all 
> data receiver threads exit, which will never happen. Therefore the DataNode 
> gets into a dead/live lock state, emitting tons of log messages: Waiting for 
> threadgroup to exit, active threads is XX.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
