[ https://issues.apache.org/jira/browse/HDFS-9409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15819152#comment-15819152 ]
Kihwal Lee commented on HDFS-9409:
----------------------------------

{{BlockScanner}} is the same way. It removes a thread from its tracking data structure after a timed join. This is probably less problematic because the timeout in each {{join()}} is 5 minutes. Although this wait may be good for unit tests, waiting up to 5 minutes per volume scanner thread is unreasonable when the datanode needs to be restarted quickly. We actually reduced it to 100ms internally, and rolling upgrades work a lot better.

Perhaps we need a flag in the datanode that tells whether it should wait until everything is terminated or shut down quickly without waiting a long time for daemon threads to terminate. The shutdown code of each subsystem would then check this flag and behave differently. We would turn it on for unit testing and off for production. How does that sound?

> DataNode shutdown does not guarantee full shutdown of all threads due to race
> condition.
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-9409
>                 URL: https://issues.apache.org/jira/browse/HDFS-9409
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: Chris Nauroth
>
> {{DataNode#shutdown}} is documented to return "only after shutdown is
> complete". Even after completion of this method, it's possible that threads
> started by the DataNode are still running. Race conditions in the shutdown
> sequence may cause it to skip stopping and joining the {{BPServiceActor}}
> threads.
> This is likely not a big problem in normal operations, because these are
> daemon threads that won't block overall process exit. It is more of a
> problem for tests, because it makes it impossible to write reliable
> assertions that these threads exited cleanly. For large test suites, it can
> also cause an accumulation of unneeded threads, which might harm test
> performance.
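
For illustration only, the flag proposed in the comment might look roughly like the sketch below. Nothing here is existing Hadoop code: {{ShutdownJoiner}}, {{waitForFullShutdown}}, and {{stopAll}} are hypothetical names, and the two timeouts are simply the 5 minute and 100ms figures mentioned in the comment. The sketch also returns any thread that is still alive after the timed join, on the assumption that the caller would rather keep tracking it than silently drop it.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Hadoop): picks a join timeout from a
 * "wait for full shutdown" flag and reports threads that outlive the join.
 */
final class ShutdownJoiner {
  // Assumed values, taken from the figures quoted in the discussion.
  static final long FULL_WAIT_MS = 5 * 60 * 1000L; // tests: wait up to 5 minutes
  static final long QUICK_WAIT_MS = 100L;          // production: 100 ms

  private final boolean waitForFullShutdown;

  ShutdownJoiner(boolean waitForFullShutdown) {
    this.waitForFullShutdown = waitForFullShutdown;
  }

  /** Timeout applied to each individual join, selected by the flag. */
  long joinTimeoutMs() {
    return waitForFullShutdown ? FULL_WAIT_MS : QUICK_WAIT_MS;
  }

  /**
   * Interrupts every thread first, then joins each one with the selected
   * timeout. Returns the threads that are still alive afterwards.
   */
  List<Thread> stopAll(List<Thread> threads) throws InterruptedException {
    for (Thread t : threads) {
      t.interrupt();
    }
    List<Thread> stillAlive = new ArrayList<>();
    for (Thread t : threads) {
      t.join(joinTimeoutMs());
      if (t.isAlive()) {
        stillAlive.add(t);
      }
    }
    return stillAlive;
  }
}
```

A unit test would construct this with {{true}} and assert that {{stopAll}} returns an empty list; a production shutdown path would pass {{false}} and at most log whatever is returned. Interrupting all threads before the first join keeps the worst case close to one timeout when the threads respond to interruption, rather than one timeout per thread.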