[ https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899574#comment-13899574 ]
Kihwal Lee commented on HDFS-5583:
----------------------------------

This patch triggers sending of the restart OOB ack to clients that are currently writing data. The shutdown ordering and timing have been adjusted to give DataXceiver threads (serving writes) enough time to send the restart OOB ack upstream.

First, DataXceiverServer is interrupted. It closes the server socket to prevent further client connections and then interrupts each DataXceiver thread. DataXceiver threads idling due to keepalive will simply terminate. If {{DataNode#restarting}} is set, the OOB ack will be sent directly by these threads before taking down the packet responder threads. If the packet responder is in the middle of sending an ack, it can be blocked for up to a configured amount of time before failing, which is 1.5 seconds by default. If they started sending but the send takes a long time (e.g. slow client, network issue, etc.), they will get interrupted by DataXceiverServer in 2 seconds. DataXceiverServer will tear down sooner if all DataXceiver threads finish in less than 2 seconds.

The IPC server is stopped later in order to minimize the chance of the shutdownDatanode() response being dropped.

The shutdown method will only start interrupting the thread pool after a few seconds have passed since the DataXceiverServer interruption. By this time, all threads should have stopped, but any that have not will get interrupted repeatedly. This is existing behavior.

The main DataNode thread joins on BP service threads. There was a fixed 2 second sleep, which has been changed to wait only until the shutdown is done. If the BP service threads terminated but shutdown() was not called, the main thread will delay the exit for 2 seconds as it did before.

This patch does not include the client-side changes, so the OOB ack will not have any visible effects. It will be treated as a node failure, which also happens when a datanode shuts down.
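To illustrate the ordering described above, here is a minimal, self-contained sketch. It is not the code from the patch: the class {{ShutdownOrderSketch}}, the methods {{startWriter}} and {{shutdownWriters}}, and the ack string are all hypothetical stand-ins. It only models the idea that writer threads are interrupted, send the restart OOB ack if the restarting flag is set, and that the server waits at most a bounded grace period (2 seconds here) but returns sooner if every writer finishes early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownOrderSketch {
    /** Hypothetical stand-in for the DataNode#restarting flag. */
    static volatile boolean restarting = false;

    /** Records acks "sent upstream"; a real DataNode would write to a socket. */
    static final List<String> sentAcks = new CopyOnWriteArrayList<>();

    /** Hypothetical stand-in for a DataXceiver thread that is serving a write. */
    static Thread startWriter(String name, CountDownLatch done) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE); // pretend to block on packet I/O
            } catch (InterruptedException e) {
                // Interrupted by the server: send the OOB ack only on restart.
                if (restarting) {
                    sentAcks.add(name + ":OOB_RESTART");
                }
            } finally {
                done.countDown();
            }
        }, name);
        t.start();
        return t;
    }

    /**
     * Interrupts every writer, then waits at most graceMillis for them to
     * finish. Returns true if all writers finished within the grace period.
     */
    static boolean shutdownWriters(List<Thread> writers, CountDownLatch done,
                                   long graceMillis) {
        for (Thread t : writers) {
            t.interrupt();
        }
        try {
            // await() returns as soon as the latch reaches zero, so the
            // server tears down early when all writers finish quickly.
            return done.await(graceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        restarting = true;
        CountDownLatch done = new CountDownLatch(2);
        List<Thread> writers = new ArrayList<>();
        writers.add(startWriter("xceiver-1", done));
        writers.add(startWriter("xceiver-2", done));
        boolean clean = shutdownWriters(writers, done, 2000);
        System.out.println("all writers finished in time: " + clean
                + ", acks: " + sentAcks);
    }
}
```

The latch with a timed wait is just one way to get the "up to 2 seconds, but sooner if everyone is done" behavior; the actual patch may structure the wait differently.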
> Make DN send an OOB Ack on shutdown before restarting
> -----------------------------------------------------
>
>                 Key: HDFS-5583
>                 URL: https://issues.apache.org/jira/browse/HDFS-5583
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-5583.patch, HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an
> upcoming upgrade-restart. The client should ignore the pipeline error from the
> node for a configured amount of time and try to reconstruct the pipeline without
> excluding the restarted node. If the node does not come back in time,
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local.
> If the upgrade-restart is fast, the wait is preferable to losing locality.
> It could also be used in general instead of the draining-writer strategy.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)