[ 
https://issues.apache.org/jira/browse/HDFS-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899574#comment-13899574
 ] 

Kihwal Lee commented on HDFS-5583:
----------------------------------

This patch triggers sending of the restart OOB ack to clients who are currently 
writing data.

The shutdown ordering and timing have been adjusted to give the DataXceiver 
threads (serving writes) enough time to send the restart OOB ack upstream. 
First, DataXceiverServer is interrupted. It closes the server socket to prevent 
further client connections and then interrupts each DataXceiver thread. 
DataXceiver threads that are idle due to keepalive will simply terminate.

If {{DataNode#restarting}} is set, the OOB ack will be sent directly by these 
threads before taking down the packet responder threads. If the packet 
responder is in the middle of sending an ack, the OOB ack can be blocked for up 
to a configured amount of time before failing, which is 1.5 seconds by default. 
If the threads have started sending but the send takes a long time (e.g. slow 
client or network issue), they will be interrupted by DataXceiverServer after 
2 seconds. DataXceiverServer will tear down sooner if all DataXceiver threads 
finish in less than 2 seconds.
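
To illustrate the 2-second bound described above, here is a minimal sketch of 
the "wait for a grace period, then interrupt the stragglers" pattern. It is not 
the code in the patch: the class name GracefulInterruptSketch and the method 
awaitThenInterrupt are hypothetical, and plain Thread objects stand in for the 
DataXceiver threads.

```java
import java.util.List;

final class GracefulInterruptSketch {

    /**
     * Waits up to graceMs (in total) for the workers to exit on their own,
     * then interrupts any worker that is still alive.
     * Returns the number of workers that had to be interrupted.
     */
    static int awaitThenInterrupt(List<Thread> workers, long graceMs) {
        long deadline = System.nanoTime() + graceMs * 1_000_000L;
        for (Thread t : workers) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                break; // grace period used up; join(0) would wait forever
            }
            try {
                t.join(remainingMs); // returns early if the worker finishes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        int forced = 0;
        for (Thread t : workers) {
            if (t.isAlive()) {
                t.interrupt(); // e.g. a send stuck on a slow client
                forced++;
            }
        }
        return forced;
    }
}
```

Calling awaitThenInterrupt(workers, 2000) returns as soon as every worker has 
exited, so the full 2 seconds are spent only when at least one worker is stuck.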

The IPC server is stopped later in order to minimize the chance of the 
shutdownDatanode() response being dropped. The shutdown method will only start 
interrupting the thread pool after a few seconds have passed since the 
DataXceiverServer interruption. By this time, all threads should have stopped, 
but any that have not will be interrupted repeatedly. This is existing 
behavior.

The main DataNode thread joins on the BP service threads. There used to be a 
fixed 2-second sleep, which has been changed to wait only until the shutdown is 
done. If the BP service threads terminated but shutdown() was not called, the 
main thread will delay the exit for 2 seconds as it did before.
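
The change from a fixed sleep to a bounded wait can be sketched roughly as 
follows. This is an illustration only, under the assumption that a latch 
signals shutdown completion. The class ExitWaitSketch and its methods are 
hypothetical names and are not taken from the patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ExitWaitSketch {
    // Released once shutdown has finished its teardown work.
    private final CountDownLatch shutdownDone = new CountDownLatch(1);

    /** Stand-in for the shutdown path; teardown work would precede the signal. */
    void shutdown() {
        shutdownDone.countDown();
    }

    /**
     * Called by the main thread after the service threads have been joined.
     * Returns true as soon as shutdown has completed, or false once
     * maxWaitMs has elapsed without shutdown() having completed.
     */
    boolean awaitShutdown(long maxWaitMs) {
        try {
            return shutdownDone.await(maxWaitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }
}
```

With awaitShutdown(2000), the main thread exits immediately when shutdown has 
already finished, and otherwise waits at most the 2 seconds it used to sleep.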

This patch does not include the client-side changes, so the OOB ack will not 
have any visible effect. Clients will treat it as a node failure, which is also 
what happens when a datanode shuts down.

> Make DN send an OOB Ack on shutdown before restarting
> -----------------------------------------------------
>
>                 Key: HDFS-5583
>                 URL: https://issues.apache.org/jira/browse/HDFS-5583
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>         Attachments: HDFS-5583.patch, HDFS-5583.patch
>
>
> Add an ability for data nodes to send an OOB response in order to indicate an 
> upcoming upgrade-restart. The client should ignore the pipeline error from the 
> node for a configured amount of time and try to reconstruct the pipeline 
> without excluding the restarted node.  If the node does not come back in time, 
> regular pipeline recovery should happen.
> This feature is useful for applications that need to keep blocks local. 
> If the upgrade-restart is fast, the wait is preferable to losing locality.  
> It could also be used in general instead of the draining-writer strategy.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
