Viraj Jasani created HDFS-16918:
-----------------------------------

             Summary: Optionally shut down datanode if it does not stay 
connected to active namenode
                 Key: HDFS-16918
                 URL: https://issues.apache.org/jira/browse/HDFS-16918
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: Viraj Jasani
            Assignee: Viraj Jasani


When HDFS is deployed behind an Envoy proxy, network connection issues or 
packet loss can be observed, depending on the socket timeout configured in 
Envoy. The Envoy instances form a transparent communication mesh in which each 
application sends and receives packets to and from localhost and is unaware of 
the network topology.

The primary purpose of Envoy is to make the network transparent to 
applications, in order to identify network issues reliably. However, such a 
proxy-based setup can sometimes result in socket connection issues between the 
datanode and the namenode.

Many deployment frameworks provide auto-start functionality when any of the 
Hadoop daemons are stopped. If a given datanode does not stay connected to the 
active namenode in the cluster, i.e. it does not receive heartbeat responses in 
time from the active namenode (even though the active namenode has not 
terminated), the datanode is of little use. We should provide configurable 
behavior such that if a datanode cannot receive a heartbeat response from the 
active namenode within a configurable time duration, it terminates itself to 
avoid impacting the availability SLA. This is specifically helpful when the 
underlying deployment or observability framework (e.g. K8s) can start the 
datanode automatically upon its shutdown (unless it is being restarted as part 
of a rolling upgrade) and help the newly brought-up datanode (in the case of 
K8s, a new pod with dynamically changing nodes) establish new socket 
connections to the active and standby namenodes. This should be opt-in 
behavior, not the default.
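
To make the proposed behavior concrete, here is a minimal sketch of the timeout 
check in plain Java. It is not the HDFS implementation: the class name 
ActiveNamenodeWatchdog, its methods, and the convention that a non-positive 
timeout disables the feature are all assumptions made for illustration.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not actual HDFS code: remembers when the last heartbeat
 * response arrived from the active namenode and reports when the datanode
 * should shut itself down.
 */
class ActiveNamenodeWatchdog {
    // Assumed convention: a value <= 0 leaves the feature disabled (opt-in).
    private final long timeoutMs;
    // Injectable millisecond clock so the logic can be exercised without sleeping.
    private final LongSupplier clockMs;
    private volatile long lastResponseMs;

    ActiveNamenodeWatchdog(long timeoutMs, LongSupplier clockMs) {
        this.timeoutMs = timeoutMs;
        this.clockMs = clockMs;
        this.lastResponseMs = clockMs.getAsLong();
    }

    /** Call whenever a heartbeat response arrives from the active namenode. */
    void onActiveHeartbeatResponse() {
        lastResponseMs = clockMs.getAsLong();
    }

    /** True only if enabled and the active namenode has been silent longer than the timeout. */
    boolean shouldTerminate() {
        return timeoutMs > 0 && clockMs.getAsLong() - lastResponseMs > timeoutMs;
    }
}
```

A caller would poll shouldTerminate() periodically and, once it returns true, 
exit the process so that the deployment framework can bring up a fresh 
datanode. Skipping the check during a rolling upgrade is left out of this 
sketch.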



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

