Zilong Zhu created HDFS-17504:
---------------------------------

Summary: DN process should exit when BPServiceActor exit
Key: HDFS-17504
URL: https://issues.apache.org/jira/browse/HDFS-17504
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Zilong Zhu
BPServiceActor is a very important thread. In a non-HA cluster, the exit of the BPServiceActor thread causes the DN process to exit. In an HA cluster, however, this is not the case. I found that HDFS-15651 causes the BPServiceActor thread to exit and changes its "runningState" from "RunningState.FAILED" to "RunningState.EXITED", which can be confusing during troubleshooting. I believe the DN process should exit when the BPServiceActor's runningState is set to RunningState.FAILED, because at that point the DN cannot recover and establish a heartbeat connection with the active NameNode (ANN) on its own.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
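To make the proposed policy concrete, here is a minimal Java sketch that contrasts the HA behavior described in the issue with the proposed one. It is not Hadoop code. DnExitPolicySketch, shouldExitCurrent and shouldExitProposed are hypothetical names. RunningState is a simplified stand-in that keeps only a few states. The "current" check models the HA case in a simplified way: the DN stays up while any actor of the block pool is still alive. The proposed check keys off FAILED, so it assumes the actor's final state is kept as FAILED rather than overwritten with EXITED.

```java
import java.util.List;

/**
 * Hypothetical sketch of the DN exit policy proposed in HDFS-17504.
 * Names and the simplified state model are illustrative assumptions,
 * not Hadoop's actual API.
 */
public class DnExitPolicySketch {

    /** Simplified stand-in for BPServiceActor's running state (subset only). */
    public enum RunningState { CONNECTING, RUNNING, EXITED, FAILED }

    /** Assumption: an actor counts as alive unless it has exited or failed. */
    public static boolean isAlive(RunningState s) {
        return s != RunningState.EXITED && s != RunningState.FAILED;
    }

    /**
     * Simplified model of the HA behavior described in the issue: the DN
     * keeps running as long as at least one actor of the block pool
     * (for example the one talking to the standby NameNode) is alive.
     */
    public static boolean shouldExitCurrent(List<RunningState> actors) {
        return actors.stream().noneMatch(DnExitPolicySketch::isAlive);
    }

    /**
     * Proposed behavior: exit as soon as any actor reaches FAILED, since
     * the DN cannot re-establish heartbeats with that NameNode on its own.
     */
    public static boolean shouldExitProposed(List<RunningState> actors) {
        return actors.contains(RunningState.FAILED);
    }

    public static void main(String[] args) {
        // HA block pool: the actor for the ANN has failed, the one for the standby still runs.
        List<RunningState> ha = List.of(RunningState.FAILED, RunningState.RUNNING);
        System.out.println("current=" + shouldExitCurrent(ha)
                + " proposed=" + shouldExitProposed(ha));
    }
}
```

Running main prints `current=false proposed=true`. With the ANN actor failed and the standby actor still running, the simplified current policy keeps the DN up, while the proposed policy exits. For a non-HA block pool with a single FAILED actor, both checks return true, matching the non-HA behavior described in the issue.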