Matthew Rossi created HDFS-17307:
------------------------------------

             Summary: docker-compose.yaml sets namenode directory wrong causing datanode failures on restart
                 Key: HDFS-17307
                 URL: https://issues.apache.org/jira/browse/HDFS-17307
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
            Reporter: Matthew Rossi
Restarting existing services using the docker-compose.yaml causes the datanode to crash after a few seconds.

How to reproduce:
{code:java}
$ docker-compose up -d   # everything starts ok
$ docker-compose stop    # stop services without removing containers
$ docker-compose up -d   # everything starts, but datanode crashes after a few seconds
{code}

The log produced by the datanode suggests the issue is due to a mismatch in the clusterIDs of the namenode and the datanode:
{code:java}
datanode_1 | 2023-12-28 11:17:15 WARN Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
datanode_1 | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701
{code}

After some troubleshooting, I found that the namenode is not reusing the clusterID of the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change of the default user of the namenode, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
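A possible workaround, sketched under the assumption that the compose file defines the namenode as a service named "namenode" and passes ENSURE_NAMENODE_DIR through its environment (the service name here is illustrative, not taken from the actual file): point ENSURE_NAMENODE_DIR at the path the "hadoop" user actually writes to, so the namenode finds the clusterID from the previous run on restart.
{code:yaml}
# Hypothetical docker-compose.yaml fragment (service name assumed).
# Aligns ENSURE_NAMENODE_DIR with the directory the "hadoop" default user
# writes to, instead of the stale /tmp/hadoop-root/dfs/name path.
services:
  namenode:
    environment:
      - ENSURE_NAMENODE_DIR=/tmp/hadoop-hadoop/dfs/name
{code}
With this, a docker-compose stop followed by docker-compose up -d should let the namenode reuse the existing clusterID, so the datanode no longer hits the Incompatible clusterIDs error.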