Matthew Rossi created HDFS-17307:
------------------------------------

             Summary: docker-compose.yaml sets namenode directory wrong causing datanode failures on restart
                 Key: HDFS-17307
                 URL: https://issues.apache.org/jira/browse/HDFS-17307
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode, namenode
            Reporter: Matthew Rossi


Restarting existing services with the provided docker-compose.yaml causes the datanode to crash after a few seconds.

How to reproduce:
{code:java}
$ docker-compose up -d # everything starts ok
$ docker-compose stop  # stop services without removing containers
$ docker-compose up -d # everything starts, but the datanode crashes after a few seconds
{code}
The log produced by the datanode (e.g. via {{docker-compose logs datanode}}) suggests the issue is a mismatch between the clusterIDs of the namenode and the datanode:
{code:java}
datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
datanode_1         | java.io.IOException: Incompatible clusterIDs in /tmp/hadoop-hadoop/dfs/data: namenode clusterID = CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701
{code}
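The mismatch can be confirmed by comparing the VERSION files the two daemons keep under their storage directories. A quick check, assuming the compose services are named {{namenode}} and {{datanode}} as the log prefix above suggests:
{code:bash}
# clusterID recorded by the namenode (written under the "hadoop" user's tmp dir)
$ docker-compose exec namenode cat /tmp/hadoop-hadoop/dfs/name/current/VERSION | grep clusterID
# clusterID recorded by the datanode from its previous run
$ docker-compose exec datanode cat /tmp/hadoop-hadoop/dfs/data/current/VERSION | grep clusterID
{code}
On a broken restart the two clusterIDs differ, because the namenode has reformatted itself with a fresh clusterID while the datanode kept its old storage directory.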
After some troubleshooting I found that the namenode is not reusing the clusterID of the previous run because it cannot find it in the directory set by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name, so it reformats on every start. This is due to a change in the default user of the namenode, which is now "hadoop", so the namenode actually writes this information to /tmp/hadoop-hadoop/dfs/name.
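Pointing ENSURE_NAMENODE_DIR at the directory the "hadoop" user actually writes to should make the existence check work again across restarts. A minimal sketch of the relevant part of docker-compose.yaml (only the environment entry changes; the rest of the service definition is assumed to stay as shipped):
{code:yaml}
services:
  namenode:
    # ... image, command, ports, etc. unchanged ...
    environment:
      # was /tmp/hadoop-root/dfs/name; the namenode now runs as "hadoop",
      # so its default storage dir is /tmp/hadoop-${user.name} = /tmp/hadoop-hadoop
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-hadoop/dfs/name"
{code}
With the check directory matching the real storage directory, the namenode finds the existing metadata on restart and keeps its clusterID, and the datanode comes back up cleanly.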


