Matthew Rossi created HDFS-17307:
------------------------------------
Summary: docker-compose.yaml sets the wrong namenode directory, causing
datanode failures on restart
Key: HDFS-17307
URL: https://issues.apache.org/jira/browse/HDFS-17307
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode, namenode
Reporter: Matthew Rossi
Restarting existing services using the provided docker-compose.yaml causes the
datanode to crash after a few seconds.
How to reproduce:
{code:java}
$ docker-compose up -d # everything starts ok
$ docker-compose stop # stop services without removing containers
$ docker-compose up -d # everything starts, but datanode crashes after a few
seconds{code}
The log produced by the datanode suggests the issue is due to a mismatch in the
clusterIDs of the namenode and the datanode:
{code:java}
datanode_1 | 2023-12-28 11:17:15 WARN Storage:420 - Failed to add
storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
datanode_1 | java.io.IOException: Incompatible clusterIDs in
/tmp/hadoop-hadoop/dfs/data: namenode clusterID =
CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID =
CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 {code}
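The mismatch can be confirmed by comparing the VERSION files of the two storage directories, where HDFS records each clusterID. A rough sketch, assuming the compose services are named namenode and datanode and use the default /tmp/hadoop-hadoop paths:
{code:bash}
# Hypothetical service names; adjust to your compose project.
# Each storage directory records its clusterID in current/VERSION.
docker-compose exec namenode grep clusterID /tmp/hadoop-hadoop/dfs/name/current/VERSION
docker-compose exec datanode grep clusterID /tmp/hadoop-hadoop/dfs/data/current/VERSION{code}
If the two clusterIDs differ, the datanode refuses to attach its storage directory, which matches the IOException above.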
After some troubleshooting I found out the namenode is not reusing the
clusterID of the previous run because it cannot find it in the directory set by
ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change of the
namenode's default user, which is now "hadoop", so the namenode actually
writes this information to /tmp/hadoop-hadoop/dfs/name.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]