[ 
https://issues.apache.org/jira/browse/HDFS-17307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833219#comment-17833219
 ] 

ASF GitHub Bot commented on HDFS-17307:
---------------------------------------

matthewrossi commented on PR #6387:
URL: https://github.com/apache/hadoop/pull/6387#issuecomment-2032214029

   This is what I've found diving into the project history:
   
   - `docker-compose.yaml` was always configured with `ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"`
   - the namenode [base image](https://github.com/apache/hadoop/blob/docker-hadoop-runner/Dockerfile) has always specified the `hadoop` user (so my initial assumption about a previous use of the `root` user was wrong)
   - the default configurations of [Hadoop](https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/resources/core-default.xml#L37) and [HDFS](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml#L442) are what determine the use of the `/tmp/hadoop-${user.name}/dfs/name` directory, but they date back to before the creation of the `docker-compose.yaml` (see the resolution sketch below)
   
   So, it looks like the issue has always been there.
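   
   For reference, here is a minimal sketch of how those defaults resolve (assuming `hadoop-hdfs` is on the classpath and no `core-site.xml`/`hdfs-site.xml` overrides them; the class name is just for illustration):
   
   ```java
   import org.apache.hadoop.hdfs.HdfsConfiguration;
   
   public class ShowDefaultNameDir {
       public static void main(String[] args) {
           // HdfsConfiguration layers hdfs-default.xml on top of core-default.xml
           HdfsConfiguration conf = new HdfsConfiguration();
           // core-default.xml: hadoop.tmp.dir = /tmp/hadoop-${user.name}
           System.out.println(conf.get("hadoop.tmp.dir"));
           // hdfs-default.xml: dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name
           System.out.println(conf.get("dfs.namenode.name.dir"));
       }
   }
   ```
   
   Run as the `hadoop` user, this should print `/tmp/hadoop-hadoop` and `file:///tmp/hadoop-hadoop/dfs/name`, which is why `ENSURE_NAMENODE_DIR: "/tmp/hadoop-root/dfs/name"` never matches the directory the namenode actually writes to.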




> docker-compose.yaml sets namenode directory wrong causing datanode failures 
> on restart
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-17307
>                 URL: https://issues.apache.org/jira/browse/HDFS-17307
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>            Reporter: Matthew Rossi
>            Priority: Major
>              Labels: pull-request-available
>
> Restarting existing services using the docker-compose.yaml causes the
> datanode to crash after a few seconds.
> How to reproduce:
> {code:java}
> $ docker-compose up -d # everything starts ok
> $ docker-compose stop  # stop services without removing containers
> $ docker-compose up -d # everything starts, but datanode crashes after a few 
> seconds{code}
> The log produced by the datanode suggests the issue is due to a mismatch in 
> the clusterIDs of the namenode and the datanode:
> {code:java}
> datanode_1         | 2023-12-28 11:17:15 WARN  Storage:420 - Failed to add 
> storage directory [DISK]file:/tmp/hadoop-hadoop/dfs/data
> datanode_1         | java.io.IOException: Incompatible clusterIDs in 
> /tmp/hadoop-hadoop/dfs/data: namenode clusterID = 
> CID-250bae07-6a8a-45ce-84bb-8828b37b10b7; datanode clusterID = 
> CID-2c1c7105-7fdf-4a19-8ef8-7cb763e5b701 {code}
> After some troubleshooting I found out the namenode is not reusing the
> clusterID of the previous run because it cannot find it in the directory set
> by ENSURE_NAMENODE_DIR=/tmp/hadoop-root/dfs/name. This is due to a change in
> the default user of the namenode, which is now "hadoop", so the namenode is
> actually writing this information to /tmp/hadoop-hadoop/dfs/name.
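> A quick way to confirm the mismatch (a minimal sketch, assuming the default
> /tmp/hadoop-hadoop paths above; not part of the proposed fix) is to compare
> the clusterID recorded in each storage directory's current/VERSION file:
> {code:java}
> import java.io.FileReader;
> import java.util.Properties;
> 
> public class PrintClusterId {
>     public static void main(String[] args) throws Exception {
>         // current/VERSION is a Java properties file written by the HDFS storage layer
>         String[] dirs = {"/tmp/hadoop-hadoop/dfs/name", "/tmp/hadoop-hadoop/dfs/data"};
>         for (String dir : dirs) {
>             Properties props = new Properties();
>             try (FileReader reader = new FileReader(dir + "/current/VERSION")) {
>                 props.load(reader);
>             }
>             System.out.println(dir + " -> clusterID=" + props.getProperty("clusterID"));
>         }
>     }
> }
> {code}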



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
