[ https://issues.apache.org/jira/browse/HAWQ-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Radar Lei resolved HAWQ-1504.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0.0-incubating

Fixed by Shubham Sharma.

> Namenode hangs during restart of docker environment configured using
> incubator-hawq/contrib/hawq-docker/
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HAWQ-1504
>                 URL: https://issues.apache.org/jira/browse/HAWQ-1504
>             Project: Apache HAWQ
>          Issue Type: Bug
>          Components: Command Line Tools
>            Reporter: Shubham Sharma
>            Assignee: Radar Lei
>            Priority: Minor
>             Fix For: 2.3.0.0-incubating
>
>
> After setting up an environment using the instructions provided under
> incubator-hawq/contrib/hawq-docker/, restarting the docker containers
> causes the namenode to hang: it attempts a namenode -format on every start.
> Steps to reproduce this issue -
> - Navigate to incubator-hawq/contrib/hawq-docker
> - make stop
> - make start
> - docker exec -it centos7-namenode bash
> - ps -ef | grep java
> You can see namenode -format running.
> {code}
> [gpadmin@centos7-namenode data]$ ps -ef | grep java
> hdfs        11    10  1 00:56 ?        00:00:06
> /etc/alternatives/java_sdk/bin/java -Dproc_namenode -Xmx1000m
> -Dhdfs.namenode=centos7-namenode -Dhadoop.log.dir=/var/log/hadoop/hdfs
> -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.5.0.0-1245/hadoop
> -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console
> -Djava.library.path=:/usr/hdp/2.5.0.0-1245/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.0.0-1245/hadoop/lib/native
> -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true
> -Dhadoop.security.logger=INFO,NullAppender
> org.apache.hadoop.hdfs.server.namenode.NameNode -format
> {code}
> Since namenode -format runs in interactive mode and at this stage it is
> waiting for a (Yes/No) response, the namenode will remain stuck forever. This
> makes hdfs unavailable.
> Root cause of the problem -
> In the dockerfiles present under
> incubator-hawq/contrib/hawq-docker/centos6-docker/hawq-test and
> incubator-hawq/contrib/hawq-docker/centos7-docker/hawq-test, the docker
> directive ENTRYPOINT executes entrypoint.sh during startup.
> The entrypoint.sh in turn executes start-hdfs.sh. start-hdfs.sh checks for the
> following -
> {code}
> if [ ! -d /tmp/hdfs/name/current ]; then
>     su -l hdfs -c "hdfs namenode -format"
> fi
> {code}
> My assumption is that it looks for the fsimage and edit logs. If they are not
> present, the script assumes that this is a first-time initialization and that
> a namenode format should be done. However, the path /tmp/hdfs/name/current
> does not exist on the namenode, so the check succeeds on every start.
> From the namenode logs it is clear that the fsimage and edit logs are written
> under /tmp/hadoop-hdfs/dfs/name/current.
> {code}
> 2017-07-18 00:55:20,892 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> No edit log streams selected.
> 2017-07-18 00:55:20,893 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Planning to load image:
> FSImageFile(file=/tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000,
> cpktTxId=0000000000000000000)
> 2017-07-18 00:55:20,995 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode: Loading 1 INodes.
> 2017-07-18 00:55:21,064 INFO
> org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf: Loaded FSImage
> in 0 seconds.
> 2017-07-18 00:55:21,065 INFO org.apache.hadoop.hdfs.server.namenode.FSImage:
> Loaded image for txid 0 from
> /tmp/hadoop-hdfs/dfs/name/current/fsimage_0000000000000000000
> 2017-07-18 00:55:21,084 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Need to save fs image?
> false (staleImage=false, haEnabled=false, isRollingUpgrade=false)
> 2017-07-18 00:55:21,084 INFO
> org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 1
> {code}
> Thus the wrong path in
> incubator-hawq/contrib/hawq-docker/centos*-docker/hawq-test/start-hdfs.sh
> causes the namenode to hang on each restart of the containers, making hdfs
> unavailable.
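
The check described in the report could be restructured roughly as follows. This is a minimal sketch and not necessarily the change that was committed for this issue. The function names needs_format and format_if_needed are hypothetical and do not appear in the HAWQ scripts. The format command is passed in as arguments so that the directory check can be exercised without a running HDFS.

```shell
#!/bin/bash
# Sketch only: needs_format and format_if_needed are hypothetical helpers,
# not part of incubator-hawq/contrib/hawq-docker.

# Return 0 when the NameNode directory has never been formatted,
# i.e. it has no "current" subdirectory holding the fsimage and edit logs.
needs_format() {
    local name_dir="$1"
    [ ! -d "${name_dir}/current" ]
}

# Run the given format command only when the directory is unformatted.
# Usage: format_if_needed <name_dir> <command> [args...]
format_if_needed() {
    local name_dir="$1"
    shift
    if needs_format "${name_dir}"; then
        "$@"
    else
        echo "NameNode already formatted under ${name_dir}; skipping format"
    fi
}
```

In start-hdfs.sh this might be called as `format_if_needed /tmp/hadoop-hdfs/dfs/name su -l hdfs -c "hdfs namenode -format -nonInteractive"`, using the directory shown in the namenode log above (other setups may configure a different dfs.namenode.name.dir). The -nonInteractive option makes the format abort instead of prompting when the name directory already exists, so a wrong path would fail fast instead of hanging.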