Wei-Chiu Chuang created HDFS-9249:
-------------------------------------

             Summary: NPE thrown if an IOException is thrown in NameNode.<init>
                 Key: HDFS-9249
                 URL: https://issues.apache.org/jira/browse/HDFS-9249
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Wei-Chiu Chuang
            Assignee: Wei-Chiu Chuang
            Priority: Minor


This issue was found when running test case TestBackupNode.testCheckpointNode, 
but upon closer look, the problem is not due to the test case.

Looks like an IOException was thrown in
    try {
      initializeGenericKeys(conf, nsId, namenodeId);
      initialize(conf);
      try {
        haContext.writeLock();
        state.prepareToEnterState(haContext);
        state.enterState(haContext);
      } finally {
        haContext.writeUnlock();
      }
causing the namenode to stop, but the namesystem was not yet properly 
instantiated, causing NPE.

I tried to reproduce locally, but to no avail.

Because I could not reproduce the bug, and the log does not indicate what 
caused the IOException, I suggest make this a supportability JIRA to log the 
exception for future improvement.

Stacktrace
java.lang.NullPointerException: null
at org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827)
at org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298)
at 
org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130)
The last few lines of log:
2015-10-14 19:45:07,807 INFO namenode.NameNode 
(NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint]
2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl 
(MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started 
(again)
2015-10-14 19:45:07,808 INFO namenode.NameNode 
(NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is 
hdfs://localhost:37835
2015-10-14 19:45:07,808 INFO namenode.NameNode 
(NameNode.java:setClientNamenodeAddress(422)) - Clients are to use 
localhost:37835 to access this namenode/service.
2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster 
(MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster
2015-10-14 19:45:07,810 INFO namenode.FSNamesystem 
(FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
active state
2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
(FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
(FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting
2015-10-14 19:45:07,811 INFO namenode.FSEditLog 
(FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time 
for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of 
syncs: 4 SyncTimes(ms): 2 1 
2015-10-14 19:45:07,811 INFO namenode.FSNamesystem 
(FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, exiting
2015-10-14 19:45:07,822 INFO namenode.FileJournalManager 
(FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000001
 -> 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000001-0000000000000000003
2015-10-14 19:45:07,835 INFO namenode.FileJournalManager 
(FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000001
 -> 
/data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000001-0000000000000000003
2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor 
(CacheReplicationMonitor.java:run(169)) - Shutting down CacheReplicationMonitor
2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping 
server on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC 
Server listener on 37835
2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC 
Server Responder
2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager 
(BlockManager.java:run(3781)) - Stopping ReplicationMonitor.
2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager 
(DecommissionManager.java:run(78)) - Monitor interrupted: 
java.lang.InterruptedException: sleep interrupted
2015-10-14 19:45:07,844 INFO namenode.FSNamesystem 
(FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for 
active state
2015-10-14 19:45:07,845 INFO namenode.FSNamesystem 
(FSNamesystem.java:stopStandbyServices(1386)) - Stopping services started for 
standby state
2015-10-14 19:45:07,848 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped 
HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to