[ https://issues.apache.org/jira/browse/HDFS-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-9249: ---------------------------------- Labels: supportability (was: ) > NPE thrown if an IOException is thrown in NameNode.<init> > --------------------------------------------------------- > > Key: HDFS-9249 > URL: https://issues.apache.org/jira/browse/HDFS-9249 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: Wei-Chiu Chuang > Assignee: Wei-Chiu Chuang > Priority: Minor > Labels: supportability > > This issue was found when running test case > TestBackupNode.testCheckpointNode, but upon closer look, the problem is not > due to the test case. > Looks like an IOException was thrown in > try { > initializeGenericKeys(conf, nsId, namenodeId); > initialize(conf); > try { > haContext.writeLock(); > state.prepareToEnterState(haContext); > state.enterState(haContext); > } finally { > haContext.writeUnlock(); > } > causing the namenode to stop, but the namesystem was not yet properly > instantiated, causing NPE. > I tried to reproduce locally, but to no avail. > Because I could not reproduce the bug, and the log does not indicate what > caused the IOException, I suggest make this a supportability JIRA to log the > exception for future improvement. > Stacktrace > java.lang.NullPointerException: null > at > org.apache.hadoop.hdfs.server.namenode.NameNode.getFSImage(NameNode.java:906) > at org.apache.hadoop.hdfs.server.namenode.BackupNode.stop(BackupNode.java:210) > at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:827) > at > org.apache.hadoop.hdfs.server.namenode.BackupNode.<init>(BackupNode.java:89) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1474) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.startBackupNode(TestBackupNode.java:102) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpoint(TestBackupNode.java:298) > at > org.apache.hadoop.hdfs.server.namenode.TestBackupNode.testCheckpointNode(TestBackupNode.java:130) > The last few lines of log: > 2015-10-14 19:45:07,807 INFO namenode.NameNode > (NameNode.java:createNameNode(1422)) - createNameNode [-checkpoint] > 2015-10-14 19:45:07,807 INFO impl.MetricsSystemImpl > (MetricsSystemImpl.java:init(158)) - CheckpointNode metrics system started > (again) > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(402)) - fs.defaultFS is > hdfs://localhost:37835 > 2015-10-14 19:45:07,808 INFO namenode.NameNode > (NameNode.java:setClientNamenodeAddress(422)) - Clients are to use > localhost:37835 to access this namenode/service. > 2015-10-14 19:45:07,810 INFO hdfs.MiniDFSCluster > (MiniDFSCluster.java:shutdown(1708)) - Shutting down the Mini HDFS Cluster > 2015-10-14 19:45:07,810 INFO namenode.FSNamesystem > (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for > active state > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:endCurrentLogSegment(1228)) - Ending log segment 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5306)) - NameNodeEditLogRoller was interrupted, exiting > 2015-10-14 19:45:07,811 INFO namenode.FSEditLog > (FSEditLog.java:printStatistics(703)) - Number of transactions: 3 Total time > for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of > syncs: 4 SyncTimes(ms): 2 1 > 2015-10-14 19:45:07,811 INFO namenode.FSNamesystem > (FSNamesystem.java:run(5373)) - LazyPersistFileScrubber was interrupted, > exiting > 2015-10-14 19:45:07,822 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000001-0000000000000000003 > 2015-10-14 19:45:07,835 INFO namenode.FileJournalManager > (FileJournalManager.java:finalizeLogSegment(142)) - Finalizing edits file > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000001 > -> > /data/jenkins/workspace/CDH5.5.0-Hadoop-HDFS-2.6.0/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000001-0000000000000000003 > 2015-10-14 19:45:07,836 INFO blockmanagement.CacheReplicationMonitor > (CacheReplicationMonitor.java:run(169)) - Shutting down > CacheReplicationMonitor > 2015-10-14 19:45:07,836 INFO ipc.Server (Server.java:stop(2485)) - Stopping > server on 37835 > 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(718)) - Stopping IPC > Server listener on 37835 > 2015-10-14 19:45:07,837 INFO ipc.Server (Server.java:run(844)) - Stopping IPC > Server Responder > 2015-10-14 19:45:07,837 INFO blockmanagement.BlockManager > (BlockManager.java:run(3781)) - Stopping ReplicationMonitor. > 2015-10-14 19:45:07,838 WARN blockmanagement.DecommissionManager > (DecommissionManager.java:run(78)) - Monitor interrupted: > java.lang.InterruptedException: sleep interrupted > 2015-10-14 19:45:07,844 INFO namenode.FSNamesystem > (FSNamesystem.java:stopActiveServices(1298)) - Stopping services started for > active state > 2015-10-14 19:45:07,845 INFO namenode.FSNamesystem > (FSNamesystem.java:stopStandbyServices(1386)) - Stopping services started for > standby state > 2015-10-14 19:45:07,848 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped > HttpServer2$SelectChannelConnectorWithSafeStartup@localhost:0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)