[ https://issues.apache.org/jira/browse/HDFS-4302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eugene Koontz reassigned HDFS-4302: ----------------------------------- Assignee: Eugene Koontz > Precondition in EditLogFileInputStream's length() method is checked too early > in NameNode startup, causing fatal exception > -------------------------------------------------------------------------------------------------------------------------- > > Key: HDFS-4302 > URL: https://issues.apache.org/jira/browse/HDFS-4302 > Project: Hadoop HDFS > Issue Type: Bug > Components: ha, namenode > Reporter: Eugene Koontz > Assignee: Eugene Koontz > Attachments: HDFS-4302.patch, HDFS-4302.patch > > > When bringing up a namenode in standby mode, where DEBUG is enabled for > namenode, the namenode will hit the following code in > {{hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSEditLogLoader.java}}: > {code} > if (LOG.isDebugEnabled()) { > LOG.debug("edit log length: " + in.length() + ", start txid: " > + expectedStartingTxId + ", last txid: " + lastTxId); > } > {code}. > However, if {{in}} has an {{EditLogFileInputStream}} as its {{streams[0]}}, > this code is hit before the {{EditLogFileInputStream}}'s {{advertizedSize}} > is initialized (before the HTTP client connects to the remote edit log server > (i.e. the journal node)). This causes the following precondition to fail in > {{EditLogFileInputStream:length()}}: > {code} > Preconditions.checkState(advertisedSize != -1, > "must get input stream before length is available"); > {code} > which shuts down the namenode with the following log messages and stack trace: > {code} > 2012-12-11 10:45:33,319 DEBUG ipc.ProtobufRpcEngine > (ProtobufRpcEngine.java:invoke(217)) - Call: getEditLogManifest took 88ms > 2012-12-11 10:45:33,336 DEBUG client.QuorumJournalManager > (QuorumJournalManager.java:selectInputStreams(459)) - selectInputStream > manifests: > 172.16.175.1:8485: [[1,3]] > 2012-12-11 10:45:33,351 DEBUG namenode.FSImage > (FSImage.java:loadFSImage(605)) - Planning to load image : > FSImageFile(file=/tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000, > cpktTxId=0000000000000000000) > 2012-12-11 10:45:33,351 DEBUG namenode.FSImage > (FSImage.java:loadFSImage(607)) - Planning to load edit log stream: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 > 2012-12-11 10:45:33,355 INFO namenode.FSImage (FSImageFormat.java:load(168)) > - Loading image file > /tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000 using no > compression > 2012-12-11 10:45:33,355 INFO namenode.FSImage (FSImageFormat.java:load(171)) > - Number of files = 1 > 2012-12-11 10:45:33,356 INFO namenode.FSImage > (FSImageFormat.java:loadFilesUnderConstruction(383)) - Number of files under > construction = 0 > 2012-12-11 10:45:33,357 INFO namenode.FSImage (FSImageFormat.java:load(193)) > - Image file of size 119 loaded in 0 seconds. > 2012-12-11 10:45:33,357 INFO namenode.FSImage > (FSImage.java:loadFSImage(753)) - Loaded image for txid 0 from > /tmp/hadoop-data/dfs/name/current/fsimage_0000000000000000000 > 2012-12-11 10:45:33,357 DEBUG namenode.FSImage (FSImage.java:loadEdits(686)) > - About to load edits: > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 > 2012-12-11 10:45:33,359 INFO namenode.FSImage (FSImage.java:loadEdits(694)) > - Reading > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@465098f9 > expecting start txid #1 > 2012-12-11 10:45:33,361 DEBUG ipc.Client (Client.java:stop(1060)) - Stopping > client > 2012-12-11 10:45:33,363 DEBUG ipc.Client (Client.java:close(1016)) - IPC > Client (1462017562) connection to Eugenes-MacBook-Pro.local/172.16.175.1:8485 > from hdfs/eugenes-macbook-pro.lo...@example.com: closed > 2012-12-11 10:45:33,363 DEBUG ipc.Client (Client.java:run(848)) - IPC Client > (1462017562) connection to Eugenes-MacBook-Pro.local/172.16.175.1:8485 from > hdfs/eugenes-macbook-pro.lo...@example.com: stopped, remaining connections 0 > 2012-12-11 10:45:33,464 FATAL namenode.NameNode (NameNode.java:main(1224)) - > Exception in namenode join > java.lang.IllegalStateException: must get input stream before length is > available > at > com.google.common.base.Preconditions.checkState(Preconditions.java:145) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream$URLLog.length(EditLogFileInputStream.java:405) > at > org.apache.hadoop.hdfs.server.namenode.EditLogFileInputStream.length(EditLogFileInputStream.java:258) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.length(RedundantEditLogInputStream.java:256) > at > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.length(RedundantEditLogInputStream.java:256) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:125) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:88) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:697) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:642) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:259) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:604) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:447) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:409) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:400) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:434) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:606) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:591) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153) > at > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1219) > 2012-12-11 10:45:33,470 INFO util.ExitUtil (ExitUtil.java:terminate(84)) - > Exiting with status 1 > 2012-12-11 10:45:33,471 INFO namenode.NameNode (StringUtils.java:run(620)) - > SHUTDOWN_MSG: > /************************************************************ > SHUTDOWN_MSG: Shutting down NameNode at Eugenes-MacBook-Pro.local/172.16.175.1 > ************************************************************/ > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira