[ https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039769#comment-13039769 ]
Ivan Kelly commented on HDFS-1580:
----------------------------------

I've been working on implementing the design for HDFS-1580 on top of the HDFS-1073 branch and have run into a problem with #getNumberOfTransactions(). Specifically, I've been working on the input code in FSImage:

{code}
protected boolean loadEdits(JournalManager journal) throws IOException {
  LOG.debug("About to load edits:\n " + journal);

  FSEditLogLoader loader = new FSEditLogLoader(namesystem);
  long startingTxId = storage.getMostRecentCheckpointTxId() + 1;
  int numLoaded = 0;

  // Load latest edits
  long numTransactionsToLoad = journal.getNumberOfTransactions(startingTxId);

  while (numLoaded < numTransactionsToLoad) {
    EditLogInputStream editIn = journal.getInputStream(startingTxId);
    LOG.debug("Reading " + editIn + " expecting start txid #" + startingTxId);

    int thisNumLoaded = loader.loadFSEdits(editIn, startingTxId);
    startingTxId += thisNumLoaded;
    numLoaded += thisNumLoaded;
    editIn.close();
  }

  // update the counts
  getFSNamesystem().dir.updateCountForINodeWithQuota();

  // update the txid for the edit log
  editLog.setNextTxId(storage.getMostRecentCheckpointTxId() + numLoaded + 1);

  // If we loaded any edits, need to save.
  return numLoaded > 0;
}
{code}

The load is in a loop for now, because the output is still in LogSegment form, but getNumberOfTransactions() presents a problem even in a single-stream implementation.

The problem is that it is sometimes impossible to return a number from getNumberOfTransactions(). This happens when the NameNode has crashed in the middle of writing an edit log. The edit log is named edits_inprogress_N, where N is the first transaction id in the log. Because the NN crashed, we don't know the last transaction id, so we cannot return the number of transactions in the journal without scanning the file from the start.

Without getNumberOfTransactions() it's difficult to choose which journal has the most edits.
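
To make the cost of scanning concrete, here is a minimal sketch of counting the transactions in an in-progress log by reading it from the start. Everything in it is an assumption made for illustration: the record layout (8-byte txid, 4-byte payload length, payload) is a simplified stand-in rather than the real edit log format, and InProgressLogScanner, encode, lastTxId and countTransactions are hypothetical names that do not exist on the HDFS-1073 branch.

```java
import java.nio.ByteBuffer;

/**
 * Sketch only: recovers the last txid of an in-progress log by scanning it.
 * The record layout (8-byte txid, 4-byte payload length, payload) is a
 * hypothetical stand-in, not the real HDFS edit log format.
 */
final class InProgressLogScanner {
    private static final int HEADER_BYTES = Long.BYTES + Integer.BYTES;

    private InProgressLogScanner() {}

    /** Encodes one record in the hypothetical layout; handy for building sample logs. */
    static byte[] encode(long txId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_BYTES + payload.length)
                .putLong(txId)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /**
     * Scans from the start and returns the txid of the last complete record.
     * Returns firstTxId - 1 if the log holds no complete record.
     */
    static long lastTxId(ByteBuffer log, long firstTxId) {
        ByteBuffer in = log.duplicate(); // leave the caller's position untouched
        long expected = firstTxId;
        while (in.remaining() >= HEADER_BYTES) {
            long txId = in.getLong();
            int length = in.getInt();
            // Stop at a txid gap, a corrupt length, or a record cut short by the crash.
            if (txId != expected || length < 0 || length > in.remaining()) {
                break;
            }
            in.position(in.position() + length); // skip the payload
            expected++;
        }
        return expected - 1;
    }

    /** Number of complete transactions with txid >= fromTxId. */
    static long countTransactions(ByteBuffer log, long firstTxId, long fromTxId) {
        if (fromTxId < firstTxId) {
            throw new IllegalArgumentException("fromTxId precedes the start of this log");
        }
        return Math.max(0L, lastTxId(log, firstTxId) - fromTxId + 1);
    }
}
```

For a file-backed journal the buffer could come from ByteBuffer.wrap(Files.readAllBytes(path)), with firstTxId parsed from the N in edits_inprogress_N. The scan is linear in the size of the log, which is why it only seems acceptable on the crash-recovery path.
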
HDFS-1073 uses the number of bytes in the file, but this doesn't feel very safe for anything that isn't a file. What's more, if the start transactions of two journal snippets are out of sync, it becomes impossible to choose which journal has the most transactions using file size alone. (This is an argument for log segments.)

The simplest solution I see is to scan the _inprogress file from the start to find the last transaction written. As this should only happen after a NameNode crash, the delay for doing so shouldn't be prohibitive.

> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: EditlogInterface.1.pdf, EditlogInterface.2.pdf,
> HDFS-1580+1521.diff, HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff,
> generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.pdf,
> generic_wal_iface.txt
>
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira