[ 
https://issues.apache.org/jira/browse/HDFS-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-1521:
-----------------------------

    Attachment: HDFS-1521.diff

This patch addresses all of Konstantin's comments except 11. There is something 
strange going on with lastAppliedTxId that TestBackupNode isn't currently 
picking up. To reproduce it, change the code at line 177 to the following:
{code}
      //
      // Take a checkpoint
      //
      backup = startBackupNode(conf, op, 1);
      waitCheckpointDone(backup);
      
      for (int i = 0; i < 10; i++) {
        writeFile(fileSys, new Path("file_" + i), replication);
      }
      
      backup.doCheckpoint();
      waitCheckpointDone(backup);

{code} 
This causes the test to fail. A normal run of the test doesn't exercise 
convergeJournalSpool, so the problem usually goes unnoticed. 

With that change, you can see that if the BackupNode loads a checkpoint and 
then tries to journal something, lastAppliedTxId + 1 will be 1 even though 
we've loaded an image and edit log. The simple fix is to put 
{code}
      lastAppliedTxId = getEditLog().getLastWrittenTxId();
{code} 
in loadCheckpoint(). This should be the end of the story.
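
To make the effect of that one-line fix concrete, here is a minimal sketch. It 
is not the real HDFS code: EditLog, BackupImage, the syncTxId flag and the 
starting txid of 9 are all hypothetical stand-ins, and the actual loading of 
the image and edits is elided.

```java
// Hypothetical, simplified model; these are not the real HDFS classes.
class LoadCheckpointSketch {
    /** Stand-in for the edit log; it only remembers the last txid written. */
    static final class EditLog {
        private final long lastWrittenTxId;
        EditLog(long lastWrittenTxId) { this.lastWrittenTxId = lastWrittenTxId; }
        long getLastWrittenTxId() { return lastWrittenTxId; }
    }

    /** Stand-in for the backup node's image state. */
    static final class BackupImage {
        private final EditLog editLog;
        private long lastAppliedTxId = 0; // value on a freshly started node

        BackupImage(EditLog editLog) { this.editLog = editLog; }

        void loadCheckpoint(boolean syncTxId) {
            // Loading the image and edits is elided in this sketch.
            if (syncTxId) {
                // The proposed fix: pick up the txid of what was just loaded.
                lastAppliedTxId = editLog.getLastWrittenTxId();
            }
        }

        long nextExpectedTxId() { return lastAppliedTxId + 1; }
    }

    /** Returns the txid the backup would expect next after loading a checkpoint. */
    static long nextTxIdAfterLoad(long lastWrittenTxId, boolean syncTxId) {
        BackupImage image = new BackupImage(new EditLog(lastWrittenTxId));
        image.loadCheckpoint(syncTxId);
        return image.nextExpectedTxId();
    }

    public static void main(String[] args) {
        System.out.println(nextTxIdAfterLoad(9, false)); // without the fix: 1
        System.out.println(nextTxIdAfterLoad(9, true));  // with the fix: 10
    }
}
```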

However, with this change, you get the error 
{quote}
java.io.IOException: Expected transaction ID 10 but got 11
{quote}
A transaction is going missing. What's happening is that when doCheckpoint 
gets kicked off, the log is rolled and logJSpoolStart is called, which creates 
an edit with opcode OP_JSPOOL_START. This opcode is caught by 
EditLogBackupOutputStream and never transmitted to the backup node, so the 
transaction IDs on the Primary and the Backup get out of sync. 
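
The mismatch can be modelled in a few lines. This is only a sketch under 
assumptions: Primary, Backup, transmit() and the forwardSpoolStart flag are 
hypothetical names rather than the real HDFS classes, OP_ADD stands in for any 
ordinary edit, and the starting txid of 9 is chosen just to reproduce the 
message above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are not the real HDFS classes.
class TxIdDesyncSketch {
    enum Op { OP_ADD, OP_JSPOOL_START }

    static final class Edit {
        final long txId;
        final Op op;
        Edit(long txId, Op op) { this.txId = txId; this.op = op; }
    }

    /** Primary side: every logged edit consumes a txid, OP_JSPOOL_START included. */
    static final class Primary {
        private long lastWrittenTxId;
        Primary(long lastWrittenTxId) { this.lastWrittenTxId = lastWrittenTxId; }
        Edit log(Op op) { return new Edit(++lastWrittenTxId, op); }
    }

    /** Backup side: insists that txids arrive without gaps. */
    static final class Backup {
        private long lastAppliedTxId;
        Backup(long lastAppliedTxId) { this.lastAppliedTxId = lastAppliedTxId; }
        void apply(Edit e) throws IOException {
            long expected = lastAppliedTxId + 1;
            if (e.txId != expected) {
                throw new IOException(
                    "Expected transaction ID " + expected + " but got " + e.txId);
            }
            lastAppliedTxId = e.txId;
        }
    }

    /** Stand-in for the output stream: optionally swallows OP_JSPOOL_START. */
    static List<Edit> transmit(List<Edit> edits, boolean forwardSpoolStart) {
        List<Edit> sent = new ArrayList<>();
        for (Edit e : edits) {
            if (e.op == Op.OP_JSPOOL_START && !forwardSpoolStart) {
                continue; // dropped, but its txid has already been consumed
            }
            sent.add(e);
        }
        return sent;
    }

    /** Replays a checkpoint start followed by one ordinary edit. */
    static String replay(boolean forwardSpoolStart) {
        Primary primary = new Primary(9);
        List<Edit> edits = new ArrayList<>();
        edits.add(primary.log(Op.OP_JSPOOL_START)); // txid 10
        edits.add(primary.log(Op.OP_ADD));          // txid 11
        Backup backup = new Backup(9);
        try {
            for (Edit e : transmit(edits, forwardSpoolStart)) {
                backup.apply(e);
            }
            return "in sync at txid " + backup.lastAppliedTxId;
        } catch (IOException ex) {
            return ex.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replay(false)); // Expected transaction ID 10 but got 11
        System.out.println(replay(true));  // in sync at txid 11
    }
}
```

In this model the gap disappears as soon as the spool-start edit is forwarded. 
Whether that is safe in the real code is a separate matter.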

So the question is: is there any harm in actually transferring these 
OP_JSPOOL_START transactions, or are they just excluded as a precaution?

> Persist transaction ID on disk between NN restarts
> --------------------------------------------------
>
>                 Key: HDFS-1521
>                 URL: https://issues.apache.org/jira/browse/HDFS-1521
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: HDFS-1521.diff, HDFS-1521.diff, hdfs-1521.3.txt, 
> hdfs-1521.4.txt, hdfs-1521.5.txt, hdfs-1521.txt, hdfs-1521.txt
>
>
> For HDFS-1073 and other future work, we'd like to have the concept of a 
> transaction ID that is persisted on disk with the image/edits. We already 
> have this concept in the NameNode but it resets to 0 on restart. We can also 
> use this txid to replace the _checkpointTime_ field, I believe.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
