[ 
https://issues.apache.org/jira/browse/HDFS-2305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron T. Myers updated HDFS-2305:
---------------------------------

    Attachment: hdfs-2305.1.patch

> Running multiple 2NNs can result in corrupt file system
> -------------------------------------------------------
>
>                 Key: HDFS-2305
>                 URL: https://issues.apache.org/jira/browse/HDFS-2305
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.2
>            Reporter: Aaron T. Myers
>            Assignee: Aaron T. Myers
>         Attachments: hdfs-2305-test.patch, hdfs-2305.0.patch, 
> hdfs-2305.1.patch
>
>
> Here's the scenario:
> * You run the NN and 2NN (2NN A) on the same machine.
> * You don't have the address of the 2NN configured, so it's defaulting to 
> 127.0.0.1.
> * There's another 2NN (2NN B) running on a second machine.
> * When a 2NN is done checkpointing, it says "hey NN, I have an updated 
> fsimage for you. You can download it from this URL, which includes my IP 
> address, which is x"
> And here are the steps that occur to cause this issue:
> # Some edits happen.
> # 2NN A (on the NN machine) does a checkpoint. All is dandy.
> # Some more edits happen.
> # 2NN B (on a different machine) does a checkpoint. It tells the NN "grab the 
> newly-merged fsimage file from 127.0.0.1".
> # NN happily grabs the fsimage from 2NN A (the 2NN on the NN machine), which 
> is stale.
> # NN renames edits.new file to edits. At this point the in-memory FS state is 
> fine, but the on-disk state is missing edits.
> # The next time a 2NN (any 2NN) tries to do a checkpoint, it gets an 
> up-to-date edits file, with an outdated fsimage, and tries to apply those 
> edits to that fsimage.
> # Kaboom: the edits do not match the fsimage they are applied to, and the 
> file system can end up corrupt.
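
The steps above can be replayed with a small toy model. This is only a sketch
of the reasoning, not Hadoop code: the class names, method names, host names,
and the representation of fsimage and edits as lists of edit ids are all
assumptions made for illustration. The one behavior it tries to capture is
that the NN resolves the advertised address from its own point of view, so
127.0.0.1 always means "the 2NN on the NN machine".

```python
# Toy model of the scenario described above. Every name here is
# hypothetical; none of this is taken from the Hadoop source.

LOOPBACK = "127.0.0.1"


class NameNode:
    def __init__(self, host):
        self.host = host
        self.secondaries_by_host = {}
        self.memory = []          # in-memory FS state (list of edit ids)
        self.disk_image = []      # fsimage on disk
        self.disk_edits = []      # edits file on disk
        self.disk_edits_new = []  # edits.new, used while a checkpoint runs
        self.checkpointing = False

    def register(self, snn):
        self.secondaries_by_host[snn.host] = snn

    def apply_edit(self, edit):
        self.memory.append(edit)
        log = self.disk_edits_new if self.checkpointing else self.disk_edits
        log.append(edit)

    def roll_edits(self):
        # Start a checkpoint: hand out the current fsimage and edits,
        # and send any later edits to edits.new.
        self.checkpointing = True
        return list(self.disk_image), list(self.disk_edits)

    def fetch_image_from(self, addr):
        # The address is resolved on the NN machine, so loopback points
        # at whichever 2NN is co-located with the NN.
        source_host = self.host if addr == LOOPBACK else addr
        source = self.secondaries_by_host[source_host]
        self.disk_image = list(source.merged_image)
        # Rename edits.new to edits.
        self.disk_edits = self.disk_edits_new
        self.disk_edits_new = []
        self.checkpointing = False


class SecondaryNameNode:
    def __init__(self, host, advertised_addr=LOOPBACK):
        self.host = host
        self.advertised_addr = advertised_addr  # unconfigured: loopback
        self.merged_image = []

    def checkpoint(self, nn):
        image, edits = nn.roll_edits()
        self.merged_image = image + edits  # apply edits to the fsimage
        nn.fetch_image_from(self.advertised_addr)


nn = NameNode("nn-host")
snn_a = SecondaryNameNode("nn-host")     # 2NN A, same machine as the NN
snn_b = SecondaryNameNode("other-host")  # 2NN B, second machine
nn.register(snn_a)
nn.register(snn_b)

nn.apply_edit("e1")
nn.apply_edit("e2")
snn_a.checkpoint(nn)   # loopback resolves to 2NN A, which is correct here
nn.apply_edit("e3")
nn.apply_edit("e4")
snn_b.checkpoint(nn)   # loopback resolves to 2NN A again: stale fsimage

on_disk = nn.disk_image + nn.disk_edits
print("in memory:", nn.memory)
print("on disk:  ", on_disk)

# The next checkpoint merges new edits onto the outdated fsimage.
nn.apply_edit("e5")
image, edits = nn.roll_edits()
next_merge = image + edits
print("next merge input:", next_merge)
```

In this model the in-memory state still holds e1 through e4 after 2NN B's
checkpoint, while the on-disk fsimage plus edits only covers e1 and e2. The
following checkpoint then starts from an fsimage that never saw e3 and e4.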

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
