[jira] Commented: (HDFS-1073) Simpler model for Namenode's fs Image and edit Logs

Todd Lipcon (JIRA) Sun, 04 Apr 2010 20:27:56 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853290#action_12853290
 ]


Todd Lipcon commented on HDFS-1073:
-----------------------------------

bq.  what are the pros-and-cons of numbering the files sequentially, fsimage_0, 
fsimage_1, etc vs appending the last known transaction into the filename?

Interesting question. The pro I can think of for sequential numbering 
(0,1,2...) is that we can determine whether there is a "gap" in edit logs 
without looking at file contents. For example, if we see edits_0, edits_1, 
edits_3 we know that this edits directory is corrupt since we missed edits_2. 
Whereas with txn IDs we can only detect a gap by reading through the entirety 
of the file and counting transactions.
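
As a rough illustration of the filename-only gap check, here is a minimal Java sketch. It is not HDFS code: the class EditLogGapCheck, the method missingIndices, and the bare "edits_<idx>" naming are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper (not part of HDFS): finds missing indices among
// sequentially numbered edit logs by looking at file names only.
class EditLogGapCheck {
    private static final String PREFIX = "edits_";

    static List<Integer> missingIndices(List<String> fileNames) {
        // Collect the index of every file named edits_<idx>.
        TreeSet<Integer> seen = new TreeSet<>();
        for (String name : fileNames) {
            if (name.startsWith(PREFIX)) {
                seen.add(Integer.parseInt(name.substring(PREFIX.length())));
            }
        }
        List<Integer> missing = new ArrayList<>();
        if (seen.isEmpty()) {
            return missing;
        }
        // Any index between the smallest and largest seen that is absent is a gap.
        for (int i = seen.first(); i < seen.last(); i++) {
            if (!seen.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }
}
```

For the directory listing edits_0, edits_1, edits_3, this reports index 2 as missing without opening any file.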

The pro of txid numbering is that we can detect the case where some middle log 
got truncated. For example, if we have edits_0, edits_1000, and edits_2000, but 
edits_1000 only contains 500 edits, we can fail at that point.
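
The truncation check can be sketched the same way. This is an illustration under stated assumptions, not the HDFS implementation: the class and method names are hypothetical, and the per-file transaction counts are assumed to have been obtained already by reading each log.

```java
import java.util.Map;
import java.util.SortedMap;

// Hypothetical helper (not part of HDFS): with txid-based names, each log
// must end exactly where the next one begins.
class TxidTruncationCheck {
    // logs maps the first txid of each file (taken from its name) to the
    // number of transactions actually read from that file.
    // Returns the first txid of the first log that does not line up with its
    // successor, or -1 if all logs are contiguous. The last log has no
    // successor, so it cannot be checked this way.
    static long firstInconsistentLog(SortedMap<Long, Long> logs) {
        Long prevStart = null;
        long prevCount = 0;
        for (Map.Entry<Long, Long> e : logs.entrySet()) {
            if (prevStart != null && prevStart + prevCount != e.getKey()) {
                return prevStart;
            }
            prevStart = e.getKey();
            prevCount = e.getValue();
        }
        return -1;
    }
}
```

With edits_0 holding 1000 edits and edits_1000 holding only 500, the check flags edits_1000, because 1000 + 500 does not reach the 2000 at which edits_2000 starts.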

However, there's nothing stopping us from getting the benefits of both - we 
could either make the filenames something like edits_<idx>_<first txid>, or 
just make sure we store the first txid in the header of the edit log.
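
A combined name of the form edits_<idx>_<first txid> could be parsed roughly as follows. This is only a sketch: the exact format (decimal fields separated by underscores) and the class name EditLogName are assumptions, since the comment does not pin the format down.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser (not part of HDFS) for names like edits_3_2000,
// assuming decimal fields: edits_<idx>_<first txid>.
class EditLogName {
    private static final Pattern NAME = Pattern.compile("edits_(\\d+)_(\\d+)");

    // Returns {idx, firstTxid}, or null if the name does not match.
    static long[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return null;
        }
        return new long[] { Long.parseLong(m.group(1)), Long.parseLong(m.group(2)) };
    }
}
```

The index then supports the filename-only gap check, while the first txid supports the truncation check against the neighbouring file.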

Sanjay mentioned "it decouples the split (ie roll) of the edit log and the 
checkpoint of the image" but I'm not sure what he meant by that. I think we can 
still achieve the same goal using indexed files, as long as each roll 
increments the index. So, if we roll three times but only succeed in 
checkpointing once, we'd see fsimage_0, edits_0, edits_1, edits_2, fsimage_2, 
edits_3 (where fsimage_0 and edits_0 through edits_2 may be GCed according to 
the ageout policy).
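
To make the indexed layout concrete, the sketch below picks the files still needed for recovery. It assumes, going only by the example above, that fsimage_K already reflects every edit log with index up to and including K, so only the newest image and the edit logs with a higher index must be kept. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of HDFS). Assumption inferred from the
// example: fsimage_K covers edits_0 through edits_K.
class CheckpointSelection {
    // Returns the numeric suffix of name if it starts with prefix, else -1.
    static int indexOf(String name, String prefix) {
        return name.startsWith(prefix)
                ? Integer.parseInt(name.substring(prefix.length()))
                : -1;
    }

    static List<String> filesNeededForRecovery(List<String> names) {
        // Find the newest checkpoint image.
        int latestImage = -1;
        for (String n : names) {
            latestImage = Math.max(latestImage, indexOf(n, "fsimage_"));
        }
        List<String> needed = new ArrayList<>();
        if (latestImage < 0) {
            return needed; // no image present, nothing to recover from
        }
        needed.add("fsimage_" + latestImage);
        // Keep only the edit logs written after that image.
        for (String n : names) {
            if (indexOf(n, "edits_") > latestImage) {
                needed.add(n);
            }
        }
        return needed;
    }
}
```

For the listing above, this keeps fsimage_2 and edits_3; everything else is a candidate for garbage collection under whatever ageout policy applies.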


bq. this is very different from what we currently got in the trunk. And this is 
a heavyweight change

Agreed, this is a large change. However, I think it will reduce the amount of 
complicated state machine code, and we know there are several very tricky bugs 
in the trunk implementation. I think this simpler design will be easier to 
understand and thus harder to write bugs into. Plus, it has the nice property 
that even if there is a bug, it will be _very_ hard to write one that corrupts 
the data, since old versions are never modified after close and can be deleted 
lazily.


> Simpler model for Namenode's fs Image and edit Logs 
> ----------------------------------------------------
>
>                 Key: HDFS-1073
>                 URL: https://issues.apache.org/jira/browse/HDFS-1073
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>
> The naming and handling of NN's fsImage and edit logs can be significantly 
> improved, resulting in simpler and more robust code.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
