[ 
https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831605#action_12831605
 ] 

Todd Lipcon commented on HDFS-957:
----------------------------------

bq. Now, the NN closes the file and the header sector that has the correct 
LAYOUT VERSION is flushed to the disk whereas some other pages of the file 
encountered an error while being flushed

In the attached patch I actually close the file, then re-open it with 
RandomAccessFile to set the header correctly. My thinking was that the close() 
would flush. That's probably not guaranteed, but with most journaled 
filesystems (e.g. ext3 w/ data=ordered) the data must be written before the 
metadata transaction will commit to disk. Therefore, on an OS crash or power 
outage, the file will either not have committed its metadata, in which case it 
will appear empty or not exist at all, or it will have committed its metadata 
_after_ flushing the complete data. Whether the rewritten, correct 
LAYOUT_VERSION has been synced is unknown, but I think the ordering is going 
to be correct.

That said, we should probably stick a FileChannel.force() in there after both 
writes to be extra safe. It may hurt performance a little bit, but I think this 
is one of those areas where we should err on the side of safety, yea?
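
A rough sketch of the sequence described above, not the code from the attached 
patch: write a placeholder header, dump the data, force it to disk, then 
re-open with RandomAccessFile to write the real version and force again. The 
class name, method names, and both constant values are made up for 
illustration; only the names IMAGE_IN_PROGRESS and LAYOUT_VERSION come from 
the issue.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ImageSaveSketch {
    // Hypothetical values, chosen only for this sketch.
    static final int IMAGE_IN_PROGRESS = 0;
    static final int LAYOUT_VERSION = -19;

    /** Writes a placeholder header plus body, then patches in the real version. */
    static void save(File f, byte[] body) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(f);
             DataOutputStream out = new DataOutputStream(fos)) {
            out.writeInt(IMAGE_IN_PROGRESS); // placeholder header
            out.write(body);                 // stand-in for the real image contents
            out.flush();
            fos.getChannel().force(true);    // first force: data is on disk
        }
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(0);
            raf.writeInt(LAYOUT_VERSION);    // overwrite the placeholder
            raf.getChannel().force(true);    // second force: header is on disk
        }
    }

    /** Returns true only if the header carries the final layout version. */
    static boolean isComplete(File f) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            return raf.length() >= 4 && raf.readInt() == LAYOUT_VERSION;
        }
    }
}
```

With this ordering, a loader that sees IMAGE_IN_PROGRESS (or a file too short 
to hold a header) can treat the image as an interrupted save.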

bq. The other question is that the device that shall store the FSImage now 
needs to be Seekable. Is this the case earlier too?

The current code simply uses a FileOutputStream - there's no abstraction going 
on. So it is assuming the existence of a normal filesystem with random access. 
For the edit logs, we can't assume seek currently, but I think it's a fair 
assumption for the images.




> FSImage layout version should be written only once file is complete
> -------------------------------------------------------------------
>
>                 Key: HDFS-957
>                 URL: https://issues.apache.org/jira/browse/HDFS-957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-957.txt
>
>
> Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the 
> file, along with some other headers, and then dumps the directory into the 
> file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the 
> layout version, dump all of the data, then seek back to the head of the file 
> to write the proper LAYOUT_VERSION. This would make it very easy to detect 
> the case where the FSImage save got interrupted.
