[ https://issues.apache.org/jira/browse/HDFS-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831605#action_12831605 ]
Todd Lipcon commented on HDFS-957:
----------------------------------

bq. Now, the NN closes the file and the header sector that has the correct LAYOUT VERSION is flushed to the disk whereas some other pages of the file encountered an error while being flushed

In the attached patch I actually close the file, then re-open it with RandomAccessFile to set the header correctly. My thinking was that the close() would flush. That is probably not guaranteed, but with most journaled filesystems (e.g. ext3 with data=ordered) the data must be written before the metadata transaction commits to disk. Therefore, after an OS crash or power outage, the file will either not have committed its metadata, in which case it will appear empty or not exist at all, or it will have committed its metadata _after_ flushing the complete data. Whether the rewritten, correct LAYOUT_VERSION has been synced is unknown, but I think the ordering is going to be correct.

That said, we should probably stick a FileChannel.force() in there after both writes to be extra safe. It may hurt performance a little bit, but I think this is one of those areas where we should err on the side of safety, yea?

bq. The other question is that the device that shall store the FSImage now needs to be Seekable. Is this the case earlier too?

The current code simply uses a FileOutputStream; there is no abstraction going on. So it already assumes a normal filesystem with random access. For the edit logs we can't currently assume seek, but I think it's a fair assumption for the images.
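
A minimal sketch of the sequence discussed above, not the attached patch: write a placeholder header, dump the body, force it to disk, then re-open with RandomAccessFile to overwrite the header and force again. The class name, method names, the 4-byte int header, and the sentinel value 0 for IMAGE_IN_PROGRESS are all assumptions made for illustration; the real image header carries more fields.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

class ImageSaveSketch {
    // Hypothetical sentinel; the actual value is not specified in this thread.
    static final int IMAGE_IN_PROGRESS = 0;

    // Writes the image in two passes so a partially written file never
    // carries a valid layout version.
    static void save(File image, int layoutVersion, byte[] body) throws IOException {
        // Pass 1: placeholder header followed by the image body.
        try (FileOutputStream fos = new FileOutputStream(image);
             DataOutputStream out = new DataOutputStream(fos)) {
            out.writeInt(IMAGE_IN_PROGRESS);
            out.write(body);
            out.flush();
            // Force data and metadata to disk before the header is fixed up.
            fos.getChannel().force(true);
        }
        // Pass 2: re-open and overwrite only the 4-byte header.
        try (RandomAccessFile raf = new RandomAccessFile(image, "rw")) {
            raf.seek(0);
            raf.writeInt(layoutVersion);
            // Second force, so the corrected header is also durable.
            raf.getChannel().force(true);
        }
    }

    // Returns false when the file is too short to hold a header or the
    // header is still the in-progress placeholder.
    static boolean isComplete(File image) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(image, "r")) {
            return raf.length() >= 4 && raf.readInt() != IMAGE_IN_PROGRESS;
        }
    }
}
```

A loader could call something like isComplete() before parsing and skip any image that still has the placeholder. The first force() is what keeps the ordering argument from depending on filesystem journaling behaviour.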

> FSImage layout version should be only once file is complete
> -----------------------------------------------------------
>
>                 Key: HDFS-957
>                 URL: https://issues.apache.org/jira/browse/HDFS-957
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-957.txt
>
>
> Right now, the FSImage save code writes the LAYOUT_VERSION at the head of the
> file, along with some other headers, and then dumps the directory into the
> file. Instead, it should write a special IMAGE_IN_PROGRESS entry for the
> layout version, dump all of the data, then seek back to the head of the file
> to write the proper LAYOUT_VERSION. This would make it very easy to detect
> the case where the FSImage save got interrupted.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.