[
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082274#comment-13082274
]
Ivan Kelly commented on HDFS-2018:
----------------------------------
{quote}
getFirstTxnId and getLastTxnId seem a bit redundant in the EditLogInputStream
interface. The first txnid must be the last read plus one. The last txnid can
be obtained using getNumberOfTransactions.
Similarly in the constructor of EditLogFileInputStream. It might lead to
inconsistent use of EditLogFileInputStream, for example if the file contents
don't match the transaction ids being passed.
{quote}
getLastTxId is required by FSEditLog#selectInputStreams to make sure a
continuous set of streams is selected. getNumberOfTransactions wouldn't work
here, because it counts how many transactions are available on a journal
manager from that point onwards, not how many are in the next segment.
Take the scenario where you have two logs with txns A[[1,100][101,140][201,300]]
& B[[1,100][101,200][201,240]]. A has had an error at txn 140, so that stream is
incomplete. B has had an error at txn 240, so that stream is incomplete.
Now if you use getNumberOfTransactions(101), for B you get 140 and for A you
get 40. So the stream from B is selected. But we can't read all 140, since that
would leave us at txn 240, partway through A's [201,300] segment; we must only
read the next segment, as we can't start reading halfway through a segment
(actually you can since HDFS-2187, but that was done after this and it's still
undesirable). Since we are selecting all the streams before starting to read
them, we can't wait until we've read to the end of the stream to get the last
txid. So getLastTxId() is useful here.
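To make the scenario concrete, here is a rough sketch of the selection loop in
plain Java. It is only a model of the idea, not the code in the patch: the
class SegmentSelectionSketch, its Segment type, and the methods
segmentStartingAt, numberOfTransactions and selectSegments are all names made
up for illustration. A journal is assumed to be a list of segments, and the
selector advances by each chosen segment's last txid rather than by the
transaction count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the A/B scenario; these are not the HDFS classes.
class SegmentSelectionSketch {

    /** One edit log segment covering txids [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    /** Returns the segment that starts exactly at fromTxId, or null if none does. */
    static Segment segmentStartingAt(List<Segment> journal, long fromTxId) {
        for (Segment s : journal) {
            if (s.firstTxId == fromTxId) {
                return s;
            }
        }
        return null;
    }

    /** Counts the gap-free transactions a journal offers from fromTxId onwards. */
    static long numberOfTransactions(List<Segment> journal, long fromTxId) {
        long count = 0;
        long next = fromTxId;
        Segment s;
        while ((s = segmentStartingAt(journal, next)) != null) {
            count += s.lastTxId - s.firstTxId + 1;
            next = s.lastTxId + 1;
        }
        return count;
    }

    /**
     * Selects segments up front, one at a time. The journal with the most
     * transactions from the current txid wins, but only its next segment is
     * taken; the segment's lastTxId tells us where to resume the search.
     */
    static List<Segment> selectSegments(List<List<Segment>> journals, long fromTxId) {
        List<Segment> selected = new ArrayList<>();
        long next = fromTxId;
        while (true) {
            List<Segment> best = null;
            long bestCount = 0;
            for (List<Segment> journal : journals) {
                long n = numberOfTransactions(journal, next);
                if (n > bestCount) {
                    best = journal;
                    bestCount = n;
                }
            }
            if (best == null) {
                break; // no journal has a segment starting at 'next'
            }
            Segment s = segmentStartingAt(best, next);
            selected.add(s);
            next = s.lastTxId + 1; // advance by segment, not by bestCount
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Segment> a = Arrays.asList(
                new Segment(1, 100), new Segment(101, 140), new Segment(201, 300));
        List<Segment> b = Arrays.asList(
                new Segment(1, 100), new Segment(101, 200), new Segment(201, 240));
        for (Segment s : selectSegments(Arrays.asList(a, b), 1)) {
            System.out.println("[" + s.firstTxId + "," + s.lastTxId + "]");
        }
    }
}
```

With the two logs above, the count from 101 is 140 for B and 40 for A, so B's
[101,200] is taken; the search then resumes at 201, where A's [201,300] wins
over B's [201,240].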
getFirstTxId() is very useful in BackupImage to find the current inprogress
stream.
> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Ivan Kelly
> Assignee: Ivan Kelly
> Fix For: 0.23.0
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff,
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in
> FileJournalManager and the code for input streams is in the inspectors. This
> change does a number of things.
> - Input and Output streams are now created by the JournalManager.
> - FSImageStorageInspectors now deal with URIs when referring to edit logs
> - Recovery of inprogress logs is performed by counting the number of
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.