[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13082274#comment-13082274
 ] 

Ivan Kelly commented on HDFS-2018:
----------------------------------

{quote}
getFirstTxnId and getLastTxnId seem a bit redundant in the EditLogInputStream 
interface. The first txnid must be the last read plus one. The last txnid can 
be obtained using getNumberOfTransactions.
Similarly in the constructor of EditLogFileInputStream. It might lead to 
inconsistent use of EditLogFileInputStream, for example if the file contents 
don't match the transaction ids being passed.
{quote}
getLastTxId is required by FSEditLog#selectInputStreams to make sure a 
continuous set of streams is selected. getNumberOfTransactions wouldn't work 
here, because it counts how many transactions are available on a journal 
manager from that point onwards, not how many are in the next segment.

Take the scenario where you have two logs with txns 
A[[1,100][101,140][201,300]] & B[[1,100][101,200][201,240]]. A has had an error 
at txn 140, so that stream is incomplete. B has had an error at txn 240, so 
that stream is incomplete.

Now if you used getNumberOfTransactions(101), for B you get 140, and for A you 
get 40. So the stream from B is selected. But we can't read all 140; we must 
only read the next segment, as we can't start reading halfway through a 
segment (actually you can since HDFS-2187, but that was done after this and 
it's still undesirable). Since we are selecting all the streams before starting 
to read them, we can't wait until we've read to the end of the stream to get 
the last txid. So getLastTxId() is useful here.
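
To make the scenario concrete, here is a minimal sketch of the selection 
logic. It does not use the real HDFS classes: Segment, journalA/journalB and 
the three static helpers are hypothetical stand-ins, and it assumes each 
journal's segments are sorted by first txid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentSelectionSketch {

    /** Hypothetical stand-in for one edit log segment covering [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;

        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }

        @Override
        public String toString() {
            return "[" + firstTxId + "," + lastTxId + "]";
        }
    }

    /** Log A from the scenario: the segment starting at 101 stops early at 140. */
    static List<Segment> journalA() {
        return Arrays.asList(new Segment(1, 100), new Segment(101, 140), new Segment(201, 300));
    }

    /** Log B from the scenario: the segment starting at 201 stops early at 240. */
    static List<Segment> journalB() {
        return Arrays.asList(new Segment(1, 100), new Segment(101, 200), new Segment(201, 240));
    }

    /** Counts contiguous transactions from fromTxId onwards, across segment boundaries. */
    static long numberOfTransactions(List<Segment> journal, long fromTxId) {
        long next = fromTxId;
        for (Segment s : journal) {
            if (s.firstTxId == next) {
                next = s.lastTxId + 1;
            }
        }
        return next - fromTxId;
    }

    /** Last txid of the single segment starting at fromTxId, or -1 if there is none. */
    static long lastTxIdOfSegmentStartingAt(List<Segment> journal, long fromTxId) {
        for (Segment s : journal) {
            if (s.firstTxId == fromTxId) {
                return s.lastTxId;
            }
        }
        return -1;
    }

    /** Picks one segment at a time, using the last txid to decide where the next one starts. */
    static List<Segment> selectSegments(List<List<Segment>> journals, long fromTxId) {
        List<Segment> selected = new ArrayList<>();
        long next = fromTxId;
        while (true) {
            List<Segment> best = null;
            long bestCount = 0;
            for (List<Segment> journal : journals) {
                long count = numberOfTransactions(journal, next);
                if (count > bestCount) {
                    best = journal;
                    bestCount = count;
                }
            }
            if (best == null) {
                break; // no journal has a segment starting at 'next'
            }
            long last = lastTxIdOfSegmentStartingAt(best, next);
            selected.add(new Segment(next, last));
            next = last + 1; // known up front, without reading the stream
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(numberOfTransactions(journalB(), 101)); // 140
        System.out.println(numberOfTransactions(journalA(), 101)); // 40
        System.out.println(selectSegments(Arrays.asList(journalA(), journalB()), 1));
    }
}
```

In this sketch the count only ranks the journals; the last txid of the chosen 
segment is what tells the selector that the next stream must start at 201, at 
which point A (100 txns) beats B (40 txns).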

getFirstTxId() is very useful in BackupImage to find the current inprogress 
stream.

> 1073: Move all journal stream management code into one place
> ------------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: 0.23.0
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
