[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062798#comment-13062798
 ] 

Ivan Kelly commented on HDFS-2018:
----------------------------------

{quote}
in selectInputStream, it's counting both finalized and unfinalized 
transactions. But at startup, it should be recovering all of the inprogress 
logs to finalized logs, right? Given that, I don't think we need the API 
getNumberOfTransactions – ie we only need the finalized one.
{quote}
We need both. There are two occasions on which you need to count the number of 
transactions in a journal: startup and checkpointing. At startup you want to 
count inprogress logs, since they're the result of a crash. When checkpointing, 
they shouldn't be counted, because the primary is still writing to an 
inprogress log. With a file based journal, you cannot tell whether you are 
starting up or checkpointing without some kind of write lease for the journal, 
which we don't have now (it might be a nice thing to have in future).
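
As a rough illustration of the two counting modes, here is a minimal sketch. 
LogSegment and TransactionCounter are hypothetical names invented for this 
example, not classes from the patch, and the last txid of an inprogress 
segment is assumed to have been found already by scanning the log.

```java
import java.util.List;

/** Hypothetical model of one edit log segment; not an HDFS class. */
final class LogSegment {
    final long firstTxId;
    final long lastTxId;      // for inprogress logs, assumed found by scanning
    final boolean inProgress;

    LogSegment(long firstTxId, long lastTxId, boolean inProgress) {
        this.firstTxId = firstTxId;
        this.lastTxId = lastTxId;
        this.inProgress = inProgress;
    }
}

final class TransactionCounter {
    /**
     * Counts the contiguous transactions available starting at fromTxId.
     * Segments are assumed to be sorted by firstTxId.
     *
     * @param includeInProgress true at startup (inprogress logs are crash
     *        leftovers and must be counted), false when checkpointing
     *        (the primary is still writing to the inprogress log)
     */
    static long count(List<LogSegment> segments, long fromTxId,
                      boolean includeInProgress) {
        long next = fromTxId;
        for (LogSegment s : segments) {
            if (s.inProgress && !includeInProgress) {
                continue;               // skip logs still being written
            }
            if (s.lastTxId < next) {
                continue;               // segment ends before the start point
            }
            if (s.firstTxId > next) {
                break;                  // gap: later transactions are unusable
            }
            next = s.lastTxId + 1;
        }
        return next - fromTxId;
    }
}
```

With a finalized segment covering txids 1 to 100 and an inprogress segment 
covering 101 to 150, the startup mode counts 150 transactions and the 
checkpointing mode counts 100.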

{quote}
the API change on the StorageArchiver interface seems less than ideal – an 
archiver may very well want to know the txid range of a log to know what to do 
with it – any way we can preserve this?
{quote}
I've put the txid range back into this API. I haven't used the FoundFSImage and 
FoundEditLog interfaces though, as it would create a circular dependency 
between StorageInspector and StorageArchiver. Also, FoundEditLog has gone away, 
so using File and longs makes it more uniform.
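
For illustration only, an archiver API built from File and longs could look 
roughly like the sketch below. The interface name, method names and signatures 
are assumptions made for this example and are not copied from the patch. The 
sample implementation just shows how an archiver might use the txid range to 
decide what to do with a log.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical archiver API using only File and long arguments. */
interface ArchiverSketch {
    void archiveImage(File image, long txId);
    void archiveLog(File log, long firstTxId, long lastTxId);
}

/**
 * Example implementation (hypothetical): records the destination name it
 * would use for each archived file, derived from the txid information.
 */
final class RangeNamingArchiver implements ArchiverSketch {
    final List<String> destinations = new ArrayList<>();

    @Override
    public void archiveImage(File image, long txId) {
        destinations.add(image.getName() + ".archived-" + txId);
    }

    @Override
    public void archiveLog(File log, long firstTxId, long lastTxId) {
        // The txid range lets the archiver name or place the log sensibly.
        destinations.add(log.getName() + ".archived-" + firstTxId
            + "-" + lastTxId);
    }
}
```

Because the methods take only File and long values, neither side needs the 
other's types, which avoids the circular dependency mentioned above.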

{quote}
the idea of the "remote edit log manifest" and the way we do edits transfer is 
inextricably linked to the idea of log segments. But, the new JournalManager 
APIs are based on the idea that logs are just sequences with no segmenting. I 
think having both ideas coexist is fairly confusing and a good opening for bugs 
– eg right now, the JournalManagers can return RemoteEditLogs for any 
transaction range, but the GetImageServlet still expects files. If edits are to 
be decoupled from files, then RemoteEditLogs should probably include a URI 
which identifies an edits transfer method. For FileJournalManager, the URI 
would be http-based and simply point to the GetImageServlet, but with BK-based 
logs it would point to the ZK ledger, right?
{quote}
Further to what I said about URIs last week, I spoke to Jitendra about this 
transfer earlier, and he said that the plan was to take this functionality out 
of band, using rsync or something similar. Now that image and logs are 
decoupled, this is possible.

> Move all journal stream management code into one place
> ------------------------------------------------------
>
>                 Key: HDFS-2018
>                 URL: https://issues.apache.org/jira/browse/HDFS-2018
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>            Assignee: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff, 
> HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deal with URIs when referring to edit logs.
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

