[ 
https://issues.apache.org/jira/browse/HDFS-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026742#comment-13026742
 ] 

Jitendra Nath Pandey commented on HDFS-1580:
--------------------------------------------

- The design doesn't go into any detail regarding snapshots, which concurs with 
your view. However, I mentioned them because they are one of the requirements 
we will have to address eventually.
- This jira doesn't change any semantics related to the layout version. The 
version is a piece of metadata that needs to be stored with the edit logs so 
that the namenode can understand and load them. I am open to making it a byte 
array instead of just an integer, so that the namenode can store any metadata 
that is relevant for understanding the edit logs. I agree that the version is 
a little overloaded, but that can be addressed in a different jira.
- I think the retention policy for edit logs should be the namenode's 
responsibility, because retention of edit logs will be closely tied to 
retention of old checkpoint images. Once the namenode has called 
purgeTransactions, it should never ask for older transaction ids.
- "mark" means that the last written transaction, and all transactions before 
it, are available for reading. sinceTxnId in getInputStream can be any 
transaction id before the last call to mark or the close of the output stream. 
Apart from that, sinceTxnId doesn't assume any boundary.
- The motivation for the "mark" method was that BK has the limitation that open 
ledgers cannot be read; "mark" gives a cue to a BK implementation that the 
current ledger should be made available for reading. An implementation without 
this limitation can just ignore mark, which is why I didn't call it roll. That 
also explains why it is different from sync.
- I assumed that a write also syncs, because in most operations we sync 
immediately after writing the log, and in this design we are writing the entire 
transaction as a unit. Management of buffers and flushing should be the 
responsibility of the implementation.
- In EditLogInputStream, I think we can rename next to readNext so that it 
looks less like an iterator. One way to avoid the extra array copy would be for 
readNext() to read the version and txnId and position the underlying input 
stream at the beginning of the transaction record; getTxn can then directly 
return the underlying input stream for reading the transaction bytes. Does that 
make sense?
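
To make the readNext/getTxn idea in the last bullet concrete, here is a rough 
sketch. It is not the proposed patch. The class name, the record layout (int 
version, long txnId, int length, then the payload) and the getVersion/getTxnId 
accessors are assumptions made only for illustration.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader; the on-disk record layout is assumed, not taken from the design. */
class SketchEditLogInputStream {
    private final DataInputStream in;
    private int version;
    private long txnId;
    private int remaining; // unread payload bytes of the current record

    SketchEditLogInputStream(InputStream raw) {
        this.in = new DataInputStream(raw);
    }

    /** Reads the next record header; returns false when the log is exhausted. */
    boolean readNext() throws IOException {
        // Skip whatever part of the previous payload the caller left unread.
        while (remaining > 0) {
            int skipped = in.skipBytes(remaining);
            if (skipped <= 0) {
                throw new EOFException("truncated transaction record");
            }
            remaining -= skipped;
        }
        try {
            version = in.readInt();
        } catch (EOFException e) {
            return false; // end of log (a partial header is treated the same way here)
        }
        txnId = in.readLong();
        remaining = in.readInt();
        // The underlying stream now sits at the first byte of the transaction.
        return true;
    }

    int getVersion() { return version; }

    long getTxnId() { return txnId; }

    /** Returns a bounded view over the underlying stream; no byte[] copy is made. */
    InputStream getTxn() {
        return new InputStream() {
            @Override
            public int read() throws IOException {
                if (remaining == 0) {
                    return -1;
                }
                int b = in.read();
                if (b >= 0) {
                    remaining--;
                }
                return b;
            }

            @Override
            public int read(byte[] buf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                if (remaining == 0) {
                    return -1;
                }
                int n = in.read(buf, off, Math.min(len, remaining));
                if (n > 0) {
                    remaining -= n;
                }
                return n;
            }
        };
    }
}
```

A caller would loop on readNext() and hand getTxn() to whatever decodes the 
transaction. The view is bounded by the record length, so a decoder that reads 
too little or stops early cannot desynchronize the next readNext() call.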

LogSegments:
  LogSegments gets rid of the roll method but exposes the underlying units of 
storage to the namenode, which I don't think is required.

>.. elsewhere we have discussed that we want to keep the property that logs 
>always roll together across all parts of the system.
  Do we really want this property? Isn't it better that we don't expose any 
boundaries between transactions to the namenode?
> We generally want the property that, while saving a namespace or in safe 
> mode, we don't accept edits.
  This can be achieved by just closing the EditLogOutputStream.
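
  As a rough illustration of the semantics I have in mind for mark, close and 
purgeTransactions, here is an in-memory sketch. The class, the method 
signatures and the readableSince helper (a stand-in for getInputStream) are 
hypothetical and are not the interface in the attached design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory journal; names and signatures are illustrative only. */
class InMemoryJournal {
    private final TreeMap<Long, byte[]> txns = new TreeMap<>();
    private long lastWrittenTxnId = -1;
    private long readableUpTo = -1; // highest txn id exposed to readers
    private boolean closed = false;

    /** Writes one whole transaction; a closed stream accepts no more edits. */
    void write(long txnId, byte[] data) {
        if (closed) {
            throw new IllegalStateException("output stream is closed");
        }
        if (txnId <= lastWrittenTxnId) {
            throw new IllegalArgumentException("txn ids must increase");
        }
        txns.put(txnId, data.clone());
        lastWrittenTxnId = txnId;
    }

    /** Makes everything written so far readable; no segment boundary is exposed. */
    void mark() {
        readableUpTo = lastWrittenTxnId;
    }

    /** Closing also makes all written transactions readable. */
    void close() {
        mark();
        closed = true;
    }

    /** Drops transactions older than minTxnIdToKeep; callers must not ask for them again. */
    void purgeTransactions(long minTxnIdToKeep) {
        txns.headMap(minTxnIdToKeep, false).clear();
    }

    /** Stand-in for getInputStream: ids readable from sinceTxnId onwards. */
    List<Long> readableSince(long sinceTxnId) {
        if (sinceTxnId > readableUpTo) {
            throw new IllegalArgumentException("txn " + sinceTxnId + " is not readable yet");
        }
        return new ArrayList<>(txns.subMap(sinceTxnId, true, readableUpTo, true).keySet());
    }
}
```

  In this sketch a transaction written after the last mark stays invisible to 
readers until the next mark or close, and any write after close is rejected, 
which is the "don't accept edits" behaviour quoted above.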
  


> Add interface for generic Write Ahead Logging mechanisms
> --------------------------------------------------------
>
>                 Key: HDFS-1580
>                 URL: https://issues.apache.org/jira/browse/HDFS-1580
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ivan Kelly
>             Fix For: Edit log branch (HDFS-1073)
>
>         Attachments: EditlogInterface.1.pdf, HDFS-1580+1521.diff, 
> HDFS-1580.diff, HDFS-1580.diff, HDFS-1580.diff, generic_wal_iface.pdf, 
> generic_wal_iface.pdf, generic_wal_iface.pdf, generic_wal_iface.txt
>
>


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
