[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248587#comment-13248587 ]

Todd Lipcon commented on HDFS-3212:
---

bq. Todd, if you are referring to creating an edit log with the name format edit_log_in_progress or, when finalized, edit_log, it is a better solution than creating a separate metadata file.

Sure, that works too. Except you'll have to change a ton of FileJournalManager code paths to do this...

bq. Otherwise, Suresh's solution in adding the epoch number in start log segment sounds good.

I still think that's really wrong, because transaction _data_ is separate from transaction _storage_. Epoch numbers are a storage layer thing.

bq. Actually, for debugging purposes, we should add more information such as time when the journal was started, NN id of owner etc along with epoch number

I agree with all of the above, except for the epoch number. The timestamp, NN id, hostname, etc, are all NN-layer things, whereas the epoch number is an edits storage layer thing.

> Persist the epoch received by the JournalService
>
>
> Key: HDFS-3212
> URL: https://issues.apache.org/jira/browse/HDFS-3212
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: Shared journals (HDFS-3092)
> Reporter: Suresh Srinivas
>
> epoch received over JournalProtocol should be persisted by JournalService.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248498#comment-13248498 ]

Hari Mankude commented on HDFS-3212:
---

bq. I agree, if you're talking about prefixing it at the beginning of the file, before the first transaction. But, if you're talking about actually putting it in the content of the first transaction, I think it's a bad idea for the reason above.

Todd, if you are referring to creating an edit log with the name format edit_log__in_progress or, when finalized, edit_log___, it is a better solution than creating a separate metadata file.

Otherwise, Suresh's solution of adding the epoch number in the start log segment record sounds good. Actually, for debugging purposes, we should add more information, such as the time when the journal was started, the NN id of the owner, etc., along with the epoch number. Basically, convert OP_START_LOG_SEGMENT to hold journal header info.
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247894#comment-13247894 ]

Todd Lipcon commented on HDFS-3212:
---

bq. I do not understand what you mean by NN layer. Epoch is a notion from JournalManager to the JournalNode. Both need to understand this and provide appropriate guarantees.

Currently, the NN code when starting a new log segment looks like this:

{code}
editLogStream = journalSet.startLogSegment(segmentTxId);
...
if (writeHeaderTxn) {
  logEdit(LogSegmentOp.getInstance(
      FSEditLogOpCodes.OP_START_LOG_SEGMENT));
  logSync();
}
{code}

So the operation of starting a segment, and writing the OP_START_LOG_SEGMENT transaction are separate. In general, the JournalManager abstraction doesn't know about the contents of the edits it's writing -- it's just responsible for bytes. If you wanted to include the epoch number in the OP_START_LOG_SEGMENT transaction, you'd have to have the NN code do something like {{journalManager.getCurrentEpoch()}}, and then feed that into the logEdit call. But that's not very generic, so it seems like a leak of abstraction.

bq. Whether you store it in a directory per-epoch or record it in the startlogSegment record at the beginning of the segment - they are essentially the same.

I agree, if you're talking about prefixing it at the beginning of the file, before the first transaction. But, if you're talking about actually putting it in the content of the first transaction, I think it's a bad idea for the reason above.

My preference is to keep it separated from the file, so that the files written by JournalDaemon are exactly identical to the files that would be written by FileJournalManager. That allows you to copy to and from the different types of nodes without any difference in format.
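The "keep it separated from the file" preference above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from any HDFS patch: the class name {{EpochSidecar}}, the {{.epoch}} suffix, and the plain-text encoding are all hypothetical. The point it shows is that the epoch lives in a file next to the segment, so the segment's own bytes never depend on it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: keeps the epoch in a sidecar file next to a segment. */
final class EpochSidecar {
    private EpochSidecar() {}

    /** Sidecar path for a segment; the ".epoch" suffix is an assumption. */
    static Path sidecarFor(Path segment) {
        return segment.resolveSibling(segment.getFileName() + ".epoch");
    }

    /** Records the epoch without creating or modifying the segment file itself. */
    static void writeEpoch(Path segment, long epoch) {
        try {
            Files.write(sidecarFor(segment),
                    Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back the epoch previously recorded for the segment. */
    static long readEpoch(Path segment) {
        try {
            byte[] raw = Files.readAllBytes(sidecarFor(segment));
            return Long.parseLong(new String(raw, StandardCharsets.UTF_8).trim());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the sidecar can be written before the segment exists, the segment file itself could still be copied between node types unchanged; only the sidecar would be specific to the journal daemon.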
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247884#comment-13247884 ]

Suresh Srinivas commented on HDFS-3212:
---

bq. Currently, when FSEditLog starts a new segment, it calls journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then journal.logSync(). So there is a point of time when the log segment is empty, with no transactions. If instead, we changed it so that the startLogSegment() call was responsible for writing the first transaction (and only the first), atomically, then we might not have a problem. We just have to make the restriction that the first transaction of any segment is always deterministic (eg just START_LOG_SEGMENT(txid) and nothing else).

Okay, I am surprised to find this. All along, in previous discussions, I have been assuming that JournalManager calls roll on the JournalService and the startLog transaction is recorded in JournalService. This is when the epoch also gets persisted along with that record.

bq. I think it's just a matter of getting the ordering right. Before starting a log segment, you need to fence prior writers. The fencing step is what writes down the epoch. Then, when you create a new log segment, you tag it (eg by storing it in a directory per-epoch, or by writing a metadata file next to it before you create the file). I think this is sufficiently atomic.

Whether you store it in a directory per-epoch or record it in the startlogSegment record at the beginning of the segment - they are essentially the same.
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247880#comment-13247880 ]

Suresh Srinivas commented on HDFS-3212:
---

bq. I don't think it's reasonable to put the epoch number inside the START transaction, because that leaks the idea of epochs out of the journal manager layer into the NN layer.

I do not understand what you mean by NN layer. Epoch is a notion from JournalManager to the JournalNode. Both need to understand this and provide appropriate guarantees.

bq. Also, if the JN restarts, when it comes up, how do you make sure that an old NN doesn't come back to life with a startLogSegment transaction?

Can you give me an example? I am not sure I understand the issue.
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247871#comment-13247871 ]

Todd Lipcon commented on HDFS-3212:
---

bq. Is it the case that JN will reject it since the old NN has a smaller epoch?

Right -- that's why it needs to persist, IMO.

bq. 2. might be less optimal because now it consists of 2 operations. 1) rolling the log and creating a new segment 2) updating a metadata file.

I think it's just a matter of getting the ordering right. Before starting a log segment, you need to fence prior writers. The fencing step is what writes down the epoch. Then, when you create a new log segment, you tag it (eg by storing it in a directory per-epoch, or by writing a metadata file next to it before you create the file). I think this is sufficiently atomic.

bq. So 2 edit logs with same txid but can be differentiated using epochs

I've had another idea which I want to write up in the design doc. But, basically, I think we can solve this problem more simply by the following:

- Currently, when FSEditLog starts a new segment, it calls journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then journal.logSync(). So there is a point of time when the log segment is empty, with no transactions. If instead, we changed it so that the startLogSegment() call was responsible for writing the first transaction (and only the first), atomically, then we might not have a problem. We just have to make the restriction that the first transaction of any segment is always deterministic (eg just START_LOG_SEGMENT(txid) and nothing else).

Let me revise the design doc in HDFS-3077 with this idea to see if it works when fully fleshed out.
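One way to picture the "startLogSegment() writes the first transaction atomically" idea above is the sketch below. It is a hedged illustration only: the class {{AtomicSegmentStart}}, the one-byte {{OP_START}} stand-in opcode, the record layout, and the file names are assumptions made for the example and do not reflect the real edit log encoding. The technique shown is write-to-temp, sync, then atomic rename, so a reader never observes an empty segment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: a segment only becomes visible once its first record is on disk. */
final class AtomicSegmentStart {
    /** Stand-in opcode for the start-of-segment record; the real encoding differs. */
    static final byte OP_START = 1;

    private AtomicSegmentStart() {}

    /** Deterministic first record: the opcode followed by the segment's first txid. */
    static byte[] firstRecord(long segmentTxId) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(OP_START)
                .putLong(segmentTxId)
                .array();
    }

    /** Writes the first record to a temp file, syncs it, then renames it into place. */
    static Path startLogSegment(Path dir, long segmentTxId) {
        try {
            Files.createDirectories(dir);
            Path tmp = dir.resolve("edits_inprogress_" + segmentTxId + ".tmp");
            Path dst = dir.resolve("edits_inprogress_" + segmentTxId);
            try (FileChannel ch = FileChannel.open(tmp,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING,
                    StandardOpenOption.WRITE)) {
                ByteBuffer buf = ByteBuffer.wrap(firstRecord(segmentTxId));
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                ch.force(true); // flush data and metadata before the rename
            }
            // The rename is the single step that makes the segment visible.
            return Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the first record depends only on the txid, two writers starting the same segment would produce identical bytes, which is the determinism restriction mentioned above.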
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247854#comment-13247854 ]

Bikas Saha commented on HDFS-3212:
---

I have been trying to read ZAB and re-read Paxos before I make some comments on some of the epoch stuff. At first glance, it seems to me that some of these operations need to be atomic.

I haven't caught up with HDFS-3077, but I remember Todd clarifying an example of mine by saying that edit log segments are relevant in the context of an epoch. So 2 edit logs with the same txid can be differentiated using epochs. In that case, it makes sense to tie the epoch-to-segment relation to the roll via 1 above, because then creating a segment and attaching it to an epoch would be 1 operation, to the extent that rolling is 1 operation.

2. might be less optimal because now it consists of 2 operations: 1) rolling the log and creating a new segment, 2) updating a metadata file.

However, my understanding of rolling might be incomplete. So please take this with the necessary pinch of salt :P
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247855#comment-13247855 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3212:
---

> Also, if the JN restarts, when it comes up, how do you make sure that an old
> NN doesn't come back to life with a startLogSegment transaction?

Is it the case that JN will reject it since the old NN has a smaller epoch?
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247847#comment-13247847 ]

Todd Lipcon commented on HDFS-3212:
---

I don't think it's reasonable to put the epoch number inside the START transaction, because that leaks the idea of epochs out of the journal manager layer into the NN layer.

Also, if the JN restarts, when it comes up, how do you make sure that an old NN doesn't come back to life with a startLogSegment transaction? I think you need to record the epoch number separately from the idea of segments, for fencing purposes, since you aren't always guaranteed to be in the middle of a segment, and you don't want disagreement about who gets to call startLogSegment.
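The restart scenario raised above can be made concrete with a small sketch. Everything here is an assumption for illustration: the class {{PersistedEpoch}}, the file name {{last-promised-epoch}}, and the method names are hypothetical, and the write is deliberately simplistic (a real implementation would also need an fsync and an atomic replace). It shows the epoch being recorded independently of any segment, so a restarted JN can still reject a writer with a smaller epoch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: the promised epoch is stored apart from any log segment. */
final class PersistedEpoch {
    private final Path file;
    private long promised;

    /** Loads the last promised epoch from disk, or starts at 0 if none was recorded. */
    PersistedEpoch(Path dir) {
        this.file = dir.resolve("last-promised-epoch"); // file name is an assumption
        try {
            Files.createDirectories(dir);
            this.promised = Files.exists(file)
                    ? Long.parseLong(new String(Files.readAllBytes(file),
                            StandardCharsets.UTF_8).trim())
                    : 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Fences older writers: persists the new epoch first, then starts honoring it. */
    synchronized boolean fence(long epoch) {
        if (epoch <= promised) {
            return false; // stale or duplicate fence request
        }
        try {
            // Simplified write; durability and atomic replacement are left out.
            Files.write(file, Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        promised = epoch;
        return true;
    }

    /** A request (for example startLogSegment) is accepted only from the current epoch or newer. */
    synchronized boolean accepts(long requestEpoch) {
        return requestEpoch >= promised;
    }
}
```

With this shape, constructing a new {{PersistedEpoch}} over the same directory stands in for a JN restart: the old NN's smaller epoch is still rejected even though no segment is open.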
[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService
[ https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247836#comment-13247836 ]

Suresh Srinivas commented on HDFS-3212:
---

There is some discussion in HDFS-3077 about this. Currently, the two alternatives under consideration are:
# Use the record we write when starting a log segment to record the epoch.
#* On the fence method call, a JournalService promises not to accept any other requests from the old active.
#* After fence, the next call is to roll, when a new log segment is created. JournalService records the epoch in this record.
#* This fits in nicely with every log segment belonging to a single epoch.
# Use a separate metadata file to record the epoch.

Based on discussions in HDFS-3077, let's choose one of the options.