[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248587#comment-13248587
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. Todd, if you are referring to creating an edit log with the name format 
edit_log_in_progress or when finalized 
edit_log, it is a better solution than 
creating a separate metadata file.

Sure, that works too. Except you'll have to change a ton of FileJournalManager 
code paths to do this...

bq. Otherwise, Suresh's solution in adding the epoch number in start log 
segment sounds good.

I still think that's really wrong, because transaction _data_ is separate from 
transaction _storage_. Epoch numbers are a storage layer thing.

bq. Actually, for debugging purposes, we should add more information such as 
time when the journal was started, NN id of owner etc along with epoch number

I agree with all of the above, except for the epoch number. The timestamp, NN 
id, hostname, etc, are all NN-layer things, whereas the epoch number is an 
edits storage layer thing.

> Persist the epoch received by the JournalService
>
> Key: HDFS-3212
> URL: https://issues.apache.org/jira/browse/HDFS-3212
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: Shared journals (HDFS-3092)
> Reporter: Suresh Srinivas
>
> epoch received over JournalProtocol should be persisted by JournalService.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-06 Thread Hari Mankude (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248498#comment-13248498
 ] 

Hari Mankude commented on HDFS-3212:


bq. I agree, if you're talking about prefixing it at the beginning of the file, 
before the first transaction. But, if you're talking about actually putting it 
in the content of the first transaction, I think it's a bad idea for the reason 
above. 

Todd, if you are referring to creating an edit log with the name format 
edit_log__in_progress or when finalized 
edit_log___, it is a better solution than 
creating a separate metadata file. Otherwise, Suresh's solution of adding the 
epoch number in the start log segment sounds good. Actually, for debugging 
purposes, we should add more information such as the time when the journal was 
started, the NN id of the owner, etc. along with the epoch number. Basically, convert 
OP_START_LOG_SEGMENT to hold journal header info.
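
To make the proposal concrete, here is a minimal sketch of what such a journal header payload could look like. The class name `JournalHeader`, its field names, and the byte layout are all assumptions made for illustration; nothing here is taken from the HDFS code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical header payload; field names and wire layout are assumptions. */
final class JournalHeader {
  final long epoch;
  final long startTimeMillis;
  final String nnId;

  JournalHeader(long epoch, long startTimeMillis, String nnId) {
    this.epoch = epoch;
    this.startTimeMillis = startTimeMillis;
    this.nnId = nnId;
  }

  /** Serializes the fields in a fixed order: epoch, start time, NN id. */
  byte[] toBytes() {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      out.writeLong(epoch);
      out.writeLong(startTimeMillis);
      out.writeUTF(nnId);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Reads the fields back in the same order they were written. */
  static JournalHeader fromBytes(byte[] bytes) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      return new JournalHeader(in.readLong(), in.readLong(), in.readUTF());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the payload contains a timestamp and an owner id, two writers starting the same segment would produce different bytes for the first record, which is worth keeping in mind when weighing this option.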







[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247894#comment-13247894
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. I do not understand what you mean by NN layer. Epoch is a notion from 
JournalManager to the JournalNode. Both need to understand this and provide 
appropriate guarantees.

Currently, the NN code when starting a new log segment looks like this:
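{code}
// Step 1: the journal layer opens the new segment (storage only).
editLogStream = journalSet.startLogSegment(segmentTxId);
...
// Step 2: the NN layer separately writes the first transaction.
if (writeHeaderTxn) {
  logEdit(LogSegmentOp.getInstance(
      FSEditLogOpCodes.OP_START_LOG_SEGMENT));
  logSync();
}
{code}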

So the operation of starting a segment, and writing the OP_START_LOG_SEGMENT 
transaction are separate. In general, the JournalManager abstraction doesn't 
know about the contents of the edits it's writing -- it's just responsible for 
bytes. If you wanted to include the epoch number in the OP_START_LOG_SEGMENT 
transaction, you'd have to have the NN code do something like 
{{journalManager.getCurrentEpoch()}}, and then feed that into the logEdit call. 
But that's not very generic, so it seems like a leak of abstraction.

bq. Whether you store it in a directory per-epoch or record it in the 
startlogSegment record at the beginning of the segment - they are essentially 
the same.

I agree, if you're talking about prefixing it at the beginning of the file, 
before the first transaction. But, if you're talking about actually putting it 
in the content of the first transaction, I think it's a bad idea for the reason 
above. My preference is to keep it separated from the file, so that the files 
written by JournalDaemon are exactly identical to the files that would be 
written by FileJournalManager. That allows you to copy to and from the 
different types of nodes without any difference in format.
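
A minimal sketch of the "keep it separated from the file" option, assuming a sidecar file next to each segment. The class `SidecarEpochJournal`, its method names, and the `segment_<txid>` / `.epoch` file names are hypothetical and only illustrate that the segment bytes stay identical whether or not an epoch is recorded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; file names and method names are illustrative only. */
class SidecarEpochJournal {

  /** What a plain file-based journal does: store opaque edit bytes. */
  static Path writeSegment(Path dir, long firstTxId, byte[] edits) {
    return write(dir.resolve("segment_" + firstTxId), edits);
  }

  /** Same segment bytes, with the epoch recorded in a neighbouring file. */
  static Path writeSegmentWithEpoch(Path dir, long firstTxId, byte[] edits, long epoch) {
    // Tag first, so a segment never exists without its epoch.
    write(dir.resolve("segment_" + firstTxId + ".epoch"),
        Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
    return writeSegment(dir, firstTxId, edits);
  }

  /** Reads a whole file, wrapping I/O failures in an unchecked exception. */
  static byte[] read(Path file) {
    try {
      return Files.readAllBytes(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static Path write(Path file, byte[] bytes) {
    try {
      Files.createDirectories(file.getParent());
      return Files.write(file, bytes);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this layout, copying a segment between the two kinds of node is a plain file copy; only the sidecar file is specific to the epoch-aware side.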





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247884#comment-13247884
 ] 

Suresh Srinivas commented on HDFS-3212:
---

bq. Currently, when FSEditLog starts a new segment, it calls 
journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then 
journal.logSync(). So there is a point of time when the log segment is empty, 
with no transactions. If instead, we changed it so that the startLogSegment() 
call was responsible for writing the first transaction (and only the first), 
atomically, then we might not have a problem. We just have to make the 
restriction that the first transaction of any segment is always deterministic 
(eg just START_LOG_SEGMENT(txid) and nothing else).

Okay, I am surprised to find this. All along, in previous discussions, I have 
been assuming that JournalManager calls roll on JournalService and the startLog 
transaction is recorded in JournalService. This is when the epoch also gets 
persisted along with that record.

bq. I think it's just a matter of getting the ordering right. Before starting a 
log segment, you need to fence prior writers. The fencing step is what writes 
down the epoch. Then, when you create a new log segment, you tag it (eg by 
storing it in a directory per-epoch, or by writing a metadata file next to it 
before you create the file). I think this is sufficiently atomic.

Whether you store it in a directory per-epoch or record it in the 
startlogSegment record at the beginning of the segment - they are essentially 
the same.





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247880#comment-13247880
 ] 

Suresh Srinivas commented on HDFS-3212:
---

bq. I don't think it's reasonable to put the epoch number inside the START 
transaction, because that leaks the idea of epochs out of the journal manager 
layer into the NN layer.

I do not understand what you mean by NN layer. Epoch is a notion from 
JournalManager to the JournalNode. Both need to understand this and provide 
appropriate guarantees.

bq. Also, if the JN restarts, when it comes up, how do you make sure that an 
old NN doesn't come back to life with a startLogSegment transaction?

Can you give me an example? I am not sure I understand the issue.





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247871#comment-13247871
 ] 

Todd Lipcon commented on HDFS-3212:
---

bq. Is it the case that JN will reject it since the old NN has a smaller epoch?

Right -- that's why it needs to persist, IMO.

bq. 2. might be less optimal because now it consists of 2 operations. 1) 
rolling the log and creating a new segment 2) updating a metadata file.

I think it's just a matter of getting the ordering right. Before starting a log 
segment, you need to fence prior writers. The fencing step is what writes down 
the epoch. Then, when you create a new log segment, you tag it (eg by storing 
it in a directory per-epoch, or by writing a metadata file next to it before 
you create the file). I think this is sufficiently atomic.
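
The ordering described above (fence first, then create the segment inside an already recorded epoch) can be sketched as follows, using the directory-per-epoch variant. The class `EpochTaggedSegments` and the `epoch-<N>/edits_inprogress_<txid>` layout are assumptions for illustration. The current epoch is held in memory only here, so this sketch does not cover recovery after a restart.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical layout: <root>/epoch-<N>/edits_inprogress_<firstTxId>. */
class EpochTaggedSegments {
  private final Path root;
  private long currentEpoch = 0;

  EpochTaggedSegments(Path root) {
    this.root = root;
  }

  /** Step 1: fencing records the new epoch by creating its directory. */
  void fence(long epoch) {
    if (epoch <= currentEpoch) {
      throw new IllegalStateException("stale epoch " + epoch);
    }
    try {
      Files.createDirectories(root.resolve("epoch-" + epoch));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    currentEpoch = epoch;
  }

  /** Step 2: a segment can only be created under the epoch that was fenced last. */
  Path startLogSegment(long writerEpoch, long firstTxId) {
    if (writerEpoch != currentEpoch) {
      throw new IllegalStateException("writer epoch " + writerEpoch + " is not current");
    }
    try {
      return Files.createFile(
          root.resolve("epoch-" + writerEpoch).resolve("edits_inprogress_" + firstTxId));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Two segments that start at the same txid but were written under different epochs end up in different directories, so they stay distinguishable.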

bq. So 2 edit logs with same txid but can be differentiated using epochs

I've had another idea which I want to write up in the design doc. But, 
basically, I think we can solve this problem more simply by the following:
- Currently, when FSEditLog starts a new segment, it calls 
journal.startLogSegment(), then journal.logEdit(StartLogSegmentOp), then 
journal.logSync(). So there is a point of time when the log segment is empty, 
with no transactions. If instead, we changed it so that the startLogSegment() 
call was responsible for writing the first transaction (and only the first), 
atomically, then we might not have a problem. We just have to make the 
restriction that the first transaction of any segment is always deterministic 
(eg just START_LOG_SEGMENT(txid) and nothing else).
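
One way this idea could look, as a rough sketch: write the deterministic first transaction to a temporary file and rename it into place, so the segment is never visible while empty. The class `AtomicSegmentStart`, the file names, and the opcode value are placeholders for illustration, not the real HDFS constants or layout, and a real implementation would also need to sync the data to disk before the rename.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch of a startLogSegment() that also writes the first transaction. */
class AtomicSegmentStart {
  // Placeholder opcode for this sketch, not the real HDFS constant.
  static final byte START_OPCODE = 1;

  /** The first transaction depends only on the txid, so every writer produces the same bytes. */
  static byte[] headerBytes(long firstTxId) {
    return ByteBuffer.allocate(1 + Long.BYTES).put(START_OPCODE).putLong(firstTxId).array();
  }

  static Path startLogSegment(Path dir, long firstTxId) {
    Path tmp = dir.resolve("segment_" + firstTxId + ".tmp");
    Path segment = dir.resolve("segment_" + firstTxId);
    try {
      Files.createDirectories(dir);
      Files.write(tmp, headerBytes(firstTxId));
      // The segment appears under its real name only once the header is in place.
      Files.move(tmp, segment, StandardCopyOption.ATOMIC_MOVE);
      return segment;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```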

Let me revise the design doc in HDFS-3077 with this idea to see if it works 
when fully fleshed out.






[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Bikas Saha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247854#comment-13247854
 ] 

Bikas Saha commented on HDFS-3212:
--

I have been trying to read ZAB and re-read Paxos before I make some comments on 
some of the epoch stuff.
At first glance, it seems to me that some of these operations need to be 
atomic. I haven't caught up with HDFS-3077, but I remember Todd responding to an 
example of mine by saying that edit log segments are relevant in the context of 
an epoch. So 2 edit logs with the same txid can be differentiated using epochs. 
In that case, it makes sense to tie the epoch-to-segment relation to the roll, 
as in option 1 above, because then creating a segment and attaching it to an 
epoch would be 1 operation, to the extent that rolling is 1 operation.
Option 2 might be less optimal because it consists of 2 operations: 1) rolling 
the log and creating a new segment, and 2) updating a metadata file.
However, my understanding of rolling might be incomplete, so please take this 
with the necessary pinch of salt :P





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Tsz Wo (Nicholas), SZE (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247855#comment-13247855
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-3212:
--

> Also, if the JN restarts, when it comes up, how do you make sure that an old 
> NN doesn't come back to life with a startLogSegment transaction?

Is it the case that JN will reject it since the old NN has a smaller epoch?





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247847#comment-13247847
 ] 

Todd Lipcon commented on HDFS-3212:
---

I don't think it's reasonable to put the epoch number inside the START 
transaction, because that leaks the idea of epochs out of the journal manager 
layer into the NN layer.

Also, if the JN restarts, when it comes up, how do you make sure that an old NN 
doesn't come back to life with a startLogSegment transaction?

I think you need to record the epoch number separately from the idea of 
segments, for fencing purposes, since you aren't always guaranteed to be in the 
middle of a segment, and you don't want disagreement about who gets to call 
startLogSegment.
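
A minimal sketch of recording the epoch on its own, independent of any segment, so that a restarted journal node still rejects an old writer. The class `PersistedEpoch`, its method names, and the `last-promised-epoch` file name are assumptions for illustration. The sketch relies on an atomic rename and omits the fsync a real implementation would need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical sketch: a journal node's promised epoch, stored apart from any segment. */
class PersistedEpoch {
  private final Path file;
  private long promised;

  PersistedEpoch(Path storageDir) {
    this.file = storageDir.resolve("last-promised-epoch"); // file name is an assumption
    try {
      Files.createDirectories(storageDir);
      // Reload on startup so a restart does not forget earlier promises.
      this.promised = Files.exists(file)
          ? Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim())
          : 0L;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Fence: accept only a strictly newer epoch, and persist it before acknowledging. */
  synchronized boolean fence(long epoch) {
    if (epoch <= promised) {
      return false;
    }
    try {
      Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
      Files.write(tmp, Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
      // Rename over the old value so a reader sees either the old or the new epoch.
      Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    promised = epoch;
    return true;
  }

  /** Any request, including startLogSegment, is checked against the promise. */
  synchronized boolean mayWrite(long writerEpoch) {
    return writerEpoch >= promised;
  }

  synchronized long promised() {
    return promised;
  }
}
```

Because the check does not depend on a segment being open, it also settles who may call startLogSegment when the node is between segments.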





[jira] [Commented] (HDFS-3212) Persist the epoch received by the JournalService

2012-04-05 Thread Suresh Srinivas (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247836#comment-13247836
 ] 

Suresh Srinivas commented on HDFS-3212:
---

There is some discussion in HDFS-3077 about this. Currently two alternatives 
under consideration are:
# Use the record we write when starting a log segment to record the epoch. 
#* On a fence method call, a JournalService promises not to accept any other 
requests from the old active.
#* After the fence, the next call is to roll, when a new log segment is created. 
JournalService records the epoch in this record.
#* This fits in nicely with the idea that every log segment belongs to a single epoch.
# Use a separate metadata file to record the epoch.

Based on the discussions in HDFS-3077, let's choose one of the options.
