[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-09-04 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448387#comment-13448387
 ] 

Eli Collins commented on HDFS-3863:
---

+1 looks great

 QJM: track last committed txid
 

 Key: HDFS-3863
 URL: https://issues.apache.org/jira/browse/HDFS-3863
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: QuorumJournalManager (HDFS-3077)
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hdfs-3863-prelim.txt, hdfs-3863.txt, hdfs-3863.txt


 Per some discussion with [~stepinto] 
 [here|https://issues.apache.org/jira/browse/HDFS-3077?focusedCommentId=13422579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13422579],
  we should keep track of the last committed txid on each JournalNode. Then 
 during any recovery operation, we can sanity-check that we aren't asked to 
 truncate a log to an earlier transaction.
 This is also a necessary step if we want to support reading from in-progress 
 segments in the future (since we should only allow reads up to the commit 
 point)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-09-01 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446852#comment-13446852
 ] 

Eli Collins commented on HDFS-3863:
---

Agree with you and Chao Shi, nice change to the protocol.

Consider making committedTxId and lastCommittedTxId non-optional?  Why not use 
INVALID_TXID rather than 0 as a default value in the file and protocol for 
tracking the committed txid?



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-30 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444946#comment-13444946
 ] 

Chao Shi commented on HDFS-3863:
---

Todd, your patch looks good to me.

How about these:
1) Collect the max committed-txid from the PrepareRecovery response of each JN,
and check that logToSync.endTxId >= max committed-txid. Since there may be
unexpected race conditions, it would be better to enforce this on both the
client and server side. We're paranoid anyway.
2) In Journal#checkRequest(), verify that committed-txid is non-decreasing 
before saving it.
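
A minimal sketch of the two checks suggested above, not the code from the patch. The class and method names ({{RecoverySanityChecks}}, {{checkRecoveryTarget}}, {{checkNonDecreasing}}) are hypothetical, and the committed txids are assumed to have already been extracted from the PrepareRecovery responses.

```java
import java.util.List;

/** Hypothetical helper; names are illustrative and not taken from the patch. */
final class RecoverySanityChecks {
    private RecoverySanityChecks() {}

    /**
     * Client side: the segment chosen for recovery must end at or after the
     * highest committed txid reported by any JN.
     * Returns the max committed txid that was seen.
     */
    static long checkRecoveryTarget(long endTxIdToSync, List<Long> committedTxIdsFromJns) {
        long maxCommitted = Long.MIN_VALUE;
        for (long c : committedTxIdsFromJns) {
            maxCommitted = Math.max(maxCommitted, c);
        }
        if (endTxIdToSync < maxCommitted) {
            throw new IllegalStateException("Recovery would end at txid " + endTxIdToSync
                + " but txid " + maxCommitted + " is already committed");
        }
        return maxCommitted;
    }

    /**
     * Server side: a JN only accepts a committed txid that does not move
     * backwards. Returns the value to store.
     */
    static long checkNonDecreasing(long storedCommittedTxId, long incomingCommittedTxId) {
        if (incomingCommittedTxId < storedCommittedTxId) {
            throw new IllegalArgumentException("committed-txid went backwards: "
                + storedCommittedTxId + " -> " + incomingCommittedTxId);
        }
        return incomingCommittedTxId;
    }
}
```

Running the check on both sides is redundant by design: either one alone should catch a recovery that would truncate committed edits.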



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445301#comment-13445301
 ] 

Todd Lipcon commented on HDFS-3863:
---

Hi Chao. I tried to add the sanity checks you suggested, and ran into a little 
difficulty with the first one. It caused a test failure in the following 
scenario:

JN1 has fallen behind, has: edits_inprogress with txid 44-45
JN2 and JN3 both finished writing this segment (44-47), had fully written 
48-51, and had started a log segment 52, without yet writing any transactions 
to it.

In the current code, when prepareRecovery() invokes scanStorage(), this caused 
JN2 and JN3 to return an empty {{lastSegmentTxId}}. So, the client code went 
into recovery of the log segment with txid 44. It correctly recovered to 44-47, 
but then the assertion failed because the other loggers had seen txid 51 
committed.

So, I had to fix {{scanStorage}} a bit so that it would return the correct most 
recent segment txid, even in this scenario.
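
The actual change to {{scanStorage}} is not shown in this thread. The following is only a sketch of the idea, assuming the fix amounts to not skipping an in-progress segment that has no transactions yet. The {{Segment}} type and {{latestSegmentStartTxId}} are hypothetical names.

```java
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical sketch; not the real scanStorage() implementation. */
final class LatestSegmentSketch {
    private LatestSegmentSketch() {}

    /** Hypothetical view of one on-disk log segment. */
    static final class Segment {
        final long startTxId;
        final int numTxns;        // 0 for a segment that was started but never written to
        final boolean inProgress;

        Segment(long startTxId, int numTxns, boolean inProgress) {
            this.startTxId = startTxId;
            this.numTxns = numTxns;
            this.inProgress = inProgress;
        }
    }

    /**
     * Returns the start txid of the most recent segment, or empty if there are
     * no segments at all. An empty in-progress segment still counts.
     */
    static OptionalLong latestSegmentStartTxId(List<Segment> segments) {
        OptionalLong latest = OptionalLong.empty();
        for (Segment s : segments) {
            // Deliberately no "skip if numTxns == 0" filter here.
            if (!latest.isPresent() || s.startTxId > latest.getAsLong()) {
                latest = OptionalLong.of(s.startTxId);
            }
        }
        return latest;
    }
}
```

With two finalized segments followed by an empty in-progress one, this reports the start of the empty segment, so recovery would target that segment rather than an older one.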

I'll upload the improved patch soon after running some more test iterations. 
Thanks for the good idea, as it did catch a slight bug here!



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-29 Thread Chao Shi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443969#comment-13443969
 ] 

Chao Shi commented on HDFS-3863:
---

Todd, assume JN1/2/3 make up a quorum and JN1 is far behind. JN1 is selected to 
be the latest one by some buggy algorithm and the NN is going to log after JN1. 
JN2 and JN3 will reject, since they know their log number is greater than 
JN1's. Everything works fine so far.

However, imagine a stupid administrator replaces JN2 and JN3 with some new 
machines. Since JN1 is far behind, it doesn't know about the journal number 
committed by JN2 and JN3. It passes the check.

I'm thinking of the similarity between committed-txid and epoch number. They 
both never decrease. I think we can do the following:
- The NN maintains the highest committed-txid in memory (more specifically, as
a member of AsyncLoggerSet)
- The NN sends it to the JNs in the request header of every packet
- Each JN saves the committed-txid
- The NN updates its committed-txid once a write is acked by a quorum of JNs

Note that a JN that falls behind may still learn the highest committed-txid, as
long as the connection between that JN and the NN works. The invariant there is
NN's committed-txid >= JN's committed-txid.
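
A toy, single-threaded model of the steps listed above, to make the flow concrete. It is a sketch under simplifying assumptions: {{Jn}} and {{Nn}} are hypothetical stand-ins rather than the real JournalNode or AsyncLoggerSet classes, RPCs are plain method calls, and a JN that has a gap in its log rejects the batch but still records the commit point carried in the header.

```java
import java.util.List;

/** Toy model of the proposal; all names are hypothetical. */
final class CommittedTxIdProtocolSketch {
    private CommittedTxIdProtocolSketch() {}

    /** Stand-in for a JournalNode. */
    static final class Jn {
        long committedTxId;      // highest commit point learned from the NN
        long lastWrittenTxId;    // last txid in this JN's own log
        boolean reachable = true;

        /** Returns true if the batch was accepted (acked). */
        boolean journal(long committedTxIdInHeader, long firstTxId, long lastTxId) {
            if (!reachable) {
                return false;
            }
            // Save the commit point from the header; it never decreases.
            committedTxId = Math.max(committedTxId, committedTxIdInHeader);
            if (firstTxId != lastWrittenTxId + 1) {
                return false;    // this JN is behind, so it rejects the batch
            }
            lastWrittenTxId = lastTxId;
            return true;
        }
    }

    /** Stand-in for the NN-side state. */
    static final class Nn {
        final List<Jn> jns;
        long committedTxId;      // kept in memory only

        Nn(List<Jn> jns) {
            this.jns = jns;
        }

        /** Sends one batch to every JN; advances the commit point on a quorum of acks. */
        boolean write(long firstTxId, long lastTxId) {
            int acks = 0;
            for (Jn jn : jns) {
                if (jn.journal(committedTxId, firstTxId, lastTxId)) {
                    acks++;
                }
            }
            boolean quorum = acks > jns.size() / 2;
            if (quorum) {
                committedTxId = lastTxId;
            }
            return quorum;
        }
    }
}
```

Because the NN only puts its own committed-txid in the header, and only advances it after a quorum ack, a JN's saved value can never exceed the NN's.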

We can also add an extra check when the NN decides the txid to finalize: it
should be no less than any JN's committed-txid.



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444304#comment-13444304
 ] 

Todd Lipcon commented on HDFS-3863:
---

Hi Chao. Really interesting idea. Thanks! Let me try to code it up and see how 
it works.



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444316#comment-13444316
 ] 

Todd Lipcon commented on HDFS-3863:
---

BTW, I forgot to address this comment:

{quote}However, imagine a stupid administrator replaces JN2 and JN3 with some 
new machines. Since JN1 is far behind, it doesn't know about the journal number 
committed by JN2 and JN3. It passes the check.
{quote}

If the admin replaces a majority of nodes, the NN would refuse to start up
because the new nodes would throw JournalNotFormattedException. See 
the discussion on https://issues.apache.org/jira/browse/HDFS-3743



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-29 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1361#comment-1361
 ] 

Todd Lipcon commented on HDFS-3863:
---

I decided to punt this aspect to a follow-up patch:
{quote}
This alone is enough for a good sanity check. If we want to also support 
reading the committed transactions while in-progress, it's not quite sufficient 
– the last batch of transactions will never be readable if the NN stops writing 
new batches for a protracted period of time. To solve this, we can add a timer 
thread to the client which periodically (eg once or twice a second) sends an 
RPC to update the committed-txid on all of the nodes. The periodic timer will 
also have the nice property of causing a NN which has been fenced to abort 
itself even if no write transactions are taking place.
{quote}



[jira] [Commented] (HDFS-3863) QJM: track last committed txid

2012-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443556#comment-13443556
 ] 

Todd Lipcon commented on HDFS-3863:
---

The design here is pretty simple, given the way our journaling protocol works. 
In particular, we only have one outstanding batch of transactions at once. We 
never send a batch of transactions beginning at txid N until the prior batch 
(up through N-1) has been accepted at a quorum of nodes. Thus, any 
{{sendEdits()}} call with {{firstTxId}} N implies a {{commit(N-1)}}.

So, my plan is as follows:

- Introduce a new file inside the journal directory called {{committed-txid}}. 
This would include a single numeric text line, similar to the {{seen_txid}} 
that the NameNode maintains.
- Since this whole feature is not required for correctness, we don't need to 
fsync this file on every update. Instead, we can let the operating system write 
it out to disk whenever it so chooses. If, after a system crash, it reverts to 
an earlier value, this is OK, since our recovery protocol doesn't depend on it 
being up-to-date in any way. Put another way, the invariant is that the file 
contains a value which is a lower bound on the latest committed txn.

The data would be updated when any sendEdits() call is made -- the call
implicitly commits all edits prior to the current batch.
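
A minimal sketch of the JournalNode-side bookkeeping described in the plan above, not the code from the attached patch. {{CommittedTxIdFile}}, {{onSendEdits}} and {{advanceTo}} are hypothetical names, and the default of 0 when the file is absent is an assumption of this sketch. The write deliberately skips any fsync, so after a crash the file may hold an older value, which is still a valid lower bound.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of a committed-txid file; not the HDFS implementation. */
final class CommittedTxIdFile {
    private final Path file;
    private long committedTxId;

    CommittedTxIdFile(Path journalDir) throws IOException {
        this.file = journalDir.resolve("committed-txid");
        // Assumption: a missing file means nothing is known to be committed yet.
        this.committedTxId = Files.exists(file)
            ? Long.parseLong(new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim())
            : 0L;
    }

    /** A batch starting at firstTxId implies everything through firstTxId - 1 is committed. */
    synchronized void onSendEdits(long firstTxId) throws IOException {
        advanceTo(firstTxId - 1);
    }

    /** Moves the commit point forward; smaller or equal values are ignored. */
    synchronized void advanceTo(long txid) throws IOException {
        if (txid <= committedTxId) {
            return;
        }
        committedTxId = txid;
        // Plain write with no fsync: the OS flushes it whenever it chooses.
        Files.write(file, (txid + "\n").getBytes(StandardCharsets.UTF_8));
    }

    synchronized long get() {
        return committedTxId;
    }
}
```

For example, a {{sendEdits()}} call with {{firstTxId}} 48 would leave 47 in the file, and a later call with a smaller {{firstTxId}} would leave it unchanged.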

This alone is enough for a good sanity check. If we want to also support 
reading the committed transactions while in-progress, it's not quite sufficient 
-- the last batch of transactions will never be readable if the NN stops 
writing new batches for a protracted period of time. To solve this, we can add 
a timer thread to the client which periodically (eg once or twice a second) 
sends an RPC to update the committed-txid on all of the nodes. The periodic 
timer will also have the nice property of causing a NN which has been fenced to 
abort itself even if no write transactions are taking place.
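
The timer idea could look roughly like the following. This is a sketch only: {{CommitHeartbeat}} is a hypothetical name, the RPC is abstracted as a {{LongConsumer}} supplied by the caller, and the failure callback stands in for whatever a fenced NN would do to abort itself.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical client-side timer that republishes the committed txid. */
final class CommitHeartbeat implements AutoCloseable {
    private final ScheduledExecutorService timer;

    CommitHeartbeat(LongSupplier committedTxId,
                    LongConsumer sendToAllNodes,
                    Consumer<RuntimeException> onFailure,
                    long periodMillis) {
        timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "commit-heartbeat");
            t.setDaemon(true);   // do not keep the JVM alive just for heartbeats
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            try {
                // Stand-in for the RPC that updates committed-txid on every node.
                sendToAllNodes.accept(committedTxId.getAsLong());
            } catch (RuntimeException e) {
                // Without this catch the executor would silently cancel the task.
                onFailure.accept(e);
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

A period of 500 to 1000 ms would match the "once or twice a second" suggestion. The heartbeat fires whether or not edits are being written, which is what would let a fenced NN notice rejection while idle.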
