[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531016#comment-17531016 ]

袁枫 commented on HDFS-16493:
---

I am confused by org/apache/hadoop/hdfs/qjournal/client/SegmentRecoveryComparator.java:86:
{code:java}
return ComparisonChain.start()
    .compare(r1SeenEpoch, r2SeenEpoch)
    .compare(r1.getSegmentState().getEndTxId(), r2.getSegmentState().getEndTxId())
    .result();
{code}
Why does it pick the longest log in round 1? If it works this way, would it pick JN3's log to sync? Why not the quorum length? Do you know? [~liutongwei]

> [SBN Read]When fast path tail enabled, standby or observer namenode may read
> uncommitted data
> ---
>
>                 Key: HDFS-16493
>                 URL: https://issues.apache.org/jira/browse/HDFS-16493
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: journal-node, namanode
>            Reporter: liutongwei
>            Priority: Critical
>         Attachments: exapmle.v1.patch
>
>
> Although fast path tail uses a quorum read to pull the edit log, it seems that it
> can read uncommitted data in some corner cases.
> Here is an example. Suppose we have three JNs whose initial state is:
>
> {code:java}
> epoch 1
> JN1 [1-3](in-progress)
> JN2 [1-3](in-progress)
> JN3 [1-4](in-progress)
> Note that in epoch 1, txids 1-3 were committed and txid 4 was not.
> {code}
> When a failover occurs, suppose the new writer cannot contact JN3 because of a
> network partition, finishes the recovery stage, and writes a new txid 4 in epoch 2
> whose value differs from JN3's.
>
> {code:java}
> epoch 2
> JN1 [1-3](finalized) [4-4](in-progress)
> JN2 [1-3](finalized) [4-4](in-progress)
> JN3 [1-4](in-progress)
> Note that txid 4 on JN3 differs from txid 4 on the other JNs.
> {code}
>
> Now a reading NameNode pulls edits and contacts JN3 and JN2, so it gets a majority
> response. However, the two logs have the same length but different content, and
> there is no further information to decide which log is correct. If we choose JN3,
> we get metadata corruption.
> There is an example test patch [^example.patch] for running and debugging.
> To fix it, I think we should add the finalized state to
> {{GetJournaledEditsResponseProto}} so that we can discard the faulty log.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
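The quoted chain orders two recovery responses first by seen epoch and then by the segment's end txid, so among responses from the same epoch the longer log wins. Below is a minimal Java sketch of only that ordering. The `Response` record and the `best` helper are hypothetical stand-ins for the real `PrepareRecoveryResponseProto` entries, and `java.util.Comparator` replaces Guava's `ComparisonChain`; the sketch does not model the rest of `SegmentRecoveryComparator`.

```java
import java.util.Comparator;
import java.util.List;

public class SegmentRecoverySketch {
    // Hypothetical stand-in for one JN's recovery response, keeping only
    // the two keys that the quoted ComparisonChain inspects.
    record Response(String jn, long seenEpoch, long endTxId) {}

    // Same key order as the quoted chain: higher seen epoch first,
    // then the larger end txid (the longer log).
    static final Comparator<Response> ORDER =
            Comparator.comparingLong(Response::seenEpoch)
                      .thenComparingLong(Response::endTxId);

    // Picks the winning response among those the writer actually received.
    static Response best(List<Response> responses) {
        return responses.stream().max(ORDER).orElseThrow();
    }

    public static void main(String[] args) {
        // Epoch-2 recovery from the description: JN3 is partitioned away,
        // so only JN1 and JN2 respond, and both end at txid 3.
        List<Response> reachable = List.of(
                new Response("JN1", 1, 3),
                new Response("JN2", 1, 3));
        System.out.println(best(reachable).endTxId()); // 3

        // Had JN3 responded, its longer in-progress log would be chosen.
        List<Response> withJn3 = List.of(
                new Response("JN2", 1, 3),
                new Response("JN3", 1, 4));
        System.out.println(best(withJn3).jn()); // JN3
    }
}
```

In the epoch-2 scenario described above, only the JNs that answered the new writer (JN1 and JN2) are compared, so JN3's longer log is not among the candidates.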
[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521419#comment-17521419 ]

liutongwei commented on HDFS-16493:
---

[~Feng Yuan] Sorry for the mistake; I did not recheck the code when I created the patch. I have uploaded a new version.
[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17521131#comment-17521131 ]

袁枫 commented on HDFS-16493:
---

[~liutongwei] Is there a mistake here?
1.
{code:java}
failLoggerAtTxn(spies.get(1), 4);
failLoggerAtTxn(spies.get(2), 4);
{code}
Doesn't this refer to JN2 and JN3?
[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502070#comment-17502070 ]

liutongwei commented on HDFS-16493:
---

Thanks [~xkrogen] for replying.
{quote}Thanks for reporting liutongwei! I guess this is a continuation of your comment on HDFS-13150, is that correct?
{quote}
Yes. While reviewing the JournalNode source code to track down [HDFS-16490|https://issues.apache.org/jira/browse/HDFS-16490], my doubt about fast path tail resurfaced. After adding the test case [^example.patch], it seems that fast path tail does not return the same edit logs as the original tail process.
[~shv], what's your opinion on this issue?
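The divergence can be reproduced with a small model of the epoch-2 state from the issue description. This is a hypothetical, simplified sketch, not the `QuorumJournalManager` code: it assumes the reader accepts as many txns as a majority of the JNs report, comparing lengths only. `finalizedUpTo` is an invented field standing in for the proposed finalized state in `GetJournaledEditsResponseProto`, and `discardStale` is one possible reading of "discard the faulty log", not logic taken from any attached patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FastPathTailSketch {
    // One edit as returned by a JN: txid plus an opaque payload.
    record Edit(long txId, String op) {}

    // Simplified response for a read starting at txid 1. finalizedUpTo is a
    // HYPOTHETICAL field: end txid of the last finalized segment (0 if none).
    record Response(String jn, List<Edit> edits, long finalizedUpTo) {}

    // Length-only rule (assumed): accept the count that at least a majority
    // of the clusterSize JNs returned. Contents are never compared.
    static int majorityCount(List<Response> rs, int clusterSize) {
        List<Integer> counts = new ArrayList<>();
        for (Response r : rs) {
            counts.add(r.edits().size());
        }
        counts.sort(Comparator.reverseOrder());
        int majority = clusterSize / 2 + 1;
        return counts.size() >= majority ? counts.get(majority - 1) : 0;
    }

    // True if every response long enough to cover the first n txids
    // carries identical edits for them.
    static boolean agree(List<Response> rs, int n) {
        List<Edit> reference = null;
        for (Response r : rs) {
            if (r.edits().size() < n) {
                continue; // too short to vouch for these txids
            }
            List<Edit> prefix = r.edits().subList(0, n);
            if (reference == null) {
                reference = prefix;
            } else if (!reference.equals(prefix)) {
                return false;
            }
        }
        return true;
    }

    // HYPOTHETICAL fix: if some JN reports a segment finalized at F, a JN
    // reporting a lower finalized txid keeps only its edits up to F, since
    // its later edits sit in a segment a newer writer closed at F.
    static List<Response> discardStale(List<Response> rs) {
        long maxFinalized = 0;
        for (Response r : rs) {
            maxFinalized = Math.max(maxFinalized, r.finalizedUpTo());
        }
        List<Response> out = new ArrayList<>();
        for (Response r : rs) {
            if (r.finalizedUpTo() >= maxFinalized) {
                out.add(r);
                continue;
            }
            List<Edit> kept = new ArrayList<>();
            for (Edit e : r.edits()) {
                if (e.txId() <= maxFinalized) {
                    kept.add(e);
                }
            }
            out.add(new Response(r.jn(), kept, r.finalizedUpTo()));
        }
        return out;
    }

    public static void main(String[] args) {
        Edit e1 = new Edit(1, "op1"), e2 = new Edit(2, "op2"), e3 = new Edit(3, "op3");
        // Epoch 2: JN2 finalized [1-3] and holds the new txid 4; the
        // partitioned JN3 still holds epoch 1's uncommitted txid 4.
        Response jn2 = new Response("JN2", List.of(e1, e2, e3, new Edit(4, "epoch2-op")), 3);
        Response jn3 = new Response("JN3", List.of(e1, e2, e3, new Edit(4, "epoch1-op")), 0);
        List<Response> read = List.of(jn2, jn3); // a majority of 3 JNs

        int n = majorityCount(read, 3);
        System.out.println(n + " txns accepted, contents agree: " + agree(read, n));
        // 4 txns accepted, contents agree: false

        List<Response> fixed = discardStale(read);
        int m = majorityCount(fixed, 3);
        System.out.println(m + " txns accepted, contents agree: " + agree(fixed, m));
        // 3 txns accepted, contents agree: true
    }
}
```

With lengths alone, both responses vouch for txid 4 even though its contents differ. Once the stale tail on JN3 is dropped, only txids 1-3 have majority support and the responses agree on them.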
[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501435#comment-17501435 ]

Erik Krogen commented on HDFS-16493:

Thanks for reporting [~liutongwei]! I guess this is a continuation of [your comment on HDFS-13150|https://issues.apache.org/jira/browse/HDFS-13150?focusedCommentId=17408479=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17408479], is that correct? As I said there, I don't personally have the bandwidth to dig deep into this, but from your detailed explanation, it does seem to be a valid issue. I will let [~shv] take a closer look.
[jira] [Commented] (HDFS-16493) [SBN Read]When fast path tail enabled, standby or observer namenode may read uncommitted data
[ https://issues.apache.org/jira/browse/HDFS-16493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501172#comment-17501172 ]

liutongwei commented on HDFS-16493:
---

[~shv] [~xkrogen], I have added a test case for the concern mentioned in https://issues.apache.org/jira/browse/HDFS-13150?focusedCommentId=17408479=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17408479.