[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17579189#comment-17579189
 ] 

ASF GitHub Bot commented on HDFS-16689:
---------------------------------------

ZanderXu commented on PR #4628:
URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1213594732

   Thanks @xkrogen for your detailed explanation. 
   I thought through this corner case while coding this patch. Please correct me 
if I'm wrong. I think the above scenario already existed before HDFS-12943, so I 
fixed the bug this way. And we can look for a good way to fix the data loss 
issue you mentioned above.
   
   > 1. Currently NN0 is active and JN0-2 all have txn 2 committed.
   2. NN0 attempts to write txn 3. It only succeeds to JN0, and crashes before 
writing to JN1/JN2.
   3. We fail over to NN1, which currently has txns up to 1
   4. NN1 attempts to load most recent state from JNs
        4a. Before HDFS-12943, NN1 uses `getEditLogManifest()`, it will load 
and apply txn 2 AND 3.
   
   Because when NN0 stops its active service, it closes the current 
segment, so the last finalized segment on JN0 contains txn 3. So when NN1 
starts its active service, it can load and apply txn 3 through 
`getEditLogManifest()`.
   
   And the current logic in `startActiveServices` is confusing:
   1. It uses `onlyDurableTxns=true` to catch up on all edits from the JNs.
   2. It uses `onlyDurableTxns=false` to check whether there are newer readable 
txids in `openForWrite`.
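
   The mismatch between these two steps can be sketched with a toy model. This is only an illustration, not the Hadoop code: the class `FailoverSketch`, its methods, and the idea of passing the durable and newest txids in as plain numbers are all assumptions made for the example.

```java
/** Toy model of the two steps above; hypothetical names, not the Hadoop implementation. */
final class FailoverSketch {

  /** Highest txid a reader may see, depending on whether only durable txns are allowed. */
  static long highestReadableTxId(long durableTxId, long newestTxId, boolean onlyDurableTxns) {
    return onlyDurableTxns ? durableTxId : newestTxId;
  }

  /**
   * Step 1: catch up, using the given onlyDurableTxns setting.
   * Step 2: check with onlyDurableTxns=false that nothing newer is still readable.
   * Returns the txid at which writing would start.
   */
  static long startActive(long lastAppliedTxId, long durableTxId, long newestTxId,
      boolean catchupOnlyDurableTxns) {
    long caughtUpTo = Math.max(lastAppliedTxId,
        highestReadableTxId(durableTxId, newestTxId, catchupOnlyDurableTxns));
    long nextTxId = caughtUpTo + 1;
    // The check always looks at non-durable txns too, mirroring step 2.
    if (highestReadableTxId(durableTxId, newestTxId, false) >= nextTxId) {
      throw new IllegalStateException("Cannot start writing at txid " + nextTxId
          + " when there is a stream available for read");
    }
    return nextTxId;
  }
}
```

   In this model, `FailoverSketch.startActive(2, 2, 5, true)` throws, because the catch-up stops at txid 2 while the check still sees txids 3 to 5, whereas `FailoverSketch.startActive(2, 2, 5, false)` returns 6. The model says nothing about whether replaying non-durable txns is safe, which is the data loss question discussed in this thread.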
   
   There is indeed a possibility of data loss if the disk on JN0 is corrupted 
before JN1 and JN2 have synchronized the segment. But maybe we need to add 
new logic to detect this case and have the JNs synchronously sync the 
missing txids, for example in the `startActive()` method.




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-16689
>                 URL: https://issues.apache.org/jira/browse/HDFS-16689
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer. The error message looks like this:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>       at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>       at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>       ... 36 more
> {code}
> After tracing, I found a critical bug in 
> *EditLogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* 
> tries to replay all missed edits from the JournalNodes with 
> *onlyDurableTxns=true*, so it may be unable to replay any edits when some 
> JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are 2 NameNodes, NN0 and NN1, whose states are Active and Standby 
> respectively, and 3 JournalNodes, JN0, JN1 and JN2. 
> - NN0 tries to sync 3 edits to the JNs starting at txid 3, but only 
> successfully syncs them to JN1 and JN2. JN0 is abnormal, e.g. due to GC, a 
> bad network or a restart.
> - NN1's lastAppliedTxId is 2, and at this moment we are trying to fail over 
> the active role from NN0 to NN1. 
> - NN1 only gets two responses, from JN0 and JN1, when it tries to select 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the txid 
> counts in the responses are 0 and 3 respectively. JN2 is abnormal, e.g. due 
> to GC, a bad network or a restart.
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should run *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.
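
The arithmetic behind *maxAllowedTxns* in the steps above can be sketched as follows. This is a simplified model under stated assumptions, not the actual QuorumJournalManager code: the class `DurableTxnSketch`, the method signature, and the rule "a txn is durable once a majority of all JournalNodes report it" are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Simplified model of the replay limit; hypothetical names, not the Hadoop implementation. */
final class DurableTxnSketch {

  /**
   * @param responseCounts    txn counts reported by the JournalNodes that responded
   * @param totalJournalNodes size of the whole JournalNode ensemble
   * @param onlyDurableTxns   if true, count only txns reported by a majority of the ensemble
   * @return how many txns the reader is allowed to replay
   */
  static int maxAllowedTxns(List<Integer> responseCounts, int totalJournalNodes,
      boolean onlyDurableTxns) {
    int majority = totalJournalNodes / 2 + 1;
    if (responseCounts.size() < majority) {
      throw new IllegalStateException("no quorum of JournalNode responses");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted); // ascending order
    if (!onlyDurableTxns) {
      // Any txn seen on at least one responding JournalNode may be replayed.
      return sorted.get(sorted.size() - 1);
    }
    // The majority-th largest count is the number of txns known to be on a majority.
    return sorted.get(sorted.size() - majority);
  }
}
```

With the responses from the scenario (JN0 reports 0, JN1 reports 3, JN2 does not respond), `DurableTxnSketch.maxAllowedTxns(List.of(0, 3), 3, true)` yields 0, while passing `false` yields 3. If JN2 had also responded with 3, the durable result would be 3 as well.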



