ZanderXu opened a new pull request, #4628:
URL: https://github.com/apache/hadoop/pull/4628

   ### Description of PR
   The Standby NameNode crashes when transitioning to Active with an 
in-progress tailer. 
   The error message is as follows:
   ```java
   Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
        ... 36 more
   ```
   
   After tracing, I found a critical bug in 
`EditLogTailer#catchupDuringFailover()` when `DFS_HA_TAILEDITS_INPROGRESS_KEY` 
is true. `catchupDuringFailover()` tries to replay all edits with 
`onlyDurableTxns` set to true, so it may be unable to replay any edits when 
some JournalNodes are abnormal. 
   
   
   To reproduce, suppose:
   
   - There are 2 NameNodes, NN0 and NN1, which are Active and Standby 
respectively, and 3 JournalNodes, JN0, JN1 and JN2. 
   - NN0 tries to sync 3 edits to the JNs starting at txid 3, but only 
succeeds in syncing them to JN1 and JN2. JN0 is abnormal at this point, for 
example because of a GC pause, a bad network or a restart.
   - NN1's lastAppliedTxId is 2, and at this moment we try to fail over the 
active role from NN0 to NN1. 
   - NN1 gets only two responses, from JN0 and JN1, when it tries to select 
input streams with `fromTxnId=3` and `onlyDurableTxns=true`, and the txn counts 
in the responses are 0 and 3 respectively. JN2 is abnormal at this point, for 
example because of a GC pause, a bad network or a restart.
   - NN1 cannot replay any edits starting at `fromTxnId=3` from the 
JournalNodes, because `maxAllowedTxns` is 0.
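
The arithmetic behind the last step can be sketched as follows. This is a simplified model of the quorum rule, not the actual `QuorumJournalManager` code: the class `MaxAllowedTxnsSketch` and its methods are hypothetical names, and the sketch assumes that with `onlyDurableTxns=true` only the txn count confirmed by a majority of all JournalNodes is allowed, while with `onlyDurableTxns=false` the highest reported count is allowed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of how many txns may be replayed.
public class MaxAllowedTxnsSketch {

  // A majority of n JournalNodes, e.g. 2 out of 3.
  static int majoritySize(int journalNodeCount) {
    return journalNodeCount / 2 + 1;
  }

  // responseCounts: txn counts reported by the JournalNodes that answered.
  static int maxAllowedTxns(List<Integer> responseCounts,
                            int journalNodeCount,
                            boolean onlyDurableTxns) {
    int majority = majoritySize(journalNodeCount);
    if (responseCounts.size() < majority) {
      throw new IllegalArgumentException("need responses from a majority");
    }
    List<Integer> sorted = new ArrayList<>(responseCounts);
    Collections.sort(sorted);
    if (!onlyDurableTxns) {
      // Take the highest count reported by any responding JournalNode.
      return sorted.get(sorted.size() - 1);
    }
    // Take the largest count that at least `majority` JournalNodes out of
    // the whole set are known to hold.
    return sorted.get(sorted.size() - majority);
  }

  public static void main(String[] args) {
    // JN0 answers 0, JN1 answers 3, JN2 does not answer.
    List<Integer> responses = Arrays.asList(0, 3);
    System.out.println(maxAllowedTxns(responses, 3, true));   // prints 0
    System.out.println(maxAllowedTxns(responses, 3, false));  // prints 3
  }
}
```

Under these assumptions, the two responses (0 and 3) give a durable count of 0 even though txids 3 to 5 were written to a majority (JN1 and JN2), which matches the scenario above.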
   
   
   So I think the Standby NameNode should run `catchupDuringFailover()` with 
`onlyDurableTxns=false`, so that it can replay all missed edits from the 
JournalNodes.
   

