[ https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922858#comment-16922858 ]
Hadoop QA commented on HDFS-14806:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red} 0m 21s{color} | {color:red} Docker failed to build yetus/hadoop:bdbca0e53b4. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14806 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979466/HDFS-14806.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27783/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
> Key: HDFS-14806
> URL: https://issues.apache.org/jira/browse/HDFS-14806
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: Chen Liang
> Assignee: Chen Liang
> Priority: Major
> Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch
>
>
> One issue we ran into was that if in-progress tailing is enabled,
> bootstrap standby can fail.
> When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get
> edits. The config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} sets an upper bound
> on how many transactions can be included in one RPC call; the default is
> 5000. This means the bootstrapping NN (say NN1) can pull at most 5000 edits
> from the JNs. However, as part of bootstrap, NN1 queries another NN (say NN2)
> for NN2's current transaction ID, and NN2 may return a txid that is more than
> 5000 ahead of NN1's current image, while NN1 can only see 5000 more txids from
> the JNs. At this point NN1 panics: the txid returned by the JNs is behind the
> state NN2 returned, so bootstrap fails.
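The arithmetic behind this failure can be sketched in Java. This is a minimal model, not Hadoop code: it assumes bootstrap gets a single RPC response capped at {{dfs.ha.tail-edits.qjm.rpc.max-txns}} edits starting right after NN1's image txid, and the class and method names ({{BootstrapTailingCheck}}, {{lastFetchableTxId}}, {{bootstrapCanCatchUp}}) are hypothetical. Only the config key and its default of 5000 come from the issue; the txid values are illustrative.

```java
/**
 * Hypothetical model of the HDFS-14806 failure condition. These names are
 * illustrative only; they are not part of Hadoop's API.
 */
public class BootstrapTailingCheck {

    /** Default of dfs.ha.tail-edits.qjm.rpc.max-txns, as stated in the issue. */
    static final int DEFAULT_MAX_TXNS = 5000;

    /**
     * Highest txid NN1 can reach, assuming one capped RPC returns at most
     * maxTxnsPerRpc edits starting right after NN1's current image.
     */
    static long lastFetchableTxId(long imageTxId, int maxTxnsPerRpc) {
        return imageTxId + maxTxnsPerRpc;
    }

    /** True if the edits NN1 can fetch reach the txid that NN2 reported. */
    static boolean bootstrapCanCatchUp(long imageTxId, long nn2TxId, int maxTxnsPerRpc) {
        return lastFetchableTxId(imageTxId, maxTxnsPerRpc) >= nn2TxId;
    }

    public static void main(String[] args) {
        long imageTxId = 10_000L; // NN1's current image (illustrative value)
        long nn2TxId = 18_000L;   // NN2 reports a txid 8000 ahead of NN1

        // A gap of 8000 exceeds the default cap of 5000: bootstrap cannot catch up.
        System.out.println(bootstrapCanCatchUp(imageTxId, nn2TxId, DEFAULT_MAX_TXNS)); // prints false
        // Raising the cap (the workaround mentioned in the issue) covers the gap.
        System.out.println(bootstrapCanCatchUp(imageTxId, nn2TxId, 10_000)); // prints true
    }
}
```

Under this model, any gap larger than the cap makes the check fail, and raising the cap only moves the threshold rather than removing it, which is consistent with the reporter calling the workaround less than ideal.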
> Essentially, bootstrap standby can fail if both of the following conditions
> are met:
> # in-progress tailing is enabled, AND
> # the bootstrapping NN is too far behind (more than 5000 txids, the default
> of {{dfs.ha.tail-edits.qjm.rpc.max-txns}}).
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to a very
> large value allowed bootstrap to continue, but this is hardly the ideal
> solution.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)