[jira] [Commented] (HDFS-14806) Bootstrap standby may fail if used in-progress tailing

Hadoop QA (Jira) Wed, 04 Sep 2019 13:52:34 -0700


    [ https://issues.apache.org/jira/browse/HDFS-14806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922858#comment-16922858 ]


Hadoop QA commented on HDFS-14806:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} docker {color} | {color:red}  0m 21s{color} | {color:red} Docker failed to build yetus/hadoop:bdbca0e53b4. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-14806 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12979466/HDFS-14806.002.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/27783/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> Bootstrap standby may fail if used in-progress tailing
> ------------------------------------------------------
>
>                 Key: HDFS-14806
>                 URL: https://issues.apache.org/jira/browse/HDFS-14806
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.3.0
>            Reporter: Chen Liang
>            Assignee: Chen Liang
>            Priority: Major
>         Attachments: HDFS-14806.001.patch, HDFS-14806.002.patch
>
>
> One issue we came across is that bootstrap standby can fail when in-progress 
> tailing is enabled.
> When in-progress tailing is enabled, bootstrap uses the RPC mechanism to get 
> edits. The config {{dfs.ha.tail-edits.qjm.rpc.max-txns}} sets an upper bound 
> on how many txids can be included in one RPC call. The default is 5000, 
> meaning the bootstrapping NN (say NN1) can pull at most 5000 edits from the 
> JNs. However, as part of bootstrap, NN1 queries another NN (say NN2) for 
> NN2's current transaction ID, and NN2 may return a state that is more than 
> 5000 txids ahead of NN1's current image. NN1 can still only see 5000 more 
> txids from the JNs. At this point NN1 panics, because the txid returned by 
> the JNs is behind the state returned by NN2, and bootstrap then fails.
> Essentially, bootstrap standby can fail if both of the following conditions 
> are met:
>  # in-progress tailing is enabled, AND
>  # the bootstrapping NN is too far behind (more than 5000 txids, the default 
> limit)
> Increasing the value of {{dfs.ha.tail-edits.qjm.rpc.max-txns}} to a very 
> large value allowed bootstrap to continue, but this is hardly an ideal 
> solution.
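
The failure condition in the description above can be illustrated with a small toy model. This is only a sketch of the arithmetic, not Hadoop code: the function names and parameters are hypothetical, and it assumes the bootstrapping NN can see at most rpc_max_txns transactions beyond its current image, as the report states.

```python
# Toy model of the bootstrap check described in the report; NOT Hadoop code.
# All names here are hypothetical and exist only for illustration.

# Default of dfs.ha.tail-edits.qjm.rpc.max-txns, as stated in the report.
DEFAULT_RPC_MAX_TXNS = 5000


def highest_txid_visible_via_rpc(image_txid, jn_last_txid, rpc_max_txns):
    """Highest txid NN1 can see from the JNs, capped by the RPC limit."""
    return min(jn_last_txid, image_txid + rpc_max_txns)


def bootstrap_can_proceed(image_txid, nn2_txid, jn_last_txid,
                          rpc_max_txns=DEFAULT_RPC_MAX_TXNS):
    """True if the edits visible to NN1 reach the txid reported by NN2."""
    visible = highest_txid_visible_via_rpc(image_txid, jn_last_txid, rpc_max_txns)
    return visible >= nn2_txid


# NN1 is 4500 txids behind NN2: within the limit, so bootstrap can continue.
print(bootstrap_can_proceed(image_txid=1000, nn2_txid=5500, jn_last_txid=5500))

# NN1 is 8000 txids behind NN2: the JNs appear to be behind NN2, so it fails.
print(bootstrap_can_proceed(image_txid=1000, nn2_txid=9000, jn_last_txid=9000))

# The workaround from the report: a very large limit lets bootstrap continue.
print(bootstrap_can_proceed(image_txid=1000, nn2_txid=9000, jn_last_txid=9000,
                            rpc_max_txns=1_000_000))
```

In this model the second call fails only because of the cap, not because the JNs are actually missing edits, which is why raising the limit works around the problem without addressing its cause.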



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
