[jira] [Updated] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2017-01-06 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated HDFS-9119:
-
Target Version/s:   (was: 2.8.0)

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9119.00.patch
>
>
> {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 
> 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If active NameNode encounters very intensive metadata workload (in 
> particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files 
> and directories), the amount of updates accumulated in the 2 mins edit log 
> tailing interval is hard for the standby NameNode to catch up in the 1 min 
> timeout window. If that happens, the FailoverController will timeout and give 
> up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up the edits and tries to 
> become active, it will crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-10-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9119:

Attachment: HDFS-9119.00.patch

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9119.00.patch
>
>
> {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 
> 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If active NameNode encounters very intensive metadata workload (in 
> particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files 
> and directories), the amount of updates accumulated in the 2 mins edit log 
> tailing interval is hard for the standby NameNode to catch up in the 1 min 
> timeout window. If that happens, the FailoverController will timeout and give 
> up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up the edits and tries to 
> become active, it will crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-9119) Discrepancy between edit log tailing interval and RPC timeout for transitionToActive

2015-10-12 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang updated HDFS-9119:

Status: Patch Available  (was: Open)

Submitting initial patch to trigger Jenkins. Will add a new test in the next 
rev.

> Discrepancy between edit log tailing interval and RPC timeout for 
> transitionToActive
> 
>
> Key: HDFS-9119
> URL: https://issues.apache.org/jira/browse/HDFS-9119
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ha
>Affects Versions: 2.7.1
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Attachments: HDFS-9119.00.patch
>
>
> {{EditLogTailer}} on standby NameNode tails edits from active NameNode every 
> 2 minutes. But the {{transitionToActive}} RPC call has a timeout of 1 minute.
> If active NameNode encounters very intensive metadata workload (in 
> particular, a lot of {{AddOp}} and {{MkDir}} operations to create new files 
> and directories), the amount of updates accumulated in the 2 mins edit log 
> tailing interval is hard for the standby NameNode to catch up in the 1 min 
> timeout window. If that happens, the FailoverController will timeout and give 
> up trying to transition the standby to active. The old ANN will resume adding 
> more edits. When the SbNN finally finishes catching up the edits and tries to 
> become active, it will crash.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)