[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-06-08 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021139#comment-14021139
 ] 

Zhijie Shen commented on HDFS-2949:
---

This patch seems to break YARN-2075.

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Rushabh S Shah
> Fix For: 3.0.0, 2.5.0
>
> Attachments: HDFS-2949-v2.patch, HDFS-2949-v3.patch, HDFS-2949.patch
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.
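The check proposed in the description above can be sketched roughly as follows. This is only an illustration of the decision rule (refuse if the peer answers "active", proceed if it answers "standby" or the call fails or times out); it is not the code from the attached patches. PeerStateCheck, ServiceState and mayBecomeActive are hypothetical names, and the Callable stands in for the getServiceState() RPC to the peer.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative sketch only; none of these names are Hadoop APIs. */
class PeerStateCheck {
    enum ServiceState { ACTIVE, STANDBY }

    /**
     * Decides whether the local node may enter active mode.
     * peerStateRpc is a stand-in for the getServiceState() RPC to the peer.
     */
    static boolean mayBecomeActive(Callable<ServiceState> peerStateRpc, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<ServiceState> reply = executor.submit(peerStateRpc);
            ServiceState peer = reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            // RPC succeeded: refuse only if the peer reports that it is active.
            return peer != ServiceState.ACTIVE;
        } catch (TimeoutException | ExecutionException e) {
            // RPC timed out or failed: per the description, proceed.
            return true;
        } catch (InterruptedException e) {
            // Assumption of this sketch: treat an interrupted caller conservatively.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            // Cancels a still-running call so the worker thread does not linger.
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean proceed = mayBecomeActive(() -> ServiceState.ACTIVE, 1000);
        System.out.println("peer ACTIVE -> proceed? " + proceed);
    }
}
```

Because a failed or timed-out call lets the transition proceed, a sketch like this only catches the common operator mistake; it does not replace fencing.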



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003474#comment-14003474
 ] 

Hudson commented on HDFS-2949:
--

FAILURE: Integrated in Hadoop-Mapreduce-trunk #1780 (See 
[https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1780/])
HDFS-2949. Add check to active state transition to prevent operator-induced 
split brain. Contributed by Rushabh S Shah. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594709)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java




[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003297#comment-14003297
 ] 

Hudson commented on HDFS-2949:
--

FAILURE: Integrated in Hadoop-Hdfs-trunk #1754 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk/1754/])
HDFS-2949. Add check to active state transition to prevent operator-induced 
split brain. Contributed by Rushabh S Shah. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594709)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java




[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003262#comment-14003262
 ] 

Hudson commented on HDFS-2949:
--

FAILURE: Integrated in Hadoop-Yarn-trunk #562 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/562/])
HDFS-2949. Add check to active state transition to prevent operator-induced 
split brain. Contributed by Rushabh S Shah. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594709)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java




[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-16 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999019#comment-13999019
 ] 

Hudson commented on HDFS-2949:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #5605 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/5605/])
HDFS-2949. Add check to active state transition to prevent operator-induced 
split brain. Contributed by Rushabh S Shah. (kihwal: 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1594709)
* /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ha/HAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/DFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdmin.java
* /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/TestDFSHAAdminMiniCluster.java




[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997723#comment-13997723
 ] 

Kihwal Lee commented on HDFS-2949:
--

I've manually kicked precommit.

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Rushabh S Shah
> Attachments: HDFS-2949-v2.patch, HDFS-2949-v3.patch, HDFS-2949.patch
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-14 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998002#comment-13998002
 ] 

Kihwal Lee commented on HDFS-2949:
--

+1 The patch looks good.



[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997928#comment-13997928
 ] 

Hadoop QA commented on HDFS-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12644835/HDFS-2949-v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  
  org.apache.hadoop.hdfs.server.balancer.TestBalancerWithNodeGroup

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6899//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6899//console

This message is automatically generated.



[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-05-14 Thread Rushabh S Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998010#comment-13998010
 ] 

Rushabh S Shah commented on HDFS-2949:
--

Thanks Kihwal for reviewing and committing the patch.



[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-04-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13982212#comment-13982212
 ] 

Hadoop QA commented on HDFS-2949:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642104/HDFS-2949-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6749//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6749//console

This message is automatically generated.

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Rushabh S Shah
> Attachments: HDFS-2949-v2.patch, HDFS-2949.patch
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2014-04-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981847#comment-13981847
 ] 

Hadoop QA commented on HDFS-2949:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12642029/HDFS-2949.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-common-project/hadoop-common hadoop-hdfs-project/hadoop-hdfs:

  org.apache.hadoop.hdfs.tools.TestDFSHAAdmin

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/6744//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/6744//console

This message is automatically generated.

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, namenode
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
> Assignee: Rushabh S Shah
> Attachments: HDFS-2949.patch
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2012-03-15 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13230550#comment-13230550
 ] 

Todd Lipcon commented on HDFS-2949:
---

Another safety check here is to make sure that the transaction IDs match 
between the nodes before going active.
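As a rough illustration of this extra check, the comparison could look like the sketch below. TxIdCheck and safeToActivate are hypothetical names, and how each node's last transaction ID would be obtained is not specified in this thread, so both values are simply passed in.

```java
/** Illustrative sketch only; not a Hadoop class. */
class TxIdCheck {
    /**
     * Returns true only when both nodes report the same last transaction ID.
     * Both arguments are assumed inputs; this thread does not say where
     * the values would come from.
     */
    static boolean safeToActivate(long localLastTxId, long peerLastTxId) {
        return localLastTxId == peerLastTxId;
    }
}
```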

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha, name-node
> Affects Versions: 0.24.0
> Reporter: Todd Lipcon
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2012-02-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208232#comment-13208232
 ] 

Todd Lipcon commented on HDFS-2949:
---

Yep, this is not meant to solve every split-brain scenario, just to prevent a 
mistake in the common case. Fencing is the correct way to prevent split brain 
in the general case.

Asking for confirmation might be a nice improvement as well, so long as there's 
a --force option.
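A confirmation step with a force override could be sketched along these lines. This is an assumption-laden illustration, not the haadmin implementation: TransitionPrompt and shouldProceed are hypothetical names, the flag spelling "--force" is taken from the comment above, and the Supplier stands in for reading the operator's answer from the console.

```java
import java.util.function.Supplier;

/** Illustrative sketch only; not part of haadmin. */
class TransitionPrompt {
    /**
     * Returns true if the transition should go ahead.
     * "--force" skips the prompt; otherwise only an answer of "y" confirms.
     */
    static boolean shouldProceed(String[] args, Supplier<String> readAnswer) {
        for (String arg : args) {
            if ("--force".equals(arg)) {
                return true; // operator explicitly opted out of the prompt
            }
        }
        String answer = readAnswer.get();
        return answer != null && answer.trim().equalsIgnoreCase("y");
    }
}
```

Keeping the prompt behind an override means scripts can still call the command non-interactively.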

> HA: Add check to active state transition to prevent operator-induced split brain
>
> Key: HDFS-2949
> URL: https://issues.apache.org/jira/browse/HDFS-2949
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
>
> Currently, if the administrator mistakenly calls "-transitionToActive" on one 
> NN while the other one is still active, all hell will break loose. We can add 
> a simple check by having the NN make a getServiceState() RPC to its peer with 
> a short (~1 second?) timeout. If the RPC succeeds and indicates the other 
> node is active, it should refuse to enter active mode. If the RPC fails or 
> indicates standby, it can proceed.
> This is just meant as a preventative safety check - we still expect users to 
> use the "-failover" command which has other checks plus fencing built in.





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2012-02-14 Thread Uma Maheswara Rao G (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208231#comment-13208231
 ] 

Uma Maheswara Rao G commented on HDFS-2949:
---

{quote}
That said, having the safety check described in this JIRA is still valuable, 
{quote}
I agree that the safety check is worth adding. But it cannot prevent 100% of 
split-brain scenarios, right? (For example: a brief network break between the 
active and the standby, during which the admin accidentally runs 
-transitionToActive on the standby.) I think this will be addressed in the 
future as part of automatic failover and shared-storage fencing. But when 
admins work directly from the command line for maintenance, this case can 
still occur, right?
Also, for the transitionTo* APIs, should we ask the user for confirmation 
before actually transitioning? That would prompt the admin to think twice 
before proceeding.





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2012-02-14 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208168#comment-13208168
 ] 

Todd Lipcon commented on HDFS-2949:
---

I think we should probably un-document the transitionTo* commands, but leave 
them as a safety valve. It's nice to have direct access to these RPCs just in 
case there's some problem with one of the safer methods and you need a 
workaround without recompiling the client.

That said, having the safety check described in this JIRA is still valuable, 
even when using haadmin -failover, in case the admin has misconfigured 
something (e.g., the fencing script returns true but did not in fact fence the 
standby correctly).





[jira] [Commented] (HDFS-2949) HA: Add check to active state transition to prevent operator-induced split brain

2012-02-14 Thread Hari Mankude (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208161#comment-13208161
 ] 

Hari Mankude commented on HDFS-2949:


If the -failover command can handle this and other situations correctly, why 
not deprecate -transitionToActive entirely?

