[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-06 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122403#comment-13122403
 ] 

Todd Lipcon commented on HDFS-1973:
---

+1 lgtm

> HA: HDFS clients must handle namenode failover and switch over to the new 
> active namenode.
> --
>
> Key: HDFS-1973
> URL: https://issues.apache.org/jira/browse/HDFS-1973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Suresh Srinivas
>Assignee: Aaron T. Myers
> Attachments: HDFS-1973-HDFS-1623.patch, HDFS-1973-HDFS-1623.patch, 
> HDFS-1973-HDFS-1623.patch, HDFS-1973-HDFS-1623.patch, 
> HDFS-1973-HDFS-1623.patch, HDFS-1973-HDFS-1623.patch, 
> HDFS-1973-HDFS-1623.patch, hdfs-1973.0.patch
>
>
> During failover, a client must detect the failure of the current active 
> namenode and switch over to the new active namenode. The switchover might use 
> IP failover or something more elaborate, such as ZooKeeper, to discover the 
> new active.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122381#comment-13122381
 ] 

Hadoop QA commented on HDFS-1973:
-

+1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12498050/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1346//testReport/
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1346//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-06 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121784#comment-13121784
 ] 

Hadoop QA commented on HDFS-1973:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12497950/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1342//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1342//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1342//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121729#comment-13121729
 ] 

Hadoop QA commented on HDFS-1973:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12497935/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1340//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1340//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1340//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-05 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121690#comment-13121690
 ] 

Hadoop QA commented on HDFS-1973:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12497942/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1341//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1341//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1341//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-05 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121662#comment-13121662
 ] 

Todd Lipcon commented on HDFS-1973:
---

One small thing I didn't notice before (sorry):
- in the test cases, you have new methods verifyFileContents and 
writeContentsToFile. These are very similar to AppendTestUtil.write and 
AppendTestUtil.check. Can you use those instead? If not, you should fix the call 
to {{in.read}} to use {{IOUtils.readFully}}, since {{read}} isn't guaranteed to 
read the whole length.
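
For reference, the contract behind that advice can be sketched as a small read loop. This is an illustration of the pattern that {{IOUtils.readFully}} implements, not Hadoop's actual code; the class name is hypothetical, and checked {{IOException}}s are rethrown unchecked purely to keep the sketch compact:

```java
import java.io.IOException;
import java.io.InputStream;

public class ReadFullySketch {
  // InputStream.read(byte[], int, int) may return fewer bytes than requested,
  // so a verification helper must loop until the buffer is full or EOF is hit.
  public static void readFully(InputStream in, byte[] buf) {
    try {
      int off = 0;
      while (off < buf.length) {
        int n = in.read(buf, off, buf.length - off);
        if (n < 0) {
          throw new IOException("Premature EOF after " + off + " bytes");
        }
        off += n;
      }
    } catch (IOException e) {
      // Sketch-only simplification: rethrow unchecked.
      throw new RuntimeException(e);
    }
  }
}
```

A single {{in.read(buf)}} call that happens to return short would make a content check fail spuriously; the loop above removes that flakiness.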





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-03 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119607#comment-13119607
 ] 

Aaron T. Myers commented on HDFS-1973:
--

Thanks a lot for the review, Todd. Doing what you describe will require Common 
changes. I've filed HADOOP-7717 to address these. Once that's committed, I'll 
update the patch here to remove the synchronization from 
{{ConfiguredFailoverProxyProvider}}.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-10-03 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119567#comment-13119567
 ] 

Todd Lipcon commented on HDFS-1973:
---

Aaron and I just chatted about this a bit. Here's a summary of what we 
discussed:
- the if condition in {{performFailover}} was somewhat confusing to me as to 
its purpose. Aaron explained that its purpose is to avoid the case where 
multiple outstanding RPC calls fail, and then they all call performFailover at 
the same time. If there were an even number of such calls, and you didn't do 
any such checks for "already failed over", then you'd have a case where you 
failover twice and end up back at the original proxy object.
- we decided that, rather than try to handle this situation in the 
FailoverProvider itself, it would be better to do this at the caller. 
Otherwise, each failover provider implementation will have to have this same 
concern.

So, Aaron is going to update the patch to include a safeguard at the call site 
of {{performFailover}} which checks that, before calling performFailover, 
another thread hasn't already failed over to a new proxy object.
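
A minimal sketch of that call-site safeguard (class, method, and field names here are hypothetical illustrations, not the actual patch): the caller remembers which proxy its failed RPC used, and the failover only advances if no other thread has already switched away from that proxy.

```java
import java.util.Arrays;
import java.util.List;

public class FailoverGuardSketch {
  private final List<String> proxies;  // stand-ins for NN proxy objects
  private int currentIndex = 0;

  public FailoverGuardSketch(String... proxies) {
    this.proxies = Arrays.asList(proxies);
  }

  public synchronized String getProxy() {
    return proxies.get(currentIndex);
  }

  // Called after an RPC against failedProxy throws. Advancing only when
  // failedProxy is still current (identity comparison) prevents an even
  // number of concurrent failures from failing over twice and landing
  // back on the original, dead proxy.
  public synchronized void failoverIfCurrent(String failedProxy) {
    if (proxies.get(currentIndex) == failedProxy) {
      currentIndex = (currentIndex + 1) % proxies.size();
    }
  }
}
```

With this guard at the call site, each provider's {{performFailover}} implementation stays a simple "advance to the next candidate" and does not need to reason about concurrent callers.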





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-30 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118553#comment-13118553
 ] 

Hadoop QA commented on HDFS-1973:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12497230/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1324//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1324//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1324//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-30 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118478#comment-13118478
 ] 

Todd Lipcon commented on HDFS-1973:
---

bq. Sure, but the whole of DFSClient is annotated @InterfaceAudience.Private. 
You still think we should keep it around?

Yes, because unfortunately the Private annotation was added after the 0.20 
release. If it's problematic to keep around, we don't have to, but it seemed 
easy enough to maintain for now.


{code}
+  @Override
+  public synchronized void performFailover(Object currentProxy) {
+    if (proxies.get(currentProxyIndex) != currentProxy) {
+      currentProxyIndex = (currentProxyIndex + 1) % proxies.size();
+    }
+  }
{code}
This code confuses me -- isn't {{currentProxy}} a proxy object, whereas 
{{proxies.get(...)}} is an {{AddressRpcProxyPair}}? Which is to say they're 
always unequal?






[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-26 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115101#comment-13115101
 ] 

Hadoop QA commented on HDFS-1973:
-

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12496201/HDFS-1973-HDFS-1623.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hdfs.TestDfsOverAvroRpc
  org.apache.hadoop.hdfs.server.blockmanagement.TestHost2NodesMap

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1291//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/1291//artifact/trunk/hadoop-hdfs-project/patchprocess/newPatchFindbugsWarningshadoop-hdfs.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/1291//console

This message is automatically generated.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-26 Thread Aaron T. Myers (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115036#comment-13115036
 ] 

Aaron T. Myers commented on HDFS-1973:
--

bq. A general note: this failover proxy provider doesn't get passed enough info 
to choose different configs based on the "logical URI". For example, if I have 
two clusters, foo and bar I'd expect to be able to configure 
dfs.ha.namenodes.foo separately from dfs.ha.namenodes.bar after setting both of 
dfs.client.failover.proxy.provider.foo|bar.

I've filed HDFS-2367 to address this issue.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-22 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113045#comment-13113045
 ] 

Aaron T. Myers commented on HDFS-1973:
--

Thanks a lot for the review, Todd. I'll attach an updated patch addressing most 
of your comments in just a moment. Comments below.

bq. not sure if removing the DFSClient(Configuration conf) constructor is a 
good idea - it was deprecated at 0.21 but since we haven't had a well-adopted 
release since then, seems we should keep it around. (it's trivial to keep, 
right?)

Sure, but the whole of {{DFSClient}} is annotated 
{{@InterfaceAudience.Private}}. You still think we should keep it around?

bq. some log messages left in: LOG.info("address of nn1: " + 
conf.get("dfs.atm.nn1"))

Whoops! Fixed. Any more that you noticed? Or just that one? :)

bq. some of the lines seem to have a lot of columns - try to wrap at 80?

Like which? I wasn't super strict about wrapping at 80, but I'm pretty sure 
none go over 90.

bq. maybe rename this parameter to nameNodeUri?

Done.

bq. Shouldn't that have a "." at the end? It seems authorities are appended 
without a "." in the middle.

Yep, great catch. Fixed.

bq. why? It seems we should at least be logging a WARN if this is configured to 
point to a class that can't be loaded, if not re-throwing as IOE.

Good point. I've changed it to throw an {{IOE}} with a descriptive error 
message.

bq. I'm not sure that o.a.h.h.protocol is the right package for 
ConfiguredFailoverProxyProvider - maybe it belongs in the HA package?

Good point. I've moved it to o.a.h.server.namenode.ha.

bq. missing license headers on many of the new files

Fixed. It's just two new files.

bq. these will throw NPE in the case of misconfiguration. Should at least throw 
an exception indicating what the mistaken config is.

Fixed.

bq. Also, it seems more sensible to have a single config like 
"dfs.ha.namenode.addresses" with comma-separated URIs, since there's nothing 
explicitly tied to having exactly 2 NNs here, right?

Good point. Fixed.

bq. A general note: this failover proxy provider doesn't get passed enough info 
to choose different configs based on the "logical URI". For example, if I have 
two clusters, foo and bar I'd expect to be able to configure 
dfs.ha.namenodes.foo separately from dfs.ha.namenodes.bar after setting both of 
dfs.client.failover.proxy.provider.foo|bar.

Good point. Mind if we address this in a follow-up JIRA? It seems like the 
solution would be to replace "dfs.ha.namenode.addresses" with something like 
"dfs.ha.namenode.addresses.", so as to be able to support 
configuring the addresses of multiple HA clusters.

bq. Can we separate the addition of Idempotent annotations to a different patch?

Certainly. I've removed all of them from this patch except for 
{{getBlockLocations}}, which is necessary for the test to work.

bq. file a followup JIRA or use "TODO" so we don't forget about this case?

Changed to:

{noformat}
// TODO(HA): This test should probably be made to fail if a client fails over
// to talk to an NN with a different block pool id. Once failover between
// active/standby in a single block pool is implemented, this test should be
// changed to exercise that.
{noformat}





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-09-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109200#comment-13109200
 ] 

Todd Lipcon commented on HDFS-1973:
---

- not sure if removing the DFSClient(Configuration conf) constructor is a good 
idea - it was deprecated at 0.21 but since we haven't had a well-adopted 
release since then, seems we should keep it around. (it's trivial to keep, 
right?)
- some log messages left in: LOG.info("address of nn1: " + 
conf.get("dfs.atm.nn1"))
- some of the lines seem to have a lot of columns - try to wrap at 80?



{code}
-  DFSClient(InetSocketAddress nameNodeAddr, ClientProtocol rpcNamenode,
+  DFSClient(URI nameNodeAddr, ClientProtocol rpcNamenode,
{code}
maybe rename this parameter to {{nameNodeUri}}?


{code}
+  public static final String DFS_CLIENT_FAILOVER_PROXY_PROVIDER_KEY_PREFIX =
+      "dfs.client.failover.proxy.provider";
{code}
Shouldn't that have a "." at the end? It seems authorities are appended without 
a "." in the middle.


{code}
+} catch (RuntimeException e) {
+  if (e.getCause() instanceof ClassNotFoundException) {
+return null;
{code}
why? It seems we should at least be logging a WARN if this is configured to 
point to a class that can't be loaded, if not re-throwing as IOE.



- I'm not sure that o.a.h.h.protocol is the right package for 
ConfiguredFailoverProxyProvider - maybe it belongs in the HA package?
- missing license headers on many of the new files


{code}
+  InetSocketAddress first = NameNode.getAddress(
+      new URI(conf.get(CONFIGURED_NAMENODE_ADDRESS_FIRST)).getAuthority());
+  InetSocketAddress second = NameNode.getAddress(
+      new URI(conf.get(CONFIGURED_NAMENODE_ADDRESS_SECOND)).getAuthority());
{code}
these will throw NPE in the case of misconfiguration. Should at least throw an 
exception indicating what the mistaken config is.
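
The kind of guard suggested here can be sketched as a small helper (class name and the config key in the test are hypothetical, and a plain {{Map}} stands in for Hadoop's {{Configuration}}): fail fast with a message naming the missing key instead of letting a null value surface later as a bare {{NullPointerException}}.

```java
import java.util.Map;

public class ConfigCheckSketch {
  // Look up a required config value; a descriptive exception beats an NPE
  // thrown from deep inside URI parsing.
  public static String getRequired(Map<String, String> conf, String key) {
    String value = conf.get(key);
    if (value == null) {
      throw new IllegalArgumentException("Missing required config: " + key);
    }
    return value;
  }
}
```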

Also, it seems more sensible to have a single config like 
"dfs.ha.namenode.addresses" with comma-separated URIs, since there's nothing 
explicitly tied to having exactly 2 NNs here, right?
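A minimal sketch of that alternative ("dfs.ha.namenode.addresses" is the suggested key, not an existing one) shows why it generalizes beyond two NNs:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class AddressList {
  // Parse a comma-separated list of NN URIs; any count works.
  static List<URI> parseAddresses(String value) {
    List<URI> uris = new ArrayList<>();
    for (String s : value.split(",")) {
      uris.add(URI.create(s.trim()));
    }
    return uris;
  }

  public static void main(String[] args) {
    List<URI> nns = parseAddresses(
        "hdfs://nn1.example.com:8020, hdfs://nn2.example.com:8020");
    System.out.println(nns.size()); // 2
  }
}
```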

A general note: this failover proxy provider doesn't get passed enough info to 
choose different configs based on the "logical URI". For example, if I have two 
clusters, {{foo}} and {{bar}} I'd expect to be able to configure 
{{dfs.ha.namenodes.foo}} separately from {{dfs.ha.namenodes.bar}} after setting 
both of {{dfs.client.failover.proxy.provider.foo|bar}}.

- Can we separate the addition of Idempotent annotations to a different patch?


{code}
+  // This test should probably be made to fail if a client fails over to
+  // talk to an NN with a different block pool id.
{code}
file a followup JIRA or use "TODO" so we don't forget about this case?

> HA: HDFS clients must handle namenode failover and switch over to the new 
> active namenode.
> --
>
> Key: HDFS-1973
> URL: https://issues.apache.org/jira/browse/HDFS-1973
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Suresh Srinivas
>Assignee: Aaron T. Myers
> Attachments: HDFS-1973-HDFS-1623.patch, hdfs-1973.0.patch
>
>
> During failover, a client must detect the current active namenode failure and 
> switch over to the new active namenode. The switch over might make use of IP 
> failover or some thing more elaborate such as zookeeper to discover the new 
> active.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-26 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092073#comment-13092073
 ] 

Aaron T. Myers commented on HDFS-1973:
--

@Eli, sure I'll file a separate JIRA. It'd certainly be worth enumerating all 
of the places where HTTP fail-over is an issue.

The example you provided is an interesting one. It seems you're assuming that 
an HA setup would have three nodes - active, standby, and 2NN, with the 2NN 
failing over to do checkpointing against the standby after a failure of the 
active. The design document in HDFS-1623 doesn't really address checkpointing. 
I've heard from Suresh and Todd informally that the intention is probably to 
make the standby node also capable of performing checkpointing. I'll file a 
separate JIRA to address this as well.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-25 Thread Eli Collins (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091525#comment-13091525
 ] 

Eli Collins commented on HDFS-1973:
---

@atm - mind filing a separate jira for HTTP fail-over? In some cases we may be 
able to define away the problem. E.g., now that 1073 is in, we could modify the 
2NN to not have to fetch files via HTTP (at least when using shared storage). 





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-23 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089807#comment-13089807
 ] 

Aaron T. Myers commented on HDFS-1973:
--

bq. Alternatively client gets the address of both the namenodes. Tries them one 
at a time until it gets connected to the new active.

This is what I was trying to communicate with "Configuration-based client 
failover. Clients are configured with a set of NN addresses to try until an 
operation succeeds." I think we're in agreement on this point, just talking 
past each other a little bit. :)

bq. Proxy-based client failover is an implementation detail. It still needs to 
figure out the new active based on one of the schemes above.

The proxy process would need to figure out the address of the new active, but 
clients wouldn't - the clients would just have the address of the proxy. The 
only thing for the client to do, then, would be to retry the RPC to the same 
address (the address of the proxy.)

bq. +1 for logical URI. We could consider merging this requirement with 
HDFS-2231 to do this.

Good point. I'll comment there.

bq. Logical URI is needed for identifying a nameservice and not a cluster, 
since federation supports multiple namenodes within a cluster.

Good point. In the above design document: s/cluster/nameservice/g.

bq. Why should the failover method be based on the URI cluster part? Can it be 
a single mechanism across all the nameservices? Hence, change the parameter to 
dfs.client.ha.failover.method?

Imagine that one writes a program which uses absolute URIs to connect to two 
distinct clusters, one of which is HA-enabled using ZK to resolve the address, 
and the other is not. In this case we should use some ZK-based 
{{FailoverProxyProvider}} for the first, and just the normal RPC connection for 
the second. Thus, the configuration should be per-nameservice. I suppose we 
could do something like introduce {{dfs.client.ha.failover.method.}} keys per 
nameservice, but that seems more annoying to configure.

bq. The scheme you have defined works only for RPC protocols. How about HTTP?

Yes, that's certainly true. My thinking there was that since it's generally 
less critical for the NN web interfaces to immediately fail over, and since we 
don't generally control the HTTP clients which access the NN web interface, 
this could be out of scope for this JIRA. To facilitate this, the operator 
could either run a standard HTTP proxy, use round-robin DNS, or even change 
DNS resolution of the NN and wait for clients to get the updated address.

bq. I am not sure why logical URI is required for VIP/failover based setup.

The value of the logical URI could be the same as the actual URI of the proxy. 
It would then only be used to configure an appropriate 
{{FailoverProxyProvider}} which would retry failed RPCs to the same address.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-18 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087225#comment-13087225
 ] 

Suresh Srinivas commented on HDFS-1973:
---

Sorry for the late comment. I had been traveling.

Before {{Cases to support}}, could we add a section like this:
>>
On failover, clients need the address of the new active. This could be done by:
# Contacting zookeeper to get the current active NN.
# Alternatively, the client gets the addresses of both namenodes and tries 
them one at a time until it connects to the new active.
# For setups using IP failover, clients always use the same VIP/failover 
address, which moves to active.
>>
Given this, I am not sure about the {{Cases to support}}:
Proxy-based client failover is an implementation detail. It still needs to 
figure out the new active based on one of the schemes above. I am not very 
clear on the configuration-based support. Do you mean here that the client 
config will be changed to point to the new active? DNS SRV records are also 
unnecessary given that our config would have both the namenode addresses.

+1 for logical URI. We could consider merging this requirement with HDFS-2231 
to do this. 

Logical URI is needed for identifying a nameservice and not a cluster, since 
federation supports multiple namenodes within a cluster.  We could use the 
concept of a nameservice, introduced in federation, for that.  So the URI 
would be nameservice1.foo.com, where nameservice1 maps to nn1 and nn2.

As regards viewfs, I think this scheme will work. The viewfs mounttables will 
point to the logical URI, which in turn will use the mechanism you are 
proposing.

Why should the failover method be based on the URI cluster part? Can it be a 
single mechanism across all the nameservices? Hence, change the parameter to 
dfs.client.ha.failover.method?

These are my early thoughts. Some questions I am left with are:
# The scheme you have defined works only for RPC protocols. How about HTTP?
# I am not sure why logical URI is required for VIP/failover based setup.

We could continue to add more details.






[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-17 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13086524#comment-13086524
 ] 

Aaron T. Myers commented on HDFS-1973:
--

Any feedback on the above proposal?

Sanjay/Suresh - I'd particularly like your feedback since you guys have a lot 
of viewfs expertise, and some of this is similar/related.





[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-08-04 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13079584#comment-13079584
 ] 

Aaron T. Myers commented on HDFS-1973:
--

h3. Client Failover overview

On failover between active and standby NNs, it's necessary for clients to be 
redirected to the new active NN. The goal of HDFS-1623 is to provide a 
framework for HDFS HA which can in fact support multiple underlying mechanisms. 
As such, the client failover approach should support multiple options.

h3. Cases to support

# Proxy-based client failover. Clients always communicate with an in-band proxy 
service which forwards all RPCs on to the correct NN. On failure, a process 
causes this proxy to begin sending requests to the now-active NN.
# Virtual IP-based client failover. Clients always connect to a hostname which 
resolves to a particular IP address. On failure of the active NN, a process is 
initiated to switch which NIC will receive packets intended for said IP address 
to the now-active NN. (From a client's perspective, this case is equivalent to 
case #1.)
# Zookeeper-based client failover. The URI to contact the active NN is stored 
in Zookeeper or some other highly-available service. Clients look up which NN 
to talk to by communicating with ZK to discern the currently active NN. On 
failure, some process causes the address stored in ZK to be changed to point to 
the now-active NN.
# Configuration-based client failover. Clients are configured with a set of NN 
addresses to try until an operation succeeds. This configuration might exist in 
client-side configuration files, or perhaps in DNS via a SRV record that lists 
the NNs with different priorities.

h3. Assumptions

This proposal assumes that NN fencing works, and that after a failover any 
standby NN is either unreachable or will throw a {{StandbyException}} on any 
RPC from a client. That is, a client will not possibly receive incorrect 
results if it chooses to contact the wrong NN. This proposal also presumes that 
there is no direct coordination required between any central failover 
coordinator and clients, i.e. there's an intermediate name resolution system of 
some sort (ZK, DNS, local configuration, etc.)
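The client-side contract these assumptions imply can be sketched as follows. Everything here is a simplified stand-in (StandbyException is modeled as an unchecked exception, and NNs as suppliers): a retried call just tries candidate NNs until one answers as the active.

```java
import java.util.List;
import java.util.function.Supplier;

public class RetryLoopSketch {
  // Unchecked stand-in for the real StandbyException.
  static class StandbyException extends RuntimeException {}

  // Try each candidate NN in turn; a standby refuses, the active answers.
  static String callWithFailover(List<Supplier<String>> namenodes) {
    StandbyException last = null;
    for (Supplier<String> nn : namenodes) {
      try {
        return nn.get();
      } catch (StandbyException e) {
        last = e; // wrong NN; safe to move on, per the fencing assumption
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    Supplier<String> standby = () -> { throw new StandbyException(); };
    Supplier<String> active = () -> "result-from-active";
    System.out.println(callWithFailover(List.of(standby, active)));
  }
}
```

The fencing assumption is what makes this loop safe: contacting the wrong NN yields an exception, never a stale answer.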

h3. Proposal

The commit of HADOOP-7380 already introduced a facility whereby an IPC 
{{RetryInvocationHandler}} can utilize a {{FailoverProxyProvider}} 
implementation to perform the appropriate client-side action in the event of 
failover. At the moment, the only implementation of a {{FailoverProxyProvider}} 
is the {{DefaultFailoverProxyProvider}}, which does nothing in the case of 
failover. HADOOP-7380 also added an {{@Idempotent}} annotation which can be 
used to identify which methods can be safely retried during a failover event.

What remains, then, is:

# To implement {{FailoverProxyProviders}} which can support the cases outlined 
above (and perhaps others).
# To provide a mechanism to select which {{FailoverProxyProvider}} 
implementation to use for a given HDFS URI.
# To annotate the appropriate HDFS {{ClientProtocol}} interface methods with 
the {{@Idempotent}} tag.
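Item 3 can be illustrated with a simplified stand-in for the HADOOP-7380 annotation (the real {{@Idempotent}} and {{ClientProtocol}} differ; this sketch only shows how a retry layer would consult the tag):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class IdempotentDemo {
  // Simplified stand-in for the HADOOP-7380 @Idempotent annotation.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Idempotent {}

  // Toy slice of ClientProtocol: one read-only call, one mutating call.
  interface ClientProtocolSketch {
    @Idempotent
    long getFileLength(String src); // safe to re-send after failover

    boolean rename(String src, String dst); // not annotated: may double-apply
  }

  // What the retry layer would check before re-sending an RPC.
  static boolean isRetriable(String method, Class<?>... params) {
    try {
      return ClientProtocolSketch.class.getMethod(method, params)
          .isAnnotationPresent(Idempotent.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(isRetriable("getFileLength", String.class)); // true
    System.out.println(isRetriable("rename", String.class, String.class)); // false
  }
}
```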

h4. {{FailoverProxyProvider}} implementations

Cases 1 and 2 above can be achieved by implementing a single 
{{FailoverProxyProvider}} which simply tries to reconnect to the previous 
hostname/IP address on failover. Cases 3 and 4 can be implemented as distinct 
custom {{FailoverProxyProviders}}.
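For case 4, the core of a configured-list provider reduces to something like the following sketch (the names are illustrative, and a real {{FailoverProxyProvider}} hands out RPC proxies rather than address strings):

```java
import java.util.List;

public class ConfiguredFailoverSketch {
  private final List<String> addresses;
  private int current = 0;

  ConfiguredFailoverSketch(List<String> addresses) {
    this.addresses = addresses;
  }

  // The address the retry layer should currently target.
  String getProxyAddress() {
    return addresses.get(current);
  }

  // Invoked by the retry layer when a failover is detected.
  void performFailover() {
    current = (current + 1) % addresses.size();
  }

  public static void main(String[] args) {
    ConfiguredFailoverSketch p =
        new ConfiguredFailoverSketch(List.of("nn1:8020", "nn2:8020"));
    System.out.println(p.getProxyAddress()); // nn1:8020
    p.performFailover();
    System.out.println(p.getProxyAddress()); // nn2:8020
  }
}
```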

h4. A mechanism to select the appropriate {{FailoverProxyProvider}} 
implementation

I propose we add a mechanism to configure a mapping from URI authority -> 
{{FailoverProxyProvider}} implementation. Absolute URIs which previously 
specified the NN host name will instead contain a logical cluster name (which 
might be chosen to be identical to one of the NN's host names) which will be 
used by the chosen {{FailoverProxyProvider}} to determine the appropriate host 
to connect to. Introducing the concept of a cluster name will be a useful 
abstraction in general: if, for example, someone develops a fully-distributed 
NN in the future, the cluster name will still apply.

On instantiation of a {{DFSClient}} (or other user of an HDFS URI, e.g. HFTP), 
the mapping would be checked to see if there's an entry for the given URI 
authority. If there is not, then a normal RPC client with connected socket to 
the given authority will be created as is done today with a 
{{DefaultProxyProvider}}. If there is an entry, then the authority will be 
treated as a logical cluster name, a {{FailoverProxyProvider}} of the correct 
type will be instantiated (via a factory class), and an RPC client will be 
created which utilizes this {{FailoverProxyProvider}}. The various 
{{FailoverProxyProvider}} implementations are responsible for their own 
configuration.
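The authority lookup itself can be sketched in a few lines (the key prefix follows the naming discussed in review; everything else, including the provider class name, is illustrative):

```java
import java.util.Map;

public class ProviderSelection {
  static final String PREFIX = "dfs.client.failover.proxy.provider.";

  // Null means "no mapping": treat the authority as a literal host and
  // build a plain RPC client with the default proxy provider.
  static String providerFor(Map<String, String> conf, String authority) {
    return conf.get(PREFIX + authority);
  }

  public static void main(String[] args) {
    Map<String, String> conf = Map.of(
        PREFIX + "cluster1.foo.com",
        "org.example.ConfiguredFailoverProxyProvider"); // hypothetical class
    System.out.println(providerFor(conf, "cluster1.foo.com"));
    System.out.println(providerFor(conf, "nn.example.com")); // null -> default
  }
}
```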

As a straw man example, consider the following configuration:

{code}
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>cluster1.foo.com</value>
  </property>

  <property>
    <name>dfs.ha.client.failover-
{code}

[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-06-09 Thread Hari A V (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046987#comment-13046987
 ] 

Hari A V commented on HDFS-1973:


Hi Aaron,

Thanks for the answer. I will watch these issues to get more information :-) 

-Hari



[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-06-09 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046767#comment-13046767
 ] 

Aaron T. Myers commented on HDFS-1973:
--

Hi Hari,

bq. Can you please elaborate a little bit on your area of interest with 
ZOOKEEPER-1080?

As noted in Sanjay's design doc, one proposal for detecting NN failure would be 
to use an external ZK service. The HDFS proposal doesn't go into great detail 
on this, but it suggests using ZK with a heartbeat mechanism to see if the NN 
is still alive. I personally like the ZK recipe better (i.e. using ephemeral + 
sequence nodes).

Another possible use for ZK in the implementation of NN HA would be to use ZK 
as the source of truth for clients to determine the active NN. This would seem 
to flow naturally from the part of the ZK recipe which says "Applications may 
consider creating a separate to znode to acknowledge that the leader has 
executed the leader procedure." If NN HA were to utilize an implementation of 
the ZK leader election recipe, then perhaps this "leader-procedure-complete 
znode" could store the IP or hostname of the active NN which clients could use.

I haven't read the design doc posted on ZOOKEEPER-1080 yet. I'll go ahead and 
do that and post my comments there.

I should also mention that we have not settled upon what strategy we'll take to 
do NN failure detection or client failover. As noted in Sanjay's design doc, 
we're also strongly considering using virtual IPs for client failover.



[jira] [Commented] (HDFS-1973) HA: HDFS clients must handle namenode failover and switch over to the new active namenode.

2011-06-08 Thread Hari A V (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046352#comment-13046352
 ] 

Hari A V commented on HDFS-1973:


Hi Aaron,

Can you please elaborate a little bit on your area of interest with 
ZOOKEEPER-1080? 

If it is for NN HA, I would be very happy to get your comments on the design 
which I have suggested, so that I can consider them for the patch. 
I have referred to the NameNode+HA_v2_1.pdf uploaded at HDFS-1623 while 
preparing the design doc (that helped me a lot). The use cases are almost 
matching. I am currently in the process of making the patch ready for 
submission.

- Hari
