[ 
https://issues.apache.org/jira/browse/HADOOP-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13239683#comment-13239683
 ] 

Bikas Saha commented on HADOOP-8212:
------------------------------------

bq. The patch does add the same handling to StatCallback. It uses the ZooKeeper 
"context" parameter to pass the original zkClient. Unfortunately the Watcher 
interface doesn't have any context object, which is why I had to introduce the 
wrapper class there.
Not this. I was talking about the handling of the session-expired code that 
was added to the create callback (and is the title of this jira). The same 
thing could happen in the stat callback: it could get a session-expired code 
and send a fatal error instead of letting the process watcher callback rejoin 
the election on session expiry.
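
To make the concern concrete, here is a minimal sketch of the stat-callback 
handling being suggested. It is not the code in the patch: the ResultCode and 
Action enums and the handleStatResult method are hypothetical stand-ins 
(ResultCode loosely mirrors a few ZooKeeper result codes), and it only shows 
that a session-expired code is not escalated to a fatal error.

```java
public class StatCallbackSketch {
    // Hypothetical stand-in for a few ZooKeeper result codes (not the real enum).
    enum ResultCode { OK, NONODE, CONNECTIONLOSS, SESSIONEXPIRED, OTHER }

    // Hypothetical outcomes the elector could choose after a stat result.
    enum Action { KEEP_MONITORING, REJOIN_ELECTION, RETRY_STAT, DEFER_TO_WATCHER, FATAL_ERROR }

    // Decide what to do with the result code passed to the stat callback.
    static Action handleStatResult(ResultCode rc) {
        switch (rc) {
            case OK:
                // Lock node still exists; keep watching it.
                return Action.KEEP_MONITORING;
            case NONODE:
                // Lock node is gone; try to become the leader again.
                return Action.REJOIN_ELECTION;
            case CONNECTIONLOSS:
                // Transient problem; the stat call can be retried.
                return Action.RETRY_STAT;
            case SESSIONEXPIRED:
                // Assumption: the watcher callback sees the expiry event and
                // rejoins the election, so no fatal error is raised here.
                return Action.DEFER_TO_WATCHER;
            default:
                // Anything unexpected is still reported as fatal.
                return Action.FATAL_ERROR;
        }
    }

    public static void main(String[] args) {
        for (ResultCode rc : ResultCode.values()) {
            System.out.println(rc + " -> " + handleStatResult(rc));
        }
    }
}
```

The point of the sketch is only the SESSIONEXPIRED branch: without it, that 
code would fall through to the default case and be reported as fatal.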

bq.In my experience working on similar projects in the past, getting all the 
initial code in place is only half the battle. The real work starts once the 
code is there and you start banging on it in realistic test scenarios.
And it might be that going slower in the first place saves us that time in 
the later phase :)

bq.We'd like to see automatic failover be a supported piece of the HA solution 
in 0.23.x (..err..2.0), and to hit that timeline, we need to get into the 
latter phase ASAP.
That seems like a worthwhile engineering goal!

bq.If you'd prefer, I'm happy to create a feature branch for auto-failover and 
then call a merge vote when it's ready for the full QA onslaught.
That's a really good idea! This way at least all the jiras will be linked to 
the branch and easy to follow. And we can make aggressive changes without 
worrying about churning or destabilizing trunk. I don't think voting to bring 
this important piece of work back to trunk will be a problem. But it makes 
the process a lot more manageable and trackable. I am +1 for it. Thanks!

                
> Improve ActiveStandbyElector's behavior when session expires
> ------------------------------------------------------------
>
>                 Key: HADOOP-8212
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8212
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.23.3, 0.24.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.3, 0.24.0
>
>         Attachments: hadoop-8212.txt, hadoop-8212.txt
>
>
> Currently when the ZK session expires, it results in a fatal error being sent 
> to the application callback. This is not the best behavior -- for example, in 
> the case of HA, if ZK goes down, we would like the current state to be 
> maintained, rather than causing either NN to abort. When the ZK clients are 
> able to reconnect, they should sort out the correct leader based on the 
> normal locking schemes.

