[ 
https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596543#comment-16596543
 ] 

David Manning commented on HBASE-21126:
---------------------------------------

Well, that's embarrassing: somehow a stray 'd' slipped into my first patch. 
Updated with a second patch.

I spent a lot of time trying to create a more useful test case. Ultimately I 
was unable to use a multi-node ZooKeeper ensemble in startMiniZkCluster due 
to HBASE-10283. I may investigate this later. I was able to hack together 
something that sort of worked by sharing the ZKDatabase instance between the 
ZooKeeperServer instances, but that seems problematic. It did no actual 
replication between the ZooKeeper instances, but it did allow me to do more 
meaningful Canary validation.

> Add ability for HBase Canary to ignore a configurable number of ZooKeeper 
> down nodes
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-21126
>                 URL: https://issues.apache.org/jira/browse/HBASE-21126
>             Project: HBase
>          Issue Type: Improvement
>          Components: canary, Zookeeper
>    Affects Versions: 1.0.0, 3.0.0, 2.0.0
>            Reporter: David Manning
>            Priority: Minor
>             Fix For: 3.0.0, 1.3.0, 2.0.0
>
>         Attachments: HBASE-21126.master.001.patch, 
> HBASE-21126.master.002.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper 
> -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper 
> server in the ensemble. If any server is unavailable or unresponsive, the 
> canary will exit with a failure code.
> If we use the Canary to gauge server health, and alert accordingly, this can 
> be too strict. For example, in a 5-node ZooKeeper cluster, having one node 
> down is safe and expected in rolling upgrades/patches.
> This is a request to allow the Canary to take another parameter:
> {code:java}
> -permittedZookeeperFailures <N>{code}
> If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still 
> pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable.
> (This is my first Jira posting... sorry if I messed anything up.)
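
For illustration, the pass/fail rule described in the issue can be sketched as 
below. This is only a minimal sketch of the threshold check, not the code in 
the attached patches: the class name, the passes method, and the zk1..zk5 host 
names are all hypothetical, and the reachability results are hard-coded rather 
than obtained by reading a znode from each server.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HBase: models only the threshold rule.
public class ZookeeperFailureThreshold {

    /**
     * Returns true when the number of unreachable servers does not exceed
     * the permitted number of failures (the N in -permittedZookeeperFailures).
     */
    public static boolean passes(Map<String, Boolean> reachableByServer,
                                 int permittedFailures) {
        long failures = reachableByServer.values().stream()
                .filter(reachable -> !reachable)
                .count();
        return failures <= permittedFailures;
    }

    public static void main(String[] args) {
        // Assumed 5-node ensemble; host names are placeholders.
        Map<String, Boolean> ensemble = new LinkedHashMap<>();
        ensemble.put("zk1:2181", true);
        ensemble.put("zk2:2181", true);
        ensemble.put("zk3:2181", true);
        ensemble.put("zk4:2181", true);
        ensemble.put("zk5:2181", false); // one node down, e.g. rolling upgrade

        // 4 of 5 reachable with N=1: within the permitted limit.
        System.out.println("one down, N=1: " + passes(ensemble, 1));

        // 3 of 5 reachable with N=1: exceeds the permitted limit.
        ensemble.put("zk4:2181", false);
        System.out.println("two down, N=1: " + passes(ensemble, 1));
    }
}
```

With N=0 the same check reduces to the strict behaviour described above, where 
any single unreachable server causes a failure.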



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
