[ https://issues.apache.org/jira/browse/HBASE-21126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599102#comment-16599102 ]
David Manning commented on HBASE-21126:
---------------------------------------

Thanks [~elserj], I figured that it was related to the Jenkins issues. Thank you for keeping tabs on this and helping me out.

I can fix the checkstyle issues. Most are line-length violations; I thought the limit was 120 rather than 100, my fault. The one about the 150-line parseArgs method will require more refactoring, which will be more disruptive. At the very least, I could move the various calls to "print error, print usage and exit" into one method, which should bring the line count down somewhat. I could also refactor it into separate check sections, since they are all independent of each other. I don't know whether this is advisable as part of this work or better done as a separate work item (I do want this change to go to branch-1 as well).

> Add ability for HBase Canary to ignore a configurable number of ZooKeeper down nodes
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-21126
>                 URL: https://issues.apache.org/jira/browse/HBASE-21126
>             Project: HBase
>          Issue Type: Improvement
>          Components: canary, Zookeeper
>    Affects Versions: 1.0.0, 3.0.0, 2.0.0
>            Reporter: David Manning
>            Assignee: David Manning
>            Priority: Minor
>             Fix For: 3.0.0, 1.3.0, 2.0.0
>
>         Attachments: HBASE-21126.master.001.patch,
> HBASE-21126.master.002.patch, HBASE-21126.master.003.patch,
> zookeeperCanaryLocalTestValidation.txt
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When running org.apache.hadoop.hbase.tool.Canary with args -zookeeper
> -treatFailureAsError, the Canary will try to get a znode from each ZooKeeper
> server in the ensemble. If any server is unavailable or unresponsive, the
> canary will exit with a failure code.
> If we use the Canary to gauge server health, and alert accordingly, this can
> be too strict. For example, in a 5-node ZooKeeper cluster, having one node
> down is safe and expected in rolling upgrades/patches.
> This is a request to allow the Canary to take another parameter
> {code:java}
> -permittedZookeeperFailures <N>{code}
> If N=1, in the 5-node ZooKeeper ensemble example, then the Canary will still
> pass if 4 ZooKeeper nodes are reachable, but fail if 3 or fewer are reachable.
> (This is my first Jira posting... sorry if I messed anything up.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
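
For illustration only, the pass/fail rule from the issue description could be sketched roughly as below. This is not the code from the attached patches: the class name, the method name, and the exit code values are hypothetical, and the per-server reachability results are assumed to have been collected already.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper illustrating the requested rule; not taken from the HBASE-21126 patches. */
final class PermittedFailuresCheck {

  // Illustrative exit codes; the issue does not state the Canary's actual values.
  static final int EXIT_OK = 0;
  static final int EXIT_FAILURE = 1;

  /**
   * Returns EXIT_OK when the number of unreachable ZooKeeper servers does not
   * exceed permittedFailures, and EXIT_FAILURE otherwise.
   *
   * @param serverReachable one entry per ZooKeeper server in the ensemble
   * @param permittedFailures the N from -permittedZookeeperFailures N
   */
  static int exitCode(List<Boolean> serverReachable, int permittedFailures) {
    if (permittedFailures < 0) {
      throw new IllegalArgumentException("permittedFailures must be >= 0");
    }
    int failures = 0;
    for (boolean reachable : serverReachable) {
      if (!reachable) {
        failures++;
      }
    }
    return failures > permittedFailures ? EXIT_FAILURE : EXIT_OK;
  }

  public static void main(String[] args) {
    // 5-node ensemble from the description, with N=1.
    List<Boolean> fourReachable = Arrays.asList(true, true, true, true, false);
    List<Boolean> threeReachable = Arrays.asList(true, true, true, false, false);
    System.out.println("4 of 5 reachable: exit " + exitCode(fourReachable, 1));
    System.out.println("3 of 5 reachable: exit " + exitCode(threeReachable, 1));
  }
}
```

With N=0 this sketch reduces to the current strict behavior described above, where any single unreachable server yields a failure.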