[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625336#comment-13625336 ]
takeshi.miao commented on HBASE-7525:
-------------------------------------

I also tested it with the hbase-0.95 branch.

> A canary monitoring program specifically for regionserver
> ---------------------------------------------------------
>
>                 Key: HBASE-7525
>                 URL: https://issues.apache.org/jira/browse/HBASE-7525
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.94.0
>            Reporter: takeshi.miao
>            Priority: Minor
>         Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-v0.patch,
>                      RegionServerCanary.java
>
>
> *Motivation*
> This ticket provides a canary monitoring tool specifically for
> HRegionServer. Details:
> 1. Our operations team asked for this tool because they found the existing
> canary, which checks every region of an HBase cluster, too fine-grained for
> them, so I implemented this coarse-grained one based on the original
> o.a.h.h.tool.Canary.
> 2. The tool is multi-threaded: each Get request is sent by its own thread.
> I chose this design because we have been hitting a region server hang whose
> root cause is still unclear, so the tool can help the operations team detect
> a hung region server, if any.
>
> *example*
> 1. The tool's help output
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
> Usage: [opts] [regionServerName 1 [regionServrName 2...]]
>  regionServerName - FQDN serverName, can use linux command:hostname -f to
>  check your serverName
>  where [-opts] are:
>    -help          Show this help and exit.
>    -e             Use regionServerName as regular expression
>                   which means the regionServerName is regular expression pattern
>    -f <B>         stop whole program if first error occurs, default is true
>    -t <N>         timeout for a check, default is 600000 (milisecs)
>    -daemon        Continuous check at defined intervals.
>    -interval <N>  Interval between checks (sec)
> 2. Send a request to each regionserver in an HBase cluster
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
> 3. Send a request to a regionserver with the given name
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
> 4. Send a request to the regionserver(s) matching a regular expression
> /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e
> rs1.domainname.pattern
> // another example
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e
> tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
> 5. Send a request to a regionserver and set a timeout limit for the test
> // query regionserver rs1.domainname with a timeout limit of 10 sec
> // -f false means the program does not exit even if a test fails
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 10000
> rs1.domainname
> // prints "1" if the check timed out
> echo "$?"
> 6. Run in daemon mode, which sends a request to each regionserver
> periodically
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
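
For readers who have not opened the attachments, the thread-per-request idea from the Motivation section and the -e / -t options can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in RegionServerCanary.java: the class RegionServerProbeSketch, the methods selectServers and check, and the probe predicate are all hypothetical names, and the probe stands in for the real HBase Get so that the sketch needs nothing beyond the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch; not taken from the attached patch.
final class RegionServerProbeSketch {

    // Pick the servers to check. With no arguments every server is selected;
    // with useRegex (like -e) each argument is a full-match pattern,
    // otherwise it must equal the server name exactly.
    static List<String> selectServers(List<String> all, List<String> args, boolean useRegex) {
        if (args.isEmpty()) {
            return new ArrayList<>(all);
        }
        List<String> selected = new ArrayList<>();
        for (String server : all) {
            for (String arg : args) {
                boolean hit = useRegex ? Pattern.matches(arg, server) : arg.equals(server);
                if (hit) {
                    selected.add(server);
                    break;
                }
            }
        }
        return selected;
    }

    // Run one probe per server, each on its own thread, and return the
    // servers whose probe failed, threw, or did not finish within timeoutMs.
    // A hung server therefore shows up as a timeout instead of blocking the tool.
    static List<String> check(List<String> servers, Predicate<String> probe, long timeoutMs) {
        List<String> failed = new ArrayList<>();
        if (servers.isEmpty()) {
            return failed;
        }
        ExecutorService pool = Executors.newFixedThreadPool(servers.size(), r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck probe must not keep the JVM alive
            return t;
        });
        try {
            Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
            for (String server : servers) {
                Callable<Boolean> task = () -> probe.test(server);
                pending.put(server, pool.submit(task));
            }
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            for (Map.Entry<String, Future<Boolean>> entry : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    if (!entry.getValue().get(remaining, TimeUnit.NANOSECONDS)) {
                        failed.add(entry.getKey());
                    }
                } catch (TimeoutException | ExecutionException e) {
                    failed.add(entry.getKey());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    failed.add(entry.getKey());
                }
            }
        } finally {
            pool.shutdownNow(); // interrupt probes that are still running
        }
        return failed;
    }

    public static void main(String[] args) {
        // Made-up host names; the probe simulates one hung server.
        List<String> all = Arrays.asList("rs1.example", "rs2.example");
        Predicate<String> probe = server -> {
            if (server.equals("rs2.example")) {
                try {
                    Thread.sleep(5000);
                } catch (InterruptedException e) {
                    return false;
                }
            }
            return true;
        };
        System.out.println("failed: " + check(all, probe, 200));
    }
}
```

In the real tool the probe would presumably be a Get against a region hosted on the given server; sharing one deadline across all probes is a simplification chosen here to keep the sketch short.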