[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13754458#comment-13754458 ]
takeshi.miao commented on HBASE-7525:
-------------------------------------

Dear [~stack]

Here are the answers to your questions.

{quote}
./hbase-0.95.3-SNAPSHOT/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.tool.Canary ... it goes off and does something; default looks to go and get from all regions.
{quote}
Yes, the default behavior is the same as the old one: it monitors all regions.

bq. You add 2013-08-29 09:32:16,463 DEBUG [main] tool.Canary: runCount=2. What does it mean ?
It is an internal DEBUG message that counts how many loops this monitor instance has run. It helps the user check whether the monitor instance behaves as expected.

The following are the questions you asked about the _'-regionserver'_ option.
{quote}
{code}
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver 2...]]
...
{code}
{quote}
{quote}
Would it be clearer if the -regionserver option took arguments as in -regionserver=rs1,rs2,rs3 etc.?
How to interpret this then:
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver=rs1 table1
Would above only get regions from table1 on rs1? If no regions from table1 then it would print out there are none?
{quote}
The _'-regionserver'_ option (regionserver mode) and the default mode (region mode) are mutually exclusive: the user can run either the default mode or regionserver mode, but not both at once.

bq. I do not know how to read 'table/regionserver 1'. What is the '1'?
It seems the usage output is confusing, so I would like to change it to the following. What do you think?
{code}
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table|regionserver [table|regionserver ...]]
...
{code}

{quote}
Or if you pass a table1 when you have a -regionserver option specified, you could just fail with "Cannot pass a tablename when using the -regionserver option" – that'd probably be simplest.
{quote}
Yes, this is a good suggestion, but for now I would rather not check whether the passed arguments are table names in HBase, because I would first need to create an HBaseAdmin instance to get the table list and then compare it with the passed arguments.
What do you think about making the usage output more precise for the -regionserver option instead? Such as...
{code}
...
-regionserver replace the table argument to regionserver, which means to enable regionserver mode, instead of region mode (default)
...
{code}
Either way is OK for me. I will upload the new patches after we confirm which way to go. Thanks for your questions and suggestions :)

> A canary monitoring program specifically for regionserver
> ---------------------------------------------------------
>
> Key: HBASE-7525
> URL: https://issues.apache.org/jira/browse/HBASE-7525
> Project: HBase
> Issue Type: New Feature
> Components: monitoring
> Affects Versions: 0.94.0
> Reporter: takeshi.miao
> Priority: Critical
> Fix For: 0.98.0
>
> Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, HBASE-7525-trunk-v2.patch, HBASE-7525-v0.patch, RegionServerCanary.java
>
>
> *Motivation*
> This ticket provides a canary monitoring tool specifically for HRegionServer. Details as follows:
> 1. The operations team requires this tool because they found that running the canary against every region of an HBase cluster is too much for them, so I implemented this coarse-grained one based on the original o.a.h.h.tool.Canary.
> 2. The tool is multi-threaded, which means each Get request is sent by a thread. I chose this approach because we have been suffering from hung region servers and the root cause is still unclear, so this tool can help the operations team detect a hung region server, if any.
> *example*
> 1. The tool docs
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
> Usage: [opts] [regionServerName 1 [regionServrName 2...]]
> regionServerName - FQDN serverName, can use linux command:hostname -f to check your serverName
> where [-opts] are:
> -help Show this help and exit.
> -e Use regionServerName as regular expression
> which means the regionServerName is regular expression pattern
> -f <B> stop whole program if first error occurs, default is true
> -t <N> timeout for a check, default is 600000 (milisecs)
> -daemon Continuous check at defined intervals.
> -interval <N> Interval between checks (sec)
> 2. Send a request to each regionserver in an HBase cluster
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary
> 3. Send a request to a regionserver with the given name
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname
> 4. Send a request to the regionserver(s) matching the given regular expression
> /opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e rs1.domainname.pattern
> // another example
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org
> 5. Send a request to a regionserver and also set a timeout limit for this test
> // query regionserver:rs1.domainname with a timeout limit of 10 sec
> // -f false means the program will not exit even if the test fails
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 10000 rs1.domainname
> // echoes "1" on timeout
> echo "$?"
> 6. Run in daemon mode, which means it sends a request to each regionserver periodically
> ./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon
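
The `-e` option in the quoted help text treats the server name argument as a regular expression. A minimal Java sketch of that selection step is shown here under stated assumptions: the class `RegionServerNameFilter` and its `filter` method are hypothetical and are not taken from any attached patch, the host names are made up, and whole-name matching is an assumption, since the issue does not say whether the tool matches the full name or only part of it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the HBASE-7525 patches.
final class RegionServerNameFilter {
    private RegionServerNameFilter() {}

    /**
     * Selects the server names addressed by one command-line target.
     * With useRegex (the -e flag in the help text) the target is compiled as a
     * regular expression that must match the whole name (assumption);
     * otherwise the target must equal the name exactly.
     */
    static List<String> filter(List<String> serverNames, String target, boolean useRegex) {
        Pattern pattern = useRegex ? Pattern.compile(target) : null;
        List<String> selected = new ArrayList<>();
        for (String name : serverNames) {
            boolean hit = useRegex ? pattern.matcher(name).matches() : name.equals(target);
            if (hit) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Made-up FQDNs standing in for the cluster's region servers.
        List<String> servers = Arrays.asList(
            "rs1.example.org", "rs2.example.org", "rs10.example.org", "master.example.org");
        // Regex mode: one or two digits after "rs", dots escaped.
        System.out.println(filter(servers, "rs[0-9]{1,2}\\.example\\.org", true));
        // Plain mode: exact name only.
        System.out.println(filter(servers, "rs1.example.org", false));
    }
}
```

With whole-name matching, a short pattern such as `rs1` selects nothing. A tool that used partial matching (`find()` instead of `matches()`) would select both `rs1.example.org` and `rs10.example.org` for the same pattern, so the choice is worth stating in the usage text.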