[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13806331#comment-13806331 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

[~eclark] Thanks for reviewing this jira :)

A canary monitoring program specifically for regionserver
---------------------------------------------------------

                Key: HBASE-7525
                URL: https://issues.apache.org/jira/browse/HBASE-7525
            Project: HBase
         Issue Type: New Feature
         Components: monitoring
   Affects Versions: 0.94.0
           Reporter: takeshi.miao
           Assignee: takeshi.miao
           Priority: Critical
            Fix For: 0.98.0, 0.96.1
        Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, HBASE-7525-0.95-v7.patch, HBASE-7525-trunk-v2.patch, HBASE-7525-trunk-v3.patch, HBASE-7525-trunk-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java

*Motivation*
This ticket provides a canary monitoring tool specifically for HRegionServer. Details:
1. The operations team asked for this tool because the existing canary, which probes every region of an HBase cluster, is too fine-grained for them. I therefore implemented this coarse-grained one, based on the original o.a.h.h.tool.Canary.
2. The tool is multi-threaded: each Get request is sent by its own thread. I chose this design because we have suffered from hung region servers and the root cause is still unclear, so the tool can help the operations team detect a hung region server, if there is one.

*example*
1. the tool docs
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -help
Usage: [opts] [regionServerName 1 [regionServrName 2...]]
 regionServerName - FQDN serverName, can use linux command:hostname -f to check your serverName
 where [-opts] are:
   -help          Show this help and exit.
   -e             Use regionServerName as regular expression
      which means the regionServerName is regular expression pattern
   -f B           stop whole program if first error occurs, default is true
   -t N           timeout for a check, default is 60 (milisecs)
   -daemon        Continuous check at defined intervals.
   -interval N    Interval between checks (sec)

2. Will send a request to each regionserver in an HBase cluster
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary

3. Will send a request to a regionserver by given name
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary rs1.domainname

4. Will send a request to regionserver(s) by given regular-expression
/opt/trend/circus-opstool/bin/hbase-canary-monitor-each-regionserver.sh -e rs1.domainname.pattern
// another example
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -e tw-poc-tm-puppet-hdn[0-9]\{1,2\}.client.tw.trendnet.org

5. Will send a request to a regionserver and also set a timeout limit for this test
// query regionserver:rs1.domainname with timeout limit 10sec
// -f false means the program will not exit even if the test fails
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -f false -t 1 rs1.domainname
// prints 1 on timeout
echo $?

6. Will run in daemon mode, which means it will send a request to each regionserver periodically
./bin/hbase org.apache.hadoop.hbase.tool.RegionServerCanary -daemon

--
This message was sent by Atlassian JIRA
(v6.1#6144)
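The design described in the Motivation above (one request per region server, each on its own thread, bounded by a timeout so that a hung server cannot block the whole check) can be sketched roughly as follows. This is an illustration only, not the code from the attached patches: the class name `RegionServerCanarySketch`, the `Probe` interface, and the host names are hypothetical, and the probe stands in for the real Get request. The `select` helper mirrors the idea behind `-e` by keeping only server names that match a regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

/** Illustrative sketch only; names here are hypothetical, not HBase APIs. */
public class RegionServerCanarySketch {

    /** Stand-in for "send one Get to a region hosted on this server". */
    public interface Probe {
        void check(String serverName) throws Exception;
    }

    /** Keeps the server names that fully match the regular expression. */
    public static List<String> select(List<String> servers, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> selected = new ArrayList<>();
        for (String server : servers) {
            if (pattern.matcher(server).matches()) {
                selected.add(server);
            }
        }
        return selected;
    }

    /**
     * Probes every server on its own thread and waits until a shared deadline
     * of timeoutMs. Returns serverName -> true if its probe finished in time
     * without throwing, false if it failed or was still running at the deadline.
     */
    public static Map<String, Boolean> checkAll(List<String> servers, Probe probe,
                                                long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        Map<String, Future<?>> futures = new LinkedHashMap<>();
        for (String server : servers) {
            // A value-returning lambda is a Callable, so checked exceptions are allowed.
            futures.put(server, pool.submit(() -> { probe.check(server); return null; }));
        }
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Map<String, Boolean> healthy = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Future<?>> entry : futures.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    entry.getValue().get(remaining, TimeUnit.NANOSECONDS);
                    healthy.put(entry.getKey(), true);
                } catch (TimeoutException | ExecutionException e) {
                    healthy.put(entry.getKey(), false);
                }
            }
        } finally {
            pool.shutdownNow(); // interrupt probes that are still stuck
        }
        return healthy;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> all = Arrays.asList("rs1.example.org", "rs2.example.org", "master.example.org");
        List<String> targets = select(all, "rs[0-9]{1,2}\\.example\\.org");
        // Simulate a hung server: rs2 sleeps far longer than the timeout.
        Probe probe = name -> { if (name.startsWith("rs2")) Thread.sleep(5000); };
        System.out.println(checkAll(targets, probe, 200));
    }
}
```

The sketch reports a result for every server, which corresponds to running with `-f false`; stopping at the first failure, as `-f true` is described to do, would mean returning from the loop as soon as one probe fails.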
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805632#comment-13805632 ]

Elliott Clark commented on HBASE-7525:
--------------------------------------

+1 there are some docs that should be added. But I can add those in a new jira.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805848#comment-13805848 ]

Hudson commented on HBASE-7525:
-------------------------------

SUCCESS: Integrated in hbase-0.96-hadoop2 #102 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/102/])
HBASE-7525 A canary monitoring program specifically for regionserver (takeshi.miao) (eclark: rev 1535847)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805903#comment-13805903 ]

Hudson commented on HBASE-7525:
-------------------------------

SUCCESS: Integrated in hbase-0.96 #162 (See [https://builds.apache.org/job/hbase-0.96/162/])
HBASE-7525 A canary monitoring program specifically for regionserver (takeshi.miao) (eclark: rev 1535847)
* /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805914#comment-13805914 ]

Hudson commented on HBASE-7525:
-------------------------------

SUCCESS: Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #811 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/811/])
HBASE-7525 A canary monitoring program specifically for regionserver (takeshi.miao) (eclark: rev 1535846)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805946#comment-13805946 ]

Hudson commented on HBASE-7525:
-------------------------------

SUCCESS: Integrated in HBase-TRUNK #4648 (See [https://builds.apache.org/job/HBase-TRUNK/4648/])
HBASE-7525 A canary monitoring program specifically for regionserver (takeshi.miao) (eclark: rev 1535846)
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776161#comment-13776161 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

Dear [~stack]
I have already uploaded the patch [HBASE-7525-trunk-v4.patch|https://issues.apache.org/jira/secure/attachment/12604509/HBASE-7525-trunk-v4.patch], but I am not sure why the CI job did not run it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13776225#comment-13776225 ]

Hadoop QA commented on HBASE-7525:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12604509/HBASE-7525-trunk-v4.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 hadoop1.0{color}.  The patch compiles against the hadoop 1.0 profile.

    {color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 2.0 profile.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.regionserver.TestAtomicOperation

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7354//console

This message is automatically generated.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
    [ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13774256#comment-13774256 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

Dear [~stack]
Sorry about this. I think I tested the patch only on the 0.95 branch and forgot to test it on trunk. It is fixed now.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772618#comment-13772618 ]

stack commented on HBASE-7525:

That should be good [~takeshi.miao]. I went to commit but trunk patch has this issue:

[INFO]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project hbase-server: Compilation failure
[ERROR] /Users/stack/checkouts/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/Canary.java:[610,43] cannot find symbol
[ERROR] symbol : method getNameAsString()
[ERROR] location: class byte[]
[ERROR] - [Help 1]

Does it work for you? Thanks.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767709#comment-13767709 ]

Hadoop QA commented on HBASE-7525:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12603227/HBASE-7525-0.95-v7.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:red}-1 hadoop1.0{color}. The patch failed to compile against the hadoop 1.0 profile.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/7232//console

This message is automatically generated.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767713#comment-13767713 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
I uploaded new patches for both 0.95 and trunk with the following changes.

1. Added a check that rejects table names passed together with the -regionserver option.
{code}
# user passes tableNames 't1' and 't2' with the '-regionserver' option
bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver t1 t2
...
# the following error msg is written to stderr
Cannot pass a tablename when using the -regionserver option, tablenames:[t1, t2]
{code}

2. Changed the usage output.
{code}
bin/hbase org.apache.hadoop.hbase.tool.Canary -help
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..]
...
{code}

3. Removed 'DEBUG [main] tool.Canary: runCount=...' from the log messages.

Please let me know if you have any questions, thanks.
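The table-name check described in the comment above could be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class name `CanaryArgCheck`, its methods, and the idea of passing in a pre-fetched set of known table names are all assumptions made for the example. Only the error message text is taken from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the HBase Canary tool.
public class CanaryArgCheck {

    /** Returns the positional arguments that match a known table name. */
    public static List<String> findTableNames(String[] targets, Set<String> knownTables) {
        List<String> hits = new ArrayList<>();
        for (String target : targets) {
            if (knownTables.contains(target)) {
                hits.add(target);
            }
        }
        return hits;
    }

    /**
     * Returns an error message when table names are mixed with regionserver
     * mode, or null when the arguments are acceptable.
     */
    public static String validate(boolean regionServerMode, String[] targets,
                                  Set<String> knownTables) {
        if (!regionServerMode) {
            return null; // default (region) mode accepts table names
        }
        List<String> tables = findTableNames(targets, knownTables);
        if (tables.isEmpty()) {
            return null;
        }
        return "Cannot pass a tablename when using the -regionserver option, tablenames:" + tables;
    }

    public static void main(String[] args) {
        // Assumption: in a real tool the table list would come from the cluster.
        Set<String> knownTables = new HashSet<>(Arrays.asList("t1", "t2"));
        String error = validate(true, new String[] {"t1", "t2"}, knownTables);
        if (error != null) {
            System.err.println(error);
        }
    }
}
```

Keeping the check as a pure function over the argument list and a set of names means it can be exercised without a running cluster; fetching the table list is left to the caller.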
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13767688#comment-13767688 ]

stack commented on HBASE-7525:

bq. Yes, it's default behavior is just align with the old one, does the all regions monitoring

Ok. The original behavior is a little 'surprising' but if it has been this way up to this, it is fair-enough changing it.

bq. It is the internal DEBUG msg, for counting how many loop of this monitor instance did; It can help user to observe the monitor instance's behavior whether as expected

I did not understand this log message. I did not seem to ask for more than one loop, so seeing more than one w/o asking for it is unexpected.

bq. The option '-regionserver' (regionserver mode) is exclusive with the default mode (region mode), which means user can only choose to use default mode or regionserver mode either

Understood. We should fix the usage to make it plainer that it is exclusive w/ table ops:

Usage: ./bin/hbase Canary [opts] [table1 [table2]...] | [regionserver1 [regionserver2]..] ...

or something like that. As is, it would seem to mix the exclusive args. Your suggestion would allow: Canary table1 regionserver2, etc. Suggest that in the usage you are more clear that it is table OR regionserver ops.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754458#comment-13754458 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
Here are the answers to your questions.

{quote}
./hbase-0.95.3-SNAPSHOT/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.tool.Canary
... it goes off and does something; default looks to go and get from all regions.
{quote}
Yes, the default behavior is the same as the old one: it monitors all regions.

bq. You add 2013-08-29 09:32:16,463 DEBUG [main] tool.Canary: runCount=2. What does it mean ?
It is an internal DEBUG msg that counts how many loops this monitor instance has run. It can help the user observe whether the monitor instance behaves as expected.

The following are the questions you asked about the _'-regionserver'_ option.
{quote}
{code}
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver 2...]]
...
{code}
{quote}
{quote}
Would it be clearer if the -regionserver option took arguments as in -regionserver=rs1,rs2,rs3 etc.? How to interpret this then:
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver=rs1 table1
Would above only get regions from table1 on rs1? If no regions from table1 then it would print out there are none?
{quote}
The _'-regionserver'_ option (regionserver mode) is mutually exclusive with the default mode (region mode), which means the user can choose either the default mode or the regionserver mode, but not both.

bq. I do not know how to read 'table/regionserver 1'. What is the '1'?
So it seems the usage output confuses the user. I would like to change it to the following; what do you think?
{code}
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table|regionserver [table|regionserver ...]]
...
{code}

{quote}
Or if you pass a table1 when you have a -regionserver option specified, you could just fail with Cannot pass a tablename when using the -regionserver option -- that'd probably be simplest.
{quote}
Yes, this is a good suggestion, but for now I would rather not check whether the passed arguments are table names in HBase, because I would first need to create an HBaseAdmin instance to get the table list and then compare it with the passed arguments.
What do you think about making the usage output more precise for the -regionserver option instead? Such as...
{code}
...
-regionserver replace the table argument to regionserver, which means to enable regionserver mode, instead of region mode (default)
...
{code}

Either way is ok for me. I will upload the new patches after we confirm which way to go, and thanks for your questions and suggestions :)
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753446#comment-13753446 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
I rebased and attached two patches, one for trunk and one for the 0.95 branch. The old patch stopped working because the _'region.getTableNameAsString()'_ method was removed.

{code:title=Old code}
...
public static void sniff(final HBaseAdmin admin, String tableName) throws Exception {
  sniff(admin, new StdOutSink(), tableName);
}
...
tableName = region.getTableNameAsString();
{code}

{code:title=New code}
...
public static void sniff(final HBaseAdmin admin, TableName tableName) throws Exception {
  sniff(admin, new StdOutSink(), tableName.getNameAsString());
}
...
tableName = region.getTableName().getNameAsString();
{code}
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13753486#comment-13753486 ]

Hadoop QA commented on HBASE-7525:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12600557/HBASE-7525-0.95-v6.patch against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.
{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:red}-1 findbugs{color}. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6962//console

This message is automatically generated.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13753807#comment-13753807 ]

stack commented on HBASE-7525:

I tried it. It is great. Nice utility. But usage needs cleanup because otherwise users will be confused on how to use it.

Did it always just run if you did not provide an argument? I.e. if I do this:

./hbase-0.95.3-SNAPSHOT/bin/hbase --config /home/stack/conf_hbase org.apache.hadoop.hbase.tool.Canary

... it goes off and does something; default looks to go and get from all regions.

You add

2013-08-29 09:32:16,463 DEBUG [main] tool.Canary: runCount=2.

What does it mean?

Here is current usage:

{code}
Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver 2...]]
 where [opts] are:
 -help Show this help and exit.
 -regionserver replace the table argument to regionserver, which means to enable regionserver mode
 -daemon Continuous check at defined intervals.
 -interval N Interval between checks (sec)
 -e Use region/regionserver as regular expression which means the region/regionserver is regular expression pattern
 -f B stop whole program if first error occurs, default is true
 -t N timeout for a check, default is 60 (milisecs)
{code}

First, the formatting is off in the above.

So, if I supply -regionserver, then the last argument is a regionserver hostname rather than a table, it seems? (I tried it and it seems so.)

I do not know how to read 'table/regionserver 1'. What is the '1'?

Would it be clearer if the -regionserver option took arguments, as in -regionserver=rs1,rs2,rs3 etc.?

How to interpret this then:

Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary -regionserver=rs1 table1

Would the above only get regions from table1 on rs1? If there are no regions from table1, would it then print out that there are none?

Or if you pass a table1 when you have a -regionserver option specified, you could just fail with 'Cannot pass a tablename when using the -regionserver option' -- that'd probably be simplest.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Critical
Fix For: 0.98.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-0.95-v6.patch, HBASE-7525-trunk-v2.patch, HBASE-7525-v0.patch, RegionServerCanary.java
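The simplest option raised in stack's comment, accepting -regionserver=rs1,rs2,rs3 and failing when a table name is passed alongside it, could look roughly like the sketch below. It is only an illustration: the class CanaryArgsSketch and its fields are hypothetical and are not taken from any of the attached patches, and every other Canary option is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical argument parser; NOT the parsing code from the HBASE-7525 patches. */
final class CanaryArgsSketch {
    /** Hosts listed in -regionserver=rs1,rs2,... (empty when the option is absent). */
    final List<String> regionServers = new ArrayList<>();
    /** Positional table names, only legal when -regionserver= is not used. */
    final List<String> tables = new ArrayList<>();

    static CanaryArgsSketch parse(String[] args) {
        final String prefix = "-regionserver=";
        CanaryArgsSketch parsed = new CanaryArgsSketch();
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                // Split the comma-separated host list, skipping empty entries.
                for (String name : arg.substring(prefix.length()).split(",")) {
                    if (!name.isEmpty()) {
                        parsed.regionServers.add(name);
                    }
                }
            } else if (arg.startsWith("-")) {
                // -help, -daemon, -e, -f, -t, -interval are out of scope here.
                throw new IllegalArgumentException("Option not handled in this sketch: " + arg);
            } else {
                parsed.tables.add(arg);
            }
        }
        // The rule proposed in the comment: the two modes are mutually exclusive.
        if (!parsed.regionServers.isEmpty() && !parsed.tables.isEmpty()) {
            throw new IllegalArgumentException(
                "Cannot pass a tablename when using the -regionserver option");
        }
        return parsed;
    }

    public static void main(String[] args) {
        CanaryArgsSketch ok = parse(new String[] {"-regionserver=rs1,rs2,rs3"});
        System.out.println("regionservers: " + ok.regionServers);
        try {
            parse(new String[] {"-regionserver=rs1", "table1"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast keeps the usage text unambiguous: a positional argument is always a table, and regionserver names only ever arrive through the option value.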
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752928#comment-13752928 ]

stack commented on HBASE-7525:

[~takeshi.miao] For the 0.95 branch and for trunk. Thank you.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Critical
Fix For: 0.98.0, 0.96.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749337#comment-13749337 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
I am not sure which branch I should rebase against for you: 0.95 or trunk? I do not see any branch for 0.98 or 0.96.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Critical
Fix For: 0.98.0, 0.96.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748316#comment-13748316 ]

stack commented on HBASE-7525:

[~takeshi.miao] I went to test this patch this evening but it has rotted in a pretty bad way (seems strange since no changes to Canary.java in a while). Any chance of a rebase? Thank you.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Critical
Fix For: 0.98.0, 0.96.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736350#comment-13736350 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
Please tell me if there is anything I can do for this ticket. Thanks a lot.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731848#comment-13731848 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack]
I have added the rebased patch. Please note that I am still using a Scan for the empty startRowKey case, because I still hit the following exception while testing in my environment.

{code}
...
13/08/07 10:05:11 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x6a3b8b49 connecting to ZooKeeper ensemble=localhost:2181
13/08/07 10:05:11 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost.localdomain/127.0.0.1:2181, sessionid = 0x140583a69570010, negotiated timeout = 9
Exception in thread "Thread-1" java.lang.IllegalArgumentException: Row length is 0
 at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:364)
 at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:348)
 at org.apache.hadoop.hbase.client.Get.<init>(Get.java:86)
 at org.apache.hadoop.hbase.tool.Canary$RegionServerMonitor.monitorRegionServers(Canary.java:563)
 at org.apache.hadoop.hbase.tool.Canary$RegionServerMonitor.run(Canary.java:540)
 at java.lang.Thread.run(Thread.java:662)
{code}

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java
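The workaround described in this comment can be pictured as a small decision step: the stack trace shows the Get constructor rejecting a zero-length row, and the first region of a table has an empty start key, so that region is probed with a Scan instead. The sketch below is an assumption-laden illustration, not the patch code: ProbeChoiceSketch, Probe and chooseProbe are hypothetical names, and the HBase client classes are left out so that the snippet compiles on its own.

```java
/** Hypothetical helper; the real patch works with the HBase client API directly. */
final class ProbeChoiceSketch {
    /** Which kind of request the canary would send to a region. */
    enum Probe { GET, SCAN }

    /**
     * Picks the probe type from a region's start key. An empty (or missing)
     * start key cannot be used as the row of a Get, so fall back to a Scan.
     */
    static Probe chooseProbe(byte[] regionStartKey) {
        if (regionStartKey == null || regionStartKey.length == 0) {
            return Probe.SCAN;
        }
        return Probe.GET;
    }

    public static void main(String[] args) {
        System.out.println("first region: " + chooseProbe(new byte[0]));
        System.out.println("later region: " + chooseProbe(new byte[] {'r', 'o', 'w'}));
    }
}
```

In a real canary the SCAN branch would presumably be limited to a single row so that the probe stays as cheap as a Get; that detail is not shown in the thread.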
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731889#comment-13731889 ]

Hadoop QA commented on HBASE-7525:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12596576/HBASE-7525-0.95-v4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/6633//console

This message is automatically generated.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-0.95-v4.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729743#comment-13729743 ]

stack commented on HBASE-7525:

[~takeshi.miao] Pardon us for overlooking this addition. Please rebase (I think the scan instead of get for the first key in table has been addressed) and let's get it in.

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729100#comment-13729100 ]

takeshi.miao commented on HBASE-7525:

Dear [~stack] [~mbertozzi]
I am wondering what you think about this ticket?

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-0.95-v3.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13646538#comment-13646538 ]

takeshi.miao commented on HBASE-7525:

[~mbertozzi] The new patch is attached; could you please take a look at it?

A canary monitoring program specifically for regionserver
Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-0.95-v1.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646539#comment-13646539 ]

Hadoop QA commented on HBASE-7525:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581354/HBASE-7525-0.95-v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5519//console

This message is automatically generated.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646553#comment-13646553 ]

Matteo Bertozzi commented on HBASE-7525:
----------------------------------------

Thanks for the quick follow-up.

To make Hadoop QA happy, use git diff > HBASE-XYZ.patch (see the "unable to apply patch" error above).

For getStartKey(), you're right, it's not null, but even an empty byte array doesn't pass the test: Mutation.checkRow() throws an exception on length == 0.
{code}
java.lang.IllegalArgumentException: Row length is 0
	at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:335)
	at org.apache.hadoop.hbase.client.Mutation.checkRow(Mutation.java:319)
	at org.apache.hadoop.hbase.client.Get.<init>(Get.java:86)
	at org.apache.hadoop.hbase.tool.Canary$RegionMonitor.sniffRegion(Canary.java:483)
	at org.apache.hadoop.hbase.tool.Canary$RegionMonitor.sniff(Canary.java:463)
	at org.apache.hadoop.hbase.tool.Canary$RegionMonitor.sniff(Canary.java:433)
	at org.apache.hadoop.hbase.tool.Canary$RegionMonitor.run(Canary.java:380)
{code}

My simple test is this, and you get a misleading error on the first region due to the empty key.
{code}
$ hbase shell
create 'testtb', 'cf'
put 'testtb', 'row0', 'cf:q', '0'
put 'testtb', 'row1', 'cf:q', '1'
put 'testtb', 'row2', 'cf:q', '2'

$ hbase org.apache.hadoop.hbase.tool.Canary
...
2013-05-01 13:58:50,310 ERROR [Thread-0] tool.Canary: read from region testtb,,1367412865960.99b4d7e3b71c1f5292bc96fad28bb67e. failed
{code}

It may also be useful to add at least a LOG.debug() with the exception inside every catch block, just to get an idea of what's going wrong (like the Get failure above).
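The suggestion above, logging the exception in each catch block so the real cause of a "read from region ... failed" report is visible, can be sketched as follows. This is only an illustration under assumptions: it uses java.util.logging (FINE standing in for debug) rather than the logging library HBase uses, and checkRow and sniff here are hypothetical stand-ins that imitate the empty-row failure from the stack trace, not the HBase methods themselves.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch: keep the short error report, but also log the underlying exception. */
final class CatchLoggingSketch {
  private static final Logger LOG = Logger.getLogger(CatchLoggingSketch.class.getName());

  /** Stand-in for the row check: rejects a missing or empty row key. */
  static void checkRow(byte[] row) {
    if (row == null || row.length == 0) {
      throw new IllegalArgumentException("Row length is 0");
    }
  }

  /** Returns true if the simulated read succeeded; on failure logs both the report and the cause. */
  static boolean sniff(String regionName, byte[] startKey) {
    try {
      checkRow(startKey);
      return true;
    } catch (IllegalArgumentException e) {
      LOG.log(Level.SEVERE, "read from region " + regionName + " failed");
      // Debug-level record that carries the exception, so the cause is not lost.
      LOG.log(Level.FINE, "cause of failed read from region " + regionName, e);
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("empty start key ok: " + sniff("testtb,,first", new byte[0]));
    System.out.println("row0 start key ok: " + sniff("testtb,row0,second", "row0".getBytes()));
  }
}
```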
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646626#comment-13646626 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

[~mbertozzi], I'd like to use Scan to solve the region.getStartKey() issue, what do you think?
{code}
startKey = region.getStartKey();
// A Get needs a non-empty row key; the first region has an empty start key.
if (startKey.length > 0) {
  get = new Get(startKey);
  get.addFamily(column.getName());
} else {
  scan = new Scan();
  scan.setCaching(1);
  scan.addFamily(column.getName());
}
...
if (startKey.length > 0) {
  table.get(get);
} else {
  table.getScanner(scan);
}
{code}
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646632#comment-13646632 ]

Matteo Bertozzi commented on HBASE-7525:
----------------------------------------

Yeah, it seems a good solution. Also add scan.setMaxResultSize(1).
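The Get-or-Scan fallback agreed on above can be illustrated without an HBase cluster. The sketch below is an assumption-laden model, not the patch: a sorted TreeMap plays the role of a region's rows, String keys replace byte arrays, and the names EmptyStartKeyProbeSketch, chooseProbe and probe are hypothetical. It shows only the decision itself: a non-empty start key is read directly, an empty one falls back to reading the first row, which is what a scan limited to one result would return.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: choose a point read or a one-row scan depending on the start key. */
final class EmptyStartKeyProbeSketch {
  enum ProbeKind { GET, SCAN }

  /** An empty start key cannot be used for a point read, so fall back to a scan. */
  static ProbeKind chooseProbe(String startKey) {
    return startKey.isEmpty() ? ProbeKind.SCAN : ProbeKind.GET;
  }

  /** Reads at most one row from the in-memory "region"; returns its key, or null if none. */
  static String probe(TreeMap<String, String> region, String startKey) {
    if (chooseProbe(startKey) == ProbeKind.GET) {
      return region.containsKey(startKey) ? startKey : null;
    }
    // Scan path: take only the first row, mirroring a scan capped at one result.
    Map.Entry<String, String> first = region.firstEntry();
    return first == null ? null : first.getKey();
  }

  public static void main(String[] args) {
    TreeMap<String, String> region = new TreeMap<>();
    region.put("row0", "0");
    region.put("row1", "1");
    region.put("row2", "2");
    System.out.println("probe with empty start key read: " + probe(region, ""));
    System.out.println("probe with start key row1 read: " + probe(region, "row1"));
  }
}
```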
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13646773#comment-13646773 ]

Hadoop QA commented on HBASE-7525:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12581374/HBASE-7525-0.95-v3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5520//console

This message is automatically generated.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645465#comment-13645465 ]

Matteo Bertozzi commented on HBASE-7525:
----------------------------------------

At first look it seems ok. As Hadoop QA has reported, there's one line over 100 characters in printUsageAndExit() ("-f B").

Some names could be changed, and a one-line doc on the long ones may be useful:
Monitor.isError(): maybe rename to hasError()
Monitor.initialAdmin(): maybe initAdmin()
Monitor.doPrepareFilteredRegionServerAndRegionsMap(): is it supposed to be something like filterRegionServerByName?

Give me some more time to try it and I'll get back with other feedback.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645593#comment-13645593 ]

Matteo Bertozzi commented on HBASE-7525:
----------------------------------------

There's a problem with the first region: Get(region.getStartKey()) throws an exception since the start key may be null, so you get a wrong report, "read from region xyz failed".

Could you extract the list of names in Canary.run(), where you do all the other parsing, instead of passing args + index around? Or at least add a comment on what args + index is... it's a bit anonymous if you don't read the rest of the code.

It's not documented in the coding style, but I don't like the Yoda conditions (null != obj). We already have some of them in, but most of the code is (obj != null), so my guess is that it's better to stay in line with what we already have.
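The request above, to extract the list of names once where the options are parsed instead of passing args + index around, could look roughly like this. It is a sketch under assumptions: the flags -e, -f, -t and -daemon are the ones listed in the tool's usage text, but the class CanaryArgsSketch, its Options holder and the parse method are hypothetical and do not reflect how Canary.run() is actually written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: parse the options once and hand back the remaining server names. */
final class CanaryArgsSketch {

  /** Parsed command line; later code reads these fields instead of raw args plus an index. */
  static final class Options {
    boolean useRegex = false;
    boolean daemon = false;
    boolean failOnError = true;   // usage text: "-f B ... default is true"
    long timeout = 60;            // usage text: "-t N ... default is 60"
    final List<String> serverNames = new ArrayList<>();
  }

  static Options parse(String[] args) {
    Options opts = new Options();
    int i = 0;
    // Consume leading options; the first argument without a dash starts the name list.
    while (i < args.length && args[i].startsWith("-")) {
      String flag = args[i++];
      switch (flag) {
        case "-e": opts.useRegex = true; break;
        case "-daemon": opts.daemon = true; break;
        case "-f": opts.failOnError = Boolean.parseBoolean(requireValue(args, i++, flag)); break;
        case "-t": opts.timeout = Long.parseLong(requireValue(args, i++, flag)); break;
        default: throw new IllegalArgumentException("Unknown option: " + flag);
      }
    }
    opts.serverNames.addAll(Arrays.asList(args).subList(i, args.length));
    return opts;
  }

  private static String requireValue(String[] args, int index, String flag) {
    if (index >= args.length) {
      throw new IllegalArgumentException(flag + " needs a value");
    }
    return args[index];
  }

  public static void main(String[] args) {
    Options opts = parse(new String[] {"-f", "false", "-t", "1", "rs1.domainname"});
    System.out.println("server names: " + opts.serverNames);
  }
}
```

With the names collected in one place, the monitor can take a plain list of server names, and a reader no longer needs to know what a given index into args means.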
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645604#comment-13645604 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

Do I need to do any follow-up? This is my first time contributing to the community, so please remind me if I miss anything, thanks a lot.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13645622#comment-13645622 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

[~mbertozzi], I got it. I will modify the code to address the issues you raised. I may ask more questions if anything needs to be confirmed with you, thanks a lot.
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644319#comment-13644319 ]

Hadoop QA commented on HBASE-7525:
----------------------------------

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12577525/HBASE-7525-0.95-v0.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the trunk's current 0 warnings).

{color:red}-1 lineLengths{color}. The patch introduces lines longer than 100

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/5481//console

This message is automatically generated.

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13644705#comment-13644705 ]

stack commented on HBASE-7525:
------------------------------

[~mbertozzi] You mind taking a looksee, boss? It modifies your Canary program.

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.95.0
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625336#comment-13625336 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

I also tested it with the hbase-0.95 branch.

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625339#comment-13625339 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

The new usage output will look like this:

  Usage: bin/hbase org.apache.hadoop.hbase.tool.Canary [opts] [table/regionserver 1 [table/regionserver 2...]]
   where [opts] are:
     -help          Show this help and exit.
     -regionserver  replace the table argument to regionserver, which means to enable regionserver mode
     -daemon        Continuous check at defined intervals.
     -interval N    Interval between checks (sec)
     -e             Use region/regionserver as regular expression which means the region/regionserver is regular expression pattern
     -f B           stop whole program if first error occurs, default is true
     -t N           timeout for a check, default is 60 (milisecs)

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Attachments: HBASE-7525-0.95-v0.patch, HBASE-7525-v0.patch, RegionServerCanary.java
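To make the proposed option set in the comment above concrete, here is a minimal sketch of how such flags could be parsed. It is not taken from the attached patches: the class name `CanaryOptionsSketch`, its field names, and the `-1` "not set" marker for `-interval` are assumptions made for illustration. Only the flag names and the two stated defaults (`-f` true, `-t` 60) come from the usage text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the options listed in the usage text; not the patch's code.
class CanaryOptionsSketch {
    boolean help = false;
    boolean regionServerMode = false;  // -regionserver
    boolean daemon = false;            // -daemon
    boolean useRegex = false;          // -e
    boolean failFast = true;           // -f B, usage text says "default is true"
    long timeoutMs = 60;               // -t N, usage text says "default is 60 (milisecs)"
    long intervalSec = -1;             // -interval N; the usage text states no default, -1 means "not set"
    final List<String> targets = new ArrayList<>();  // table or regionserver names/patterns

    static CanaryOptionsSketch parse(String[] args) {
        CanaryOptionsSketch o = new CanaryOptionsSketch();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (!a.startsWith("-")) {
                // Anything that is not a flag is treated as a table/regionserver argument.
                o.targets.add(a);
                continue;
            }
            switch (a) {
                case "-help": o.help = true; break;
                case "-regionserver": o.regionServerMode = true; break;
                case "-daemon": o.daemon = true; break;
                case "-e": o.useRegex = true; break;
                case "-f": o.failFast = Boolean.parseBoolean(value(args, ++i, a)); break;
                case "-t": o.timeoutMs = Long.parseLong(value(args, ++i, a)); break;
                case "-interval": o.intervalSec = Long.parseLong(value(args, ++i, a)); break;
                default: throw new IllegalArgumentException("Unknown option: " + a);
            }
        }
        return o;
    }

    // Returns the value that follows an option, or fails if the option is the last argument.
    private static String value(String[] args, int i, String opt) {
        if (i >= args.length) {
            throw new IllegalArgumentException(opt + " needs a value");
        }
        return args[i];
    }
}
```

With this sketch, `CanaryOptionsSketch.parse(new String[] {"-regionserver", "-f", "false", "-t", "1", "rs1.domainname"})` would enable regionserver mode, disable fail-fast, set the timeout to 1, and record `rs1.domainname` as the only target.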
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13548542#comment-13548542 ]

Jonathan Hsieh commented on HBASE-7525:
---------------------------------------

Can you compare how this is related to HBASE-4393?

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13548894#comment-13548894 ]

Andrew Purtell commented on HBASE-7525:
---------------------------------------

The ideas are good, curious if it's possible to submit this as an incremental change on the existing utility?

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549310#comment-13549310 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

This answers Jonathan Hsieh's question.

There are 4 differences compared with HBASE-4393:
1. This tool picks any one region from each region server to monitor, not every region in the whole HBase cluster.
2. This tool is multi-threaded, so it is not blocked if a region server hangs.
3. This tool takes one or more region server FQDNs as arguments and monitors the given region servers.
3.1 It monitors all region servers if no argument is given.
4. This tool can also take one or more regular expression patterns for region server FQDNs, for ease of use.

I use this tool in our internal HBase operations, so I think other people may have the same requirements.

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-7525-v0.patch, RegionServerCanary.java
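The four differences listed in the comment above (one probe per region server, one thread per probe, explicit server names, optional regular expressions) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in RegionServerCanary.java: the `Probe` interface is a hypothetical stand-in for "send one Get to any one region hosted on this server", and the class and method names are invented for the example. It uses only `java.util.concurrent` and `java.util.regex`, so it runs without an HBase cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

class RegionServerProbeSketch {

    /** Hypothetical stand-in for one Get against any one region on the given server. */
    interface Probe {
        void probe(String serverName) throws Exception;
    }

    /** Keeps servers whose FQDN fully matches at least one pattern; no patterns means all servers. */
    static List<String> select(List<String> servers, List<String> patterns) {
        if (patterns.isEmpty()) {
            return new ArrayList<>(servers);
        }
        List<String> out = new ArrayList<>();
        for (String s : servers) {
            for (String p : patterns) {
                if (Pattern.matches(p, s)) {
                    out.add(s);
                    break;
                }
            }
        }
        return out;
    }

    /** Probes every server on its own thread; a hung or failing server is reported as unhealthy. */
    static Map<String, Boolean> checkAll(List<String> servers, Probe probe, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        Map<String, Future<Void>> pending = new LinkedHashMap<>();
        for (String s : servers) {
            Callable<Void> task = () -> { probe.probe(s); return null; };
            pending.put(s, pool.submit(task));
        }
        // One shared deadline, so several hung servers do not add up their timeouts.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        Map<String, Boolean> healthy = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, Future<Void>> e : pending.entrySet()) {
                long remaining = Math.max(0L, deadline - System.nanoTime());
                try {
                    e.getValue().get(remaining, TimeUnit.NANOSECONDS);
                    healthy.put(e.getKey(), true);
                } catch (TimeoutException | ExecutionException ex) {
                    healthy.put(e.getKey(), false);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    healthy.put(e.getKey(), false);
                }
            }
        } finally {
            pool.shutdownNow();  // interrupt probes that are still stuck
        }
        return healthy;
    }
}
```

Because each probe runs on its own thread and the caller waits against a deadline, a single hung region server delays the overall check by at most the timeout instead of blocking it, which is the behaviour point 2 of the comment describes.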
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549314#comment-13549314 ]

takeshi.miao commented on HBASE-7525:
-------------------------------------

Andrew Purtell, yes, I can merge this tool into o.a.h.h.tool.Canary in the next couple of days.
Do I need to open a new ticket for this merge work?

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-7525-v0.patch, RegionServerCanary.java
[jira] [Commented] (HBASE-7525) A canary monitoring program specifically for regionserver
[ https://issues.apache.org/jira/browse/HBASE-7525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13549319#comment-13549319 ]

Andrew Purtell commented on HBASE-7525:
---------------------------------------

bq. Do I need to issue a new ticket for this merge work ?

This ticket is ok.

A canary monitoring program specifically for regionserver
---------------------------------------------------------

Key: HBASE-7525
URL: https://issues.apache.org/jira/browse/HBASE-7525
Project: HBase
Issue Type: New Feature
Components: monitoring
Affects Versions: 0.94.0
Reporter: takeshi.miao
Priority: Minor
Fix For: 0.94.0
Attachments: HBASE-7525-v0.patch, RegionServerCanary.java