[jira] [Created] (HBASE-17598) new archetypes apparently not deployed to Maven Central Repository
Daniel Vimont created HBASE-17598: - Summary: new archetypes apparently not deployed to Maven Central Repository Key: HBASE-17598 URL: https://issues.apache.org/jira/browse/HBASE-17598 Project: HBase Issue Type: Bug Components: Usability Affects Versions: 1.3.0, 1.3.1 Reporter: Daniel Vimont On search.maven.org, one sees that most of the artifacts for the 1.3.0 release have been successfully deployed, but neither of the two new archetypes can apparently be found there: {{hbase-client-project-archetype}} {{hbase-shaded-client-project-archetype}} We do see the projects used to *build* the archetypes (which actually probably don't need to be published via Maven), but we do not see the archetypes themselves. For a specific example, the following artifacts (POM and JAR) need to have been deployed for the first archetype listed above: {{hbase/hbase-archetypes/hbase-client-project/target/build-archetype/target/generated-sources/archetype/pom.xml}} {{hbase/hbase-archetypes/hbase-client-project/target/build-archetype/target/generated-sources/archetype/target/hbase-client-project-archetype-1.3.0.jar}} (The awkwardly deep directory structures are what the Maven archetype-generation tools generate during the build of hbase.) BTW, the Maven documentation on the deployment of archetypes amounts to a single sentence: "Once you are happy with the state of your archetype, you can deploy (or submit it to Maven Central) it as any other artifact and the archetype will then be available to any user of Maven." That sentence is found on this page: https://maven.apache.org/guides/mini/guide-creating-archetypes.html -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17597) TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky
Duo Zhang created HBASE-17597: - Summary: TestMetaWithReplicas.testMetaTableReplicaAssignment is flaky Key: HBASE-17597 URL: https://issues.apache.org/jira/browse/HBASE-17597 Project: HBase Issue Type: Bug Components: test Affects Versions: 1.4.0 Reporter: Duo Zhang Fix For: 1.4.0 The problem is that we get an NPE when getting the ServerName from the HRegionLocation. I think this is a test issue, not something wrong with our code. The location of the meta region is fetched from zk, and it could be null if the region has not been assigned yet. We should deal with a null HRegionLocation in the test code.
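The fix described above — treating a null location (or a location without a server) as "not assigned yet" instead of dereferencing it — can be sketched as below. The class and method names are illustrative stand-ins, not the actual HBase API; the real fix would live in TestMetaWithReplicas.testMetaTableReplicaAssignment against the real HRegionLocation type.

```java
import java.util.Optional;

// Minimal sketch of the null-safe pattern suggested in the issue. A tiny
// local class stands in for org.apache.hadoop.hbase.HRegionLocation so the
// sketch is self-contained.
public class NullSafeLocation {

  // Stand-in for HRegionLocation; the server name may be null when the
  // region has not been assigned yet.
  static class HRegionLocation {
    private final String serverName;
    HRegionLocation(String serverName) { this.serverName = serverName; }
    String getServerName() { return serverName; }
  }

  // Returns the server name only when the location fetched from zk is
  // present and carries a server; otherwise empty, so the test can wait
  // and retry instead of throwing an NPE.
  static Optional<String> serverOf(HRegionLocation loc) {
    if (loc == null || loc.getServerName() == null) {
      return Optional.empty();
    }
    return Optional.of(loc.getServerName());
  }

  public static void main(String[] args) {
    System.out.println(serverOf(null).isPresent());                      // false
    System.out.println(serverOf(new HRegionLocation(null)).isPresent()); // false
    System.out.println(serverOf(new HRegionLocation("rs1")).get());      // rs1
  }
}
```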
Re: Canary Test Tool and write sniffing
A brief search on HBASE-4393 didn't reveal why the interval was shortened. If you read the first paragraph of: http://hbase.apache.org/book.html#_run_canary_test_as_daemon_mode possibly the reasoning was that the canary would exit upon seeing some error (the first time). BTW, there was a mismatch in the description for this command: (5 seconds vs. 5 milliseconds) ${HBASE_HOME}/bin/hbase canary -daemon -interval 5 -f false On Sat, Feb 4, 2017 at 8:21 AM, Lars George wrote: > Oh right, Ted. An earlier patch attached to the JIRA had 60 secs, the > last one has 6 secs. Am I reading this right? It hands 6000 into the > Thread.sleep() call, which takes millisecs. So that makes 6 secs > between checks, which seems super short, no? I might just be dull here. > > On Sat, Feb 4, 2017 at 5:00 PM, Ted Yu wrote: > > For the default interval, if you were looking at: > > > > private static final long DEFAULT_INTERVAL = 6000; > > > > The above was from: > > > > HBASE-4393 Implement a canary monitoring program > > > > which was integrated on Tue Apr 24 07:20:16 2012 > > > > FYI > > > > On Sat, Feb 4, 2017 at 4:06 AM, Lars George > wrote: > > > >> Also, the default interval used to be 60 secs, but is now 6 secs. Does > >> that make sense? Seems awfully short for a default, assuming you have > >> many regions or servers. > >> > >> On Sat, Feb 4, 2017 at 11:54 AM, Lars George > >> wrote: > >> > Hi, > >> > > >> > Looking at the Canary tool, it tries to ensure that all canary test > >> > table regions are spread across all region servers. If that is not the > >> > case, it calls: > >> > > >> > if (numberOfCoveredServers < numberOfServers) { > >> > admin.balancer(); > >> > } > >> > > >> > I doubt this will help with the StochasticLoadBalancer, which is known > >> > to consider per-table balancing as one of many factors. In practice, > >> > the SLB will most likely _not_ distribute the canary regions > >> > sufficiently, leaving a gap in the check. Switching on the per-table > >> > option is discouraged, so as to let it do its thing. > >> > > >> > Just pointing it out for vetting. > >> > > >> > Lars > >> >
Re: Canary Test Tool and write sniffing
Oh right, Ted. An earlier patch attached to the JIRA had 60 secs, the last one has 6 secs. Am I reading this right? It hands 6000 into the Thread.sleep() call, which takes millisecs. So that makes 6 secs between checks, which seems super short, no? I might just be dull here. On Sat, Feb 4, 2017 at 5:00 PM, Ted Yu wrote: > For the default interval, if you were looking at: > > private static final long DEFAULT_INTERVAL = 6000; > > The above was from: > > HBASE-4393 Implement a canary monitoring program > > which was integrated on Tue Apr 24 07:20:16 2012 > > FYI > > On Sat, Feb 4, 2017 at 4:06 AM, Lars George wrote: > >> Also, the default interval used to be 60 secs, but is now 6 secs. Does >> that make sense? Seems awfully short for a default, assuming you have >> many regions or servers. >> >> On Sat, Feb 4, 2017 at 11:54 AM, Lars George >> wrote: >> > Hi, >> > >> > Looking at the Canary tool, it tries to ensure that all canary test >> > table regions are spread across all region servers. If that is not the >> > case, it calls: >> > >> > if (numberOfCoveredServers < numberOfServers) { >> > admin.balancer(); >> > } >> > >> > I doubt this will help with the StochasticLoadBalancer, which is known >> > to consider per-table balancing as one of many factors. In practice, >> > the SLB will most likely _not_ distribute the canary regions >> > sufficiently, leaving a gap in the check. Switching on the per-table >> > option is discouraged, so as to let it do its thing. >> > >> > Just pointing it out for vetting. >> > >> > Lars >>
Re: Canary Test Tool and write sniffing
For the default interval, if you were looking at: private static final long DEFAULT_INTERVAL = 6000; The above was from: HBASE-4393 Implement a canary monitoring program which was integrated on Tue Apr 24 07:20:16 2012 FYI On Sat, Feb 4, 2017 at 4:06 AM, Lars George wrote: > Also, the default interval used to be 60 secs, but is now 6 secs. Does > that make sense? Seems awfully short for a default, assuming you have > many regions or servers. > > On Sat, Feb 4, 2017 at 11:54 AM, Lars George > wrote: > > Hi, > > > > Looking at the Canary tool, it tries to ensure that all canary test > > table regions are spread across all region servers. If that is not the > > case, it calls: > > > > if (numberOfCoveredServers < numberOfServers) { > > admin.balancer(); > > } > > > > I doubt this will help with the StochasticLoadBalancer, which is known > > to consider per-table balancing as one of many factors. In practice, > > the SLB will most likely _not_ distribute the canary regions > > sufficiently, leaving a gap in the check. Switching on the per-table > > option is discouraged, so as to let it do its thing. > > > > Just pointing it out for vetting. > > > > Lars >
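The units confusion in this thread can be made concrete: Thread.sleep() takes milliseconds, so a constant of 6000 means 6 seconds, not 60. A small sketch (the constant names are illustrative, not the Canary source) shows how spelling the unit out via TimeUnit avoids the trap:

```java
import java.util.concurrent.TimeUnit;

// Illustrates the seconds-vs-milliseconds confusion discussed above.
public class CanaryInterval {

  // What the thread says the code has today: 6000 ms, i.e. 6 seconds.
  static final long DEFAULT_INTERVAL_MS = 6000;

  // The unambiguous way to write "60 seconds" for a Thread.sleep() call.
  static final long SIXTY_SECONDS_MS = TimeUnit.SECONDS.toMillis(60);

  public static void main(String[] args) {
    // 6000 ms is only 6 seconds between checks.
    System.out.println(TimeUnit.MILLISECONDS.toSeconds(DEFAULT_INTERVAL_MS)); // 6
    System.out.println(SIXTY_SECONDS_MS); // 60000
  }
}
```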
Re: Health Script does not stop region server
Running the command from the script locally (on Mac): $ /usr/bin/snmpwalk -t 5 -Oe -Oq -Os -v 1 -c public localhost if Timeout: No Response from localhost $ echo $? 1 Looks like the script should parse the output from snmpwalk and provide some hint if an unexpected result is reported. Cheers On Sat, Feb 4, 2017 at 6:40 AM, Lars George wrote: > Hi, > > I tried the supplied `healthcheck.sh`, but did not have snmpd running. > That caused the script to take a long time to error out, which exceeded > the 10 seconds the check was meant to run. That resets the check and > it keeps reporting the error, but never stops the servers: > > 2017-02-04 05:55:08,962 INFO > [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020] > hbase.HealthCheckChore: Health Check Chore runs every 10sec > 2017-02-04 05:55:08,975 INFO > [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020] > hbase.HealthChecker: HealthChecker initialized with script at > /opt/hbase/bin/healthcheck.sh, timeout=6 > > ... > > 2017-02-04 05:55:50,435 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.HealthCheckChore: Health status at 412837hrs, 55mins, 50sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:55:50,436 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.ScheduledChore: Chore: CompactionChecker missed its start time > 2017-02-04 05:55:50,437 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.ScheduledChore: Chore: > slave-1.internal.larsgeorge.com,16020,1486216506007-MemstoreFlusherChore > missed its start time > 2017-02-04 05:55:50,438 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:56:20,522 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 20sec : > ERROR check link, OK: disks ok, > > 
2017-02-04 05:56:20,523 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:56:50,600 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 50sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:56:50,600 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:57:20,681 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 20sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:57:20,681 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:57:50,763 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 50sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:57:50,764 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:58:20,844 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 20sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:58:20,844 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:58:50,923 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 50sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:58:50,923 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] > 
hbase.ScheduledChore: Chore: HealthChecker missed its start time > 2017-02-04 05:59:21,017 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.HealthCheckChore: Health status at 412837hrs, 59mins, 21sec : > ERROR check link, OK: disks ok, > > 2017-02-04 05:59:21,018 INFO > [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] > hbase.ScheduledChore: Chore: HealthChecker missed its start time > > That seems like a bug, no? > > Lars >
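The behaviour described in this thread — a hung script outliving its budget and resetting the chore instead of counting as a failure — suggests running the script with a hard timeout and treating a timeout as a failed check. A minimal sketch of that pattern follows; the class, enum, and method names are illustrative, not the actual HBase HealthChecker API, and a real fix would also have to feed the result into the failure-threshold logic that stops the server.

```java
import java.util.concurrent.TimeUnit;

// Sketch: run a health-check command, kill it if it exceeds the timeout,
// and report a distinct TIMED_OUT status so a hung script (e.g. snmpd not
// running) counts toward the failure threshold instead of being dropped.
public class HealthCheckSketch {

  enum Status { OK, FAILED, TIMED_OUT }

  static Status run(String[] cmd, long timeoutMs) {
    try {
      Process p = new ProcessBuilder(cmd).start();
      if (!p.waitFor(timeoutMs, TimeUnit.MILLISECONDS)) {
        p.destroyForcibly();     // script hung past its budget
        return Status.TIMED_OUT; // must count as a failed check
      }
      return p.exitValue() == 0 ? Status.OK : Status.FAILED;
    } catch (Exception e) {
      return Status.FAILED;      // could not even launch the script
    }
  }

  public static void main(String[] args) {
    // "true" exits 0 immediately; "sleep 5" blows a 200 ms budget
    // (assumes a POSIX system with these commands on the PATH).
    System.out.println(run(new String[] {"true"}, 5000));
    System.out.println(run(new String[] {"sleep", "5"}, 200));
  }
}
```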
Successful: HBase Generate Website
Build status: Successful

If successful, the website and docs have been generated. To update the live site, follow the instructions below. If failed, skip to the bottom of this email.

Use the following commands to download the patch and apply it to a clean branch based on origin/asf-site. If you prefer to keep the hbase-site repo around permanently, you can skip the clone step.

git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
cd hbase-site
wget -O- https://builds.apache.org/job/hbase_generate_website/479/artifact/website.patch.zip | funzip > 4e77b18da2515a14772a456f408ee34376a3c71f.patch
git fetch
git checkout -b asf-site-4e77b18da2515a14772a456f408ee34376a3c71f origin/asf-site
git am --whitespace=fix 4e77b18da2515a14772a456f408ee34376a3c71f.patch

At this point, you can preview the changes by opening index.html or any of the other HTML pages in your local asf-site-4e77b18da2515a14772a456f408ee34376a3c71f branch. There are lots of spurious changes, such as timestamps and CSS styles in tables, so a generic git diff is not very useful. To see a list of files that have been added, deleted, renamed, changed type, or are otherwise interesting, use the following command:

git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these commands:

git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
git push origin asf-site-4e77b18da2515a14772a456f408ee34376a3c71f:asf-site
git checkout asf-site
git branch -D asf-site-4e77b18da2515a14772a456f408ee34376a3c71f

Changes take a couple of minutes to be propagated. You can verify whether they have been propagated by looking at the Last Published date at the bottom of http://hbase.apache.org/. It should match the date in the index.html on the asf-site branch in Git.
As a courtesy, reply-all to this email to let other committers know you pushed the site. If failed, see https://builds.apache.org/job/hbase_generate_website/479/console
Health Script does not stop region server
Hi,

I tried the supplied `healthcheck.sh`, but did not have snmpd running. That caused the script to take a long time to error out, which exceeded the 10 seconds the check was meant to run. That resets the check and it keeps reporting the error, but never stops the servers:

2017-02-04 05:55:08,962 INFO [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020] hbase.HealthCheckChore: Health Check Chore runs every 10sec
2017-02-04 05:55:08,975 INFO [regionserver/slave-1.internal.larsgeorge.com/10.0.10.10:16020] hbase.HealthChecker: HealthChecker initialized with script at /opt/hbase/bin/healthcheck.sh, timeout=6

...

2017-02-04 05:55:50,435 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.HealthCheckChore: Health status at 412837hrs, 55mins, 50sec : ERROR check link, OK: disks ok,
2017-02-04 05:55:50,436 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: CompactionChecker missed its start time
2017-02-04 05:55:50,437 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: slave-1.internal.larsgeorge.com,16020,1486216506007-MemstoreFlusherChore missed its start time
2017-02-04 05:55:50,438 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:56:20,522 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 20sec : ERROR check link, OK: disks ok,
2017-02-04 05:56:20,523 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:56:50,600 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.HealthCheckChore: Health status at 412837hrs, 56mins, 50sec : ERROR check link, OK: disks ok,
2017-02-04 05:56:50,600 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:57:20,681 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 20sec : ERROR check link, OK: disks ok,
2017-02-04 05:57:20,681 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:57:50,763 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.HealthCheckChore: Health status at 412837hrs, 57mins, 50sec : ERROR check link, OK: disks ok,
2017-02-04 05:57:50,764 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:58:20,844 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 20sec : ERROR check link, OK: disks ok,
2017-02-04 05:58:20,844 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:58:50,923 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.HealthCheckChore: Health status at 412837hrs, 58mins, 50sec : ERROR check link, OK: disks ok,
2017-02-04 05:58:50,923 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_1] hbase.ScheduledChore: Chore: HealthChecker missed its start time
2017-02-04 05:59:21,017 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.HealthCheckChore: Health status at 412837hrs, 59mins, 21sec : ERROR check link, OK: disks ok,
2017-02-04 05:59:21,018 INFO [slave-1.internal.larsgeorge.com,16020,1486216506007_ChoreService_2] hbase.ScheduledChore: Chore: HealthChecker missed its start time

That seems like a bug, no?

Lars
Re: Canary Test Tool and write sniffing
Also, the default interval used to be 60 secs, but is now 6 secs. Does that make sense? Seems awfully short for a default, assuming you have many regions or servers. On Sat, Feb 4, 2017 at 11:54 AM, Lars George wrote: > Hi, > > Looking at the Canary tool, it tries to ensure that all canary test > table regions are spread across all region servers. If that is not the > case, it calls: > > if (numberOfCoveredServers < numberOfServers) { > admin.balancer(); > } > > I doubt this will help with the StochasticLoadBalancer, which is known > to consider per-table balancing as one of many factors. In practice, > the SLB will most likely _not_ distribute the canary regions > sufficiently, leaving a gap in the check. Switching on the per-table > option is discouraged, so as to let it do its thing. > > Just pointing it out for vetting. > > Lars
[jira] [Created] (HBASE-17596) Implement add/delete/modify column family methods
Guanghao Zhang created HBASE-17596: -- Summary: Implement add/delete/modify column family methods Key: HBASE-17596 URL: https://issues.apache.org/jira/browse/HBASE-17596 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang
Canary Test Tool and write sniffing
Hi,

Looking at the Canary tool, it tries to ensure that all canary test table regions are spread across all region servers. If that is not the case, it calls:

if (numberOfCoveredServers < numberOfServers) {
  admin.balancer();
}

I doubt this will help with the StochasticLoadBalancer, which is known to consider per-table balancing as one of many factors. In practice, the SLB will most likely _not_ distribute the canary regions sufficiently, leaving a gap in the check. Switching on the per-table option is discouraged, so as to let it do its thing.

Just pointing it out for vetting.

Lars
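The coverage check described above boils down to counting the distinct region servers hosting canary regions and comparing that with the total server count. A self-contained sketch of that logic follows; the hostnames and method names are illustrative (in the Canary tool itself the data comes from the Admin API), and, as the email argues, admin.balancer() is unlikely to fix a shortfall when the StochasticLoadBalancer weighs per-table balance as only one factor among many.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Sketch of the Canary write-sniffing coverage check.
public class CanaryCoverage {

  // Number of distinct servers that host at least one canary region.
  static int coveredServers(List<String> canaryRegionHosts) {
    return new HashSet<>(canaryRegionHosts).size();
  }

  // True when some server hosts no canary region, i.e. the check has a gap.
  static boolean needsRedistribution(List<String> canaryRegionHosts, int numberOfServers) {
    return coveredServers(canaryRegionHosts) < numberOfServers;
  }

  public static void main(String[] args) {
    // Three canary regions, but only 2 of 3 servers are covered.
    List<String> hosts = Arrays.asList("rs1", "rs1", "rs2");
    System.out.println(coveredServers(hosts));         // 2
    System.out.println(needsRedistribution(hosts, 3)); // true
  }
}
```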