Re: [ANNOUNCE] New HBase committer Bryan Beaudreault
Congratulations!

Tak Lon (Stephen) Wu wrote on Tue, Apr 12, 2022 at 02:29:
> Congrats Bryan!
>
> -Stephen
>
> On Mon, Apr 11, 2022 at 9:44 AM Pankaj Kumar wrote:
> >
> > Congratulations & welcome Bryan..!!
> >
> > Regards,
> > Pankaj
> >
> > On Sat, Apr 9, 2022, 5:15 PM 张铎(Duo Zhang) wrote:
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
> > > Beaudreault (bbeaudreault) has accepted the PMC's invitation to become a
> > > committer on the project. We appreciate all of Bryan's generous
> > > contributions thus far and look forward to his continued involvement.
> > >
> > > Congratulations and welcome, Bryan Beaudreault!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
> > > Beaudreault has accepted our invitation to become a Committer on the
> > > Apache HBase project. We thank Bryan Beaudreault for his contributions
> > > to the HBase project so far, and look forward to him taking on more
> > > responsibility in the future.
> > >
> > > Welcome, Bryan Beaudreault!
Re: [VOTE] First release candidate for hbase-thirdparty 4.1.1 is available for download
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_301): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_301): ok
  - mvn clean install -DskipTests

张铎(Duo Zhang) wrote on Mon, Jun 20, 2022 at 23:40:
> See https://github.com/apache/hbase/pull/4552
>
> After modifying some code in HBase, mainly because of a behavior change in
> jetty, we can pass all the UTs when depending on hbase-thirdparty-4.1.1.
>
> 张铎(Duo Zhang) wrote on Sat, Jun 18, 2022 at 23:04:
> >
> > Please vote on this Apache hbase thirdparty release candidate,
> > hbase-thirdparty-4.1.1RC0
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase thirdparty 4.1.1
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 4.1.1RC0:
> >
> >   https://github.com/apache/hbase-thirdparty/tree/4.1.1RC0
> >
> > This tag currently points to git reference
> >
> >   d674246a75e1d7d1d4c5ee09a2567bbfa1cec022
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/hbase-thirdparty-4.1.1RC0/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1488/
> >
> > Artifacts were signed with the 9AD2AE49 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase thirdparty, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
Re: [ANNOUNCE] New HBase Committer Liangjun He
Congratulations!

OpenInx wrote on Tue, Dec 6, 2022 at 19:03:
> Congrats and welcome!
>
> On Tue, Dec 6, 2022 at 2:21 AM Andrew Purtell wrote:
> >
> > Congratulations, and welcome!
> >
> > On Sat, Dec 3, 2022 at 5:51 AM Yu Li wrote:
> > >
> > > Hi All,
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Liangjun
> > > He (heliangjun) has accepted the PMC's invitation to become a committer
> > > on the project. We appreciate all of Liangjun's generous contributions
> > > thus far and look forward to his continued involvement.
> > >
> > > Congratulations and welcome, Liangjun!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Liangjun
> > > He (何良均) has accepted our invitation to become a Committer on the
> > > Apache HBase project. We thank Liangjun for his contributions to the
> > > HBase project so far, and look forward to him taking on more
> > > responsibility in the future.
> > >
> > > Welcome, Liangjun!
> > >
> > > Best Regards,
> > > Yu
> >
> > --
> > Best regards,
> > Andrew
> >
> > Unrest, ignorance distilled, nihilistic imbeciles -
> > It's what we've earned
> > Welcome, apocalypse, what's taken you so long?
> > Bring us the fitting end that we've been counting on
> >    - A23, Welcome, Apocalypse
Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)
Do other projects have the same solution for this, i.e. syncing GitHub issues
to Jira issues? GitHub issues would be useful for getting more feedback.

张铎(Duo Zhang) wrote on Tue, Dec 6, 2022 at 00:13:
> The PR for HBASE-27513 is available
>
> https://github.com/apache/hbase/pull/4913
>
> Let's at least tell our users to send email to private@hbase for
> acquiring a jira account.
>
> Thanks.
>
> 张铎(Duo Zhang) wrote on Fri, Dec 2, 2022 at 12:46:
> >
> > Currently all the comments on github PRs will be sent to issues@hbase,
> > like this one
> >
> > https://lists.apache.org/thread/jbfm269b4m24xl2r82l8b0t3pmqr44hr
> >
> > But I think this can only be used as an archive, to make sure that all
> > discussions are recorded on asf infrastructure.
> >
> > For github issues, I'm afraid we can only do the same thing. As the
> > format of github comments is different, it will be hard to read if we
> > just sync the messages to jira...
> >
> > Thanks.
> >
> > Bryan Beaudreault wrote on Thu, Dec 1, 2022 at 21:30:
> > >
> > > Should we have them sent to private@? Just thinking in terms of reducing
> > > spam to users who put their email and full name on a public list.
> > >
> > > One thought I had about bug tracking is whether we could use some sort of
> > > github -> jira sync. I've seen them used before, where it automatically
> > > syncs issues and comments between the two systems. It's definitely not
> > > ideal, but maybe an option? I'm guessing it would require INFRA help.
> > >
> > > On Thu, Dec 1, 2022 at 5:47 AM 张铎(Duo Zhang) wrote:
> > > >
> > > > I've filed HBASE-27513 for changing the readme on github.
> > > >
> > > > At least let's reuse the existing mailing list for acquiring a jira
> > > > account.
> > > >
> > > > Thanks.
> > > >
> > > > 张铎(Duo Zhang) wrote on Tue, Nov 29, 2022 at 22:34:
> > > > >
> > > > > Bump, and also sending this to user@hbase.
> > > > >
> > > > > We need to find a way to deal with the current situation where
> > > > > contributors can not create a Jira account on their own...
> > > > >
> > > > > At least, we need to change the readme on the github page, the web
> > > > > site and also the ref guide to tell users how to acquire a jira
> > > > > account...
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 张铎(Duo Zhang) wrote on Sun, Nov 27, 2022 at 22:06:
> > > > > >
> > > > > > For me, I think most developers already have a github account, so
> > > > > > enabling it could help us get more feedback. Lots of younger
> > > > > > Chinese developers rarely use email in their daily life...
> > > > > > No doubt later we need to modify our readme on github. If we just
> > > > > > let users go to github issues from the readme, they will soon open
> > > > > > an issue there. But if we ask users to first send an email to a
> > > > > > mailing list to acquire a jira account, then wait for a PMC member
> > > > > > to submit the request, receive the email response, and set up
> > > > > > their account, only then can they finally open an issue on jira.
> > > > > > I'm afraid lots of users will just give up; it is not very
> > > > > > friendly...
> > > > > >
> > > > > > And I do not mean separate issue systems for users and devs. Users
> > > > > > can still open jira issues or ask in the mailing list if they
> > > > > > want; github issues is just another channel. If a user asks
> > > > > > something in the mailing list and we think it is a bug, we will
> > > > > > ask the user to file an issue or we will file one for it. It is
> > > > > > just the same with github issues.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Nick Dimiduk wrote on Thu, Nov 24, 2022 at 15:44:
> > > > > > >
> > > > > > > This new situation around JIRA seems very similar to the
> > > > > > > existing situation around Slack. A new community member
> > > > > > > currently must acquire a Slack invite somehow, usually by
> > > > > > > emailing one of the lists. Mailing lists themselves involve a
> > > > > > > signup process, though it may be possible to email user/-zh/dev
> > > > > > > without first subscribing to the list.
> > > > > > >
> > > > > > > I have a -0 opinion on using GitHub Issues to manage JIRA
> > > > > > > subscription access. It seems like a comical cascade of
> > > > > > > complexity. I'd prefer to keep GitHub Issues available to us as
> > > > > > > a future alternative to JIRA for project issue tracking. I agree
> > > > > > > with you that migrating away from JIRA will be painful.
> > > > > > >
> > > > > > > I'm not a big fan of having separate issue systems for users vs.
> > > > > > > devs. It emphasizes the idea that users and devs are different
> > > > > > > groups of people with unequal voice in the project direction. I
> > > > > > > suppose it could be done well, but I think it is more likely to
> > > > > > > be done poorly.
> > > > > > >
> > > > > > > I follow the Infra list, but only casually. It seems there's a
> > > > > > > plan to eventually adopt some Atlassian Cloud service, which
Re: [VOTE] The first release candidate for HBase 2.5.2 is available
+1 (binding)

* Checksum : ok
* Rat check (1.8.0_301): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_301): ok
  - mvn clean install -DskipTests
* Unit tests pass (1.8.0_301): ok
  - mvn package -P runSmallTests

Duo Zhang wrote on Wed, Nov 30, 2022 at 23:37:
> Bump.
>
> The phoenix community has tested the hadoop3 artifacts and it works well.
>
> Let's get this release done~
>
> Thanks.
>
> Duo Zhang wrote on Thu, Nov 24, 2022 at 12:32:
> >
> > Please vote on this Apache hbase release candidate,
> > hbase-2.5.2RC0
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.5.2
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.5.2RC0:
> >
> >   https://github.com/apache/hbase/tree/2.5.2RC0
> >
> > This tag currently points to git reference
> >
> >   3e28acf0b819f4b4a1ada2b98d59e05b0ef94f96
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.5.2RC0/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1503/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1504/
> >
> > Artifacts were signed with the 0x9AD2AE49 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > 2.5.2 includes 28 bug and improvement fixes done since 2.5.1. And
> > starting from 2.5.2, we will publish dist and maven artifacts for both
> > hadoop2 and hadoop3. Feel free to report any issues for the newly
> > published hadoop3 dist and maven artifacts.
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
Re: [VOTE] First release candidate for hbase 3.0.0-alpha-4 is available for download
+1 (binding)

* Checksum : ok
* Rat check (1.8.0_362): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_362): ok
  - mvn clean install -DskipTests
* Unit tests pass (1.8.0_362): ok
  - mvn clean package -P runSmallTests -Dsurefire.rerunFailingTestsCount=3

Shanmukha Haripriya Kota wrote on Wed, Jun 7, 2023 at 08:17:
> +1
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 33:55 min
> [INFO] Finished at: 2023-06-06T19:00:23-05:00
> [INFO] ------------------------------------------------------------------------
> ~/Desktop/hbase300/hbase-3.0.0-alpha-4/dev-support
> * Signature: ok
> * Checksum : ok
> * Rat check (11.0.10): ok
>   - mvn clean apache-rat:check
> * Built from source (11.0.10): ok
>   - mvn clean install -DskipTests
> * Unit tests pass (11.0.10): ok
>   - mvn clean package -P runSmallTests -Dsurefire.rerunFailingTestsCount=3
>
> [1] + 19163 done  ./hbase-vote.sh -s -f
> https://dist.apache.org/repos/dist/release/hbase/KEYS
>
> On Tue, Jun 6, 2023 at 10:22 AM Andrew Purtell wrote:
> >
> > +1 (binding)
> >
> > * Signature: ok
> > * Checksum : ok
> > * Rat check (11.0.19): ok
> >   - mvn clean apache-rat:check
> > * Built from source (11.0.19): ok
> >   - mvn clean install -DskipTests
> > * Unit tests pass (11.0.19): failed
> >   - mvn clean package -P runAllTests -Dsurefire.rerunFailingTestsCount=3
> >
> > TestSpnegoHttpServer and TestProxyUserSpnegoHttpServer in hbase-http
> > consistently failed, but it may be environmental, related to local
> > kerberos configs.
> >
> > On Sun, May 28, 2023 at 8:13 AM Duo Zhang wrote:
> > >
> > > Please vote on this Apache hbase release candidate,
> > > hbase-3.0.0-alpha-4RC0
> > >
> > > The VOTE will remain open for at least 72 hours.
> > >
> > > [ ] +1 Release this package as Apache hbase 3.0.0-alpha-4
> > > [ ] -1 Do not release this package because ...
> > >
> > > The tag to be voted on is 3.0.0-alpha-4RC0:
> > >
> > >   https://github.com/apache/hbase/tree/3.0.0-alpha-4RC0
> > >
> > > This tag currently points to git reference
> > >
> > >   e44cc02c75ecae7ece845f04722eb16b7528393f
> > >
> > > The release files, including signatures, digests, as well as CHANGES.md
> > > and RELEASENOTES.md included in this RC can be found at:
> > >
> > >   https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-4RC0/
> > >
> > > Maven artifacts are available in a staging repository at:
> > >
> > >   https://repository.apache.org/content/repositories/orgapachehbase-1520/
> > >
> > > Maven artifacts for hadoop3 are available in a staging repository at:
> > >
> > >   https://repository.apache.org/content/repositories/not-applicable/
> > >
> > > Artifacts were signed with the 0x9AD2AE49 key which can be found in:
> > >
> > >   https://downloads.apache.org/hbase/KEYS
> > >
> > > 3.0.0-alpha-4 is the fourth alpha release for our 3.0.0 major release
> > > line. HBase 3.0.0 includes the following big features/changes:
> > >
> > >   Synchronous Replication
> > >   OpenTelemetry Tracing
> > >   Distributed MOB Compaction
> > >   Backup and Restore
> > >   Move RSGroup balancer to core
> > >   Reimplement sync client on async client
> > >   CPEPs on shaded protobuf
> > >   Move the logging framework from log4j to log4j2
> > >   Decouple region replication and the general replication framework,
> > >   and also make region replication work when SKIP_WAL is used
> > >   A new file system based replication peer storage
> > >   Use an hbase table instead of zookeeper for tracking the hbase
> > >   replication queue
> > >
> > > Notice that this is not a production ready release. It is used to let
> > > our users try and test the new major release, to get feedback before
> > > the final GA release is out. So please do NOT use it in production.
> > > Just try it and report back everything you find unusual.
> > >
> > > And this time we will not include CHANGES.md and RELEASENOTES.md
> > > in our source code; you can find them on the download site. To get
> > > these two files for old releases, please go to
> > >
> > >   https://archive.apache.org/dist/hbase/
> > >
> > > To learn more about Apache hbase, please see
> > >
> > >   http://hbase.apache.org/
> > >
> > > Thanks,
> > > Your HBase Release Manager
> >
> > --
> > Best regards,
> > Andrew
> >
> > Unrest, ignorance distilled, nihilistic imbeciles -
> > It's what we've earned
> > Welcome, apocalypse, what's taken you so long?
> > Bring us the fitting end that we've been counting on
> >    - A23, Welcome, Apocalypse
>
> --
> Regards,
> Shanmukha Kota
[jira] [Created] (HBASE-13686) Fail to limit rate in RateLimiter
Guanghao Zhang created HBASE-13686:
--------------------------------------

             Summary: Fail to limit rate in RateLimiter
                 Key: HBASE-13686
                 URL: https://issues.apache.org/jira/browse/HBASE-13686
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0, 1.1.0
            Reporter: Guanghao Zhang
            Priority: Minor


While using the patch from HBASE-11598, I found that RateLimiter fails to limit the rate correctly.
{code}
/**
 * given the time interval, are there enough available resources to allow execution?
 * @param now the current timestamp
 * @param lastTs the timestamp of the last update
 * @param amount the number of required resources
 * @return true if there are enough available resources, otherwise false
 */
public synchronized boolean canExecute(final long now, final long lastTs,
    final long amount) {
  return avail >= amount ? true : refill(now, lastTs) >= amount;
}
{code}
When avail >= amount, avail is not refilled. But by the next call to canExecute, lastTs may already have been updated, so avail loses some of the elapsed time it should have been refilled for. Even when we use a smaller rate than the limit, canExecute may return false.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
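The timing problem described in HBASE-13686 can be reproduced with a minimal, self-contained sketch. This class is an illustrative simplification, not the real org.apache.hadoop.hbase.quotas.RateLimiter: timestamps are passed in explicitly so the scenario is deterministic, and a flag toggles between the buggy path (refill skipped when avail >= amount, yet lastTs still advanced) and a fixed path that always converts elapsed time into credit.

```java
// Sketch only: simplified model of the refill-skip bug, assuming a limit
// expressed in units per second and caller-supplied timestamps.
public class RateLimiterBugDemo {
  final long limit;   // units per second
  final boolean buggy;
  long avail;
  long lastTs;

  RateLimiterBugDemo(long limit, long avail, boolean buggy) {
    this.limit = limit;
    this.avail = avail;
    this.buggy = buggy;
  }

  private long refill(long now) {
    long delta = limit * (now - lastTs) / 1000; // credit for elapsed time
    return Math.min(limit, avail + delta);
  }

  public boolean canExecute(long now, long amount) {
    if (!buggy || avail < amount) {
      avail = refill(now); // fixed path: always convert elapsed time to credit
    }
    // buggy path: when avail >= amount, refill() is skipped above, yet lastTs
    // still advances here, so the elapsed-time credit is silently discarded
    lastTs = now;
    return avail >= amount;
  }

  public void consume(long amount) {
    avail -= amount;
  }

  public static void main(String[] args) {
    // 10 units/s, bucket starts with 5 units; demand is ~9.1 units/s overall
    RateLimiterBugDemo rl = new RateLimiterBugDemo(10, 5, true);
    System.out.println(rl.canExecute(1000, 5));  // true: served from avail,
    rl.consume(5);                               // but 1s of credit discarded
    System.out.println(rl.canExecute(1100, 5));  // false: only 100 ms refilled

    RateLimiterBugDemo ok = new RateLimiterBugDemo(10, 5, false);
    System.out.println(ok.canExecute(1000, 5));  // true: refilled to 10 first
    ok.consume(5);
    System.out.println(ok.canExecute(1100, 5));  // true: 5 left + 1 refilled
  }
}
```

With correct accounting the second request succeeds even though it arrives only 100 ms later, because the idle second before the first request was converted into credit; with the buggy path that credit is thrown away.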
[jira] [Created] (HBASE-13829) Add more ThrottleType
Guanghao Zhang created HBASE-13829:
--------------------------------------

             Summary: Add more ThrottleType
                 Key: HBASE-13829
                 URL: https://issues.apache.org/jira/browse/HBASE-13829
             Project: HBase
          Issue Type: Improvement
          Components: Client
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
             Fix For: 2.0.0


HBASE-11598 added simple throttling for HBase. But the client does not let users set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-13974) TestRateLimiter#testFixedIntervalResourceAvailability may fail
Guanghao Zhang created HBASE-13974:
--------------------------------------

             Summary: TestRateLimiter#testFixedIntervalResourceAvailability may fail
                 Key: HBASE-13974
                 URL: https://issues.apache.org/jira/browse/HBASE-13974
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


Stacktrace:

java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.junit.Assert.assertFalse(Assert.java:74)
	at org.apache.hadoop.hbase.quotas.TestRateLimiter.testFixedIntervalResourceAvailability(TestRateLimiter.java:151)

The code of this unit test:
{code}
RateLimiter limiter = new FixedIntervalRateLimiter();
limiter.set(10, TimeUnit.MILLISECONDS);
assertTrue(limiter.canExecute(10));
limiter.consume(3);
assertEquals(7, limiter.getAvailable());
assertFalse(limiter.canExecute(10));
{code}
The limiter refills every millisecond. So if this unit test executes slowly, or is stalled by other tests for more than 1 ms, assertFalse(limiter.canExecute(10)) will fail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-13888) refill bug from HBASE-13686
Guanghao Zhang created HBASE-13888:
--------------------------------------

             Summary: refill bug from HBASE-13686
                 Key: HBASE-13888
                 URL: https://issues.apache.org/jira/browse/HBASE-13888
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


As I reported in HBASE-13686, RateLimiter failed to limit the rate, and [~ashish singhi] fixed that problem by supporting two kinds of RateLimiter: AverageIntervalRateLimiter and FixedIntervalRateLimiter. But while using the code, I found a new bug in refill() of AverageIntervalRateLimiter.
{code}
long delta = (limit * (now - nextRefillTime)) / super.getTimeUnitInMillis();
if (delta > 0) {
  this.nextRefillTime = now;
  return Math.min(limit, available + delta);
}
{code}
When delta > 0, refill may return available + delta. Then canExecute() adds refillAmount to avail again, so the new avail may be 2 * avail + delta.
{code}
long refillAmount = refill(limit, avail);
if (refillAmount == 0 && avail < amount) {
  return false;
}
// check for positive overflow
if (avail <= Long.MAX_VALUE - refillAmount) {
  avail = Math.max(0, Math.min(avail + refillAmount, limit));
} else {
  avail = Math.max(0, limit);
}
{code}
I will add more unit tests for RateLimiter in the next days.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
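The double-count in HBASE-13888 is easy to see in isolation: refill() returns a *new total* (avail + delta), while the caller treats the return value as an *increment*. The sketch below is an illustrative simplification of the two snippets above, not the real HBase classes.

```java
// Sketch of the double-count: refill() returns a total, the caller adds it
// on top of avail again, so avail ends up near 2 * avail + delta.
public class RefillDoubleCountDemo {
  // mirrors the buggy refill(): returns the new total, capped at limit
  static long refill(long limit, long avail, long delta) {
    return Math.min(limit, avail + delta);
  }

  public static void main(String[] args) {
    long limit = 100, avail = 40, delta = 3;
    long refillAmount = refill(limit, avail, delta);      // 43 (a total)
    // caller-side update from canExecute(), treating 43 as an increment:
    long newAvail = Math.max(0, Math.min(avail + refillAmount, limit));
    System.out.println(newAvail);  // 83 == 2 * 40 + 3, instead of 43
  }
}
```

Returning the delta alone (or having the caller assign rather than add) removes the double-count.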
[jira] [Created] (HBASE-13987) Modify the result of shell cmd list_quotas when not enable quota
Guanghao Zhang created HBASE-13987:
--------------------------------------

             Summary: Modify the result of shell cmd list_quotas when not enable quota
                 Key: HBASE-13987
                 URL: https://issues.apache.org/jira/browse/HBASE-13987
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.0.0
         Environment: When quota is not enabled, the shell command list_quotas returns:
{code}
hbase(main):008:0> list_quotas
OWNER                          QUOTAS
ERROR: Unknown table hbase:quota!
{code}
This is confusing if the user doesn't know that quotas are stored in hbase:quota. I added a check of isQuotaEnabled before scanning the hbase:quota table, so it now returns "ERROR: quota support disabled", which is the same as set_quota.
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-14706) RegionLocationFinder should return multiple servername by top host
Guanghao Zhang created HBASE-14706:
--------------------------------------

             Summary: RegionLocationFinder should return multiple servername by top host
                 Key: HBASE-14706
                 URL: https://issues.apache.org/jira/browse/HBASE-14706
             Project: HBase
          Issue Type: Bug
          Components: Balancer
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


Multiple region servers can run on the same host, but in the current RegionLocationFinder, mapHostNameToServerName maps each host to only one ServerName. This makes LocalityCostFunction compute the wrong locality for regions.
{code}
// create a mapping from hostname to ServerName for fast lookup
HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
for (ServerName sn : regionServers) {
  hostToServerName.put(sn.getHostname(), sn);
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
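The fix direction suggested by the ticket title is to map each hostname to *all* ServerNames on that host instead of letting later entries overwrite earlier ones. A minimal sketch, with ServerName stubbed out (the real class lives in the org.apache.hadoop.hbase package):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostToServersDemo {
  // stand-in for org.apache.hadoop.hbase.ServerName
  static class ServerName {
    final String hostname;
    final int port;
    ServerName(String hostname, int port) { this.hostname = hostname; this.port = port; }
  }

  // current shape: one ServerName per host; a second RS on the host is lost
  static Map<String, ServerName> hostToServerName(List<ServerName> regionServers) {
    Map<String, ServerName> m = new HashMap<>();
    for (ServerName sn : regionServers) {
      m.put(sn.hostname, sn); // overwrites any earlier RS on the same host
    }
    return m;
  }

  // proposed shape: one host -> every ServerName running there
  static Map<String, List<ServerName>> hostToServerNames(List<ServerName> regionServers) {
    Map<String, List<ServerName>> m = new HashMap<>();
    for (ServerName sn : regionServers) {
      m.computeIfAbsent(sn.hostname, k -> new ArrayList<>()).add(sn);
    }
    return m;
  }

  public static void main(String[] args) {
    List<ServerName> servers = Arrays.asList(
        new ServerName("host-a", 16020),
        new ServerName("host-a", 16021),
        new ServerName("host-b", 16020));
    System.out.println(hostToServerName(servers).size());                // 2: one RS on host-a lost
    System.out.println(hostToServerNames(servers).get("host-a").size()); // 2: both kept
  }
}
```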
[jira] [Created] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
Guanghao Zhang created HBASE-14604:
--------------------------------------

             Summary: Improve MoveCostFunction in StochasticLoadBalancer
                 Key: HBASE-14604
                 URL: https://issues.apache.org/jira/browse/HBASE-14604
             Project: HBase
          Issue Type: Bug
          Components: Balancer
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


The code in MoveCostFunction:
{code}
return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost);
{code}
It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling moveCost to [0,1]. But it should use maxMoves as the max value when the cluster has a lot of regions. Assume a cluster has 10000 regions and maxMoves is 2500: moveCost is only scaled to [0, 0.25]. Improve moveCost by using maxMoves:
{code}
return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, moveCost);
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
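The effect of the proposed change can be checked with a small sketch. Here scale() is assumed to be a simple linear rescale of value from [min, max] into [0, 1] (the real helper lives in StochasticLoadBalancer), and META_MOVE_COST_MULT is left out for clarity.

```java
// Sketch: why capping the scale range at maxMoves restores the intended
// [0, 1] cost range on large clusters.
public class MoveCostScaleDemo {
  // assumed linear rescale of value from [min, max] into [0, 1]
  static double scale(double min, double max, double value) {
    if (max <= min || value <= min) {
      return 0;
    }
    return Math.min(1.0, (value - min) / (max - min));
  }

  public static void main(String[] args) {
    int numRegions = 10000; // a large cluster
    int maxMoves = 2500;    // cap on moves per balancer run
    // current: even the worst possible plan (maxMoves moves) only costs 0.25
    System.out.println(scale(0, numRegions, maxMoves));                     // 0.25
    // proposed: the worst plan costs 1.0, so the multiplier carries weight
    System.out.println(scale(0, Math.min(numRegions, maxMoves), maxMoves)); // 1.0
  }
}
```

Because the cost never exceeds 0.25 on such a cluster, the move-cost multiplier effectively has only a quarter of its configured weight; the min() cap fixes that.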
[jira] [Created] (HBASE-14609) Can't config all day as OffPeakHours
Guanghao Zhang created HBASE-14609:
--------------------------------------

             Summary: Can't config all day as OffPeakHours
                 Key: HBASE-14609
                 URL: https://issues.apache.org/jira/browse/HBASE-14609
             Project: HBase
          Issue Type: Bug
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor


The off-peak range is [startHour, endHour), with endHour exclusive. But 24 is not accepted as a valid endHour, so the whole day can't be configured as off-peak hours.
{code}
private static boolean isValidHour(int hour) {
  return 0 <= hour && hour <= 23;
}
{code}
Making endHour=24 valid, or allowing startHour == endHour, would fix this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
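Both candidate fixes can be sketched side by side. This is a simplification of OffPeakHours, not the real class: the range is [startHour, endHour) with an exclusive end, and a range may wrap past midnight.

```java
// Sketch of the two fixes for "all day can't be off-peak".
public class OffPeakHoursDemo {
  // current check: 24 is rejected, so [0, 24) ("all day") is not expressible
  static boolean isValidHourCurrent(int hour) {
    return 0 <= hour && hour <= 23;
  }

  // option 1: accept 24 as an exclusive upper bound
  static boolean isValidHourFixed(int hour) {
    return 0 <= hour && hour <= 24;
  }

  // option 2: treat startHour == endHour as "the whole day is off-peak"
  static boolean isOffPeak(int startHour, int endHour, int hour) {
    if (startHour == endHour) {
      return true; // whole day off-peak
    }
    return startHour < endHour
        ? startHour <= hour && hour < endHour
        : hour >= startHour || hour < endHour; // range wrapping midnight
  }

  public static void main(String[] args) {
    System.out.println(isValidHourCurrent(24)); // false: all-day not expressible
    System.out.println(isValidHourFixed(24));   // true
    System.out.println(isOffPeak(5, 5, 17));    // true: whole day off-peak
    System.out.println(isOffPeak(22, 4, 23));   // true: wraps past midnight
  }
}
```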
[jira] [Created] (HBASE-16012) Major compaction can't work because left scanner read point in RegionServer
Guanghao Zhang created HBASE-16012:
--------------------------------------

             Summary: Major compaction can't work because left scanner read point in RegionServer
                 Key: HBASE-16012
                 URL: https://issues.apache.org/jira/browse/HBASE-16012
             Project: HBase
          Issue Type: Bug
          Components: Compaction, Scanners
    Affects Versions: 0.94.27, 2.0.0
            Reporter: Guanghao Zhang


When a new RegionScanner is created, it adds a scanner read point to scannerReadPoints. But if we get an exception after the read point is added, the read point remains in the region server, and deletes newer than this MVCC number will never be compacted. Our hbase version is based on 0.94. The master branch has this bug too, if other exceptions are thrown while initializing the RegionScanner.

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
java.io.IOException: Could not seek StoreFileScanner
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:160)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:268)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:168)
	at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2232)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:4026)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1895)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1879)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1854)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.internalOpenScanner(HRegionServer.java:3032)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2995)
	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:338)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1595)
Caused by: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call openScanner, since caller disconnected
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:475)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1443)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1902)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1766)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:345)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:499)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:520)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:235)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:148)
	... 14 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
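The leak described in HBASE-16012 and its usual remedy can be sketched in a few lines. This is not the real HRegion code; the field names and the try/finally shape are illustrative. The key point is that if scanner construction fails after the read point was registered, the point must be removed again, otherwise compaction can never advance past it.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class ScannerReadPointDemo {
  // mvcc read point -> scanner id (illustrative stand-in for HRegion's map)
  final ConcurrentSkipListMap<Long, Long> scannerReadPoints = new ConcurrentSkipListMap<>();

  // compaction may not drop deletes newer than the smallest registered point
  long smallestReadPoint() {
    return scannerReadPoints.isEmpty() ? Long.MAX_VALUE : scannerReadPoints.firstKey();
  }

  void openScanner(long scannerId, long mvcc, boolean failDuringSeek) {
    scannerReadPoints.put(mvcc, scannerId); // read point registered up front
    boolean succeeded = false;
    try {
      // ... StoreScanner construction / seek happens here and may throw,
      // e.g. "Could not seek StoreFileScanner" as in the trace above
      if (failDuringSeek) {
        throw new RuntimeException("Could not seek StoreFileScanner");
      }
      succeeded = true;
    } finally {
      if (!succeeded) {
        scannerReadPoints.remove(mvcc); // without this, the point leaks forever
      }
    }
  }

  public static void main(String[] args) {
    ScannerReadPointDemo demo = new ScannerReadPointDemo();
    try {
      demo.openScanner(1L, 42L, true); // scanner open fails mid-seek
    } catch (RuntimeException expected) {
      // the read point was cleaned up, so compaction is not blocked
    }
    System.out.println(demo.smallestReadPoint() == Long.MAX_VALUE);
  }
}
```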
[jira] [Created] (HBASE-15615) Wrong sleep time when RegionServerCallable need retry
Guanghao Zhang created HBASE-15615:
--------------------------------------

             Summary: Wrong sleep time when RegionServerCallable need retry
                 Key: HBASE-15615
                 URL: https://issues.apache.org/jira/browse/HBASE-15615
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang


In RpcRetryingCallerImpl, the pause time is computed as expectedSleep = callable.sleep(pause, tries + 1). And in RegionServerCallable, the pause time is computed as sleep = ConnectionUtils.getPauseTime(pause, tries + 1). So tries is bumped up twice. Since RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200}, the pause time is 3 * hbase.client.pause when tries is 0.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
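The double bump is easy to verify with the backoff table from the ticket. The helper below is a simplified version of ConnectionUtils.getPauseTime (the real one also adds a small random jitter, omitted here), so the numbers are illustrative of the indexing, not exact sleep times.

```java
// Sketch: tries bumped twice lands on RETRY_BACKOFF[2] = 3 for the first retry.
public class RetrySleepDemo {
  static final long[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

  // simplified getPauseTime: pause scaled by the backoff table, index clamped
  static long getPauseTime(long pause, int tries) {
    int index = Math.min(tries, RETRY_BACKOFF.length - 1);
    return pause * RETRY_BACKOFF[index];
  }

  public static void main(String[] args) {
    long pause = 100; // hbase.client.pause in ms
    int tries = 0;    // first retryable failure
    // the caller passes tries + 1 and sleep() bumps it again -> index 2
    System.out.println(getPauseTime(pause, (tries + 1) + 1)); // 300 = 3 * pause
    // with a single bump, the first sleep would use index 1
    System.out.println(getPauseTime(pause, tries + 1));       // 200 = 2 * pause
  }
}
```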
[jira] [Created] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
Guanghao Zhang created HBASE-15515:
--------------------------------------

             Summary: Improve LocalityBasedCandidateGenerator in Balancer
                 Key: HBASE-15515
                 URL: https://issues.apache.org/jira/browse/HBASE-15515
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0, 1.3.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor
             Fix For: 2.0.0


There are some problems that need to be fixed.

1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions.
2. When LocalityBasedCandidateGenerator generates a Cluster.Action, it should add a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action.
3. getLeastLoadedTopServerForRegion should pick the least loaded server that has better locality than the current server.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15529) Override needBalance in StochasticLoadBalancer
Guanghao Zhang created HBASE-15529:
--------------------------------------

             Summary: Override needBalance in StochasticLoadBalancer
                 Key: HBASE-15529
                 URL: https://issues.apache.org/jira/browse/HBASE-15529
             Project: HBase
          Issue Type: Improvement
            Reporter: Guanghao Zhang
            Priority: Minor


StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w QPS, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive, and the computed costs are scaled by their respective multipliers: a bigger multiplier means the respective cost function carries more weight. But needBalance decides whether to balance only by region count; it doesn't consider r/w QPS or locality even if you configure those cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance based on its cost function configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15496) Throw RowTooBigException only for user scan/get
Guanghao Zhang created HBASE-15496:
--------------------------------------

             Summary: Throw RowTooBigException only for user scan/get
                 Key: HBASE-15496
                 URL: https://issues.apache.org/jira/browse/HBASE-15496
             Project: HBase
          Issue Type: Improvement
          Components: Scanners
            Reporter: Guanghao Zhang
            Priority: Minor
             Fix For: 2.0.0


When hbase.table.max.rowsize is configured, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it; the exception should be thrown only for user scans/gets.

org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that
	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:276)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:238)
	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774)

or

org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that.
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979)
	at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15885) Compute StoreFile HDFS Blocks Distribution when need it
Guanghao Zhang created HBASE-15885:
--------------------------------------

             Summary: Compute StoreFile HDFS Blocks Distribution when need it
                 Key: HBASE-15885
                 URL: https://issues.apache.org/jira/browse/HBASE-15885
             Project: HBase
          Issue Type: Improvement
          Components: HFile
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang


Currently, opening a StoreFileReader always computes the HDFS blocks distribution. This increases the time a region is not serving while it is being balanced, because the region must first be closed on RS A and then opened on RS B. Closing the region first pre-flushes, then flushes the new updates to a new store file. The new store file is first flushed to the tmp directory and then moved to the column family directory; this opens a StoreFileReader twice, which means the HDFS blocks distribution is computed twice. Opening the region on RS B opens a StoreFileReader and computes the HDFS blocks distribution once more. So balancing a region computes the HDFS blocks distribution three times per new store file. This increases the region's not-serving time, and we don't need to compute the HDFS blocks distribution when closing a region.

The three related methods in HStore:
1. validateStoreFile(...)
2. commitFile(...)
3. openStoreFiles(...)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15829) hbase.client.retries.number has different meanings in branch-1 and master
Guanghao Zhang created HBASE-15829: -- Summary: hbase.client.retries.number has different meanings in branch-1 and master Key: HBASE-15829 URL: https://issues.apache.org/jira/browse/HBASE-15829 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor The comment of hbase.client.retries.number is: {code} /** * Parameter name for maximum retries, used as maximum for all retryable * operations such as fetching of the root region from root region server, * getting a cell's value, starting a row update, etc. */ public static final String HBASE_CLIENT_RETRIES_NUMBER = "hbase.client.retries.number"; {code} In branch-1, the maximum number of attempts equals hbase.client.retries.number. But in master, the maximum number of attempts equals hbase.client.retries.number + 1. For RpcRetryingCaller: {code} this.retries = retries; // branch-1 {code} {code} this.maxAttempts = retries + 1; // master {code} For AsyncProcess: {code} this.numTries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER); // branch-1 {code} {code} // how many times we could try in total, one more than retry number this.numTries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER) + 1; // master {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
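The off-by-one described above, as a tiny Java sketch (illustrative helper names, not HBase source):

```java
// Hypothetical illustration of how the same configured value yields a
// different total attempt count on the two branches.
public class RetrySemantics {
    // branch-1: total attempts == configured retries
    static int attemptsBranch1(int retries) {
        return retries;
    }

    // master: one initial try plus the configured number of retries
    static int attemptsMaster(int retries) {
        return retries + 1;
    }

    public static void main(String[] args) {
        int configured = 35; // a sample hbase.client.retries.number value
        System.out.println("branch-1 attempts: " + attemptsBranch1(configured)); // 35
        System.out.println("master attempts:   " + attemptsMaster(configured));  // 36
    }
}
```

So a client configured with hbase.client.retries.number = 35 gives up after 35 attempts on branch-1 but after 36 on master, which is exactly the inconsistency the issue reports.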
[jira] [Created] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
Guanghao Zhang created HBASE-16416: -- Summary: Make NoncedRegionServerCallable extends RegionServerCallable Key: HBASE-16416 URL: https://issues.apache.org/jira/browse/HBASE-16416 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor After HBASE-16308, there is a new class NoncedRegionServerCallable which extends AbstractRegionServerCallable. But it has some duplicate methods with RegionServerCallable, so we can make NoncedRegionServerCallable extend RegionServerCallable instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16368) test*WhenRegionMove in TestPartialResultsFromClientSide is flaky
Guanghao Zhang created HBASE-16368: -- Summary: test*WhenRegionMove in TestPartialResultsFromClientSide is flaky Key: HBASE-16368 URL: https://issues.apache.org/jira/browse/HBASE-16368 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 1.4.0 Reporter: Guanghao Zhang This test failed when Hadoop QA ran preCommit: https://builds.apache.org/job/PreCommit-HBASE-Build/2971/testReport/org.apache.hadoop.hbase/TestPartialResultsFromClientSide/testReversedCompleteResultWhenRegionMove/. I also found it on the Flaky Tests Dashboard: http://hbase.x10host.com/flaky-tests/. It may fail on my local machine, too. The test results show that the region location was not updated when the scanner callable got a NotServingRegionException or RegionMovedException. {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 05:55:52 UTC 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=2157: org.apache.hadoop.hbase.NotServingRegionException: testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8233) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2634) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2629) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2490) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169) row '' on table 'testReversedCompleteResultWhenRegionMove' at region=testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e., hostname=asf907.gq1.ygridcore.net,38914,1470462943053, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ReversedClientScanner.nextScanner(ReversedClientScanner.java:118) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166) at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:161) at org.apache.hadoop.hbase.client.ReversedClientScanner.(ReversedClientScanner.java:56) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:785) at 
org.apache.hadoop.hbase.TestPartialResultsFromClientSide.testReversedCompleteResultWhenRegionMove(TestPartialResultsFromClientSide.java:986) {code} {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 16:27:22 CST 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=3035: Region moved to: hostname=localhost port=58351 startCode=1470472007714. As of locationSeqNum=6. row 'testRow0' on table 'testPartialResultWhenRegionMove' at region=testPartialResultWhenRegionMove,,1470472035048.977faf05c1d6d9990b5559b17aa18913., hostname=localhost,40425,1470472007646, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301) at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:247) at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:541) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:370
[jira] [Created] (HBASE-17600) Implement get/create/modify/delete/list namespace admin operations
Guanghao Zhang created HBASE-17600: -- Summary: Implement get/create/modify/delete/list namespace admin operations Key: HBASE-17600 URL: https://issues.apache.org/jira/browse/HBASE-17600 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17615) Use nonce and procedure v2 for add/remove replication peer
Guanghao Zhang created HBASE-17615: -- Summary: Use nonce and procedure v2 for add/remove replication peer Key: HBASE-17615 URL: https://issues.apache.org/jira/browse/HBASE-17615 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17596) Implement add/delete/modify column family methods
Guanghao Zhang created HBASE-17596: -- Summary: Implement add/delete/modify column family methods Key: HBASE-17596 URL: https://issues.apache.org/jira/browse/HBASE-17596 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17511) Implement enable/disable table methods
Guanghao Zhang created HBASE-17511: -- Summary: Implement enable/disable table methods Key: HBASE-17511 URL: https://issues.apache.org/jira/browse/HBASE-17511 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17498) Implement listTables methods
Guanghao Zhang created HBASE-17498: -- Summary: Implement listTables methods Key: HBASE-17498 URL: https://issues.apache.org/jira/browse/HBASE-17498 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17500) Implement getTable/creatTable/deleteTable/truncateTable methods
Guanghao Zhang created HBASE-17500: -- Summary: Implement getTable/creatTable/deleteTable/truncateTable methods Key: HBASE-17500 URL: https://issues.apache.org/jira/browse/HBASE-17500 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
Guanghao Zhang created HBASE-16460: -- Summary: Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine Key: HBASE-16460 URL: https://issues.apache.org/jira/browse/HBASE-16460 Project: HBase Issue Type: Bug Components: BucketCache Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang When the bucket cache uses FileIOEngine, it rebuilds the bucket allocator's data structures from a persisted map. It should first read the map from the persistence file and then use that map to construct a BucketAllocator. But the retrieveFromFile() method of BucketCache.java currently does these two steps in the wrong order: {code} BucketAllocator allocator = new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize); backingMap = (ConcurrentHashMap<BlockCacheKey, BucketEntry>) ois.readObject(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
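A toy Java sketch of the ordering bug (simplified types, not HBase source): the allocator can only see whatever the backing map holds at construction time, so constructing it before reading the persisted map means it rebuilds from nothing.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrieveOrder {
    // Stands in for: new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize)
    // The allocator only knows what the map holds at this moment.
    static int buildAllocator(Map<String, Integer> backingMap) {
        return backingMap.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> persisted = new HashMap<>();
        persisted.put("block-1", 4096); // pretend this came from ois.readObject()

        // Buggy order (current code): construct the allocator, THEN read the map.
        Map<String, Integer> backingMap = new HashMap<>();
        int buggy = buildAllocator(backingMap); // sees 0 entries
        backingMap.putAll(persisted);

        // Fixed order: read the persisted map first, THEN construct the allocator.
        Map<String, Integer> restored = new HashMap<>(persisted);
        int fixed = buildAllocator(restored);   // sees 1 entry

        System.out.println("buggy order saw " + buggy + " entries, fixed order saw " + fixed);
    }
}
```

In the real code the situation is even stricter, since `backingMap` is reassigned rather than mutated, so the allocator keeps a reference to the stale empty map forever.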
[jira] [Created] (HBASE-16561) Add metrics about read/write/scan queue length and active handler count
Guanghao Zhang created HBASE-16561: -- Summary: Add metrics about read/write/scan queue length and active handler count Key: HBASE-16561 URL: https://issues.apache.org/jira/browse/HBASE-16561 Project: HBase Issue Type: Improvement Components: IPC/RPC, metrics Reporter: Guanghao Zhang Priority: Minor Now there are only metrics for the total queue length and the active rpc handler count. But the RWQueueRpcExecutor can have separate queues and handlers for read/write/scan requests, so I think it is necessary to add more metrics for RWQueueRpcExecutor. When using it in a production cluster, we can then adjust the queue and handler configs according to these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16666) Add append and remove peer namespaces cmds for replication
Guanghao Zhang created HBASE-16666: -- Summary: Add append and remove peer namespaces cmds for replication Key: HBASE-16666 URL: https://issues.apache.org/jira/browse/HBASE-16666 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor After HBASE-16447, we support replication by a namespaces config in the peer. Like append_peer_tableCFs and remove_peer_tableCFs, I think we need two new shell cmds: append_peer_namespaces and remove_peer_namespaces. Then we can easily change the namespaces config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace
Guanghao Zhang created HBASE-16653: -- Summary: Backport HBASE-11393 to all branches which support namespace Key: HBASE-16653 URL: https://issues.apache.org/jira/browse/HBASE-16653 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang As HBASE-11386 mentioned, the parsing code for the replication table-cfs config goes wrong when a table name contains a namespace, so only tables in the default namespace can be configured in the peer. This is a bug in all branches which support namespaces. HBASE-11393 resolved it by using a pb object, but it was only merged to the master branch; other branches still have this problem. I think we should fix this bug in all branches which support namespaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
Guanghao Zhang created HBASE-16446: -- Summary: append_peer_tableCFs failed when there already have this table's partial cfs in the peer Key: HBASE-16446 URL: https://issues.apache.org/jira/browse/HBASE-16446 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.21, 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replicating all cfs of this table, so the result is not right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
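Assuming the convention from the report that an empty cf list means "all column families of the table", the expected append semantics might look like this sketch (illustrative Java, not the actual HBase implementation): appending an empty list should widen an existing partial-cf entry instead of being silently ignored as the shell transcript shows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TableCfsMerge {
    // Convention: a null/empty cf list stands for "all column families".
    static Map<String, List<String>> append(Map<String, List<String>> current,
                                            Map<String, List<String>> toAppend) {
        Map<String, List<String>> merged = new HashMap<>(current);
        toAppend.forEach((table, cfs) -> {
            List<String> existing = merged.get(table);
            if (cfs == null || cfs.isEmpty()) {
                merged.put(table, Collections.emptyList()); // widen to all cfs
            } else if (existing == null) {
                merged.put(table, new ArrayList<>(cfs));    // brand-new table entry
            } else if (!existing.isEmpty()) {
                TreeSet<String> union = new TreeSet<>(existing);
                union.addAll(cfs);
                merged.put(table, new ArrayList<>(union));  // union of cf lists
            }                                               // else: already all cfs
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> peer = new HashMap<>();
        peer.put("test_replication", List.of("A"));         // partial cfs, as in the report
        Map<String, List<String>> appended =
            append(peer, Map.of("test_replication", List.of()));
        // Expected: the entry is widened to all cfs (empty list), not left as [A].
        System.out.println(appended);
    }
}
```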
[jira] [Created] (HBASE-16447) Replication by namespace in peer
Guanghao Zhang created HBASE-16447: -- Summary: Replication by namespace in peer Key: HBASE-16447 URL: https://issues.apache.org/jira/browse/HBASE-16447 Project: HBase Issue Type: New Feature Components: Replication Reporter: Guanghao Zhang Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables, so it is complicated to config all the table cfs in the peer. Some namespaces need all their tables replicated to the slave cluster, and that would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16707) [Umbrella] Improve throttling feature for production usage
Guanghao Zhang created HBASE-16707: -- Summary: [Umbrella] Improve throttling feature for production usage Key: HBASE-16707 URL: https://issues.apache.org/jira/browse/HBASE-16707 Project: HBase Issue Type: Umbrella Reporter: Guanghao Zhang HBASE-11598 added the rpc throttling feature and did great initial work there. We plan to use throttling in our production cluster and have made some improvements to it. From the user mailing list, I found there are other users using the throttling feature, too. I think it is time to contribute our work to the community, including: 1. Add shell cmds to start/stop throttling. 2. Add metrics for throttled requests. 3. Basic UI support in master/regionserver. 4. Handle the throttling exception in the client. 5. Add more throttle types; like DynamoDB, use read/write capacity units to throttle. 6. Support a soft limit: a user can over-consume his quota when the regionserver has available capacity because other users are not consuming theirs at the same time. 7. ... ... I think some of these improvements are useful, so I opened this umbrella issue to track them. Suggestions and discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16870) Add the metrics of replication sources which were transformed from other died rs to ReplicationLoad
Guanghao Zhang created HBASE-16870: -- Summary: Add the metrics of replication sources which were transformed from other died rs to ReplicationLoad Key: HBASE-16870 URL: https://issues.apache.org/jira/browse/HBASE-16870 Project: HBase Issue Type: Bug Components: Replication Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} private void buildReplicationLoad() { // get source List<ReplicationSourceInterface> sources = this.replicationManager.getSources(); List<MetricsSource> sourceMetricsList = new ArrayList<>(); for (ReplicationSourceInterface source : sources) { if (source instanceof ReplicationSource) { sourceMetricsList.add(((ReplicationSource) source).getSourceMetrics()); } } // get sink MetricsSink sinkMetrics = this.replicationSink.getSinkMetrics(); this.replicationLoad.buildReplicationLoad(sourceMetricsList, sinkMetrics); } {code} The buildReplicationLoad method in o.a.h.h.r.r.Replication doesn't consider the replication sources that were taken over from dead regionservers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16868) Avoid appending table to a peer which replicates all tables
Guanghao Zhang created HBASE-16868: -- Summary: Avoid appending table to a peer which replicates all tables Key: HBASE-16868 URL: https://issues.apache.org/jira/browse/HBASE-16868 Project: HBase Issue Type: New Feature Components: Replication Affects Versions: 2.0.0 Reporter: Guanghao Zhang First, add a new peer by shell cmd: {code} add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase". {code} If we don't set namespaces and table cfs in the peer config, it means all tables are replicated to the peer cluster. Then append a table to the peer config: {code} append_peer_tableCFs '1', {"table1" => []} {code} Now this peer only replicates table1 to the peer cluster. It changed from replicating all tables in the cluster to replicating only one table, which is very easy to misuse on a production cluster. So we should avoid appending a table to a peer which replicates all tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16938) TableCFsUpdater maybe failed due to no write permission on peerNode
Guanghao Zhang created HBASE-16938: -- Summary: TableCFsUpdater maybe failed due to no write permission on peerNode Key: HBASE-16938 URL: https://issues.apache.org/jira/browse/HBASE-16938 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.0.0, 1.4.0 Reporter: Guanghao Zhang After HBASE-11393, replication table-cfs use a PB object, so the old string config needs to be copied to the new PB object when upgrading a cluster. In our use case, we have different kerberos credentials for different clusters, e.g. an online serving cluster and an offline processing cluster, and we use one unified global admin kerberos credential for all clusters. The peer node is created by the client, so only the global admin has write permission on it. When upgrading the cluster, HMaster doesn't have write permission on the peer node, so it may fail to copy the old table-cfs string to the new PB object. I think we need a client-side tool to do this copy job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
Guanghao Zhang created HBASE-16939: -- Summary: ExportSnapshot: set owner and permission on right directory Key: HBASE-16939 URL: https://issues.apache.org/jira/browse/HBASE-16939 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor {code} FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, false, false, conf); if (filesUser != null || filesGroup != null) { setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true); } if (filesMode > 0) { setPermission(outputFs, snapshotTmpDir, (short)filesMode, true); } {code} It copies the snapshot manifest to initialOutputSnapshotDir, but it sets the owner on snapshotTmpDir. These are different directories when skipTmp is true. Another problem is that a new cluster doesn't have the .hbase-snapshot directory, so after exporting a snapshot, the owner should also be set on the .hbase-snapshot directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17088) Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor
Guanghao Zhang created HBASE-17088: -- Summary: Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor Key: HBASE-17088 URL: https://issues.apache.org/jira/browse/HBASE-17088 Project: HBase Issue Type: Improvement Components: rpc Affects Versions: 2.0.0 Reporter: Guanghao Zhang 1. RWQueueRpcExecutor has eight constructors, and the longest one takes ten parameters; yet it is only used in SimpleRpcScheduler, and the code is easy to get confused by when reading it. 2. There are duplicate method implementations in RWQueueRpcExecutor and BalancedQueueRpcExecutor; they can be implemented once in their parent class RpcExecutor. 3. SimpleRpcScheduler reads many configs to construct the RpcExecutors, but CALL_QUEUE_SCAN_SHARE_CONF_KEY is only needed by RWQueueRpcExecutor, and CALL_QUEUE_CODEL_TARGET_DELAY, CALL_QUEUE_CODEL_INTERVAL and CALL_QUEUE_CODEL_LIFO_THRESHOLD are only needed by AdaptiveLifoCoDelCallQueue. So I think we can refactor this. Suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17178) Add region balance throttling
Guanghao Zhang created HBASE-17178: -- Summary: Add region balance throttling Key: HBASE-17178 URL: https://issues.apache.org/jira/browse/HBASE-17178 Project: HBase Issue Type: Improvement Components: Balancer Reporter: Guanghao Zhang Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time, it decreases the availability of some tables or services. So we added region balance throttling on our online serving cluster. We introduced a new config, hbase.balancer.max.balancing.regions, which is the max number of regions in transition when balancing. If we set it to 1 and a table has 100 regions, the table will have 99 regions available at any time. It helps a lot for our use case and has been running for a long time on our production cluster. But for some use cases, the balancer needs to run faster. For example, if a cluster with 100 regionservers adds 50 new regionservers for peak requests, the balancer needs to run as soon as possible so the cluster reaches a balanced state quickly. Our idea is to compute the max number of regions in transition from the max balancing time and the average time a region spends in transition, and let the balancer throttle on the computed value. Examples for understanding: a cluster has 100 regionservers, each regionserver has 200 regions, the average time of a region in transition is 1 second, and we config the max balancing time as 10 * 60 seconds. Case 1: one regionserver crashes, so the cluster needs to balance at most 200 regions. Then 200 / (10 * 60s / 1s) < 1, meaning the max number of regions in transition when balancing is 1. The balancer can move regions one by one and the cluster keeps high availability while balancing. Case 2: another 100 regionservers are added, so the cluster needs to balance at most 10000 regions. Then 10000 / (10 * 60s / 1s) = 16.7, meaning the max number of regions in transition when balancing is 17. Then the cluster can reach a balanced state within the max balancing time. Any suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
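The proposed computation, as a small Java sketch (the method and names are hypothetical, not HBase source), reproducing the two cases from the description:

```java
// Hypothetical sketch: derive the max number of regions in transition from the
// max balancing time and the average time a region spends in transition.
public class BalanceThrottle {
    static int maxRegionsInTransition(int regionsToBalance,
                                      long maxBalancingTimeSec,
                                      long avgRegionInTransitionSec) {
        // how many sequential "transition slots" fit in the balancing window
        double slots = (double) maxBalancingTimeSec / avgRegionInTransitionSec;
        int rit = (int) Math.ceil(regionsToBalance / slots);
        return Math.max(1, rit); // always allow at least one region to move
    }

    public static void main(String[] args) {
        // Case 1: one regionserver crashed, ~200 regions to re-balance
        System.out.println(maxRegionsInTransition(200, 600, 1));   // -> 1
        // Case 2: 100 new regionservers added, ~10000 regions to re-balance
        System.out.println(maxRegionsInTransition(10000, 600, 1)); // -> 17
    }
}
```

Both cases match the arithmetic above: 200 / 600 rounds up to 1, and 10000 / 600 = 16.7 rounds up to 17.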
[jira] [Created] (HBASE-17125) Inconsistent result when use filter to read data
Guanghao Zhang created HBASE-17125: -- Summary: Inconsistent result when use filter to read data Key: HBASE-17125 URL: https://issues.apache.org/jira/browse/HBASE-17125 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Assume a column's max versions is 3, and we write 4 versions of this column. The oldest version is not removed immediately, but from the user's view it is gone. When the user queries with a filter, if the filter skips a newer version, the oldest version is seen again. After the region is compacted, the oldest version is never seen. So this is weird for the user: the query gets inconsistent results before and after region compaction. The reason is the matchColumn method of UserScanQueryMatcher: it first checks the cell with the filter, then checks the number of versions needed. So if the filter skips a newer version, the oldest version is seen again while it has not yet been removed. After an offline discussion with [~Apache9] and [~fenghh], we have two candidate solutions for this problem. The first idea is to check the number of versions first, then check the cell with the filter. As the javadoc of setFilter says, the filter is called after all tests for ttl, column match, deletes and max versions have been run. {code} /** * Apply the specified server-side filter when performing the Query. * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests * for ttl, column match, deletes and max versions have been run. * @param filter filter to run on the server * @return this for invocation chaining */ public Query setFilter(Filter filter) { this.filter = filter; return this; } {code} But this idea has another problem: if a column's max versions is 5 and the user's query only needs 3 versions, the version check first selects the newest 3 versions and then the filter may drop some of them, so the result may contain fewer than 3 cells even though 2 more stored versions were never read. So the second idea has three steps: 1. check against the max versions of the column; 2. check the cell with the filter; 3. check the number of versions the user needs. But this makes the ScanQueryMatcher more complicated, and it breaks the javadoc of Query.setFilter. We don't have a final solution for this problem yet. Suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
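The two orderings discussed above can be contrasted on a toy column with max versions = 3 and four written versions, newest first (a Java sketch, not the real ScanQueryMatcher):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MatcherOrder {
    // filter-then-versions (current behavior): the over-the-limit oldest
    // version can leak back into results when the filter skips a newer one.
    static List<String> filterFirst(List<String> newestFirst, int maxVersions,
                                    Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String v : newestFirst) {
            if (filter.test(v) && out.size() < maxVersions) {
                out.add(v);
            }
        }
        return out;
    }

    // versions-then-filter (first proposal): only the newest maxVersions
    // cells are even offered to the filter.
    static List<String> versionsFirst(List<String> newestFirst, int maxVersions,
                                      Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(maxVersions, newestFirst.size()); i++) {
            if (filter.test(newestFirst.get(i))) {
                out.add(newestFirst.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 versions written, max versions = 3, so v1 is logically gone
        List<String> stored = List.of("v4", "v3", "v2", "v1");
        Predicate<String> skipNewest = v -> !v.equals("v4");
        System.out.println(filterFirst(stored, 3, skipNewest));   // [v3, v2, v1] - v1 leaks
        System.out.println(versionsFirst(stored, 3, skipNewest)); // [v3, v2]
    }
}
```

The sketch also makes the trade-off visible: the versions-first ordering never leaks v1, but it returns only two cells because v4 was consumed as one of the three version slots before the filter rejected it.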
[jira] [Created] (HBASE-17077) Don't copy the replication queue which belong to the peer have been deleted
Guanghao Zhang created HBASE-17077: -- Summary: Don't copy the replication queue which belong to the peer have been deleted Key: HBASE-17077 URL: https://issues.apache.org/jira/browse/HBASE-17077 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor When a region server dies, the other live region servers transfer the dead rs's replication queues to their own queues. Currently, a live rs first copies the WAL queue to its own znode, then creates a new replication source to replicate the WALs. But it copies the queue even if the queue belongs to a peer that has been deleted. The current steps are: 1. copy the queue to its own znode; 2. find that the queue belongs to a deleted peer; 3. remove the queue and don't create a new replication source for it. There is a small improvement: the live region server doesn't need to copy such a queue to its own znode at all. The new steps are: 1. find that the queue belongs to a deleted peer; 2. remove the queue directly instead of copying it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table
Guanghao Zhang created HBASE-17140: -- Summary: Throw RegionOfflineException directly when request for a disabled table Key: HBASE-17140 URL: https://issues.apache.org/jira/browse/HBASE-17140 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Now a request to a disabled table needs 3 rpc calls before it fails: 1. get the region location; 2. send the call to the rs and get a NotServingRegionException; 3. retry, check the table state, and throw TableNotEnabledException. The table state check was added for disabled tables. But the prepare method in RegionServerCallable shows that every retried request fetches the table state first: {code} public void prepare(final boolean reload) throws IOException { // check table state if this is a retry if (reload && !tableName.equals(TableName.META_TABLE_NAME) && getConnection().isTableDisabled(tableName)) { throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled."); } try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) { this.location = regionLocator.getRegionLocation(row); } if (this.location == null) { throw new IOException("Failed to find location, tableName=" + tableName + ", row=" + Bytes.toString(row) + ", reload=" + reload); } setStubByServiceName(this.location.getServerName()); } {code} An improvement is to mark the region offline in HRegionInfo and throw RegionOfflineException when getting the region location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
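The proposed fail-fast path, as a simplified Java sketch (all types here are stand-ins, not HBase client code): if the cached region info is already marked offline, the location lookup can throw immediately instead of spending an rpc round trip and a retry.

```java
public class FailFastLocator {
    static class RegionInfo {
        final boolean offline;
        RegionInfo(boolean offline) { this.offline = offline; }
    }

    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String msg) { super(msg); }
    }

    // Stands in for a region location lookup that consults the cached info first.
    static String locate(RegionInfo cached) {
        if (cached.offline) {
            // fail fast: no rpc to the regionserver and no retry loop needed
            throw new RegionOfflineException("table is disabled; region is offline");
        }
        return "rs-1:16020"; // would normally come from a meta lookup
    }

    public static void main(String[] args) {
        System.out.println(locate(new RegionInfo(false)));
        try {
            locate(new RegionInfo(true));
        } catch (RegionOfflineException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```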
[jira] [Created] (HBASE-16910) Avoid NPE when start StochasticLoadBalancer
Guanghao Zhang created HBASE-16910: -- Summary: Avoid NPE when start StochasticLoadBalancer Key: HBASE-16910 URL: https://issues.apache.org/jira/browse/HBASE-16910 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor When the master starts, it initializes the StochasticLoadBalancer: {code} this.balancer.setClusterStatus(getClusterStatus()); this.balancer.setMasterServices(this); {code} It calls setClusterStatus() first and then setMasterServices(). But the setClusterStatus method uses the master services, which have not been initialized yet, so it throws an NPE: {code} int tablesCount = isByTable ? services.getTableDescriptors().getAll().size() : 1; {code} This happens when isByTable is set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
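A minimal Java sketch of the initialization-order dependency (names mimic the report; this is not HBase source): setClusterStatus() dereferences the master services, so it can only run after setMasterServices().

```java
public class InitOrder {
    private Object services;

    void setMasterServices(Object services) {
        this.services = services;
    }

    void setClusterStatus() {
        // stands in for: services.getTableDescriptors().getAll().size()
        if (services == null) {
            throw new NullPointerException("master services not set yet");
        }
    }

    public static void main(String[] args) {
        InitOrder balancer = new InitOrder();
        try {
            balancer.setClusterStatus();      // buggy order: NPE
        } catch (NullPointerException e) {
            System.out.println("buggy order: " + e.getMessage());
        }
        balancer.setMasterServices(new Object());
        balancer.setClusterStatus();          // fixed order: fine
        System.out.println("fixed order: ok");
    }
}
```

Swapping the two calls in the master's startup sequence, or making setClusterStatus tolerate an unset service, would both avoid the NPE.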
[jira] [Created] (HBASE-16985) TestClusterId failed due to wrong hbase rootdir
Guanghao Zhang created HBASE-16985: -- Summary: TestClusterId failed due to wrong hbase rootdir Key: HBASE-16985 URL: https://issues.apache.org/jira/browse/HBASE-16985 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor https://builds.apache.org/job/PreCommit-HBASE-Build/4253/testReport/org.apache.hadoop.hbase.regionserver/TestClusterId/testClusterId/ {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:230) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:409) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:227) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:96) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1071) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1037) at org.apache.hadoop.hbase.regionserver.TestClusterId.testClusterId(TestClusterId.java:85) {code} The cluster cannot start up because there is no active master, and the active master cannot finish initializing because the hbase:namespace region cannot be assigned. In the TestClusterId unit test, TEST_UTIL.startMiniHBaseCluster sets a new hbase root dir, but the regionserver thread that was started first uses a different hbase root dir. If the hbase:namespace region is assigned to that regionserver, the region cannot be opened because there is no tableinfo under the wrong hbase root dir. When the regionserver reports to the master, it gets back some new config, but the FSTableDescriptors has already been initialized, so its root dir is not changed. {code} if (LOG.isDebugEnabled()) { LOG.info("Config from master: " + key + "=" + value); } {code} I think FSTableDescriptors needs to update its root dir when the regionserver gets the report back from the master. The master branch has the same problem, but there the balancer always assigns the hbase:namespace region to the master, so this unit test passes on the master branch. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-16983) TestMultiTableSnapshotInputFormat failing with Unable to create region directory: /tmp/...
[ https://issues.apache.org/jira/browse/HBASE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-16983: Reopen for addendum. [~stack] > TestMultiTableSnapshotInputFormat failing with Unable to create region > directory: /tmp/... > --- > > Key: HBASE-16983 > URL: https://issues.apache.org/jira/browse/HBASE-16983 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: 16983.txt, HBASE-16983-ADDENDUM.patch, > HBASE-16983-branch-1-ADDENDUM.patch > > > Test is using /tmp. We failed creating dir in /tmp in a few tests from this > suite just now: > https://builds.apache.org/job/PreCommit-HBASE-Build/4253/testReport/org.apache.hadoop.hbase.mapred/TestMultiTableSnapshotInputFormat/testScanOBBToOPP/ > {code} > Caused by: java.io.IOException: Unable to create region directory: > /tmp/scantest2_snapshot__953e2b2d-22aa-4c6a-a46a-272619f5436e/data/default/scantest2/5629158a49e010e21ac0bd16453b2d8c > at > org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:896) > at > org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6520) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > ... > {code} > No more detail than this. 
Let me change it so it creates stuff in the test dir > that it for sure owns/can write to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16947) Some improvements for DumpReplicationQueues tool
Guanghao Zhang created HBASE-16947: -- Summary: Some improvements for DumpReplicationQueues tool Key: HBASE-16947 URL: https://issues.apache.org/jira/browse/HBASE-16947 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang Recently we hit a too-many-replication-WALs problem in our production cluster and needed the DumpReplicationQueues tool to analyze the replication queue info in ZooKeeper. So I backported HBASE-16450 to our 0.98-based branch and made some improvements to it. 1. Show the dead regionservers under the replication/rs znode. When there are too many WALs under a znode, it can't be transferred atomically to the new rs znode, so the dead rs znode is left behind in ZooKeeper. 2. Summarize all the queues whose peer has been deleted. 3. Aggregate the replication queue sizes of all regionservers. Each regionserver reports ReplicationLoad to the master, but there is no aggregate metric for replication. 4. Show how many WALs cannot be found on HDFS; the reason (WAL Not Found) needs more time to dig into. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
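Point 3 above (aggregating per-regionserver queue sizes into one cluster-wide number) can be sketched as below. This is a minimal illustration with hypothetical names, not HBase's actual DumpReplicationQueues code:

```java
// Illustrative sketch: sum the per-regionserver replication queue sizes
// (e.g. sizeOfLogQueue from ReplicationLoad) into one aggregate metric.
// Class and method names are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class ReplicationQueueAggregator {
  // queue size reported by each regionserver, keyed by server name
  public static long totalQueueSize(Map<String, Long> sizePerRegionServer) {
    long total = 0;
    for (long size : sizePerRegionServer.values()) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new HashMap<>();
    sizes.put("rs1,16020,1476784763605", 120L);
    sizes.put("rs2,16020,1476784763606", 30L);
    System.out.println("aggregate queue size = " + totalQueueSize(sizes));
  }
}
```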
[jira] [Created] (HBASE-17288) Add warn log for huge keyvalue and huge row
Guanghao Zhang created HBASE-17288: -- Summary: Add warn log for huge keyvalue and huge row Key: HBASE-17288 URL: https://issues.apache.org/jira/browse/HBASE-17288 Project: HBase Issue Type: Improvement Components: scan Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Some log examples from our production cluster. {code} 2016-12-10,17:08:11,478 WARN org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into result list, kv size:1253360, kv:10567114001-1-c/R:r1/1481360887152/Put/vlen=1253245/ts=923099, from table X 2016-12-10,17:08:16,724 WARN org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into result list, kv size:1048680, kv:0220459/I:i_0/1481360889551/Put/vlen=1048576/ts=13642, from table XX {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17289) Avoid adding a replication peer named "lock"
Guanghao Zhang created HBASE-17289: -- Summary: Avoid adding a replication peer named "lock" Key: HBASE-17289 URL: https://issues.apache.org/jira/browse/HBASE-17289 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 1.2.4, 0.98.23, 1.1.7, 1.3.0, 1.4.0 Reporter: Guanghao Zhang Priority: Minor When the zk based replication queue is used and useMulti is false, transferring replication queues takes three steps: first add a lock, then copy the nodes, and finally clean up the old queue and the lock. The default lock znode's name is "lock", so we should avoid adding a peer named "lock". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
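The guard this issue asks for can be sketched as follows. The validator class and method are hypothetical, not HBase's actual API; "lock" matches the default lock znode name described above:

```java
// Illustrative sketch: reject a peer id that collides with the lock znode
// name used during queue transfer.
public class PeerIdValidator {
  private static final String RESERVED_LOCK_ZNODE = "lock";

  public static void checkPeerId(String peerId) {
    if (RESERVED_LOCK_ZNODE.equals(peerId)) {
      throw new IllegalArgumentException(
          "Peer id '" + peerId + "' is reserved for the replication lock znode");
    }
  }

  public static void main(String[] args) {
    checkPeerId("peer1"); // accepted
    try {
      checkPeerId("lock");
      System.out.println("BUG: reserved id accepted");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected reserved peer id");
    }
  }
}
```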
[jira] [Created] (HBASE-17312) [JDK8] Use default method in XXXObserver
Guanghao Zhang created HBASE-17312: -- Summary: [JDK8] Use default method in XXXObserver Key: HBASE-17312 URL: https://issues.apache.org/jira/browse/HBASE-17312 Project: HBase Issue Type: Task Components: Coprocessors Affects Versions: 2.0.0 Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17317) [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map
Guanghao Zhang created HBASE-17317: -- Summary: [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map Key: HBASE-17317 URL: https://issues.apache.org/jira/browse/HBASE-17317 Project: HBase Issue Type: Task Affects Versions: 1.4.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang The updatePeerConfig method in ReplicationPeersZKImpl.java
{code}
@Override
public void updatePeerConfig(String id, ReplicationPeerConfig newConfig)
    throws ReplicationException {
  ReplicationPeer peer = getPeer(id);
  if (peer == null) {
    throw new ReplicationException("Could not find peer Id " + id);
  }
  ReplicationPeerConfig existingConfig = peer.getPeerConfig();
  if (newConfig.getClusterKey() != null && !newConfig.getClusterKey().isEmpty()
      && !newConfig.getClusterKey().equals(existingConfig.getClusterKey())) {
    throw new ReplicationException("Changing the cluster key on an existing peer is not allowed."
        + " Existing key '" + existingConfig.getClusterKey()
        + "' does not match new key '" + newConfig.getClusterKey() + "'");
  }
  String existingEndpointImpl = existingConfig.getReplicationEndpointImpl();
  if (newConfig.getReplicationEndpointImpl() != null
      && !newConfig.getReplicationEndpointImpl().isEmpty()
      && !newConfig.getReplicationEndpointImpl().equals(existingEndpointImpl)) {
    throw new ReplicationException("Changing the replication endpoint implementation class "
        + "on an existing peer is not allowed. Existing class '"
        + existingConfig.getReplicationEndpointImpl()
        + "' does not match new class '" + newConfig.getReplicationEndpointImpl() + "'");
  }
  // Update existingConfig's peer config and peer data with the new values, but don't touch config
  // or data that weren't explicitly changed
  existingConfig.getConfiguration().putAll(newConfig.getConfiguration());
  existingConfig.getPeerData().putAll(newConfig.getPeerData());
  // Bug. We should update the table-cfs map, too.
  try {
    ZKUtil.setData(this.zookeeper, getPeerNode(id),
        ReplicationSerDeHelper.toByteArray(existingConfig));
  } catch (KeeperException ke) {
    throw new ReplicationException("There was a problem trying to save changes to the "
        + "replication peer " + id, ke);
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
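The bug can be modeled with plain maps, independent of the HBase classes. This toy model (all names illustrative, not ReplicationPeerConfig's real API) shows that merging only configuration and peer data silently drops a table-cfs update:

```java
// Minimal model of the HBASE-17317 bug: updatePeerConfig merges
// configuration and peer data but never copies the table-cfs map.
import java.util.HashMap;
import java.util.Map;

public class PeerConfigModel {
  public final Map<String, String> configuration = new HashMap<>();
  public final Map<String, String> peerData = new HashMap<>();
  public Map<String, String> tableCfs = new HashMap<>();

  // Mirrors the branch-1 behavior: table-cfs is not merged.
  public void buggyUpdate(PeerConfigModel newConfig) {
    configuration.putAll(newConfig.configuration);
    peerData.putAll(newConfig.peerData);
    // missing: carry over newConfig.tableCfs
  }

  // The fix: also carry over the table-cfs map.
  public void fixedUpdate(PeerConfigModel newConfig) {
    buggyUpdate(newConfig);
    tableCfs = new HashMap<>(newConfig.tableCfs);
  }

  public static void main(String[] args) {
    PeerConfigModel existing = new PeerConfigModel();
    PeerConfigModel update = new PeerConfigModel();
    update.tableCfs.put("t1", "cf1");

    existing.buggyUpdate(update);
    System.out.println("after buggy update: " + existing.tableCfs);  // {}

    existing.fixedUpdate(update);
    System.out.println("after fixed update: " + existing.tableCfs);  // {t1=cf1}
  }
}
```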
[jira] [Created] (HBASE-17296) Provide per peer throttling for replication
Guanghao Zhang created HBASE-17296: -- Summary: Provide per peer throttling for replication Key: HBASE-17296 URL: https://issues.apache.org/jira/browse/HBASE-17296 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang HBASE-9501 added a config to provide throttling for replication, but every peer has the same bandwidth limit. In our use case, one cluster may have several peers and several slave clusters; each slave cluster may have a different scale and need a different bandwidth limit for its peer. So we add a bandwidth field to the replication peer config and provide a shell command, set_peer_bandwidth, to update the bandwidth when needed. It has been used for a long time on our clusters. Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
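The per-peer lookup this proposes can be sketched as below: use a peer's own bandwidth if set, otherwise fall back to the cluster-wide limit. The class and fields are illustrative, not the actual peer config API:

```java
// Hedged sketch of per-peer replication bandwidth with a global fallback.
import java.util.HashMap;
import java.util.Map;

public class PeerBandwidth {
  // bandwidth in bytes/sec per peer id; 0 means "not set"
  private final Map<String, Long> perPeerBandwidth = new HashMap<>();
  private final long defaultBandwidth;

  public PeerBandwidth(long defaultBandwidth) {
    this.defaultBandwidth = defaultBandwidth;
  }

  public void setPeerBandwidth(String peerId, long bandwidth) {
    perPeerBandwidth.put(peerId, bandwidth);
  }

  public long bandwidthFor(String peerId) {
    long b = perPeerBandwidth.getOrDefault(peerId, 0L);
    return b > 0 ? b : defaultBandwidth;  // fall back to the global limit
  }

  public static void main(String[] args) {
    PeerBandwidth throttle = new PeerBandwidth(100 * 1024 * 1024L);
    throttle.setPeerBandwidth("slow-peer", 10 * 1024 * 1024L);
    System.out.println("slow-peer:  " + throttle.bandwidthFor("slow-peer"));
    System.out.println("other-peer: " + throttle.bandwidthFor("other-peer"));
  }
}
```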
[jira] [Created] (HBASE-17303) Let master to check and transfer the dead rs's replication queues
Guanghao Zhang created HBASE-17303: -- Summary: Let master to check and transfer the dead rs's replication queues Key: HBASE-17303 URL: https://issues.apache.org/jira/browse/HBASE-17303 Project: HBase Issue Type: Bug Components: Replication Reporter: Guanghao Zhang Dump replication queues result from our cluster. {code} Found 8 deleted queues, run hbck -fixReplication in order to remove the deleted replication queues hostname,24610,1481528189915/80-hostname,24620,1476784763605 hostname,24620,1476784763605/70-hostname,24630,1470418208092-hostname,24600,1476773709589 hostname,24630,1481528526258/17000-hostname,24620,1470044455538-hostname,24630,1470037674231-hostname,24600,1476773708489-hostname,24620,1476784763605 hostname,24620,1481528358531/70-hostname,24600,1476773709589-hostname,24620,1476784763605 hostname,24600,1481528021595/70-hostname,24630,1470421093464-hostname,24630,1476773708939-hostname,24610,1476779010928-hostname,24620,1476784747260 hostname,24600,1481528021595/17000-hostname,24620,1476784763605 hostname,24600,1481528021595/17000-hostname,24630,1475381530644-hostname,24600,1476773709589-hostname,24620,1476784763605 hostname,24600,1481528021595/17000-hostname,24600,1476773709589-hostname,24620,1476784763605 Found 2 dead regionservers, restart one regionserver to transfer the queues of dead regionservers hostname,24600,1481547616148 hostname,24620,1476784763605 {code} Now, for a dead rs's replication znode, you need to restart a regionserver to transfer the replication queues of the dead regionservers. Following the same idea as HBASE-16336, we can let the master periodically check the dead rs znodes too, and send the transfer-replication-queues request to any regionserver. Then the dead rs's replication queues can be transferred automatically, without waiting for a regionserver restart. Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17442) Move most of the replication related classes to hbase-server package
Guanghao Zhang created HBASE-17442: -- Summary: Move most of the replication related classes to hbase-server package Key: HBASE-17442 URL: https://issues.apache.org/jira/browse/HBASE-17442 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang After the replication requests are routed through the master, the replication implementation details no longer need to be exposed to the client. We should move most of the replication related classes to the hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17205) Add a metric for the duration of region in transition
Guanghao Zhang created HBASE-17205: -- Summary: Add a metric for the duration of region in transition Key: HBASE-17205 URL: https://issues.apache.org/jira/browse/HBASE-17205 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor While working on HBASE-17178, I found there is no metric for the overall duration of a region in transition. When moving a region from A to B, the region state transitions are PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old region state to the new one updates the timestamp to the current time, so we can't get the overall duration of the region in transition. Add a rit duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
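The accumulator described above can be sketched like this. Since each transition resets the per-state timestamp, the overall time in transition must be accumulated separately; the field names here are illustrative, not the actual RegionState members:

```java
// Sketch of a "rit duration" accumulator across region state transitions.
public class RitDuration {
  private long ritDuration = 0;   // accumulated time in transition
  private long stamp;             // timestamp of the last state change

  public RitDuration(long startStamp) {
    this.stamp = startStamp;
  }

  // Called on every transition, e.g. PENDING_CLOSE -> CLOSING -> ... -> OPENED
  public void transition(long now) {
    ritDuration += now - stamp;
    stamp = now;                  // per-state stamp is reset, as the issue notes
  }

  public long getRitDuration() {
    return ritDuration;
  }

  public static void main(String[] args) {
    RitDuration rit = new RitDuration(1000);
    rit.transition(1200);  // PENDING_CLOSE -> CLOSING
    rit.transition(1500);  // CLOSING -> CLOSED
    rit.transition(2000);  // ... -> OPENED
    System.out.println("overall RIT duration = " + rit.getRitDuration());  // 1000
  }
}
```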
[jira] [Created] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
Guanghao Zhang created HBASE-17388: -- Summary: Move ReplicationPeer and other replication related PB messages to the replication.proto Key: HBASE-17388 URL: https://issues.apache.org/jira/browse/HBASE-17388 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17389) Convert all internal usages from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17389: -- Summary: Convert all internal usages from ReplicationAdmin to Admin Key: HBASE-17389 URL: https://issues.apache.org/jira/browse/HBASE-17389 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17396) Add first async admin impl and implement balance methods
Guanghao Zhang created HBASE-17396: -- Summary: Add first async admin impl and implement balance methods Key: HBASE-17396 URL: https://issues.apache.org/jira/browse/HBASE-17396 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17443: -- Summary: Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin Key: HBASE-17443 URL: https://issues.apache.org/jira/browse/HBASE-17443 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 We have moved the other replication requests to Admin and marked ReplicationAdmin as Deprecated, so the listReplicated/enableTableRep/disableTableRep methods need to move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
Guanghao Zhang created HBASE-17348: -- Summary: Remove the unused hbase.replication from javadoc/comment completely Key: HBASE-17348 URL: https://issues.apache.org/jira/browse/HBASE-17348 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Trivial The hbase.replication configuration has been removed by HBASE-16040, but there are still some hbase.replication references left in the javadoc of ReplicationAdmin, in Admin.proto, and in shell.rb. Let's remove them completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17337) list replication peers request should be routed through master
Guanghao Zhang created HBASE-17337: -- Summary: list replication peers request should be routed through master Key: HBASE-17337 URL: https://issues.apache.org/jira/browse/HBASE-17337 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17335) enable/disable replication peer requests should be routed through master
Guanghao Zhang created HBASE-17335: -- Summary: enable/disable replication peer requests should be routed through master Key: HBASE-17335 URL: https://issues.apache.org/jira/browse/HBASE-17335 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang As the HBASE-11392 description says, replication operations should be routed through the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17336) get/update replication peer config requests should be routed through master
Guanghao Zhang created HBASE-17336: -- Summary: get/update replication peer config requests should be routed through master Key: HBASE-17336 URL: https://issues.apache.org/jira/browse/HBASE-17336 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang As the HBASE-11392 description says, replication operations should be routed through the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14609) Can't config all day as OffPeakHours
[ https://issues.apache.org/jira/browse/HBASE-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-14609. Resolution: Won't Fix > Can't config all day as OffPeakHours > > > Key: HBASE-14609 > URL: https://issues.apache.org/jira/browse/HBASE-14609 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > > The off-peak hours are [startHour, endHour) and endHour is exclusive. But > endHour is rejected when configured as 24, so we can't configure the whole day as > OffPeakHours. > {code} > private static boolean isValidHour(int hour) { > return 0 <= hour && hour <= 23; > } > {code} > Making endHour=24 valid, or allowing startHour == endHour, would fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
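The quoted isValidHour and one possible fix can be shown side by side. The original check is copied from the issue; validRange is an illustrative way to accept endHour == 24 so the window [0, 24) covers the whole day:

```java
// Sketch of the OffPeakHours validation discussed in HBASE-14609.
public class OffPeakHoursCheck {
  // Original check from the issue: rejects 24, so [0, 24) can't be configured.
  public static boolean isValidHour(int hour) {
    return 0 <= hour && hour <= 23;
  }

  // One possible fix: startHour stays in [0, 23], endHour may be 24.
  public static boolean validRange(int startHour, int endHour) {
    return isValidHour(startHour) && 0 <= endHour && endHour <= 24;
  }

  public static void main(String[] args) {
    System.out.println("old check, endHour=24: " + isValidHour(24));  // false
    System.out.println("fixed check, [0,24): " + validRange(0, 24));  // true
  }
}
```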
[jira] [Created] (HBASE-17326) Fix findbugs warning in BufferedMutatorParams
Guanghao Zhang created HBASE-17326: -- Summary: Fix findbugs warning in BufferedMutatorParams Key: HBASE-17326 URL: https://issues.apache.org/jira/browse/HBASE-17326 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang https://builds.apache.org/job/PreCommit-HBASE-Build/4947/artifact/patchprocess/branch-findbugs-hbase-client-warnings.html org.apache.hadoop.hbase.client.BufferedMutatorParams defines clone() but doesn't implement Cloneable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
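The warning ("defines clone() but doesn't implement Cloneable") is typically fixed by declaring the interface and delegating to super.clone(). The class below is a generic illustration, not BufferedMutatorParams itself:

```java
// Declaring Cloneable silences the findbugs warning and makes
// super.clone() legal at runtime.
public class ParamsExample implements Cloneable {
  private int bufferSize = 1024;

  @Override
  public ParamsExample clone() {
    try {
      return (ParamsExample) super.clone();  // allowed: we implement Cloneable
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);           // cannot happen: we are Cloneable
    }
  }

  public int getBufferSize() { return bufferSize; }

  public static void main(String[] args) {
    ParamsExample p = new ParamsExample();
    ParamsExample copy = p.clone();
    System.out.println(copy != p && copy.getBufferSize() == p.getBufferSize());  // true
  }
}
```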
[jira] [Created] (HBASE-17846) [JDK8] Use Optional instead of Nullable parameter in async client
Guanghao Zhang created HBASE-17846: -- Summary: [JDK8] Use Optional instead of Nullable parameter in async client Key: HBASE-17846 URL: https://issues.apache.org/jira/browse/HBASE-17846 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang For the master branch, we use a lot of Java 8 features in the async client, like lambdas, streams, default methods and so on. Java 8 also supports Optional, so we can update some methods to use Optional instead of nullable parameters. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17790) Mark ReplicationAdmin's peerAdded and listReplicationPeers as Deprecated
Guanghao Zhang created HBASE-17790: -- Summary: Mark ReplicationAdmin's peerAdded and listReplicationPeers as Deprecated Key: HBASE-17790 URL: https://issues.apache.org/jira/browse/HBASE-17790 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Now most of the public methods in ReplicationAdmin have been moved to Admin and marked as Deprecated. The peerAdded and listReplicationPeers methods need to be marked as Deprecated, too. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17913) Fix flaky test TestExportSnapshot/TestMobExportSnapshot/TestMobSecureExportSnapshot/TestSecureExportSnapshot
Guanghao Zhang created HBASE-17913: -- Summary: Fix flaky test TestExportSnapshot/TestMobExportSnapshot/TestMobSecureExportSnapshot/TestSecureExportSnapshot Key: HBASE-17913 URL: https://issues.apache.org/jira/browse/HBASE-17913 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang https://builds.apache.org/job/PreCommit-HBASE-Build/6410/artifact/patchprocess/patch-unit-hbase-server.txt Failed tests: TestExportSnapshot.testExportRetry:273->testExportFileSystemState:233 expected:<0> but was:<1> TestExportSnapshot.testExportWithTargetName:192->testExportFileSystemState:197->testExportFileSystemState:204->testExportFileSystemState:233 expected:<0> but was:<1> TestMobExportSnapshot>TestExportSnapshot.testExportRetry:273->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobExportSnapshot>TestExportSnapshot.testExportWithTargetName:192->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testConsecutiveExports:184->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testEmptyExportFileSystemState:178->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testExportFileSystemState:163->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestSecureExportSnapshot>TestExportSnapshot.testExportFileSystemStateWithSkipTmp:170->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 
expected:<0> but was:<1> TestSecureExportSnapshot>TestExportSnapshot.testExportRetry:273->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17915) Implement replication admin methods
Guanghao Zhang created HBASE-17915: -- Summary: Implement replication admin methods Key: HBASE-17915 URL: https://issues.apache.org/jira/browse/HBASE-17915 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-9899: --- Reopened because we forgot to pass the nonce for the scanner in branch-1. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin > Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18485) Performance issue: ClientAsyncPrefetchScanner is slower than ClientSimpleScanner
Guanghao Zhang created HBASE-18485: -- Summary: Performance issue: ClientAsyncPrefetchScanner is slower than ClientSimpleScanner Key: HBASE-18485 URL: https://issues.apache.org/jira/browse/HBASE-18485 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Copied the test result from HBASE-17994.
{code}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred scan 1
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --asyncPrefetch=True scan 1
{code}
Mean latency.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| scan | 12.21 | 14.32 | 13.25 | 13.07 | 11.83 |
| scan with prefetch=True | 37.36 | 37.88 | 37.56 | 37.66 | 38.28 |
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18481) The autoFlush flag was not used in PE tool
Guanghao Zhang created HBASE-18481: -- Summary: The autoFlush flag was not used in PE tool Key: HBASE-18481 URL: https://issues.apache.org/jira/browse/HBASE-18481 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor After HBASE-12728, PE uses a BufferedMutator for the random/sequential write tests and the autoFlush flag is not used. So all write tests buffer the write requests and send them as a batch once the buffer has filled. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
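The buffering behavior described above can be modeled without the HBase client at all. This toy stand-in (not the real BufferedMutator API) shows how puts are held and sent as batches when the buffer fills, regardless of any autoFlush flag:

```java
// Minimal model of BufferedMutator-style client-side write buffering.
import java.util.ArrayList;
import java.util.List;

public class BufferedWriteModel {
  private final List<String> buffer = new ArrayList<>();
  private final int capacity;
  public int batchesSent = 0;

  public BufferedWriteModel(int capacity) {
    this.capacity = capacity;
  }

  public void put(String mutation) {
    buffer.add(mutation);
    if (buffer.size() >= capacity) {   // only flushed when the buffer has filled
      flush();
    }
  }

  public void flush() {
    if (!buffer.isEmpty()) {
      batchesSent++;                   // one batch request for the whole buffer
      buffer.clear();
    }
  }

  public static void main(String[] args) {
    BufferedWriteModel mutator = new BufferedWriteModel(3);
    for (int i = 0; i < 7; i++) {
      mutator.put("row-" + i);
    }
    mutator.flush();                   // final partial batch
    System.out.println("batches sent: " + mutator.batchesSent);  // 3
  }
}
```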
[jira] [Created] (HBASE-18500) Performance issue: Don't use BufferedMutator for HTable's put method
Guanghao Zhang created HBASE-18500: -- Summary: Performance issue: Don't use BufferedMutator for HTable's put method Key: HBASE-18500 URL: https://issues.apache.org/jira/browse/HBASE-18500 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Copied the test result from HBASE-17994. I ran start-hbase.sh on my local computer and used the default config to test with the PE tool.
{code}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --autoFlush=True randomWrite 1
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --autoFlush=True asyncRandomWrite 1
{code}
Mean latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 164.39 | 161.22 | 164.78 | 140.61 | 151.69 |
| asyncRandomWrite | 122.29 | 125.58 | 122.23 | 113.18 | 123.02 |
50th latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 130.00 | 125.00 | 123.00 | 112.00 | 121.00 |
| asyncRandomWrite | 95.00 | 97.00 | 95.00 | 88.00 | 95.00 |
99th latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 600.00 | 600.00 | 650.00 | 404.00 | 425.00 |
| asyncRandomWrite | 339.00 | 327.00 | 297.00 | 311.00 | 318.00 |
In our internal 0.98 branch, the PE test result shows that the async write has almost the same latency as the blocking write. But for the master branch, the result shows that the async write has better latency than the blocking client. Looking at the code, I think the difference is the BufferedMutator. For the master branch, HTable doesn't have a write buffer and all write requests are flushed directly; users can use BufferedMutator when they want client-side buffering of writes. For the performance issue (autoFlush=True), I think we can use the rpc caller directly in HTable's put method. Thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18608) AsyncConnection should return AsyncAdmin interface instead of the implementation
Guanghao Zhang created HBASE-18608: -- Summary: AsyncConnection should return AsyncAdmin interface instead of the implementation Key: HBASE-18608 URL: https://issues.apache.org/jira/browse/HBASE-18608 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncConnection.java {code} AsyncAdminBuilder getAdminBuilder(); AsyncAdminBuilder getAdminBuilder(ExecutorService pool); {code} These two methods should not expose the implementations: RawAsyncHBaseAdmin and AsyncHBaseAdmin. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18598) AsyncNonMetaRegionLocator use FIFO algorithm to get a candidate locate request
Guanghao Zhang created HBASE-18598: -- Summary: AsyncNonMetaRegionLocator use FIFO algorithm to get a candidate locate request Key: HBASE-18598 URL: https://issues.apache.org/jira/browse/HBASE-18598 Project: HBase Issue Type: Bug Components: asyncclient Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18600) Provide a method to disable the Short-Circuit HTable in coprocessor
Guanghao Zhang created HBASE-18600: -- Summary: Provide a method to disable the Short-Circuit HTable in coprocessor Key: HBASE-18600 URL: https://issues.apache.org/jira/browse/HBASE-18600 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18571) RpcContext inconsistent when call Table's method in RegionCoprocessorEnvironment
Guanghao Zhang created HBASE-18571: -- Summary: RpcContext inconsistent when call Table's method in RegionCoprocessorEnvironment Key: HBASE-18571 URL: https://issues.apache.org/jira/browse/HBASE-18571 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17797) Add a filter to implement the function which return the special number of versions of each column
[ https://issues.apache.org/jira/browse/HBASE-17797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17797. Resolution: Duplicate As we solved this in HBASE-17125, resolve this as a duplicate. > Add a filter to implement the function which return the special number of > versions of each column > - > > Key: HBASE-17797 > URL: https://issues.apache.org/jira/browse/HBASE-17797 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > > After HBASE-17125, ScanQueryMatch will first check column then check by > filter. The scan/get will get consistent result when use filter to read data. > But scan/get setMaxVersions() can not return the special number of versions > of each column. So this issue will introduce a new filter to implement this > function which return the special number of versions of each column. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18380) Implement async RSGroup admin based on the async admin
Guanghao Zhang created HBASE-18380: -- Summary: Implement async RSGroup admin based on the async admin Key: HBASE-18380 URL: https://issues.apache.org/jira/browse/HBASE-18380 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Now the RSGroup admin client gets a blocking stub based on the blocking admin's coprocessor service. As we added coprocessor service support to the async admin, we can implement a new async RSGroup admin client based on the new async admin. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HBASE-18343) Track the remaining unimplemented methods for async admin
[ https://issues.apache.org/jira/browse/HBASE-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-18343: TestAsyncRegionAdminApi#testSplitSwitch failed on my PC. > Track the remaining unimplemented methods for async admin > - > > Key: HBASE-18343 > URL: https://issues.apache.org/jira/browse/HBASE-18343 > Project: HBase > Issue Type: Sub-task > Components: Client > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18343.master.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18343) Track the remaining unimplemented methods
Guanghao Zhang created HBASE-18343: -- Summary: Track the remaining unimplemented methods Key: HBASE-18343 URL: https://issues.apache.org/jira/browse/HBASE-18343 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18342) Add coprocessor service support
Guanghao Zhang created HBASE-18342: -- Summary: Add coprocessor service support Key: HBASE-18342 URL: https://issues.apache.org/jira/browse/HBASE-18342 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17359) Implement async admin
[ https://issues.apache.org/jira/browse/HBASE-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17359. Resolution: Fixed Fix Version/s: (was: 2.0.0) 2.0.0-alpha-2 3.0.0 All sub-tasks have been resolved. > Implement async admin > - > > Key: HBASE-17359 > URL: https://issues.apache.org/jira/browse/HBASE-17359 > Project: HBase > Issue Type: Umbrella > Components: Client >Reporter: Duo Zhang > Assignee: Guanghao Zhang > Labels: asynchronous > Fix For: 3.0.0, 2.0.0-alpha-2 > > > And as we will return a CompletableFuture, I think we can just remove the > XXXAsync methods, and make all the methods blocking which means we will only > finish the CompletableFuture when the operation is done. Users can choose > whether to wait on the returned CompletableFuture. > Convert this to an umbrella task. There may be some sub-tasks. > 1. Table admin operations. > 2. Region admin operations. > 3. Namespace admin operations. > 4. Snapshot admin operations. > 5. Replication admin operations. > 6. Other operations, like quota, balance.. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18297) Provide an AsyncAdminBuilder to create new AsyncAdmin instance
Guanghao Zhang created HBASE-18297: -- Summary: Provide an AsyncAdminBuilder to create new AsyncAdmin instance Key: HBASE-18297 URL: https://issues.apache.org/jira/browse/HBASE-18297 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Similar to AsyncTableBuilder, users can set only the configs they care about when creating a new AsyncAdmin instance. This makes it easy to update the rpc timeout or operation timeout config when performing time-consuming admin operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18317) Implement async admin operations for Normalizer/CleanerChore/CatalogJanitor
Guanghao Zhang created HBASE-18317: -- Summary: Implement async admin operations for Normalizer/CleanerChore/CatalogJanitor Key: HBASE-18317 URL: https://issues.apache.org/jira/browse/HBASE-18317 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18318) Implement updateConfiguration/stopMaster/stopRegionServer/shutdown methods
Guanghao Zhang created HBASE-18318: -- Summary: Implement updateConfiguration/stopMaster/stopRegionServer/shutdown methods Key: HBASE-18318 URL: https://issues.apache.org/jira/browse/HBASE-18318 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18316) Implement async admin operations for draining region servers
Guanghao Zhang created HBASE-18316: -- Summary: Implement async admin operations for draining region servers Key: HBASE-18316 URL: https://issues.apache.org/jira/browse/HBASE-18316 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18319) Implement getClusterStatus method
Guanghao Zhang created HBASE-18319: -- Summary: Implement getClusterStatus method Key: HBASE-18319 URL: https://issues.apache.org/jira/browse/HBASE-18319 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-17958) Avoid passing cell to ScanQueryMatcher when optimize SEEK to SKIP
Guanghao Zhang created HBASE-17958: -- Summary: Avoid passing cell to ScanQueryMatcher when optimize SEEK to SKIP Key: HBASE-17958 URL: https://issues.apache.org/jira/browse/HBASE-17958 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang {code} ScanQueryMatcher.MatchCode qcode = matcher.match(cell); qcode = optimize(qcode, cell); {code} The optimize method may change the MatchCode from SEEK_NEXT_COL/SEEK_NEXT_ROW to SKIP, but the next cell is still passed to ScanQueryMatcher. This produces wrong results with some filters, e.g. ColumnCountGetFilter, which just counts the number of columns: if the same column is passed to the filter twice, the count will be wrong. So we should avoid passing the cell to ScanQueryMatcher when optimize changes SEEK to SKIP. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
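The double-counting problem can be demonstrated with a toy count-based filter, a simplified stand-in for ColumnCountGetFilter rather than the real class: if the scanner hands it the same column twice, the filter hits its limit one column early and drops a column it should have returned.

```java
// Toy stand-in for a count-based filter: it includes cells until a column
// limit is reached, and (like the real filter) does not dedupe columns.
public class CountFilterSketch {
    // Returns the concatenation of the columns the filter would include.
    static String includedColumns(String[] cells, int limit) {
        StringBuilder included = new StringBuilder();
        int count = 0;
        for (String col : cells) {
            if (count >= limit) break; // filter says "enough columns"
            count++;
            included.append(col);
        }
        return included.toString();
    }

    public static void main(String[] args) {
        // Correct cell stream: three distinct columns, limit 3 -> all kept.
        System.out.println(includedColumns(new String[] {"a", "b", "c"}, 3)); // abc
        // Buggy stream (SEEK optimized to SKIP but cell still passed to the
        // matcher): column "a" is seen twice, so "c" is wrongly excluded.
        System.out.println(includedColumns(new String[] {"a", "a", "b", "c"}, 3)); // aab
    }
}
```

The fix direction described in the issue, not re-feeding the cell to the matcher once SEEK has been optimized to SKIP, prevents exactly this duplicate observation.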
[jira] [Created] (HBASE-18626) Handle the incompatible change about the replication TableCFs' config
Guanghao Zhang created HBASE-18626: -- Summary: Handle the incompatible change about the replication TableCFs' config Key: HBASE-18626 URL: https://issues.apache.org/jira/browse/HBASE-18626 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Blocker Regarding compatibility, there is one incompatible change to the replication TableCFs' config. The old config is a string that concatenates the list of tables and column families in the format "table1:cf1,cf2;table2:cfA,cfB" in zookeeper for the table-cf to replication peer mapping. When parsing the config, ":" is used to split the string, so if a table name includes a namespace the result is wrong (see HBASE-11386). This has been a problem since namespaces were introduced (0.98). So HBASE-11393 (and HBASE-16653) changed it to a PB object. When rolling-upgrading a cluster, you need to roll the master first, and the master will try to translate the string config into a PB object. But there are two problems. 1. Permissions. The replication client can write to zookeeper directly, so the znode may have a different owner and the master may not have write permission for it; translating the old table-cfs string to the new PB object may then fail. See HBASE-16938. 2. We usually keep compatibility between old clients and new servers, but an old replication client may write a string config to the znode directly, which the new server can't parse. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
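Why the old string format breaks with namespaces can be shown in a few lines. This is a simplified sketch of the legacy parsing, not the actual HBase code: splitting on ":" cannot distinguish the namespace separator inside a table name from the table/column-family separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the legacy table-cf parsing: entries separated by
// ";", each entry split on ":" into table name and column families.
public class TableCfParseSketch {
    static Map<String, String> parse(String config) {
        Map<String, String> tableCfs = new LinkedHashMap<>();
        for (String entry : config.split(";")) {
            String[] parts = entry.split(":");
            // Assumes at most one ":" per entry -- wrong once table names
            // may themselves contain ":" as the namespace separator.
            tableCfs.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return tableCfs;
    }

    public static void main(String[] args) {
        // Works for the old, namespace-free format:
        System.out.println(parse("table1:cf1,cf2;table2:cfA,cfB"));
        // Breaks for a namespaced table "ns:table1": the namespace is taken
        // as the table name and "table1" as the column-family list.
        System.out.println(parse("ns:table1:cf1,cf2")); // {ns=table1}
    }
}
```

A structured PB object avoids the ambiguity entirely, which is why HBASE-11393 moved away from the string format.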
[jira] [Created] (HBASE-18053) AsyncTableResultScanner will hang when scan wrong column family
Guanghao Zhang created HBASE-18053: -- Summary: AsyncTableResultScanner will hang when scan wrong column family Key: HBASE-18053 URL: https://issues.apache.org/jira/browse/HBASE-18053 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18052) Add doc, examples for async admin
Guanghao Zhang created HBASE-18052: -- Summary: Add doc, examples for async admin Key: HBASE-18052 URL: https://issues.apache.org/jira/browse/HBASE-18052 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HBASE-18234) Revisit the async admin api
[ https://issues.apache.org/jira/browse/HBASE-18234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-18234: > Revisit the async admin api > --- > > Key: HBASE-18234 > URL: https://issues.apache.org/jira/browse/HBASE-18234 > Project: HBase > Issue Type: Sub-task > Components: Client > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18234.master.001.patch, > HBASE-18234.master.002.patch, HBASE-18234.master.003.patch, > HBASE-18234.master.004.patch, HBASE-18234.master.005.patch, > HBASE-18234.master.006.patch, HBASE-18234.master.006.patch, > HBASE-18234.master.006.patch, HBASE-18234.master.007.patch, > HBASE-18234.master.008.patch, HBASE-18234.master.009.patch, > HBASE-18234.master.010.patch, HBASE-18234.master.010.patch, > HBASE-18234.master.addendum.patch > > > 1. Update the balance method names: > balancer -> balance > setBalancerRunning -> setBalancerOn > isBalancerEnabled -> isBalancerOn > 2. Use HRegionLocation instead of Pair<HRegionInfo, ServerName>. > 3. Remove the closeRegionWithEncodedRegionName method: since all other APIs > can handle both full and encoded region names, a separate method for encoded > names is unnecessary. > 4. Unify the region name parameter's type to byte[]; the region name may be a > full name or an encoded name. > 5. Unify the server name parameter's type to ServerName. Some APIs support > null for the server name, so use Optional instead. > 6. Unify the table name parameter's type to TableName. > 7. Unify all list* methods to support only Pattern as the parameter type. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17846) [JDK8] Use Optional instead of Nullable parameter in async client
[ https://issues.apache.org/jira/browse/HBASE-17846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17846. Resolution: Duplicate As HBASE-18234 has been resolved, this can be closed, too. > [JDK8] Use Optional instead of Nullable parameter in async client > - > > Key: HBASE-17846 > URL: https://issues.apache.org/jira/browse/HBASE-17846 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > > For the master branch, we use a lot of Java 8 features in the async client, like > lambdas, streams, default methods and so on. Since Java 8 supports Optional, we can > update some methods to use Optional instead of Nullable parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HBASE-15616) Allow null qualifier for all table operations
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-15616: > Allow null qualifier for all table operations > - > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client > Affects Versions: 2.0.0 > Reporter: Jianwei Cui > Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-15615-addendum.patch, HBASE-15616-v1.patch, > HBASE-15616-v2.patch, HBASE-15616-v3.patch, HBASE-15616-v4.patch, > HBASE-15616-v5.patch > > > If the qualifier to check is null, checkAndMutate/checkAndPut/checkAndDelete > will encounter an NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is that {{LiteralByteString.size()}} throws an NPE if the wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column; since a null qualifier is allowed for {{Put}}, users may be > confused if a null qualifier is not allowed for {{checkAndMutate}}. We could also > convert a null qualifier to an empty byte array for {{checkAndMutate}} on the client > side. Discussion and suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
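The client-side workaround suggested above, treating a null qualifier as an empty byte array before the value ever reaches the protobuf layer, can be sketched as follows; the helper name is illustrative, not an actual HBase method.

```java
import java.util.Arrays;

// Illustrative helper: normalize a null qualifier to an empty byte array so
// the protobuf byte-string wrapper never sees a null and cannot NPE.
public class QualifierSketch {
    static final byte[] EMPTY = new byte[0];

    static byte[] normalizeQualifier(byte[] qualifier) {
        return qualifier == null ? EMPTY : qualifier;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalizeQualifier(null)));         // []
        System.out.println(Arrays.toString(normalizeQualifier(new byte[]{1}))); // [1]
    }
}
```

This keeps {{Put}} (where null qualifiers already work) and {{checkAndMutate}} consistent from the caller's point of view, at the cost of conflating "null" and "empty" qualifiers on the wire.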
[jira] [Created] (HBASE-18111) Replication stuck when cluster connection is closed
Guanghao Zhang created HBASE-18111: -- Summary: Replication stuck when cluster connection is closed Key: HBASE-18111 URL: https://issues.apache.org/jira/browse/HBASE-18111 Project: HBase Issue Type: Bug Affects Versions: 1.1.10, 0.98.24, 1.2.5, 1.3.1, 2.0.0, 1.4.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Log: {code} 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state. 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: hconnection-0x1148dd9b-0x35b6b4d4ca999c6, quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000, baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 received auth failed from ZooKeeper, aborting org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper sessionid=0x35b6b4d4ca999c6 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: Replicate edites to 
peer cluster failed. java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local exception: java.io.IOException: Connection closed {code} jstack {code} java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127) at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492) {code} The cluster connection was aborted when the ZooKeeperWatcher received an AuthFailed event. After that, the HBaseInterClusterReplicationEndpoint's replicate() method gets stuck in a while loop. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
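A minimal sketch of the fix direction, illustrative rather than the actual patch: the retry loop inside replicate() should check a running/stopped flag on every iteration, so an aborted cluster connection stops the source instead of letting it sleep and retry forever.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a replicate()-style retry loop that checks a "running" flag,
// so it cannot spin forever after the underlying connection is aborted.
// Names are illustrative, not the actual HBase code.
public class ReplicateLoopSketch {
    final AtomicBoolean running = new AtomicBoolean(true);

    // Simulates shipping one batch of edits to the peer; always fails here,
    // standing in for a peer that is never reachable.
    boolean shipBatch() {
        return false;
    }

    // Returns true if the batch was shipped; returns false (instead of
    // looping forever) once the endpoint has been stopped.
    boolean replicate() {
        while (running.get()) {
            if (shipBatch()) {
                return true;
            }
            // Real code would sleepForRetries() here before trying again.
            running.set(false); // simulate the abort arriving during a retry
        }
        return false;
    }

    public static void main(String[] args) {
        ReplicateLoopSketch endpoint = new ReplicateLoopSketch();
        System.out.println(endpoint.replicate()); // false: gave up cleanly
    }
}
```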
[jira] [Created] (HBASE-18130) Refactor ReplicationSource
Guanghao Zhang created HBASE-18130: -- Summary: Refactor ReplicationSource Key: HBASE-18130 URL: https://issues.apache.org/jira/browse/HBASE-18130 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang One basic idea is to move the code for recovered queues into a new subclass, RecoveredReplicationSource. Then ReplicationSource will no longer need to call isQueueRecovered in many places, which will make the code clearer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
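The refactoring idea can be sketched with a small class hierarchy. The class names follow the issue, but the method bodies are purely illustrative: recovered-queue behavior moves into subclass overrides instead of scattered isQueueRecovered branches at the call sites.

```java
// Sketch of the proposed refactoring: instead of ReplicationSource checking
// isQueueRecovered at many call sites, recovered-queue behavior lives in
// overrides on a RecoveredReplicationSource subclass.
public class ReplicationSourceSketch {
    // Base class: behavior for a normal (live) replication queue.
    static class ReplicationSource {
        String currentPath() {
            return "current-wal"; // a live source tails the current WAL
        }
        boolean isRecovered() {
            return false;
        }
    }

    // Subclass: recovered queues replay a fixed, already-closed set of WALs.
    static class RecoveredReplicationSource extends ReplicationSource {
        @Override
        String currentPath() {
            return "recovered-wal"; // a recovered source reads archived WALs
        }
        @Override
        boolean isRecovered() {
            return true;
        }
    }

    public static void main(String[] args) {
        ReplicationSource live = new ReplicationSource();
        ReplicationSource recovered = new RecoveredReplicationSource();
        // Call sites no longer branch on isQueueRecovered:
        System.out.println(live.currentPath() + " " + recovered.currentPath());
        // -> current-wal recovered-wal
    }
}
```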
[jira] [Created] (HBASE-18114) Update the config of TestAsync*AdminApi to make test stable
Guanghao Zhang created HBASE-18114: -- Summary: Update the config of TestAsync*AdminApi to make test stable Key: HBASE-18114 URL: https://issues.apache.org/jira/browse/HBASE-18114 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang {code} 2017-05-25 17:56:34,967 INFO [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=50801] master.HMaster$11(2297): Client=hao//127.0.0.1 disable testModifyColumnFamily 2017-05-25 17:56:37,974 INFO [RpcClient-timer-pool1-t1] client.AsyncHBaseAdmin$TableProcedureBiConsumer(2219): Operation: DISABLE, Table Name: default:testModifyColumnFamily failed with Failed after attempts=3, exceptions: Thu May 25 17:56:35 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=294, waitTime=1008, rpcTimeout=1000 Thu May 25 17:56:37 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=295, waitTime=1299, rpcTimeout=1000 Thu May 25 17:56:37 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=296, waitTime=668, rpcTimeout=660 2017-05-25 17:56:38,936 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=50801] procedure2.ProcedureExecutor(788): Stored procId=15, owner=hao, state=RUNNABLE:DISABLE_TABLE_PREPARE, DisableTableProcedure table=testModifyColumnFamily {code} For this disable table procedure, the master returns the procedure id when it submits the procedure to the ProcedureExecutor, and the procedure above took 4 seconds to submit. So the disable table call failed because the rpc timeout is 1 second and the retry number is 3. For admin operations, I think we don't need to change the default timeout config in unit tests, and the retries are not needed either (or we can set retries > 1 to test the nonce mechanism). 
Meanwhile, the default timeout is 60 seconds, so the test category may need to change to LargeTests. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
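The suggested test setup can be sketched as below. A plain map stands in for the HBase Configuration object; the key names follow the usual HBase client configs, but the values are illustrative of the issue's suggestion (keep default timeouts, drop retries) rather than taken from the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested test configuration for TestAsync*AdminApi: keep
// the default rpc timeout instead of shrinking it, and disable retries.
public class TestConfSketch {
    static Map<String, Integer> adminTestConf() {
        Map<String, Integer> conf = new HashMap<>();
        conf.put("hbase.rpc.timeout", 60_000);      // keep the 60s default
        conf.put("hbase.client.retries.number", 1); // no retries for admin ops;
        // set this > 1 instead if the test wants to exercise nonce handling.
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(adminTestConf());
    }
}
```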