Re: [ANNOUNCE] New HBase committer Bryan Beaudreault
Congratulations!

Tak Lon (Stephen) Wu wrote on Tue, Apr 12, 2022 at 02:29:
> Congrats Bryan!
>
> -Stephen
>
> On Mon, Apr 11, 2022 at 9:44 AM Pankaj Kumar wrote:
> >
> > Congratulations & welcome Bryan..!!
> >
> > Regards,
> > Pankaj
> >
> > On Sat, Apr 9, 2022, 5:15 PM 张铎(Duo Zhang) wrote:
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
> > > Beaudreault (bbeaudreault) has accepted the PMC's invitation to become a
> > > committer on the project. We appreciate all of Bryan's generous
> > > contributions thus far and look forward to his continued involvement.
> > >
> > > Congratulations and welcome, Bryan Beaudreault!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Bryan
> > > Beaudreault has accepted our invitation to become a Committer on the
> > > Apache HBase project. We thank Bryan Beaudreault for his contributions
> > > to the HBase project so far, and look forward to him taking on more
> > > responsibility in the future.
> > >
> > > Welcome, Bryan Beaudreault!
Re: [VOTE] First release candidate for hbase-thirdparty 4.1.1 is available for download
+1 (binding)

* Signature: ok
* Checksum : ok
* Rat check (1.8.0_301): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_301): ok
  - mvn clean install -DskipTests

张铎(Duo Zhang) wrote on Mon, Jun 20, 2022 at 23:40:
> See https://github.com/apache/hbase/pull/4552
>
> After modifying some code in HBase, mainly because of a behavior change in
> jetty, we can pass all the UTs when depending on hbase-thirdparty-4.1.1.
>
> 张铎(Duo Zhang) wrote on Sat, Jun 18, 2022 at 23:04:
> >
> > Please vote on this Apache hbase thirdparty release candidate,
> > hbase-thirdparty-4.1.1RC0
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase thirdparty 4.1.1
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 4.1.1RC0:
> >
> >   https://github.com/apache/hbase-thirdparty/tree/4.1.1RC0
> >
> > This tag currently points to git reference
> >
> >   d674246a75e1d7d1d4c5ee09a2567bbfa1cec022
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/hbase-thirdparty-4.1.1RC0/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1488/
> >
> > Artifacts were signed with the 9AD2AE49 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > To learn more about Apache hbase thirdparty, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
Re: [ANNOUNCE] New HBase Committer Liangjun He
Congratulations!

OpenInx wrote on Tue, Dec 6, 2022 at 19:03:
> Congrats and welcome!
>
> On Tue, Dec 6, 2022 at 2:21 AM Andrew Purtell wrote:
> >
> > Congratulations, and welcome!
> >
> > On Sat, Dec 3, 2022 at 5:51 AM Yu Li wrote:
> > >
> > > Hi All,
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Liangjun
> > > He (heliangjun) has accepted the PMC's invitation to become a committer
> > > on the project. We appreciate all of Liangjun's generous contributions
> > > thus far and look forward to his continued involvement.
> > >
> > > Congratulations and welcome, Liangjun!
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Liangjun
> > > He (何良均) has accepted our invitation to become a Committer on the
> > > Apache HBase project. We thank Liangjun for his contributions to the
> > > HBase project so far, and look forward to him taking on more
> > > responsibility in the future.
> > >
> > > Welcome, Liangjun!
> > >
> > > Best Regards,
> > > Yu
> >
> > --
> > Best regards,
> > Andrew
> >
> > Unrest, ignorance distilled, nihilistic imbeciles -
> > It's what we've earned
> > Welcome, apocalypse, what's taken you so long?
> > Bring us the fitting end that we've been counting on
> >    - A23, Welcome, Apocalypse
Re: [DISCUSS] How to deal with the disabling of public sign ups for jira.a.o(enable github issues?)
Do other projects have the same solution for this, i.e. syncing GitHub issues
to Jira issues? GitHub issues would be useful for getting more feedback.

张铎(Duo Zhang) wrote on Tue, Dec 6, 2022 at 00:13:
> The PR for HBASE-27513 is available
>
> https://github.com/apache/hbase/pull/4913
>
> Let's at least tell our users to send email to private@hbase for
> acquiring a jira account.
>
> Thanks.
>
> 张铎(Duo Zhang) wrote on Fri, Dec 2, 2022 at 12:46:
> >
> > Currently all the comments on github PRs will be sent to issues@hbase,
> > like this one
> >
> > https://lists.apache.org/thread/jbfm269b4m24xl2r82l8b0t3pmqr44hr
> >
> > But I think this can only be used as an archive, to make sure that all
> > discussions are recorded on asf infrastructure.
> >
> > For github issues, I'm afraid we can only do the same thing. As the
> > format of github comments is different, it will be hard to read if we
> > just sync the messages to jira...
> >
> > Thanks.
> >
> > Bryan Beaudreault wrote on Thu, Dec 1, 2022 at 21:30:
> > >
> > > Should we have them sent to private@? Just thinking in terms of reducing
> > > spam to users who put their email and full name on a public list.
> > >
> > > One thought I had about bug tracking is whether we could use some sort of
> > > github -> jira sync. I've seen them used before, where it automatically
> > > syncs issues and comments between the two systems. It's definitely not
> > > ideal, but maybe an option? I'm guessing it would require INFRA help.
> > >
> > > On Thu, Dec 1, 2022 at 5:47 AM 张铎(Duo Zhang) wrote:
> > > >
> > > > I've filed HBASE-27513 for changing the readme on github.
> > > >
> > > > At least let's reuse the existing mailing list for acquiring a jira
> > > > account.
> > > >
> > > > Thanks.
> > > >
> > > > 张铎(Duo Zhang) wrote on Tue, Nov 29, 2022 at 22:34:
> > > > >
> > > > > Bump, and also sending this to user@hbase.
> > > > >
> > > > > We need to find a way to deal with the current situation where
> > > > > contributors can not create a Jira account on their own...
> > > > >
> > > > > At least, we need to change the readme on the github page, the web
> > > > > site and also the ref guide to tell users how to acquire a jira
> > > > > account...
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 张铎(Duo Zhang) wrote on Sun, Nov 27, 2022 at 22:06:
> > > > > >
> > > > > > For me, I think most developers already have a github account, so
> > > > > > enabling it could help us get more feedback. Lots of younger
> > > > > > Chinese developers rarely use email in their daily life...
> > > > > > No doubt later we need to modify our readme on github. If we just
> > > > > > let users go to github issues from the readme, they will soon open
> > > > > > an issue there. But if we ask users to first send an email to a
> > > > > > mailing list to acquire a jira account, then wait for a PMC member
> > > > > > to submit the request, receive the email response, and set up
> > > > > > their account, only then can they finally open an issue on jira.
> > > > > > I'm afraid lots of users will just give up; it is not very
> > > > > > friendly...
> > > > > >
> > > > > > And I do not mean separate issue systems for users and devs. Users
> > > > > > can still open jira issues or ask in the mailing list if they
> > > > > > want; github issues is just another channel. If a user asks
> > > > > > something in the mailing list and we think it is a bug, we will
> > > > > > ask the user to file an issue or we will file one for it. It is
> > > > > > just the same with github issues.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Nick Dimiduk wrote on Thu, Nov 24, 2022 at 15:44:
> > > > > > >
> > > > > > > This new situation around JIRA seems very similar to the
> > > > > > > existing situation around Slack. A new community member
> > > > > > > currently must acquire a Slack invite somehow, usually by
> > > > > > > emailing one of the lists. Mailing lists themselves involve a
> > > > > > > signup process, though it may be possible to email user/-zh/dev
> > > > > > > without first subscribing to the list.
> > > > > > >
> > > > > > > I have a -0 opinion on using GitHub Issues to manage JIRA
> > > > > > > subscription access. It seems like a comical cascade of
> > > > > > > complexity. I'd prefer to keep GitHub Issues available to us as
> > > > > > > a future alternative to JIRA for project issue tracking. I agree
> > > > > > > with you that migrating away from JIRA will be painful.
> > > > > > >
> > > > > > > I'm not a big fan of having separate issue systems for users vs.
> > > > > > > devs. It emphasizes the idea that users and devs are different
> > > > > > > groups of people with unequal voice in the project direction. I
> > > > > > > suppose it could be done well, but I think it is more likely to
> > > > > > > be done poorly.
> > > > > > >
> > > > > > > I follow the Infra list, but only casually. It seems there's a
> > > > > > > plan to eventually adopt some Atlassian Cloud service, which
Re: [VOTE] The first release candidate for HBase 2.5.2 is available
+1 (binding)

* Checksum : ok
* Rat check (1.8.0_301): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_301): ok
  - mvn clean install -DskipTests
* Unit tests pass (1.8.0_301): ok
  - mvn package -P runSmallTests

Duo Zhang wrote on Wed, Nov 30, 2022 at 23:37:
> Bump.
>
> The phoenix community has tested the hadoop3 artifacts and it works well.
>
> Let's get this release done~
>
> Thanks.
>
> Duo Zhang wrote on Thu, Nov 24, 2022 at 12:32:
> >
> > Please vote on this Apache hbase release candidate,
> > hbase-2.5.2RC0
> >
> > The VOTE will remain open for at least 72 hours.
> >
> > [ ] +1 Release this package as Apache hbase 2.5.2
> > [ ] -1 Do not release this package because ...
> >
> > The tag to be voted on is 2.5.2RC0:
> >
> >   https://github.com/apache/hbase/tree/2.5.2RC0
> >
> > This tag currently points to git reference
> >
> >   3e28acf0b819f4b4a1ada2b98d59e05b0ef94f96
> >
> > The release files, including signatures, digests, as well as CHANGES.md
> > and RELEASENOTES.md included in this RC can be found at:
> >
> >   https://dist.apache.org/repos/dist/dev/hbase/2.5.2RC0/
> >
> > Maven artifacts are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1503/
> >
> > Maven artifacts for hadoop3 are available in a staging repository at:
> >
> >   https://repository.apache.org/content/repositories/orgapachehbase-1504/
> >
> > Artifacts were signed with the 0x9AD2AE49 key which can be found in:
> >
> >   https://downloads.apache.org/hbase/KEYS
> >
> > 2.5.2 includes 28 bug and improvement fixes done since 2.5.1. And
> > starting from 2.5.2, we will publish dist and maven artifacts for both
> > hadoop2 and hadoop3. Feel free to report any issues for the newly
> > published hadoop3 dist and maven artifacts.
> >
> > To learn more about Apache hbase, please see
> >
> >   http://hbase.apache.org/
> >
> > Thanks,
> > Your HBase Release Manager
Re: [VOTE] First release candidate for hbase 3.0.0-alpha-4 is available for download
+1 (binding)

* Checksum : ok
* Rat check (1.8.0_362): ok
  - mvn clean apache-rat:check
* Built from source (1.8.0_362): ok
  - mvn clean install -DskipTests
* Unit tests pass (1.8.0_362): ok
  - mvn clean package -P runSmallTests -Dsurefire.rerunFailingTestsCount=3

Shanmukha Haripriya Kota wrote on Wed, Jun 7, 2023 at 08:17:
> +1
>
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 33:55 min
> [INFO] Finished at: 2023-06-06T19:00:23-05:00
> [INFO] ------------------------------------------------------------------------
> ~/Desktop/hbase300/hbase-3.0.0-alpha-4/dev-support
> * Signature: ok
> * Checksum : ok
> * Rat check (11.0.10): ok
>   - mvn clean apache-rat:check
> * Built from source (11.0.10): ok
>   - mvn clean install -DskipTests
> * Unit tests pass (11.0.10): ok
>   - mvn clean package -P runSmallTests -Dsurefire.rerunFailingTestsCount=3
>
> [1] + 19163 done  ./hbase-vote.sh -s -f
> https://dist.apache.org/repos/dist/release/hbase/KEYS
>
> On Tue, Jun 6, 2023 at 10:22 AM Andrew Purtell wrote:
> >
> > +1 (binding)
> >
> > * Signature: ok
> > * Checksum : ok
> > * Rat check (11.0.19): ok
> >   - mvn clean apache-rat:check
> > * Built from source (11.0.19): ok
> >   - mvn clean install -DskipTests
> > * Unit tests pass (11.0.19): failed
> >   - mvn clean package -P runAllTests -Dsurefire.rerunFailingTestsCount=3
> >
> > TestSpnegoHttpServer and TestProxyUserSpnegoHttpServer in hbase-http
> > consistently failed, but it may be environmental, related to local
> > kerberos configs.
> >
> > On Sun, May 28, 2023 at 8:13 AM Duo Zhang wrote:
> > >
> > > Please vote on this Apache hbase release candidate,
> > > hbase-3.0.0-alpha-4RC0
> > >
> > > The VOTE will remain open for at least 72 hours.
> > >
> > > [ ] +1 Release this package as Apache hbase 3.0.0-alpha-4
> > > [ ] -1 Do not release this package because ...
> > >
> > > The tag to be voted on is 3.0.0-alpha-4RC0:
> > >
> > >   https://github.com/apache/hbase/tree/3.0.0-alpha-4RC0
> > >
> > > This tag currently points to git reference
> > >
> > >   e44cc02c75ecae7ece845f04722eb16b7528393f
> > >
> > > The release files, including signatures, digests, as well as CHANGES.md
> > > and RELEASENOTES.md included in this RC can be found at:
> > >
> > >   https://dist.apache.org/repos/dist/dev/hbase/3.0.0-alpha-4RC0/
> > >
> > > Maven artifacts are available in a staging repository at:
> > >
> > >   https://repository.apache.org/content/repositories/orgapachehbase-1520/
> > >
> > > Maven artifacts for hadoop3 are available in a staging repository at:
> > >
> > >   https://repository.apache.org/content/repositories/not-applicable/
> > >
> > > Artifacts were signed with the 0x9AD2AE49 key which can be found in:
> > >
> > >   https://downloads.apache.org/hbase/KEYS
> > >
> > > 3.0.0-alpha-4 is the fourth alpha release for our 3.0.0 major release
> > > line. HBase 3.0.0 includes the following big features/changes:
> > >
> > >   Synchronous Replication
> > >   OpenTelemetry Tracing
> > >   Distributed MOB Compaction
> > >   Backup and Restore
> > >   Move RSGroup balancer to core
> > >   Reimplement sync client on async client
> > >   CPEPs on shaded protobuf
> > >   Move the logging framework from log4j to log4j2
> > >   Decouple region replication and the general replication framework,
> > >   and also make region replication work when SKIP_WAL is used
> > >   A new file system based replication peer storage
> > >   Use an hbase table instead of zookeeper for tracking the hbase
> > >   replication queue
> > >
> > > Notice that this is not a production ready release. It is used to let
> > > our users try and test the new major release, to get feedback before
> > > the final GA release is out. So please do NOT use it in production.
> > > Just try it and report back everything you find unusual.
> > >
> > > And this time we will not include CHANGES.md and RELEASENOTES.md
> > > in our source code; you can find them on the download site. To get
> > > these two files for old releases, please go to
> > >
> > >   https://archive.apache.org/dist/hbase/
> > >
> > > To learn more about Apache hbase, please see
> > >
> > >   http://hbase.apache.org/
> > >
> > > Thanks,
> > > Your HBase Release Manager
> >
> > --
> > Best regards,
> > Andrew
> >
> > Unrest, ignorance distilled, nihilistic imbeciles -
> > It's what we've earned
> > Welcome, apocalypse, what's taken you so long?
> > Bring us the fitting end that we've been counting on
> >    - A23, Welcome, Apocalypse
>
> --
> Regards,
> Shanmukha Kota
[jira] [Created] (HBASE-13686) Fail to limit rate in RateLimiter
Guanghao Zhang created HBASE-13686:
--------------------------------------

             Summary: Fail to limit rate in RateLimiter
                 Key: HBASE-13686
                 URL: https://issues.apache.org/jira/browse/HBASE-13686
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0, 1.1.0
            Reporter: Guanghao Zhang
            Priority: Minor


While using the patch from HBASE-11598, I found that RateLimiter fails to limit the rate correctly.
{code}
/**
 * given the time interval, are there enough available resources to allow execution?
 * @param now the current timestamp
 * @param lastTs the timestamp of the last update
 * @param amount the number of required resources
 * @return true if there are enough available resources, otherwise false
 */
public synchronized boolean canExecute(final long now, final long lastTs,
    final long amount) {
  return avail >= amount ? true : refill(now, lastTs) >= amount;
}
{code}
When avail >= amount, avail is not refilled. But by the next call to canExecute, lastTs may already have been updated, so avail loses some of the elapsed time it should have been refilled for. Even when we use a smaller rate than the limit, canExecute may return false.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
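The timing problem described in HBASE-13686 can be reproduced with a minimal, self-contained sketch. This class is an illustrative simplification, not the real org.apache.hadoop.hbase.quotas.RateLimiter: timestamps are passed in explicitly so the scenario is deterministic, and a flag toggles between the buggy path (refill skipped when avail >= amount, yet lastTs still advanced) and a fixed path that always converts elapsed time into credit.

```java
// Sketch only: simplified model of the refill-skip bug, assuming a limit
// expressed in units per second and caller-supplied timestamps.
public class RateLimiterBugDemo {
  final long limit;   // units per second
  final boolean buggy;
  long avail;
  long lastTs;

  RateLimiterBugDemo(long limit, long avail, boolean buggy) {
    this.limit = limit;
    this.avail = avail;
    this.buggy = buggy;
  }

  private long refill(long now) {
    long delta = limit * (now - lastTs) / 1000; // credit for elapsed time
    return Math.min(limit, avail + delta);
  }

  public boolean canExecute(long now, long amount) {
    if (!buggy || avail < amount) {
      avail = refill(now); // fixed path: always convert elapsed time to credit
    }
    // buggy path: when avail >= amount, refill() is skipped above, yet lastTs
    // still advances here, so the elapsed-time credit is silently discarded
    lastTs = now;
    return avail >= amount;
  }

  public void consume(long amount) {
    avail -= amount;
  }

  public static void main(String[] args) {
    // 10 units/s, bucket starts with 5 units; demand is ~9.1 units/s overall
    RateLimiterBugDemo rl = new RateLimiterBugDemo(10, 5, true);
    System.out.println(rl.canExecute(1000, 5));  // true: served from avail,
    rl.consume(5);                               // but 1s of credit discarded
    System.out.println(rl.canExecute(1100, 5));  // false: only 100 ms refilled

    RateLimiterBugDemo ok = new RateLimiterBugDemo(10, 5, false);
    System.out.println(ok.canExecute(1000, 5));  // true: refilled to 10 first
    ok.consume(5);
    System.out.println(ok.canExecute(1100, 5));  // true: 5 left + 1 refilled
  }
}
```

With correct accounting the second request succeeds even though it arrives only 100 ms later, because the idle second before the first request was converted into credit; with the buggy path that credit is thrown away.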
[jira] [Created] (HBASE-13829) Add more ThrottleType
Guanghao Zhang created HBASE-13829:
--------------------------------------

             Summary: Add more ThrottleType
                 Key: HBASE-13829
                 URL: https://issues.apache.org/jira/browse/HBASE-13829
             Project: HBase
          Issue Type: Improvement
          Components: Client
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
             Fix For: 2.0.0


HBASE-11598 added simple throttling for HBase. But the client does not let users set a ThrottleType such as WRITE_NUM, WRITE_SIZE, READ_NUM, or READ_SIZE.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-13974) TestRateLimiter#testFixedIntervalResourceAvailability may fail
Guanghao Zhang created HBASE-13974:
--------------------------------------

             Summary: TestRateLimiter#testFixedIntervalResourceAvailability may fail
                 Key: HBASE-13974
                 URL: https://issues.apache.org/jira/browse/HBASE-13974
             Project: HBase
          Issue Type: Bug
          Components: test
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


Stacktrace:

java.lang.AssertionError: null
	at org.junit.Assert.fail(Assert.java:86)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.junit.Assert.assertFalse(Assert.java:64)
	at org.junit.Assert.assertFalse(Assert.java:74)
	at org.apache.hadoop.hbase.quotas.TestRateLimiter.testFixedIntervalResourceAvailability(TestRateLimiter.java:151)

The code of this unit test:
{code}
RateLimiter limiter = new FixedIntervalRateLimiter();
limiter.set(10, TimeUnit.MILLISECONDS);
assertTrue(limiter.canExecute(10));
limiter.consume(3);
assertEquals(7, limiter.getAvailable());
assertFalse(limiter.canExecute(10));
{code}
The limiter refills every millisecond. So if this unit test executes slowly, or is stalled by other tests for more than 1 ms, assertFalse(limiter.canExecute(10)) will fail.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-13888) refill bug from HBASE-13686
Guanghao Zhang created HBASE-13888:
--------------------------------------

             Summary: refill bug from HBASE-13686
                 Key: HBASE-13888
                 URL: https://issues.apache.org/jira/browse/HBASE-13888
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


As I reported in HBASE-13686, RateLimiter failed to limit the rate, and [~ashish singhi] fixed that problem by supporting two kinds of RateLimiter: AverageIntervalRateLimiter and FixedIntervalRateLimiter. But while using the code, I found a new bug in refill() of AverageIntervalRateLimiter.
{code}
long delta = (limit * (now - nextRefillTime)) / super.getTimeUnitInMillis();
if (delta > 0) {
  this.nextRefillTime = now;
  return Math.min(limit, available + delta);
}
{code}
When delta > 0, refill may return available + delta. Then canExecute() adds refillAmount to avail again, so the new avail may be 2 * avail + delta.
{code}
long refillAmount = refill(limit, avail);
if (refillAmount == 0 && avail < amount) {
  return false;
}
// check for positive overflow
if (avail <= Long.MAX_VALUE - refillAmount) {
  avail = Math.max(0, Math.min(avail + refillAmount, limit));
} else {
  avail = Math.max(0, limit);
}
{code}
I will add more unit tests for RateLimiter in the next days.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
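The double-count in HBASE-13888 is easy to see in isolation: refill() returns a *new total* (avail + delta), while the caller treats the return value as an *increment*. The sketch below is an illustrative simplification of the two snippets above, not the real HBase classes.

```java
// Sketch of the double-count: refill() returns a total, the caller adds it
// on top of avail again, so avail ends up near 2 * avail + delta.
public class RefillDoubleCountDemo {
  // mirrors the buggy refill(): returns the new total, capped at limit
  static long refill(long limit, long avail, long delta) {
    return Math.min(limit, avail + delta);
  }

  public static void main(String[] args) {
    long limit = 100, avail = 40, delta = 3;
    long refillAmount = refill(limit, avail, delta);      // 43 (a total)
    // caller-side update from canExecute(), treating 43 as an increment:
    long newAvail = Math.max(0, Math.min(avail + refillAmount, limit));
    System.out.println(newAvail);  // 83 == 2 * 40 + 3, instead of 43
  }
}
```

Returning the delta alone (or having the caller assign rather than add) removes the double-count.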
[jira] [Created] (HBASE-13987) Modify the result of shell cmd list_quotas when not enable quota
Guanghao Zhang created HBASE-13987:
--------------------------------------

             Summary: Modify the result of shell cmd list_quotas when not enable quota
                 Key: HBASE-13987
                 URL: https://issues.apache.org/jira/browse/HBASE-13987
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 2.0.0
         Environment: When quota is not enabled, the shell command list_quotas returns:
{code}
hbase(main):008:0> list_quotas
OWNER                          QUOTAS
ERROR: Unknown table hbase:quota!
{code}
This is confusing if the user doesn't know that quotas are stored in hbase:quota. I added a check of isQuotaEnabled before scanning the hbase:quota table, so it now returns "ERROR: quota support disabled", which is the same as set_quota.
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-14706) RegionLocationFinder should return multiple servername by top host
Guanghao Zhang created HBASE-14706:
--------------------------------------

             Summary: RegionLocationFinder should return multiple servername by top host
                 Key: HBASE-14706
                 URL: https://issues.apache.org/jira/browse/HBASE-14706
             Project: HBase
          Issue Type: Bug
          Components: Balancer
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


Multiple region servers can run on the same host, but in the current RegionLocationFinder, mapHostNameToServerName maps each host to only one ServerName. This makes LocalityCostFunction compute the wrong locality for regions.
{code}
// create a mapping from hostname to ServerName for fast lookup
HashMap<String, ServerName> hostToServerName = new HashMap<String, ServerName>();
for (ServerName sn : regionServers) {
  hostToServerName.put(sn.getHostname(), sn);
}
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
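The fix direction suggested by the ticket title is to map each hostname to *all* ServerNames on that host instead of letting later entries overwrite earlier ones. A minimal sketch, with ServerName stubbed out (the real class lives in the org.apache.hadoop.hbase package):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HostToServersDemo {
  // stand-in for org.apache.hadoop.hbase.ServerName
  static class ServerName {
    final String hostname;
    final int port;
    ServerName(String hostname, int port) { this.hostname = hostname; this.port = port; }
  }

  // current shape: one ServerName per host; a second RS on the host is lost
  static Map<String, ServerName> hostToServerName(List<ServerName> regionServers) {
    Map<String, ServerName> m = new HashMap<>();
    for (ServerName sn : regionServers) {
      m.put(sn.hostname, sn); // overwrites any earlier RS on the same host
    }
    return m;
  }

  // proposed shape: one host -> every ServerName running there
  static Map<String, List<ServerName>> hostToServerNames(List<ServerName> regionServers) {
    Map<String, List<ServerName>> m = new HashMap<>();
    for (ServerName sn : regionServers) {
      m.computeIfAbsent(sn.hostname, k -> new ArrayList<>()).add(sn);
    }
    return m;
  }

  public static void main(String[] args) {
    List<ServerName> servers = Arrays.asList(
        new ServerName("host-a", 16020),
        new ServerName("host-a", 16021),
        new ServerName("host-b", 16020));
    System.out.println(hostToServerName(servers).size());                // 2: one RS on host-a lost
    System.out.println(hostToServerNames(servers).get("host-a").size()); // 2: both kept
  }
}
```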
[jira] [Created] (HBASE-14604) Improve MoveCostFunction in StochasticLoadBalancer
Guanghao Zhang created HBASE-14604:
--------------------------------------

             Summary: Improve MoveCostFunction in StochasticLoadBalancer
                 Key: HBASE-14604
                 URL: https://issues.apache.org/jira/browse/HBASE-14604
             Project: HBase
          Issue Type: Bug
          Components: Balancer
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang


The code in MoveCostFunction:
{code}
return scale(0, cluster.numRegions + META_MOVE_COST_MULT, moveCost);
{code}
It uses cluster.numRegions + META_MOVE_COST_MULT as the max value when scaling moveCost to [0,1]. But it should use maxMoves as the max value when the cluster has a lot of regions. Assume a cluster has 10000 regions and maxMoves is 2500: moveCost is only scaled to [0, 0.25]. Improve moveCost by using maxMoves:
{code}
return scale(0, Math.min(cluster.numRegions, maxMoves) + META_MOVE_COST_MULT, moveCost);
{code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
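The effect of the proposed change can be checked with a small sketch. Here scale() is assumed to be a simple linear rescale of value from [min, max] into [0, 1] (the real helper lives in StochasticLoadBalancer), and META_MOVE_COST_MULT is left out for clarity.

```java
// Sketch: why capping the scale range at maxMoves restores the intended
// [0, 1] cost range on large clusters.
public class MoveCostScaleDemo {
  // assumed linear rescale of value from [min, max] into [0, 1]
  static double scale(double min, double max, double value) {
    if (max <= min || value <= min) {
      return 0;
    }
    return Math.min(1.0, (value - min) / (max - min));
  }

  public static void main(String[] args) {
    int numRegions = 10000; // a large cluster
    int maxMoves = 2500;    // cap on moves per balancer run
    // current: even the worst possible plan (maxMoves moves) only costs 0.25
    System.out.println(scale(0, numRegions, maxMoves));                     // 0.25
    // proposed: the worst plan costs 1.0, so the multiplier carries weight
    System.out.println(scale(0, Math.min(numRegions, maxMoves), maxMoves)); // 1.0
  }
}
```

Because the cost never exceeds 0.25 on such a cluster, the move-cost multiplier effectively has only a quarter of its configured weight; the min() cap fixes that.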
[jira] [Created] (HBASE-14609) Can't config all day as OffPeakHours
Guanghao Zhang created HBASE-14609:
--------------------------------------

             Summary: Can't config all day as OffPeakHours
                 Key: HBASE-14609
                 URL: https://issues.apache.org/jira/browse/HBASE-14609
             Project: HBase
          Issue Type: Bug
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor


The off-peak range is [startHour, endHour), with endHour exclusive. But 24 is not accepted as a valid endHour, so the whole day can't be configured as off-peak hours.
{code}
private static boolean isValidHour(int hour) {
  return 0 <= hour && hour <= 23;
}
{code}
Making endHour=24 valid, or allowing startHour == endHour, would fix this.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
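Both candidate fixes can be sketched side by side. This is a simplification of OffPeakHours, not the real class: the range is [startHour, endHour) with an exclusive end, and a range may wrap past midnight.

```java
// Sketch of the two fixes for "all day can't be off-peak".
public class OffPeakHoursDemo {
  // current check: 24 is rejected, so [0, 24) ("all day") is not expressible
  static boolean isValidHourCurrent(int hour) {
    return 0 <= hour && hour <= 23;
  }

  // option 1: accept 24 as an exclusive upper bound
  static boolean isValidHourFixed(int hour) {
    return 0 <= hour && hour <= 24;
  }

  // option 2: treat startHour == endHour as "the whole day is off-peak"
  static boolean isOffPeak(int startHour, int endHour, int hour) {
    if (startHour == endHour) {
      return true; // whole day off-peak
    }
    return startHour < endHour
        ? startHour <= hour && hour < endHour
        : hour >= startHour || hour < endHour; // range wrapping midnight
  }

  public static void main(String[] args) {
    System.out.println(isValidHourCurrent(24)); // false: all-day not expressible
    System.out.println(isValidHourFixed(24));   // true
    System.out.println(isOffPeak(5, 5, 17));    // true: whole day off-peak
    System.out.println(isOffPeak(22, 4, 23));   // true: wraps past midnight
  }
}
```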
[jira] [Created] (HBASE-16012) Major compaction can't work because left scanner read point in RegionServer
Guanghao Zhang created HBASE-16012:
--------------------------------------

             Summary: Major compaction can't work because left scanner read point in RegionServer
                 Key: HBASE-16012
                 URL: https://issues.apache.org/jira/browse/HBASE-16012
             Project: HBase
          Issue Type: Bug
          Components: Compaction, Scanners
    Affects Versions: 0.94.27, 2.0.0
            Reporter: Guanghao Zhang


When a new RegionScanner is created, it adds a scanner read point to scannerReadPoints. But if we get an exception after the read point is added, the read point remains in the region server, and deletes newer than this MVCC number will never be compacted. Our hbase version is based on 0.94. The master branch has this bug too, if other exceptions are thrown while initializing the RegionScanner.

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
java.io.IOException: Could not seek StoreFileScanner
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:160)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:268)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:168)
	at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:2232)
	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:4026)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1895)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1879)
	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1854)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.internalOpenScanner(HRegionServer.java:3032)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:2995)
	at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.hbase.ipc.SecureRpcEngine$Server.call(SecureRpcEngine.java:338)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1595)
Caused by: org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call openScanner, since caller disconnected
	at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:475)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1443)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1902)
	at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1766)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:345)
	at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:499)
	at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:520)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:235)
	at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:148)
	... 14 more

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
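The leak described in HBASE-16012 and its usual remedy can be sketched in a few lines. This is not the real HRegion code; the field names and the try/finally shape are illustrative. The key point is that if scanner construction fails after the read point was registered, the point must be removed again, otherwise compaction can never advance past it.

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class ScannerReadPointDemo {
  // mvcc read point -> scanner id (illustrative stand-in for HRegion's map)
  final ConcurrentSkipListMap<Long, Long> scannerReadPoints = new ConcurrentSkipListMap<>();

  // compaction may not drop deletes newer than the smallest registered point
  long smallestReadPoint() {
    return scannerReadPoints.isEmpty() ? Long.MAX_VALUE : scannerReadPoints.firstKey();
  }

  void openScanner(long scannerId, long mvcc, boolean failDuringSeek) {
    scannerReadPoints.put(mvcc, scannerId); // read point registered up front
    boolean succeeded = false;
    try {
      // ... StoreScanner construction / seek happens here and may throw,
      // e.g. "Could not seek StoreFileScanner" as in the trace above
      if (failDuringSeek) {
        throw new RuntimeException("Could not seek StoreFileScanner");
      }
      succeeded = true;
    } finally {
      if (!succeeded) {
        scannerReadPoints.remove(mvcc); // without this, the point leaks forever
      }
    }
  }

  public static void main(String[] args) {
    ScannerReadPointDemo demo = new ScannerReadPointDemo();
    try {
      demo.openScanner(1L, 42L, true); // scanner open fails mid-seek
    } catch (RuntimeException expected) {
      // the read point was cleaned up, so compaction is not blocked
    }
    System.out.println(demo.smallestReadPoint() == Long.MAX_VALUE);
  }
}
```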
[jira] [Created] (HBASE-15615) Wrong sleep time when RegionServerCallable need retry
Guanghao Zhang created HBASE-15615:
--------------------------------------

             Summary: Wrong sleep time when RegionServerCallable need retry
                 Key: HBASE-15615
                 URL: https://issues.apache.org/jira/browse/HBASE-15615
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang


In RpcRetryingCallerImpl, the pause time is computed as expectedSleep = callable.sleep(pause, tries + 1). And in RegionServerCallable, the pause time is computed as sleep = ConnectionUtils.getPauseTime(pause, tries + 1). So tries is bumped up twice. Since RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200}, the pause time is 3 * hbase.client.pause when tries is 0.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
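The double bump is easy to verify with the backoff table from the ticket. The helper below is a simplified version of ConnectionUtils.getPauseTime (the real one also adds a small random jitter, omitted here), so the numbers are illustrative of the indexing, not exact sleep times.

```java
// Sketch: tries bumped twice lands on RETRY_BACKOFF[2] = 3 for the first retry.
public class RetrySleepDemo {
  static final long[] RETRY_BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

  // simplified getPauseTime: pause scaled by the backoff table, index clamped
  static long getPauseTime(long pause, int tries) {
    int index = Math.min(tries, RETRY_BACKOFF.length - 1);
    return pause * RETRY_BACKOFF[index];
  }

  public static void main(String[] args) {
    long pause = 100; // hbase.client.pause in ms
    int tries = 0;    // first retryable failure
    // the caller passes tries + 1 and sleep() bumps it again -> index 2
    System.out.println(getPauseTime(pause, (tries + 1) + 1)); // 300 = 3 * pause
    // with a single bump, the first sleep would use index 1
    System.out.println(getPauseTime(pause, tries + 1));       // 200 = 2 * pause
  }
}
```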
[jira] [Created] (HBASE-15515) Improve LocalityBasedCandidateGenerator in Balancer
Guanghao Zhang created HBASE-15515:
--------------------------------------

             Summary: Improve LocalityBasedCandidateGenerator in Balancer
                 Key: HBASE-15515
                 URL: https://issues.apache.org/jira/browse/HBASE-15515
             Project: HBase
          Issue Type: Bug
    Affects Versions: 2.0.0, 1.3.0
            Reporter: Guanghao Zhang
            Assignee: Guanghao Zhang
            Priority: Minor
             Fix For: 2.0.0


There are some problems that need to be fixed.

1. LocalityBasedCandidateGenerator.getLowestLocalityRegionOnServer should skip empty regions.
2. When LocalityBasedCandidateGenerator generates a Cluster.Action, it should add a random operation instead of pickLowestLocalityServer(cluster), because the search function may get stuck if it always generates the same Cluster.Action.
3. getLeastLoadedTopServerForRegion should pick the least loaded server that has better locality than the current server.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15529) Override needBalance in StochasticLoadBalancer
Guanghao Zhang created HBASE-15529:
--------------------------------------

             Summary: Override needBalance in StochasticLoadBalancer
                 Key: HBASE-15529
                 URL: https://issues.apache.org/jira/browse/HBASE-15529
             Project: HBase
          Issue Type: Improvement
            Reporter: Guanghao Zhang
            Priority: Minor


StochasticLoadBalancer includes cost functions to compute the cost of region count, r/w QPS, table load, region locality, memstore size, and storefile size. Every cost function returns a number between 0 and 1 inclusive, and the computed costs are scaled by their respective multipliers: a bigger multiplier means the respective cost function carries more weight. But needBalance decides whether to balance only by region count; it doesn't consider r/w QPS or locality even if you configure those cost functions with bigger multipliers. StochasticLoadBalancer should override needBalance and decide whether to balance based on its cost function configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15496) Throw RowTooBigException only for user scan/get
Guanghao Zhang created HBASE-15496:
--------------------------------------

             Summary: Throw RowTooBigException only for user scan/get
                 Key: HBASE-15496
                 URL: https://issues.apache.org/jira/browse/HBASE-15496
             Project: HBase
          Issue Type: Improvement
          Components: Scanners
            Reporter: Guanghao Zhang
            Priority: Minor
             Fix For: 2.0.0


When hbase.table.max.rowsize is configured, RowTooBigException may be thrown by StoreScanner. But region flush/compact should catch it; the exception should be thrown only for user scans/gets.

org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but row is bigger than that
	at org.apache.hadoop.hbase.regionserver.StoreScanner.seekScanners(StoreScanner.java:355)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:276)
	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:238)
	at org.apache.hadoop.hbase.regionserver.compactions.Compactor.createScanner(Compactor.java:403)
	at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:95)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.compact(DefaultStoreEngine.java:131)
	at org.apache.hadoop.hbase.regionserver.HStore.compact(HStore.java:1211)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1952)
	at org.apache.hadoop.hbase.regionserver.HRegion.compact(HRegion.java:1774)

or

org.apache.hadoop.hbase.regionserver.RowTooBigException: Max row size allowed: 10485760, but the row is bigger than that.
	at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:576)
	at org.apache.hadoop.hbase.regionserver.StoreFlusher.performFlush(StoreFlusher.java:132)
	at org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:75)
	at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:880)
	at org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:2155)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2454)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2193)
	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2162)
	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:2053)
	at org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1979)
	at org.apache.hadoop.hbase.regionserver.TestRowTooBig.testScannersSeekOnFewLargeCells(TestRowTooBig.java:101)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15885) Compute StoreFile HDFS Blocks Distribution when need it
Guanghao Zhang created HBASE-15885:
--------------------------------------

             Summary: Compute StoreFile HDFS Blocks Distribution when need it
                 Key: HBASE-15885
                 URL: https://issues.apache.org/jira/browse/HBASE-15885
             Project: HBase
          Issue Type: Improvement
          Components: HFile
    Affects Versions: 2.0.0
            Reporter: Guanghao Zhang


Currently, opening a StoreFileReader always computes the HDFS blocks distribution. This increases the time a region is not serving while it is being balanced, because the region must first be closed on RS A and then opened on RS B. Closing the region first pre-flushes, then flushes the new updates to a new store file. The new store file is first flushed to the tmp directory and then moved to the column family directory; this opens a StoreFileReader twice, which means the HDFS blocks distribution is computed twice. Opening the region on RS B opens a StoreFileReader and computes the HDFS blocks distribution once more. So balancing a region computes the HDFS blocks distribution three times per new store file. This increases the region's not-serving time, and we don't need to compute the HDFS blocks distribution when closing a region.

The three related methods in HStore:
1. validateStoreFile(...)
2. commitFile(...)
3. openStoreFiles(...)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (HBASE-15829) hbase.client.retries.number has different meanings in branch-1 and master
Guanghao Zhang created HBASE-15829: -- Summary: hbase.client.retries.number has different meanings in branch-1 and master Key: HBASE-15829 URL: https://issues.apache.org/jira/browse/HBASE-15829 Project: HBase Issue Type: Bug Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor The comment of hbase.client.retries.number is: {code} /** * Parameter name for maximum retries, used as maximum for all retryable * operations such as fetching of the root region from root region server, * getting a cell's value, starting a row update, etc. */ public static final String HBASE_CLIENT_RETRIES_NUMBER = "hbase.client.retries.number"; {code} In branch-1, the maximum number of attempts equals hbase.client.retries.number. But in master, the maximum number of attempts equals hbase.client.retries.number + 1. For RpcRetryingCaller: {code} this.retries = retries; // branch-1 {code} {code} this.maxAttempts = retries + 1; // master {code} For AsyncProcess: {code} this.numTries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER); // branch-1 {code} {code} // how many times we could try in total, one more than retry number this.numTries = conf.getInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, HConstants.DEFAULT_HBASE_CLIENT_RETRIES_NUMBER) + 1; // master {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
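The off-by-one described above, as a tiny Java sketch (illustrative helper names, not HBase source):

```java
// Hypothetical illustration of how the same configured value yields a
// different total attempt count on the two branches.
public class RetrySemantics {
    // branch-1: total attempts == configured retries
    static int attemptsBranch1(int retries) {
        return retries;
    }

    // master: one initial try plus the configured number of retries
    static int attemptsMaster(int retries) {
        return retries + 1;
    }

    public static void main(String[] args) {
        int configured = 35; // a sample hbase.client.retries.number value
        System.out.println("branch-1 attempts: " + attemptsBranch1(configured)); // 35
        System.out.println("master attempts:   " + attemptsMaster(configured));  // 36
    }
}
```

So a client configured with hbase.client.retries.number = 35 gives up after 35 attempts on branch-1 but after 36 on master, which is exactly the inconsistency the issue reports.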
[jira] [Created] (HBASE-16416) Make NoncedRegionServerCallable extends RegionServerCallable
Guanghao Zhang created HBASE-16416: -- Summary: Make NoncedRegionServerCallable extends RegionServerCallable Key: HBASE-16416 URL: https://issues.apache.org/jira/browse/HBASE-16416 Project: HBase Issue Type: Improvement Components: Client Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor After HBASE-16308, there is a new class NoncedRegionServerCallable which extends AbstractRegionServerCallable. But it has some duplicate methods with RegionServerCallable, so we can make NoncedRegionServerCallable extend RegionServerCallable instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16368) test*WhenRegionMove in TestPartialResultsFromClientSide is flaky
Guanghao Zhang created HBASE-16368: -- Summary: test*WhenRegionMove in TestPartialResultsFromClientSide is flaky Key: HBASE-16368 URL: https://issues.apache.org/jira/browse/HBASE-16368 Project: HBase Issue Type: Bug Components: Scanners Affects Versions: 1.4.0 Reporter: Guanghao Zhang This test failed when Hadoop QA ran preCommit: https://builds.apache.org/job/PreCommit-HBASE-Build/2971/testReport/org.apache.hadoop.hbase/TestPartialResultsFromClientSide/testReversedCompleteResultWhenRegionMove/. I also found it on the Flaky Tests Dashboard: http://hbase.x10host.com/flaky-tests/. It may fail on my local machine, too. The test results show that the region location was not updated when the scanner callable got a NotServingRegionException or RegionMovedException. {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 05:55:52 UTC 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=2157: org.apache.hadoop.hbase.NotServingRegionException: testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e. 
is closing at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8233) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2634) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2629) at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2623) at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2490) at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:34950) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2264) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169) row '' on table 'testReversedCompleteResultWhenRegionMove' at region=testReversedCompleteResultWhenRegionMove,,1470462949504.5069bd63bf6eda5108acec4fcc087b0e., hostname=asf907.gq1.ygridcore.net,38914,1470462943053, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ReversedClientScanner.nextScanner(ReversedClientScanner.java:118) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:166) at org.apache.hadoop.hbase.client.ClientScanner.(ClientScanner.java:161) at org.apache.hadoop.hbase.client.ReversedClientScanner.(ReversedClientScanner.java:56) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:785) at 
org.apache.hadoop.hbase.TestPartialResultsFromClientSide.testReversedCompleteResultWhenRegionMove(TestPartialResultsFromClientSide.java:986) {code} {code} org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions: Sat Aug 06 16:27:22 CST 2016, null, java.net.SocketTimeoutException: callTimeout=2000, callDuration=3035: Region moved to: hostname=localhost port=58351 startCode=1470472007714. As of locationSeqNum=6. row 'testRow0' on table 'testPartialResultWhenRegionMove' at region=testPartialResultWhenRegionMove,,1470472035048.977faf05c1d6d9990b5559b17aa18913., hostname=localhost,40425,1470472007646, seqNum=2 at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:281) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:213) at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61) at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:212) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:326) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:301) at org.apache.hadoop.hbase.client.ClientScanner.possiblyNextScanner(ClientScanner.java:247) at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:541) at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:370
[jira] [Created] (HBASE-17600) Implement get/create/modify/delete/list namespace admin operations
Guanghao Zhang created HBASE-17600: -- Summary: Implement get/create/modify/delete/list namespace admin operations Key: HBASE-17600 URL: https://issues.apache.org/jira/browse/HBASE-17600 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17615) Use nonce and procedure v2 for add/remove replication peer
Guanghao Zhang created HBASE-17615: -- Summary: Use nonce and procedure v2 for add/remove replication peer Key: HBASE-17615 URL: https://issues.apache.org/jira/browse/HBASE-17615 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17596) Implement add/delete/modify column family methods
Guanghao Zhang created HBASE-17596: -- Summary: Implement add/delete/modify column family methods Key: HBASE-17596 URL: https://issues.apache.org/jira/browse/HBASE-17596 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17511) Implement enable/disable table methods
Guanghao Zhang created HBASE-17511: -- Summary: Implement enable/disable table methods Key: HBASE-17511 URL: https://issues.apache.org/jira/browse/HBASE-17511 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17498) Implement listTables methods
Guanghao Zhang created HBASE-17498: -- Summary: Implement listTables methods Key: HBASE-17498 URL: https://issues.apache.org/jira/browse/HBASE-17498 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17500) Implement getTable/creatTable/deleteTable/truncateTable methods
Guanghao Zhang created HBASE-17500: -- Summary: Implement getTable/creatTable/deleteTable/truncateTable methods Key: HBASE-17500 URL: https://issues.apache.org/jira/browse/HBASE-17500 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16460) Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine
Guanghao Zhang created HBASE-16460: -- Summary: Can't rebuild the BucketAllocator's data structures when BucketCache use FileIOEngine Key: HBASE-16460 URL: https://issues.apache.org/jira/browse/HBASE-16460 Project: HBase Issue Type: Bug Components: BucketCache Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang When the bucket cache uses FileIOEngine, it rebuilds the bucket allocator's data structures from a persisted map. It should first read the map from the persistence file and then use that map to construct a BucketAllocator. But the retrieveFromFile() method of BucketCache.java currently does these two steps in the wrong order: {code} BucketAllocator allocator = new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize); backingMap = (ConcurrentHashMap<BlockCacheKey, BucketEntry>) ois.readObject(); {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
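A toy Java sketch of the ordering bug (simplified types, not HBase source): the allocator can only see whatever the backing map holds at construction time, so constructing it before reading the persisted map means it rebuilds from nothing.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrieveOrder {
    // Stands in for: new BucketAllocator(cacheCapacity, bucketSizes, backingMap, realCacheSize)
    // The allocator only knows what the map holds at this moment.
    static int buildAllocator(Map<String, Integer> backingMap) {
        return backingMap.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> persisted = new HashMap<>();
        persisted.put("block-1", 4096); // pretend this came from ois.readObject()

        // Buggy order (current code): construct the allocator, THEN read the map.
        Map<String, Integer> backingMap = new HashMap<>();
        int buggy = buildAllocator(backingMap); // sees 0 entries
        backingMap.putAll(persisted);

        // Fixed order: read the persisted map first, THEN construct the allocator.
        Map<String, Integer> restored = new HashMap<>(persisted);
        int fixed = buildAllocator(restored);   // sees 1 entry

        System.out.println("buggy order saw " + buggy + " entries, fixed order saw " + fixed);
    }
}
```

In the real code the situation is even stricter, since `backingMap` is reassigned rather than mutated, so the allocator keeps a reference to the stale empty map forever.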
[jira] [Created] (HBASE-16561) Add metrics about read/write/scan queue length and active handler count
Guanghao Zhang created HBASE-16561: -- Summary: Add metrics about read/write/scan queue length and active handler count Key: HBASE-16561 URL: https://issues.apache.org/jira/browse/HBASE-16561 Project: HBase Issue Type: Improvement Components: IPC/RPC, metrics Reporter: Guanghao Zhang Priority: Minor Now there are only metrics for the total queue length and the active rpc handler count. But the RWQueueRpcExecutor can have separate queues and handlers for read/write/scan requests, so I think it is necessary to add more metrics for RWQueueRpcExecutor. When using it in a production cluster, we can then adjust the queue and handler configs according to these metrics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16666) Add append and remove peer namespaces cmds for replication
Guanghao Zhang created HBASE-16666: -- Summary: Add append and remove peer namespaces cmds for replication Key: HBASE-16666 URL: https://issues.apache.org/jira/browse/HBASE-16666 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor After HBASE-16447, we support replication by a namespaces config in the peer. Like append_peer_tableCFs and remove_peer_tableCFs, I think we need two new shell cmds: append_peer_namespaces and remove_peer_namespaces. Then we can easily change the namespaces config. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16653) Backport HBASE-11393 to all branches which support namespace
Guanghao Zhang created HBASE-16653: -- Summary: Backport HBASE-11393 to all branches which support namespace Key: HBASE-16653 URL: https://issues.apache.org/jira/browse/HBASE-16653 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang As HBASE-11386 mentioned, the parsing code for the replication table-cfs config goes wrong when a table name contains a namespace, so only tables in the default namespace can be configured in the peer. This is a bug in all branches which support namespaces. HBASE-11393 resolved it by using a pb object, but it was only merged to the master branch; other branches still have this problem. I think we should fix this bug in all branches which support namespaces. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16446) append_peer_tableCFs failed when there already have this table's partial cfs in the peer
Guanghao Zhang created HBASE-16446: -- Summary: append_peer_tableCFs failed when there already have this table's partial cfs in the peer Key: HBASE-16446 URL: https://issues.apache.org/jira/browse/HBASE-16446 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.98.21, 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} hbase(main):011:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0080 seconds hbase(main):012:0> append_peer_tableCFs '20', {"test_replication" => []} 0 row(s) in 0.0060 seconds hbase(main):013:0> list_peers PEER_ID CLUSTER_KEY STATE TABLE_CFS PROTOCOL BANDWIDTH 20 hbase://c3tst-pressure98 ENABLED default.test_replication:A NATIVE 0 1 row(s) in 0.0030 seconds {code} "test_replication" => [] means replicating all cfs of this table, so the result is not right. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
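Assuming the convention from the report that an empty cf list means "all column families of the table", the expected append semantics might look like this sketch (illustrative Java, not the actual HBase implementation): appending an empty list should widen an existing partial-cf entry instead of being silently ignored as the shell transcript shows.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TableCfsMerge {
    // Convention: a null/empty cf list stands for "all column families".
    static Map<String, List<String>> append(Map<String, List<String>> current,
                                            Map<String, List<String>> toAppend) {
        Map<String, List<String>> merged = new HashMap<>(current);
        toAppend.forEach((table, cfs) -> {
            List<String> existing = merged.get(table);
            if (cfs == null || cfs.isEmpty()) {
                merged.put(table, Collections.emptyList()); // widen to all cfs
            } else if (existing == null) {
                merged.put(table, new ArrayList<>(cfs));    // brand-new table entry
            } else if (!existing.isEmpty()) {
                TreeSet<String> union = new TreeSet<>(existing);
                union.addAll(cfs);
                merged.put(table, new ArrayList<>(union));  // union of cf lists
            }                                               // else: already all cfs
        });
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> peer = new HashMap<>();
        peer.put("test_replication", List.of("A"));         // partial cfs, as in the report
        Map<String, List<String>> appended =
            append(peer, Map.of("test_replication", List.of()));
        // Expected: the entry is widened to all cfs (empty list), not left as [A].
        System.out.println(appended);
    }
}
```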
[jira] [Created] (HBASE-16447) Replication by namespace in peer
Guanghao Zhang created HBASE-16447: -- Summary: Replication by namespace in peer Key: HBASE-16447 URL: https://issues.apache.org/jira/browse/HBASE-16447 Project: HBase Issue Type: New Feature Components: Replication Reporter: Guanghao Zhang Now we can only config table cfs in a peer. But in our production cluster, there are a dozen namespaces and every namespace has dozens of tables, so it is complicated to config all the table cfs in the peer. Some namespaces need all their tables replicated to the slave cluster, and that would be easy to config if we supported replication by namespace. Suggestions and discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16707) [Umbrella] Improve throttling feature for production usage
Guanghao Zhang created HBASE-16707: -- Summary: [Umbrella] Improve throttling feature for production usage Key: HBASE-16707 URL: https://issues.apache.org/jira/browse/HBASE-16707 Project: HBase Issue Type: Umbrella Reporter: Guanghao Zhang HBASE-11598 added the rpc throttling feature and did great initial work there. We plan to use throttling in our production cluster and have made some improvements to it. From the user mailing list, I found there are other users using the throttling feature, too. I think it is time to contribute our work to the community, including: 1. Add shell cmds to start/stop throttling. 2. Add metrics for throttled requests. 3. Basic UI support in master/regionserver. 4. Handle the throttling exception in the client. 5. Add more throttle types; like DynamoDB, use read/write capacity units to throttle. 6. Support a soft limit: a user can over-consume his quota when the regionserver has available capacity because other users are not consuming theirs at the same time. 7. ... ... I think some of these improvements are useful, so I opened this umbrella issue to track them. Suggestions and discussions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16870) Add the metrics of replication sources which were transformed from other died rs to ReplicationLoad
Guanghao Zhang created HBASE-16870: -- Summary: Add the metrics of replication sources which were transformed from other died rs to ReplicationLoad Key: HBASE-16870 URL: https://issues.apache.org/jira/browse/HBASE-16870 Project: HBase Issue Type: Bug Components: Replication Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor {code} private void buildReplicationLoad() { // get source List<ReplicationSourceInterface> sources = this.replicationManager.getSources(); List<MetricsSource> sourceMetricsList = new ArrayList<>(); for (ReplicationSourceInterface source : sources) { if (source instanceof ReplicationSource) { sourceMetricsList.add(((ReplicationSource) source).getSourceMetrics()); } } // get sink MetricsSink sinkMetrics = this.replicationSink.getSinkMetrics(); this.replicationLoad.buildReplicationLoad(sourceMetricsList, sinkMetrics); } {code} The buildReplicationLoad method in o.a.h.h.r.r.Replication doesn't consider the replication sources that were taken over from dead regionservers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16868) Avoid appending table to a peer which replicates all tables
Guanghao Zhang created HBASE-16868: -- Summary: Avoid appending table to a peer which replicates all tables Key: HBASE-16868 URL: https://issues.apache.org/jira/browse/HBASE-16868 Project: HBase Issue Type: New Feature Components: Replication Affects Versions: 2.0.0 Reporter: Guanghao Zhang First, add a new peer by shell cmd: {code} add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase". {code} If we don't set namespaces and table cfs in the peer config, it means all tables are replicated to the peer cluster. Then append a table to the peer config: {code} append_peer_tableCFs '1', {"table1" => []} {code} Now this peer only replicates table1 to the peer cluster. It changed from replicating all tables in the cluster to replicating only one table, which is very easy to misuse on a production cluster. So we should avoid appending a table to a peer which replicates all tables. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16938) TableCFsUpdater maybe failed due to no write permission on peerNode
Guanghao Zhang created HBASE-16938: -- Summary: TableCFsUpdater maybe failed due to no write permission on peerNode Key: HBASE-16938 URL: https://issues.apache.org/jira/browse/HBASE-16938 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 2.0.0, 1.4.0 Reporter: Guanghao Zhang After HBASE-11393, replication table-cfs use a PB object, so the old string config needs to be copied to the new PB object when upgrading a cluster. In our use case, we have different kerberos credentials for different clusters, e.g. an online serving cluster and an offline processing cluster, and we use one unified global admin kerberos credential for all clusters. The peer node is created by the client, so only the global admin has write permission on it. When upgrading the cluster, HMaster doesn't have write permission on the peer node, so it may fail to copy the old table-cfs string to the new PB object. I think we need a client-side tool to do this copy job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16939) ExportSnapshot: set owner and permission on right directory
Guanghao Zhang created HBASE-16939: -- Summary: ExportSnapshot: set owner and permission on right directory Key: HBASE-16939 URL: https://issues.apache.org/jira/browse/HBASE-16939 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor {code} FileUtil.copy(inputFs, snapshotDir, outputFs, initialOutputSnapshotDir, false, false, conf); if (filesUser != null || filesGroup != null) { setOwner(outputFs, snapshotTmpDir, filesUser, filesGroup, true); } if (filesMode > 0) { setPermission(outputFs, snapshotTmpDir, (short)filesMode, true); } {code} It copies the snapshot manifest to initialOutputSnapshotDir, but it sets the owner on snapshotTmpDir. These are different directories when skipTmp is true. Another problem is that a new cluster doesn't have the .hbase-snapshot directory, so after exporting a snapshot, the owner should also be set on the .hbase-snapshot directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17088) Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor
Guanghao Zhang created HBASE-17088: -- Summary: Refactor RWQueueRpcExecutor/BalancedQueueRpcExecutor/RpcExecutor Key: HBASE-17088 URL: https://issues.apache.org/jira/browse/HBASE-17088 Project: HBase Issue Type: Improvement Components: rpc Affects Versions: 2.0.0 Reporter: Guanghao Zhang 1. RWQueueRpcExecutor has eight constructors, and the longest one takes ten parameters; yet it is only used in SimpleRpcScheduler, and the code is easy to get confused by when reading it. 2. There are duplicate method implementations in RWQueueRpcExecutor and BalancedQueueRpcExecutor; they can be implemented once in their parent class RpcExecutor. 3. SimpleRpcScheduler reads many configs to construct the RpcExecutors, but CALL_QUEUE_SCAN_SHARE_CONF_KEY is only needed by RWQueueRpcExecutor, and CALL_QUEUE_CODEL_TARGET_DELAY, CALL_QUEUE_CODEL_INTERVAL and CALL_QUEUE_CODEL_LIFO_THRESHOLD are only needed by AdaptiveLifoCoDelCallQueue. So I think we can refactor this. Suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17178) Add region balance throttling
Guanghao Zhang created HBASE-17178: -- Summary: Add region balance throttling Key: HBASE-17178 URL: https://issues.apache.org/jira/browse/HBASE-17178 Project: HBase Issue Type: Improvement Components: Balancer Reporter: Guanghao Zhang Our online cluster serves dozens of tables, and different tables serve different services. If the balancer moves too many regions at the same time, it decreases the availability of some tables or services. So we added region balance throttling on our online serving cluster. We introduced a new config, hbase.balancer.max.balancing.regions, which is the max number of regions in transition when balancing. If we set it to 1 and a table has 100 regions, the table will have 99 regions available at any time. It helps a lot for our use case and has been running for a long time on our production cluster. But for some use cases, the balancer needs to run faster. For example, if a cluster with 100 regionservers adds 50 new regionservers for peak requests, the balancer needs to run as soon as possible so the cluster reaches a balanced state quickly. Our idea is to compute the max number of regions in transition from the max balancing time and the average time a region spends in transition, and let the balancer throttle on the computed value. Examples for understanding: a cluster has 100 regionservers, each regionserver has 200 regions, the average time of a region in transition is 1 second, and we config the max balancing time as 10 * 60 seconds. Case 1: one regionserver crashes, so the cluster needs to balance at most 200 regions. Then 200 / (10 * 60s / 1s) < 1, meaning the max number of regions in transition when balancing is 1. The balancer can move regions one by one and the cluster keeps high availability while balancing. Case 2: another 100 regionservers are added, so the cluster needs to balance at most 10000 regions. Then 10000 / (10 * 60s / 1s) = 16.7, meaning the max number of regions in transition when balancing is 17. Then the cluster can reach a balanced state within the max balancing time. Any suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
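The proposed computation, as a small Java sketch (the method and names are hypothetical, not HBase source), reproducing the two cases from the description:

```java
// Hypothetical sketch: derive the max number of regions in transition from the
// max balancing time and the average time a region spends in transition.
public class BalanceThrottle {
    static int maxRegionsInTransition(int regionsToBalance,
                                      long maxBalancingTimeSec,
                                      long avgRegionInTransitionSec) {
        // how many sequential "transition slots" fit in the balancing window
        double slots = (double) maxBalancingTimeSec / avgRegionInTransitionSec;
        int rit = (int) Math.ceil(regionsToBalance / slots);
        return Math.max(1, rit); // always allow at least one region to move
    }

    public static void main(String[] args) {
        // Case 1: one regionserver crashed, ~200 regions to re-balance
        System.out.println(maxRegionsInTransition(200, 600, 1));   // -> 1
        // Case 2: 100 new regionservers added, ~10000 regions to re-balance
        System.out.println(maxRegionsInTransition(10000, 600, 1)); // -> 17
    }
}
```

Both cases match the arithmetic above: 200 / 600 rounds up to 1, and 10000 / 600 = 16.7 rounds up to 17.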
[jira] [Created] (HBASE-17125) Inconsistent result when use filter to read data
Guanghao Zhang created HBASE-17125: -- Summary: Inconsistent result when use filter to read data Key: HBASE-17125 URL: https://issues.apache.org/jira/browse/HBASE-17125 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Assume a column's max versions is 3, and we write 4 versions of this column. The oldest version is not removed immediately, but from the user's view it is gone. When the user queries with a filter, if the filter skips a newer version, the oldest version is seen again. After the region is compacted, the oldest version is never seen. So this is weird for the user: the query gets inconsistent results before and after region compaction. The reason is the matchColumn method of UserScanQueryMatcher: it first checks the cell with the filter, then checks the number of versions needed. So if the filter skips a newer version, the oldest version is seen again while it has not yet been removed. After an offline discussion with [~Apache9] and [~fenghh], we have two candidate solutions for this problem. The first idea is to check the number of versions first, then check the cell with the filter. As the javadoc of setFilter says, the filter is called after all tests for ttl, column match, deletes and max versions have been run. {code} /** * Apply the specified server-side filter when performing the Query. * Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests * for ttl, column match, deletes and max versions have been run. * @param filter filter to run on the server * @return this for invocation chaining */ public Query setFilter(Filter filter) { this.filter = filter; return this; } {code} But this idea has another problem: if a column's max versions is 5 and the user's query only needs 3 versions, the version check first selects the newest 3 versions and then the filter may drop some of them, so the result may contain fewer than 3 cells even though 2 more stored versions were never read. So the second idea has three steps: 1. check against the max versions of the column; 2. check the cell with the filter; 3. check the number of versions the user needs. But this makes the ScanQueryMatcher more complicated, and it breaks the javadoc of Query.setFilter. We don't have a final solution for this problem yet. Suggestions are welcomed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
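The two orderings discussed above can be contrasted on a toy column with max versions = 3 and four written versions, newest first (a Java sketch, not the real ScanQueryMatcher):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class MatcherOrder {
    // filter-then-versions (current behavior): the over-the-limit oldest
    // version can leak back into results when the filter skips a newer one.
    static List<String> filterFirst(List<String> newestFirst, int maxVersions,
                                    Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (String v : newestFirst) {
            if (filter.test(v) && out.size() < maxVersions) {
                out.add(v);
            }
        }
        return out;
    }

    // versions-then-filter (first proposal): only the newest maxVersions
    // cells are even offered to the filter.
    static List<String> versionsFirst(List<String> newestFirst, int maxVersions,
                                      Predicate<String> filter) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(maxVersions, newestFirst.size()); i++) {
            if (filter.test(newestFirst.get(i))) {
                out.add(newestFirst.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 4 versions written, max versions = 3, so v1 is logically gone
        List<String> stored = List.of("v4", "v3", "v2", "v1");
        Predicate<String> skipNewest = v -> !v.equals("v4");
        System.out.println(filterFirst(stored, 3, skipNewest));   // [v3, v2, v1] - v1 leaks
        System.out.println(versionsFirst(stored, 3, skipNewest)); // [v3, v2]
    }
}
```

The sketch also makes the trade-off visible: the versions-first ordering never leaks v1, but it returns only two cells because v4 was consumed as one of the three version slots before the filter rejected it.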
[jira] [Created] (HBASE-17077) Don't copy the replication queue which belong to the peer have been deleted
Guanghao Zhang created HBASE-17077: -- Summary: Don't copy the replication queue which belong to the peer have been deleted Key: HBASE-17077 URL: https://issues.apache.org/jira/browse/HBASE-17077 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor When a region server dies, the other live region servers transfer the dead rs's replication queues to their own queues. Currently, a live rs first copies the WAL queue to its own znode, then creates a new replication source to replicate the WALs. But it copies the queue even if the queue belongs to a peer that has been deleted. The current steps are: 1. copy the queue to its own znode; 2. find that the queue belongs to a deleted peer; 3. remove the queue and don't create a new replication source for it. There is a small improvement: the live region server doesn't need to copy such a queue to its own znode at all. The new steps are: 1. find that the queue belongs to a deleted peer; 2. remove the queue directly instead of copying it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17140) Throw RegionOfflineException directly when request for a disabled table
Guanghao Zhang created HBASE-17140: -- Summary: Throw RegionOfflineException directly when request for a disabled table Key: HBASE-17140 URL: https://issues.apache.org/jira/browse/HBASE-17140 Project: HBase Issue Type: Improvement Components: Client Reporter: Guanghao Zhang Now a request to a disabled table needs 3 rpc calls before it fails: 1. get the region location; 2. send the call to the rs and get a NotServingRegionException; 3. retry, check the table state, and throw TableNotEnabledException. The table state check was added for disabled tables. But the prepare method in RegionServerCallable shows that every retried request fetches the table state first: {code} public void prepare(final boolean reload) throws IOException { // check table state if this is a retry if (reload && !tableName.equals(TableName.META_TABLE_NAME) && getConnection().isTableDisabled(tableName)) { throw new TableNotEnabledException(tableName.getNameAsString() + " is disabled."); } try (RegionLocator regionLocator = connection.getRegionLocator(tableName)) { this.location = regionLocator.getRegionLocation(row); } if (this.location == null) { throw new IOException("Failed to find location, tableName=" + tableName + ", row=" + Bytes.toString(row) + ", reload=" + reload); } setStubByServiceName(this.location.getServerName()); } {code} An improvement is to mark the region offline in HRegionInfo and throw RegionOfflineException when getting the region location. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
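The proposed fail-fast path, as a simplified Java sketch (all types here are stand-ins, not HBase client code): if the cached region info is already marked offline, the location lookup can throw immediately instead of spending an rpc round trip and a retry.

```java
public class FailFastLocator {
    static class RegionInfo {
        final boolean offline;
        RegionInfo(boolean offline) { this.offline = offline; }
    }

    static class RegionOfflineException extends RuntimeException {
        RegionOfflineException(String msg) { super(msg); }
    }

    // Stands in for a region location lookup that consults the cached info first.
    static String locate(RegionInfo cached) {
        if (cached.offline) {
            // fail fast: no rpc to the regionserver and no retry loop needed
            throw new RegionOfflineException("table is disabled; region is offline");
        }
        return "rs-1:16020"; // would normally come from a meta lookup
    }

    public static void main(String[] args) {
        System.out.println(locate(new RegionInfo(false)));
        try {
            locate(new RegionInfo(true));
        } catch (RegionOfflineException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```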
[jira] [Created] (HBASE-16910) Avoid NPE when start StochasticLoadBalancer
Guanghao Zhang created HBASE-16910: -- Summary: Avoid NPE when start StochasticLoadBalancer Key: HBASE-16910 URL: https://issues.apache.org/jira/browse/HBASE-16910 Project: HBase Issue Type: Bug Components: Balancer Affects Versions: 2.0.0 Reporter: Guanghao Zhang Priority: Minor When the master starts, it initializes the StochasticLoadBalancer: {code} this.balancer.setClusterStatus(getClusterStatus()); this.balancer.setMasterServices(this); {code} It calls setClusterStatus() first and then setMasterServices(). But the setClusterStatus method uses the master services, which have not been initialized yet, so it throws an NPE: {code} int tablesCount = isByTable ? services.getTableDescriptors().getAll().size() : 1; {code} This happens when isByTable is set to true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
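A minimal Java sketch of the initialization-order dependency (names mimic the report; this is not HBase source): setClusterStatus() dereferences the master services, so it can only run after setMasterServices().

```java
public class InitOrder {
    private Object services;

    void setMasterServices(Object services) {
        this.services = services;
    }

    void setClusterStatus() {
        // stands in for: services.getTableDescriptors().getAll().size()
        if (services == null) {
            throw new NullPointerException("master services not set yet");
        }
    }

    public static void main(String[] args) {
        InitOrder balancer = new InitOrder();
        try {
            balancer.setClusterStatus();      // buggy order: NPE
        } catch (NullPointerException e) {
            System.out.println("buggy order: " + e.getMessage());
        }
        balancer.setMasterServices(new Object());
        balancer.setClusterStatus();          // fixed order: fine
        System.out.println("fixed order: ok");
    }
}
```

Swapping the two calls in the master's startup sequence, or making setClusterStatus tolerate an unset service, would both avoid the NPE.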
[jira] [Created] (HBASE-16985) TestClusterId failed due to wrong hbase rootdir
Guanghao Zhang created HBASE-16985: -- Summary: TestClusterId failed due to wrong hbase rootdir Key: HBASE-16985 URL: https://issues.apache.org/jira/browse/HBASE-16985 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor https://builds.apache.org/job/PreCommit-HBASE-Build/4253/testReport/org.apache.hadoop.hbase.regionserver/TestClusterId/testClusterId/ {code} java.io.IOException: Shutting down at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:230) at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:409) at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:227) at org.apache.hadoop.hbase.MiniHBaseCluster.(MiniHBaseCluster.java:96) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1071) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1037) at org.apache.hadoop.hbase.regionserver.TestClusterId.testClusterId(TestClusterId.java:85) {code} The cluster cannot start up because there is no active master, and the active master cannot finish initializing because the hbase:namespace region cannot be assigned. In the TestClusterId unit test, TEST_UTIL.startMiniHBaseCluster sets a new hbase root dir, but the regionserver thread that was started first uses a different hbase root dir. If the hbase:namespace region is assigned to that regionserver, the region cannot be opened because there is no tableinfo under the wrong hbase root dir. When the regionserver reports to the master, it gets back some new config, but the FSTableDescriptors has already been initialized, so its root dir is not changed. {code} if (LOG.isDebugEnabled()) { LOG.info("Config from master: " + key + "=" + value); } {code} I think FSTableDescriptors needs to update its root dir when the regionserver gets the report back from the master. The master branch has the same problem, but there the balancer always assigns the hbase:namespace region to the master, so this unit test passes on the master branch. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-16983) TestMultiTableSnapshotInputFormat failing with Unable to create region directory: /tmp/...
[ https://issues.apache.org/jira/browse/HBASE-16983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-16983: Reopen for addendum. [~stack] > TestMultiTableSnapshotInputFormat failing with Unable to create region > directory: /tmp/... > --- > > Key: HBASE-16983 > URL: https://issues.apache.org/jira/browse/HBASE-16983 > Project: HBase > Issue Type: Bug > Components: test >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0, 1.4.0 > > Attachments: 16983.txt, HBASE-16983-ADDENDUM.patch, > HBASE-16983-branch-1-ADDENDUM.patch > > > Test is using /tmp. We failed creating dir in /tmp in a few tests from this > suite just now: > https://builds.apache.org/job/PreCommit-HBASE-Build/4253/testReport/org.apache.hadoop.hbase.mapred/TestMultiTableSnapshotInputFormat/testScanOBBToOPP/ > {code} > Caused by: java.io.IOException: Unable to create region directory: > /tmp/scantest2_snapshot__953e2b2d-22aa-4c6a-a46a-272619f5436e/data/default/scantest2/5629158a49e010e21ac0bd16453b2d8c > at > org.apache.hadoop.hbase.regionserver.HRegionFileSystem.createRegionOnFileSystem(HRegionFileSystem.java:896) > at > org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:6520) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegion(ModifyRegionUtils.java:205) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:173) > at > org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:170) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > ... > {code} > No more detail than this. 
Let me change it so it creates stuff in the test dir > that it for sure owns/can write to. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-16947) Some improvements for DumpReplicationQueues tool
Guanghao Zhang created HBASE-16947: -- Summary: Some improvements for DumpReplicationQueues tool Key: HBASE-16947 URL: https://issues.apache.org/jira/browse/HBASE-16947 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang Recently we hit a too-many-replication-WALs problem in our production cluster and needed the DumpReplicationQueues tool to analyze the replication queue info in ZooKeeper. So I backported HBASE-16450 to our 0.98-based branch and made some improvements to it. 1. Show the dead regionservers under the replication/rs znode. When there are too many WALs under a znode, it can't be transferred atomically to the new rs znode, so the dead rs znode is left behind in ZooKeeper. 2. Summarize all the queues whose peer has been deleted. 3. Aggregate the replication queue sizes of all regionservers. Each regionserver reports ReplicationLoad to the master, but there is no aggregate metric for replication. 4. Show how many WALs cannot be found on HDFS; the reason (WAL Not Found) needs more time to dig into. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
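Point 3 above (aggregating per-regionserver queue sizes into one cluster-wide number) can be sketched as below. This is a minimal illustration with hypothetical names, not HBase's actual DumpReplicationQueues code:

```java
// Illustrative sketch: sum the per-regionserver replication queue sizes
// (e.g. sizeOfLogQueue from ReplicationLoad) into one aggregate metric.
// Class and method names are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class ReplicationQueueAggregator {
  // queue size reported by each regionserver, keyed by server name
  public static long totalQueueSize(Map<String, Long> sizePerRegionServer) {
    long total = 0;
    for (long size : sizePerRegionServer.values()) {
      total += size;
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Long> sizes = new HashMap<>();
    sizes.put("rs1,16020,1476784763605", 120L);
    sizes.put("rs2,16020,1476784763606", 30L);
    System.out.println("aggregate queue size = " + totalQueueSize(sizes));
  }
}
```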
[jira] [Created] (HBASE-17288) Add warn log for huge keyvalue and huge row
Guanghao Zhang created HBASE-17288: -- Summary: Add warn log for huge keyvalue and huge row Key: HBASE-17288 URL: https://issues.apache.org/jira/browse/HBASE-17288 Project: HBase Issue Type: Improvement Components: scan Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Some log examples from our production cluster. {code} 2016-12-10,17:08:11,478 WARN org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into result list, kv size:1253360, kv:10567114001-1-c/R:r1/1481360887152/Put/vlen=1253245/ts=923099, from table X 2016-12-10,17:08:16,724 WARN org.apache.hadoop.hbase.regionserver.StoreScanner: adding a HUGE KV into result list, kv size:1048680, kv:0220459/I:i_0/1481360889551/Put/vlen=1048576/ts=13642, from table XX {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17289) Avoid adding a replication peer named "lock"
Guanghao Zhang created HBASE-17289: -- Summary: Avoid adding a replication peer named "lock" Key: HBASE-17289 URL: https://issues.apache.org/jira/browse/HBASE-17289 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 1.2.4, 0.98.23, 1.1.7, 1.3.0, 1.4.0 Reporter: Guanghao Zhang Priority: Minor When the zk based replication queue is used and useMulti is false, transferring replication queues takes three steps: first add a lock, then copy the nodes, and finally clean up the old queue and the lock. The default lock znode's name is "lock", so we should avoid adding a peer named "lock". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
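The guard this issue asks for can be sketched as follows. The validator class and method are hypothetical, not HBase's actual API; "lock" matches the default lock znode name described above:

```java
// Illustrative sketch: reject a peer id that collides with the lock znode
// name used during queue transfer.
public class PeerIdValidator {
  private static final String RESERVED_LOCK_ZNODE = "lock";

  public static void checkPeerId(String peerId) {
    if (RESERVED_LOCK_ZNODE.equals(peerId)) {
      throw new IllegalArgumentException(
          "Peer id '" + peerId + "' is reserved for the replication lock znode");
    }
  }

  public static void main(String[] args) {
    checkPeerId("peer1"); // accepted
    try {
      checkPeerId("lock");
      System.out.println("BUG: reserved id accepted");
    } catch (IllegalArgumentException expected) {
      System.out.println("rejected reserved peer id");
    }
  }
}
```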
[jira] [Created] (HBASE-17312) [JDK8] Use default method in XXXObserver
Guanghao Zhang created HBASE-17312: -- Summary: [JDK8] Use default method in XXXObserver Key: HBASE-17312 URL: https://issues.apache.org/jira/browse/HBASE-17312 Project: HBase Issue Type: Task Components: Coprocessors Affects Versions: 2.0.0 Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17317) [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map
Guanghao Zhang created HBASE-17317: -- Summary: [branch-1] The updatePeerConfig method in ReplicationPeersZKImpl didn't update the table-cfs map Key: HBASE-17317 URL: https://issues.apache.org/jira/browse/HBASE-17317 Project: HBase Issue Type: Task Affects Versions: 1.4.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang The updatePeerConfig method in ReplicationPeersZKImpl.java
{code}
@Override
public void updatePeerConfig(String id, ReplicationPeerConfig newConfig)
    throws ReplicationException {
  ReplicationPeer peer = getPeer(id);
  if (peer == null) {
    throw new ReplicationException("Could not find peer Id " + id);
  }
  ReplicationPeerConfig existingConfig = peer.getPeerConfig();
  if (newConfig.getClusterKey() != null && !newConfig.getClusterKey().isEmpty()
      && !newConfig.getClusterKey().equals(existingConfig.getClusterKey())) {
    throw new ReplicationException("Changing the cluster key on an existing peer is not allowed."
        + " Existing key '" + existingConfig.getClusterKey()
        + "' does not match new key '" + newConfig.getClusterKey() + "'");
  }
  String existingEndpointImpl = existingConfig.getReplicationEndpointImpl();
  if (newConfig.getReplicationEndpointImpl() != null
      && !newConfig.getReplicationEndpointImpl().isEmpty()
      && !newConfig.getReplicationEndpointImpl().equals(existingEndpointImpl)) {
    throw new ReplicationException("Changing the replication endpoint implementation class "
        + "on an existing peer is not allowed. Existing class '"
        + existingConfig.getReplicationEndpointImpl()
        + "' does not match new class '" + newConfig.getReplicationEndpointImpl() + "'");
  }
  // Update existingConfig's peer config and peer data with the new values, but don't touch config
  // or data that weren't explicitly changed
  existingConfig.getConfiguration().putAll(newConfig.getConfiguration());
  existingConfig.getPeerData().putAll(newConfig.getPeerData());
  // Bug. We should update the table-cfs map, too.
  try {
    ZKUtil.setData(this.zookeeper, getPeerNode(id),
        ReplicationSerDeHelper.toByteArray(existingConfig));
  } catch (KeeperException ke) {
    throw new ReplicationException("There was a problem trying to save changes to the "
        + "replication peer " + id, ke);
  }
}
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
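The bug can be modeled with plain maps, independent of the HBase classes. This toy model (all names illustrative, not ReplicationPeerConfig's real API) shows that merging only configuration and peer data silently drops a table-cfs update:

```java
// Minimal model of the HBASE-17317 bug: updatePeerConfig merges
// configuration and peer data but never copies the table-cfs map.
import java.util.HashMap;
import java.util.Map;

public class PeerConfigModel {
  public final Map<String, String> configuration = new HashMap<>();
  public final Map<String, String> peerData = new HashMap<>();
  public Map<String, String> tableCfs = new HashMap<>();

  // Mirrors the branch-1 behavior: table-cfs is not merged.
  public void buggyUpdate(PeerConfigModel newConfig) {
    configuration.putAll(newConfig.configuration);
    peerData.putAll(newConfig.peerData);
    // missing: carry over newConfig.tableCfs
  }

  // The fix: also carry over the table-cfs map.
  public void fixedUpdate(PeerConfigModel newConfig) {
    buggyUpdate(newConfig);
    tableCfs = new HashMap<>(newConfig.tableCfs);
  }

  public static void main(String[] args) {
    PeerConfigModel existing = new PeerConfigModel();
    PeerConfigModel update = new PeerConfigModel();
    update.tableCfs.put("t1", "cf1");

    existing.buggyUpdate(update);
    System.out.println("after buggy update: " + existing.tableCfs);  // {}

    existing.fixedUpdate(update);
    System.out.println("after fixed update: " + existing.tableCfs);  // {t1=cf1}
  }
}
```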
[jira] [Created] (HBASE-17296) Provide per peer throttling for replication
Guanghao Zhang created HBASE-17296: -- Summary: Provide per peer throttling for replication Key: HBASE-17296 URL: https://issues.apache.org/jira/browse/HBASE-17296 Project: HBase Issue Type: Improvement Components: Replication Reporter: Guanghao Zhang HBASE-9501 added a config to provide throttling for replication, but every peer has the same bandwidth limit. In our use case, one cluster may have several peers and several slave clusters; each slave cluster may have a different scale and need a different bandwidth limit for its peer. So we add a bandwidth field to the replication peer config and provide a shell command, set_peer_bandwidth, to update the bandwidth when needed. It has been used for a long time on our clusters. Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
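The per-peer lookup this proposes can be sketched as below: use a peer's own bandwidth if set, otherwise fall back to the cluster-wide limit. The class and fields are illustrative, not the actual peer config API:

```java
// Hedged sketch of per-peer replication bandwidth with a global fallback.
import java.util.HashMap;
import java.util.Map;

public class PeerBandwidth {
  // bandwidth in bytes/sec per peer id; 0 means "not set"
  private final Map<String, Long> perPeerBandwidth = new HashMap<>();
  private final long defaultBandwidth;

  public PeerBandwidth(long defaultBandwidth) {
    this.defaultBandwidth = defaultBandwidth;
  }

  public void setPeerBandwidth(String peerId, long bandwidth) {
    perPeerBandwidth.put(peerId, bandwidth);
  }

  public long bandwidthFor(String peerId) {
    long b = perPeerBandwidth.getOrDefault(peerId, 0L);
    return b > 0 ? b : defaultBandwidth;  // fall back to the global limit
  }

  public static void main(String[] args) {
    PeerBandwidth throttle = new PeerBandwidth(100 * 1024 * 1024L);
    throttle.setPeerBandwidth("slow-peer", 10 * 1024 * 1024L);
    System.out.println("slow-peer:  " + throttle.bandwidthFor("slow-peer"));
    System.out.println("other-peer: " + throttle.bandwidthFor("other-peer"));
  }
}
```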
[jira] [Created] (HBASE-17303) Let master to check and transfer the dead rs's replication queues
Guanghao Zhang created HBASE-17303: -- Summary: Let master to check and transfer the dead rs's replication queues Key: HBASE-17303 URL: https://issues.apache.org/jira/browse/HBASE-17303 Project: HBase Issue Type: Bug Components: Replication Reporter: Guanghao Zhang Dump replication queues result from our cluster. {code} Found 8 deleted queues, run hbck -fixReplication in order to remove the deleted replication queues hostname,24610,1481528189915/80-hostname,24620,1476784763605 hostname,24620,1476784763605/70-hostname,24630,1470418208092-hostname,24600,1476773709589 hostname,24630,1481528526258/17000-hostname,24620,1470044455538-hostname,24630,1470037674231-hostname,24600,1476773708489-hostname,24620,1476784763605 hostname,24620,1481528358531/70-hostname,24600,1476773709589-hostname,24620,1476784763605 hostname,24600,1481528021595/70-hostname,24630,1470421093464-hostname,24630,1476773708939-hostname,24610,1476779010928-hostname,24620,1476784747260 hostname,24600,1481528021595/17000-hostname,24620,1476784763605 hostname,24600,1481528021595/17000-hostname,24630,1475381530644-hostname,24600,1476773709589-hostname,24620,1476784763605 hostname,24600,1481528021595/17000-hostname,24600,1476773709589-hostname,24620,1476784763605 Found 2 dead regionservers, restart one regionserver to transfer the queues of dead regionservers hostname,24600,1481547616148 hostname,24620,1476784763605 {code} Now, for a dead rs's replication znode, you need to restart a regionserver to transfer the replication queues of the dead regionservers. Following the same idea as HBASE-16336, we can let the master periodically check the dead rs znodes too, and send the transfer-replication-queues request to any regionserver. Then the dead rs's replication queues can be transferred automatically, without waiting for a regionserver restart. Any suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17442) Move most of the replication related classes to hbase-server package
Guanghao Zhang created HBASE-17442: -- Summary: Move most of the replication related classes to hbase-server package Key: HBASE-17442 URL: https://issues.apache.org/jira/browse/HBASE-17442 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang After the replication requests are routed through the master, the replication implementation details no longer need to be exposed to the client. We should move most of the replication related classes to the hbase-server package. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17205) Add a metric for the duration of region in transition
Guanghao Zhang created HBASE-17205: -- Summary: Add a metric for the duration of region in transition Key: HBASE-17205 URL: https://issues.apache.org/jira/browse/HBASE-17205 Project: HBase Issue Type: Improvement Components: Region Assignment Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor While working on HBASE-17178, I found there is no metric for the overall duration of a region in transition. When moving a region from A to B, the region state transitions are PENDING_CLOSE => CLOSING => CLOSED => PENDING_OPEN => OPENING => OPENED. Each transition from the old region state to the new one updates the timestamp to the current time, so we can't get the overall duration of the region in transition. Add a rit duration to RegionState to accumulate this metric. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
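The accumulator described above can be sketched like this. Since each transition resets the per-state timestamp, the overall time in transition must be accumulated separately; the field names here are illustrative, not the actual RegionState members:

```java
// Sketch of a "rit duration" accumulator across region state transitions.
public class RitDuration {
  private long ritDuration = 0;   // accumulated time in transition
  private long stamp;             // timestamp of the last state change

  public RitDuration(long startStamp) {
    this.stamp = startStamp;
  }

  // Called on every transition, e.g. PENDING_CLOSE -> CLOSING -> ... -> OPENED
  public void transition(long now) {
    ritDuration += now - stamp;
    stamp = now;                  // per-state stamp is reset, as the issue notes
  }

  public long getRitDuration() {
    return ritDuration;
  }

  public static void main(String[] args) {
    RitDuration rit = new RitDuration(1000);
    rit.transition(1200);  // PENDING_CLOSE -> CLOSING
    rit.transition(1500);  // CLOSING -> CLOSED
    rit.transition(2000);  // ... -> OPENED
    System.out.println("overall RIT duration = " + rit.getRitDuration());  // 1000
  }
}
```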
[jira] [Created] (HBASE-17388) Move ReplicationPeer and other replication related PB messages to the replication.proto
Guanghao Zhang created HBASE-17388: -- Summary: Move ReplicationPeer and other replication related PB messages to the replication.proto Key: HBASE-17388 URL: https://issues.apache.org/jira/browse/HBASE-17388 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17389) Convert all internal usages from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17389: -- Summary: Convert all internal usages from ReplicationAdmin to Admin Key: HBASE-17389 URL: https://issues.apache.org/jira/browse/HBASE-17389 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17396) Add first async admin impl and implement balance methods
Guanghao Zhang created HBASE-17396: -- Summary: Add first async admin impl and implement balance methods Key: HBASE-17396 URL: https://issues.apache.org/jira/browse/HBASE-17396 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17443) Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin
Guanghao Zhang created HBASE-17443: -- Summary: Move listReplicated/enableTableRep/disableTableRep from ReplicationAdmin to Admin Key: HBASE-17443 URL: https://issues.apache.org/jira/browse/HBASE-17443 Project: HBase Issue Type: Sub-task Affects Versions: 2.0.0 Reporter: Guanghao Zhang Fix For: 2.0.0 We have moved the other replication requests to Admin and marked ReplicationAdmin as Deprecated, so the listReplicated/enableTableRep/disableTableRep methods need to move to Admin, too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17348) Remove the unused hbase.replication from javadoc/comment completely
Guanghao Zhang created HBASE-17348: -- Summary: Remove the unused hbase.replication from javadoc/comment completely Key: HBASE-17348 URL: https://issues.apache.org/jira/browse/HBASE-17348 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Trivial The hbase.replication configuration has been removed by HBASE-16040, but there are still some hbase.replication references left in the javadoc of ReplicationAdmin, in Admin.proto, and in shell.rb. Let's remove them completely. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17337) list replication peers request should be routed through master
Guanghao Zhang created HBASE-17337: -- Summary: list replication peers request should be routed through master Key: HBASE-17337 URL: https://issues.apache.org/jira/browse/HBASE-17337 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17335) enable/disable replication peer requests should be routed through master
Guanghao Zhang created HBASE-17335: -- Summary: enable/disable replication peer requests should be routed through master Key: HBASE-17335 URL: https://issues.apache.org/jira/browse/HBASE-17335 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang As the HBASE-11392 description says, replication operations should be routed through the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17336) get/update replication peer config requests should be routed through master
Guanghao Zhang created HBASE-17336: -- Summary: get/update replication peer config requests should be routed through master Key: HBASE-17336 URL: https://issues.apache.org/jira/browse/HBASE-17336 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang As the HBASE-11392 description says, replication operations should be routed through the master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14609) Can't config all day as OffPeakHours
[ https://issues.apache.org/jira/browse/HBASE-14609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-14609. Resolution: Won't Fix > Can't config all day as OffPeakHours > > > Key: HBASE-14609 > URL: https://issues.apache.org/jira/browse/HBASE-14609 > Project: HBase > Issue Type: Bug > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Minor > > The off-peak hours are [startHour, endHour) and endHour is exclusive. But > endHour is rejected when configured as 24, so we can't configure the whole day as > OffPeakHours. > {code} > private static boolean isValidHour(int hour) { > return 0 <= hour && hour <= 23; > } > {code} > Making endHour=24 valid, or allowing startHour == endHour, would fix this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
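The quoted isValidHour and one possible fix can be shown side by side. The original check is copied from the issue; validRange is an illustrative way to accept endHour == 24 so the window [0, 24) covers the whole day:

```java
// Sketch of the OffPeakHours validation discussed in HBASE-14609.
public class OffPeakHoursCheck {
  // Original check from the issue: rejects 24, so [0, 24) can't be configured.
  public static boolean isValidHour(int hour) {
    return 0 <= hour && hour <= 23;
  }

  // One possible fix: startHour stays in [0, 23], endHour may be 24.
  public static boolean validRange(int startHour, int endHour) {
    return isValidHour(startHour) && 0 <= endHour && endHour <= 24;
  }

  public static void main(String[] args) {
    System.out.println("old check, endHour=24: " + isValidHour(24));  // false
    System.out.println("fixed check, [0,24): " + validRange(0, 24));  // true
  }
}
```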
[jira] [Created] (HBASE-17326) Fix findbugs warning in BufferedMutatorParams
Guanghao Zhang created HBASE-17326: -- Summary: Fix findbugs warning in BufferedMutatorParams Key: HBASE-17326 URL: https://issues.apache.org/jira/browse/HBASE-17326 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang https://builds.apache.org/job/PreCommit-HBASE-Build/4947/artifact/patchprocess/branch-findbugs-hbase-client-warnings.html org.apache.hadoop.hbase.client.BufferedMutatorParams defines clone() but doesn't implement Cloneable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
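The warning ("defines clone() but doesn't implement Cloneable") is typically fixed by declaring the interface and delegating to super.clone(). The class below is a generic illustration, not BufferedMutatorParams itself:

```java
// Declaring Cloneable silences the findbugs warning and makes
// super.clone() legal at runtime.
public class ParamsExample implements Cloneable {
  private int bufferSize = 1024;

  @Override
  public ParamsExample clone() {
    try {
      return (ParamsExample) super.clone();  // allowed: we implement Cloneable
    } catch (CloneNotSupportedException e) {
      throw new AssertionError(e);           // cannot happen: we are Cloneable
    }
  }

  public int getBufferSize() { return bufferSize; }

  public static void main(String[] args) {
    ParamsExample p = new ParamsExample();
    ParamsExample copy = p.clone();
    System.out.println(copy != p && copy.getBufferSize() == p.getBufferSize());  // true
  }
}
```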
[jira] [Created] (HBASE-17846) [JDK8] Use Optional instead of Nullable parameter in async client
Guanghao Zhang created HBASE-17846: -- Summary: [JDK8] Use Optional instead of Nullable parameter in async client Key: HBASE-17846 URL: https://issues.apache.org/jira/browse/HBASE-17846 Project: HBase Issue Type: Improvement Affects Versions: 2.0.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang For the master branch, we use a lot of Java 8 features in the async client, like lambdas, streams, default methods and so on. Java 8 also supports Optional, so we can update some methods to use Optional instead of nullable parameters. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17790) Mark ReplicationAdmin's peerAdded and listReplicationPeers as Deprecated
Guanghao Zhang created HBASE-17790: -- Summary: Mark ReplicationAdmin's peerAdded and listReplicationPeers as Deprecated Key: HBASE-17790 URL: https://issues.apache.org/jira/browse/HBASE-17790 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor Now most of the public methods in ReplicationAdmin have been moved to Admin and marked as Deprecated. The peerAdded and listReplicationPeers methods need to be marked as Deprecated, too. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17913) Fix flaky test TestExportSnapshot/TestMobExportSnapshot/TestMobSecureExportSnapshot/TestSecureExportSnapshot
Guanghao Zhang created HBASE-17913: -- Summary: Fix flaky test TestExportSnapshot/TestMobExportSnapshot/TestMobSecureExportSnapshot/TestSecureExportSnapshot Key: HBASE-17913 URL: https://issues.apache.org/jira/browse/HBASE-17913 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang https://builds.apache.org/job/PreCommit-HBASE-Build/6410/artifact/patchprocess/patch-unit-hbase-server.txt Failed tests: TestExportSnapshot.testExportRetry:273->testExportFileSystemState:233 expected:<0> but was:<1> TestExportSnapshot.testExportWithTargetName:192->testExportFileSystemState:197->testExportFileSystemState:204->testExportFileSystemState:233 expected:<0> but was:<1> TestMobExportSnapshot>TestExportSnapshot.testExportRetry:273->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobExportSnapshot>TestExportSnapshot.testExportWithTargetName:192->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testConsecutiveExports:184->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testEmptyExportFileSystemState:178->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestMobSecureExportSnapshot>TestExportSnapshot.testExportFileSystemState:163->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> TestSecureExportSnapshot>TestExportSnapshot.testExportFileSystemStateWithSkipTmp:170->TestExportSnapshot.testExportFileSystemState:197->TestExportSnapshot.testExportFileSystemState:204->TestExportSnapshot.testExportFileSystemState:233 
expected:<0> but was:<1> TestSecureExportSnapshot>TestExportSnapshot.testExportRetry:273->TestExportSnapshot.testExportFileSystemState:233 expected:<0> but was:<1> -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-17915) Implement replication admin methods
Guanghao Zhang created HBASE-17915: -- Summary: Implement replication admin methods Key: HBASE-17915 URL: https://issues.apache.org/jira/browse/HBASE-17915 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HBASE-9899) for idempotent operation dups, return the result instead of throwing conflict exception
[ https://issues.apache.org/jira/browse/HBASE-9899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-9899: --- Reopened because we forgot to pass the nonce for the scanner in branch-1. > for idempotent operation dups, return the result instead of throwing conflict > exception > --- > > Key: HBASE-9899 > URL: https://issues.apache.org/jira/browse/HBASE-9899 > Project: HBase > Issue Type: Improvement >Reporter: Sergey Shelukhin > Assignee: Guanghao Zhang > Fix For: 2.0.0, 1.3.0, 1.4.0 > > Attachments: HBASE-9899-addendum.patch, HBASE-9899-branch-1.patch, > HBASE-9899-branch-1.patch, HBASE-9899-branch-1.patch, HBASE-9899-v1.patch, > HBASE-9899-v2.patch, HBASE-9899-v3.patch, HBASE-9899-v3.patch, > HBASE-9899-v4.patch, HBASE-9899-v4.patch > > > After HBASE-3787, we could store mvcc in operation context, and use it to > convert the modification request into read on dups instead of throwing > OperationConflictException. > MVCC tracking will have to be aware of such MVCC numbers present. Given that > scanners are usually relatively short-lived, that would prevent low watermark > from advancing for quite a bit more time -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18485) Performance issue: ClientAsyncPrefetchScanner is slower than ClientSimpleScanner
Guanghao Zhang created HBASE-18485: -- Summary: Performance issue: ClientAsyncPrefetchScanner is slower than ClientSimpleScanner Key: HBASE-18485 URL: https://issues.apache.org/jira/browse/HBASE-18485 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Copied the test result from HBASE-17994.
{code}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred scan 1
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --asyncPrefetch=True scan 1
{code}
Mean latency.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| scan | 12.21 | 14.32 | 13.25 | 13.07 | 11.83 |
| scan with prefetch=True | 37.36 | 37.88 | 37.56 | 37.66 | 38.28 |
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18481) The autoFlush flag was not used in PE tool
Guanghao Zhang created HBASE-18481: -- Summary: The autoFlush flag was not used in PE tool Key: HBASE-18481 URL: https://issues.apache.org/jira/browse/HBASE-18481 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Minor After HBASE-12728, PE uses a BufferedMutator for the random/sequential write tests and the autoFlush flag is not used. So all write tests buffer the write requests and send them as a batch once the buffer has filled. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
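The buffering behavior described above can be modeled without the HBase client at all. This toy stand-in (not the real BufferedMutator API) shows how puts are held and sent as batches when the buffer fills, regardless of any autoFlush flag:

```java
// Minimal model of BufferedMutator-style client-side write buffering.
import java.util.ArrayList;
import java.util.List;

public class BufferedWriteModel {
  private final List<String> buffer = new ArrayList<>();
  private final int capacity;
  public int batchesSent = 0;

  public BufferedWriteModel(int capacity) {
    this.capacity = capacity;
  }

  public void put(String mutation) {
    buffer.add(mutation);
    if (buffer.size() >= capacity) {   // only flushed when the buffer has filled
      flush();
    }
  }

  public void flush() {
    if (!buffer.isEmpty()) {
      batchesSent++;                   // one batch request for the whole buffer
      buffer.clear();
    }
  }

  public static void main(String[] args) {
    BufferedWriteModel mutator = new BufferedWriteModel(3);
    for (int i = 0; i < 7; i++) {
      mutator.put("row-" + i);
    }
    mutator.flush();                   // final partial batch
    System.out.println("batches sent: " + mutator.batchesSent);  // 3
  }
}
```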
[jira] [Created] (HBASE-18500) Performance issue: Don't use BufferedMutator for HTable's put method
Guanghao Zhang created HBASE-18500: -- Summary: Performance issue: Don't use BufferedMutator for HTable's put method Key: HBASE-18500 URL: https://issues.apache.org/jira/browse/HBASE-18500 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Copied the test result from HBASE-17994. I ran start-hbase.sh on my local computer and used the default config to test with the PE tool.
{code}
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --autoFlush=True randomWrite 1
./bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --rows=10 --nomapred --autoFlush=True asyncRandomWrite 1
{code}
Mean latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 164.39 | 161.22 | 164.78 | 140.61 | 151.69 |
| asyncRandomWrite | 122.29 | 125.58 | 122.23 | 113.18 | 123.02 |
50th latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 130.00 | 125.00 | 123.00 | 112.00 | 121.00 |
| asyncRandomWrite | 95.00 | 97.00 | 95.00 | 88.00 | 95.00 |
99th latency test result.
|| || Test1 || Test2 || Test3 || Test4 || Test5 ||
| randomWrite | 600.00 | 600.00 | 650.00 | 404.00 | 425.00 |
| asyncRandomWrite | 339.00 | 327.00 | 297.00 | 311.00 | 318.00 |
In our internal 0.98 branch, the PE test result shows that the async write has almost the same latency as the blocking write. But for the master branch, the result shows that the async write has better latency than the blocking client. Looking at the code, I think the difference is the BufferedMutator. For the master branch, HTable doesn't have a write buffer and all write requests are flushed directly; users can use BufferedMutator when they want client-side buffering of writes. For the performance issue (autoFlush=True), I think we can use the rpc caller directly in HTable's put method. Thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18608) AsyncConnection should return AsyncAdmin interface instead of the implementation
Guanghao Zhang created HBASE-18608: -- Summary: AsyncConnection should return AsyncAdmin interface instead of the implementation Key: HBASE-18608 URL: https://issues.apache.org/jira/browse/HBASE-18608 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang hbase-client/src/main/java/org/apache/hadoop/hbase/client/AsyncConnection.java {code} AsyncAdminBuilder getAdminBuilder(); AsyncAdminBuilder getAdminBuilder(ExecutorService pool); {code} These two methods should not expose the implementations: RawAsyncHBaseAdmin and AsyncHBaseAdmin. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18598) AsyncNonMetaRegionLocator use FIFO algorithm to get a candidate locate request
Guanghao Zhang created HBASE-18598: -- Summary: AsyncNonMetaRegionLocator use FIFO algorithm to get a candidate locate request Key: HBASE-18598 URL: https://issues.apache.org/jira/browse/HBASE-18598 Project: HBase Issue Type: Bug Components: asyncclient Reporter: Guanghao Zhang Assignee: Guanghao Zhang Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18600) Provide a method to disable the Short-Circuit HTable in coprocessor
Guanghao Zhang created HBASE-18600: -- Summary: Provide a method to disable the Short-Circuit HTable in coprocessor Key: HBASE-18600 URL: https://issues.apache.org/jira/browse/HBASE-18600 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18571) RpcContext inconsistent when call Table's method in RegionCoprocessorEnvironment
Guanghao Zhang created HBASE-18571: -- Summary: RpcContext inconsistent when call Table's method in RegionCoprocessorEnvironment Key: HBASE-18571 URL: https://issues.apache.org/jira/browse/HBASE-18571 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17797) Add a filter to implement the function which return the special number of versions of each column
[ https://issues.apache.org/jira/browse/HBASE-17797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17797. Resolution: Duplicate As we solved this in HBASE-17125, resolve this as a duplicate. > Add a filter to implement the function which return the special number of > versions of each column > - > > Key: HBASE-17797 > URL: https://issues.apache.org/jira/browse/HBASE-17797 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > > After HBASE-17125, ScanQueryMatch will first check column then check by > filter. The scan/get will get consistent result when use filter to read data. > But scan/get setMaxVersions() can not return the special number of versions > of each column. So this issue will introduce a new filter to implement this > function which return the special number of versions of each column. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18380) Implement async RSGroup admin based on the async admin
Guanghao Zhang created HBASE-18380: -- Summary: Implement async RSGroup admin based on the async admin Key: HBASE-18380 URL: https://issues.apache.org/jira/browse/HBASE-18380 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Now the RSGroup admin client gets a blocking stub based on the blocking admin's coprocessor service. As we added coprocessor service support to the async admin, we can implement a new async RSGroup admin client based on the new async admin. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HBASE-18343) Track the remaining unimplemented methods for async admin
[ https://issues.apache.org/jira/browse/HBASE-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-18343: TestAsyncRegionAdminApi#testSplitSwitch failed on my PC. > Track the remaining unimplemented methods for async admin > - > > Key: HBASE-18343 > URL: https://issues.apache.org/jira/browse/HBASE-18343 > Project: HBase > Issue Type: Sub-task > Components: Client > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18343.master.001.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18343) Track the remaining unimplemented methods
Guanghao Zhang created HBASE-18343: -- Summary: Track the remaining unimplemented methods Key: HBASE-18343 URL: https://issues.apache.org/jira/browse/HBASE-18343 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18342) Add coprocessor service support
Guanghao Zhang created HBASE-18342: -- Summary: Add coprocessor service support Key: HBASE-18342 URL: https://issues.apache.org/jira/browse/HBASE-18342 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17359) Implement async admin
[ https://issues.apache.org/jira/browse/HBASE-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17359. Resolution: Fixed Fix Version/s: (was: 2.0.0) 2.0.0-alpha-2 3.0.0 All sub-tasks have been resolved. > Implement async admin > - > > Key: HBASE-17359 > URL: https://issues.apache.org/jira/browse/HBASE-17359 > Project: HBase > Issue Type: Umbrella > Components: Client >Reporter: Duo Zhang > Assignee: Guanghao Zhang > Labels: asynchronous > Fix For: 3.0.0, 2.0.0-alpha-2 > > > And as we will return a CompletableFuture, I think we can just remove the > XXXAsync methods, and make all the methods blocking which means we will only > finish the CompletableFuture when the operation is done. Users can choose > whether to wait on the returned CompletableFuture. > Convert this to an umbrella task. There may be some sub-tasks. > 1. Table admin operations. > 2. Region admin operations. > 3. Namespace admin operations. > 4. Snapshot admin operations. > 5. Replication admin operations. > 6. Other operations, like quota, balance.. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18297) Provide an AsyncAdminBuilder to create new AsyncAdmin instance
Guanghao Zhang created HBASE-18297: -- Summary: Provide an AsyncAdminBuilder to create new AsyncAdmin instance Key: HBASE-18297 URL: https://issues.apache.org/jira/browse/HBASE-18297 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Similar to AsyncTableBuilder, users can set only the configs they care about when creating a new AsyncAdmin instance. This makes it easy to update the rpc timeout or operation timeout config when performing time-consuming admin operations. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18317) Implement async admin operations for Normalizer/CleanerChore/CatalogJanitor
Guanghao Zhang created HBASE-18317: -- Summary: Implement async admin operations for Normalizer/CleanerChore/CatalogJanitor Key: HBASE-18317 URL: https://issues.apache.org/jira/browse/HBASE-18317 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18318) Implement updateConfiguration/stopMaster/stopRegionServer/shutdown methods
Guanghao Zhang created HBASE-18318: -- Summary: Implement updateConfiguration/stopMaster/stopRegionServer/shutdown methods Key: HBASE-18318 URL: https://issues.apache.org/jira/browse/HBASE-18318 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18316) Implement async admin operations for draining region servers
Guanghao Zhang created HBASE-18316: -- Summary: Implement async admin operations for draining region servers Key: HBASE-18316 URL: https://issues.apache.org/jira/browse/HBASE-18316 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-18319) Implement getClusterStatus method
Guanghao Zhang created HBASE-18319: -- Summary: Implement getClusterStatus method Key: HBASE-18319 URL: https://issues.apache.org/jira/browse/HBASE-18319 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-17958) Avoid passing cell to ScanQueryMatcher when optimize SEEK to SKIP
Guanghao Zhang created HBASE-17958: -- Summary: Avoid passing cell to ScanQueryMatcher when optimize SEEK to SKIP Key: HBASE-17958 URL: https://issues.apache.org/jira/browse/HBASE-17958 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang {code} ScanQueryMatcher.MatchCode qcode = matcher.match(cell); qcode = optimize(qcode, cell); {code} The optimize method may change the MatchCode from SEEK_NEXT_COL/SEEK_NEXT_ROW to SKIP, but the next cell is still passed to ScanQueryMatcher. This produces wrong results with some filters, e.g. ColumnCountGetFilter, which just counts the number of columns: if the same column is passed to the filter twice, the count will be wrong. So we should avoid passing the cell to ScanQueryMatcher when optimize changes SEEK to SKIP. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
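The double-counting problem can be demonstrated with a toy count-based filter, a simplified stand-in for ColumnCountGetFilter rather than the real class: if the scanner hands it the same column twice, the filter hits its limit one column early and drops a column it should have returned.

```java
// Toy stand-in for a count-based filter: it includes cells until a column
// limit is reached, and (like the real filter) does not dedupe columns.
public class CountFilterSketch {
    // Returns the concatenation of the columns the filter would include.
    static String includedColumns(String[] cells, int limit) {
        StringBuilder included = new StringBuilder();
        int count = 0;
        for (String col : cells) {
            if (count >= limit) break; // filter says "enough columns"
            count++;
            included.append(col);
        }
        return included.toString();
    }

    public static void main(String[] args) {
        // Correct cell stream: three distinct columns, limit 3 -> all kept.
        System.out.println(includedColumns(new String[] {"a", "b", "c"}, 3)); // abc
        // Buggy stream (SEEK optimized to SKIP but cell still passed to the
        // matcher): column "a" is seen twice, so "c" is wrongly excluded.
        System.out.println(includedColumns(new String[] {"a", "a", "b", "c"}, 3)); // aab
    }
}
```

The fix direction described in the issue, not re-feeding the cell to the matcher once SEEK has been optimized to SKIP, prevents exactly this duplicate observation.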
[jira] [Created] (HBASE-18626) Handle the incompatible change about the replication TableCFs' config
Guanghao Zhang created HBASE-18626: -- Summary: Handle the incompatible change about the replication TableCFs' config Key: HBASE-18626 URL: https://issues.apache.org/jira/browse/HBASE-18626 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang Priority: Blocker Regarding compatibility, there is one incompatible change to the replication TableCFs' config. The old config is a string that concatenates the list of tables and column families in the format "table1:cf1,cf2;table2:cfA,cfB" in zookeeper for the table-cf to replication peer mapping. When parsing the config, ":" is used to split the string, so if a table name includes a namespace the result is wrong (see HBASE-11386). This has been a problem since namespaces were introduced (0.98). So HBASE-11393 (and HBASE-16653) changed it to a PB object. When rolling-upgrading a cluster, you need to roll the master first, and the master will try to translate the string config into a PB object. But there are two problems. 1. Permissions. The replication client can write to zookeeper directly, so the znode may have a different owner and the master may not have write permission for it; translating the old table-cfs string to the new PB object may then fail. See HBASE-16938. 2. We usually keep compatibility between old clients and new servers, but an old replication client may write a string config to the znode directly, which the new server can't parse. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
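Why the old string format breaks with namespaces can be shown in a few lines. This is a simplified sketch of the legacy parsing, not the actual HBase code: splitting on ":" cannot distinguish the namespace separator inside a table name from the table/column-family separator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified sketch of the legacy table-cf parsing: entries separated by
// ";", each entry split on ":" into table name and column families.
public class TableCfParseSketch {
    static Map<String, String> parse(String config) {
        Map<String, String> tableCfs = new LinkedHashMap<>();
        for (String entry : config.split(";")) {
            String[] parts = entry.split(":");
            // Assumes at most one ":" per entry -- wrong once table names
            // may themselves contain ":" as the namespace separator.
            tableCfs.put(parts[0], parts.length > 1 ? parts[1] : "");
        }
        return tableCfs;
    }

    public static void main(String[] args) {
        // Works for the old, namespace-free format:
        System.out.println(parse("table1:cf1,cf2;table2:cfA,cfB"));
        // Breaks for a namespaced table "ns:table1": the namespace is taken
        // as the table name and "table1" as the column-family list.
        System.out.println(parse("ns:table1:cf1,cf2")); // {ns=table1}
    }
}
```

A structured PB object avoids the ambiguity entirely, which is why HBASE-11393 moved away from the string format.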
[jira] [Created] (HBASE-18053) AsyncTableResultScanner will hang when scan wrong column family
Guanghao Zhang created HBASE-18053: -- Summary: AsyncTableResultScanner will hang when scan wrong column family Key: HBASE-18053 URL: https://issues.apache.org/jira/browse/HBASE-18053 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (HBASE-18052) Add doc, examples for async admin
Guanghao Zhang created HBASE-18052: -- Summary: Add doc, examples for async admin Key: HBASE-18052 URL: https://issues.apache.org/jira/browse/HBASE-18052 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HBASE-18234) Revisit the async admin api
[ https://issues.apache.org/jira/browse/HBASE-18234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-18234: > Revisit the async admin api > --- > > Key: HBASE-18234 > URL: https://issues.apache.org/jira/browse/HBASE-18234 > Project: HBase > Issue Type: Sub-task > Components: Client > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Fix For: 3.0.0, 2.0.0-alpha-2 > > Attachments: HBASE-18234.master.001.patch, > HBASE-18234.master.002.patch, HBASE-18234.master.003.patch, > HBASE-18234.master.004.patch, HBASE-18234.master.005.patch, > HBASE-18234.master.006.patch, HBASE-18234.master.006.patch, > HBASE-18234.master.006.patch, HBASE-18234.master.007.patch, > HBASE-18234.master.008.patch, HBASE-18234.master.009.patch, > HBASE-18234.master.010.patch, HBASE-18234.master.010.patch, > HBASE-18234.master.addendum.patch > > > 1. Update the balance method names: > balancer -> balance > setBalancerRunning -> setBalancerOn > isBalancerEnabled -> isBalancerOn > 2. Use HRegionLocation instead of Pair<HRegionInfo, ServerName>. > 3. Remove the closeRegionWithEncodedRegionName method: since all other APIs > can handle both full and encoded region names, a separate method for encoded > names is unnecessary. > 4. Unify the region name parameter's type to byte[]; the region name may be a > full name or an encoded name. > 5. Unify the server name parameter's type to ServerName. Some APIs support > null for the server name, so use Optional instead. > 6. Unify the table name parameter's type to TableName. > 7. Unify all list* methods to support only Pattern as the parameter type. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-17846) [JDK8] Use Optional instead of Nullable parameter in async client
[ https://issues.apache.org/jira/browse/HBASE-17846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang resolved HBASE-17846. Resolution: Duplicate As HBASE-18234 has been resolved, this can be closed, too. > [JDK8] Use Optional instead of Nullable parameter in async client > - > > Key: HBASE-17846 > URL: https://issues.apache.org/jira/browse/HBASE-17846 > Project: HBase > Issue Type: Sub-task > Affects Versions: 2.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > > For the master branch, we use a lot of Java 8 features in the async client, like > lambdas, streams, default methods and so on. Since Java 8 supports Optional, we can > update some methods to use Optional instead of Nullable parameters. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Reopened] (HBASE-15616) Allow null qualifier for all table operations
[ https://issues.apache.org/jira/browse/HBASE-15616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reopened HBASE-15616: > Allow null qualifier for all table operations > - > > Key: HBASE-15616 > URL: https://issues.apache.org/jira/browse/HBASE-15616 > Project: HBase > Issue Type: Bug > Components: Client > Affects Versions: 2.0.0 > Reporter: Jianwei Cui > Assignee: Guanghao Zhang > Fix For: 2.0.0 > > Attachments: HBASE-15615-addendum.patch, HBASE-15616-v1.patch, > HBASE-15616-v2.patch, HBASE-15616-v3.patch, HBASE-15616-v4.patch, > HBASE-15616-v5.patch > > > If the qualifier to check is null, checkAndMutate/checkAndPut/checkAndDelete > will encounter an NPE. > The test code: > {code} > table.checkAndPut(row, family, null, Bytes.toBytes(0), new > Put(row).addColumn(family, null, Bytes.toBytes(1))); > {code} > The exception: > {code} > Exception in thread "main" > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=3, exceptions: > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:31 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > Fri Apr 08 15:51:32 CST 2016, > RpcRetryingCaller{globalStartTime=1460101891615, pause=100, maxAttempts=3}, > java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:120) > at org.apache.hadoop.hbase.client.HTable.checkAndPut(HTable.java:772) > at ... 
> Caused by: java.io.IOException: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:341) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:768) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:755) > at > org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:99) > ... 2 more > Caused by: com.google.protobuf.ServiceException: > java.lang.NullPointerException > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:239) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:331) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.mutate(ClientProtos.java:35252) > at org.apache.hadoop.hbase.client.HTable$7.call(HTable.java:765) > ... 4 more > Caused by: java.lang.NullPointerException > at com.google.protobuf.LiteralByteString.size(LiteralByteString.java:76) > at > com.google.protobuf.CodedOutputStream.computeBytesSizeNoTag(CodedOutputStream.java:767) > at > com.google.protobuf.CodedOutputStream.computeBytesSize(CodedOutputStream.java:539) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$Condition.getSerializedSize(ClientProtos.java:7483) > at > com.google.protobuf.CodedOutputStream.computeMessageSizeNoTag(CodedOutputStream.java:749) > at > com.google.protobuf.CodedOutputStream.computeMessageSize(CodedOutputStream.java:530) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$MutateRequest.getSerializedSize(ClientProtos.java:12431) > at > org.apache.hadoop.hbase.ipc.IPCUtil.getTotalSizeWhenWrittenDelimited(IPCUtil.java:311) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.writeRequest(AsyncRpcChannel.java:409) > at > org.apache.hadoop.hbase.ipc.AsyncRpcChannel.callMethod(AsyncRpcChannel.java:333) > at > 
org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:245) > at > org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:226) > ... 7 more > {code} > The reason is that {{LiteralByteString.size()}} throws an NPE if the wrapped byte > array is null. It is possible to invoke {{put}} and {{checkAndMutate}} on the > same column; since a null qualifier is allowed for {{Put}}, users may be > confused if a null qualifier is not allowed for {{checkAndMutate}}. We could also > convert a null qualifier to an empty byte array for {{checkAndMutate}} on the client > side. Discussion and suggestions are welcome. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
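The client-side workaround suggested above, treating a null qualifier as an empty byte array before the value ever reaches the protobuf layer, can be sketched as follows; the helper name is illustrative, not an actual HBase method.

```java
import java.util.Arrays;

// Illustrative helper: normalize a null qualifier to an empty byte array so
// the protobuf byte-string wrapper never sees a null and cannot NPE.
public class QualifierSketch {
    static final byte[] EMPTY = new byte[0];

    static byte[] normalizeQualifier(byte[] qualifier) {
        return qualifier == null ? EMPTY : qualifier;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalizeQualifier(null)));         // []
        System.out.println(Arrays.toString(normalizeQualifier(new byte[]{1}))); // [1]
    }
}
```

This keeps {{Put}} (where null qualifiers already work) and {{checkAndMutate}} consistent from the caller's point of view, at the cost of conflating "null" and "empty" qualifiers on the wire.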
[jira] [Created] (HBASE-18111) Replication stuck when cluster connection is closed
Guanghao Zhang created HBASE-18111: -- Summary: Replication stuck when cluster connection is closed Key: HBASE-18111 URL: https://issues.apache.org/jira/browse/HBASE-18111 Project: HBase Issue Type: Bug Affects Versions: 1.1.10, 0.98.24, 1.2.5, 1.3.1, 2.0.0, 1.4.0 Reporter: Guanghao Zhang Assignee: Guanghao Zhang Log: {code} 2017-05-24,03:01:25,603 ERROR [regionserver13700-SendThread(hostxxx:11000)] org.apache.zookeeper.ClientCnxn: SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Connection reset)]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state. 2017-05-24,03:01:25,615 FATAL [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: hconnection-0x1148dd9b-0x35b6b4d4ca999c6, quorum=10.108.37.30:11000,10.108.38.30:11000,10.108.39.30:11000,10.108.84.25:11000,10.108.84.32:11000, baseZNode=/hbase/c3prc-xiaomi98 hconnection-0x1148dd9b-0x35b6b4d4ca999c6 received auth failed from ZooKeeper, aborting org.apache.zookeeper.KeeperException$AuthFailedException: KeeperErrorCode = AuthFailed at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:425) at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:333) at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) 2017-05-24,03:01:25,615 INFO [regionserver13700-EventThread] org.apache.hadoop.hbase.client.HConnectionImplementation: Closing zookeeper sessionid=0x35b6b4d4ca999c6 2017-05-24,03:01:25,623 WARN [regionserver13700.replicationSource,800] org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint: Replicate edites to 
peer cluster failed. java.io.IOException: Call to hostxxx/10.136.22.6:24600 failed on local exception: java.io.IOException: Connection closed {code} jstack {code} java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.sleepForRetries(HBaseInterClusterReplicationEndpoint.java:127) at org.apache.hadoop.hbase.replication.regionserver.HBaseInterClusterReplicationEndpoint.replicate(HBaseInterClusterReplicationEndpoint.java:199) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:905) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:492) {code} The cluster connection was aborted when the ZooKeeperWatcher received an AuthFailed event. After that, the HBaseInterClusterReplicationEndpoint's replicate() method gets stuck in a while loop. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
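A minimal sketch of the fix direction, illustrative rather than the actual patch: the retry loop inside replicate() should check a running/stopped flag on every iteration, so an aborted cluster connection stops the source instead of letting it sleep and retry forever.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch of a replicate()-style retry loop that checks a "running" flag,
// so it cannot spin forever after the underlying connection is aborted.
// Names are illustrative, not the actual HBase code.
public class ReplicateLoopSketch {
    final AtomicBoolean running = new AtomicBoolean(true);

    // Simulates shipping one batch of edits to the peer; always fails here,
    // standing in for a peer that is never reachable.
    boolean shipBatch() {
        return false;
    }

    // Returns true if the batch was shipped; returns false (instead of
    // looping forever) once the endpoint has been stopped.
    boolean replicate() {
        while (running.get()) {
            if (shipBatch()) {
                return true;
            }
            // Real code would sleepForRetries() here before trying again.
            running.set(false); // simulate the abort arriving during a retry
        }
        return false;
    }

    public static void main(String[] args) {
        ReplicateLoopSketch endpoint = new ReplicateLoopSketch();
        System.out.println(endpoint.replicate()); // false: gave up cleanly
    }
}
```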
[jira] [Created] (HBASE-18130) Refactor ReplicationSource
Guanghao Zhang created HBASE-18130: -- Summary: Refactor ReplicationSource Key: HBASE-18130 URL: https://issues.apache.org/jira/browse/HBASE-18130 Project: HBase Issue Type: Improvement Reporter: Guanghao Zhang Assignee: Guanghao Zhang One basic idea is to move the code for recovered queues into a new subclass, RecoveredReplicationSource. Then ReplicationSource will no longer need to call isQueueRecovered in many places, which will make the code clearer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
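The refactoring idea can be sketched with a small class hierarchy. The class names follow the issue, but the method bodies are purely illustrative: recovered-queue behavior moves into subclass overrides instead of scattered isQueueRecovered branches at the call sites.

```java
// Sketch of the proposed refactoring: instead of ReplicationSource checking
// isQueueRecovered at many call sites, recovered-queue behavior lives in
// overrides on a RecoveredReplicationSource subclass.
public class ReplicationSourceSketch {
    // Base class: behavior for a normal (live) replication queue.
    static class ReplicationSource {
        String currentPath() {
            return "current-wal"; // a live source tails the current WAL
        }
        boolean isRecovered() {
            return false;
        }
    }

    // Subclass: recovered queues replay a fixed, already-closed set of WALs.
    static class RecoveredReplicationSource extends ReplicationSource {
        @Override
        String currentPath() {
            return "recovered-wal"; // a recovered source reads archived WALs
        }
        @Override
        boolean isRecovered() {
            return true;
        }
    }

    public static void main(String[] args) {
        ReplicationSource live = new ReplicationSource();
        ReplicationSource recovered = new RecoveredReplicationSource();
        // Call sites no longer branch on isQueueRecovered:
        System.out.println(live.currentPath() + " " + recovered.currentPath());
        // -> current-wal recovered-wal
    }
}
```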
[jira] [Created] (HBASE-18114) Update the config of TestAsync*AdminApi to make test stable
Guanghao Zhang created HBASE-18114: -- Summary: Update the config of TestAsync*AdminApi to make test stable Key: HBASE-18114 URL: https://issues.apache.org/jira/browse/HBASE-18114 Project: HBase Issue Type: Sub-task Reporter: Guanghao Zhang Assignee: Guanghao Zhang {code} 2017-05-25 17:56:34,967 INFO [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=50801] master.HMaster$11(2297): Client=hao//127.0.0.1 disable testModifyColumnFamily 2017-05-25 17:56:37,974 INFO [RpcClient-timer-pool1-t1] client.AsyncHBaseAdmin$TableProcedureBiConsumer(2219): Operation: DISABLE, Table Name: default:testModifyColumnFamily failed with Failed after attempts=3, exceptions: Thu May 25 17:56:35 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=294, waitTime=1008, rpcTimeout=1000 Thu May 25 17:56:37 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=295, waitTime=1299, rpcTimeout=1000 Thu May 25 17:56:37 CST 2017, , java.io.IOException: Call to localhost/127.0.0.1:50801 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call id=296, waitTime=668, rpcTimeout=660 2017-05-25 17:56:38,936 DEBUG [RpcServer.default.FPBQ.Fifo.handler=3,queue=0,port=50801] procedure2.ProcedureExecutor(788): Stored procId=15, owner=hao, state=RUNNABLE:DISABLE_TABLE_PREPARE, DisableTableProcedure table=testModifyColumnFamily {code} For this disable table procedure, the master returns the procedure id when it submits the procedure to the ProcedureExecutor, and the procedure above took 4 seconds to submit. So the disable table call failed because the rpc timeout is 1 second and the retry number is 3. For admin operations, I think we don't need to change the default timeout config in unit tests, and the retries are not needed either (or we can set retries > 1 to test the nonce mechanism). 
Meanwhile, the default timeout is 60 seconds, so the test category may need to change to LargeTests. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
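The suggested test setup can be sketched as below. A plain map stands in for the HBase Configuration object; the key names follow the usual HBase client configs, but the values are illustrative of the issue's suggestion (keep default timeouts, drop retries) rather than taken from the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the suggested test configuration for TestAsync*AdminApi: keep
// the default rpc timeout instead of shrinking it, and disable retries.
public class TestConfSketch {
    static Map<String, Integer> adminTestConf() {
        Map<String, Integer> conf = new HashMap<>();
        conf.put("hbase.rpc.timeout", 60_000);      // keep the 60s default
        conf.put("hbase.client.retries.number", 1); // no retries for admin ops;
        // set this > 1 instead if the test wants to exercise nonce handling.
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(adminTestConf());
    }
}
```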