Re: [ANNOUNCE] Please welcome Lijin Bin to the HBase PMC

2020-05-25 Thread Allan Yang
Congratulations and welcome, Lijin Bin!

Best Regards
Allan Yang


Reid Chan wrote on Tue, May 26, 2020 at 10:14 AM:

>
> Welcome Lijin!
>
>
>
> --
>
> Best regards,
> R.C
>
>
>
> 
> From: Guanghao Zhang 
> Sent: 25 May 2020 22:22
> To: HBase Dev List; Hbase-User
> Subject: [ANNOUNCE] Please welcome Lijin Bin to the HBase PMC
>
> On behalf of the Apache HBase PMC I am pleased to announce that Lijin Bin
> has accepted our invitation to become a PMC member on the Apache HBase
> project. We appreciate Lijin Bin stepping up to take more responsibility in
> the HBase project.
>
> Please join me in welcoming Lijin Bin to the HBase PMC!
>


Re: [ANNOUNCE] Please welcome Wellington Chevreuil to the Apache HBase PMC

2019-11-07 Thread Allan Yang
Congratulations, Wellington!
Best Regards
Allan Yang


Yu Li wrote on Thu, Nov 7, 2019 at 11:51 PM:

> Congratulations and welcome, Wellington!
>
> Best Regards,
> Yu
>
>
> On Sun, 3 Nov 2019 at 12:02, Toshihiro Suzuki  wrote:
>
> > Congratulations! Wellington
> >
> > On Sat, Nov 2, 2019 at 9:14 PM Pankaj kr  wrote:
> >
> > > Congratulations Wellington..!!
> > >
> > > Regards,
> > > Pankaj
> > >
> > >
> > > -Original Message-
> > > From: Sean Busbey [mailto:bus...@apache.org]
> > > Sent: 24 October 2019 01:47
> > > To: dev ; Hbase-User 
> > > Subject: [ANNOUNCE] Please welcome Wellington Chevreuil to the Apache
> > > HBase PMC
> > >
> > > On behalf of the Apache HBase PMC I am pleased to announce that
> > Wellington
> > > Chevreuil has accepted our invitation to become a PMC member on the
> HBase
> > > project. We appreciate Wellington stepping up to take more
> responsibility
> > > in the HBase project.
> > >
> > > Please join me in welcoming Wellington to the HBase PMC!
> > >
> > >
> > >
> > > As a reminder, if anyone would like to nominate another person as a
> > > committer or PMC member, even if you are not currently a committer or
> PMC
> > > member, you can always drop a note to priv...@hbase.apache.org to let
> us
> > > know.
> > >
> >
>


Re: [ANNOUNCE] Please welcome Zheng Hu to the HBase PMC

2019-08-05 Thread Allan Yang
Congratulations, Hu!
Best Regards
Allan Yang


Peter Somogyi wrote on Mon, Aug 5, 2019 at 4:47 PM:

> Congratulations!
>
> On Mon, Aug 5, 2019 at 8:57 AM Pankaj kr  wrote:
>
> > Congratulations Zheng..!!
> >
> > Regards,
> > Pankaj
> >
> > -Original Message-
> > From: Duo Zhang [mailto:zhang...@apache.org]
> > Sent: 05 August 2019 07:38
> > To: HBase Dev List ; hbase-user <
> > u...@hbase.apache.org>
> > Subject: [ANNOUNCE] Please welcome Zheng Hu to the HBase PMC
> >
> > On behalf of the Apache HBase PMC I am pleased to announce that Zheng Hu
> > has accepted our invitation to become a PMC member on the Apache HBase
> > project. We appreciate Zheng Hu stepping up to take more responsibility
> in
> > the HBase project.
> >
> > Please join me in welcoming Zheng Hu to the HBase PMC!
> >
>


Re: [ANNOUNCE] new HBase committer Sakthi

2019-08-01 Thread Allan Yang
Congratulations Sakthi !

Best Regards
Allan Yang


Nihal Jain wrote on Thu, Aug 1, 2019 at 8:40 PM:

> Congrats Sakthi. More power to you!
>
> On Thu, 1 Aug, 2019, 4:42 PM ramkrishna vasudevan, <
> ramkrishna.s.vasude...@gmail.com> wrote:
>
> > Congratulations Sakthi !!!
> >
> > On Thu, Aug 1, 2019 at 3:34 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > Congratulations!
> > >
> > > Pankaj kr wrote on Thu, Aug 1, 2019 at 5:56 PM:
> > >
> > > > Congratulation Sakthi..!!
> > > >
> > > > Regards,
> > > > Pankaj
> > > >
> > > > -Original Message-
> > > > From: Sean Busbey [mailto:bus...@apache.org]
> > > > Sent: 01 August 2019 05:35
> > > > To: u...@hbase.apache.org; dev 
> > > > Subject: [ANNOUNCE] new HBase committer Sakthi
> > > >
> > > > On behalf of the HBase PMC, I'm pleased to announce that Sakthi has
> > > > accepted our invitation to become an HBase committer.
> > > >
> > > > We'd like to thank Sakthi for all of his diligent contributions to
> the
> > > > project thus far. We look forward to his continued participation in
> our
> > > > community.
> > > >
> > > > Congrats and welcome Sakthi!
> > > >
> > >
> >
>


Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-20 Thread Allan Yang
Congratulations Duo!
Best Regards
Allan Yang


宾莉金 (binlijin) wrote on Sun, Jul 21, 2019 at 10:05 AM:

> Congratulations Duo!  and thanks Misty.
>
> Anoop Sam John wrote on Sun, Jul 21, 2019 at 9:26 AM:
>
> > Congrats Duo.
> >
> > Thanks Misty for your great work as the PMC chair.
> >
> > Anoop
> >
> > On Sat, Jul 20, 2019 at 12:07 AM Xu Cang  wrote:
> >
> > > Thank you Misty!
> > > Congratulations Duo, thanks for taking extra work!
> > >
> > > On Fri, Jul 19, 2019 at 11:23 AM Zach York <
> zyork.contribut...@gmail.com
> > >
> > > wrote:
> > >
> > > > Congratulations Duo! Thanks for offering to take on the additional
> > work!
> > > >
> > > > On Fri, Jul 19, 2019 at 10:34 AM Stack  wrote:
> > > >
> > > > > Thank you Misty for your years of service (FYI, for non-PMCers, the
> > > > reports
> > > > > Misty wrote to the Apache Board on our behalf were repeatedly
> called
> > > out
> > > > > for their quality and thoughtfulness).
> > > > >
> > > > > Duo Zhang, thank you for taking on the mantle.
> > > > >
> > > > > S
> > > > >
> > > > > On Thu, Jul 18, 2019 at 10:46 AM Misty Linville 
> > > > wrote:
> > > > >
> > > > > > Each Apache project has a project management committee (PMC) that
> > > > > oversees
> > > > > > governance of the project, votes on new committers and PMC
> members,
> > > and
> > > > > > ensures that the software we produce adheres to the standards of
> > the
> > > > > > Foundation. One of the roles on the PMC is the PMC chair. The PMC
> > > chair
> > > > > > represents the project as a Vice President of the Foundation and
> > > > > > communicates to the board about the project's health, once per
> > > quarter
> > > > > and
> > > > > > at other times as needed.
> > > > > >
> > > > > > It's been my honor to serve as your PMC chair since 2017, when I
> > took
> > > > > over
> > > > > > from Andrew Purtell. I've decided to step back from my volunteer
> > ASF
> > > > > > activities to leave room in my life for other things. The HBase
> PMC
> > > > > > nominated Duo for this role, and Duo has kindly agreed! The board
> > > > passed
> > > > > > this resolution in its meeting yesterday[1] and it is already
> > > > > official[2].
> > > > > > Congratulations, Duo, and thank you for continuing to honor the
> > > project
> > > > > > with your dedication.
> > > > > >
> > > > > > Misty
> > > > > >
> > > > > > [1] The minutes have not yet posted at the time of this email,
> but
> > > will
> > > > > be
> > > > > > available at
> > http://www.apache.org/foundation/records/minutes/2019/.
> > > > > > [2] https://www.apache.org/foundation/#who-runs-the-asf
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> *Best Regards,*
>  lijin bin
>


Re: DISCUSS: Heads-up 2.1.5 RC in next few days and suggest EOL'ing 2.1 branch

2019-05-16 Thread Allan Yang
Why so soon... branch-2.2 is not stable yet. branch-2.1 is the only stable
branch in 2.x since branch-2.0 is already EOL’d.
Best Regards
Allan Yang


张铎 (Duo Zhang) wrote on Fri, May 17, 2019 at 10:23 AM:

> The next RC for 2.2.0 will be available soon...
>
> Sean Busbey wrote on Fri, May 17, 2019 at 10:13 AM:
>
> > Too early to EOL 2.1.
> >
> > Also we didn't EOL 2.0 at 2.0.5 because folks wanted a post 2.2.0
> release.
> >
> > On Thu, May 16, 2019, 18:26 Zach York 
> > wrote:
> >
> > > I like the proactive approach to EOLing branches, but I don't think we
> > can
> > > quite EOL a branch when there is no newer branch (2.2.0) out. If that's
> > > 2.1.6, that's fine.
> > >
> > > On Thu, May 16, 2019 at 3:18 PM Stack  wrote:
> > >
> > > > Was going to put up an RC for 2.1.5 in next day or so after a review
> of
> > > > unresolved JIRAs that have 2.1.5 as fix version. It is coming up on
> two
> > > > months since 2.1.4 and by the time we're done, there'll be 90+
> changes.
> > > > Branch-2.1 nightlies have started to stabilize again.
> > > >
> > > > Was also thinking of EOL'ing the 2.1 branch. We EOL'd 2.0 branch at
> > > 2.0.5.
> > > > We have too many branches as it is. What do folks think? 2.2.0 isn't
> > out
> > > > yet so maybe wait on 2.1.6? That'd be fine too.
> > > >
> > > > Thanks,
> > > > S
> > > >
> > >
> >
>


Re: Request for JIRA contributor permission

2019-04-08 Thread Allan Yang
Which JIRA issue are you interested in? I can make you the assignee.
Best Regards
Allan Yang


郭康康 wrote on Tue, Apr 9, 2019 at 10:43 AM:

> Hi,
>
> I want to contribute to Apache HBase.
> Would you please give me the contributor permission?
> My JIRA ID is gk_coder.
>
>


Re: [DISCUSS] EOL for branch-1.2

2019-04-07 Thread Allan Yang
+1
Best Regards
Allan Yang


Stack wrote on Mon, Apr 8, 2019 at 1:43 AM:

> +1
> Stack
>
> On Fri, Apr 5, 2019 at 6:35 PM Sean Busbey  wrote:
>
> > Hi folks!
> >
> > Back when our stable pointer first moved off of branch-1.2 releases I
> > said I'd do ~6 months of releases[1]. I'm preparing an RC for the
> > 1.2.12 release[2] and it's been ~6.5 months.
> >
> > I intend 1.2.12 to be the last release I manage off of branch-1.2. If
> > anyone else sees a reason to keep this release line going and is
> > willing to step in as RM please speak up.
> >
> > This is the template for the notice that's been in the ANNOUNCE emails
> > for 1.2.7 - 1.2.11:
> >
> > > All users of previous 1.2.z releases are encouraged to upgrade to
> either
> > > this release or the latest in our stable release line, which is
> currently
> > > X.Y.Z. Releases in the 1.2.z line are expected to stop in late spring
> > 2019.
> >
> >
> > Presuming no one wants to keep things going by the time a 1.2.12 VOTE
> > passes, I'll modify it to say the branch is EOL. About a month later
> > I'll do a dedicated EOL post to user@hbase, clean up project
> > references to the release line, and then remove the release from
> > dist.apache.org (it will remain along with all the others on
> > archive.apache.org).
> >
> > [1]: https://s.apache.org/UEPy
> > [2]: https://issues.apache.org/jira/browse/HBASE-22171
> >
>


Re: precommit is producing some unnecessary pain; let's clean it up

2019-04-03 Thread Allan Yang
Thanks for pointing that out, Duo. Not everyone can make sense of the
checkstyle configs; I have read the config myself but could not understand
it... the sequence came entirely from my own 'experience'. What's more
important is that we should have docs for the error-prone and checkstyle
rules, which would be friendlier to developers.
Best Regards
Allan Yang


张铎 (Duo Zhang) wrote on Thu, Apr 4, 2019 at 12:05 PM:

> Hey Allan, your import order template is still incorrect, we do not treat
> java and javax specially...
>
> The rules in checkstyles are
>
> <module name="ImportOrder">
>   <property name="groups"
>     value="*,org.apache.hbase.thirdparty,org.apache.hadoop.hbase.shaded"/>
>   <property name="option" value="top"/>
>   <property name="ordered" value="true"/>
>   <property name="sortStaticImportsAlphabetically" value="true"/>
> </module>
>
> So the correct way is
> import static
> import all other imports
> import org.apache.hbase.thirdparty.*
> import org.apache.hadoop.hbase.shaded.*
>
> Allan Yang wrote on Thu, Apr 4, 2019 at 11:57 AM:
>
> > Import order is quite confusing and there is no document for it. After
> > many attempts, I finally figured out the right sequence and set it in my
> > IDEA's auto-import; now it doesn't bother me anymore.
> > The right import order is:
> > import static
> > import java.*
> > import javax.*
> > import all other imports
> > import org.apache.hbase.thirdparty.*
> > import org.apache.hadoop.hbase.shaded.*
> >
> > I think we should add some docs about our error-prone rules, like import
> > order and the line-length limit...
> > Best Regards
> > Allan Yang
> >
> >
> > Andrew Purtell wrote on Thu, Apr 4, 2019 at 10:24 AM:
> >
> > > Error prone used to be an optional profile and not run by precommit by
> > > default. That is what I contributed.
> > >
> > > One of the main motivations for integrating error prone in the first
> > place
> > > was the idea we would use it to implement our own static checks for
> HBase
> > > specific coding conventions and invariants. This hasn’t happened yet.
> It
> > > might not. If this doesn’t happen the added value is marginal and we
> > should
> > > not run it during precommit, especially if it does not produce stable
> > > results.
> > >
> > > > On Apr 3, 2019, at 4:25 PM, 张铎(Duo Zhang) 
> > wrote:
> > > >
> > > > Please be patient sir, IIRC the error prone integration was proposed
> by
> > > you
> > > > and done by Mike Drob at the first place. The result is not stable
> for
> > a
> > > > long time, you can see HBASE-21895 and related issues, it does not
> > always
> > > > generate the same warnings for an unchanged file. We tried to upgrade
> > the
> > > > version of error prone, and also sort the javac warnings as the order
> > is
> > > > not stable too, but it seems that the result is still not stable
> > enough.
> > > > The AbstractTestDLS is not touched recently I believe.
> > > >
> > > > So I think we could remove the error prone check from the pre commit,
> > or
> > > > add it to the ignore list first, where it will generate a -0, not a
> -1,
> > > > just like the tests4tests check. Just a one line change in our
> > > personality
> > > > script. And maybe open an issue for the error prone project to see
> > > whether
> > > > the guys there know how to deal with these problems.
> > > >
> > > > And for the import order, I’m using eclipse too, I can share you the
> > > import
> > > > order settings. And yeah it has been customized by me so I’m not sure
> > > > whether the one in the dev-support is still valid. We should keep it
> in
> > > > sync so we do not confuse developers.
> > > >
> > > > Thanks.
> > > >
> > > > Andrew Purtell wrote on Thu, Apr 4, 2019 at 05:15:
> > > >
> > > >> Regarding the error-prone issues I am talking about, here is one
> > > >>
> > > >>
> > > >>
> > >
> >
> https://builds.apache.org/job/PreCommit-HBASE-Build/16578/artifact/patchprocess/diff-compile-javac-root.txt
> > > >>
> > > >> [WARNING]
> > > >>
> > >
> >
> /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/master/AbstractTestDLS.java:[654,54]
> > > >> [UnusedVariable] The parameter 'master' is never read.
> > > >>
> > > >> The submitted patch does not touch this file AbstractTestDLS and
> this
> > > javac
> > > >> warning has t

Re: precommit is producing some unnecessary pain; let's clean it up

2019-04-03 Thread Allan Yang
Import order is quite confusing and there is no document for it. After many
attempts, I finally figured out the right sequence and set it in my IDEA's
auto-import; now it doesn't bother me anymore.
The right import order is:
import static
import java.*
import javax.*
import all other imports
import org.apache.hbase.thirdparty.*
import org.apache.hadoop.hbase.shaded.*
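For illustration, this grouping can be mirrored in a tiny sorter. This is only a sketch under stated assumptions: the real enforcement lives in checkstyle's ImportOrder module, and the class and method names below are invented for this example. Note that within the general group, java.* and javax.* imports simply sort first alphabetically, which matches the order observed here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ImportOrderSketch {
  // Lower number = earlier group in the file, mirroring the checkstyle
  // "groups" setting: static, everything else, thirdparty, shaded.
  static int group(String imp) {
    if (imp.startsWith("import static ")) return 0;
    if (imp.startsWith("import org.apache.hbase.thirdparty.")) return 2;
    if (imp.startsWith("import org.apache.hadoop.hbase.shaded.")) return 3;
    return 1; // the general "*" group, including java.* and javax.*
  }

  static List<String> sortImports(List<String> imports) {
    Comparator<String> byGroup = Comparator.comparingInt(ImportOrderSketch::group);
    return imports.stream()
        .sorted(byGroup.thenComparing(Comparator.naturalOrder()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // java.* sorts before org.* alphabetically inside the general group.
    sortImports(Arrays.asList(
        "import org.apache.hadoop.hbase.shaded.protobuf.ProtobufUtil;",
        "import org.apache.hadoop.hbase.HConstants;",
        "import java.util.List;",
        "import static org.junit.Assert.assertEquals;",
        "import org.apache.hbase.thirdparty.com.google.common.collect.Lists;"))
        .forEach(System.out::println);
  }
}
```

Running this prints the static import first, then the general group (java.* leading alphabetically), then thirdparty, then shaded — the same sequence described above.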

I think we should add some docs about our error-prone rules, like import
order and the line-length limit...
Best Regards
Allan Yang


Andrew Purtell wrote on Thu, Apr 4, 2019 at 10:24 AM:

> Error prone used to be an optional profile and not run by precommit by
> default. That is what I contributed.
>
> One of the main motivations for integrating error prone in the first place
> was the idea we would use it to implement our own static checks for HBase
> specific coding conventions and invariants. This hasn’t happened yet. It
> might not. If this doesn’t happen the added value is marginal and we should
> not run it during precommit, especially if it does not produce stable
> results.
>
> > On Apr 3, 2019, at 4:25 PM, 张铎(Duo Zhang)  wrote:
> >
> > Please be patient sir, IIRC the error prone integration was proposed by
> you
> > and done by Mike Drob at the first place. The result is not stable for a
> > long time, you can see HBASE-21895 and related issues, it does not always
> > generate the same warnings for an unchanged file. We tried to upgrade the
> > version of error prone, and also sort the javac warnings as the order is
> > not stable too, but it seems that the result is still not stable enough.
> > The AbstractTestDLS is not touched recently I believe.
> >
> > So I think we could remove the error prone check from the pre commit, or
> > add it to the ignore list first, where it will generate a -0, not a -1,
> > just like the tests4tests check. Just a one line change in our
> personality
> > script. And maybe open an issue for the error prone project to see
> whether
> > the guys there know how to deal with these problems.
> >
> > And for the import order, I’m using eclipse too, I can share you the
> import
> > order settings. And yeah it has been customized by me so I’m not sure
> > whether the one in the dev-support is still valid. We should keep it in
> > sync so we do not confuse developers.
> >
> > Thanks.
> >
> > Andrew Purtell wrote on Thu, Apr 4, 2019 at 05:15:
> >
> >> Regarding the error-prone issues I am talking about, here is one
> >>
> >>
> >>
> https://builds.apache.org/job/PreCommit-HBASE-Build/16578/artifact/patchprocess/diff-compile-javac-root.txt
> >>
> >> [WARNING]
> >>
> /testptch/hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/master/AbstractTestDLS.java:[654,54]
> >> [UnusedVariable] The parameter 'master' is never read.
> >>
> >> The submitted patch does not touch this file AbstractTestDLS and this
> javac
> >> warning has the format of an error-prone finding.
> >>
> >>
> >>> On Wed, Apr 3, 2019 at 2:11 PM Andrew Purtell 
> wrote:
> >>>
> >>> I use Eclipse. Eclipse orders imports automatically per formatter
> >>> settings. I have installed the HBase formatter from our dev-support
> into
> >>> all of the relevant workspaces. This sometimes fails to do the right
> >> thing
> >>> as far as checkstyle reporting in precommit is concerned. I have tried
> >>> moving the indicated imports around by hand when this happens and it
> >> still
> >>> complains. I should not be required to use another IDE. (I won't.)
> >> Perhaps
> >>> the Eclipse formatter definition we ship in the project needs an
> update.
> >>> (From a contributor POV this shouldn't be necessary.) It's not like I
> am
> >>> trying to get away with being sloppy.
> >>>
> >>> HBASE-15560 is one.
> >>> HBASE-22114 amplifies it by having I think some unfortunate
> interactions
> >>> between how Yetus decides what has changed and how to test it and what
> is
> >>> being attempted.
> >>>
> >>>> On Wed, Apr 3, 2019 at 1:27 PM Josh Elser  wrote:
> >>>>
> >>>> Yeah, can you share some evidence of what you've been running into,
> >>>> Andrew?
> >>>>
> >>>> The nit-picky tools are always a pain in the rear (especially when
> >>>> working across other branches) -- agree with you there. Can we help
> >>>> lessen the pain by making it more clear what to run/inspect when QA
&

[jira] [Resolved] (HBASE-22154) Facing issue with HA of HBase

2019-04-02 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-22154.

Resolution: Not A Problem

> Facing issue with HA of HBase
> -
>
> Key: HBASE-22154
> URL: https://issues.apache.org/jira/browse/HBASE-22154
> Project: HBase
>  Issue Type: Test
>Reporter: James
>Priority: Critical
>  Labels: /hbase-1.2.6.1
>
> Hi Team,
> I have set up an HA Hadoop cluster and the same for HBase.
> When my active NameNode goes down, the standby NameNode becomes the active
> NameNode; however, at the same time my backup HBase master does not become
> the active HMaster (the active HMaster and RegionServer go down).
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-22047) LeaseException in Scan should be retried

2019-03-12 Thread Allan Yang (JIRA)
Allan Yang created HBASE-22047:
--

 Summary: LeaseException in Scan should be retried
 Key: HBASE-22047
 URL: https://issues.apache.org/jira/browse/HBASE-22047
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.1.3, 2.0.4, 2.2.0
Reporter: Allan Yang


We should retry LeaseException just as we retry other exceptions such as 
OutOfOrderScannerNextException and UnknownScannerException.
Code in ClientScanner:
{code:java}
if ((cause != null && cause instanceof NotServingRegionException) ||
    (cause != null && cause instanceof RegionServerStoppedException) ||
    e instanceof OutOfOrderScannerNextException ||
    e instanceof UnknownScannerException ||
    e instanceof ScannerResetException) {
  // Pass. It is easier writing the if loop test as list of what is allowed
  // rather than as a list of what is not allowed... so if in here, it means
  // we do not throw.
  if (retriesLeft <= 0) {
    throw e; // no more retries
  }
{code}
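A minimal, self-contained sketch of the proposed change is below. This is not the actual ClientScanner code: the exception classes are local stand-ins for the HBase ones, and the predicate name is invented. It just shows LeaseException joining the allow-list from the excerpt above.

```java
// Stand-in exception types (the real ones live in org.apache.hadoop.hbase.*).
class NotServingRegionException extends RuntimeException {}
class RegionServerStoppedException extends RuntimeException {}
class OutOfOrderScannerNextException extends RuntimeException {}
class UnknownScannerException extends RuntimeException {}
class ScannerResetException extends RuntimeException {}
class LeaseException extends RuntimeException {}

public class ScanRetrySketch {
  /** True if the scan should be retried rather than failed outright. */
  static boolean isRetryable(Exception e, Throwable cause) {
    return (cause != null && cause instanceof NotServingRegionException)
        || (cause != null && cause instanceof RegionServerStoppedException)
        || e instanceof OutOfOrderScannerNextException
        || e instanceof UnknownScannerException
        || e instanceof ScannerResetException
        || e instanceof LeaseException; // the addition proposed by this issue
  }

  public static void main(String[] args) {
    System.out.println(isRetryable(new LeaseException(), null));        // true
    System.out.println(isRetryable(new IllegalStateException(), null)); // false
  }
}
```

The allow-list style follows the comment in the excerpt: it is easier to enumerate what may be retried than what may not.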





[jira] [Resolved] (HBASE-22043) HMaster Went down

2019-03-12 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-22043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-22043.

Resolution: Not A Problem

> HMaster Went down
> -
>
> Key: HBASE-22043
> URL: https://issues.apache.org/jira/browse/HBASE-22043
> Project: HBase
>  Issue Type: Bug
>  Components: Admin
>Reporter: James
>Priority: Critical
>
> HMaster went down
> /hbase/WALs/regionserver80-XXXsplitting is non empty': Directory is 
> not empty





Re: 2.0.4 to 2.2.0 testing

2019-03-08 Thread Allan Yang
Try deleting the meta znode from ZooKeeper, then restart the master.
Best Regards
Allan Yang


Sean Busbey wrote on Fri, Mar 8, 2019 at 4:37 AM:

> In HBase 2 you should never delete master proc wals. unlike in earlier
> releases, it will almost certainly damage the cluster. Probably now
> you are in a known-bad state independent of whatever your earlier
> issue was. I think though we can fix it.
>
> Some baseline info:
>
> 1) Did you follow the upgrade process to go from 2.0.z to 2.2.0?
>
> I can't link directly to the section due to HBASE-22010, but it's the
> first one here:
>
> http://hbase.apache.org/book.html#_upgrade_paths
>
> 2) I think your meta issue is something we'll need HBCK2 to fix, so
> I'd like to work out what's not working for you there.
>
> We have not done a release of HBCK2 yet, so unfortunately you'll have
> to build it yourself. I think you've already realized that's
> non-trivial. We have, however, successfully gone through using it with
> prior releases.
>
> Can you tell me where in this high level things fell down? Or where I
> should drill in more?
>
> 2a) Get the code from the git repo:
> https://github.com/apache/hbase-operator-tools
> 2b) Build for use with the RC. It is important that you specify your
> hbase version
>
> mvn -Dhbase.version=2.2.0 package
>
> Note that since 2.2.0 hasn't been released yet, you'll need to tell
> maven to point at the staged repository posted in the VOTE. e.g. save
> this gist
>
> https://gist.github.com/busbey/ce2293e78440f060fa60aa2dcf1333f1
>
>  as ~/hbase-2.2.0rc0.settings.xml and then do
>
> mvn --settings ~/hbase-2.2.0rc0.settings.xml -Dhbase.version=2.2.0 package
>
> 2c) grab the jar from
> hbase-hbck2/target/hbase-hbck2-1.0.0-SNAPSHOT.jar and put it where you
> can access it on the cluster. let's call it in
> ~/hbase-hbck2-for-2.2.0.jar
>
> 2d) run hbck2 on the cluster to verify that you get the correct help
>
> hbase hbck -j ~/hbase-hbck2-for-2.2.0.jar
>
> 3) are there outstanding procedures? when master isn't finishing
> initialization, what does it print out about the meta region?
>
>
>
> On Thu, Mar 7, 2019 at 1:57 PM Jean-Marc Spaggiari
>  wrote:
> >
> > Sure! here it is!
> >
> > I cleaned all WALs (old, master, etc.) and it seems to be a bit cleaner
> > now, but it's still stuck trying to assign the META table.
> >
> > 2019-03-07 14:50:35,286 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:50:36,287 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:50:38,287 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:50:42,288 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:50:50,289 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:51:06,290 WARN  [master/node2:6:becomeActiveMaster]
> > master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740
> > state=OPEN, ts=1551988229980, server=node5.distparser.com
> ,16020,1551986838747};
> > ServerCrashProcedures=false. Master startup cannot progress, in
> > holding-pattern until region onlined.
> > 2019-03-07 14:51:29,765 INFO
> > [ReadOnlyZKClient-latitude.distparser.com:2181@0x71707c

[jira] [Created] (HBASE-21962) Filters do not work in ThriftTable

2019-02-26 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21962:
--

 Summary: Filters do not work in ThriftTable
 Key: HBASE-21962
 URL: https://issues.apache.org/jira/browse/HBASE-21962
 Project: HBase
  Issue Type: Sub-task
  Components: Thrift
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.2.0


Filters in ThriftTable are not working; this issue is to fix that.





Re: [ANNOUNCE] New HBase committer Xu Cang

2019-02-08 Thread Allan Yang
Congratulations, Xu!
Best Regards
Allan Yang


Xu Cang wrote on Thu, Feb 7, 2019 at 3:06 AM:

> Thank you all! Looking forward to contributing more and making more
> meaningful impact on our community.
>
> Xu
>
> On Wed, Feb 6, 2019 at 1:13 AM Yuhao Bi  wrote:
>
> > Congratulations!
> >
> > Pankaj kr wrote on Wed, Feb 6, 2019 at 4:40 PM:
> >
> > > Congratulations Xu...!!
> > >
> > > Regards,
> > > Pankaj
> > >
> > >
> > > -Original Message-
> > > From: Andrew Purtell [mailto:apurt...@apache.org]
> > > Sent: 06 February 2019 04:19
> > > To: dev ; Hbase-User 
> > > Subject: [ANNOUNCE] New HBase committer Xu Cang
> > >
> > > On behalf of the Apache HBase PMC, I am pleased to announce that Xu
> Cang
> > > has accepted the PMC's invitation to become a committer on the project.
> > We
> > > appreciate all of Xu's generous contributions thus far and look forward
> > to
> > > his continued involvement.
> > >
> > > Congratulations and welcome, Xu!
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> >
>


[jira] [Created] (HBASE-21809) Add retry thrift client for ThriftTable/Admin

2019-01-29 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21809:
--

 Summary: Add retry thrift client for ThriftTable/Admin
 Key: HBASE-21809
 URL: https://issues.apache.org/jira/browse/HBASE-21809
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.2.0


This is for ThriftTable/Admin to handle exceptions like connection loss.
It is only available for the HTTP thrift client; for clients using TSocket it 
is not so easy to implement a retry client, so that may come later.





[jira] [Created] (HBASE-21754) ReportRegionStateTransitionRequest should be executed in priority executor

2019-01-22 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21754:
--

 Summary: ReportRegionStateTransitionRequest should be executed in 
priority executor
 Key: HBASE-21754
 URL: https://issues.apache.org/jira/browse/HBASE-21754
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.4, 2.1.2
Reporter: Allan Yang
Assignee: Allan Yang


Now, ReportRegionStateTransitionRequest is executed in the default handlers; 
only reports for system-table regions run in the priority handlers. That is 
because the master has only two kinds of handlers, default and priority (the 
replication handler is specific to replication). If transition reports for all 
regions ran in the priority handlers, there would be a deadlock: other regions' 
transition reports could take all the handlers while needing to update meta, 
but the meta region would be unable to report itself online since every handler 
is taken (addressed in the comments of MasterAnnotationReadingPriorityFunction).

But there is another deadlock case: users' DDL requests (or other synchronous 
ops like moving a region) can take over all the default handlers, making region 
transition reports impossible, so those synchronous ops cannot complete either. 
A simple UT provided in the patch shows this case.

To resolve this problem, I added a new metaTransitionExecutor that executes 
meta-region transition reports only, while all other regions' reports run in 
the priority handlers, separating them from users' requests.
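The separation described here can be sketched with two plain executors. This is only an illustrative sketch, not the actual HMaster code; the class and method names below are invented for the example.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class TransitionExecutorSketch {
  // Pool shared by ordinary region transition reports.
  private final ExecutorService priorityExecutor = Executors.newFixedThreadPool(4);
  // Dedicated executor so a meta report can never wait behind other work.
  private final ExecutorService metaTransitionExecutor = Executors.newSingleThreadExecutor();

  Future<?> submitTransitionReport(boolean isMetaRegion, Runnable report) {
    // Route meta reports to their own executor; everything else to the pool.
    return (isMetaRegion ? metaTransitionExecutor : priorityExecutor).submit(report);
  }

  void shutdown() throws InterruptedException {
    priorityExecutor.shutdown();
    metaTransitionExecutor.shutdown();
    priorityExecutor.awaitTermination(5, TimeUnit.SECONDS);
    metaTransitionExecutor.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws Exception {
    TransitionExecutorSketch s = new TransitionExecutorSketch();
    // Saturate the shared pool with slow work...
    for (int i = 0; i < 8; i++) {
      s.submitTransitionReport(false, () -> sleepQuietly(200));
    }
    // ...yet the meta report still completes promptly on its own executor.
    Future<?> meta = s.submitTransitionReport(true,
        () -> System.out.println("meta report handled"));
    meta.get();
    s.shutdown();
  }

  static void sleepQuietly(long ms) {
    try { Thread.sleep(ms); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
  }
}
```

The design point is the same as in the description: a starved shared pool can no longer block the one report the master must see before anything else can make progress.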





[jira] [Resolved] (HBASE-14223) Meta WALs are not cleared if meta region was closed and RS aborts

2019-01-21 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-14223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-14223.

Resolution: Fixed

> Meta WALs are not cleared if meta region was closed and RS aborts
> -
>
> Key: HBASE-14223
> URL: https://issues.apache.org/jira/browse/HBASE-14223
> Project: HBase
>  Issue Type: Bug
>Reporter: Enis Soztutar
>Assignee: Enis Soztutar
>Priority: Major
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: HBASE-14223logs, hbase-14223_v0.patch, 
> hbase-14223_v1-branch-1.patch, hbase-14223_v2-branch-1.patch, 
> hbase-14223_v3-branch-1.patch, hbase-14223_v3-branch-1.patch, 
> hbase-14223_v3-master.patch
>
>
> When an RS opens meta and later closes it, the WAL (FSHLog) is not closed. 
> The last WAL file just sits there in the RS WAL directory. If RS stops 
> gracefully, the WAL file for meta is deleted. Otherwise if RS aborts, WAL for 
> meta is not cleaned. It is also not split (which is correct) since master 
> determines that the RS no longer hosts meta at the time of RS abort. 
> From a cluster after running ITBLL with CM, I see a lot of {{-splitting}} 
> directories left uncleaned: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs
> Found 31 items
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 01:14 
> /apps/hbase/data/WALs/hregion-58203265
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 07:54 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433489308745-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 09:28 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433494382959-splitting
> drwxr-xr-x   - hbase hadoop  0 2015-06-05 10:01 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-1.openstacklocal,16020,1433498252205-splitting
> ...
> {code}
> The directories contain WALs from meta: 
> {code}
> [root@os-enis-dal-test-jun-4-7 cluster-os]# sudo -u hdfs hadoop fs -ls 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting
> Found 2 items
> -rw-r--r--   3 hbase hadoop 201608 2015-06-05 03:15 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
> -rw-r--r--   3 hbase hadoop  44420 2015-06-05 04:36 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> The RS hosted the meta region for some time: 
> {code}
> 2015-06-05 03:14:28,692 INFO  [PostOpenDeployTasks:1588230740] 
> zookeeper.MetaTableLocator: Setting hbase:meta region location in ZooKeeper 
> as os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285
> ...
> 2015-06-05 03:15:17,302 INFO  
> [RS_CLOSE_META-os-enis-dal-test-jun-4-5:16020-0] regionserver.HRegion: Closed 
> hbase:meta,,1.1588230740
> {code}
> In between, a WAL is created: 
> {code}
> 2015-06-05 03:15:11,707 INFO  
> [RS_OPEN_META-os-enis-dal-test-jun-4-5:16020-0-MetaLogRoller] wal.FSHLog: 
> Rolled WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433470511501.meta
>  with entries=385, filesize=196.88 KB; new WAL 
> /apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285..meta.1433474111645.meta
> {code}
> When CM killed the region server later, the master did not see these WAL files: 
> {code}
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:46,075 
> INFO  [MASTER_SERVER_OPERATIONS-os-enis-dal-test-jun-4-3:16000-0] 
> master.SplitLogManager: started splitting 2 logs in 
> [hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting]
>  for [os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285]
> ./hbase-hbase-master-os-enis-dal-test-jun-4-3.log:2015-06-05 03:36:47,300 
> INFO  [main-EventThread] wal.WALSplitter: Archived processed log 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbase/data/WALs/os-enis-dal-test-jun-4-5.openstacklocal,16020,1433466904285-splitting/os-enis-dal-test-jun-4-5.openstacklocal%2C16020%2C1433466904285.default.1433475074436
>  to 
> hdfs://os-enis-dal-test-jun-4-1.openstacklocal:8020/apps/hbas
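
The pattern in the listings above can be checked mechanically. Below is a
minimal, self-contained Java sketch (local filesystem in place of HDFS; class
and method names are hypothetical, not HBase code) that flags {{-splitting}}
directories whose only remaining files are meta WALs, i.e. directories log
splitting archived everything from except the meta WALs it never considered:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StaleMetaWalFinder {
    // A directory matches the symptom from the report if it is a "-splitting"
    // dir whose remaining files all end with ".meta": splitting archived every
    // ordinary WAL but left the meta WALs behind.
    static List<Path> findStaleSplittingDirs(Path walRoot) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(walRoot, "*-splitting")) {
            for (Path dir : dirs) {
                try (Stream<Path> files = Files.list(dir)) {
                    List<Path> entries = files.collect(Collectors.toList());
                    if (!entries.isEmpty() && entries.stream()
                            .allMatch(p -> p.getFileName().toString().endsWith(".meta"))) {
                        stale.add(dir);
                    }
                }
            }
        }
        return stale;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("WALs");
        // Simulate the reported state: a -splitting dir holding only a meta WAL.
        Path splitting = Files.createDirectories(
            root.resolve("rs1,16020,1433466904285-splitting"));
        Files.createFile(
            splitting.resolve("rs1%2C16020%2C1433466904285..meta.1433470511501.meta"));
        // A live RS dir with an ordinary WAL must not be reported.
        Path live = Files.createDirectories(root.resolve("rs2,16020,1433466904299"));
        Files.createFile(live.resolve("rs2%2C16020%2C1433466904299.default.1433475074436"));
        System.out.println(findStaleSplittingDirs(root).size()); // prints 1
    }
}
```

On a real cluster the same scan would run against the HDFS WALs directory
rather than a local temp dir.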

Re: [ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC

2019-01-21 Thread Allan Yang
Congratulations Peter!
Best Regards
Allan Yang


Pankaj kr  于2019年1月22日周二 上午9:49写道:

>
> Congratulations Peter...!!!
>
> Regards,
> Pankaj
>
> --
> Pankaj Kumar
> M: +91-9535197664(India Contact Number)
> E: pankaj...@huawei.com<mailto:pankaj...@huawei.com>
> 2012 Laboratories - Bangalore Research Institute, IT&Cloud BU Branch
> 2012 Laboratories-IT&Cloud BU Branch Dept.HTIPL
> From:Duo Zhang 
> To:HBase Dev List ;hbase-user  >
> Date:2019-01-22 07:06:43
> Subject:[ANNOUNCE] Please welcome Peter Somogyi to the HBase PMC
>
> On behalf of the Apache HBase PMC I am pleased to announce that Peter
> Somogyi
> has accepted our invitation to become a PMC member on the Apache HBase
> project.
> We appreciate Peter stepping up to take more responsibility in the HBase
> project.
>
> Please join me in welcoming Peter to the HBase PMC!
>


[jira] [Created] (HBASE-21751) WAL create fails during region open may cause region assign forever fail

2019-01-21 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21751:
--

 Summary: WAL create fails during region open may cause region 
assign forever fail
 Key: HBASE-21751
 URL: https://issues.apache.org/jira/browse/HBASE-21751
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.4, 2.1.2
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 2.2.0, 2.1.3, 2.0.5


When the first region opens on an RS, WALFactory will create a WAL file, but 
if the WAL creation fails, in some cases HDFS will leave an empty file in the 
dir (e.g. the disk is full, or the file is created successfully but block 
allocation fails). We have a check in AbstractFSWAL that throws an error if a 
WAL belonging to the same factory already exists. Thus, the region can never be 
opened on this RS later.
{code:java}
2019-01-17 02:15:53,320 ERROR [RS_OPEN_META-regionserver/server003:16020-0] 
handler.OpenRegionHandler(301): Failed open of region=hbase:meta,,1.1588230740
java.io.IOException: Target WAL already exists within directory 
hdfs://cluster/hbase/WALs/server003.hbase.hostname.com,16020,1545269815888
at 
org.apache.hadoop.hbase.regionserver.wal.AbstractFSWAL.(AbstractFSWAL.java:382)
at 
org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.(AsyncFSWAL.java:210)
at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:72)
at 
org.apache.hadoop.hbase.wal.AsyncFSWALProvider.createWAL(AsyncFSWALProvider.java:47)
at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:138)
at 
org.apache.hadoop.hbase.wal.AbstractFSWALProvider.getWAL(AbstractFSWALProvider.java:57)
at org.apache.hadoop.hbase.wal.WALFactory.getWAL(WALFactory.java:264)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.getWAL(HRegionServer.java:2085)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:284)
at 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:834)
{code}
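
A minimal sketch of the failure mode described above (hypothetical class and
method names, not the actual AbstractFSWAL code), using the local filesystem in
place of HDFS: once a leftover zero-length file carries the WAL prefix, every
retry of WAL creation fails the same-factory check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class WalCreateSketch {
    // Mirrors the check described in the report: refuse to create a WAL if any
    // file with this factory's prefix already exists in the RS WAL directory.
    // A zero-length file left behind by a failed create trips this forever.
    static Path createWal(Path walDir, String prefix, long ts) throws IOException {
        try (Stream<Path> files = Files.list(walDir)) {
            if (files.anyMatch(p -> p.getFileName().toString().startsWith(prefix))) {
                throw new IOException(
                    "Target WAL already exists within directory " + walDir);
            }
        }
        return Files.createFile(walDir.resolve(prefix + "." + ts));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("wals");
        String prefix = "rs1%2C16020%2C1545269815888..meta";
        // First attempt "fails" after the filesystem already created an empty file.
        Files.createFile(dir.resolve(prefix + ".1547000000000"));
        try {
            createWal(dir, prefix, 1547000001000L); // retry on region reassign
            System.out.println("created");
        } catch (IOException e) {
            System.out.println("blocked"); // prints "blocked": region can never open here
        }
    }
}
```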



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: About how features are integrated to different HBase versions

2019-01-20 Thread Allan Yang
{quote}
For example, you release 2.1.0 yesterday, 1.5.0 today, and then 2.1.1
tomorrow, it is OK that 2.1.0 does not have all feature which 1.5.0 has,
but 2.1.1 should have all features which 1.5.0 has.
{quote}
I don't think this can work. Normal users mostly won't care about the release
time; they only know 2.1.1 > 2.1.0 > 1.5.0. They will assume a higher version
includes all the features of a lower version.

I don't see why we are still backporting new features from
branch-2 to branch-1. Yes, there are many 1.x clusters in production, so we
need to release 1.x versions to fix bugs and keep them stable, as the
stable release line is still 1.x.
At the same time, we should try to move on to 2.x, keeping branch-1 as a
bugfix branch for some time before deprecating it. As far as I can see, branch-1
is still very 'active', too active I think.
If we stop backporting features from branch-2 to branch-1, then there is no
problem, IMHO.
Best Regards
Allan Yang


Andrew Purtell  于2019年1月20日周日 上午5:27写道:

> As branch RM for branch-1 I will always check to make sure a commit there
> has first been committed to branch-2. There will always be an upgrade path
> from a branch-1 based release to a branch-2 based release. The relevant
> JIRAs will either have a 1.x and 2.x fixVersion or the backport JIRA will
> be linked to the one for the branch-2 commit. When making the release notes
> we will be looking at these things (or should, anyway). We can update the
> upgrade paths documentation whenever we find this kind of linkage. Perhaps
> we can describe this for future RMs in the how to release section of the
> doc? Does this satisfy the concerns?
>
> > On Jan 18, 2019, at 11:47 PM, Sean Busbey  wrote:
> >
> > I agree with Andrew that we can't both have maintenance releases and
> expect
> > every feature in ongoing branch-1 releases to be in branches-2.y.
> >
> > Tracking consideration for when features are available across major
> > versions fits in well with the "upgrade paths" section in the ref guide.
> >
> > We've just gotten in the habit of it only getting filled in when a big
> > release is coming up.
> >
> >
> >
> >> On Fri, Jan 18, 2019, 23:46 张铎(Duo Zhang)  >>
> >> Then we must have a upgrade path, for example, 1.5.x can only be
> upgraded
> >> to 2.2.x if you want all the features still there?
> >>
> >> Maybe we should have a release timeline for the first release of all the
> >> minor releases? So when user want to upgrade, they can choose the minor
> >> release which is released later than the current one.
> >>
> >> Andrew Purtell 于2019年1月19日 周六13:15写道:
> >>
> >>> Also I think branch-1 releases will be done on a monthly cadence
> >>> independent of any branch-2 releases. This is because there are
> different
> >>> RMs at work with different needs and schedules.
> >>>
> >>> I can certainly help out some with branch-2 releasing if you need it,
> >>> FWIW.
> >>>
> >>> It may also help if we begin talking about 1.x and 2.x as separate
> >>> "products". This can help avoid confusion about features in 1.5 not in
> >> 2.1
> >>> but in 2.2. For all practical purposes they are separate products. Some
> >> of
> >>> our community develop and run branch-1. Others develop and run
> branch-2.
> >>> There is some overlap but the overlap is not total. The concerns will
> >>> diverge a bit. I think this is healthy. Everyone is attending to what
> >> they
> >>> need. Let's figure out how to make it work.
> >>>
> >>>> On Jan 18, 2019, at 9:04 PM, Andrew Purtell  >
> >>> wrote:
> >>>>
> >>>> Also please be prepared to support forward evolution and maintenance
> of
> >>> branch-1 for, potentially, years. Because it is used in production and
> >> will
> >>> continue to do so for a long time. Features may end up in 1.6.0 that
> only
> >>> appear in 2.3 or 2.4. And in 1.7 that only appear in 2.5 or 2.6. This
> >>> shouldn't be confusing. We just need to document it. JIRA helps some,
> >>> release notes can help a lot more. Maybe in the future a feature to
> >> version
> >>> matrix in the book.
> >>>>
> >>>>> On Jan 18, 2019, at 8:59 PM, Andrew Purtell <
> andrew.purt...@gmail.com
> >>>
> >>> wrote:
> >>>>>
> >>>>> This can't work, because we can put things into a new minor that
> >> cannot
&

Re: Nice article on one of our own....

2019-01-09 Thread Allan Yang
Nice job, Duo!
Best Regards
Allan Yang


Guanghao Zhang  于2019年1月10日周四 上午10:26写道:

> Congratulations!
>
> Reid Chan  于2019年1月10日周四 上午10:24写道:
>
> > Thumbs up & clap!
> >
> >
> >
> >
> > --
> >
> > Best regards,
> > R.C
> >
> >
> >
> >
> >
> > 
> > From: Stack 
> > Sent: 10 January 2019 02:41
> > To: HBase Dev List
> > Subject: Nice article on one of our own
> >
> > See #3 in list of top 5 Apache committers:
> > https://www.cbronline.com/feature/apache-top-5
> > S
> >
>


[jira] [Reopened] (HBASE-21652) Refactor ThriftServer making thrift2 server inherited from thrift1 server

2019-01-09 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-21652:


> Refactor ThriftServer making thrift2 server inherited from thrift1 server
> -
>
> Key: HBASE-21652
> URL: https://issues.apache.org/jira/browse/HBASE-21652
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Allan Yang
>    Assignee: Allan Yang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21652.addendum.patch, HBASE-21652.branch-2.patch, 
> HBASE-21652.patch, HBASE-21652.v2.patch, HBASE-21652.v3.patch, 
> HBASE-21652.v4.patch, HBASE-21652.v5.patch, HBASE-21652.v6.patch, 
> HBASE-21652.v7.patch
>
>
> Apart from the different protocol, the thrift2 server should not differ much 
> from the thrift1 server. So refactor the thrift server, making the thrift2 
> server inherit from the thrift1 server. This gets rid of much duplicated code.





Re: [VOTE] Second release candidate for HBase 2.1.2 is available for download

2019-01-07 Thread Allan Yang
+1 (binding)

-- Build from src - OK

-- SHA and Signatures(both src and bin) - OK

-- start in distributed mode - OK

-- Basic shell commands[create/drop/alter/get/put/scan] - OK

-- Check Web UI - OK

-- ITBLL 1B row for 1 run - OK
Best Regards
Allan Yang


Peter Somogyi  于2019年1月8日周二 上午12:00写道:

> +1 (non-binding)
>
> Checksum, signature: OK
> Build from source: OK
> Unit tests: OK
> LTT 1M rows: OK
> Basic shell commands: OK
> Apache Rat: OK
>
> Thanks,
> Peter
>
> On Mon, Jan 7, 2019 at 9:20 AM Allan Yang  wrote:
>
> > OK, there is a 2.1.2 pending, will test the RC and come back soon.
> > Best Regards
> > Allan Yang
> >
> >
> > 张铎(Duo Zhang)  于2019年1月7日周一 下午12:12写道:
> >
> > > +1(binding)
> > >
> > > Built from src: OK
> > > Checked sums & sigs: All matched
> > > Run all UTs(jdk8u151): Fine. The same with testing 2.0.4RC1. With
> > > -PrunAllTests several UTs failed, and then passed when running
> > sequentially
> > > Started a 5 nodes cluster: OK, the master UI is fine
> > > Run basic shell cmds: OK
> > > Run LTT with 1M rows: Read & Write, both OK, and then major compact the
> > > table, and execute the count command in shell, also 1M rows returned.
> > >
> > > Stack  于2019年1月3日周四 上午9:23写道:
> > >
> > > > The second release candidate for HBase 2.1.2 is available for
> download:
> > > >
> > > > * https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/
> > > > <https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/>*
> > > >
> > > > Maven artifacts are also available in a staging repository at:
> > > >
> > > >
> > https://repository.apache.org/content/repositories/orgapachehbase-1247
> > > >
> > > > Artifacts are signed with my key (DB9D313DA7874F29) published in our
> > > > KEYS file at http://www.apache.org/dist/hbase/KEYS
> > > >
> > > > The RC corresponds to the signed tag 2.1.2RC1, which currently points
> > > > to commit
> > > >
> > > >   1dfc418f77801fbfb59a125756891b9100c1fc6d
> > > >
> > > > HBase 2.1.2 is the third maintenance release in the HBase 2.0 line,
> > > > continuing on the theme of bringing a stable, reliable database to
> > > > the Hadoop and NoSQL communities. It fixes a critical issue found
> > > > in the recent 2.0.3 and 2.1.1 releases (only), HBASE-21551. 2.1.2
> > > > includes ~70 bug and improvement fixes done since the 2.1.1,
> > > > ~ eight weeks ago.
> > > >
> > > > The detailed source and binary compatibility report vs 2.1.1 has been
> > > > published for your review, at:
> > > >
> > > >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1
> > > > /compatibility_report_2.1.1vs2.1.2.html
> > > >
> > > > The report shows no incompatibilities.
> > > >
> > > > The full list of fixes included in this release is available in
> > > > the CHANGES.md that ships as part of the release also available
> > > > here:
> > > >
> > > >
> > https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/CHANGES.md
> > > >
> > > > The RELEASENOTES.md are here:
> > > >
> > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/RELEASENOTES.md
> > > >
> > > > Please try out this candidate and vote +1/-1 on whether we should
> > > > release these artifacts as HBase 2.1.2.
> > > >
> > > > The VOTE will remain open for at least 72 hours. Given sufficient
> votes
> > > > I would like to close it on Sunday, January 6th, 2019.
> > > >
> > > > Thanks,
> > > > S
> > > > P.S. Happy New Year!
> > > >
> > >
> >
>


Re: [VOTE] Second release candidate for HBase 2.1.2 is available for download

2019-01-07 Thread Allan Yang
OK, there is a 2.1.2 pending, will test the RC and come back soon.
Best Regards
Allan Yang


张铎(Duo Zhang)  于2019年1月7日周一 下午12:12写道:

> +1(binding)
>
> Built from src: OK
> Checked sums & sigs: All matched
> Run all UTs(jdk8u151): Fine. The same with testing 2.0.4RC1. With
> -PrunAllTests several UTs failed, and then passed when running sequentially
> Started a 5 nodes cluster: OK, the master UI is fine
> Run basic shell cmds: OK
> Run LTT with 1M rows: Read & Write, both OK, and then major compact the
> table, and execute the count command in shell, also 1M rows returned.
>
> Stack  于2019年1月3日周四 上午9:23写道:
>
> > The second release candidate for HBase 2.1.2 is available for download:
> >
> > * https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/
> > <https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/>*
> >
> > Maven artifacts are also available in a staging repository at:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1247
> >
> > Artifacts are signed with my key (DB9D313DA7874F29) published in our
> > KEYS file at http://www.apache.org/dist/hbase/KEYS
> >
> > The RC corresponds to the signed tag 2.1.2RC1, which currently points
> > to commit
> >
> >   1dfc418f77801fbfb59a125756891b9100c1fc6d
> >
> > HBase 2.1.2 is the third maintenance release in the HBase 2.0 line,
> > continuing on the theme of bringing a stable, reliable database to
> > the Hadoop and NoSQL communities. It fixes a critical issue found
> > in the recent 2.0.3 and 2.1.1 releases (only), HBASE-21551. 2.1.2
> > includes ~70 bug and improvement fixes done since the 2.1.1,
> > ~ eight weeks ago.
> >
> > The detailed source and binary compatibility report vs 2.1.1 has been
> > published for your review, at:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1
> > /compatibility_report_2.1.1vs2.1.2.html
> >
> > The report shows no incompatibilities.
> >
> > The full list of fixes included in this release is available in
> > the CHANGES.md that ships as part of the release also available
> > here:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/CHANGES.md
> >
> > The RELEASENOTES.md are here:
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.2RC1/RELEASENOTES.md
> >
> > Please try out this candidate and vote +1/-1 on whether we should
> > release these artifacts as HBase 2.1.2.
> >
> > The VOTE will remain open for at least 72 hours. Given sufficient votes
> > I would like to close it on Sunday, January 6th, 2019.
> >
> > Thanks,
> > S
> > P.S. Happy New Year!
> >
>


Re: [VOTE] The second release candidate for HBase 2.0.4 is available

2019-01-02 Thread Allan Yang
+1 (binding)

-- Build from src - OK

-- SHA and Signatures(both src and bin) - OK

-- start in distributed mode - OK

-- Basic shell commands[create/drop/alter/get/put/scan] - OK

-- Check Web UI - OK

-- ITBLL 1B row for 1 run - OK


Best Regards
Allan Yang


Stack  于2019年1月3日周四 上午1:19写道:

> On Sun, Dec 30, 2018 at 5:46 PM 张铎(Duo Zhang) 
> wrote:
>
> > +1(binding)
> >
> > Built from src: OK
> > Checked sums & sigs: All matched
> > Run all UTs(jdk8u151): Basically fine. TestExportSnapshot related UTs
> > always timeout under the runAllTests profile, but can pass if I remove
> the
> > profile and run them sequentially in the hbase-mapreduce module, maybe we
> > need to split them?
> >
>
> Filed HBASE-21666 to break up the tests. Thanks for trying 2.0.4RC1 Duo.
> S
>
>
>
>
> > Started a 5 nodes cluster: OK, the master UI is fine
> > Run basic shell cmds: OK, create, disable, drop, and also snapshot and
> > clone_snapshot
> > Run LTT with 1M rows: Read & Write, both OK, and then execute the count
> > command in shell, also 1M rows returned.
> >
> > Stack  于2018年12月30日周日 上午12:45写道:
> >
> > > The second release candidate for HBase 2.0.4 is available for download:
> > >
> > >  *https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC1/
> > > <https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC1/>*
> > >
> > > Maven artifacts are also available in a staging repository at:
> > >
> > >
> https://repository.apache.org/content/repositories/orgapachehbase-1246
> > >
> > > Artifacts are signed with my key (DB9D313DA7874F29) published in our
> > > KEYS file at http://www.apache.org/dist/hbase/KEYS
> > >
> > > The RC corresponds to the signed tag 2.0.4RC1, which currently points
> > > to commit
> > >
> > >   205e39c5704bf38568b34926dde9f1ee76e6b5d0
> > >
> > > HBase 2.0.4 is the fourth maintenance release in the HBase 2.0 line,
> > > continuing on the theme of bringing a stable, reliable database to
> > > the Hadoop and NoSQL communities. It fixes a critical issue found
> > > in the recent 2.0.3 and 2.1.1 releases (only), HBASE-21551. 2.0.4
> > > includes ~31 bug and improvement fixes done since the 2.0.3,
> > > ~ a month ago.
> > >
> > > The detailed source and binary compatibility report vs 2.0.3 has been
> > > published for your review, at:
> > >
> > >
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC1/compatibility_report_2.0.3vs2.0.4.html
> > >
> > > The report shows no incompatibilities.
> > >
> > > The full list of fixes included in this release is available in
> > > the CHANGES.md that ships as part of the release also available
> > > here:
> > >
> > >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC1/CHANGES.md
> > >
> > > The RELEASENOTES.md are here:
> > >
> > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC1/RELEASENOTES.md
> > >
> > > Please try out this candidate and vote +1/-1 on whether we should
> > > release these artifacts as HBase 2.0.4.
> > >
> > > The VOTE will remain open for at least 72 hours. It'll probably take
> > longer
> > > given
> > > its the holidays. I'll close the vote when sufficient votes after
> > Tuesday,
> > > January
> > > 1st, 2019.
> > >
> > > Thanks,
> > > S
> > >
> >
>


[jira] [Created] (HBASE-21661) Provide Thrift2 implementation of Table/Admin

2018-12-29 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21661:
--

 Summary: Provide Thrift2 implementation of Table/Admin
 Key: HBASE-21661
 URL: https://issues.apache.org/jira/browse/HBASE-21661
 Project: HBase
  Issue Type: Sub-task
 Environment: Provide a Thrift2 implementation of Table/Admin, making 
it easier for Java users to use the thrift client (some environments which 
cannot expose ZK or RS servers directly require the thrift or REST protocol, 
even when using Java). 
Another example of this is RemoteHTable and RemoteAdmin, which are REST 
connectors.
Reporter: Allan Yang
Assignee: Allan Yang








[jira] [Created] (HBASE-21652) Refactor ThriftServer making thrift2 server to support both thrift1 and thrift2 protocol

2018-12-27 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21652:
--

 Summary: Refactor ThriftServer making thrift2 server to support 
both thrift1 and thrift2 protocol
 Key: HBASE-21652
 URL: https://issues.apache.org/jira/browse/HBASE-21652
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


Apart from the different protocol, the thrift2 server should not differ much 
from the thrift1 server. So, refactor the thrift server, making the thrift2 
server inherit from the thrift1 server. This gets rid of much duplicated code 
and lets the thrift2 server serve the thrift1 protocol at the same time.
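
The shape of the refactor can be sketched with a toy hierarchy (hypothetical
class names; the real servers wire up Thrift processors, options, and
transports): the thrift2 server reuses the thrift1 server's lifecycle and only
swaps the protocol-specific part.

```java
// Toy model of the inheritance refactor: shared startup lives in the base
// (thrift1) server; the thrift2 server overrides only the processor choice.
class Thrift1Server {
    String describeProcessor() { return "Hbase.Processor (thrift1)"; }
    final String start() { return "serving " + describeProcessor(); }
}

class Thrift2Server extends Thrift1Server {
    @Override
    String describeProcessor() { return "THBaseService.Processor (thrift2)"; }
}

public class ThriftRefactorSketch {
    public static void main(String[] args) {
        System.out.println(new Thrift1Server().start());
        // Inherited start(), different processor: the duplicated server
        // plumbing collapses into the base class.
        System.out.println(new Thrift2Server().start());
    }
}
```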





[jira] [Created] (HBASE-21650) Add DDL operation and some other miscellaneous to thrift2

2018-12-26 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21650:
--

 Summary: Add DDL operation and some other miscellaneous to thrift2
 Key: HBASE-21650
 URL: https://issues.apache.org/jira/browse/HBASE-21650
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.2.0








[jira] [Created] (HBASE-21649) Complete Thrift2 to supersede Thrift1

2018-12-26 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21649:
--

 Summary: Complete Thrift2 to supersede Thrift1
 Key: HBASE-21649
 URL: https://issues.apache.org/jira/browse/HBASE-21649
 Project: HBase
  Issue Type: Umbrella
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.2.0


Thrift1 and Thrift2 have coexisted in our project for a very long time. 
Functionality is more complete in thrift1, but its interface design is bad for 
adding new features (so we have get(), getVer(), getVerTs(), 
getRowWithColumns() and many other methods for a single get request). Thrift2 
has a cleaner interface and structure definition, making it easier for our 
users. But it has not been updated for a long time, and the lack of DDL 
methods is a major weakness. 

I think we should complete Thrift2 to supersede Thrift1, making Thrift2 the 
standard multi-language definition. This is an umbrella issue to make it happen. 
The plan would be:
1. Complete the DDL interface of thrift2
2. Make the thrift2 server able to handle thrift1 requests, so users don't 
have to choose which thrift server to start
3. Deprecate thrift1
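
The interface-design complaint can be illustrated with a toy sketch
(hypothetical types, not the real generated Thrift code): thrift1 needs a new
method for every combination of get options, while thrift2 folds the options
into a single request object, so new options don't require new methods.

```java
import java.util.Optional;

public class GetInterfaceSketch {
    // Thrift1 style: one method per combination of options.
    interface Thrift1Style {
        String get(String row);
        String getVer(String row, int maxVersions);
        String getVerTs(String row, int maxVersions, long timestamp);
        // ...plus getRowWithColumns, getRowWithColumnsTs, and so on.
    }

    // Thrift2 style: options live on the request object.
    static final class TGet {
        final String row;
        Optional<Integer> maxVersions = Optional.empty();
        Optional<Long> timestamp = Optional.empty();
        TGet(String row) { this.row = row; }
    }

    interface Thrift2Style {
        String get(TGet get);
    }

    public static void main(String[] args) {
        // A single entry point handles every variant of the request.
        Thrift2Style t2 = g -> g.row + " versions=" + g.maxVersions.orElse(1);
        TGet get = new TGet("row1");
        get.maxVersions = Optional.of(3);
        System.out.println(t2.get(get)); // prints "row1 versions=3"
    }
}
```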





Re: [VOTE] First release candidate for HBase 2.0.4 is available hbase

2018-12-12 Thread Allan Yang
This UT was written by me. @Andrew, can you upload the output? I can't
reproduce it from my laptop. I'm also using Mac OS High
Sierra, with jdk1.8.0_19.
Sorry for the inconvenience.
Best Regards
Allan Yang


Andrew Purtell  于2018年12月12日周三 上午2:56写道:

> TestCleanupMetaWAL#testCleanupMetaWAL fails for me consistently, both for
> 2.0.4 and 2.1.2 RCs. Mac OS High Sierra, OpenJDK Runtime Environment (Zulu
> 8.30.0.2-macosx) (build 1.8.0_172-b01)
> java.lang.AssertionError: Waiting timed out after [10,000] msec
> at
>
> org.apache.hadoop.hbase.regionserver.TestCleanupMetaWAL.testCleanupMetaWAL(TestCleanupMetaWAL.java:70)
>
> Output of failed test run is available upon request if you can't reproduce.
>
>
> On Fri, Dec 7, 2018 at 11:21 AM Stack  wrote:
>
> > The first release candidate for HBase 2.0.4 is available for download:
> >
> >  *https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC0/
> > <https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC0/>*
> >
> > Maven artifacts are also available in a staging repository at:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1241
> >
> > Artifacts are signed with my key (DB9D313DA7874F29) published in our
> > KEYS file at http://www.apache.org/dist/hbase/KEYS
> >
> > The RC corresponds to the signed tag 2.0.4RC0, which currently points
> > to commit
> >
> >   9097821560f630dff0fb9df4b0c589ad2acb8016
> >
> > HBase 2.0.4 is the fourth maintenance release in the HBase 2.0 line,
> > continuing on the theme of bringing a stable, reliable database to
> > the Hadoop and NoSQL communities. It fixes a critical issue found
> > in the recent 2.0.3 and 2.1.1 releases (only), HBASE-21551. 2.0.4
> > includes ~12 bug and improvement fixes done since the 2.0.3,
> > two weeks ago.
> >
> > The detailed source and binary compatibility report vs 2.0.3 has been
> > published for your review, at:
> >
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC0/compatibiilty_report_2.0.3vs2.04.html
> >
> > The report shows no incompatibilities.
> >
> > The full list of fixes included in this release is available in
> > the CHANGES.md that ships as part of the release also available
> > here:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC0/CHANGES.md
> >
> > The RELEASENOTES.md are here:
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.4RC0/RELEASENOTES.md
> >
> > Please try out this candidate and vote +1/-1 on whether we should
> > release these artifacts as HBase 2.0.4.
> >
> > The VOTE will remain open for at least 72 hours. Given sufficient votes
> > I would like to close it on Tuesday, December 11th, 2018.
> >
> > Thanks,
> > S
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


Re: [DISCUSS] EOL branch-1.3

2018-12-08 Thread Allan Yang
+1, the less, the better.
Best Regards
Allan Yang


Sean Busbey  于2018年12月8日周六 上午10:09写道:

> correct, branch-1.2 was the stable release line prior to 1.4 becoming
> it in August.
>
> at the time I said I'd keep making monthly 1.2.z releases for ~6 months:
>
> https://s.apache.org/EYkB
>
> I wanted to give folks who heed our "this is the stable line" advice a
> cushion for planning migration once we updated it to a different
> release line.
> On Fri, Dec 7, 2018 at 7:57 PM Guanghao Zhang  wrote:
> >
> > +1. But branch-1.2 is not EOL now?
> >
> > 张铎(Duo Zhang)  于2018年12月8日周六 上午9:28写道:
> >
> > > +1.
> > >
> > > Andrew Purtell  于2018年12月8日周六 上午5:45写道:
> > >
> > > > I'm good with doing one more 1.3 release. It would be my pleasure to
> > > offer
> > > > that service to the community. I like RM-ing.
> > > >
> > > >
> > > > On Fri, Dec 7, 2018 at 12:29 PM Stack  wrote:
> > > >
> > > > > +1
> > > > >
> > > > > (Pity you have to make a release to EOL it).
> > > > >
> > > > > S
> > > > >
> > > > > On Fri, Dec 7, 2018 at 11:25 AM Andrew Purtell <
> apurt...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > We haven't had a release from branch-1.3 for a long time and do
> not
> > > > > appear
> > > > > > to have an active RM for it. Unless a RM for 1.3 steps forward
> and
> > > > > promises
> > > > > > to make a release in the very near future, I propose we make one
> more
> > > > > > release of 1.3, from the head of branch-1.3, and then retire the
> > > > branch.
> > > > > If
> > > > > > this is acceptable I can RM the final 1.3 release.
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew
> > > > > >
> > > > > > Words like orphans lost among the crosstalk, meaning torn from
> > > truth's
> > > > > > decrepit hands
> > > > > >- A23, Crosstalk
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew
> > > >
> > > > Words like orphans lost among the crosstalk, meaning torn from
> truth's
> > > > decrepit hands
> > > >- A23, Crosstalk
> > > >
> > >
>


Re: [DISCUSS] No more release branches for 1.x, release from branch-1 directly

2018-12-08 Thread Allan Yang
I think Andrew's way makes sense for branch-1: we can release 1.6, 1.7 and
so on directly from branch-1, since branch-1 will only receive bug fixes.
If we want to apply this policy to branch-2, we'd need to be careful when new
features are checked in, because all feature releases would include them
if we only have a branch-2. But the good news is that we would only need to
commit our code to one branch; for now, I sometimes need to commit to four
branches (branch-2.0, branch-2.1, branch-2 and master). And users
wouldn't have to make a difficult choice about which release line to
use, or worry about compatibility when upgrading a
minor version, as long as we make sure all incompatible changes are committed
only to the master branch (for releasing 3.x).

Best Regards
Allan Yang


Stack  于2018年12月9日周日 上午7:35写道:

> On Fri, Dec 7, 2018 at 5:59 PM Andrew Purtell  wrote:
>
> > We could do that. Or we could simply renumber branch-1 to 1.6.x at that
> > time, e.g. 1.5.whatever-SNAPSHOT -> 1.6.0-SNAPSHOT. Every release has a
> tag
> > in rel/. It is possible at any time to check out from a release tag and
> > make a branch for an additional patch release for an old minor line. If
> we
> > need to do it, we can at that time, otherwise why proliferate branches
> and
> > make more work for committers? I think for branch-1 after moving from
> > 1.5.whatever to 1.6.0 any additional 1.5.x releases would be unlikely,
> and
> > going forward for 1.6, and so on. This same policy could work for
> branch-2.
> > We shouldn't be afraid to make new minors. Prior to 1.0.0 every release
> was
> > a minor release and patch releases were rare. I think we want to get back
> > to something more like that.
> >
> > It also makes sense to have a long term stable branch. That is currently
> > branch-1.2. If in the future we want it to be 1.5, then at that time it
> > makes sense to have a separate branch-1.5 for the LTS.
> >
> >
> Let's try it.
>
> Should be easy to do on branch-1 what with a single 'owner'.
>
> branch-2 would prove a more interesting experiment. Let branch-2 be where
> we cut 2.2.0 and 2.2.1, etc., from? (We need an RM for 2.2)
>
> S
>
>
> >
> >
> > On Fri, Dec 7, 2018 at 5:54 PM 张铎(Duo Zhang) 
> > wrote:
> >
> > > If 1.5 is not the last minor release line, then how do we release 1.6?
> > Make
> > > a branch-1.5 and then start to release 1.6 from branch-1?
> > >
> > > Andrew Purtell  于2018年12月8日周六 上午9:36写道:
> > >
> > > > Yeah, for branch-1 it is no longer a development branch. Every change
> > is
> > > > going to be maintenance related. No, I don't expect 1.5 to be the
> last
> > > > minor release line for 1.x. Maybe? Maybe not. In theory we could
> treat
> > > > branch-2 the same. Master is the only development branch. That is not
> > my
> > > > proposal, though. I'm only concerned with RM activities related to
> > > > branch-1.
> > > >
> > > > On Fri, Dec 7, 2018 at 5:33 PM 张铎(Duo Zhang) 
> > > > wrote:
> > > >
> > > > > So the idea is that, if we have a newer major release line, we can
> > > > release
> > > > > the previous major releases directly from the 'developing' branch?
> > > > >
> > > > > I think for branch-1 it is fine, as we are not likely to backport
> any
> > > big
> > > > > new feature to 1.x any more. And does this mean that 1.5 is the
> last
> > > > minor
> > > > > release line for 1.x?
> > > > >
> > > > > Stack wrote on Sat, Dec 8, 2018, 4:15 AM:
> > > > >
> > > > > > On Fri, Dec 7, 2018 at 11:36 AM Andrew Purtell <
> > apurt...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Please be advised I plan to RM the next minor release from
> > > branch-1,
> > > > > > 1.5.0,
> > > > > > > in January of 2019. Once this is done we can continue making
> > > > > maintenance
> > > > > > > releases from branch-1.4 as needed but expect that not to be
> > > > necessary
> > > > > > > after a couple of months (or perhaps even immediately).
> > > > > > >
> > > > > > > I see no need to make a branch-1.5. As community resources
> > continue
> > > > to
> > > > > > > shift away from branch-1 we need to conserve available
> > a

Re: Rolling 2.1.2 and 2.0.4

2018-12-05 Thread Allan Yang
According to @Zheng Hu, this memory leak was introduced by HBASE-20704
<https://issues.apache.org/jira/browse/HBASE-20704>, which only affects the
recently released versions 2.0.3 and 2.1.1. I also think we need to notify
those users.
Best Regards
Allan Yang


Yu Li wrote on Thu, Dec 6, 2018, 12:46 PM:

> Memory leak is critical enough to roll a new release, and maybe we should
> add a notice somewhere for 2.x users, like our ref guide and/or user
> mailing list? Thanks.
>
> Best Regards,
> Yu
>
>
> On Thu, 6 Dec 2018 at 12:39, Reid Chan  wrote:
>
> > +1 for rolling.
> >
> > Nice found, Zheng Hu. (y)
> >
> >
> > --
> >
> > Best regards,
> > R.C
> >
> >
> >
> > 
> > From: Stack 
> > Sent: 06 December 2018 12:03
> > To: HBase Dev List
> > Subject: Rolling 2.1.2 and 2.0.4
> >
> > 2.1.1 and 2.0.3 have an ugly bug found by our Zheng Hu. See HBASE-21551
> > Memory leak when use scan with STREAM at server side.
> >
> > S
> >
>


Re: [ANNOUNCE] Allan Yang joins the Apache HBase PMC

2018-11-29 Thread Allan Yang
Thank you all!
Best Regards
Allan Yang


张铎 (Duo Zhang) wrote on Fri, Nov 30, 2018, 8:35 AM:

> Congratulations!
>
> Azhaku Sakthi Vel Muthu Krishnan wrote on Fri, Nov 30, 2018, 4:09 AM:
>
> > Congrats Allan!
> >
> > Sakthi
> >
> > On Thu, Nov 29, 2018 at 12:06 PM Andrew Purtell 
> > wrote:
> >
> > > Congratulations and welcome, Allan!
> > >
> > > On Wed, Nov 28, 2018 at 8:11 AM Yu Li  wrote:
> > >
> > > > On behalf of the Apache HBase PMC I am pleased to announce that Allan
> > > Yang
> > > > has accepted our invitation to become a PMC member on the Apache
> HBase
> > > > project. We appreciate Allan stepping up to take more responsibility
> in
> > > the
> > > > HBase project.
> > > >
> > > > Please join me in welcoming Allan to the HBase PMC!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew
> > >
> > > Words like orphans lost among the crosstalk, meaning torn from truth's
> > > decrepit hands
> > >- A23, Crosstalk
> > >
> > --
> > Sakthi
> >
>


Re: [VOTE] First release candidate for HBase 2.0.3 is available

2018-11-29 Thread Allan Yang
+1(binding)

-- Build from src - OK

-- SHA and Signatures(both src and bin) - OK

-- start in distributed mode - OK

-- Basic shell commands[create/drop/alter/get/put/scan] - OK

-- Check Web UI - OK

-- ITBLL 1B row for 1 run - OK


I think we can release 2.0.3 first and fix these flaky tests later. I
glanced at the flaky board for branch-2.1: those tests are still flaky
there[1], so they are not the fault of 2.0.3 and should not block its
release.


[1]
https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.1/lastSuccessfulBuild/artifact/dashboard.html


Best Regards
Allan Yang


Stack wrote on Thu, Nov 29, 2018, 2:16 PM:

> What we want to do here?
>
> Sink the RC and work on fixing a few of the flakies?
> TestMasterFailoverWithProcedures and TestMasterFailoverWithProcedures? They
> fail rarely enough up on apache [1] and our flakies list is pretty bare
> these times on branch-2.0 [1].
>
> Thanks,
> S
>
> 1.
>
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.0/lastSuccessfulBuild/artifact/dashboard.html
>
> On Sun, Nov 25, 2018 at 4:22 PM Stack  wrote:
>
> > The first release candidate for HBase 2.0.3 is available for download:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/
> >
> > Maven artifacts are also available in a staging repository at:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1237
> >
> > Artifacts are signed with my key (DB9D313DA7874F29) published in our
> > KEYS file at http://www.apache.org/dist/hbase/KEYS
> >
> > The RC corresponds to the signed tag 2.0.3RC0, which currently points
> > to commit
> >
> >   87a3aea8ee2d284807f7d4fbdac1f6d9dfedbc17
> >
> > HBase 2.0.3 is the third maintenance release in the HBase 2.0 line,
> > continuing on the theme of bringing a stable, reliable database to
> > the Hadoop and NoSQL communities. This release includes ~120 bug
> > and improvements fixes done since the 2.0.2 release almost 3 months
> > ago.
> >
> > The detailed source and binary compatibility report vs 2.0.2 has been
> > published for your review, at:
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/compat-check-report-2.0.2-vs-2.0.3.html
> >
> > The report shows no incompatibilities.
> >
> > The full list of fixes included in this release is available in
> > the CHANGES.md that ships as part of the release also available
> > here:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/CHANGES.md
> >
> > The RELEASENOTES.md are here:
> >
> >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/RELEASENOTES.md
> >
> > Please try out this candidate and vote +1/-1 on whether we should
> > release these artifacts as HBase 2.0.3.
> >
> > The VOTE will remain open for at least 72 hours. Given sufficient votes
> > I would like to close it on Thursday, November 29th, 2018.
> >
> > Thanks,
> > S
> >
>


Re: The child procedure is scheduled in the reversed order in Procedure V2 Framework

2018-11-28 Thread Allan Yang
Yes, you are right: every child procedure is added to the front, so the final
execution order may be reversed. But in practice there is more than one worker
thread, so the children will likely be executed at the same time. I think the
reversed order is unintentional; all the child procedures should be independent
and not depend on each other, so they can be executed in any order. What's
more, after HBASE-21375
<https://issues.apache.org/jira/browse/HBASE-21375> is checked in to all 2.x
branches, the worker threads will execute every runnable procedure in the
queue, so the ones at the front won't block the others, and this won't be a
problem.
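
The add-to-front behavior is easy to see with a plain deque. This is only an
illustrative sketch — `ArrayDeque` and the names below stand in for the
framework's internal queue, they are not the actual HBase classes:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class ChildOrderSketch {
    // Mimics submitChildrenProcedures: each child is pushed to the FRONT
    // (dequeue#addFront in the framework).
    static List<String> runOrder(String... children) {
        Deque<String> queue = new ArrayDeque<>();
        for (String child : children) {
            queue.addFirst(child);
        }
        // A single worker polling from the front sees the reversed order.
        List<String> executed = new ArrayList<>();
        while (!queue.isEmpty()) {
            executed.add(queue.pollFirst());
        }
        return executed;
    }

    public static void main(String[] args) {
        // a.addChildProcedure(B, C, D) -> executed as D, C, B
        System.out.println(runOrder("B", "C", "D")); // prints [D, C, B]
    }
}
```

With several worker threads draining the same queue concurrently, this strict
reversal disappears, which is why the order was never observed to matter.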
Best Regards
Allan Yang


OpenInx wrote on Wed, Nov 28, 2018, 10:42 PM:

> Hi :
>
>  I read parts of the procedure v2 framework, and found that  if a procedure
> has  3 added child procedure,
>  then it's children will be schedued in the reversed order.
>
> Let me give an example. if  a procedure A added 3 child procedure: B, C ,
> D.
>
> a.addChildProcedure(B, C, D)
>
> The procedure framework will add the B,C,D child produre in a dequeue to
> schedule
>
> ( Code Path  --- ProcedureExecutor#execProcedure
> -- submitChildrenProcedures  -- dequeue#addFront )
>
> So the dequeue will be :(front)   D, C, B  (back)
>
> Then we will poll each procedure from the dequeue, and dispatch to the
> executor to run ..
>
> In the final,   the procedure executing order will be :  D, C, B,  which is
> the revered order  compared to the addChildProcedure order.
>
> My question is :  is it designed intentionally ?  Or unintentionally doing
> the wrong implementation ?
>
> Seems most the child procedure are region assign or unassign, looks like no
> problem now..
>
> Please correct me if I am wrong or missing something.
>
> Thanks.
>


[jira] [Reopened] (HBASE-21392) HTable can still write data after calling the close method.

2018-11-28 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-21392:


> HTable can still write data after calling the close method.
> ---
>
> Key: HBASE-21392
> URL: https://issues.apache.org/jira/browse/HBASE-21392
> Project: HBase
>  Issue Type: Improvement
>  Components: Client
>Affects Versions: 1.2.0, 2.1.0, 2.0.0
> Environment: HBase 1.2.0
>Reporter: lixiaobao
>Assignee: lixiaobao
>Priority: Major
> Attachments: HBASE-21392.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> HTable can still write data after calling the close method.
>  
> {code:java}
> val conn = ConnectionFactory.createConnection(conf)
> var table = conn.getTable(TableName.valueOf(tableName))
> val put = new Put(rowKey.getBytes())
> put.addColumn("cf".getBytes(), columnField.getBytes(), endTimeLong, 
> Bytes.toBytes(line.getLong(8)))
> table.put(put)
> //call table close() method
> table.close()
> //put again
> val put1 = new Put(rowKey4.getBytes())
> put1.addColumn("cf".getBytes(), columnField.getBytes(), endTimeLong, 
> Bytes.toBytes(line.getLong(8)))
> table.put(put1)
> {code}
>  
> After calling the close method, we can still write data into HBase; I think 
> this does not match the close semantics.
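
The guard the report asks for could look roughly like this — a minimal
standalone sketch (not the actual HTable implementation), where close() flips
a flag that put() checks:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of a close() guard: writes fail fast once the table is
// closed, instead of silently succeeding. Illustrative only, not HTable code.
class ClosableTable {
    private final List<String> rows = new ArrayList<>();
    private volatile boolean closed = false;

    void put(String row) {
        if (closed) {
            throw new IllegalStateException("Table already closed");
        }
        rows.add(row);
    }

    void close() {
        closed = true;
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        ClosableTable table = new ClosableTable();
        table.put("row1");
        table.close();
        try {
            table.put("row2"); // without the guard, this would succeed silently
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```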



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] First release candidate for HBase 2.0.3 is available

2018-11-28 Thread Allan Yang
These two UTs seem not related to HBASE-21423
<https://issues.apache.org/jira/browse/HBASE-21423>...
Best Regards
Allan Yang


张铎 (Duo Zhang) wrote on Wed, Nov 28, 2018, 4:04 PM:

> This time they are TestVerifyReplication and TestMobSecureExportSnapshot...
>
> Allan Yang wrote on Wed, Nov 28, 2018, 3:53 PM:
>
> > I run TestMasterFailoverWithProcedures on my Mac for about 10 times,
> only 1
> > time failed. Agree with Peter in HBASE-21518 that the failure is because
> of
> > shutting down minicluster is timeout. is It blocking the release here? I
> > think we can release 2.0.3 if we have enough +1, and resolve this in
> > HBASE-21518 later.
> > @Duo Zhang, FYI, can you post what test is flaky? in 2.0.3, I have
> > HBASE-21423 <https://issues.apache.org/jira/browse/HBASE-21423>  checked
> > in, which will add an additional worker thread for meta table. If the
> test
> > requires only one worker thread(like testAndDoubleExecution), this
> > issue may make it flaky. I have disabled this feature for some of the
> test
> > in HBASE-21468 <https://issues.apache.org/jira/browse/HBASE-21468>.
> Maybe
> > there is more.
> > Best Regards
> > Allan Yang
> >
> >
> > Peter Somogyi wrote on Wed, Nov 28, 2018, 5:58 AM:
> >
> > > On my MacBook Pro this test failed 8 out of 20 runs. I made some
> analysis
> > > over HBASE-21518 and for me it looks like shutdown race condition.
> > >
> > > On Tue, Nov 27, 2018 at 6:04 PM Stack  wrote:
> > >
> > > > Thanks boys.
> > > >
> > > > I was going off the nightly [1] and the flakey list [2]. They didn't
> > look
> > > > too bad. Shout which tests are causing you issue and I can take a
> look.
> > > > TestMasterFailoverWithProcedures seems like it fails on occasion, but
> > > only
> > > > on the gce runs at < 20% of time. You see different Peter?
> > > >
> > > > Thanks,
> > > > St.Ack
> > > >
> > > > 1. https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.0/
> > ilds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.0/
> > > > 2.
> > > >
> > > >
> > >
> >
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.0/lastSuccessfulBuild/artifact/dashboard.html
> > > >
> > > > On Tue, Nov 27, 2018 at 7:49 AM Peter Somogyi 
> > > wrote:
> > > >
> > > > > I'm also facing test issues. TestMasterFailoverWithProcedures is
> > > failing
> > > > > frequently on my machine and can be found on the flaky dashboard
> for
> > > > > multiple branches.  Created HBASE-21518 issue for this.
> > > > >
> > > > > On Tue, Nov 27, 2018 at 2:28 PM 张铎(Duo Zhang) <
> palomino...@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > I'm still testing but it is really hard for me to get all UTs
> pass,
> > > but
> > > > > not
> > > > > > the same UT... Still trying...
> > > > > >
> > > > > > > Artem Ervits wrote on Tue, Nov 27, 2018, 6:03 AM:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > >   signatures and sums for src and bin: OK
> > > > > > >   installed on pseudodistributed hadoop 2.7.7: OK
> > > > > > >   hbase shell: OK
> > > > > > >   logs: OK
> > > > > > >   UI: OK
> > > > > > >   LTT 1M write/read 20%: OK
> > > > > > >   compile from src OpenJDK 1.8.0_191: OK
> > > > > > >   Java: write 1M: OK
> > > > > > >
> > > > > > > On Sun, Nov 25, 2018 at 7:29 PM Stack 
> wrote:
> > > > > > >
> > > > > > > > The first release candidate for HBase 2.0.3 is available for
> > > > > download:
> > > > > > > >
> > > > > > > >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/
> > > > > > > >
> > > > > > > > Maven artifacts are also available in a staging repository
> at:
> > > > > > > >
> > > > > > > >
> > > > > >
> > > https://repository.apache.org/content/repositories/orgapachehbase-1237
> > > > > > > >
> > > > > > > > Artifacts are signed with my key (DB9D313DA7874F

Re: [VOTE] First release candidate for HBase 2.0.3 is available

2018-11-27 Thread Allan Yang
I ran TestMasterFailoverWithProcedures on my Mac about 10 times; it failed only
once. I agree with Peter in HBASE-21518 that the failure is because shutting
down the minicluster times out. Is it blocking the release here? I think we
can release 2.0.3 if we have enough +1s, and resolve this in HBASE-21518 later.
@Duo Zhang, FYI, can you post which tests are flaky? In 2.0.3, I have
HBASE-21423 <https://issues.apache.org/jira/browse/HBASE-21423> checked
in, which adds an additional worker thread for the meta table. If a test
requires only one worker thread (like testAndDoubleExecution), this
change may make it flaky. I have disabled this feature for some of the tests
in HBASE-21468 <https://issues.apache.org/jira/browse/HBASE-21468>. Maybe
there are more.
Best Regards
Allan Yang


Peter Somogyi wrote on Wed, Nov 28, 2018, 5:58 AM:

> On my MacBook Pro this test failed 8 out of 20 runs. I made some analysis
> over HBASE-21518 and for me it looks like shutdown race condition.
>
> On Tue, Nov 27, 2018 at 6:04 PM Stack  wrote:
>
> > Thanks boys.
> >
> > I was going off the nightly [1] and the flakey list [2]. They didn't look
> > too bad. Shout which tests are causing you issue and I can take a look.
> > TestMasterFailoverWithProcedures seems like it fails on occasion, but
> only
> > on the gce runs at < 20% of time. You see different Peter?
> >
> > Thanks,
> > St.Ack
> >
> > 1. https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.0/
> > 2.
> >
> >
> https://builds.apache.org/view/H-L/view/HBase/job/HBase-Find-Flaky-Tests/job/branch-2.0/lastSuccessfulBuild/artifact/dashboard.html
> >
> > On Tue, Nov 27, 2018 at 7:49 AM Peter Somogyi 
> wrote:
> >
> > > I'm also facing test issues. TestMasterFailoverWithProcedures is
> failing
> > > frequently on my machine and can be found on the flaky dashboard for
> > > multiple branches.  Created HBASE-21518 issue for this.
> > >
> > > On Tue, Nov 27, 2018 at 2:28 PM 张铎(Duo Zhang) 
> > > wrote:
> > >
> > > > I'm still testing but it is really hard for me to get all UTs pass,
> but
> > > not
> > > > the same UT... Still trying...
> > > >
> > > > Artem Ervits wrote on Tue, Nov 27, 2018, 6:03 AM:
> > > >
> > > > > +1 (non-binding)
> > > > >
> > > > >   signatures and sums for src and bin: OK
> > > > >   installed on pseudodistributed hadoop 2.7.7: OK
> > > > >   hbase shell: OK
> > > > >   logs: OK
> > > > >   UI: OK
> > > > >   LTT 1M write/read 20%: OK
> > > > >   compile from src OpenJDK 1.8.0_191: OK
> > > > >   Java: write 1M: OK
> > > > >
> > > > > On Sun, Nov 25, 2018 at 7:29 PM Stack  wrote:
> > > > >
> > > > > > The first release candidate for HBase 2.0.3 is available for
> > > download:
> > > > > >
> > > > > >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/
> > > > > >
> > > > > > Maven artifacts are also available in a staging repository at:
> > > > > >
> > > > > >
> > > >
> https://repository.apache.org/content/repositories/orgapachehbase-1237
> > > > > >
> > > > > > Artifacts are signed with my key (DB9D313DA7874F29) published in
> > our
> > > > > > KEYS file at http://www.apache.org/dist/hbase/KEYS
> > > > > >
> > > > > > The RC corresponds to the signed tag 2.0.3RC0, which currently
> > points
> > > > > > to commit
> > > > > >
> > > > > >   87a3aea8ee2d284807f7d4fbdac1f6d9dfedbc17
> > > > > >
> > > > > > HBase 2.0.3 is the third maintenance release in the HBase 2.0
> line,
> > > > > > continuing on the theme of bringing a stable, reliable database
> to
> > > > > > the Hadoop and NoSQL communities. This release includes ~120 bug
> > > > > > and improvements fixes done since the 2.0.2 release almost 3
> months
> > > > > > ago.
> > > > > >
> > > > > > The detailed source and binary compatibility report vs 2.0.2 has
> > been
> > > > > > published for your review, at:
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/compat-check-report-2.0.2-vs-2.0.3.html
> > > > > >
> > > > > > The report shows no incompatibilities.
> > > > > >
> > > > > > The full list of fixes included in this release is available in
> > > > > > the CHANGES.md that ships as part of the release also available
> > > > > > here:
> > > > > >
> > > > > >
> > > >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/CHANGES.md
> > > > > >
> > > > > > The RELEASENOTES.md are here:
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.3RC0/RELEASENOTES.md
> > > > > >
> > > > > > Please try out this candidate and vote +1/-1 on whether we should
> > > > > > release these artifacts as HBase 2.0.3.
> > > > > >
> > > > > > The VOTE will remain open for at least 72 hours. Given sufficient
> > > votes
> > > > > > I would like to close it on Thursday, November 29th, 2018.
> > > > > >
> > > > > > Thanks,
> > > > > > S
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Gathering metrics on HBase versions in use

2018-11-14 Thread Allan Yang
I also think having metrics about the downloads from Apache/archives is a
doable action. Most HBase clusters are running in user's Intranet with no
public access, sending anonymous data from them may not be possible. And
also we need to find a way to obtain their authorization I think...
Best Regards
Allan Yang

Zach York wrote on Thu, Nov 15, 2018, 5:35 AM:

> Can we have metrics around the downloads from Apache/archives? I'm not sure
> how that is all set up, but might be a low cost way to get some metrics.
>
> On Wed, Nov 14, 2018, 12:12 PM Andrew Purtell 
> > While it seems you are proposing some kind of autonomous ongoing usage
> > metrics collection, please note I ran an anonymous version usage survey
> via
> > surveymonkey for 1.x last year. It was opt in and there were no PII
> > concerns by its nature. All of the issues around data collection,
> storage,
> > and processing were also handled (by surveymonkey). Unfortunately I
> > recently cancelled my account.
> >
> > For occasional surveys something like that might work. Otherwise there
> are
> > a ton of questions: How do we generate the data? How do we get per-site
> > opt-in permission? How do we collect the data? Store it? Process it?
> Audit
> > it? Seems more trouble than it's worth and requires ongoing volunteer
> > hosting and effort to maintain.
> >
> >
> > On Wed, Nov 14, 2018 at 11:47 AM Misty Linville 
> wrote:
> >
> > > When discussing the 2.0.x branch in another thread, it came up that we
> > > don’t have a good way to understand the version skew of HBase across
> the
> > > user base. Metrics gathering can be tricky. You don’t want to capture
> > > personally identifiable information (PII) and you need to be
> transparent
> > > about what you gather, for what purpose, how long the data will be
> > > retained, etc. The data can also be sensitive, for instance if a large
> > > number of installations are running a version with a CVE or known
> > > vulnerability against it. If you gather metrics, it really needs to be
> > > opt-out rather than opt-in so that you actually get a reasonable amount
> > of
> > > data. You also need to stand up some kind of metrics-gathering service
> > and
> > > run it somewhere, and some kind of reporting / visualization tooling.
> The
> > > flip side of all these difficulties is a more intelligent way to decide
> > > when to retire a branch or when to communicate more broadly / loudly
> > asking
> > > people in a certain version stream to upgrade, as well as where to
> > > concentrate our efforts.
> > >
> > > I’m not sticking my hand up to implement such a monster. I only wanted
> to
> > > open a discussion and see what y’all think. It seems to me that a few
> > > must-haves are:
> > >
> > > - Transparency: Release notes, logging about the status of
> > > metrics-gathering (on or off) at master or RS start-up, logging about
> > > exactly when and what metrics are sent
> > > - Low frequency: Would we really need to wake up and send metrics more
> > > often than weekly?
> > > - Conservative approach: Only collect what we can find useful today,
> > don’t
> > > collect the world.
> > > - Minimize PII: This probably means not trying to group together
> > > time-series results for a given server or cluster at all, but could
> make
> > > the data look like there were a lot more clusters running in the world
> > than
> > > really are.
> > > - Who has access to the data? Do we make it public or limit access to
> the
> > > PMC? Making it public would bolster our discipline about transparency
> and
> > > minimizing PII.
> > >
> > > I’m sure I’m missing a ton so I leave the discussion to y’all.
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: [ANNOUNCE] New HBase committer Jingyun Tian

2018-11-13 Thread Allan Yang
Congratulations, Jingyun!
Best Regards
Allan Yang


Ashish Singhi wrote on Tue, Nov 13, 2018, 4:33 PM:

> Congratulations & Welcome!
>
> Regards,
> Ashish
>
> On Tue, Nov 13, 2018 at 1:24 PM 张铎(Duo Zhang) 
> wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that Jingyun
> > Tian has accepted the PMC's invitation to become a committer on the
> > project. We appreciate all of Jingyun's generous contributions thus far
> and
> > look forward to his continued involvement.
> >
> > Congratulations and welcome, Jingyun!
> >
>


[jira] [Created] (HBASE-21469) Re-visit post* hooks in DDL operations

2018-11-11 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21469:
--

 Summary: Re-visit post* hooks in DDL operations
 Key: HBASE-21469
 URL: https://issues.apache.org/jira/browse/HBASE-21469
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.2, 2.1.1
Reporter: Allan Yang
Assignee: Allan Yang


I had some discussion in HBASE-19953, starting 
[here|https://issues.apache.org/jira/browse/HBASE-19953?focusedCommentId=16673126&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16673126].
In HBASE-19953, [~elserj] wanted to make sure that the post* hooks are called 
only when the procedures finish. But it accidentally turned the modify table 
and truncate table requests into sync calls, which makes client RPCs time out 
easily on big tables.
We should re-visit those post* hooks in DDL operations, because they are not 
consistent now:
For DDLs other than modify table and truncate table, although the call waits 
on the latch, the latch is actually released just after the prepare state, so 
we still call the post* hooks before the operation finishes.
For modify table and truncate table, the latch is only released after the 
whole procedure finishes, so the effort works (but causes RPC timeouts).
I think these latches were designed for compatibility with 1.x clients. Take 
ModifyTable for example: in 1.x, we used admin.getAlterStatus() to check the 
alter status, but in 2.x this method is deprecated and returns inaccurate 
results, so we have to keep 1.x clients in a sync wait.
As for the semantics of the post* hooks in 1.x, we call them after the 
corresponding DDL request returns, but the DDL request may not have finished 
either, since we don't wait for region assignment.

So here we need to discuss the semantics of the post* hooks in DDL operations 
and make them consistent across all DDL operations. Do we really need to make 
sure these hooks are called only after the operation finishes? After all, we 
have the postCompleted* hooks for that need.
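
The two latch-release points described above can be contrasted with a small
event-ordering sketch (illustrative names only, not the actual master code):
releasing the latch after prepare lets the RPC return and the post* hook run
before the procedure finishes, while releasing it only at the end makes the
RPC a sync wait.

```java
// Sketch of the two latch-release points. The returned string is the event
// order a client would observe. Illustrative only.
class DdlLatchSketch {
    static String hookTiming(boolean latchReleasedAfterPrepare) {
        StringBuilder log = new StringBuilder("prepare;");
        if (latchReleasedAfterPrepare) {
            // Most DDLs: latch released right after prepare, so the RPC
            // returns and the post* hook fires before the work is done.
            log.append("rpcReturns;postHook;");
        }
        log.append("procedureFinishes;");
        if (!latchReleasedAfterPrepare) {
            // ModifyTable/TruncateTable: latch held until the end, so the
            // client blocks for the whole procedure (risking RPC timeout).
            log.append("rpcReturns;postHook;");
        }
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(hookTiming(true));
        System.out.println(hookTiming(false));
    }
}
```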
  





[jira] [Resolved] (HBASE-21423) Procedures for meta table/region should be able to execute in separate workers

2018-11-11 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-21423.

Resolution: Fixed

Opened HBASE-21468 for the addendum, close this one

> Procedures for meta table/region should be able to execute in separate 
> workers 
> ---
>
> Key: HBASE-21423
> URL: https://issues.apache.org/jira/browse/HBASE-21423
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>    Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21423.branch-2.0.001.patch, 
> HBASE-21423.branch-2.0.002.patch, HBASE-21423.branch-2.0.003.patch, 
> HBASE-21423.branch-2.0.addendum.patch
>
>
> We have a higher priority for meta table procedures, but only at the queue 
> level. There is a case where the meta table is closed and an AssignProcedure 
> (or RTSP in branch-2+) is waiting to be executed, but at the same time all 
> the worker threads are executing procedures that need to write to the meta 
> table. Then all the workers get stuck retrying the meta writes, and no 
> worker will take the AP for meta.
> Though we have a mechanism that detects the stuck state and adds more 
> "KeepAlive" workers to the pool to resolve it, by then it has already been 
> stuck for a long time.
> This is a real case I encountered in ITBLL.
> So I added one urgent worker to the ProcedureExecutor which only takes meta 
> procedures (other workers can take meta procedures too), which resolves 
> this kind of stuck state.





[jira] [Created] (HBASE-21468) separate workers for meta table is not working

2018-11-11 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21468:
--

 Summary: separate workers for meta table is not working
 Key: HBASE-21468
 URL: https://issues.apache.org/jira/browse/HBASE-21468
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.2, 2.1.1
Reporter: Allan Yang
Assignee: Allan Yang


This is an addendum for HBASE-21423; since HBASE-21423 is already closed, the 
QA won't be triggered there.
It is my mistake that the separate workers for the meta table are not working: 
when polling from the queue, the onlyUrgent flag is not passed in.
And for some UTs that require only one worker thread, the number of urgent 
workers should be set to 0 to ensure there is only one worker at a time.
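
The polling logic from HBASE-21423, including the onlyUrgent flag this
addendum fixes, can be sketched single-threaded like this (illustrative
names, not the real ProcedureExecutor):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal single-threaded sketch of the HBASE-21423 idea: a dedicated
// "urgent" worker drains only meta procedures, while ordinary workers may
// take from either queue. Illustrative only, not HBase internals.
class UrgentWorkerSketch {
    final Deque<String> metaQueue = new ArrayDeque<>();
    final Deque<String> normalQueue = new ArrayDeque<>();

    void submit(String proc, boolean isMeta) {
        (isMeta ? metaQueue : normalQueue).addLast(proc);
    }

    // onlyUrgent = true for the dedicated meta worker -- the flag that,
    // per this addendum, was not being passed through when polling.
    String poll(boolean onlyUrgent) {
        if (!metaQueue.isEmpty()) {
            return metaQueue.pollFirst();
        }
        // The urgent worker stays idle rather than taking normal work,
        // so it is always free when a meta procedure shows up.
        return onlyUrgent ? null : normalQueue.pollFirst();
    }

    public static void main(String[] args) {
        UrgentWorkerSketch sketch = new UrgentWorkerSketch();
        sketch.submit("writeMeta1", false);
        sketch.submit("assignMeta", true);
        System.out.println(sketch.poll(true));  // meta procedure taken first
        System.out.println(sketch.poll(true));  // urgent worker waits: null
        System.out.println(sketch.poll(false)); // normal worker takes the rest
    }
}
```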





Re: [DISCUSS] Release cadence for HBase 2.y

2018-11-11 Thread Allan Yang
But since some users may already have their production systems on HBase 2.0.x,
maybe we should consider their feelings; they are the 'first movers'. If we
retire branch-2.0 so quickly, IIRC it will be the shortest-lived branch ever,
and I think that will hurt the feelings of those 'first movers'. If we take
this into account, then I think we should keep branch-2.0 for at least one
year. If the community is shorthanded, I volunteer to take responsibility for
deciding/backporting/releasing on branch-2.0, since I was working on this
branch most recently.
Anyway, this thread is about releasing HBase 2.2.x, and I will vote +1 for it;
as for HBase 2.0.x, we can discuss it later.
Best Regards
Allan Yang


Allan Yang wrote on Mon, Nov 12, 2018, 10:12 AM:

> Stack, are you suggest about retiring branch-2.0? I think it is OK, since
> branch-2.0 is almost the same with branch-2.1 now(except some new feature
> on replication). Yes, agree that we should help out on branch-2.2. AMv2
> changed a lot in branch-2, there may still have some work to do to make
> branch-2.2 stable. But at same time, I think we can mark branch-2.1 as
> stable. We have done tremendous work on this branch, and recently ITBLLs
> shows it is already stable enough(based on our internal version, but most
> of patches in branch-2.1 was backported)
> Best Regards
> Allan Yang
>
>
>> Stack wrote on Mon, Nov 12, 2018, 6:57 AM:
>
>> Agree w/ Duo that the 2.x releases have been gated on stability watersheds
>> rather than features.
>>
>> What else do we need to add to HBCK2 Duo (apart from a release)?
>>
>> Related, I was going to work on a 2.0.3 release. It has been a while and a
>> bunch of good stability work has made it into branch-2.0. Thereafter
>> though, I was going to let branch-2.0 go unless demand -- Allan Yang? --
>> and switch instead to helping out on branch-2.2.
>>
>> S
>>
>> On Thu, Nov 8, 2018 at 6:10 PM 张铎(Duo Zhang) 
>> wrote:
>>
>> > I think for the 2.x release the problem is that we are still busy on
>> making
>> > the code stable, or speak more clearly, to make the procedure v2
>> framework
>> > stable... And another big problem is lacking of HBCK2 support. These
>> things
>> > are all big issues which prevent people to upgrade to 2.x.
>> >
>> > Once these things are done, I think a monthly release will not be a big
>> > problem to the RMs. Just simply run an ITBLL(for now it is not easy to
>> get
>> > a successful run and then we need to find out why...), and then the
>> > make_rc.sh can not everything for you...
>> >
>> > Sean Busbey wrote on Fri, Nov 9, 2018, 9:45 AM:
>> >
>> > > I think it just shifts the RM burden, no? Like instead of watching
>> e.g.
>> > > branch-2.2 I instead need to watch branch-2.
>> > >
>> > > On Thu, Nov 8, 2018, 17:28 Josh Elser > > >
>> > > > I think what I'd be concerned about WRT time-based releases is the
>> > > > burden on RM to keep the branch in a good state. Perhaps we need to
>> not
>> > > > push that onto an RM and do better about sharing that load (looking
>> in
>> > > > the mirror).
>> > > >
>> > > > However, I do like time-based releases as a means to avoid "hurt
>> > > > feelings" (e.g. the personal ties of a developer to a feature. "The
>> > > > release goes out on /yy/xx, this feature is not yet ready, can
>> go
>> > > > out one month later.." etc)
>> > > >
>> > > > On 11/7/18 2:31 PM, Sean Busbey wrote:
>> > > > > Hi folks!
>> > > > >
>> > > > > Some time ago we talked about trying to get back on track for a
>> more
>> > > > > regular cadence of minor releases rather than maintenance releases
>> > > > > (like how we did back pre-1.0). That never quite worked out for
>> the
>> > > > > HBase 1.y line, but is still something we could make happen for
>> HBase
>> > > > > 2.
>> > > > >
>> > > > > We're coming up on 4 months since the 2.1 release line started.
>> ATM
>> > > > > there are 63 issues in JIRA that claim to be in 2.2.0 and not in
>> any
>> > > > > 2.1.z version[1].
>> > > > >
>> > > > > The main argument against starting to do a 2.2.0 release is that
>> > > > > nothing springs out of that list as a "feature" that would entice
>> > > > > u

Re: [DISCUSS] Release cadence for HBase 2.y

2018-11-11 Thread Allan Yang
Stack, are you suggesting retiring branch-2.0? I think that is OK, since
branch-2.0 is almost the same as branch-2.1 now (except for some new
replication features). Yes, I agree that we should help out on branch-2.2.
AMv2 changed a lot in branch-2, so there may still be some work to do to make
branch-2.2 stable. But at the same time, I think we can mark branch-2.1 as
stable. We have done tremendous work on this branch, and recent ITBLL runs
show it is already stable enough (based on our internal version, but most of
the patches in branch-2.1 were backported).
Best Regards
Allan Yang


Stack  于2018年11月12日周一 上午6:57写道:

> Agree w/ Duo that the 2.x releases have been gated on stability watersheds
> rather than features.
>
> What else do we need to add to HBCK2 Duo (apart from a release)?
>
> Related, I was going to work on a 2.0.3 release. It has been a while and a
> bunch of good stability work has made it into branch-2.0. Thereafter
> though, I was going to let branch-2.0 go unless demand -- Allan Yang? --
> and switch instead to helping out on branch-2.2.
>
> S
>
> On Thu, Nov 8, 2018 at 6:10 PM 张铎(Duo Zhang) 
> wrote:
>
> > I think for the 2.x releases the problem is that we are still busy making
> > the code stable, or, to speak more clearly, making the procedure v2
> > framework stable... And another big problem is the lack of HBCK2 support.
> > These things are all big issues which prevent people from upgrading to 2.x.
> >
> > Once these things are done, I think a monthly release will not be a big
> > problem for the RMs. Just run an ITBLL (for now it is not easy to get
> > a successful run, and then we need to find out why...), and then
> > make_rc.sh can do everything for you...
> >
> > Sean Busbey  于2018年11月9日周五 上午9:45写道:
> >
> > > I think it just shifts the RM burden, no? Like instead of watching e.g.
> > > branch-2.2 I instead need to watch branch-2.
> > >
> > > On Thu, Nov 8, 2018, 17:28 Josh Elser  > >
> > > > I think what I'd be concerned about WRT time-based releases is the
> > > > burden on RM to keep the branch in a good state. Perhaps we need to
> not
> > > > push that onto an RM and do better about sharing that load (looking
> in
> > > > the mirror).
> > > >
> > > > However, I do like time-based releases as a means to avoid "hurt
> > > > feelings" (e.g. the personal ties of a developer to a feature. "The
> > > > release goes out on /yy/xx, this feature is not yet ready, can go
> > > > out one month later.." etc)
> > > >
> > > > On 11/7/18 2:31 PM, Sean Busbey wrote:
> > > > > Hi folks!
> > > > >
> > > > > Some time ago we talked about trying to get back on track for a
> more
> > > > > regular cadence of minor releases rather than maintenance releases
> > > > > (like how we did back pre-1.0). That never quite worked out for the
> > > > > HBase 1.y line, but is still something we could make happen for
> HBase
> > > > > 2.
> > > > >
> > > > > We're coming up on 4 months since the 2.1 release line started. ATM
> > > > > there are 63 issues in JIRA that claim to be in 2.2.0 and not in
> any
> > > > > 2.1.z version[1].
> > > > >
> > > > > The main argument against starting to do a 2.2.0 release is that
> > > > > nothing springs out of that list as a "feature" that would entice
> > > > > users to upgrade. Waiting for these kinds of selling points to
> drive
> > a
> > > > > release is commonly referred to as "feature based releases." I
> think
> > > > > it would be fair to characterize the HBase 2.0 release as feature
> > > > > based centered on AMv2.
> > > > >
> > > > > An alternative to feature based releases is date based releases
> where
> > > > > we decide that e.g. we'll have a minor release each month
> regardless
> > > > > of how much is included in it. This is sometimes also called "train
> > > > > releases" as an analogy to how trains leave a station on a set
> > > > > schedule without regard to which individual passengers are ready.
> > Just
> > > > > as you'd catch the next scheduled train if you miss-timed your
> > > > > arrival, fixes or features that aren't ready just go in the next
> > > > > regular release.
> > > > >
> > > > > Personally, 

[jira] [Reopened] (HBASE-21423) Procedures for meta table/region should be able to execute in separate workers

2018-11-10 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-21423:


> Procedures for meta table/region should be able to execute in separate 
> workers 
> ---
>
> Key: HBASE-21423
> URL: https://issues.apache.org/jira/browse/HBASE-21423
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.1, 2.0.2
>    Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Fix For: 2.0.3, 2.1.2
>
> Attachments: HBASE-21423.branch-2.0.001.patch, 
> HBASE-21423.branch-2.0.002.patch, HBASE-21423.branch-2.0.003.patch
>
>
> We have a higher priority for meta table procedures, but only at the queue
> level. There is a case where the meta table is closed and an AssignProcedure
> (or RTSP in branch-2+) is waiting to be executed, but at the same time all
> the worker threads are executing procedures that need to write to the meta
> table; then all the workers get stuck retrying the meta writes, and no worker
> will take the AP for meta.
> Though we have a mechanism that detects the stuck state and adds more
> 'KeepAlive' workers to the pool to resolve it, by then things have already
> been stuck for a long time.
> This is a real case I encountered in ITBLL.
> So, I added an 'urgent worker' to the ProcedureExecutor which only takes meta
> procedures (other workers can take meta procedures too), which resolves this
> kind of stuck state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
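The 'urgent worker' idea in HBASE-21423 above can be sketched in a few lines of Java. This is an illustration only, with invented names (not the real ProcedureExecutor code): one dedicated worker polls only the meta queue, so a meta procedure can never be starved even when every general worker is blocked retrying a write to meta.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Illustration of the 'urgent worker' idea (invented names, not the real
// ProcedureExecutor code): one dedicated worker polls only the meta queue,
// so a meta procedure can never be starved even when every general worker
// is blocked retrying a write to meta.
public class UrgentWorkerSketch {
    static final Queue<String> metaQueue = new ArrayDeque<>();
    static final Queue<String> generalQueue = new ArrayDeque<>();

    // The urgent worker takes meta procedures only.
    static String pollAsUrgentWorker() {
        return metaQueue.poll();
    }

    // General workers prefer meta procedures but may take anything.
    static String pollAsGeneralWorker() {
        String proc = metaQueue.poll();
        return proc != null ? proc : generalQueue.poll();
    }

    public static void main(String[] args) {
        generalQueue.add("AssignProcedure[user-table]");
        metaQueue.add("AssignProcedure[hbase:meta]");
        System.out.println(pollAsUrgentWorker());  // the meta assignment
        System.out.println(pollAsGeneralWorker()); // then the user-table one
    }
}
```

Even if every general worker were wedged on a meta write, the urgent worker still drains the meta queue, which is exactly the deadlock the issue describes.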


Re: if changing the RingBuffer to priority Based Queue cause correctness issue in HBase?

2018-11-07 Thread Allan Yang
We have different priorities at the RPC level; we can even separate write
requests and read requests onto different handlers. But yes, the WAL uses a
FIFO queue, and moreover we use a single thread to consume the append and
sync requests. The reason is that we need the sequence ID of each WAL entry
to increase monotonically; otherwise, data can be lost when replaying or
replicating. The sequence ID is assigned when the WAL entry is queued, so it
is impossible to serve a 'priority' request that comes later without breaking
the order. However, if the requests are from different tables, I think you
can add a new WALProvider to separate that table's WAL from the others to
achieve better isolation.
Best Regards
Allan Yang


Jing Liu  于2018年11月8日周四 上午5:49写道:

> Hi,
>
> I'm trying to add priorities to schedule different types of requests in
> HBase. But the write-ahead log uses a RingBuffer, which is essentially a
> FIFO queue, and that makes it hard. In this case, let's say a low-priority
> request is already queued in the RingBuffer; then a high-priority request
> cannot be executed before all those queued low-priority requests.
> I'm wondering whether changing the FIFO queue into a priority-based queue
> would violate the write consistency guarantee or cause other issues?
>
> Thanks,
> Jing
>
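The ordering constraint Allan describes above (the sequence id is fixed at enqueue time, so the consumer must stay FIFO) can be illustrated with a small, self-contained sketch. The class and method names are invented for illustration; this is not HBase's actual FSHLog/RingBuffer code.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicLong;

// Illustration (invented names, not HBase's actual FSHLog/RingBuffer code)
// of why the WAL consumer must drain appends in FIFO order: the sequence id
// is assigned at enqueue time, so any later reordering, e.g. by priority,
// would persist ids out of order and break replay/replication.
public class WalOrderSketch {
    static final AtomicLong nextSeqId = new AtomicLong(1);
    // each element: { seqId, priority }
    static final Queue<long[]> ringBuffer = new ArrayDeque<>();

    static long append(int priority) {
        long seqId = nextSeqId.getAndIncrement(); // fixed at enqueue time
        ringBuffer.add(new long[] { seqId, priority });
        return seqId;
    }

    // A FIFO drain emits sequence ids strictly increasing.
    static boolean drainIsMonotonic() {
        long last = 0;
        for (long[] entry : ringBuffer) { // ArrayDeque iterates head to tail
            if (entry[0] <= last) {
                return false;
            }
            last = entry[0];
        }
        return true;
    }

    public static void main(String[] args) {
        append(0); // low-priority entry gets seqId 1
        append(9); // high-priority entry gets seqId 2
        // A priority queue would emit seqId 2 before seqId 1; FIFO cannot.
        System.out.println(drainIsMonotonic()); // true
    }
}
```

Serving the high-priority entry first would write seqId 2 before seqId 1, which is the ordering violation that makes replay and replication unsafe.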


[jira] [Created] (HBASE-21423) Procedures for meta table/region should be able to execute in separate workers

2018-11-01 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21423:
--

 Summary: Procedures for meta table/region should be able to 
execute in separate workers 
 Key: HBASE-21423
 URL: https://issues.apache.org/jira/browse/HBASE-21423
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.1
Reporter: Allan Yang
Assignee: Allan Yang


We have a higher priority for meta table procedures, but only at the queue 
level. There is a case where the meta table is closed and an AssignProcedure 
(or RTSP in branch-2+) is waiting to be executed, but at the same time all the 
worker threads are executing procedures that need to write to the meta table; 
then all the workers get stuck retrying the meta writes, and no worker will 
take the AP for meta.
Though we have a mechanism that detects the stuck state and adds more 
'KeepAlive' workers to the pool to resolve it, by then things have already 
been stuck for a long time.
This is a real case I encountered in ITBLL.
So, I added an 'urgent worker' to the ProcedureExecutor which only takes meta 
procedures (other workers can take meta procedures too), which resolves this 
kind of stuck state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21421) Do not kill RS if reportOnlineRegions fails

2018-11-01 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21421:
--

 Summary: Do not kill RS if reportOnlineRegions fails
 Key: HBASE-21421
 URL: https://issues.apache.org/jira/browse/HBASE-21421
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.1
Reporter: Allan Yang
Assignee: Allan Yang


In the periodic regionServerReport call from RS to master, we check 
master.getAssignmentManager().reportOnlineRegions() to make sure the RS does 
not hold a state different from the Master's. If the RS holds a region which 
the master thinks should be on another RS, the Master will kill the RS.

But the regionServerReport could be lagging (due to the network or something 
else), in which case it doesn't represent the current state of the 
RegionServer. Besides, when onlining a region we call 
reportRegionStateTransition and retry forever until it is successfully 
reported to the master, so we can count on reportRegionStateTransition calls.

I have encountered cases where the regions were closed on the RS and 
reportRegionStateTransition reached the master successfully. But later, a 
lagging regionServerReport told the master the region was online on the RS 
(which it was not at that moment; the call may have been generated some time 
ago and delayed by the network somehow), so the master thought the region 
should be on another RS and killed the RS, which it should not have.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
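One way to guard against the lagging report described in HBASE-21421 above is to compare the report's generation time against the last confirmed reportRegionStateTransition before acting on it. The names below are invented for illustration; this is not a real HBase API.

```java
// Hypothetical guard (illustrative names, not a real HBase API): before
// acting on a possibly lagging regionServerReport, compare the time the
// report was generated against the last confirmed
// reportRegionStateTransition, and ignore the report if it is older.
public class StaleReportSketch {
    static long lastTransitionTs; // time of the last confirmed transition

    static boolean shouldActOnReport(long reportGeneratedTs) {
        // A report generated before the last confirmed transition cannot
        // represent the current state of the RegionServer; drop it.
        return reportGeneratedTs >= lastTransitionTs;
    }

    public static void main(String[] args) {
        lastTransitionTs = 2000L; // region close confirmed at t=2000
        System.out.println(shouldActOnReport(1500L)); // false: lagging report
        System.out.println(shouldActOnReport(2500L)); // true: fresh report
    }
}
```

With such a check, a report generated before the confirmed region close would be discarded instead of triggering a kill of the RS.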


Re: [VOTE] First release candidate for hbase-2.1.1 (RC0) is available for download and test

2018-10-29 Thread Allan Yang
+1(non-binding)
-- Build from src - OK
-- SHA and Signatures(both src, bin and client) - OK
-- start in distributed mode - OK
-- Basic shell commands[create/drop/alter/get/put/scan] - OK
-- Check Web UI - OK (There is a useless small arrow to the right of the Region
Servers section; it may be a result of HBASE-21207
<https://issues.apache.org/jira/browse/HBASE-21207> )
-- ITBLL 1B row for 1 run - OK
Best Regards
Allan Yang


张铎(Duo Zhang)  于2018年10月29日周一 上午11:40写道:

>
> +1(binding)
>
> Built from src: OK
> Checked sums & sigs: All matched
> Run all UTs(jdk8u151): All passed
> Started a 5 nodes cluster: OK, but the master page looks a bit strange,
> CSS problem?(Please see the attachment)
> Run basic shell cmds: OK
> Run LTT with 1M rows: Read & Write, both OK
>
>
>
> Stack  于2018年10月28日周日 上午8:05写道:
>
>> Here is the compatibility report comparing 2.1.0 to 2.1.1:
>>
>>
>>
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.1RC0/compatibility_report_2.1.0_vs_2.1.1.html
>>
>> Thanks,
>> S
>>
>>
>>
>> On Sat, Oct 27, 2018 at 11:29 AM Stack  wrote:
>>
>> > The first release candidate for Apache HBase 2.1.1 (RC0) is available
>> > for download and testing. This is a bug fix release with 180+ commits
>> [1]
>> > since hbase-2.1.0. Release notes are available here [2]. The
>> compatibility
>> > report is at [3].
>> >
>> > Artifacts are available here:
>> >
>> >   https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.1RC0/
>> >
>> > Corresponding convenience artifacts for maven use are in the staging
>> > repository:
>> >
>> >  https://repository.apache.org/content/repositories/orgapachehbase-1235
>> >
>> > All artifacts are signed with my code signing key, 8ACC93D2, which is
>> > also in the project KEYS file:
>> >
>> > http://www.apache.org/dist/hbase/KEYS
>> >
>> > These artifacts correspond to commit ref
>> >
>> >  b60a92d6864ef27295027f5961cb46f9162d7637
>> >
>> > which has been tagged as 2.1.1RC0.
>> >
>> > Nightlies are looking pretty good:
>> >
>> >
>> >
>> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-2.1/
>> >
>> > Please take a few minutes to verify the release and vote on releasing
>> it:
>> >
>> > [ ] +1 Release these artifacts as Apache HBase 2.1.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > This VOTE thread will remain open for at least 72 hours. It will close
>> > for sure
>> > Wednesday night (10/31).
>> >
>> > Thanks,
>> > S
>> >
>> > 1.
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.1RC0/CHANGES.md
>> > 2.
>> >
>> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.1.1RC0/RELEASENOTES.md
>> >
>> > P.S. Compatibility report is taking time. Will be back with it later
>> today.
>> >
>>
>


[jira] [Created] (HBASE-21395) Abort split/merge procedure if there is a table procedure of the same table going on

2018-10-26 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21395:
--

 Summary: Abort split/merge procedure if there is a table procedure 
of the same table going on
 Key: HBASE-21395
 URL: https://issues.apache.org/jira/browse/HBASE-21395
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


In my ITBLL runs, I often see that if a split/merge procedure and a table 
procedure (like ModifyTableProcedure) happen at the same time, race conditions 
between these two kinds of procedures cause serious problems, e.g. the 
split/merged parent is brought back online by the table procedure, or the 
split/merged region makes the whole table procedure roll back.
Talked with [~Apache9] offline today; this kind of problem is solved in 
branch-2+ since there is a fence such that only one RTSP can run against a 
single region at the same time.
To keep out of this mess in branch-2.0 and branch-2.1, I added a simple safety 
fence to the split/merge procedures: if there is a table procedure going on 
against the same table, abort the split/merge procedure. Aborting the 
split/merge procedure at the beginning of its execution is no big deal 
compared with the mess it would otherwise cause...




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-24 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-21364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-21364.

   Resolution: Fixed
Fix Version/s: 2.2.0
   3.0.0

> Procedure holds the lock should put to front of the queue after restart
> ---
>
> Key: HBASE-21364
> URL: https://issues.apache.org/jira/browse/HBASE-21364
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.1.0, 2.0.2
>    Reporter: Allan Yang
>    Assignee: Allan Yang
>Priority: Blocker
> Fix For: 3.0.0, 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21364.branch-2.0.001.patch, 
> HBASE-21364.branch-2.0.002.patch
>
>
> After restoring the procedures from the procedure WALs, we put the runnable 
> procedures back into the queue to execute. The order was not a problem 
> before HBASE-20846, since the first one to execute would acquire the lock 
> itself. But since locks are restored after HBASE-20846, if we execute a 
> procedure without the lock before a procedure with the lock in the same 
> queue, there is a race condition in which we may not be able to execute any 
> procedures in that queue at all.
> The race condition is:
> 1. A procedure that needs to take the table's exclusive lock was put into 
> the table's queue, but the table's shared lock was held by a region 
> procedure. Since no one held the exclusive lock, the queue was put on the 
> run queue to execute. But soon the worker thread saw that the procedure 
> couldn't execute because it didn't hold the lock, so it stopped executing 
> and removed the queue from the run queue.
> 2. At the same time, the region procedure, which held the table's shared 
> lock and the region's exclusive lock, was put into the table's queue. But 
> since the queue was already added to the run queue, it wasn't added again.
> 3. Because of 1, the table's queue was removed from the run queue.
> 4. Then no one put the table's queue back, so no worker executed the 
> procedures inside it.
> A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
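The essence of the fix in HBASE-21364 above, putting lock-holding procedures at the front of the queue during restore, can be sketched as follows. The names are illustrative, not the real MasterProcedureScheduler API.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the fix (illustrative, not the real MasterProcedureScheduler
// API): when re-queuing procedures recovered from the procedure WALs, put
// the ones that already hold their lock at the FRONT of the queue, so a
// lock-less procedure can never be polled first and park the queue.
public class RestoreOrderSketch {
    static Deque<String> requeue(String[] recovered, boolean[] holdsLock) {
        Deque<String> queue = new ArrayDeque<>();
        for (int i = 0; i < recovered.length; i++) {
            if (holdsLock[i]) {
                queue.addFirst(recovered[i]); // lock holder runs first
            } else {
                queue.addLast(recovered[i]);
            }
        }
        return queue;
    }

    public static void main(String[] args) {
        Deque<String> queue = requeue(
            new String[] { "ModifyTableProcedure", "RegionProcedure" },
            new boolean[] { false, true }); // only the region proc holds a lock
        System.out.println(queue.peekFirst()); // RegionProcedure
    }
}
```

With the lock holder in front, it runs (and eventually releases its lock) before the lock-less procedure ever gets polled, so the queue can no longer be parked in the state steps 1-4 describe.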


[jira] [Created] (HBASE-21384) Procedure with holdlock=false should not be restored lock when restarts

2018-10-24 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21384:
--

 Summary: Procedure with holdlock=false should not be restored lock 
when restarts 
 Key: HBASE-21384
 URL: https://issues.apache.org/jira/browse/HBASE-21384
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


Yet another stuck case similar to HBASE-21364.
The case is:
1. A ModifyProcedure spawned a ReopenTableProcedure, and since its 
holdLock=false, it released the lock.
2. The ReopenTableProcedure spawned several MoveRegionProcedures; it also has 
holdLock=false, but just after it stored the child procedures to the WAL and 
began to release the lock, the master was killed.
3. When restarting, the ReopenTableProcedure's lock was restored (since it 
held the lock before), which is not right, since it is now in the WAITING 
state and its holdLock=false.
4. After the restart, the MoveRegionProcedure could execute since its parent 
had the lock, but when it spawned the AssignProcedure, the AssignProcedure 
couldn't execute anymore, since its parent didn't have the lock but its 
'grandpa', the ReopenTableProcedure, did.
5. Restarting the master again left things stuck, because we restore the lock 
for the ReopenTableProcedure each time.

Two fixes:
1. We should not restore the lock if the procedure doesn't hold the lock and 
is in the WAITING state.
2. Procedures that don't hold a lock but whose parent holds the lock should 
also be put at the front of the queue, as an addendum to HBASE-21364.

Discussion:
Should we check the locks of all ancestors, not only the parent? As addressed 
in the comments of the patch, after fixing the issue above, checking the 
parent is currently enough.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21376) Add some verbose log to MasterProcedureScheduler

2018-10-23 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21376:
--

 Summary: Add some verbose log to MasterProcedureScheduler
 Key: HBASE-21376
 URL: https://issues.apache.org/jira/browse/HBASE-21376
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


As discussed in HBASE-21364, we divided the patch there into two: the critical 
part was already committed in HBASE-21364 to branch-2.0 and branch-2.1, but I 
also added some useful logs which need to be committed to all branches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21364) Procedure holds the lock should put to front of the queue after restart

2018-10-23 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21364:
--

 Summary: Procedure holds the lock should put to front of the queue 
after restart
 Key: HBASE-21364
 URL: https://issues.apache.org/jira/browse/HBASE-21364
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


After restoring the procedures from the procedure WALs, we put the runnable 
procedures back into the queue to execute. The order was not a problem before 
HBASE-20846, since the first one to execute would acquire the lock itself. But 
since locks are restored after HBASE-20846, if we execute a procedure without 
the lock before a procedure with the lock in the same queue, there is a race 
condition in which we may not be able to execute any procedures in that queue 
at all.
The race condition is:
1. A procedure that needs to take the table's exclusive lock was put into the 
table's queue, but the table's shared lock was held by a region procedure. 
Since no one held the exclusive lock, the queue was put on the run queue to 
execute. But soon the worker thread saw that the procedure couldn't execute 
because it didn't hold the lock, so it stopped executing and removed the queue 
from the run queue.
2. At the same time, the region procedure, which held the table's shared lock 
and the region's exclusive lock, was put into the table's queue. But since the 
queue was already added to the run queue, it wasn't added again.
3. Because of 1, the table's queue was removed from the run queue.
4. Then no one put the table's queue back, so no worker executed the 
procedures inside it.
A test case in the patch shows how.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (HBASE-21357) RS should abort if OOM in Reader thread

2018-10-22 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21357:
--

 Summary: RS should abort if OOM in Reader thread
 Key: HBASE-21357
 URL: https://issues.apache.org/jira/browse/HBASE-21357
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.8
Reporter: Allan Yang
Assignee: Allan Yang


It is a bit strange: we abort the RS on OOM in the Listener thread, the 
Responder thread and the CallRunner threads, but not in the Reader threads... 
We should abort the RS if OOM happens in a Reader thread, too. If not, the 
reader thread exits because of the OOM and its selector closes. Later 
connections selected to this reader will be ignored:
{code}
try {
  if (key.isValid()) {
    if (key.isAcceptable())
      doAccept(key);
  }
} catch (IOException ignored) {
  if (LOG.isTraceEnabled()) LOG.trace("ignored", ignored);
}
{code}
leaving the client's (or Master's and other RSs') calls waiting until 
SocketTimeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
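A minimal sketch of the behavior HBASE-21357 above proposes: catch OutOfMemoryError in the reader loop and abort the RegionServer, matching what the Listener, Responder and CallRunner threads already do. Here abort() is an invented stand-in for the real RegionServer abort hook.

```java
// Sketch of the proposed behavior: catch OutOfMemoryError in the reader
// loop and abort the RegionServer, matching what the Listener, Responder
// and CallRunner threads already do. abort() is an invented stand-in for
// the real RegionServer abort hook.
public class ReaderOomSketch {
    static boolean aborted;

    static void abort(String why, Throwable cause) {
        aborted = true; // the real hook would shut the process down
    }

    static void runReaderIteration(Runnable doRead) {
        try {
            doRead.run();
        } catch (OutOfMemoryError oom) {
            // Without this, the thread exits, the selector closes, and later
            // connections routed to this reader hang until SocketTimeout.
            abort("OOM in reader thread", oom);
        }
    }

    public static void main(String[] args) {
        runReaderIteration(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println(aborted); // true
    }
}
```

Aborting fast turns a silent hang (clients stuck until SocketTimeout) into an immediate, recoverable process death.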


[jira] [Created] (HBASE-21354) Procedure may be deleted improperly during master restarts resulting in

2018-10-20 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21354:
--

 Summary: Procedure may be deleted improperly during master 
restarts resulting in 
 Key: HBASE-21354
 URL: https://issues.apache.org/jira/browse/HBASE-21354
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [ANNOUNCE] New Committer: Balazs Meszaros

2018-10-11 Thread Allan Yang
Congratulations!, Balazs.
Best Regards
Allan Yang


Reid Chan  于2018年10月12日周五 上午10:26写道:

> Congratulations and welcome, Balazs.
>
> --
>
> Best regards,
> R.C
>
>
>
> 
> From: Umesh Agashe 
> Sent: 12 October 2018 06:43
> To: dev@hbase.apache.org
> Cc: Hbase-User; meszib...@apache.org
> Subject: Re: [ANNOUNCE] New Committer: Balazs Meszaros
>
> Congrats Balazs!
>
> On Thu, Oct 11, 2018 at 3:34 PM Andrew Purtell 
> wrote:
>
> > Congratulations and welcome Balazs.
> >
> > On Thu, Oct 11, 2018 at 12:49 PM Sean Busbey  wrote:
> >
> > > On behalf of the HBase PMC, I'm pleased to announce that Balazs
> > > Meszaros has accepted our invitation to become an HBase committer.
> > >
> > > Thanks for all your hard work Balazs; we look forward to more
> > > contributions!
> > >
> > > Please join me in extending congratulations to Balazs!
> > >
> >
> >
> > --
> > Best regards,
> > Andrew
> >
> > Words like orphans lost among the crosstalk, meaning torn from truth's
> > decrepit hands
> >- A23, Crosstalk
> >
>


Re: [ANNOUNCE] Please welcome Zach York to the HBase PMC

2018-10-11 Thread Allan Yang
Congratulations, Zach !
Best Regards
Allan Yang


Andrew Purtell  于2018年10月12日周五 上午6:34写道:

> Welcome, Zach.
>
> On Thu, Oct 11, 2018 at 1:01 PM Sean Busbey  wrote:
>
> > On behalf of the Apache HBase PMC I am pleased to announce that Zach
> > York has accepted our invitation to become a PMC member on the Apache
> > HBase project. We appreciate Zach stepping up to take more
> > responsibility in the HBase project.
> >
> > Please join me in welcoming Zach to the HBase PMC!
> >
> > As a reminder, if anyone would like to nominate another person as a
> > committer or PMC member, even if you are not currently a committer or
> > PMC member, you can always drop a note to priv...@hbase.apache.org to
> > let us know.
> >
>
>
> --
> Best regards,
> Andrew
>
> Words like orphans lost among the crosstalk, meaning torn from truth's
> decrepit hands
>- A23, Crosstalk
>


[jira] [Created] (HBASE-21292) IdLock.getLockEntry() may hang if interrupted

2018-10-11 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21292:
--

 Summary: IdLock.getLockEntry() may hang if interrupted
 Key: HBASE-21292
 URL: https://issues.apache.org/jira/browse/HBASE-21292
 Project: HBase
  Issue Type: Bug
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 1.4.9, 2.0.2, 2.1.0


This is a rare case found by my colleague which really happened in our 
production environment. 
A thread may hang (or enter an infinite loop) when trying to call 
IdLock.getLockEntry(). Here is the case:
1. Thread1 owned the IdLock, while Thread2 (the only waiter) was waiting for 
it.
2. Thread1 called releaseLockEntry; it set IdLock.locked = false, but since 
Thread2 was waiting, it didn't call map.remove(entry.id).
3. While Thread1 was calling releaseLockEntry, Thread2 was interrupted. So no 
one removed this IdLock entry from the map.
4. If another thread tries to call getLockEntry on this IdLock, it ends up in 
an infinite loop, since (existing = map.putIfAbsent(entry.id, entry)) != null 
and existing.locked == false.

It is hard to write a UT for this since it is a very rare race condition.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
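A simplified model of the orphaned-entry state from HBASE-21292 above, and the cleanup that would fix it. This models the scenario only; it is not the real org.apache.hadoop.hbase.util.IdLock source.

```java
import java.util.concurrent.ConcurrentHashMap;

// Simplified model of the bug (not the real
// org.apache.hadoop.hbase.util.IdLock source): an unlocked Entry orphaned
// in the map makes every later getLockEntry() spin forever, because
// putIfAbsent keeps returning an entry with locked == false that no owner
// will ever remove. The cleanup below removes the entry when the last
// waiter is interrupted.
public class IdLockSketch {
    static class Entry {
        final long id;
        boolean locked;
        int numWaiters;
        Entry(long id) { this.id = id; }
    }

    static final ConcurrentHashMap<Long, Entry> map = new ConcurrentHashMap<>();

    // Called when a waiter is interrupted (step 3 of the scenario).
    static void onWaiterInterrupted(Entry entry) {
        synchronized (entry) {
            entry.numWaiters--;
            if (!entry.locked && entry.numWaiters == 0) {
                map.remove(entry.id); // the fix: never leave an orphan entry
            }
        }
    }

    public static void main(String[] args) {
        Entry entry = new Entry(42L);
        entry.locked = false;  // owner already called releaseLockEntry
        entry.numWaiters = 1;  // one waiter, about to be interrupted
        map.put(entry.id, entry);
        onWaiterInterrupted(entry);
        System.out.println(map.containsKey(42L)); // false: no orphan left
    }
}
```

The key design point is that whichever thread leaves last, owner or interrupted waiter, must be responsible for removing the map entry.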


[jira] [Created] (HBASE-21288) HostingServer in UnassignProcedure is not accurate

2018-10-10 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21288:
--

 Summary: HostingServer in UnassignProcedure is not accurate
 Key: HBASE-21288
 URL: https://issues.apache.org/jira/browse/HBASE-21288
 Project: HBase
  Issue Type: Sub-task
  Components: amv2, Balancer
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


We have a case where a region shows status OPEN on an already dead server in 
the meta table (it is hard to trace how this happened), meaning the region is 
actually not online. But the balancer came along and scheduled a 
MoveRegionProcedure for this region, which created a mess:
The balancer 'thought' this region was on the online server which has the 
same address (but a different startcode). So it scheduled an MRP from this 
online server to another, but the UnassignProcedure dispatched the unassign 
call to the dead server according to the region state, which then found the 
server dead and scheduled an SCP for the dead server. But since the 
UnassignProcedure's hostingServer was not accurate, the SCP couldn't 
interrupt it.
So, in the end, the SCP couldn't finish since the UnassignProcedure held the 
region's lock, and the UnassignProcedure couldn't finish since no one woke 
it; thus, stuck.

Here is the log. Notice that the server of the UnassignProcedure is 
'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584' but it was 
dispatched to 'hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964':

{code}
2018-10-10 14:34:50,011 INFO  [PEWorker-4] 
assignment.RegionTransitionProcedure(252): Dispatch pid=13, ppid=12, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964
2018-10-10 14:34:50,011 WARN  [PEWorker-4] 
assignment.RegionTransitionProcedure(230): Remote call failed 
hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584; rit=CLOSING, 
location=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; 
exception=NoServerDispatchException
org.apache.hadoop.hbase.procedure2.NoServerDispatchException: 
hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964; pid=13, ppid=12, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; UnassignProcedure 
table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f, 
server=hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539153278584

//Then a SCP was scheduled
2018-10-10 14:34:50,012 WARN  [PEWorker-4] master.ServerManager(635): 
Expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. ,16020,1539076734964 but 
server not online
2018-10-10 14:34:50,012 INFO  [PEWorker-4] master.ServerManager(615): 
Processing expiration of hb-uf6oyi699w8h700f0-003.hbase.rds. 
,16020,1539076734964 on hb-uf6oyi699w8h700f0-001.hbase.rds. ,16000,1539088156164
2018-10-10 14:34:50,017 DEBUG [PEWorker-4] procedure2.ProcedureExecutor(1089): 
Stored pid=14, state=RUNNABLE:SERVER_CRASH_START, hasLock=false; 
ServerCrashProcedure server=hb-uf6oyi699w8h700f0-003.hbase.rds. 
,16020,1539076734964, splitWal=true, meta=false

//The SCP did not interrupt the UnassignProcedure but schedule new 
AssignProcedure for this region
2018-10-10 14:34:50,043 DEBUG [PEWorker-6] procedure.ServerCrashProcedure(250): 
Done splitting WALs pid=14, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, 
hasLock=true; ServerCrashProcedure server=hb-uf6oyi699w8h700f0-003.hbase.rds. 
,16020,1539076734964, splitWal=true, meta=false
2018-10-10 14:34:50,054 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1691): 
Initialized subprocedures=[{pid=15, ppid=14, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=hbase:acl, region=267335c85766c62479fb4a5f18a1e95f}, {pid=16, ppid=14, 
state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; AssignProcedure 
table=hbase:req_intercept_rule, region=460481706415d776b3742f428a6f579b}, 
{pid=17, ppid=14, state=RUNNABLE:REGION_TRANSITION_QUEUE, hasLock=false; 
AssignProcedure table=hbase:namespace, region=ec7a965e7302840120a5d8289947c40b}]
{code}


Here I also added a safety fence in the balancer: if such regions are found, 
balancing is skipped to be safe. It should do no harm.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
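The balancer safety fence mentioned at the end of HBASE-21288 above can be sketched as a simple check, with invented names (not the real balancer API): skip balancing entirely when meta claims a region is OPEN on a server that is not in the live-server set.

```java
import java.util.Set;

// Sketch of the safety fence (illustrative names, not the real balancer
// API): skip balancing entirely when meta claims a region is OPEN on a
// server that is not in the live-server set, e.g. a dead server with the
// same host:port but an older startcode.
public class BalancerFenceSketch {
    static boolean safeToBalance(Set<String> liveServers, Set<String> regionLocations) {
        for (String location : regionLocations) {
            if (!liveServers.contains(location)) {
                return false; // region "online" on a dead server: do not balance
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("rs-003,16020,1539153278584");
        Set<String> locations = Set.of("rs-003,16020,1539076734964"); // old startcode
        System.out.println(safeToBalance(live, locations)); // false: skip balancing
    }
}
```

Comparing the full ServerName (including startcode) rather than just host:port is what catches the "same address, different startcode" confusion in the report.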


[jira] [Created] (HBASE-21253) Backport HBASE-21244 Skip persistence when retrying for assignment related procedures to branch-2.0 and branch-2.1

2018-09-28 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21253:
--

 Summary: Backport HBASE-21244 Skip persistence when retrying for 
assignment related procedures to branch-2.0 and branch-2.1 
 Key: HBASE-21253
 URL: https://issues.apache.org/jira/browse/HBASE-21253
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


See HBASE-21244



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSSION] Adding hbck2 on a point release

2018-09-26 Thread Allan Yang
+1 for adding hbck2 in a point release. Agree that it is more "operator
tooling" than a "new feature".
Best Regards
Allan Yang


Sean Busbey  于2018年9月27日周四 上午12:12写道:

> This sounds fine to me. More like "operator tooling" than "new
> feature" and we're more lax on that stuff.
> On Wed, Sep 26, 2018 at 5:50 AM Stack  wrote:
> >
> > Unless objection, I was going to add support for hbck2[1] into branch-2.0
> > and branch-2.1; i.e. hbck2 support would show up on a point release in
> > 2.1.1 and 2.0.3. hbck2 is made up of a client tool that lives at
> > hbase-hbck2 and a new HbckService hosted by the Master. It is the latter
> > that would be added on point release.
> >
> > This goes against our general philosophy of bug-fixes only on
> point-release
> > but the thinking is that we make an exception for a critical 'fixup'
> tool.
> >
> > Some notes:
> >
> >  * Above comes of some discussion done on the tail of
> > https://issues.apache.org/jira/browse/HBASE-19121
> >  * The 2.1.0 and <= 2.0.2 releases have no hbck2 support. The hbck2 tool
> > will exit and ask the user upgrade.
> >  * The suggestion that hbck2 only show up in the next minor release --
> > 2.2.0 -- was shot down because it would leave branch-2.0 and branch-2.1
> > releases without a fixup tooling.
> >  * The HbckService is distinct and should not disturb normal operation.
> It
> > exposes new API for the hbck2 client tool to pull on. It is not exposed
> as
> > 'public' and is awkward to get at so should not show up on user's radar
> > other than via the hbck2 client tool (TODO: verify).
> >
> > Shout if you think otherwise.
> > Thanks,
> > S
> >
> > 1. hbck2 is the replacement for the original hbck tool. hbck (A.K.A
> hbck1)
> > does not work against hbase-2.x clusters.
>


[jira] [Created] (HBASE-21237) Use CompatRemoteProcedureResolver to dispatch open/close region requests to RS

2018-09-26 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21237:
--

 Summary: Use CompatRemoteProcedureResolver to dispatch open/close 
region requests to RS
 Key: HBASE-21237
 URL: https://issues.apache.org/jira/browse/HBASE-21237
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


As discussed in HBASE-21217, in branch-2.0 and branch-2.1 we should use
CompatRemoteProcedureResolver instead of ExecuteProceduresRemoteCall to
dispatch region open/close requests to the RS. ExecuteProceduresRemoteCall
groups all the open/close operations into one call and executes them
sequentially on the target RS. If one operation fails, all the operations are
marked as failed, even though some of them (like open region) are already
executing in the open region handler thread. The master then thinks these
operations failed and reassigns the regions to another RS. So when the
previous RS reports to the master that the region is online, the master kills
that RS, since it has already assigned the region to another RS.
For branch-2.2+, HBASE-21217 will fix this issue.





[jira] [Created] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21228:
--

 Summary: Memory leak since AbstractFSWAL caches Thread object and 
never clean later
 Key: HBASE-21228
 URL: https://issues.apache.org/jira/browse/HBASE-21228
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.7, 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and
SyncFutures.
{code}
/**
   * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse 
SyncFutures.
   * 
   * TODO: Reuse FSWALEntry's rather than create them anew each time as we do 
SyncFutures here.
   * 
   * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather 
than have them get
   * them from this Map?
   */
  private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
{code}

A colleague of mine found a memory leak caused by this map.

Every thread that writes the WAL is cached in this map, and no one cleans the
entries even after the thread is dead.

In one of our customer's clusters, we noticed that even though there were no
requests, the heap of the RS was almost full and CMS GC was triggered every
second. We dumped the heap and found more than 30 thousand threads in
Terminated state, all cached in the map above. Everything referenced by these
threads was leaked. Most of the threads were:
1. PostOpenDeployTasksThread, which writes the Open Region mark in the WAL
2. hconnection-0x1f838e31-shared--pool threads, which are used to write the
index short circuit (Phoenix); the WAL is written and synced in these threads
3. Index writer threads (Phoenix), which are referenced by RegionEnvironment,
then by HRegion, and finally by PostOpenDeployTasksThread

We should turn this map into a thread-local one and let the JVM GC the
terminated threads for us.
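The mechanism behind the leak, and why a thread-local fixes it, can be sketched as follows. This is a simplified illustration, not the actual AbstractFSWAL code: the SyncFuture values are stood in for by plain Objects, and the class and method names are invented for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified illustration: a map keyed by Thread strongly references every
// handler thread (and everything the thread references) even after the thread
// terminates, while a ThreadLocal entry becomes collectible with its thread.
public class SyncFutureCache {
    // Leaky variant: entries for terminated threads are never removed.
    private final Map<Thread, Object> syncFuturesByHandler = new ConcurrentHashMap<>();

    // Thread-local variant: the JVM can reclaim the entry along with the thread.
    private final ThreadLocal<Object> syncFutureLocal =
            ThreadLocal.withInitial(Object::new);

    public Object getLeaky() {
        return syncFuturesByHandler.computeIfAbsent(Thread.currentThread(), t -> new Object());
    }

    public Object getThreadLocal() {
        return syncFutureLocal.get();
    }

    public int cachedThreads() {
        return syncFuturesByHandler.size();
    }

    public static void main(String[] args) throws InterruptedException {
        SyncFutureCache cache = new SyncFutureCache();
        // Simulate many short-lived WAL writer threads (e.g. PostOpenDeployTasksThread).
        for (int i = 0; i < 100; i++) {
            Thread t = new Thread(cache::getLeaky);
            t.start();
            t.join();
        }
        // All 100 threads are dead, yet still strongly referenced by the map.
        System.out.println(cache.cachedThreads());
    }
}
```

With the thread-local variant there is nothing to clean up manually: when a handler thread terminates, its entry goes with it.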






[jira] [Created] (HBASE-21212) Wrong flush time when update flush metric

2018-09-19 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21212:
--

 Summary: Wrong flush time when update flush metric
 Key: HBASE-21212
 URL: https://issues.apache.org/jira/browse/HBASE-21212
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.2, 2.1.0, 3.0.0
Reporter: Allan Yang
Assignee: Allan Yang








Re: [VOTE] Second release candidate for hbase-2.0.2 (RC1) is available for download

2018-08-29 Thread Allan Yang
+1 (non-binding)

-- Build from src - OK
-- SHA and Signatures(both src and bin) - OK
-- start in distributed mode - OK
-- Basic shell commands[create/drop/alter/get/put/scan] - OK
-- Check Web UI - OK
-- ITBLL 1B row for 1 run - OK

Best Regards
Allan Yang


Stack  于2018年8月29日周三 下午1:05写道:

> Oh, +1 from me!
> S
>
> On Tue, Aug 28, 2018 at 9:38 PM Stack  wrote:
>
> > The second release candidate for Apache HBase 2.0.2 (RC1) is available
> > for download and testing. This is a bug fix release with 100+ commits
> [1].
> > Release notes are available here [2]. The compatibility report is at [3].
> >
> > Artifacts are available here:
> >
> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.2RC1/
> >
> > Corresponding convenience artifacts for maven use are in the staging
> > repository:
> >
> >  https://repository.apache.org/content/repositories/orgapachehbase-1231
> >
> > All artifacts are signed with my code signing key, 8ACC93D2, which is
> > also in the project KEYS file:
> >
> > http://www.apache.org/dist/hbase/KEYS
> >
> > These artifacts correspond to commit ref
> >
> >  1cfab033e779df840d5612a85277f42a6a4e8172
> >
> > which has been tagged as 2.0.2RC1.
> >
> > Please take a few minutes to verify the release and vote on releasing it:
> >
> > [ ] +1 Release these artifacts as Apache HBase 2.0.2
> > [ ] -1 Do not release this package because ...
> >
> > This VOTE thread will remain open for at least 72 hours. It will close
> for
> > sure
> > Saturday morning.
> >
> > Thanks,
> > S
> >
> > 1.
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.2RC1/CHANGES.md
> > 2.
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.2RC1/RELEASENOTES.md
> > 3.
> >
> https://dist.apache.org/repos/dist/dev/hbase/hbase-2.0.2RC1/compatibility_report_2.0.1_2.0.2.html
> >
>


Re: Pictures, Videos and Slides for HBaseConAsia2018

2018-08-22 Thread Allan Yang
Great work! Thanks, Yu Li!
Best Regards
Allan Yang


Yu Li  于2018年8月22日周三 下午6:03写道:

> Hi all,
>
> HBaseConAsia2018 is successfully held on Aug. 17th in Beijing, China and
> please following below links for a quick review:
>
> Pictures:
> https://drive.google.com/drive/folders/1eGuNI029a78s_BdH37VsSr4uOalyLi5O
>
> Slides and Video recording:
> https://yq.aliyun.com/articles/626119
>
> Enjoy it and let's expect the next year!
>
> Yu - on behalf of HBaseConAsia2018 PC
>


[jira] [Created] (HBASE-21085) Adding getter methods to some private fields in ProcedureV2 module

2018-08-21 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21085:
--

 Summary: Adding getter methods to some private fields in 
ProcedureV2 module 
 Key: HBASE-21085
 URL: https://issues.apache.org/jira/browse/HBASE-21085
 Project: HBase
  Issue Type: Sub-task
Reporter: Allan Yang
Assignee: Allan Yang


Many fields in the ProcedureV2 module are private. Adding getter methods to
them makes them more transparent.
Some classes in the ProcedureV2 module are private too; we should make them
public.





[jira] [Created] (HBASE-21083) Introduce a mechanism to bypass the execution of a stuck procedure

2018-08-21 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21083:
--

 Summary: Introduce a mechanism to bypass the execution of a stuck 
procedure
 Key: HBASE-21083
 URL: https://issues.apache.org/jira/browse/HBASE-21083
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Discussed offline with [~stack] and [~Apache9]. We all agreed that we need to
introduce a mechanism to 'force complete' a stuck procedure, so the AMv2 can
continue running.
We still have some unrevealed bugs hiding in our AMv2 and ProcedureV2 system,
and we need something to interfere with stuck procedures before HBCK2 can
work. This is crucial for a production-ready system.

For now, we have few ways to interfere with running procedures. Aborting them
is not a good choice, since some procedures are not abortable, and some
procedures may have overridden the abort() method, which will ignore the abort
request.

So, here, I will introduce a mechanism to bypass the execution of a stuck
procedure.
Basically, I added a field called 'bypass' to the Procedure class. If we set
this field to true, all the logic in execute/rollback is skipped, letting this
procedure and its ancestors complete normally and release their lock resources
at last.

Notice that bypassing a procedure may leave the cluster in an intermediate
state, e.g. the region not assigned, or some HDFS files left behind.
Operators need to know the side effects of bypassing and recover the
inconsistent state of the cluster themselves, e.g. by issuing new procedures
to assign the regions.

A patch will be uploaded and a review board will be opened. For now, only APIs
in ProcedureExecutor are provided. If everything is fine, I will add it to the
master service and add a shell command to bypass a procedure. Or maybe we can
use dynamically compiled JSPs to execute those APIs, as mentioned in
HBASE-20679.
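The bypass idea can be reduced to a minimal sketch. This is illustrative only; the class below is not the real Procedure from the ProcedureV2 module, and the method names are invented for the sketch.

```java
// Illustrative sketch: a 'bypass' flag that short-circuits execute(), letting
// the procedure complete normally and release its locks without running its
// (possibly stuck) logic.
public class BypassableProcedure {
    private volatile boolean bypass = false;
    private boolean locked = false;
    private boolean finished = false;

    public void setBypass() { bypass = true; }
    public boolean isFinished() { return finished; }
    public boolean holdsLock() { return locked; }

    // Simulates a stuck execute body; never returns if actually called.
    protected void stuckWork() {
        while (true) { }
    }

    public void execute() {
        locked = true;          // acquire the exclusive lock
        if (!bypass) {          // normal path: run the (stuck) logic
            stuckWork();
        }
        // A bypassed procedure completes normally and releases its lock.
        finished = true;
        locked = false;
    }
}
```

The important property is the last two lines: because the bypassed procedure goes through the normal completion path, its ancestors resume and the lock resources are released, exactly as described above.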






[jira] [Created] (HBASE-21051) Possible NPE if ModifyTable and region split happen at the same time

2018-08-14 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21051:
--

 Summary: Possible NPE if ModifyTable and region split happen at 
the same time
 Key: HBASE-21051
 URL: https://issues.apache.org/jira/browse/HBASE-21051
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Similar to HBASE-20921, the ModifyTable procedure and reopenProcedure don't
hold the lock, so other procedures like split/merge can execute at the same
time.

1. A split happened during ModifyTable; as you can see from the log, the split
was nearly complete.
{code}
2018-08-05 01:28:31,339 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1659): 
Finished subprocedure(s) of pid=772, 
state=RUNNABLE:SPLIT_TABLE_REGION_POST_OPERATION, hasLock=true; 
SplitTableRegionProce
dure table=IntegrationTestBigLinkedList, 
parent=357a7a6a62c76bc2d7ab30a6cc812637, 
daughterA=b13e5d155b65a5f752f3adda78fcfb6a, 
daughterB=5be3aadcee68d91c3d1e464865550246; resume parent processing.
2018-08-05 01:28:31,345 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1296): 
Finished pid=795, ppid=772, state=SUCCESS, hasLock=false; AssignProcedure 
table=IntegrationTestBigLinkedList, region=b13e5
d155b65a5f752f3adda78fcfb6a, target=e010125048016.bja,60020,1533402809226 in 
5.0280sec
{code}

2. reopenProcedure began to reopen region by moving it
{code}
2018-08-05 01:28:31,389 INFO  [PEWorker-11] 
procedure.MasterProcedureScheduler(631): pid=781, ppid=774, 
state=RUNNABLE:MOVE_REGION_UNASSIGN, hasLock=false; MoveRegionProcedure 
hri=357a7a6a62c76bc2d7ab3
0a6cc812637, source=e010125048016.bja,60020,1533402809226, 
destination=e010125048016.bja,60020,1533402809226 checking lock on 
357a7a6a62c76bc2d7ab30a6cc812637
2018-08-05 01:28:31,390 INFO  [PEWorker-3] procedure2.ProcedureExecutor(1296): 
Finished pid=772, state=SUCCESS, hasLock=false; SplitTableRegionProcedure 
table=IntegrationTestBigLinkedList, parent=357a7
a6a62c76bc2d7ab30a6cc812637, daughterA=b13e5d155b65a5f752f3adda78fcfb6a, 
daughterB=5be3aadcee68d91c3d1e464865550246 in 21.9050sec
2018-08-05 01:28:31,518 INFO  [PEWorker-11] procedure2.ProcedureExecutor(1533): 
Initialized subprocedures=[{pid=797, ppid=781, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedur
e table=IntegrationTestBigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, 
server=e010125048016.bja,60020,1533402809226}]
2018-08-05 01:28:31,530 INFO  [PEWorker-15] 
procedure.MasterProcedureScheduler(631): pid=797, ppid=781, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=false; UnassignProcedure 
table=IntegrationTest
BigLinkedList, region=357a7a6a62c76bc2d7ab30a6cc812637, 
server=e010125048016.bja,60020,1533402809226 checking lock on 
357a7a6a62c76bc2d7ab30a6cc812637
{code}

3. MoveRegionProcedure fails since the region no longer exists (due to the
split)
{code}
2018-08-05 01:28:31,543 ERROR [PEWorker-15] procedure2.ProcedureExecutor(1517): 
CODE-BUG: Uncaught runtime exception: pid=797, ppid=781, 
state=RUNNABLE:REGION_TRANSITION_DISPATCH, hasLock=true; Unassig
nProcedure table=IntegrationTestBigLinkedList, 
region=357a7a6a62c76bc2d7ab30a6cc812637, 
server=e010125048016.bja,60020,1533402809226
java.lang.NullPointerException
at 
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
at 
org.apache.hadoop.hbase.master.assignment.RegionStates.getOrCreateServer(RegionStates.java:1097)
at 
org.apache.hadoop.hbase.master.assignment.RegionStates.addRegionToServer(RegionStates.java:1125)
at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.markRegionAsClosing(AssignmentManager.java:1455)
at 
org.apache.hadoop.hbase.master.assignment.UnassignProcedure.updateTransition(UnassignProcedure.java:204)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:349)
at 
org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure.execute(RegionTransitionProcedure.java:101)
at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:873)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1498)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1278)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$900(ProcedureExecutor.java:76)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1785)
{code}

We need to think about this case and find a proper solution for it; otherwise,
issues like this one and HBASE-20921 will keep coming.





[jira] [Created] (HBASE-21050) Exclusive lock may be held by a SUCCESS state procedure forever

2018-08-14 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21050:
--

 Summary: Exclusive lock may be held by a SUCCESS state procedure 
forever
 Key: HBASE-21050
 URL: https://issues.apache.org/jira/browse/HBASE-21050
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


After HBASE-20846, we restore lock info for procedures. But there is a case
where the lock can be held by an already successful procedure. Since the
procedure won't execute again, the lock will be held by it forever.

1. All children of pid=1208 had finished, but before procedure 1208 awoke, the
master was killed
{code}
2018-08-05 02:20:14,465 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1659): 
Finished subprocedure(s) of pid=1208, ppid=1206, state=RUNNABLE, hasLock=true; 
MoveRegionProcedure hri=c2a23a735f16df57299
dba6fd4599f2f, source=e010125050127.bja,60020,1533403109034, 
destination=e010125050127.bja,60020,1533403109034; resume parent processing.

2018-08-05 02:20:14,466 INFO  [PEWorker-8] procedure2.ProcedureExecutor(1296): 
Finished pid=1232, ppid=1208, state=SUCCESS, hasLock=false; AssignProcedure 
table=IntegrationTestBigLinkedList, region=c2a
23a735f16df57299dba6fd4599f2f, target=e010125050127.bja,60020,1533403109034 in 
1.5060sec
{code}

2. The master restarts; since procedure 1208 held the lock before the restart,
the lock was restored for it
{code}
2018-08-05 02:20:30,803 DEBUG [Thread-15] procedure2.ProcedureExecutor(456): 
Loading pid=1208, ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure 
hri=c2a23a735f16df57299dba6fd4599f2f, source=
e010125050127.bja,60020,1533403109034, 
destination=e010125050127.bja,60020,1533403109034

2018-08-05 02:20:30,818 DEBUG [Thread-15] procedure2.Procedure(898): pid=1208, 
ppid=1206, state=SUCCESS, hasLock=false; MoveRegionProcedure 
hri=c2a23a735f16df57299dba6fd4599f2f, source=e010125050127.bj
a,60020,1533403109034, destination=e010125050127.bja,60020,1533403109034 held 
the lock before restarting, call acquireLock to restore it.

2018-08-05 02:20:30,818 INFO  [Thread-15] 
procedure.MasterProcedureScheduler(631): pid=1208, ppid=1206, state=SUCCESS, 
hasLock=false; MoveRegionProcedure hri=c2a23a735f16df57299dba6fd4599f2f, 
source=e0
10125050127.bja,60020,1533403109034, 
destination=e010125050127.bja,60020,1533403109034 checking lock on 
c2a23a735f16df57299dba6fd4599f2f
{code}

3. Since procedure 1208 is already successful, it won't execute later, so the
lock will be held by it forever

We need to check the state of the procedure before restoring locks: if the
procedure is already finished (success or rollback), we do not need to acquire
the lock for it.
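The proposed check can be sketched like this. The enum and method names are illustrative stand-ins, not the real MasterProcedureScheduler API.

```java
// Illustrative sketch: only restore a lock for a procedure that will actually
// run again. A finished (SUCCESS or ROLLEDBACK) procedure never executes
// again, so nothing would ever release the restored lock.
public class LockRestore {
    public enum State { RUNNABLE, WAITING, SUCCESS, ROLLEDBACK }

    public static boolean isFinished(State s) {
        return s == State.SUCCESS || s == State.ROLLEDBACK;
    }

    public static boolean shouldRestoreLock(State state, boolean heldLockBeforeRestart) {
        // Restore only for unfinished procedures that held the lock before restart.
        return heldLockBeforeRestart && !isFinished(state);
    }
}
```

In the scenario above, procedure 1208 is in SUCCESS state while holding the lock before restart, so the check would skip the restore and the lock would stay free.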





[jira] [Created] (HBASE-21041) Memstore's heap size will be decreased to minus zero after flush

2018-08-13 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21041:
--

 Summary: Memstore's heap size will be decreased to minus zero 
after flush
 Key: HBASE-21041
 URL: https://issues.apache.org/jira/browse/HBASE-21041
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


When creating an active mutable segment (MutableSegment) in the memstore, the
MutableSegment's deep overhead (208 bytes) was added to its own heap size, but
not to the region's memstore heap size. The same goes for the immutable
segment (CSLMImmutableSegment, an additional 8 bytes) that the mutable segment
is later turned into. So after one flush, the memstore's heap size is
decreased to -216 bytes, and the negative number accumulates after every
flush. CompactingMemstore has this problem too.

We need to record the overhead of CSLMImmutableSegment and MutableSegment in
the corresponding region's memstore size.

For CellArrayImmutableSegment, CellChunkImmutableSegment and
CompositeImmutableSegment, it is not necessary to do so, because inside
CompactingMemstore the overheads are already taken care of when transferring a
CSLMImmutableSegment into them.
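The arithmetic of the drift can be sketched as below. The overhead constants are the values quoted in the report; the class and method names are invented for the sketch and the real accounting is spread across several HBase classes.

```java
// Illustrative sketch of the accounting drift: the segment's own heap size
// includes the object overheads, but the region-level counter never had them
// added, so every flush subtracts 216 bytes more than was ever counted.
public class MemstoreOverheadDrift {
    static final long MUTABLE_SEGMENT_DEEP_OVERHEAD = 208; // MutableSegment
    static final long CSLM_IMMUTABLE_EXTRA = 8;            // CSLMImmutableSegment

    // Region counter after one create-then-flush cycle when overheads are
    // recorded only on the segment, not on the region (the bug).
    public static long regionHeapAfterOneFlush(long cellDataHeap) {
        long regionHeap = cellDataHeap;  // overheads were never added here
        long segmentHeap = cellDataHeap + MUTABLE_SEGMENT_DEEP_OVERHEAD + CSLM_IMMUTABLE_EXTRA;
        regionHeap -= segmentHeap;       // flush removes the full segment heap size
        return regionHeap;               // -216 regardless of how much data was written
    }
}
```

Recording the two overheads on the region counter at segment-creation time makes the subtraction symmetric, so the counter returns to zero after a flush.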





[jira] [Created] (HBASE-21035) Meta Table should be able to online even if all procedures are lost

2018-08-10 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21035:
--

 Summary: Meta Table should be able to online even if all 
procedures are lost
 Key: HBASE-21035
 URL: https://issues.apache.org/jira/browse/HBASE-21035
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


After HBASE-20708, we changed the way we init after the master starts. It only
checks WAL dirs and compares them to the Zookeeper RS nodes to decide which
servers need to be expired. For servers whose dir ends with 'SPLITTING', we
assume that there will be a SCP for them.

But if the server with the meta region crashed before the master restarts, and
all the procedure WALs are lost (due to a bug, or deleted manually, whatever),
the newly restarted master will be stuck when initing, since no one will bring
the meta region online.

Although it is an abnormal case, I think that no matter what happens, we need
to bring the meta region online. Otherwise we are sitting ducks; nothing can
be done.





[jira] [Reopened] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-10 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-20976:


> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>    Reporter: Allan Yang
>    Assignee: Allan Yang
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. a RS crashed, a SCP was submitted for it
> 2. before this SCP finish, the Master crashed
> 3. The new master will scan the meta table and find some region is still open 
> on a dead server
> 4. The new master submit a SCP for the dead server again
> The two SCP for the same RS can even execute concurrently if without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix solution in the patch.
> Another case that SCP might be scheduled multiple times for the same RS(with 
> HBASE-20708.):
> 1.  a RS crashed, a SCP was submitted for it
> 2. A new RS on the same host started; the old RS's ServerName was removed from
> DeadServer.deadServers
> 3. after the SCP passed the Handle_RIT state, a UnassignProcedure need to 
> send a close region operation to the crashed RS
> 4. The UnassignProcedure's dispatch failed since 'NoServerDispatchException'
> 5. Begin to expire the RS, but only find it not online and not in deadServer 
> list, so a SCP was submitted for the same RS again
>  





[jira] [Created] (HBASE-21031) Memory leak if replay edits failed during region opening

2018-08-09 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21031:
--

 Summary: Memory leak if replay edits failed during region opening
 Key: HBASE-21031
 URL: https://issues.apache.org/jira/browse/HBASE-21031
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Due to HBASE-21029, when replaying edits with a lot of identical cells, the
memstore won't flush, and an exception is thrown when all heap space is used:
{code}
2018-08-06 15:52:27,590 ERROR 
[RS_OPEN_REGION-regionserver/hb-bp10cw4ejoy0a2f3f-009:16020-2] 
handler.OpenRegionHandler(302): Failed open of 
region=hbase_test,dffa78,1531227033378.cbf9a2daf3aaa0c7e931e9c9a7b53f41., 
starting to roll back the global memstore size.
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
at 
org.apache.hadoop.hbase.regionserver.OnheapChunk.allocateDataBuffer(OnheapChunk.java:41)
at org.apache.hadoop.hbase.regionserver.Chunk.init(Chunk.java:104)
at 
org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:226)
at 
org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:180)
at 
org.apache.hadoop.hbase.regionserver.ChunkCreator.getChunk(ChunkCreator.java:163)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.getOrMakeChunk(MemStoreLABImpl.java:273)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:148)
at 
org.apache.hadoop.hbase.regionserver.MemStoreLABImpl.copyCellInto(MemStoreLABImpl.java:111)
at 
org.apache.hadoop.hbase.regionserver.Segment.maybeCloneWithAllocator(Segment.java:178)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.maybeCloneWithAllocator(AbstractMemStore.java:287)
at 
org.apache.hadoop.hbase.regionserver.AbstractMemStore.add(AbstractMemStore.java:107)
at org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:706)
at 
org.apache.hadoop.hbase.regionserver.HRegion.restoreEdit(HRegion.java:5494)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEdits(HRegion.java:4608)
at 
org.apache.hadoop.hbase.regionserver.HRegion.replayRecoveredEditsIfAny(HRegion.java:4404)
{code}
After this exception, the memstore was not rolled back, and since MSLAB is
used, all the allocated chunks are never released. That memory is leaked
forever...

We need to roll back the memory if opening the region fails (for now, only the
global memstore size is decreased after a failure).

Another problem is that we use replayEditsPerRegion in RegionServerAccounting
to record how much memory is used during replaying, and decrease the global
memstore size by that amount if the replay fails. This is not right: during
replaying we may also flush the memstore, so the size in the
replayEditsPerRegion map is not accurate at all!
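A toy model shows why rolling back with the replayed-edits counter over-corrects: flushes that happen mid-replay reduce the global size but not the counter. The class and field names below are illustrative analogues, not the real RegionServerAccounting API.

```java
// Illustrative model: the global memstore size goes down when a flush happens
// during replay, but the per-region "replayed" counter does not, so using that
// counter to roll back after a failed open drives the global size negative.
public class ReplayRollbackModel {
    long globalMemstoreSize = 0;
    long replayedForRegion = 0;   // replayEditsPerRegion analogue

    public void replayEdit(long size) {
        globalMemstoreSize += size;
        replayedForRegion += size;
    }

    public void flushDuringReplay(long flushedSize) {
        globalMemstoreSize -= flushedSize;  // the replayed counter is NOT reduced
    }

    // Rollback on a failed region open, as currently done.
    public long rollbackAndGetGlobal() {
        globalMemstoreSize -= replayedForRegion;
        return globalMemstoreSize;          // negative if any mid-replay flush happened
    }
}
```

Replaying 200 bytes, flushing 100 mid-replay, then rolling back the full 200 leaves the global counter at -100, which is the kind of drift the report describes.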





[jira] [Created] (HBASE-21029) Miscount of memstore's heap/offheap size if same cell was put

2018-08-08 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21029:
--

 Summary: Miscount of memstore's heap/offheap size if same cell was 
put
 Key: HBASE-21029
 URL: https://issues.apache.org/jira/browse/HBASE-21029
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


We are now using memstore.heapSize() + memstore.offheapSize() to decide
whether a flush is needed. But if the same cell is put into the memstore
again, only the memstore's dataSize is increased; the heap/offheap size is
not. Actually, if MSLAB is used, the heap/offheap size increases whether or
not the cell is added. IIRC, the memstore's heap/offheap size should always be
bigger than the data size. We introduced heap/offheap size besides data size
to reflect the memory footprint more precisely.
{code}
// If there's already a same cell in the CellSet and we are using MSLAB, we 
must count in the
// MSLAB allocation size as well, or else there will be memory leak 
(occupied heap size larger
// than the counted number)
if (succ || mslabUsed) {
  cellSize = getCellLength(cellToAdd);
}
// heap/offheap size is changed only if the cell is truly added in the 
cellSet
long heapSize = heapSizeChange(cellToAdd, succ);
long offHeapSize = offHeapSizeChange(cellToAdd, succ);
{code}
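The intent of the snippet above can be reduced to a small rule. This is a simplified illustration with invented names, not the real Segment accounting code.

```java
// Simplified illustration: with MSLAB, chunk space is consumed even when a
// duplicate cell is not added to the CellSet, so the heap-size delta must be
// counted in that case too, or the counter under-reports real heap usage.
public class MslabHeapDelta {
    // Buggy rule: heap size changes only if the cell was truly added ('succ').
    public static long buggyHeapDelta(long cellSize, boolean added, boolean mslabUsed) {
        return added ? cellSize : 0;
    }

    // Rule matching the snippet above: count the MSLAB allocation as well.
    public static long fixedHeapDelta(long cellSize, boolean added, boolean mslabUsed) {
        return (added || mslabUsed) ? cellSize : 0;
    }
}
```

For a duplicate put with MSLAB enabled, the buggy rule counts 0 bytes while a chunk allocation of cellSize bytes actually happened; the fixed rule counts it.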





[jira] [Resolved] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-08-07 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-20976.

Resolution: Invalid

> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>    Reporter: Allan Yang
>    Assignee: Allan Yang
>Priority: Major
> Fix For: 2.0.2
>
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. a RS crashed, a SCP was submitted for it
> 2. before this SCP finish, the Master crashed
> 3. The new master will scan the meta table and find some region is still open 
> on a dead server
> 4. The new master submit a SCP for the dead server again
> The two SCP for the same RS can even execute concurrently if without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix solution in the patch.





[jira] [Created] (HBASE-21003) Fix the flaky TestSplitOrMergeStatus

2018-08-03 Thread Allan Yang (JIRA)
Allan Yang created HBASE-21003:
--

 Summary: Fix the flaky TestSplitOrMergeStatus
 Key: HBASE-21003
 URL: https://issues.apache.org/jira/browse/HBASE-21003
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


TestSplitOrMergeStatus.testSplitSwitch() is flaky because:
{code}
//Set the split switch to false
boolean[] results = admin.setSplitOrMergeEnabled(false, false, 
MasterSwitchType.SPLIT);
..
//Split the region
admin.split(t.getName());
int count = admin.getTableRegions(tableName).size();
assertTrue(originalCount == count);
//Set the split switch to true. Actually, the last split procedure may not
//have started yet on master.
//So, after setting the switch to true, the last split operation may
//succeed, which is not
//expected
results = admin.setSplitOrMergeEnabled(true, false, MasterSwitchType.SPLIT);
assertEquals(1, results.length);
assertFalse(results[0]);
//Since the last split succeeded, splitting the region again will end up with
//a DoNotRetryRegionException here
admin.split(t.getName());
{code}

{code}
org.apache.hadoop.hbase.client.DoNotRetryRegionException: 
3f16a57c583e6ecf044c5b7de2e97121 is not OPEN; 
regionState={3f16a57c583e6ecf044c5b7de2e97121 state=SPLITTING, 
ts=1533239385789, server=asf911.gq1.ygridcore.net,60061,1533239369899}
 at 
org.apache.hadoop.hbase.master.procedure.AbstractStateMachineTableProcedure.checkOnline(AbstractStateMachineTableProcedure.java:191)
 at 
org.apache.hadoop.hbase.master.assignment.SplitTableRegionProcedure.(SplitTableRegionProcedure.java:112)
 at 
org.apache.hadoop.hbase.master.assignment.AssignmentManager.createSplitProcedure(AssignmentManager.java:756)
 at org.apache.hadoop.hbase.master.HMaster$2.run(HMaster.java:1722)
 at 
org.apache.hadoop.hbase.master.procedure.MasterProcedureUtil.submitProcedure(MasterProcedureUtil.java:131)
 at org.apache.hadoop.hbase.master.HMaster.splitRegion(HMaster.java:1714)
 at 
org.apache.hadoop.hbase.master.MasterRpcServices.splitRegion(MasterRpcServices.java:797)
 at 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
 at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:409)
 at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
 at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}





Re: [ANNOUNCE] New Committer: Toshihiro Suzuki

2018-08-01 Thread Allan Yang
Congratulations! Toshihiro Suzuki
Best Regards
Allan Yang


Josh Elser  于2018年8月1日周三 下午10:47写道:

> On behalf of the HBase PMC, I'm pleased to announce that Toshihiro
> Suzuki (aka Toshi, brfn169) has accepted our invitation to become an
> HBase committer. This was extended to Toshi as a result of his
> consistent, high-quality contributions to HBase. Thanks for all of your
> hard work, and we look forward to working with you even more!
>
> Please join me in extending a hearty "congrats" to Toshi!
>
> - Josh
>


[jira] [Created] (HBASE-20990) One operation in procedure batch throws an exception will cause all RegionTransitionProcedures receive the same exception

2018-07-31 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20990:
--

 Summary: One operation in procedure batch throws an exception will 
cause all RegionTransitionProcedures receive the same exception
 Key: HBASE-20990
 URL: https://issues.apache.org/jira/browse/HBASE-20990
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


In AMv2, we batch open/close region operations and call the RS with the
executeProcedures API. But in this API, if one of the region operations throws
an exception, all the operations in the batch will receive the same exception,
even though some of the operations in the batch are executing normally on the
RS.
I think we should try/catch exceptions per operation, and call
remoteCallFailed or remoteCallCompleted in RegionTransitionProcedure
accordingly.
Otherwise, there will be some very strange behavior, such as this one:
{code}
2018-07-18 02:56:18,506 WARN  [RSProcedureDispatcher-pool3-t1] 
assignment.RegionTransitionProcedure(226): Remote call failed 
e010125048016.bja,60020,1531848989401; pid=8362, ppid=8272, state=RUNNABLE:R
EGION_TRANSITION_DISPATCH; AssignProcedure table=IntegrationTestBigLinkedList, 
region=0beb8ea4e2f239fc082be7cefede1427, 
target=e010125048016.bja,60020,1531848989401; rit=OPENING, 
location=e010125048016
.bja,60020,1531848989401; exception=NotServingRegionException
{code}
The AssignProcedure failed with a NotServingRegionException, what??? It is very 
strange, actually, the AssignProcedure successes on the RS, another CloseRegion 
operation failed in the operation batch was causing the exception.
To correct this, we need to modify the response of the executeProcedures API, 
which is the ExecuteProceduresResponse proto, to return info (status, 
exception) per operation.
This issue alone won't cause much trouble, so there is no hurry to change the 
behavior here, but we do need to take it into account when we refactor AMv2.
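The per-operation handling proposed above can be sketched as follows. This is a hypothetical, self-contained model (not HBase's real executeProcedures API; all names are illustrative): each operation in the batch is tried independently and its own outcome recorded, so one failing close does not make the sibling operations see the same exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: run each operation in the batch independently and
// record a per-operation outcome instead of letting one failure poison
// the whole batch.
public class BatchedExecution {
    // Per-operation result: either success or the exception that op threw.
    static final class OpResult {
        final boolean success;
        final Exception error;
        OpResult(boolean success, Exception error) {
            this.success = success;
            this.error = error;
        }
    }

    static List<OpResult> executeProcedures(List<Supplier<Void>> ops) {
        List<OpResult> results = new ArrayList<>();
        for (Supplier<Void> op : ops) {
            try {
                op.get();                        // run this single operation
                results.add(new OpResult(true, null));
            } catch (Exception e) {
                // Only this operation is marked failed; the rest proceed.
                results.add(new OpResult(false, e));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Supplier<Void>> ops = new ArrayList<>();
        ops.add(() -> null);                                         // open: ok
        ops.add(() -> { throw new IllegalStateException("NSRE"); }); // close: fails
        ops.add(() -> null);                                         // open: ok
        List<OpResult> rs = executeProcedures(ops);
        System.out.println(rs.get(0).success + " " + rs.get(1).success + " " + rs.get(2).success);
    }
}
```

With per-operation results, the caller can route each outcome to remoteCallCompleted or remoteCallFailed individually.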



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang reopened HBASE-20976:


> SCP can be scheduled multiple times for the same RS
> ---
>
> Key: HBASE-20976
> URL: https://issues.apache.org/jira/browse/HBASE-20976
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.1
>    Reporter: Allan Yang
>    Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-20976.branch-2.0.001.patch
>
>
> SCP can be scheduled multiple times for the same RS:
> 1. An RS crashes, and a SCP is submitted for it
> 2. Before this SCP finishes, the Master crashes
> 3. The new master scans the meta table and finds some regions still open 
> on a dead server
> 4. The new master submits a SCP for the dead server again
> The two SCPs for the same RS can even execute concurrently without 
> HBASE-20846…
> Provided a test case to reproduce this issue and a fix in the patch.





[jira] [Created] (HBASE-20976) SCP can be scheduled multiple times for the same RS

2018-07-30 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20976:
--

 Summary: SCP can be scheduled multiple times for the same RS
 Key: HBASE-20976
 URL: https://issues.apache.org/jira/browse/HBASE-20976
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


SCP can be scheduled multiple times for the same RS:
1. An RS crashes, and a SCP is submitted for it
2. Before this SCP finishes, the Master crashes
3. The new master scans the meta table and finds some regions still open 
on a dead server
4. The new master submits a SCP for the dead server again
The two SCPs for the same RS can even execute concurrently without 
HBASE-20846…

Provided a test case to reproduce this issue and a fix in the patch.
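One possible guard against this double scheduling can be sketched as below (illustrative names, not the actual patch): track the servers that already have an in-flight SCP and reject a second submission for the same server, even across a master restart's meta rescan.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: before submitting a ServerCrashProcedure for a
// server, record the server name in a set of crashed servers already
// being processed; a duplicate submission for the same server is refused.
public class ScpScheduler {
    private final Set<String> inFlightCrashedServers = new HashSet<>();

    // Returns true if a new SCP should be scheduled for this server.
    public synchronized boolean trySchedule(String serverName) {
        return inFlightCrashedServers.add(serverName);
    }

    // Called when the SCP for this server has finished.
    public synchronized void finished(String serverName) {
        inFlightCrashedServers.remove(serverName);
    }

    public static void main(String[] args) {
        ScpScheduler s = new ScpScheduler();
        boolean first = s.trySchedule("rs1,60020,1531137365840");
        // A restarted master rescanning meta tries to submit again:
        boolean dup = s.trySchedule("rs1,60020,1531137365840");
        System.out.println(first + " " + dup);
    }
}
```

In the real fix the "in-flight" information must itself survive a master restart (e.g. be derived from the procedure store), which this in-memory sketch does not model.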





[jira] [Created] (HBASE-20975) Lock may not be taken while rolling back procedure

2018-07-30 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20975:
--

 Summary: Lock may not be taken while rolling back procedure
 Key: HBASE-20975
 URL: https://issues.apache.org/jira/browse/HBASE-20975
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Found this one while investigating HBASE-20921, too.

Here is some code from executeRollback in ProcedureExecutor.java.
{code}
boolean reuseLock = false;
while (stackTail --> 0) {
  final Procedure proc = subprocStack.get(stackTail);

  LockState lockState;
  // If reuseLock, then don't acquire the lock
  if (!reuseLock && (lockState = acquireLock(proc)) != LockState.LOCK_ACQUIRED) {
    return lockState;
  }

  lockState = executeRollback(proc);
  boolean abortRollback = lockState != LockState.LOCK_ACQUIRED;
  abortRollback |= !isRunning() || !store.isRunning();

  // If the next procedure in the stack is the current one, then reuseLock = true
  reuseLock = stackTail > 0 && (subprocStack.get(stackTail - 1) == proc) && !abortRollback;
  // If reuseLock, don't releaseLock
  if (!reuseLock) {
    releaseLock(proc, false);
  }

  if (abortRollback) {
    return lockState;
  }

  subprocStack.remove(stackTail);

  if (proc.isYieldAfterExecutionStep(getEnvironment())) {
    return LockState.LOCK_YIELD_WAIT;
  }

  // But, here, lock is released no matter reuseLock is true or false
  if (proc != rootProc) {
    execCompletionCleanup(proc);
  }
}
{code}

You can see from my comments in the code above that reuseLock can cause a 
procedure to execute (roll back) without holding its lock. Though I haven't 
found any bugs introduced by this issue, it is indeed a potential bug that 
needs fixing.
I think we can just remove the reuseLock logic and acquire and release the 
lock every time.
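The proposed simplification can be sketched as below: drop the reuseLock optimization and pair every acquire with a release in try/finally, so no rollback step can run without the lock. This is a self-contained model with stand-in types, not the real ProcedureExecutor code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: acquire the lock for every procedure in the
// rollback stack and always release it in a finally block, with no
// lock-reuse special case between adjacent stack entries.
public class RollbackLoop {
    static int rolledBack = 0;

    static void executeRollback(Deque<String> subprocStack, ReentrantLock lock) {
        while (!subprocStack.isEmpty()) {
            lock.lock();               // acquired for every procedure, no reuse
            try {
                rolledBack++;          // stand-in for proc.doRollback()
                subprocStack.pop();
            } finally {
                lock.unlock();         // always released, even on exception
            }
        }
    }

    public static void main(String[] args) {
        Deque<String> stack = new ArrayDeque<>();
        stack.push("child-2");
        stack.push("child-1");
        stack.push("root");
        ReentrantLock lock = new ReentrantLock();
        executeRollback(stack, lock);
        System.out.println(rolledBack + " " + lock.isLocked());
    }
}
```

The try/finally pairing is the point: whatever the rollback of one procedure does, the loop can never reach the next iteration while still holding (or having leaked) the previous acquisition.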





[jira] [Created] (HBASE-20973) ArrayIndexOutOfBoundsException when rolling back procedure

2018-07-29 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20973:
--

 Summary: ArrayIndexOutOfBoundsException when rolling back procedure
 Key: HBASE-20973
 URL: https://issues.apache.org/jira/browse/HBASE-20973
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.0.1, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Found this one while investigating HBASE-20921. After the root 
procedure (ModifyTableProcedure in this case) rolled back, an 
ArrayIndexOutOfBoundsException was thrown:
{code}
2018-07-18 01:39:10,241 ERROR [PEWorker-8] procedure2.ProcedureExecutor(159): 
CODE-BUG: Uncaught runtime exception for pid=5973, 
state=FAILED:MODIFY_TABLE_REOPEN_ALL_REGIONS, exception=java.lang.NullPo
interException via CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; 
ReopenTableRegionsProcedure table=IntegrationTestBigLinkedList:java.l
ang.NullPointerException; ModifyTableProcedure 
table=IntegrationTestBigLinkedList
java.lang.UnsupportedOperationException: unhandled 
state=MODIFY_TABLE_REOPEN_ALL_REGIONS
at 
org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:147)
at 
org.apache.hadoop.hbase.master.procedure.ModifyTableProcedure.rollbackState(ModifyTableProcedure.java:50)
at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:203)
at 
org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:864)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1353)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
2018-07-18 01:39:10,243 WARN  [PEWorker-8] procedure2.ProcedureExecutor(1756): 
Worker terminating UNNATURALLY null
java.lang.ArrayIndexOutOfBoundsException: 1
at 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.updateState(ProcedureStoreTracker.java:405)
at 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker$BitSetNode.delete(ProcedureStoreTracker.java:178)
at 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:513)
at 
org.apache.hadoop.hbase.procedure2.store.ProcedureStoreTracker.delete(ProcedureStoreTracker.java:505)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.updateStoreTracker(WALProcedureStore.java:741)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:691)
at 
org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.delete(WALProcedureStore.java:603)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1387)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1309)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1178)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

This is a very serious condition. After this exception was thrown, the exclusive 
lock held by ModifyTableProcedure was never released, and all procedures against 
this table were blocked until the master restarted. Since the lock info for a 
procedure is not restored on restart, the other procedures could then run again; 
it is quite embarrassing that a bug saved us... (that bug will be fixed in 
HBASE-20846)

I tried to reproduce this one using the test case in HBASE-20921, but I just 
can't reproduce it.
An easy way to resolve this is to add a try/catch, making sure that no matter 
what happens, the table's exclusive lock is always released.





[jira] [Created] (HBASE-20921) Possible NPE in ReopenTableRegionsProcedure

2018-07-23 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20921:
--

 Summary: Possible NPE in ReopenTableRegionsProcedure
 Key: HBASE-20921
 URL: https://issues.apache.org/jira/browse/HBASE-20921
 Project: HBase
  Issue Type: Sub-task
  Components: amv2
Affects Versions: 2.1.0, 3.0.0, 2.0.2
Reporter: Allan Yang
Assignee: Allan Yang


After HBASE-20752, we issue a ReopenTableRegionsProcedure in 
ModifyTableProcedure to ensure all regions are reopened.
But ModifyTableProcedure and ReopenTableRegionsProcedure do not hold the table 
lock (why?), so there is a chance that while ModifyTableProcedure is executing, 
a merge/split procedure can execute at the same time.
So, when ReopenTableRegionsProcedure reaches the 
"REOPEN_TABLE_REGIONS_CONFIRM_REOPENED" state, some of the persisted regions to 
check no longer exist, and an NPE is thrown.
{code}
2018-07-18 01:38:57,528 INFO  [PEWorker-9] procedure2.ProcedureExecutor(1246): 
Finished pid=6110, state=SUCCESS; MergeTableRegionsProcedure 
table=IntegrationTestBigLinkedList, regions=[845d286231eb01b7
1aeaa17b0e30058d, 4a46ab0918c99cada72d5336ad83a828], forcibly=false in 
10.8610sec
2018-07-18 01:38:57,530 ERROR [PEWorker-8] procedure2.ProcedureExecutor(1478): 
CODE-BUG: Uncaught runtime exception: pid=5974, ppid=5973, 
state=RUNNABLE:REOPEN_TABLE_REGIONS_CONFIRM_REOPENED; ReopenTab
leRegionsProcedure table=IntegrationTestBigLinkedList
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.assignment.RegionStates.checkReopened(RegionStates.java:651)
at 
java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
at 
java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at 
java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at 
java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at 
org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:102)
at 
org.apache.hadoop.hbase.master.procedure.ReopenTableRegionsProcedure.executeFromState(ReopenTableRegionsProcedure.java:45)
at 
org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:184)
at 
org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:850)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1453)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1221)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:75)
at 
org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1741)
{code}

I think we need to refresh the table's region list at the 
"REOPEN_TABLE_REGIONS_CONFIRM_REOPENED" state. For regions which were merged 
or split, we do not need to check them, since we can be sure that their 
successors were opened after we made the change to the table descriptor.
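The refresh step can be sketched as below (illustrative names, not the real ReopenTableRegionsProcedure): before calling checkReopened, drop the persisted regions that no longer exist in the current table layout, since regions merged or split away have successors that were already opened with the new table descriptor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: filter the persisted region list against the
// table's current regions so the confirm step never touches a region
// that vanished in a concurrent merge/split.
public class ConfirmReopened {
    static List<String> stillToCheck(List<String> persisted, Set<String> currentRegions) {
        // Regions absent from the current layout were merged or split;
        // their successors already carry the new table descriptor.
        return persisted.stream()
            .filter(currentRegions::contains)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> persisted = Arrays.asList("r1", "r2", "r3");
        // r2 and r3 were merged into r4 while the procedure was running.
        Set<String> current = new HashSet<>(Arrays.asList("r1", "r4"));
        System.out.println(stillToCheck(persisted, current)); // only r1 remains to check
    }
}
```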





[jira] [Resolved] (HBASE-20864) RS was killed due to master thought the region should be on an already dead server

2018-07-19 Thread Allan Yang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-20864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang resolved HBASE-20864.

   Resolution: Resolved
Fix Version/s: (was: 2.0.2)

HBASE-20792 solved this issue

> RS was killed due to master thought the region should be on an already dead 
> server
> -
>
> Key: HBASE-20864
> URL: https://issues.apache.org/jira/browse/HBASE-20864
> Project: HBase
>  Issue Type: Sub-task
>Affects Versions: 2.0.0
>    Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: log.zip
>
>
> When I was running ITBLL with our internal 2.0.0 version (with 2.0.1 
> backported and with two other issues: HBASE-20706, HBASE-20752), I found two 
> of my RSs killed by the master, since the master had a different region state 
> from those RSs. It is very strange that the master thought these regions 
> should be on an already dead server. There might be a serious bug, but I 
> haven't found it yet. Here is the process:
> 1. e010125048153.bja,60020,1531137365840 is crashed, and clearly 
> 4423e4182457c5b573729be4682cc3a3 was assigned to 
> e010125049164.bja,60020,1531136465378 during ServerCrashProcedure
> {code:java}
> 2018-07-09 20:03:32,443 INFO  [PEWorker-10] procedure.ServerCrashProcedure: 
> Start pid=2303, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
> server=e010125048153.bja,60020,1531137365840, splitWal=true, meta=false
> 2018-07-09 20:03:39,220 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=294,queue=24,port=6] 
> assignment.RegionTransitionProcedure: Received report OPENED seqId=16021, 
> pid=2305, ppid=2303, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> AssignProcedure table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3; rit=OPENING, 
> location=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:03:39,220 INFO  [PEWorker-13] assignment.RegionStateStore: 
> pid=2305 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
> regionState=OPEN, openSeqNum=16021, 
> regionLocation=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:03:43,190 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
> Finished pid=2303, state=SUCCESS; ServerCrashProcedure 
> server=e010125048153.bja,60020,1531137365840, splitWal=true, meta=false in 
> 10.7490sec
> {code}
> 2. A modify table happened later, and 4423e4182457c5b573729be4682cc3a3 was 
> reopened on e010125049164.bja,60020,1531136465378
> {code:java}
> 2018-07-09 20:04:39,929 DEBUG 
> [RpcServer.default.FPBQ.Fifo.handler=295,queue=25,port=6] 
> assignment.RegionTransitionProcedure: Received report OPENED seqId=16024, 
> pid=2351, ppid=2314, state=RUNNABLE:REGION_TRANSITION_DISPATCH; 
> AssignProcedure table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3, 
> target=e010125049164.bja,60020,1531136465378; rit=OPENING, 
> location=e010125049164.bja,60020,1531136465378
> 2018-07-09 20:04:40,554 INFO  [PEWorker-6] assignment.RegionStateStore: 
> pid=2351 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
> regionState=OPEN, openSeqNum=16024, 
> regionLocation=e010125049164.bja,60020,1531136465378
> {code}
> 3. The active master was killed and the backup master took over, but when 
> loading the meta entry, it clearly showed 4423e4182457c5b573729be4682cc3a3 on 
> the previous dead server e010125048153.bja,60020,1531137365840. That is very, 
> very strange!!!
> {code:java}
> 2018-07-09 20:06:17,985 INFO  [master/e010125048016:6] 
> assignment.RegionStateStore: Load hbase:meta entry 
> region=4423e4182457c5b573729be4682cc3a3, regionState=OPEN, 
> lastHost=e010125049164.bja,60020,1531136465378, 
> regionLocation=e010125048153.bja,60020,1531137365840, openSeqNum=16024
> {code}
> 4. the rs was killed
> {code:java}
> 2018-07-09 20:06:20,265 WARN  
> [RpcServer.default.FPBQ.Fifo.handler=297,queue=27,port=6] 
> assignment.AssignmentManager: Killing e010125049164.bja,60020,1531136465378: 
> rit=OPEN, location=e010125048153.bja,60020,1531137365840, 
> table=IntegrationTestBigLinkedList, 
> region=4423e4182457c5b573729be4682cc3a3reported OPEN on 
> server=e010125049164.bja,60020,1531136465378 but state has otherwise.
> {code}





[jira] [Created] (HBASE-20903) backport HBASE-20792 "info:servername and info:sn inconsistent for OPEN region" to branch-2.0

2018-07-17 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20903:
--

 Summary: backport HBASE-20792 "info:servername and info:sn 
inconsistent for OPEN region" to branch-2.0
 Key: HBASE-20903
 URL: https://issues.apache.org/jira/browse/HBASE-20903
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 2.0.2


As discussed in HBASE-20864. This is a very serious bug which can cause RS 
being killed or data loss. Should be backported to branch-2.0





[jira] [Created] (HBASE-20893) Data loss if splitting region while ServerCrashProcedure executing

2018-07-16 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20893:
--

 Summary: Data loss if splitting region while ServerCrashProcedure 
executing
 Key: HBASE-20893
 URL: https://issues.apache.org/jira/browse/HBASE-20893
 Project: HBase
  Issue Type: Sub-task
Affects Versions: 2.0.1, 3.0.0, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


Similar case as HBASE-20878.





[jira] [Created] (HBASE-20878) Data loss if merging regions while ServerCrashProcedure executing

2018-07-12 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20878:
--

 Summary: Data loss if merging regions while ServerCrashProcedure 
executing
 Key: HBASE-20878
 URL: https://issues.apache.org/jira/browse/HBASE-20878
 Project: HBase
  Issue Type: Bug
  Components: amv2
Affects Versions: 2.0.1, 3.0.0, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


In MergeTableRegionsProcedure, we close the regions to merge using 
UnassignProcedures. But if the RS these regions are on crashes, a 
ServerCrashProcedure will execute at the same time, and the UnassignProcedures 
will be blocked until all logs are split. Since these regions were closed for 
merging, they won't be opened again, so the recovered.edits in the region dirs 
won't be replayed, and data will be lost.
I provided a test to reproduce this case. I strongly suspect the split region 
procedure has the same kind of problem. I will check later.





[jira] [Created] (HBASE-20870) Wrong HBase root dir in ITBLL's Search Tool

2018-07-11 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20870:
--

 Summary: Wrong HBase root dir in ITBLL's Search Tool
 Key: HBASE-20870
 URL: https://issues.apache.org/jira/browse/HBASE-20870
 Project: HBase
  Issue Type: Bug
  Components: integration tests
Affects Versions: 2.0.1, 3.0.0, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


When using IntegrationTestBigLinkedList's Search tool, it always fails, since 
it tries to read WALs from the wrong HBase root dir. It turned out that when 
IntegrationTestingUtility is initialized in IntegrationTestBigLinkedList, its 
superclass HBaseTestingUtility changes hbase.rootdir to a random local dir. 
That is not wrong in itself, since HBaseTestingUtility is mostly used by the 
minicluster, but for integration tests that run on distributed clusters we 
should change it back.
Here is the error info:
{code:java}
2018-07-11 16:35:49,679 DEBUG [main] hbase.HBaseCommonTestingUtility: Setting 
hbase.rootdir to 
/home/hadoop/target/test-data/deb67611-2737-4696-abe9-32a7783df7bb
2018-07-11 16:35:50,736 ERROR [main] util.AbstractHBaseTool: Error running 
command-line tool java.io.FileNotFoundException: File 
file:/home/hadoop/target/test-data/deb67611-2737-4696-abe9-32a7783df7bb/WALs 
does not exist
at 
org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:431)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517)
{code}





[jira] [Created] (HBASE-20867) RS may get killed while master restarts

2018-07-10 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20867:
--

 Summary: RS may get killed while master restarts
 Key: HBASE-20867
 URL: https://issues.apache.org/jira/browse/HBASE-20867
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 3.0.0, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


If the master is dispatching an RPC call to an RS when aborting, a connection 
exception may be thrown by the RPC layer (an IOException with a "Connection 
closed" message in this case). The RSProcedureDispatcher will regard it as an 
un-retryable exception and pass it to UnassignProcedure.remoteCallFailed, which 
will expire the RS.
Actually, the RS is perfectly healthy; only the master is restarting.
I think we should handle these kinds of connection exceptions in 
RSProcedureDispatcher and retry the RPC call.
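The retry idea can be sketched as follows, using a simple message-based classification of "Connection closed" failures as transient. Both the classification and the retry policy here are illustrative assumptions, not the actual RSProcedureDispatcher fix.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: retry an RPC call when the failure looks like a
// transient connection problem (e.g. the master's RPC layer closing
// connections during restart) instead of treating it as fatal.
public class RetryingDispatch {
    // Illustrative classification: treat "Connection closed" as transient.
    static boolean isRetryable(IOException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains("Connection closed");
    }

    static <T> T callWithRetry(Callable<T> rpc, int maxAttempts) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return rpc.call();
            } catch (IOException e) {
                if (!isRetryable(e)) throw e;    // genuine failure: give up
                last = e;                        // transient: try again
            }
        }
        throw last;                              // retries exhausted
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated RPC that fails twice with a closed connection, then succeeds.
        String r = callWithRetry(() -> {
            if (++calls[0] < 3) throw new IOException("Connection closed");
            return "ok";
        }, 5);
        System.out.println(r + " after " + calls[0] + " attempts");
    }
}
```

A production version would also back off between attempts and distinguish exception types rather than matching on message text.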





[jira] [Created] (HBASE-20864) RS was killed due to master thought the region should be on an already dead server

2018-07-10 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20864:
--

 Summary: RS was killed due to master thought the region should be 
on an already dead server
 Key: HBASE-20864
 URL: https://issues.apache.org/jira/browse/HBASE-20864
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Allan Yang


When I was running ITBLL with our internal 2.0.0 version (with 2.0.1 backported 
and with two other issues: HBASE-20706, HBASE-20752), I found two of my RSs 
killed by the master, since the master had a different region state from those 
RSs. It is very strange that the master thought these regions should be on an 
already dead server. There might be a serious bug, but I haven't found it yet. 
Here is the process:


 1. e010125048153.bja,60020,1531137365840 is crashed, and clearly 
4423e4182457c5b573729be4682cc3a3 was assigned to 
e010125049164.bja,60020,1531136465378 during ServerCrashProcedure
{code:java}
2018-07-09 20:03:32,443 INFO  [PEWorker-10] procedure.ServerCrashProcedure: 
Start pid=2303, state=RUNNABLE:SERVER_CRASH_START; ServerCrashProcedure 
server=e010125048153.bja,60020,1531137365840, splitWa
l=true, meta=false
2018-07-09 20:03:39,220 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=294,queue=24,port=6] 
assignment.RegionTransitionProcedure: Received report OPENED seqId=16021, 
pid=2305, ppid=2303, state=RUNNABLE
:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=IntegrationTestBigLinkedList, region=4423e4182457c5b573729be4682cc3a3; 
rit=OPENING, location=e010125049164.bja,60020,1531136465378
2018-07-09 20:03:39,220 INFO  [PEWorker-13] assignment.RegionStateStore: 
pid=2305 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
regionState=OPEN, openSeqNum=16021, regionLocation=e010125049
164.bja,60020,1531136465378
2018-07-09 20:03:43,190 INFO  [PEWorker-12] procedure2.ProcedureExecutor: 
Finished pid=2303, state=SUCCESS; ServerCrashProcedure 
server=e010125048153.bja,60020,1531137365840, splitWal=true, meta=false
in 10.7490sec
{code}
2. A modify table happened later, and 4423e4182457c5b573729be4682cc3a3 was 
reopened on e010125049164.bja,60020,1531136465378
{code:java}
2018-07-09 20:04:39,929 DEBUG 
[RpcServer.default.FPBQ.Fifo.handler=295,queue=25,port=6] 
assignment.RegionTransitionProcedure: Received report OPENED seqId=16024, 
pid=2351, ppid=2314, state=RUNNABLE
:REGION_TRANSITION_DISPATCH; AssignProcedure 
table=IntegrationTestBigLinkedList, region=4423e4182457c5b573729be4682cc3a3, 
target=e010125049164.bja,60020,1531136465378; rit=OPENING, location=e0101250491
64.bja,60020,1531136465378
2018-07-09 20:04:40,554 INFO  [PEWorker-6] assignment.RegionStateStore: 
pid=2351 updating hbase:meta row=4423e4182457c5b573729be4682cc3a3, 
regionState=OPEN, openSeqNum=16024, regionLocation=e0101250491
64.bja,60020,1531136465378
{code}
3. The active master was killed and the backup master took over, but when 
loading the meta entry, it clearly showed 4423e4182457c5b573729be4682cc3a3 on 
the previous dead server e010125048153.bja,60020,1531137365840. That is very, 
very strange!!!
{code:java}
2018-07-09 20:06:17,985 INFO  [master/e010125048016:6] 
assignment.RegionStateStore: Load hbase:meta entry 
region=4423e4182457c5b573729be4682cc3a3, regionState=OPEN, 
lastHost=e010125049164.bja,60020
,1531136465378, regionLocation=e010125048153.bja,60020,1531137365840, 
openSeqNum=16024
{code}
4. the rs was killed
{code:java}
2018-07-09 20:06:20,265 WARN  
[RpcServer.default.FPBQ.Fifo.handler=297,queue=27,port=6] 
assignment.AssignmentManager: Killing e010125049164.bja,60020,1531136465378: 
rit=OPEN, location=e010125048153
.bja,60020,1531137365840, table=IntegrationTestBigLinkedList, 
region=4423e4182457c5b573729be4682cc3a3reported OPEN on 
server=e010125049164.bja,60020,1531136465378 but state has otherwise.
{code}





[jira] [Created] (HBASE-20860) Merged region's RIT state may not be cleaned after master restart

2018-07-09 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20860:
--

 Summary: Merged region's RIT state may not be cleaned after master 
restart
 Key: HBASE-20860
 URL: https://issues.apache.org/jira/browse/HBASE-20860
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1, 3.0.0, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.1.0, 2.0.2


In MergeTableRegionsProcedure, we issue UnassignProcedures to offline the 
regions to merge. But if we restart the master just after 
MergeTableRegionsProcedure finishes these two UnassignProcedures, and before it 
can delete their meta entries, the new master will find these two regions 
CLOSED with no procedures attached to them. They will be regarded as RIT 
regions, and nobody will clean up their RIT state later.
A quick way to resolve this stuck situation in a production env is to restart 
the master again, since the meta entries were already deleted by 
MergeTableRegionsProcedure. Here, I offer a fix for this problem.





[jira] [Created] (HBASE-20854) Wrong retries in RpcRetryingCaller's log message

2018-07-06 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20854:
--

 Summary: Wrong retries in RpcRetryingCaller's log message
 Key: HBASE-20854
 URL: https://issues.apache.org/jira/browse/HBASE-20854
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.1.0, 2.0.2


Just a small bug fix. In the error log message in RpcRetryingCallerImpl, the 
tries number is passed as both tries and retries, causing a bit of confusion.

{code}
2018-07-05 21:04:46,343 INFO [Thread-20] 
org.apache.hadoop.hbase.client.RpcRetryingCallerImpl: Call exception, tries=6, 
retries=6, started=4174 ms ago, cancelled=false, 
msg=org.apache.hadoop.hbase.exce
ptions.RegionOpeningException: Region 
IntegrationTestBigLinkedList,\x7F\xFF\xFF\xFF\xFF\xFF\xFF\xFE,1530795739116.0cfd339596648348ac13d979150eb2bf.
 is opening on e010125049164.bja,60020,1530795698451
{code}





[jira] [Created] (HBASE-20846) Table's shared lock is not held by sub-procedures after master restart

2018-07-04 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20846:
--

 Summary: Table's shared lock is not held by sub-procedures after 
master restart
 Key: HBASE-20846
 URL: https://issues.apache.org/jira/browse/HBASE-20846
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0, 2.1.0, 2.0.2


Found this one when investigating a ModifyTableProcedure that got stuck while a 
MoveRegionProcedure was going on after a master restart.
Though this issue can be solved by HBASE-20752, I discovered something else.
Before a MoveRegionProcedure can execute, it will hold the table's shared lock. 
So, when an UnassignProcedure is spawned, it will not check the table's shared 
lock, since it is sure that its parent (MoveRegionProcedure) has acquired the 
table's lock.
{code:java}
// If there is parent procedure, it would have already taken xlock, so no need to take
// shared lock here. Otherwise, take shared lock.
if (!procedure.hasParent()
    && waitTableQueueSharedLock(procedure, table) == null) {
  return true;
}
{code}

But it is not the case when the master is restarted. The child 
procedure (UnassignProcedure) will be executed first after the restart. Though 
it has a parent (MoveRegionProcedure), the parent apparently does not hold the 
table's lock.
So, since it begins to execute without holding the table's shared lock, a 
ModifyTableProcedure can acquire the table's exclusive lock and execute at the 
same time, which is not possible if the master was not restarted.
This would cause a stuck state before HBASE-20752, but since HBASE-20752 is 
fixed, I wrote a simple UT to reproduce this case.

I think we don't have to check the parent for the table's shared lock. It is a 
shared lock, right? I think we can acquire it every time we need it.
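That suggestion can be modeled with a plain read/write lock (a self-contained sketch, not HBase's scheduler): the child step takes the shared (read) lock unconditionally, which is cheap even if its parent also holds it, and it still excludes any ModifyTableProcedure holding the exclusive (write) lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: the unassign step acquires the table's shared
// lock itself instead of assuming a parent procedure already holds it.
public class SharedLockAlways {
    static final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();

    static boolean unassignStep() {
        tableLock.readLock().lock();   // taken unconditionally, parent or not
        try {
            // While we hold the read lock, a concurrent ModifyTableProcedure
            // cannot hold the write lock, so it cannot interleave with us.
            return !tableLock.isWriteLocked();
        } finally {
            tableLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(unassignStep());
    }
}
```

Since shared locks allow multiple holders, re-acquiring one in a child is harmless, which is the core of the argument above.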





[jira] [Created] (HBASE-20727) Persist FlushedSequenceId to speed up WAL split after cluster restart

2018-06-13 Thread Allan Yang (JIRA)
Allan Yang created HBASE-20727:
--

 Summary: Persist FlushedSequenceId to speed up WAL split after 
cluster restart
 Key: HBASE-20727
 URL: https://issues.apache.org/jira/browse/HBASE-20727
 Project: HBase
  Issue Type: New Feature
Affects Versions: 2.0.0
Reporter: Allan Yang
Assignee: Allan Yang
 Fix For: 3.0.0


We use flushedSequenceIdByRegion and storeFlushedSequenceIdsByRegion in 
ServerManager to record the latest flushed seqids of regions and stores, so 
during log split we can use the seqids stored in those maps to filter out edits 
which do not need to be replayed. But those maps are not persisted: after a 
cluster restart or a master restart, the info about flushed seqids is all lost. 
Here I offer a way to persist that info to HDFS, so that even after a master 
restart we can still use it to filter WAL edits and speed up replay.
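The persistence idea can be sketched like this, with a local file standing in for HDFS; the file format and names are illustrative assumptions, not the actual patch. The flushed-seqid map is written out so a restarted master can load it back and keep filtering edits during WAL split.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: serialize the region -> flushed-seqid map to a
// file and read it back, simulating persisting it to HDFS across a
// master restart.
public class FlushedSeqIdStore {
    static void persist(Map<String, Long> flushedSeqIds, Path file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
            out.writeInt(flushedSeqIds.size());
            for (Map.Entry<String, Long> e : flushedSeqIds.entrySet()) {
                out.writeUTF(e.getKey());    // encoded region name
                out.writeLong(e.getValue()); // last flushed seqid
            }
        }
    }

    static Map<String, Long> load(Path file) throws IOException {
        Map<String, Long> m = new HashMap<>();
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                m.put(in.readUTF(), in.readLong());
            }
        }
        return m;
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("flushed-seqids", ".bin");
        Map<String, Long> ids = new HashMap<>();
        ids.put("region-a", 16021L);
        ids.put("region-b", 16024L);
        persist(ids, f);                          // e.g. on a periodic chore
        System.out.println(load(f).equals(ids));  // reload after "restart"
        Files.delete(f);
    }
}
```

In a real implementation the write would need to be atomic (write to a temp file, then rename) and refreshed periodically, since seqids advance with every flush.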




