Re: [ANNOUNCE] New HBase committer Lijin Bin

2016-11-29 Thread Heng Chen
Congratulations!!

2016-11-30 6:42 GMT+08:00 Andrew Purtell :
> Congratulations and welcome, Lijin!
>
> On Tue, Nov 29, 2016 at 1:48 AM, Duo Zhang  wrote:
>
>> On behalf of the Apache HBase PMC, I am pleased to announce that Lijin
>> Bin(binlijin) has accepted the PMC's invitation to become a committer on
>> the project. We appreciate all of Lijin's generous contributions thus far
>> and look forward to his continued involvement.
>>
>> Congratulations and welcome, Lijin!
>>
>
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)


Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-16 Thread Heng Chen
Congrats!  :)

2016-10-16 8:19 GMT+08:00 Jerry He :
> Congratulations, Stephen.
>
> Jerry
>
> On Fri, Oct 14, 2016 at 12:56 PM, Dima Spivak  wrote:
>
>> Congrats, Stephen!
>>
>> -Dima
>>
>> On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar  wrote:
>>
>> > On behalf of the Apache HBase PMC, I am happy to announce that Stephen
>> has
>> > accepted our invitation to become a PMC member of the Apache HBase
>> project.
>> >
>> > Stephen has been working on HBase for a couple of years, and is already a
>> > committer for more than a year. Apart from his contributions in proc v2,
>> > hbck and other areas, he is also helping for the 2.0 release which is the
>> > most important milestone for the project this year.
>> >
>> > Welcome to the PMC Stephen,
>> > Enis
>> >
>>


[jira] [Resolved] (HBASE-16641) QA tests for hbase-client skip the second part.

2016-10-12 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16641.
---
Resolution: Duplicate

> QA tests for hbase-client skip the second part.
> ---
>
> Key: HBASE-16641
> URL: https://issues.apache.org/jira/browse/HBASE-16641
> Project: HBase
>  Issue Type: Bug
>    Reporter: Heng Chen
>
> See 
> https://builds.apache.org/job/PreCommit-HBASE-Build/3547/artifact/patchprocess/patch-unit-hbase-client.txt
> {code}
> [INFO] --- maven-surefire-plugin:2.18.1:test (secondPartTestsExecution) @ 
> hbase-client ---
> [INFO] Tests are skipped.
> {code}
> The first part passed fine, but the second part is skipped.
> Notice hbase-client/pom.xml:
> {code}
> <execution>
>   <id>secondPartTestsExecution</id>
>   <phase>test</phase>
>   <goals>
>     <goal>test</goal>
>   </goals>
>   <configuration>
>     <skip>true</skip>
>   </configuration>
> </execution>
> {code}
> If I change the 'skip' to false, the second part is triggered. But this
> configuration has existed for a long time, so was the command line on the build
> box updated recently?





[jira] [Created] (HBASE-16702) TestBlockEvictionFromClient is broken

2016-09-23 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16702:
-

 Summary: TestBlockEvictionFromClient is broken
 Key: HBASE-16702
 URL: https://issues.apache.org/jira/browse/HBASE-16702
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen








Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Heng Chen
{quote}
If MR framework is not deployed in the cluster, hbase still functions
normally (post merge).
{quote}

If MR is not a strong dependency for Master/RS, it is OK for me.
And if MR is not deployed, the Backup/Restore feature could not be used, right?

2016-09-23 10:49 GMT+08:00 Ted Yu :
> If MR framework is not deployed in the cluster, hbase still functions
> normally (post merge).
>
> In terms of build time dependency, we have long been depending on
> mapreduce. Take a look at ExportSnapshot.
>
> Cheers
>
> On Thu, Sep 22, 2016 at 7:42 PM, Heng Chen  wrote:
>
>> In our production cluster,  it is a common case we just have HDFS and
>> HBase deployed.
>> If our Master/RS depend on MR framework (especially some features we
>> have not used at all),  it introduced another cost for maintain.  I
>> don't think it is a good idea.
>>
>> 2016-09-23 10:28 GMT+08:00 张铎 :
>> > To be specific, for example, our nice Backup/Restore feature, if we think
>> > this is not a core feature of HBase, then we could make it depend on MR,
>> > and start a standalone BackupManager instance that submits MR jobs to do
>> > periodical maintenance job. And if we think this is a core feature that
>> > everyone should use it, then we'd better implement it without MR
>> > dependency, like DLS.
>> >
>> > Thanks.
>> >
>> > 2016-09-23 10:11 GMT+08:00 张铎 :
>> >
>> >> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>> >> features depend on MR but I think the bottom line is that we should
>> launch
>> >> the jobs from outside manually or by other services.
>> >>
>> >> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>> >>
>> >>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>> >>> question.
>> >>>
>> >>> Can this be driven by a utility derived from Tool like our other MR
>> apps?
>> >>> The issue is needing the AccessController to decide if allowed? But
>> nothing
>> >>> prevents the user from running the job manually/independently, right?
>> >>>
>> >>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi <
>> theo.berto...@gmail.com>
>> >>> wrote:
>> >>> >
>> >>> > just a remark. my query was not about tools using MR (everyone i
>> think
>> >>> is
>> >>> > ok with those).
>> >>> > the topic was about: "are we ok with running MR jobs from Master and
>> RSs
>> >>> > code?" since this will be the first time we do this
>> >>> >
>> >>> > Matteo
>> >>> >
>> >>> >
>> >>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>> >>> wrote:
>> >>> >>
>> >>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore,
>> it's
>> >>> >> fine to be dependent on MR. MR is the right framework for such. We
>> >>> should
>> >>> >> also do compactions using MR (just saying :) )
>> >>> >> 
>> >>> >> From: Ted Yu 
>> >>> >> Sent: Thursday, September 22, 2016 2:00 PM
>> >>> >> To: dev@hbase.apache.org
>> >>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>> >>> >>
>> >>> >> I agree - backup / restore is in the same category as import /
>> export.
>> >>> >>
>> >>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>> >>> andrew.purt...@gmail.com>
>> >>> >> wrote:
>> >>> >>
>> >>> >>> Backup is extra tooling around core in my opinion. Like import or
>> >>> export.
>> >>> >>> Or the optional MOB tool. It's fine.
>> >>> >>>
>> >>> >>>> On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi <
>> mberto...@apache.org>
>> >>> >>> wrote:
>> >>> >>>>
>> >>> >>>> What's the latest opinion around running MR jobs from hbase
>> (Master
>> >>> or
>> >>> >>> RS)?
>> >>> >>>>
>> >>> >>>> I remember in the past that there was discussion about not having
>> MR
>> >>> >> has
>> >>> >>>> direct dependency of hbase.
>> >>> >>>>
>> >>> >>>> I think some of discussion where around MOB that had a MR job to
>> >>> >> compact,
>> >>> >>>> that later was transformed in a non-MR job to be merged, I think
>> we
>> >>> >> had a
>> >>> >>>> similar discussion for log split/replay.
>> >>> >>>>
>> >>> >>>> the latest is the new Backup feature (HBASE-7912), that runs a MR
>> job
>> >>> >>> from
>> >>> >>>> the master to copy data or restore data.
>> >>> >>>> (backup is also "not really core" as in.. if you don't use backup
>> >>> >> you'll
>> >>> >>>> not end up running MR jobs, but this was probably true for MOB as
>> in
>> >>> >> "if
>> >>> >>>> you don't enable MOB you don't need MR")
>> >>> >>>>
>> >>> >>>> any thoughts? do we a rule that says "we don't want to have hbase
>> run
>> >>> >> MR
>> >>> >>>> jobs, only tool started manually by the user can do that". or can
>> we
>> >>> >>> start
>> >>> >>>> adding MR calls around without problems?
>> >>> >>>
>> >>> >>
>> >>>
>> >>
>> >>
>>


Re: [DISCUSSION] MR jobs started by Master or RS

2016-09-22 Thread Heng Chen
In our production cluster, it is a common case that we have only HDFS and
HBase deployed.
If our Master/RS depend on the MR framework (especially for some features we
have not used at all), it introduces another maintenance cost. I
don't think it is a good idea.

2016-09-23 10:28 GMT+08:00 张铎 :
> To be specific, for example, our nice Backup/Restore feature, if we think
> this is not a core feature of HBase, then we could make it depend on MR,
> and start a standalone BackupManager instance that submits MR jobs to do
> periodical maintenance job. And if we think this is a core feature that
> everyone should use it, then we'd better implement it without MR
> dependency, like DLS.
>
> Thanks.
>
> 2016-09-23 10:11 GMT+08:00 张铎 :
>
>> I‘m -1 on let master or rs launch MR jobs. It is OK that some of our
>> features depend on MR but I think the bottom line is that we should launch
>> the jobs from outside manually or by other services.
>>
>> 2016-09-23 9:47 GMT+08:00 Andrew Purtell :
>>
>>> Ok, got it. Well "shelling out" is on the line I think, so a fair
>>> question.
>>>
>>> Can this be driven by a utility derived from Tool like our other MR apps?
>>> The issue is needing the AccessController to decide if allowed? But nothing
>>> prevents the user from running the job manually/independently, right?
>>>
>>> > On Sep 22, 2016, at 3:44 PM, Matteo Bertozzi 
>>> wrote:
>>> >
>>> > just a remark. my query was not about tools using MR (everyone i think
>>> is
>>> > ok with those).
>>> > the topic was about: "are we ok with running MR jobs from Master and RSs
>>> > code?" since this will be the first time we do this
>>> >
>>> > Matteo
>>> >
>>> >
>>> >> On Thu, Sep 22, 2016 at 2:49 PM, Devaraj Das 
>>> wrote:
>>> >>
>>> >> Very much agree; for tools like ExportSnapshot / Backup / Restore, it's
>>> >> fine to be dependent on MR. MR is the right framework for such. We
>>> should
>>> >> also do compactions using MR (just saying :) )
>>> >> 
>>> >> From: Ted Yu 
>>> >> Sent: Thursday, September 22, 2016 2:00 PM
>>> >> To: dev@hbase.apache.org
>>> >> Subject: Re: [DISCUSSION] MR jobs started by Master or RS
>>> >>
>>> >> I agree - backup / restore is in the same category as import / export.
>>> >>
>>> >> On Thu, Sep 22, 2016 at 1:58 PM, Andrew Purtell <
>>> andrew.purt...@gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Backup is extra tooling around core in my opinion. Like import or
>>> export.
>>> >>> Or the optional MOB tool. It's fine.
>>> >>>
>>>  On Sep 22, 2016, at 1:50 PM, Matteo Bertozzi 
>>> >>> wrote:
>>> 
>>>  What's the latest opinion around running MR jobs from hbase (Master
>>> or
>>> >>> RS)?
>>> 
>>>  I remember in the past that there was discussion about not having MR
>>> >> has
>>>  direct dependency of hbase.
>>> 
>>>  I think some of discussion where around MOB that had a MR job to
>>> >> compact,
>>>  that later was transformed in a non-MR job to be merged, I think we
>>> >> had a
>>>  similar discussion for log split/replay.
>>> 
>>>  the latest is the new Backup feature (HBASE-7912), that runs a MR job
>>> >>> from
>>>  the master to copy data or restore data.
>>>  (backup is also "not really core" as in.. if you don't use backup
>>> >> you'll
>>>  not end up running MR jobs, but this was probably true for MOB as in
>>> >> "if
>>>  you don't enable MOB you don't need MR")
>>> 
>>>  any thoughts? do we a rule that says "we don't want to have hbase run
>>> >> MR
>>>  jobs, only tool started manually by the user can do that". or can we
>>> >>> start
>>>  adding MR calls around without problems?
>>> >>>
>>> >>
>>>
>>
>>


[jira] [Created] (HBASE-16665) Check whether KeyValueUtil.createXXX could be replaced by CellUtil without copy

2016-09-21 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16665:
-

 Summary: Check whether KeyValueUtil.createXXX could be replaced by 
CellUtil without copy
 Key: HBASE-16665
 URL: https://issues.apache.org/jira/browse/HBASE-16665
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen








[jira] [Created] (HBASE-16652) Figure out performance difference between increment and append

2016-09-18 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16652:
-

 Summary: Figure out performance difference between increment and 
append
 Key: HBASE-16652
 URL: https://issues.apache.org/jira/browse/HBASE-16652
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


When doing the performance test in HBASE-16625, I found a very big 
difference between Append and Increment (append is about 37% faster than 
increment).

As [~stack] mentioned in 
https://issues.apache.org/jira/browse/HBASE-16610?focusedCommentId=15493166&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15493166,
   append and increment have been unified on the server side, and they look the 
same on the client side. 

This issue is to figure out why the performance looks different between them.
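For context, below is a minimal sketch of the kind of client-side comparison meant here (this is not the HBASE-16625 PerformanceEvaluation run itself; the table name 't', family 'f' and row count are assumptions):

{code}
// Rough micro-benchmark sketch: time N increments vs N appends against one table.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class IncrementVsAppend {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t"))) {
      byte[] f = Bytes.toBytes("f");
      byte[] q = Bytes.toBytes("q");
      long start = System.nanoTime();
      for (int i = 0; i < 10000; i++) {
        table.increment(new Increment(Bytes.toBytes("row-" + i)).addColumn(f, q, 1L));
      }
      long incrementMs = (System.nanoTime() - start) / 1_000_000;
      start = System.nanoTime();
      for (int i = 0; i < 10000; i++) {
        table.append(new Append(Bytes.toBytes("row-" + i)).add(f, q, Bytes.toBytes("v")));
      }
      long appendMs = (System.nanoTime() - start) / 1_000_000;
      System.out.println("increment: " + incrementMs + " ms, append: " + appendMs + " ms");
    }
  }
}
{code}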





[jira] [Created] (HBASE-16641) QA tests for hbase-client skip the second part.

2016-09-15 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16641:
-

 Summary: QA tests for hbase-client skip the second part.
 Key: HBASE-16641
 URL: https://issues.apache.org/jira/browse/HBASE-16641
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen








[jira] [Created] (HBASE-16631) Extract AsyncRequestFuture relates code from AsyncProcess

2016-09-14 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16631:
-

 Summary: Extract AsyncRequestFuture relates code from AsyncProcess
 Key: HBASE-16631
 URL: https://issues.apache.org/jira/browse/HBASE-16631
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen
Assignee: Heng Chen


Now, the AsyncProcess class is too large (over 2000 lines), and there are many 
subclasses in it.  

AsyncRequestFutureImpl is the biggest subclass in AP; we could extract it 
out of AP to reduce the AP size.
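To illustrate the shape of the extraction (a sketch with stand-in classes, not the real AsyncProcess code):

{code}
// Sketch only: an inner class moved out keeps access to its former outer class
// through an explicit reference.
class OuterProcess {                      // stands in for AsyncProcess
  int pendingTasks() { return 0; }
}

class ExtractedRequestFuture {            // stands in for AsyncRequestFutureImpl after extraction
  private final OuterProcess ap;          // explicit reference replaces the implicit outer-class link

  ExtractedRequestFuture(OuterProcess ap) {
    this.ap = ap;
  }

  boolean hasPendingWork() {
    return ap.pendingTasks() > 0;         // former "outer.this" accesses go through 'ap'
  }
}
{code}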





[jira] [Created] (HBASE-16625) Performance test for interface unified with AP

2016-09-13 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16625:
-

 Summary: Performance test for interface unified with AP
 Key: HBASE-16625
 URL: https://issues.apache.org/jira/browse/HBASE-16625
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen








[jira] [Created] (HBASE-16623) Unify Get with AP

2016-09-12 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16623:
-

 Summary: Unify Get with AP
 Key: HBASE-16623
 URL: https://issues.apache.org/jira/browse/HBASE-16623
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen
Assignee: Heng Chen








[jira] [Created] (HBASE-16611) Flakey org.apache.hadoop.hbase.client.TestReplicasClient.testCancelOfMultiGet

2016-09-11 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16611:
-

 Summary: Flakey 
org.apache.hadoop.hbase.client.TestReplicasClient.testCancelOfMultiGet
 Key: HBASE-16611
 URL: https://issues.apache.org/jira/browse/HBASE-16611
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


see 
https://builds.apache.org/job/PreCommit-HBASE-Build/3494/artifact/patchprocess/patch-unit-hbase-server.txt

{code}
testCancelOfMultiGet(org.apache.hadoop.hbase.client.TestReplicasClient)  Time 
elapsed: 4.026 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hbase.client.TestReplicasClient.testCancelOfMultiGet(TestReplicasClient.java:579)

Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 94.401 sec - 
in org.apache.hadoop.hbase.client.TestAdmin2
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.861 sec - in 
org.apache.hadoop.hbase.client.TestClientScannerRPCTimeout
Running 
org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClientWithRegionReplicas
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 261.925 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestReplicasClient
testCancelOfMultiGet(org.apache.hadoop.hbase.client.TestReplicasClient)  Time 
elapsed: 4.522 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hbase.client.TestReplicasClient.testCancelOfMultiGet(TestReplicasClient.java:581)

Running org.apache.hadoop.hbase.client.TestFastFail
Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 3.648 sec - in 
org.apache.hadoop.hbase.client.TestFastFail
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 277.894 sec <<< 
FAILURE! - in org.apache.hadoop.hbase.client.TestReplicasClient
testCancelOfMultiGet(org.apache.hadoop.hbase.client.TestReplicasClient)  Time 
elapsed: 5.359 sec  <<< FAILURE!
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.hadoop.hbase.client.TestReplicasClient.testCancelOfMultiGet(TestReplicasClient.java:579)
{code}





[jira] [Created] (HBASE-16610) Unify append, increment with AP

2016-09-11 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16610:
-

 Summary: Unify append, increment with AP
 Key: HBASE-16610
 URL: https://issues.apache.org/jira/browse/HBASE-16610
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen
Assignee: Heng Chen








[jira] [Created] (HBASE-16607) Make NoncedRegionServerCallable extends CancellableRegionServerCallable

2016-09-10 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16607:
-

 Summary: Make NoncedRegionServerCallable extends 
CancellableRegionServerCallable
 Key: HBASE-16607
 URL: https://issues.apache.org/jira/browse/HBASE-16607
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


This is the first step to unify append and increment with AP.

After extending CancellableRegionServerCallable, we could remove lots of 
duplicate code in NoncedRegionServerCallable.
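A sketch of the intended relationship (simplified stand-ins, not the real client classes):

{code}
// Sketch only: once the nonced callable extends the cancellable one, it inherits the
// cancellation plumbing and only adds the nonce bookkeeping.
class CancellableCallableSketch {                 // stands in for CancellableRegionServerCallable
  private volatile boolean cancelled;
  void cancel() { cancelled = true; }
  boolean isCancelled() { return cancelled; }
}

class NoncedCallableSketch extends CancellableCallableSketch {  // stands in for NoncedRegionServerCallable
  private final long nonceGroup;
  private final long nonce;

  NoncedCallableSketch(long nonceGroup, long nonce) {
    this.nonceGroup = nonceGroup;
    this.nonce = nonce;
  }

  long getNonceGroup() { return nonceGroup; }
  long getNonce() { return nonce; }
}
{code}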





[jira] [Created] (HBASE-16606) Remove some duplicate code in HTable

2016-09-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16606:
-

 Summary: Remove some duplicate code in HTable
 Key: HBASE-16606
 URL: https://issues.apache.org/jira/browse/HBASE-16606
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen
Assignee: Heng Chen








[jira] [Resolved] (HBASE-16597) Revisit the threadPool is really needed in submitAll and submit interface in AsyncProcess

2016-09-09 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16597.
---
Resolution: Invalid

> Revisit the threadPool is really needed in submitAll and submit interface in 
> AsyncProcess
> -
>
> Key: HBASE-16597
> URL: https://issues.apache.org/jira/browse/HBASE-16597
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Heng Chen
>    Assignee: Heng Chen
>
> Currently, a threadPool can be passed into AP via the constructor and via the 
> submitAll and submit interfaces, but I notice that in HTable the pool passed to AP 
> through submitAll looks the same as the default one in AP. Let me revisit this to 
> check whether the pool is really needed in the submitAll and submit interfaces.





Re: [VOTE] The 1st HBase 0.98.22 release candidate (RC0) is available

2016-09-09 Thread Heng Chen
Ran TestShell twice; both runs passed.

2016-09-10 2:56 GMT+08:00 Ted Yu :

> +1
>
> Checked layout
> Ran test suite - got test failure in ReplicationAdminTest which seems
> intermittent (HBASE-16600)
> Ran LoadTestTool
> Exercised basic shell commands
>
> On Sat, Sep 3, 2016 at 8:34 AM, Andrew Purtell 
> wrote:
>
> > The 1st HBase 0.98.22 release candidate (RC0) is available for download at
> > https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.22RC0 and Maven
> > artifacts are also available in the temporary repository
> > https://repository.apache.org/content/repositories/orgapachehbase-1151 .
> >
> > The detailed source and binary compatibility report for this release with
> > respect to the previous is available for your review at
> > https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.
> > 22RC0/0.98.21_0.98.22RC0_compat_report.html
> > ​. There are no reported compatibility issues.
> >
> > The 25 issues resolved in this release can be found at
> > https://s.apache.org/C7SV .
> >
> > I have made the following assessments of this candidate:
> > - Release audit check: pass
> > - Unit test suite: pass 10/10 (7u79)
> >
> > - Loaded 1M keys with LTT (10 readers, 10 writers, 10 updaters (20%)): all
> > keys verified, no unusual messages or errors, latencies in the ballpark
> > - IntegrationTestBigLinkedList 1B rows: 100% referenced, no errors (8u91)
> > - Built head of Apache Phoenix 4.x-HBase-0.98 branch: no errors (7u79)
> >
> > Signed with my code signing key D5365CCD.
> >
> > Please try out the candidate and vote +1/0/-1. This vote will be open for
> > at least 72 hours. Unless there is objection I will try to close it
> > Friday September 9, 2016 if we have sufficient votes.
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


[jira] [Created] (HBASE-16597) Revisit the threadPool is really needed in submitAll and submit interface in AsyncProcess

2016-09-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16597:
-

 Summary: Revisit the threadPool is really needed in submitAll and 
submit interface in AsyncProcess
 Key: HBASE-16597
 URL: https://issues.apache.org/jira/browse/HBASE-16597
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


Currently, a threadPool can be passed into AP via the constructor and via the 
submitAll and submit interfaces, but I notice that in HTable the pool passed to AP 
through submitAll looks the same as the default one in AP. Let me revisit this to 
check whether the pool is really needed in the submitAll and submit interfaces.
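A sketch of the redundancy in question (simplified names and signatures, not the real AsyncProcess API):

{code}
import java.util.List;
import java.util.concurrent.ExecutorService;

// Sketch: the pool handed to submitAll() is, by default, the same pool the
// AsyncProcess-like object was constructed with, so the extra parameter may be dropped.
class ApSketch {
  private final ExecutorService defaultPool;

  ApSketch(ExecutorService pool) { this.defaultPool = pool; }

  // today: callers pass a pool explicitly ...
  void submitAll(ExecutorService pool, List<?> actions) {
    // would hand the actions to 'pool'
  }

  // ... but in HTable that pool is always 'defaultPool', so this overload may be enough
  void submitAll(List<?> actions) {
    submitAll(defaultPool, actions);
  }
}
{code}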





[jira] [Created] (HBASE-16596) Reduce redundant interfaces in AP

2016-09-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16596:
-

 Summary: Reduce redundant interfaces in AP
 Key: HBASE-16596
 URL: https://issues.apache.org/jira/browse/HBASE-16596
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen
Assignee: Heng Chen


Currently, there are lots of interfaces in AP; some of them are used only 
in test cases, or are used only once. Let's remove the redundant ones to keep 
AP clearer.





[jira] [Created] (HBASE-16593) Unify HTable with AP

2016-09-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16593:
-

 Summary: Unify HTable with AP
 Key: HBASE-16593
 URL: https://issues.apache.org/jira/browse/HBASE-16593
 Project: HBase
  Issue Type: Umbrella
Reporter: Heng Chen
Assignee: Heng Chen


Currently, HTable has two ways to deal with requests. One is to call RPC directly; 
it is used to process single-action requests such as Get, Delete, Append and 
Increment. The other goes through AP to deal with multi-action requests, such 
as batch, mutation, etc.

This issue is to unify them so that only AP is used. It has some benefits: for example, we 
could implement async interfaces easily with AP, and we could make the client 
logic clearer by using only AP to communicate with the server.
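To illustrate with public client API only (this is not the internal AP change itself; the table name 't' and the row key are assumptions), a single Get can already be expressed as a one-element batch, which is the shape the unified path would take internally:

{code}
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SingleGetViaBatch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("t"))) {
      Get get = new Get(Bytes.toBytes("row1"));
      // direct RPC path used today for single actions
      Result direct = table.get(get);
      // the same request expressed through the multi-action (AP-backed) path
      Object[] results = new Object[1];
      table.batch(Collections.singletonList(get), results);
      Result viaBatch = (Result) results[0];
      System.out.println("same row returned: "
          + Bytes.equals(direct.getRow(), viaBatch.getRow()));
    }
  }
}
{code}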





[jira] [Created] (HBASE-16592) Unify Delete request with AP

2016-09-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16592:
-

 Summary: Unify Delete request with AP
 Key: HBASE-16592
 URL: https://issues.apache.org/jira/browse/HBASE-16592
 Project: HBase
  Issue Type: Task
Reporter: Heng Chen
Assignee: Heng Chen


This is the first step in trying to unify HTable with AP only. To extend AP so it 
can process single actions, I introduced AbstractResponse; MultiResponse and 
SingleResponse (introduced to deal with a single result) will extend this class.
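A sketch of the response hierarchy described above (simplified; not the actual class bodies):

{code}
// Sketch: both the batch response and the new single-action response share one base type,
// so AP can hand back either shape through the same path.
abstract class AbstractResponseSketch {
  enum ResponseType { SINGLE, MULTI }
  abstract ResponseType type();
}

class SingleResponseSketch extends AbstractResponseSketch {   // wraps the result of one action
  private final Object result;
  SingleResponseSketch(Object result) { this.result = result; }
  Object getResult() { return result; }
  @Override ResponseType type() { return ResponseType.SINGLE; }
}

class MultiResponseSketch extends AbstractResponseSketch {    // wraps the results of a batch
  @Override ResponseType type() { return ResponseType.MULTI; }
}
{code}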







Re: [ANNOUNCE] Misty Stanley-Jones joins the Apache HBase PMC

2016-09-07 Thread Heng Chen
congrats!

2016-09-08 9:25 GMT+08:00 Mikhail Antonov :

> Congratulations!
>
> -Mikhail
>
> On Wed, Sep 7, 2016 at 6:14 PM, 张铎  wrote:
>
> > Congratulations!
> >
> > 2016-09-08 7:56 GMT+08:00 Jonathan Hsieh :
> >
> > > Congrats Misty!
> > >
> > > On Wed, Sep 7, 2016 at 11:40 AM, Sean Busbey 
> wrote:
> > >
> > > > On behalf of the Apache HBase PMC I am pleased to announce that Misty
> > has
> > > > accepted our invitation to become a PMC member on the Apache HBase
> > > project.
> > > > Misty has been a committer for almost 2 years now. She's done a great
> > job
> > > > of steering us good directions both for docs and as a project.
> > > >
> > > > Please join me in welcoming Misty to the HBase PMC
> > > >
> > > > -busbey
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // HBase Tech Lead, Software Engineer, Cloudera
> > > // j...@cloudera.com // @jmhsieh
> > >
> >
>
>
>
> --
> Thanks,
> Michael Antonov
>


[jira] [Resolved] (HBASE-16562) ITBLL should fail to start if misconfigured

2016-09-07 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16562.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

> ITBLL should fail to start if misconfigured
> ---
>
> Key: HBASE-16562
> URL: https://issues.apache.org/jira/browse/HBASE-16562
> Project: HBase
>  Issue Type: Improvement
>  Components: integration tests
>Reporter: Andrew Purtell
>    Assignee: Heng Chen
> Fix For: 2.0.0, 1.0.4, 1.4.0, 1.3.1, 1.1.7, 0.98.23, 1.2.4
>
> Attachments: HBASE-16562-branch-1.2.patch, 
> HBASE-16562-branch-1.2.v1.patch, HBASE-16562.patch, HBASE-16562.v1.patch, 
> HBASE-16562.v1.patch-addendum
>
>
> The number of nodes in ITBLL must be a multiple of width*wrap (defaults to 25M, 
> but can be configured by adding two more args to the test invocation) or else 
> verification will fail. This can be very expensive in terms of time or hourly 
> billing for on demand test resources. Check the sanity of test parameters 
> before launching any MR jobs and fail fast if invariants aren't met with an 
> indication what parameter(s) need fixing. 
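For reference, a sketch of the kind of fail-fast check meant here (names are illustrative, not the exact patch):

{code}
// Validate the ITBLL parameters before launching any MR job.
static void checkNodeCount(long numNodes, int width, int wrapMultiplier) {
  long wrap = (long) width * wrapMultiplier;          // defaults multiply out to 25M
  if (wrap <= 0 || numNodes <= 0 || numNodes % wrap != 0) {
    throw new IllegalArgumentException("numNodes=" + numNodes
        + " must be a positive multiple of width*wrapMultiplier=" + wrap);
  }
}
{code}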





[jira] [Resolved] (HBASE-16564) ITBLL run failed with hadoop 2.7.2 on branch 0.98

2016-09-06 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16564.
---
Resolution: Invalid

As [~Apache9] said, the best solution is to upgrade the Hadoop client, so I am 
closing this issue as invalid.

> ITBLL run failed with hadoop 2.7.2 on branch 0.98
> -
>
> Key: HBASE-16564
> URL: https://issues.apache.org/jira/browse/HBASE-16564
> Project: HBase
>  Issue Type: Bug
>    Reporter: Heng Chen
>Priority: Minor
>
> 0.98 is compiled against Hadoop 2.2.0, so it has some compatibility issues with 
> Hadoop 2.7.2 (it seems 2.5.0+ has the same issue); some counters have been 
> removed.  
> IMO we should catch the exception so our ITBLL could go on.
> {code}
> 16/09/06 15:39:33 INFO hbase.HBaseCluster: Added new HBaseAdmin
> 16/09/06 15:39:33 INFO hbase.HBaseCluster: Restoring cluster - done
> 16/09/06 15:39:33 INFO hbase.HBaseCommonTestingUtility: Stopping mini 
> mapreduce cluster...
> 16/09/06 15:39:33 INFO Configuration.deprecation: mapred.job.tracker is 
> deprecated. Instead, use mapreduce.jobtracker.address
> 16/09/06 15:39:33 INFO hbase.HBaseCommonTestingUtility: Mini mapreduce 
> cluster stopped
> 16/09/06 15:39:33 ERROR util.AbstractHBaseTool: Error running command-line 
> tool
> java.lang.IllegalArgumentException: No enum constant 
> org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
>   at java.lang.Enum.valueOf(Enum.java:238)
>   at 
> org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
>   at 
> org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
>   at 
> org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
>   at 
> org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
>   at 
> org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
>   at 
> org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
>   at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
>   at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
>   at 
> org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
>   at 
> org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.jobCompletion(IntegrationTestBigLinkedList.java:543)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.runRandomInputGenerator(IntegrationTestBigLinkedList.java:505)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.run(IntegrationTestBigLinkedList.java:553)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.runGenerator(IntegrationTestBigLinkedList.java:842)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.run(IntegrationTestBigLinkedList.java:892)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.runTestFromCommandLine(IntegrationTestBigLinkedList.java:1237)
>   at 
> org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115)
>   at 
> org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>   at 
> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.main(IntegrationTestBigLinkedList.java:1272)
> {code}





Re: [VOTE] The 1st HBase 0.98.22 release candidate (RC0) is available

2016-09-06 Thread Heng Chen
+1

- Unpacked source and binary tarballs: layout looks good

- Started up a 3-node cluster (Hadoop 2.7.2, Oracle JDK 8u20, 2 master, 3
rs) from binary tarballs.

- Verified that the web UI works and shell works

- Built from source and ran the test suite (JDK 8u20); passed. (There were some
failed test cases around the Thrift server, but they passed when rerun manually;
the failed test cases are listed below.)

TestThriftServer.beforeClass:97 » IO Shutting down

TestThriftServerCmdLine.setUpBeforeClass:119 » IO Shutting down

TestThriftHBaseServiceHandler.beforeClass:135 » IO Shutting down

TestThriftHBaseServiceHandlerWithLabels.beforeClass:135 » IO Shutting
down

- Ran LTT with 1M rows (100 writers, 30 readers (100%), 10 updaters
(20%)); all keys verified, no warnings, no errors, no failures, latencies LGTM

- Ran ITBLL with 2M rows (slowDeterministic); passed.

- Ran ITBLL with 2.5M rows (serverKilling); passed.

Some notes: because 0.98 is compiled against Hadoop 2.2.0, when I ran ITBLL
on Hadoop 2.7.2 it failed due to a compatibility issue (see HBASE-16564), so
I replaced the hadoop-2.2.0 jars with Hadoop 2.5.1 and the ITBLL passed. I still
give +1 because it is a MapReduce issue, not an HBase one.




2016-09-05 13:41 GMT+08:00 Dima Spivak :

> Ugh, sorry guys, I'm dumb. I was running 1 mapper per RS before, but
> switched to a d2.4xlarge instance today and, after noticing cores sitting
> idly, decided to try setting the number of mappers and reducers to the
> number of cores to speed testing up (RAM is still grossly underutilized
> with less than 16 GB/122 GB in use at any one time). This definitely made
> runs go faster (generation took less than 3 hours, verification took about
> 1 hour), but I just realized that the number of nodes I picked (6250)
> isn't a multiple of 25,000,000 and so the list won't wrap properly. I'll
> rerun and confirm, but I'm guessing this is a false alarm.
>
> Sorry again. :(
>
> -Dima
>
> On Sun, Sep 4, 2016 at 9:56 PM, Andrew Purtell 
> wrote:
>
> > I will also try your incantation (and JRE version) on this RC and 0.98.21
> > next week to answer those same questions.
> >
> > Looks like you are using a multiple of RSes (16) as numMappers? Is that
> > 4x? On what kind of instance type? I am (also, I think) using a 5 node
> > "cluster" with 4 RS nodes but numMappers 4 and numNodes 25000. Since
> > with clusterdock everything is contending for one instance's resources I
> > didn't want to overdo and so have started at 1 mapper per RS. Since you
> > appear to be using a higher value, I'm curious if you've found that you
> > will get stable results with that, if more mappers in this configuration
> > does a better job finding problems in your experience, and what instance
> > type are you using? I've been using a d2.4xlarge.
> >
> > > On Sep 4, 2016, at 9:04 PM, Andrew Purtell 
> > wrote:
> > >
> > > I've been running 1B tests with slowDeterministic. 0.98.21 and this
> > 0.98.22 RC. I get 1B referenced, all ok.
> > >
> > > Did you run serverKilling with 0.98.21? And did it pass? Or does
> 0.98.21
> > pass for you now? If so then we have a regression. If not then it's
> > something to look at for 0.98.23 I'd say.
> > >
> > >> On Sep 4, 2016, at 8:44 PM, Dima Spivak 
> wrote:
> > >>
> > >> Anyone else running ITBLL seeing issues? I just ran a 5-node
> clusterdock
> > >> cluster with JDK 7u79 of this RC and tried out ITBLL with 1 billion
> rows
> > >> and the serverKilling monkey (`hbase
> > >> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList -m
> > serverKilling
> > >> loop 1 16 6250 ${RANDOM} 16`). This failed for me because of
> > >> unreferenced list nodes:
> > >>
> > >> org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$
> Verify$Counts
> > >> REFERENCED=732006926
> > >> UNREFERENCED=12003580
> > >>
> > >> Perhaps this is similar to what Mikhail saw a while back with later
> > >> releases?
> > >>
> > >> -Dima
> > >>
> > >>> On Sat, Sep 3, 2016 at 8:34 AM, Andrew Purtell 
> > wrote:
> > >>>
> > >>> The 1st HBase 0.98.2
> > >>> ​2 release candidate (RC0) is available for download at
> > >>> https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.22RC0 and
> > Maven
> > >>> artifacts are also available in the temporary repository
> > >>> https://repository.apache.org/content/repositories/
> orgapachehbase-1151
> > .
> > >>>
> > >>> The detailed source and binary compatibility report for this release
> > with
> > >>> respect to the previous is available for your review at
> > >>> https://dist.apache.org/repos/dist/dev/hbase/hbase-0.98.
> > >>> 22RC0/0.98.21_0.98.22RC0_compat_report.html
> > >>> ​. There are no reported compatibility issues.
> > >>>
> > >>> The
> > >>> ​25​
> > >>> issues resolved in this release can be found at
> > https://s.apache.org/C7SV
> > >>> .
> > >>>
> > >>> I have made the following assessments of this candidate:
> > >>> - Release audit check
> > >>> ​: pass​
> > >>>
> > >>> -
> > >>> ​ Unit test suite: pass 10/10 (7u79)​
> > >>>
> > >>> - Loaded 1M

[jira] [Created] (HBASE-16564) ITBLL run failed with hadoop 2.7.2 on branch 0.98

2016-09-06 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16564:
-

 Summary: ITBLL run failed with hadoop 2.7.2 on branch 0.98
 Key: HBASE-16564
 URL: https://issues.apache.org/jira/browse/HBASE-16564
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen
Priority: Minor


0.98 is compiled against Hadoop 2.2.0, so it has some compatibility issues with Hadoop 
2.7.2 (it seems 2.5.0+ has the same issue); some counters have been removed.

IMO we should catch the exception so our ITBLL could go on.

{code}
16/09/06 15:39:33 INFO hbase.HBaseCluster: Added new HBaseAdmin
16/09/06 15:39:33 INFO hbase.HBaseCluster: Restoring cluster - done
16/09/06 15:39:33 INFO hbase.HBaseCommonTestingUtility: Stopping mini mapreduce 
cluster...
16/09/06 15:39:33 INFO Configuration.deprecation: mapred.job.tracker is 
deprecated. Instead, use mapreduce.jobtracker.address
16/09/06 15:39:33 INFO hbase.HBaseCommonTestingUtility: Mini mapreduce cluster 
stopped
16/09/06 15:39:33 ERROR util.AbstractHBaseTool: Error running command-line tool
java.lang.IllegalArgumentException: No enum constant 
org.apache.hadoop.mapreduce.JobCounter.MB_MILLIS_MAPS
at java.lang.Enum.valueOf(Enum.java:238)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.valueOf(FrameworkCounterGroup.java:148)
at 
org.apache.hadoop.mapreduce.counters.FrameworkCounterGroup.findCounter(FrameworkCounterGroup.java:182)
at 
org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
at 
org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:240)
at 
org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:370)
at 
org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:511)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:756)
at org.apache.hadoop.mapreduce.Job$7.run(Job.java:753)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:753)
at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1361)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.jobCompletion(IntegrationTestBigLinkedList.java:543)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.runRandomInputGenerator(IntegrationTestBigLinkedList.java:505)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Generator.run(IntegrationTestBigLinkedList.java:553)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.runGenerator(IntegrationTestBigLinkedList.java:842)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Loop.run(IntegrationTestBigLinkedList.java:892)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.runTestFromCommandLine(IntegrationTestBigLinkedList.java:1237)
at 
org.apache.hadoop.hbase.IntegrationTestBase.doWork(IntegrationTestBase.java:115)
at 
org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:112)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList.main(IntegrationTestBigLinkedList.java:1272)
{code}
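A sketch of the mitigation suggested above (not a committed change; the helper class and method names are made up):

{code}
// Tolerate counters that the older 2.2.0 client jars do not know about, such as
// JobCounter.MB_MILLIS_MAPS, so the ITBLL run can continue.
import java.io.IOException;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;

final class CounterCompat {
  static Counters safeGetCounters(Job job) throws IOException {
    try {
      return job.getCounters();
    } catch (IllegalArgumentException e) {
      System.err.println("Skipping job counters due to client/cluster mismatch: " + e);
      return null;   // caller must handle the missing counters
    }
  }
}
{code}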





Re: [VOTE] First release candidate for hbase-1.2.3 (RC0) is available

2016-09-05 Thread Heng Chen
Thanks Dima!  I reran the test with your suggestion, and it passed! (hbase
--config ~/hadoop/hbase/f04_conf/
org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList loop 1 3 250
/tmp/itbll 3 10 25 -m serverKilling)

So +1 for me

2016-09-05 21:33 GMT+08:00 Dima Spivak :

> Hey Heng,
>
> You need to ensure that the number of nodes in ITBLL is a multiple of
> width*wrap (defaults to 25M, but can be configured by adding two more args
> to the test invocation). See: the 0.98.22 RC0 thread, where I forgot this
> :).
>
> On Monday, September 5, 2016, Heng Chen  wrote:
>
> > I will test it with 1.2.2 again.  Not sure about it now.
> >
> > 2016-09-05 20:26 GMT+08:00 Stack >:
> >
> > > Thanks Heng. Do you know if 1.2.2 or earlier versions of 1.2 failed in
> > > similar way or is this new phenomenon?
> > > Thanks,
> > > St.Ack
> > >
> > > On Mon, Sep 5, 2016 at 1:26 AM, Heng Chen  > >
> > > wrote:
> > >
> > > > - Unpacked source and binary tarballs: layout looks good
> > > > - Started up a 3-node cluster (Hadoop 2.7.2, Oracle JDK 8u20, 2
> > master, 2
> > > > rs) from binary tarballs.
> > > > - Verified that the web UI works and shell works
> > > > - build from source and run test case (JDK 8u20),  passed.
> > > > - Run LTT with 1M rows (100 writers,  30 readers (100%),  10 updaters
> > > > (20%))  all keys verified,  no warns, no errors,  no failed,
> latencies
> > > >  lgtm
> > > > - Run ITBLL with 2M rows (slowDeterministic), passed.
> > > >
> > > > Run ITBLL with 2M rows (serverKilling) has some issues.  I run two
> > times,
> > > > all failed ( hbase --config ~/hadoop/hbase/f04_conf/
> > > > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList loop 1 1
> > > > 200 /tmp/it_16224_v0 1 -m serverKilling)
> > > >
> > > > The first time,  after kill components in cluster serval times (1
> > active
> > > > master, 1 backup master, 1 active RS, 1 dead RS), some regions fall
> in
> > > > Failed_open state on active RS,  so balance could not run, and ITBLL
> > > hang.
> > > >   Then i start the dead RS manually,   and stop the active RS,  ITBLL
> > > could
> > > > go on.  But after some time,  ITBLL hang again due to backup master
> > could
> > > > not be startup,  so i do manually again,  and ITBLL go on ,  and at
> > least
> > > > verified failed.
> > > >
> > > >
> > > > The second time,  there is no abnormal issues during ITBLL,  but
> > verified
> > > > failed.
> > > >
> > > > Upload the logs
> > > >
> > > >
> > > >
> > > > 2016-09-03 0:33 GMT+08:00 Misty Stanley-Jones  > >:
> > > >
> > > >> +1 based on OSX 10.11.6
> > > >>
> > > >> Steps taken:
> > > >>
> > > >> Binary tar.gz:
> > > >> - Download the tarball
> > > >> - Test the MD5sum, it matched
> > > >> - Extract the tarball
> > > >> - Start  HBase in standalone mode
> > > >> - Start the CLI
> > > >> - Access the master and regionServer web UIs
> > > >> - Stop HBase
> > > >>
> > > >> Source tar.gz:
> > > >> - Download the tarball
> > > >> - Test the MD5sum, it matched
> > > >> - Extract the tarball
> > > >> - Build using Maven 3.39 and JDK 1.8.0_102 on OSX and 'mvn clean
> > install
> > > >> --fail-at-end' and let the full test suite run
> > > >> -- A few failed tests but each passed when I ran it separately. Not
> > > >> surprising since I was running all this on a Macbook.
> > > >> - Start HBase in standalone mode
> > > >> - Start the CLI
> > > >> - Access the master and regionServer web UIs
> > > >> - Stop HBase
> > > >>
> > > >>
> > > >> On Tue, Aug 30, 2016, at 11:14 AM, Stack wrote:
> > > >> > The first release candidate for HBase 1.2.3 (hbase-1.2.3RC0) is
> > > >> > available for download at:
> > > >> >
> > > >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-1.2.3RC0/
> > > >> >
> > > >> > Maven artifacts are also available in a staging repository at:
> > > >> >
> > > >> >  ht

Re: [VOTE] First release candidate for hbase-1.2.3 (RC0) is available

2016-09-05 Thread Heng Chen
I will test it with 1.2.2 again.  Not sure about it now.

2016-09-05 20:26 GMT+08:00 Stack :

> Thanks Heng. Do you know if 1.2.2 or earlier versions of 1.2 failed in
> similar way or is this new phenomenon?
> Thanks,
> St.Ack
>
> On Mon, Sep 5, 2016 at 1:26 AM, Heng Chen 
> wrote:
>
> > - Unpacked source and binary tarballs: layout looks good
> > - Started up a 3-node cluster (Hadoop 2.7.2, Oracle JDK 8u20, 2 master, 2
> > rs) from binary tarballs.
> > - Verified that the web UI works and shell works
> > - build from source and run test case (JDK 8u20),  passed.
> > - Run LTT with 1M rows (100 writers,  30 readers (100%),  10 updaters
> > (20%))  all keys verified,  no warns, no errors,  no failed, latencies
> >  lgtm
> > - Run ITBLL with 2M rows (slowDeterministic), passed.
> >
> > Run ITBLL with 2M rows (serverKilling) has some issues.  I run two times,
> > all failed ( hbase --config ~/hadoop/hbase/f04_conf/
> > org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList loop 1 1
> > 200 /tmp/it_16224_v0 1 -m serverKilling)
> >
> > The first time,  after kill components in cluster serval times (1 active
> > master, 1 backup master, 1 active RS, 1 dead RS), some regions fall in
> > Failed_open state on active RS,  so balance could not run, and ITBLL
> hang.
> >   Then i start the dead RS manually,   and stop the active RS,  ITBLL
> could
> > go on.  But after some time,  ITBLL hang again due to backup master could
> > not be startup,  so i do manually again,  and ITBLL go on ,  and at least
> > verified failed.
> >
> >
> > The second time,  there is no abnormal issues during ITBLL,  but verified
> > failed.
> >
> > Upload the logs
> >
> >
> >
> > 2016-09-03 0:33 GMT+08:00 Misty Stanley-Jones :
> >
> >> +1 based on OSX 10.11.6
> >>
> >> Steps taken:
> >>
> >> Binary tar.gz:
> >> - Download the tarball
> >> - Test the MD5sum, it matched
> >> - Extract the tarball
> >> - Start  HBase in standalone mode
> >> - Start the CLI
> >> - Access the master and regionServer web UIs
> >> - Stop HBase
> >>
> >> Source tar.gz:
> >> - Download the tarball
> >> - Test the MD5sum, it matched
> >> - Extract the tarball
> >> - Build using Maven 3.39 and JDK 1.8.0_102 on OSX and 'mvn clean install
> >> --fail-at-end' and let the full test suite run
> >> -- A few failed tests but each passed when I ran it separately. Not
> >> surprising since I was running all this on a Macbook.
> >> - Start HBase in standalone mode
> >> - Start the CLI
> >> - Access the master and regionServer web UIs
> >> - Stop HBase
> >>
> >>
> >> On Tue, Aug 30, 2016, at 11:14 AM, Stack wrote:
> >> > The first release candidate for HBase 1.2.3 (hbase-1.2.3RC0) is
> >> > available for download at:
> >> >
> >> >  https://dist.apache.org/repos/dist/dev/hbase/hbase-1.2.3RC0/
> >> >
> >> > Maven artifacts are also available in a staging repository at:
> >> >
> >> >  https://repository.apache.org/content/repositories/orgapache
> >> hbase-1149/
> >> >
> >> > Artifacts are signed with my key (30CD0996) published up in our KEYS
> >> > file at https://www-us.apache.org/dist/hbase/KEYS.
> >> >
> >> > The RC is tagged 1.2.3RC0 (I'll sign the tag the next time through...)
> >> >
> >> > The detailed source and binary compatibility report vs 1.2.2 has been
> >> > published for your review, at:
> >> >
> >> >  http://people.apache.org/~stack/1.2.2_1.2.3RC0_compat_report.html
> >> >
> >> > HBase 1.2.3 is the third patch release in the HBase 1.2 line,
> continuing
> >> > on
> >> > the theme of bringing a stable, reliable database to the Hadoop and
> >> NoSQL
> >> > communities. This release includes over 48 bug fixes since 1.2.2. In
> >> > particular it addresses an API incompatibility uncovered by our Apache
> >> > Phoenix bothers and sisters in our Table Interface: HBASE-16420.
> >> >
> >> > The full list of fixes included in this release is available at:
> >> >
> >> >
> >> > https://issues.apache.org/jira/secure/ReleaseNote.jspa?proje
> >> ctId=12310753&version=12336053
> >> >
> >> > and in the CHANGES.txt file included in the distribution.
> >> >
> >> > Please try out this candidate and vote +/-1 by 23:59 Pacific time on
> >> > Monday, 2016-09-05 as to whether we should release these artifacts as
> >> > HBase
> >> > 1.2.3.
> >> >
> >> > Thanks,
> >> > St.Ack
> >>
> >
> >
>


Re: [DISCUSS] 0.98 branch disposition

2016-09-01 Thread Heng Chen
Thanks Andrew for your work on the 0.98 branch; we have used this branch for over a
year, and it's time for us to move on to branch-1.1+.

Thanks again.

2016-09-01 0:10 GMT+08:00 Nick Dimiduk :

> 0.98 has had a great run! I think it's entirely reasonable to start winding
> it down. Maybe some blog post walking through the upgrade process --
> demonstrating how easy it is and where the gotchas may lie would help
> encourage our brave and true 23%.
>
> Thank you for your dedicated and diligent 2.5 years and 21 releases! You
> are an impressive volunteer.
>
> On Friday, August 26, 2016, Andrew Purtell  wrote:
>
> > Greetings,
> >
> > HBase 0.98.0 was released in February of 2014. We have had 21 releases
> in 2
> > 1/2 years at a fairly regular cadence, a terrific run for any software
> > product. However as 0.98 RM I think it's now time to discuss winding down
> > 0.98. I want to give you notice of this as far in advance as possible
> (and
> > have just come to a decision barely this week). We have several more
> recent
> > releases at this point that are quite stable, a superset of 0.98
> > functionality, and have been proven in deployments. It's wise not to take
> > on unnecessary risk by upgrading from a particular version, but in the
> case
> > of 0.98, it's getting to be that time.
> >
> > If you have not yet, I would encourage you to take a few moments to
> > participate in our fully anonymous usage survey:
> > https://www.surveymonkey.com/r/NJFKKGW . According to results received
> so
> > far, the versions of HBase in production use break down as:
> >
> >- 0.94 - 19%
> >- 0.96 - 2%
> >- *0.98 - 23%*
> >- 1.0 - 20%
> >- 1.1 - 34%
> >- 1.2 - 23%
> >
> > These figures add up to more than 100% because some respondents I expect
> > run more than one version.
> >
> > For those 23% still on 0.98 (and the 2% on 0.96) it's time to start
> > seriously thinking about an upgrade to 1.1 or later. The upgrade process
> > can be done in a rolling manner. We consider 1.1 (and 1.2 for that
> matter)
> > to be stable and ready for production.
> >
> > As 0.98 RM, my plan is to continue active maintenance at a roughly
> monthly
> > release cadence through December of this year. However in January 2017 I
> > plan to tender my resignation as 0.98 RM and, hopefully, take that active
> > role forward to more recent code not so full of dust and cobwebs and more
> > interesting to develop and maintain. Unless someone else steps up to take
> > on that task this will end regular 0.98 releases. I do not expect anyone
> to
> > take on that role, frankly. Of course we can still make occasional 0.98
> > releases on demand. Any committer can wrangle the bits and the PMC can
> > entertain a vote. (If you can conscript a committer to assist with
> > releasing I don't think you even need to be a committer to function as RM
> > for a release.) Anyway, concurrent with my resignation as 0.98 RM I
> expect
> > the project to discuss and decide an official position on 0.98 support.
> It
> > is quite possible we will announce that position to be an end of life.
> >
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: [ANNOUNCE] Dima Spivak joins the Apache HBase PMC

2016-08-31 Thread Heng Chen
congrats

2016-09-01 7:46 GMT+08:00 Jonathan Hsieh :

> congrats dima!
>
> On Wednesday, August 31, 2016, Andrew Purtell  wrote:
>
> > On behalf of the Apache HBase PMC I am pleased to announce that Dima
> Spivak
> > has accepted our invitation to become a committer and PMC member on the
> > Apache HBase project. Dima has been an active contributor for some time,
> > particularly in development and contribution of release tooling that all
> of
> > our RMs now use, such as the API compatibility checker. Dima has also
> been
> > active in testing and voting on release candidates. Release voting is
> > important to project health and momentum and demonstrates interest and
> > capability above and beyond just committing. We wish to recognize this
> and
> > make those release votes binding. Please join me in thanking Dima for his
> > contributions to date and anticipation of many more contributions.
> >
> > Welcome to the HBase project, Dima!
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
>
> --
> // Jonathan Hsieh (shay)
> // HBase Tech Lead, Software Engineer, Cloudera
> // j...@cloudera.com // @jmhsieh
>


[jira] [Created] (HBASE-16512) when locate meta region, we should respect the param "useCache" passed in on 0.98

2016-08-28 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16512:
-

 Summary: when locate meta region, we should respect the param 
"useCache" passed in on 0.98
 Key: HBASE-16512
 URL: https://issues.apache.org/jira/browse/HBASE-16512
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.21
Reporter: Heng Chen


We found that when the RS hosting meta crashes, the client retries the same request, 
but it still uses the original meta location in the cache, so all retried requests 
fail. 

Notice the code in HConnectionManager#locateRegionInMeta: the "useCache" 
passed in is not used when trying to locate the meta region. 

{code}
private HRegionLocation locateRegionInMeta(final TableName parentTable,
  final TableName tableName, final byte [] row, boolean useCache,
  Object regionLockObject, boolean retry)
throws IOException {
  ..
  for (int tries = 0; true; tries++) {
   .
HRegionLocation metaLocation = null;
try {
  // locate the meta region
  metaLocation = locateRegion(parentTable, metaKey, true, false); 
//NOTICE: we should honor the "useCache" passed in when locate the meta region.
  
  }
}
{code}
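A sketch of the fix implied above (0.98 branch), shown against the line quoted in the snippet:

{code}
// Propagate the caller's useCache instead of hard-coding true when locating meta:
metaLocation = locateRegion(parentTable, metaKey, useCache, false);
{code}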









[jira] [Created] (HBASE-16490) Fix race condition between SnapshotManager and SnapshotCleaner

2016-08-23 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16490:
-

 Summary: Fix race condition between SnapshotManager and 
SnapshotCleaner
 Key: HBASE-16490
 URL: https://issues.apache.org/jira/browse/HBASE-16490
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen
 Fix For: 2.0.0


As [~mbertozzi] commented on HBASE-16464, there may be a race condition between 
SnapshotManager and SnapshotCleaner. We should use one lock when creating a 
snapshot, and cleanup should acquire the lock before taking action.

One approach is to pass the HMaster as a parameter into the cleaner through 
{{FileCleanerDelegate.getDeletableFiles}}; suggestions are welcome.
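A sketch of the locking idea (illustrative only, not the actual patch):

{code}
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the snapshot manager and the snapshot-aware cleaner serialize on one shared lock,
// so the cleaner never inspects a half-written snapshot under the .tmp directory.
class SnapshotLockSketch {
  private final ReentrantLock snapshotLock = new ReentrantLock();

  void takeSnapshot(Runnable doSnapshot) {
    snapshotLock.lock();               // held while a snapshot is in flight
    try {
      doSnapshot.run();
    } finally {
      snapshotLock.unlock();
    }
  }

  void cleanExpiredFiles(Runnable doCleanup) {
    snapshotLock.lock();               // cleaner must acquire the same lock before deleting files
    try {
      doCleanup.run();
    } finally {
      snapshotLock.unlock();
    }
  }
}
{code}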





[jira] [Created] (HBASE-16464) archive folder grows bigger and bigger due to corrupt snapshot under tmp dir

2016-08-22 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16464:
-

 Summary: archive folder grows bigger and bigger due to corrupt 
snapshot under tmp dir
 Key: HBASE-16464
 URL: https://issues.apache.org/jira/browse/HBASE-16464
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


We met this problem on our real production cluster. We needed to clean up some 
data in HBase and noticed that the archive folder was much larger than the others, so we 
deleted all snapshots of all tables, but the archive folder still grew bigger 
and bigger. 

After checking the HMaster log, we noticed the exception below:
{code}
2016-08-22 15:34:33,089 ERROR [f04,16000,1471240833208_ChoreService_1] 
snapshot.SnapshotHFileCleaner: Exception while checking if files were valid, 
keeping them just in case.
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:hdfs://f04/hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo
at 
org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:295)
at 
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: File does not exist: 
/hbase/.hbase-snapshot/.tmp/frog_stastic_2016-08-17/.snapshotinfo
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:71)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:61)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1828)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1799)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1712)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:587)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:365)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcE

[jira] [Created] (HBASE-16427) After HBASE-13701, hbase standalone mode start failed due to mkdir failed

2016-08-16 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16427:
-

 Summary: After HBASE-13701,  hbase standalone mode start failed 
due to mkdir failed
 Key: HBASE-16427
 URL: https://issues.apache.org/jira/browse/HBASE-16427
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


{code}
2016-08-17 10:26:43,305 ERROR [main] regionserver.SecureBulkLoadManager: Failed 
to create or set permission on staging directory /user/chenheng/hbase-staging
ExitCodeException exitCode=1: chmod: /user/chenheng/hbase-staging: No such file 
or directory

at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at 
org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:815)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:798)
at 
org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:728)
at 
org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:502)
at 
org.apache.hadoop.hbase.regionserver.SecureBulkLoadManager.start(SecureBulkLoadManager.java:124)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:626)
at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:406)
at 
org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.(HMasterCommandLine.java:307)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:140)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:221)
at 
org.apache.hadoop.hbase.LocalHBaseCluster.(LocalHBaseCluster.java:156)
at 
org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:226)
at 
org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:139)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:127)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2421)
2016-08-17 10:26:43,306 ERROR [main] master.HMasterCommandLine: Master exiting
{code}
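
From the trace, the chmod fails because the staging directory itself has not been 
created yet. A minimal sketch of the kind of guard that would avoid this (the 
config key default and the permission bits here are assumptions, not taken from 
an actual patch):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Hypothetical sketch: create the bulk load staging dir before chmod'ing it.
public class StagingDirSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Path stagingDir = new Path(conf.get("hbase.bulkload.staging.dir",
        "/user/" + System.getProperty("user.name") + "/hbase-staging"));
    FileSystem fs = stagingDir.getFileSystem(conf);
    if (!fs.exists(stagingDir)) {
      fs.mkdirs(stagingDir);  // without this, setPermission fails on the local FS
    }
    fs.setPermission(stagingDir, new FsPermission((short) 0711));
  }
}
{code}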



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16258) Some issues about metrics on webUI

2016-07-20 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16258:
-

 Summary: Some issues about metrics on webUI
 Key: HBASE-16258
 URL: https://issues.apache.org/jira/browse/HBASE-16258
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16076) Cannot configure split policy in HBase shell

2016-07-17 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16076.
---
Resolution: Fixed
  Assignee: Heng Chen

Commit to master

> Cannot configure split policy in HBase shell
> 
>
> Key: HBASE-16076
> URL: https://issues.apache.org/jira/browse/HBASE-16076
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 2.0.0
>Reporter: Youngjoon Kim
>Assignee: Heng Chen
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-16076.patch, HBASE-16076_v1.patch
>
>
> The reference guide explains how to configure split policy in HBase 
> shell([link|http://hbase.apache.org/book.html#_custom_split_policies]).
> {noformat}
> Configuring the Split Policy On a Table Using HBase Shell
> hbase> create 'test', {METHOD => 'table_att', CONFIG => {'SPLIT_POLICY' => 
> 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}},
> {NAME => 'cf1'}
> {noformat}
> But if run that command, shell complains 'An argument ignored (unknown or 
> overridden): CONFIG', and the table description has no split policy.
> {noformat}
> hbase(main):067:0* create 'test', {METHOD => 'table_att', CONFIG => 
> {'SPLIT_POLICY' => 
> 'org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy'}}, {NAME 
> => 'cf1'}
> An argument ignored (unknown or overridden): CONFIG
> Created table test
> Took 1.2180 seconds
> hbase(main):068:0> describe 'test'
> Table test is ENABLED
> test
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', 
> REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '1', TTL => 
> 'FOREVER', MIN_VERSIONS => '0', IN_MEMORY_COMPACTION => 'false', 
> KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => '
> false', BLOCKCACHE => 'true'}
> 1 row(s)
> Took 0.0200 seconds
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16111) Truncate preserve shell command is broken

2016-06-27 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16111.
---
  Resolution: Fixed
Assignee: Heng Chen
Hadoop Flags: Reviewed

> Truncate preserve shell command is broken
> -
>
> Key: HBASE-16111
> URL: https://issues.apache.org/jira/browse/HBASE-16111
> Project: HBase
>  Issue Type: Bug
>  Components: shell
>Reporter: Elliott Clark
>    Assignee: Heng Chen
>  Labels: shell
> Fix For: 2.0.0
>
> Attachments: HBASE-16111.patch
>
>
> On a recent version of master I get this:
> {code}
> hbase(main):001:0> truncate_preserve 'TestTable'
> ERROR: undefined local variable or method `table' for 
> #
> Here is some help for this command:
>   Disables, drops and recreates the specified table while still maintaing the 
> previous region boundaries.
> Took 0.0290 seconds
> hbase(main):002:0> truncate 'TestTable'
> Truncating 'TestTable' table (it may take a while):
> Disabling table...
> Truncating table...
> Took 10.0040 seconds
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15900) RS stuck in get lock of HStore

2016-06-22 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15900.
---
Resolution: Duplicate

> RS stuck in get lock of HStore
> --
>
> Key: HBASE-15900
> URL: https://issues.apache.org/jira/browse/HBASE-15900
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.1.1, 1.3.0
>Reporter: Heng Chen
> Attachments: 0d32a6bab354e6cc170cd59a2d485797.jstack.txt, 
> 0d32a6bab354e6cc170cd59a2d485797.rs.log, 9fe15a52_9fe15a52_save, 
> c91324eb_81194e359707acadee2906ffe36ab130.log, dump.txt
>
>
> It happens on my production cluster when i run MR job.  I save the dump.txt 
> from this RS webUI.
> Many threads stuck here:
> {code}
> Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
>32   State: WAITING
>31   Blocked count: 477816
>30   Waited count: 535255
>29   Waiting on 
> java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
>28   Stack:
>27 sun.misc.Unsafe.park(Native Method)
>26 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>25 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>24 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>23 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>22 
> java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>21 org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
>20 
> org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
>19 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
>18 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
>17 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
>16 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>15 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>14 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
>13 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>12 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
>11 org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>10 
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> 9 org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> 8 java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] HBase-2.0 SHOULD be rolling upgradable and wire-compatible with 1.x

2016-06-21 Thread Heng Chen
bq. We should keep main data paths working between 1.x client and 2.0
cluster. It is fine if some admin operation does not work with older client.
+1



2016-06-22 2:13 GMT+08:00 Enis Söztutar :

> Agreed with above. We should keep main data paths working between 1.x
> client and 2.0 cluster. It is fine if some admin operation does not work
> with older client. Agreed on replication as well, it must work or we should
> have a dedicated replicator implementation.
>
> HBASE-16060 would have been fine to leave unresolved according to above,
> however, accessing the table state is needed from the main data path in
> retry logic. Whenever we cannot find a region between retries, we check
> whether the table is disabled or not (because from region assignment
> perspective, there is no distinction between a region not assigned yet, or
> a disabled table. So, I think HBASE-16060 is a blocker for 2.0
> unfortunately.
>
> Enis
>
> On Tue, Jun 21, 2016 at 10:51 AM, Andrew Purtell  >
> wrote:
>
> > Inline
> >
> > > On Jun 20, 2016, at 11:30 PM, Matteo Bertozzi  >
> > wrote:
> > >
> > > I think everyone wants rolling upgrade. the discussion should probably
> be
> > > around how much compatibility code do we want to keep around.
> > >
> > > using as example HBASE-16060, we need to decide how much are we rolling
> > > upgradable and from where.
> > > I'm not too convinced that we should have extra code in master to
> > "simulate
> > > the old states",
> > > I'll rather have cleaner code in 2.0 and force the users to move to one
> > of
> > > the latest 1.x.y
> > > there are not many changes in the 1.x releases, so we should be able to
> > say:
> > > if you are on 1.1 move to the latest 1.1.x, if you are on 1.2 move to
> the
> > > latest 1.2.x and so on.
> > >
> > > also there are some operations that may not be needed during rolling
> > > upgrades,
> > > and we can cut on compatibility to have some code removed.
> > > an example here is HBASE-15521 where we are no longer able to
> > clone/restore
> > > snapshot during 1.x -> 2.x rolling upgrade, until the two master are on
> > > 2.x. but this may be extended to you can't perform some operation until
> > all
> > > the machines are on 2.x for some future change.
> > >
> > > I think we should aim for something like:
> > > - data path: HTable put/get/scan/... must work during a rolling upgrade
> >
> > Yes.
> >
> > > - replication: must? work during rolling upgrade
> >
> > This is a must. If anything this is what gave users the most pain at the
> > "singularity" - so much so that at least one built custom cross version
> > replication endpoints.  That should have been on us to provide.
> >
> > > - admin: some operation may not be working during rolling upgrade
> >
> > This would depend on what won't work.
> >
> > I think it would be great if you could start a rolling upgrade, stop at
> > like 10% or 20% of the fleet, see how it goes for a while, and then
> either
> > commit or roll back. How comfortable that mixed version state is will
> > depend on what won't work. I submit this for consideration as a stretch
> > goal.
> >
> > > - upgrade to the latest 1.x.y before the 2.x upgrade (we can add in 2.x
> > > master and rs the ability to check the client version)
> >
> > This would be fine, I think.
> >
> > >
> > >
> > > Matteo
> > >
> > >
> > >> On Tue, Jun 21, 2016 at 12:05 AM, Dima Spivak 
> > wrote:
> > >>
> > >> If there’s no technical limitation, we should definitely do it. As you
> > >> note, customers running in production hate when they have to shut down
> > >> clusters and with some of the testing infrastructure being rolled out,
> > this
> > >> is definitely something we can set up automated testing for. +1
> > >>
> > >> -Dima
> > >>
> > >>> On Mon, Jun 20, 2016 at 2:58 PM, Enis Söztutar 
> > wrote:
> > >>>
> > >>> Time to formalize 2.0 rolling upgrade scenario?
> > >>>
> > >>> 0.94 -> 0.96 singularity was a real pain for operators and for our
> > users.
> > >>> If possible we should not have the users suffer through the same
> thing
> > >>> unless there is a very compelling reason. For the current stuff in
> > >> master,
> > >>> there is nothing that will prevent us to not have rolling upgrade
> > support
> > >>> for 2.0. So I say, we should decide on the rolling upgrade
> requirement
> > >> now,
> > >>> and start to evaluate incoming patches accordingly. Otherwise, we
> risk
> > >> the
> > >>> option to go deeper down the hole.
> > >>>
> > >>> What do you guys think. Previous threads [1] and [2] seems to be in
> > >> favor.
> > >>> Should we vote?
> > >>>
> > >>> Ref:
> > >>> [1]
> > >>
> >
> http://search-hadoop.com/m/YGbbsd4An1aso5E1&subj=HBase+1+x+to+2+0+upgrade+goals+
> > >>>
> > >>> [2]
> > >>
> >
> http://search-hadoop.com/m/YGbb1CBXTL8BTI&subj=thinking+about+supporting+upgrades+to+HBase+1+x+and+2+x
> > >>
> >
>


[jira] [Created] (HBASE-16040) Remove configuration "hbase.replication"

2016-06-16 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16040:
-

 Summary: Remove configuration "hbase.replication"
 Key: HBASE-16040
 URL: https://issues.apache.org/jira/browse/HBASE-16040
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen
 Fix For: 2.0.0


This configuration was introduced to reduce the overhead of replication. 
Now the overhead of replication is negligible. Besides that, this config is not 
in hbase-default.xml, so a user has to read the code to learn about it and its 
default value, which is unfriendly.

So let's remove it. Suggestions?
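
For context, a minimal sketch of how callers read this flag today (the key name 
and its hard-coded default are as described above; the wrapper class here is only 
illustrative):

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: the flag is read with a hard-coded default and never surfaced in
// hbase-default.xml, which is why it is so easy to miss.
public class ReplicationFlagSketch {
  public static boolean replicationEnabled(Configuration conf) {
    return conf.getBoolean("hbase.replication", true);
  }
}
{code}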



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-16031) Documents about "hbase.replication" default value seems wrong

2016-06-16 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-16031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-16031.
---
   Resolution: Fixed
 Assignee: Heng Chen
Fix Version/s: 2.0.0

> Documents about "hbase.replication" default value seems wrong
> -
>
> Key: HBASE-16031
> URL: https://issues.apache.org/jira/browse/HBASE-16031
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>Assignee: Heng Chen
> Fix For: 2.0.0
>
> Attachments: HBASE-16031.patch
>
>
> {code}
>   public static final String
>   REPLICATION_ENABLE_KEY = "hbase.replication";
>   public static final boolean
>   REPLICATION_ENABLE_DEFAULT = true;
> {code}
> The code shows that default value is true, but documents shows the default 
> value is false.
> {code}
> | hbase.replication
> | Whether replication is enabled or disabled on a given
> cluster
> | false
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-16031) Documents about "hbase.replication" default value seems wrong

2016-06-14 Thread Heng Chen (JIRA)
Heng Chen created HBASE-16031:
-

 Summary: Documents about "hbase.replication" default value seems 
wrong
 Key: HBASE-16031
 URL: https://issues.apache.org/jira/browse/HBASE-16031
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


{code}
  public static final String
  REPLICATION_ENABLE_KEY = "hbase.replication";
  public static final boolean
  REPLICATION_ENABLE_DEFAULT = true;
{code}

The code shows that the default value is true, but the documentation shows the 
default value is false.

{code}

| hbase.replication
| Whether replication is enabled or disabled on a given
cluster
| false
{code}





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] New HBase committer Apekshit Sharma

2016-06-05 Thread Heng Chen
Congratulations

2016-06-04 8:09 GMT+08:00 Nick Dimiduk :

> Nice work Appy! Keep at it :)
>
> On Fri, Jun 3, 2016 at 10:56 AM, Andrew Purtell 
> wrote:
>
> > On behalf of the Apache HBase PMC, I am pleased to announce that Apekshit
> > Sharma has accepted the PMC's invitation to become a committer on the
> > project.
> > We appreciate all of Appy's generous contributions thus far and look
> > forward to his continued involvement.
> >
> > Congratulations and welcome, Appy!
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>


Re: Want to help

2016-05-30 Thread Heng Chen
Sorry, it was created by mistake; I can't delete it myself. If you want to do
something in HBase, please search for issues with the 'beginner' label at
https://issues.apache.org/jira/browse/HBASE

2016-05-30 22:15 GMT+08:00 :

> To Whom It May Concern:
>  Hi, I'm Reid, a student engaging in Distributed System Design.
>  After seeing a help wanted about HBase from
> https://helpwanted.apache.org/task.html?57953b5ddcf5b938e1af2aed4a2becd7181ccee7
>I wonder if I can get more details about the issue.
>  Thank you!


[jira] [Created] (HBASE-15900) RS stuck in get lock of HStore

2016-05-27 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15900:
-

 Summary: RS stuck in get lock of HStore
 Key: HBASE-15900
 URL: https://issues.apache.org/jira/browse/HBASE-15900
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen
 Attachments: dump.txt

It happens on my production cluster when I run an MR job. I saved the dump.txt 
from this RS webUI.

Many threads are stuck here:
{code}
Thread 133 (B.defaultRpcServer.handler=94,queue=4,port=16020):
   32   State: WAITING
   31   Blocked count: 477816
   30   Waited count: 535255
   29   Waiting on 
java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@6447ba67
   28   Stack:
   27 sun.misc.Unsafe.park(Native Method)
   26 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   25 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
   24 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
   23 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
   22 
java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
   21 org.apache.hadoop.hbase.regionserver.HStore.add(HStore.java:666)
   20 
org.apache.hadoop.hbase.regionserver.HRegion.applyFamilyMapToMemstore(HRegion.java:3621)
   19 
org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3038)
   18 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2793)
   17 
org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2735)
   16 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
   15 
org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
   14 
org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2029)
   13 
org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
   12 org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2112)
   11 org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
   10 
org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
9 org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
8 java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [ANNOUNCE] Mikhail Antonov joins the Apache HBase PMC

2016-05-26 Thread Heng Chen
Welcome Mikhail.

2016-05-27 4:30 GMT+08:00 Enis Söztutar :

> Welcome Mikhail.
>
> Enis
>
> On Thu, May 26, 2016 at 12:19 PM, Gary Helmling 
> wrote:
>
> > Welcome Mikhail!
> >
> > On Thu, May 26, 2016 at 11:47 AM Ted Yu  wrote:
> >
> > > Congratulations, Mikhail !
> > >
> > > On Thu, May 26, 2016 at 11:30 AM, Andrew Purtell 
> > > wrote:
> > >
> > > > On behalf of the Apache HBase PMC I am pleased to announce that
> Mikhail
> > > > Antonov has accepted our invitation to become a PMC member on the
> > Apache
> > > > HBase project. Mikhail has been an active contributor in many areas,
> > > > including recently taking on the Release Manager role for the
> upcoming
> > > > 1.3.x code line. Please join me in thanking Mikhail for his
> > contributions
> > > > to date and anticipation of many more contributions.
> > > >
> > > > Welcome to the PMC, Mikhail!
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >- Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> >
>


Re: NOTICE: removing doBulkLoad(Path hfofDir, final HTable table) though it has not been through a complete deprecation cycle?

2016-05-22 Thread Heng Chen
+1.

2016-05-23 1:29 GMT+08:00 Stack :

> HBASE-15875 wants to remove HTable and HTableInterface, classes that have
> been deprecated since before hbase-1.0.0 release. Unfortunately, we missed
> deprecating the method doBulkLoad(Path hfofDir, final HTable table) in
> LoadIncrementalHFiles, a public, stable class.
>
> My thinking is that its removal is 'ok' -- with proper notice -- since this
> a method used by offline tooling. My thinking is that as long as proper
> notice, folks can prepare their tooling ahead of time changing them to use
> the alternative that takes a Table instance in time for the upgrade. This
> method is also 'damaged' given it has a param that was properly deprecated
> -- i.e. the HTable instance -- but nonetheless, its removal will be a
> breaking change in 2.0.
>
> You lot all good w/ this?
> Thanks,
> St.
>


Re: Help with task: HBase

2016-05-20 Thread Heng Chen
Sorry,  the task was created by accident.

2016-05-20 12:49 GMT+08:00 Micheal Myers :

> I would like to help out with the task listed at
> https://helpwanted.apache.org/task.html?57953b5d
> If there is any preliminary info you would like me to review, please let
> me know. Other than that, please contact me if you need any information
> from me. Thanks.


[jira] [Created] (HBASE-15844) We should respect hfile.block.index.cacheonwrite when write intermediate index Block

2016-05-17 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15844:
-

 Summary: We should respect hfile.block.index.cacheonwrite when 
write intermediate index Block
 Key: HBASE-15844
 URL: https://issues.apache.org/jira/browse/HBASE-15844
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


{code: title=BlockIndexWriter#writeIntermediateBlock}
  if (cacheConf != null) {
HFileBlock blockForCaching = blockWriter.getBlockForCaching(cacheConf);
cacheConf.getBlockCache().cacheBlock(new BlockCacheKey(nameForCaching,
  beginOffset, true, blockForCaching.getBlockType()), blockForCaching);
  }
{code}

The if condition should probably be:
{code}
if (cacheConf != null && cacheConf.shouldCacheIndexesOnWrite()) 
{code} 
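
Putting the two pieces together, the guarded block would look roughly like this 
(same names as the snippet above; only the condition changes):

{code}
// Sketch of the corrected guard: only cache the intermediate index block when
// caching of index blocks on write is actually enabled.
if (cacheConf != null && cacheConf.shouldCacheIndexesOnWrite()) {
  HFileBlock blockForCaching = blockWriter.getBlockForCaching(cacheConf);
  cacheConf.getBlockCache().cacheBlock(new BlockCacheKey(nameForCaching,
      beginOffset, true, blockForCaching.getBlockType()), blockForCaching);
}
{code}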





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15814) Miss important information in Document of HBase Security

2016-05-10 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15814:
-

 Summary: Miss important information in Document of HBase Security
 Key: HBASE-15814
 URL: https://issues.apache.org/jira/browse/HBASE-15814
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Heng Chen


I have deployed a secure cluster recently, and found we are missing important 
information in http://hbase.apache.org/book.html#security

Some configurations are missing, for example:
{code}
<property>
  <name>hbase.regionserver.kerberos.principal</name>
  <value>hbase/_h...@your-realm.com</value>
</property>

<property>
  <name>hbase.regionserver.keytab.file</name>
  <value>/etc/hbase/conf/hbase.keytab</value>
</property>

<property>
  <name>hbase.master.kerberos.principal</name>
  <value>hbase/_h...@your-realm.com</value>
</property>

<property>
  <name>hbase.master.keytab.file</name>
  <value>/etc/hbase/conf/hbase.keytab</value>
</property>
{code}

And I found more detailed documentation at 
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_hbase_authentication.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15804) Return 404 in some document link

2016-05-09 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15804:
-

 Summary: Return 404 in some document link
 Key: HBASE-15804
 URL: https://issues.apache.org/jira/browse/HBASE-15804
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


http://hbase.apache.org/book.html#security

The link to {{Understanding User Authentication and Authorization in Apache 
HBase}} returns 404.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15720) Print row locks at the debug dump page

2016-05-02 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15720.
---
Resolution: Fixed

> Print row locks at the debug dump page
> --
>
> Key: HBASE-15720
> URL: https://issues.apache.org/jira/browse/HBASE-15720
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 1.2.1
>Reporter: Enis Soztutar
>    Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20, 1.0.5
>
> Attachments: 4742C21D-B9CE-4921-9B32-CC319488EC64.png, 
> HBASE-15720-branch-1.0-addendum.patch, HBASE-15720-branch-1.2-addendum.patch, 
> HBASE-15720.patch
>
>
> We had to debug cases where some handlers are holding row locks for an 
> extended time (and maybe leak) and other handlers are getting timeouts for 
> obtaining row locks. 
> We should add row lock information at the debug page at the RS UI to be able 
> to live-debug such cases.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-15278) AsyncRPCClient hangs if Connection closes before RPC call response

2016-04-29 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen reopened HBASE-15278:
---

Reverted it. See the failed test case at 
https://builds.apache.org/job/PreCommit-HBASE-Build/1691/testReport/

> AsyncRPCClient hangs if Connection closes before RPC call response 
> ---
>
> Key: HBASE-15278
> URL: https://issues.apache.org/jira/browse/HBASE-15278
> Project: HBase
>  Issue Type: Bug
>  Components: rpc, test
>Affects Versions: 2.0.0
>Reporter: Enis Soztutar
>Assignee: Heng Chen
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-15278.patch, HBASE-15278_v1.patch, 
> HBASE-15278_v2.patch, hbase-15278_v00.patch
>
>
> The test for HBASE-15212 discovered an issue with Async RPC Client. 
> In that test, we are closing the connection if an RPC call writes a call 
> larger than max allowed size, the server closes the connection. However the 
> async client does not seem to handle connection closes with outstanding RPC 
> calls. The client just hangs. 
> Marking this blocker against 2.0 since it is default there. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0

2016-04-28 Thread Heng Chen
The performance is quite great, but I think maybe we should collect some
experience on a real production cluster before we make it the default.

2016-04-29 11:30 GMT+08:00 张铎 :

> Inline comments.
> Thanks,
>
> 2016-04-29 10:57 GMT+08:00 Sean Busbey :
>
> > I am nervous about having default out-of-the-box new HBase users reliant
> on
> > a bespoke HDFS client, especially given Hadoop's compatibility
> > promises and history. Answers for these questions would make me more
> > confident:
> >
> > 1) Where are we on getting the client-side changes to HDFS pushed back
> > upstream?
> >
> No progress yet... Here I want to tell a good story that HBase is already
> use it as default :)
>
> >
> > 2) How well do we detect when our FS is not HDFS and what does
> > fallback look like?
> >
> Just wrap FSDataOutputStream to make it act like an asynchronous
> output(call hflush in a separated thread). The performance is not good I
> think.
>
> >
> > 3) Will this mean altering the versions of Hadoop we label as
> > supported for HBase 2.y+?
> >
> I have tested with hadoop versions from 2.4.x to 2.7.x, so I don't think we
> need to change the supported versions?
>
> >
> > 4) How are we going to ensure our client remains compatible with newer
> > Hadoop releases?
> >
> We can not ensure, HDFS always breaks HBase at a new release...
> I need to test AsyncFSWAL on every new 2.x release and make it compatible
> with that version. And back to #1, I think we should make sure that the
> AsyncFSOutput will be in HDFS-3.0. And in HBase-3.0, we can introduce a new
> 'AsyncFSWAL' that use the AsyncFSOutput in HDFS.
>
> >
> > On Thu, Apr 28, 2016 at 9:42 PM, Duo Zhang  wrote:
> > > Six month after I filed HBASE-14790...
> > >
> > > Now the AsyncFSWAL is ready. The WALPE result shows that it is
> > *1.4x~3.7x*
> > > faster than FSHLog. The ITBLL result turns out that it is *not bad*
> than
> > > FSHLog(the master branch is not that stable itself...).
> > >
> > > More details can be found on HBASE-15536.
> > >
> > > So here we propose to change the default WAL from FSHLog to AsyncFSWAL.
> > > Suggestions are welcomed.
> > >
> > > Thanks.
> >
> >
> >
> > --
> > busbey
> >
>


[jira] [Resolved] (HBASE-15720) Print row locks at the debug dump page

2016-04-27 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15720.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

> Print row locks at the debug dump page
> --
>
> Key: HBASE-15720
> URL: https://issues.apache.org/jira/browse/HBASE-15720
> Project: HBase
>  Issue Type: Improvement
>Reporter: Enis Soztutar
>Assignee: Heng Chen
> Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.5, 1.2.2, 0.98.20, 1.0.5
>
> Attachments: 4742C21D-B9CE-4921-9B32-CC319488EC64.png, 
> HBASE-15720.patch
>
>
> We had to debug cases where some handlers are holding row locks for an 
> extended time (and maybe leak) and other handlers are getting timeouts for 
> obtaining row locks. 
> We should add row lock information at the debug page at the RS UI to be able 
> to live-debug such cases.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15692) We should remove snapshot dir under .tmp when write .snapshotinfo failed.

2016-04-21 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15692:
-

 Summary: We should remove snapshot dir under .tmp when write 
.snapshotinfo failed.
 Key: HBASE-15692
 URL: https://issues.apache.org/jira/browse/HBASE-15692
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


We encountered this problem on our production cluster.
This is the exception in the HMaster log:
{code}
2016-04-22 11:05:06,390 ERROR [f04,16000,1459941011479_ChoreService_3] 
snapshot.SnapshotHFileCleaner: Exception while checking if files were valid, 
keeping them just in case.
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
snapshot info 
from:hdfs://f04/hbase/.hbase-snapshot/.tmp/frog_stastic_2016-04-07/.snapshotinfo
at 
org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:295)
at 
org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
at 
org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteDirectory(CleanerChore.java:180)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:149)
at 
org.apache.hadoop.hbase.master.cleaner.CleanerChore.chore(CleanerChore.java:124)
at org.apache.hadoop.hbase.ScheduledChore.run(ScheduledChore.java:185)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: File does not exist: 
/hbase/.hbase-snapshot/.tmp/frog_stastic_2016-04-07/.snapshotinfo
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:64)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server

[jira] [Resolved] (HBASE-15642) split region number for one table on webUI never be reduced

2016-04-13 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15642.
---
Resolution: Invalid

> split region number for one table on webUI never be reduced 
> 
>
> Key: HBASE-15642
> URL: https://issues.apache.org/jira/browse/HBASE-15642
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.17
>Reporter: Heng Chen
> Attachments: 268286F8-8BA4-4144-9B66-1A6F70FEC2EB.png
>
>
> This happened after we upgrade our cluster from 0.98.6 to 0.98.17. 
> The number should be reduced,   but it always increases from original 20+ to 
> 49 now. (yesterday it was 48)
> Need to dig.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15643) Need metrics of cache hit ratio for one table

2016-04-12 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15643:
-

 Summary: Need metrics of cache hit ratio for one table
 Key: HBASE-15643
 URL: https://issues.apache.org/jira/browse/HBASE-15643
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


There are many tables on our cluster, and only some of them need to be read 
online.

We could improve read performance with the cache, but we need some metrics 
for it at the table level.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15642) split region number for one table on webUI never be reduced

2016-04-12 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15642:
-

 Summary: split region number for one table on webUI never be 
reduced 
 Key: HBASE-15642
 URL: https://issues.apache.org/jira/browse/HBASE-15642
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.17
Reporter: Heng Chen


This happened after we upgraded our cluster from 0.98.6 to 0.98.17.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15635) Mean age of Blocks in cache (seconds) on webUI should be greater than zero

2016-04-11 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15635:
-

 Summary: Mean age of Blocks in cache (seconds) on webUI should be 
greater than zero
 Key: HBASE-15635
 URL: https://issues.apache.org/jira/browse/HBASE-15635
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.17
Reporter: Heng Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15629) Backport HBASE-14703 to 0.98+

2016-04-11 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15629:
-

 Summary: Backport HBASE-14703 to 0.98+
 Key: HBASE-15629
 URL: https://issues.apache.org/jira/browse/HBASE-15629
 Project: HBase
  Issue Type: Task
Reporter: Heng Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Please welcome new HBase Committer Francis Liu

2016-04-07 Thread Heng Chen
congrats!


2016-04-08 13:30 GMT+08:00 Jesse Yates :

> Congrats and welcome!
>
> On Thu, Apr 7, 2016 at 10:27 PM Ted Yu  wrote:
>
> > Congratulations, Francis.
> >
> > > On Apr 7, 2016, at 10:19 PM, Stack  wrote:
> > >
> > > Francis has been around forever looking after one of the biggest HBase
> > > deploys. He has contributed a bunch of big features during this time --
> > > namespacing and grouping to mention a few -- and has more coming down
> the
> > > pipe.
> > >
> > > Please welcome Francis to the committer fold.
> > >
> > > Thanks for all the great work Francis,
> > > St.Ack
> >
>


Re: Please welcome new HBase committer Ashish Singhi

2016-04-07 Thread Heng Chen
congrats!

2016-04-08 13:26 GMT+08:00 Ted Yu :

> Congrats, Ashish.
>
> > On Apr 7, 2016, at 10:23 PM, Stack  wrote:
> >
> > Ashish has contributed loads of stuff starting out basic doing rakes of
> > simple fixes but then ramping up to take on the hard stuff going out of
> his
> > way to land the 'best' fix. He's been doing great work. Keep it up
> Ashish!
> >
> > St.Ack
>


Re: [ANNOUNCE] New HBase committer Yu Li

2016-03-18 Thread Heng Chen
Congratulations, Yu!  :)

2016-03-17 10:13 GMT+08:00 Andrew Purtell :

> Congratulations and welcome!
>
> > On Mar 16, 2016, at 6:49 PM, Nick Dimiduk  wrote:
> >
> > On behalf of the Apache HBase PMC, I am pleased to announce that Yu Li
> > has accepted the PMC's invitation to become a committer on the
> > project. We appreciate all of Yu's generous contributions thus far and
> > look forward to his continued involvement.
> >
> > Congratulations and welcome, Yu!
> >
> > -n
>


[jira] [Resolved] (HBASE-15326) NPE in TestHRegion.testBatchPut_whileNoRowLocksHeld

2016-03-18 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15326.
---
Resolution: Duplicate

Yeah, duplicate. 

> NPE in TestHRegion.testBatchPut_whileNoRowLocksHeld
> ---
>
> Key: HBASE-15326
> URL: https://issues.apache.org/jira/browse/HBASE-15326
> Project: HBase
>  Issue Type: Bug
>    Reporter: Heng Chen
>
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.hadoop.metrics2.lib.MutableHistogram.updateSnapshotMetrics(MutableHistogram.java:72)
>   at 
> org.apache.hadoop.metrics2.lib.MutableRangeHistogram.snapshot(MutableRangeHistogram.java:59)
>   at 
> org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.snapshot(DynamicMetricsRegistry.java:391)
>   at 
> org.apache.hadoop.hbase.metrics.BaseSourceImpl.getMetrics(BaseSourceImpl.java:146)
>   at 
> org.apache.hadoop.hbase.test.MetricsAssertHelperImpl.getMetrics(MetricsAssertHelperImpl.java:243)
>   at 
> org.apache.hadoop.hbase.test.MetricsAssertHelperImpl.getCounter(MetricsAssertHelperImpl.java:201)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHRegion.testBatchPut_whileNoRowLocksHeld(TestHRegion.java:1498)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:20)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
>   at 
> com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
>   at 
> com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> {code}
> It seems to be introduced after HBASE-15222,  [~eclark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15377) Per-RS Get metric is time based, per-region metric is size-based

2016-03-07 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15377.
---
Resolution: Duplicate

> Per-RS Get metric is time based, per-region metric is size-based
> 
>
> Key: HBASE-15377
> URL: https://issues.apache.org/jira/browse/HBASE-15377
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Enis Soztutar
>
> We have metrics for Get operations at the region server level and region 
> level. 
> {code}
>"Get_num_ops" : 4837505,
> "Get_min" : 0,
> "Get_max" : 296,
> "Get_mean" : 0.2934618155433431,
> "Get_median" : 0.0,
> "Get_75th_percentile" : 0.0,
> "Get_95th_percentile" : 1.0,
> "Get_99th_percentile" : 1.0,
> {code}
> and 
> {code}
>"Namespace_hbase_table_meta_region_1588230740_metric_get_num_ops" : 103,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_min" : 450,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_max" : 470,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_mean" : 
> 450.19417475728153,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_median" : 460.0,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_75th_percentile" 
> : 470.0,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_95th_percentile" 
> : 470.0,
> "Namespace_hbase_table_meta_region_1588230740_metric_get_99th_percentile" 
> : 470.0,
> {code}
> The problem is that the report values for the region server shows the 
> latency, versus the reported values for the region shows the response sizes. 
> There is no way of telling this without reading the source code. 
> I think we should deprecate response size histograms in favor of latency 
> histograms. 
> See also HBASE-15376. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15329) Cross-Site Scripting: Reflected in table.jsp

2016-03-02 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15329.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

> Cross-Site Scripting: Reflected in table.jsp
> 
>
> Key: HBASE-15329
> URL: https://issues.apache.org/jira/browse/HBASE-15329
> Project: HBase
>  Issue Type: Bug
>  Components: security
>Reporter: stack
>Assignee: Samir Ahmic
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-15329_v0.patch
>
>
> Minor issue where we write back table name in a few places. Should clean it 
> up:
> {code}
>  } else { 
>   out.write("\nTable: ");
>   out.print( fqtn );
>   out.write("\n");
>  } 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15384) Avoid using '/tmp' directory in TestBulkLoad

2016-03-02 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15384:
-

 Summary: Avoid using '/tmp' directory in TestBulkLoad
 Key: HBASE-15384
 URL: https://issues.apache.org/jira/browse/HBASE-15384
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [CIS-CMMI-3] regionserver not starting up

2016-03-02 Thread Heng Chen
Please try "mvn clean package -DskipTests"

2016-03-02 17:43 GMT+08:00 Kshitij Shukla :

> Hello everyone,
>
> I have compiled hbase (hbase-0.98.8-hadoop2) source, with hadoop 2.5.2 and
> Java 1.7 using "mvn package -DskipTests" command. Followed by rsyncing it
> across the cluster. Its getting compiled successfully, but when I trying to
> start hbase, starting fine on master. When I do jps on master it output me
> this :
>
> [root@ns613 c1]# jps
> 135001 Jps
> 131938 NodeManager
> 131837 ResourceManager
> 134565 HRegionServer
> 131385 NameNode
> 134314 HQuorumPeer
> 134418 HMaster
> 131679 SecondaryNameNode
> 131508 DataNode
>
> But on slave regionservers are not starting up, jps(on slave) returns the
> following :
>
> [root@slave01 ~]# jps
> 58168 NodeManager
> 59033 Jps
> 58066 DataNode
> 58785 HQuorumPeer
>
> On exploring logs on slave I found this error stack :
>
> START
> 2016-03-02 10:28:13,922 ERROR [main]
> regionserver.HRegionServerCommandLine: Region server exiting
> java.lang.RuntimeException: Failed construction of Regionserver: class
> org.apache.hadoop.hbase.regionserver.HRegionServer
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2488)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:61)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at
> org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2503)
> Caused by: java.lang.reflect.InvocationTargetException
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
> Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2486)
> ... 5 more
> Caused by: java.lang.NoClassDefFoundError: com/yammer/metrics/stats/Sample
> at
> org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.newHistogram(DynamicMetricsRegistry.java:271)
> at
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceImpl.(MetricsHBaseServerSourceImpl.java:65)
> at
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceFactoryImpl.getSource(MetricsHBaseServerSourceFactoryImpl.java:48)
> at
> org.apache.hadoop.hbase.ipc.MetricsHBaseServerSourceFactoryImpl.create(MetricsHBaseServerSourceFactoryImpl.java:38)
> at
> org.apache.hadoop.hbase.ipc.MetricsHBaseServer.(MetricsHBaseServer.java:30)
> at org.apache.hadoop.hbase.ipc.RpcServer.(RpcServer.java:1879)
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:621)
> ... 10 more
> Caused by: java.lang.ClassNotFoundException:
> com.yammer.metrics.stats.Sample
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> ... 17 more
> END
>
> Can you please suggest me some workarounds/clue ?
>
> BR
>
> --
>
> --
>
> *Cyber Infrastructure (P) Limited, [CIS] **(CMMI Level 3 Certified)*
>
> Central India's largest Technology company.
>
> *Ensuring the success of our clients and partners through our highly
> optimized Technology solutions.*
>
> www.cisin.com | +Cisin  | Linkedin <
> https://www.linkedin.com/company/cyber-infrastructure-private-limited> |
> Offices: *Indore, India.* *Singapore. Silicon Valley, USA*.
>
> DISCLAIMER:  INFORMATION PRIVACY is important for us, If you are not the
> intended recipient, you should delete this message and are notified that
> any disclosure, copying or distribution of this message, or taking any
> action based on it, is strictly prohibited by Law.
>


[jira] [Created] (HBASE-15326) NPE in TestHRegion.testBatchPut_whileNoRowLocksHeld

2016-02-25 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15326:
-

 Summary: NPE in TestHRegion.testBatchPut_whileNoRowLocksHeld
 Key: HBASE-15326
 URL: https://issues.apache.org/jira/browse/HBASE-15326
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


{code}
java.lang.NullPointerException
  at org.apache.hadoop.metrics2.lib.MutableHistogram.updateSnapshotMetrics(MutableHistogram.java:72)
  at org.apache.hadoop.metrics2.lib.MutableRangeHistogram.snapshot(MutableRangeHistogram.java:59)
  at org.apache.hadoop.metrics2.lib.DynamicMetricsRegistry.snapshot(DynamicMetricsRegistry.java:391)
  at org.apache.hadoop.hbase.metrics.BaseSourceImpl.getMetrics(BaseSourceImpl.java:146)
  at org.apache.hadoop.hbase.test.MetricsAssertHelperImpl.getMetrics(MetricsAssertHelperImpl.java:243)
  at org.apache.hadoop.hbase.test.MetricsAssertHelperImpl.getCounter(MetricsAssertHelperImpl.java:201)
  at org.apache.hadoop.hbase.regionserver.TestHRegion.testBatchPut_whileNoRowLocksHeld(TestHRegion.java:1498)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
  at org.junit.rules.RunRules.evaluate(RunRules.java:20)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
  at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
  at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
  at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
  at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:117)
  at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:234)
  at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:74)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
{code}

It seems to have been introduced after HBASE-15222. [~eclark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15308) Flakey TestSplitWalDataLoss on branch-1.1

2016-02-22 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15308:
-

 Summary: Flakey TestSplitWalDataLoss on branch-1.1
 Key: HBASE-15308
 URL: https://issues.apache.org/jira/browse/HBASE-15308
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


It happened during the HBASE-15169 QA runs; see:

https://builds.apache.org/job/PreCommit-HBASE-Build/628/artifact/patchprocess/patch-unit-hbase-server-jdk1.8.0_72.txt

https://builds.apache.org/job/PreCommit-HBASE-Build/547/artifact/patchprocess/patch-unit-hbase-server-jdk1.8.0_72.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Ruby checkstyle seems somewhat unreasonable

2016-02-18 Thread Heng Chen
Thanks for your reply, busbey.

By the way, do you know how to run rubocop locally? I would like to check my
Ruby changes locally before uploading a patch to JIRA.

Thanks again.

2016-02-19 8:21 GMT+08:00 Sean Busbey :

> I agree we should provide rubocop with a project specific configuration
> file.
>
> On your specific grievances:
>
> 1) sure, the method was already in dire straights. I think this is a case
> for committer judgement rather than a rule change. A reasonable outcome
> could be giving a patch that raises such a concern a pass, but making sure
> we have a follow on issue to fix things.
>
> 2) the Ruby line length should match the rest of the project. I think
> that's 100?
>
> 3) the existing uses of double quotes should be corrected in time and we
> should avoid new uses.
>
> Much of our Ruby code was written by folks who don't do much Ruby coding.
> It would benefit us as a project to move towards making our Ruby code look
> more like the wider Ruby use.
>
> --
> Sean Busbey
> On Feb 18, 2016 17:04, "Heng Chen"  wrote:
>
> > During HBASE-15128,  i made some changes to ruby script, it cause a lot
> of
> > warning about ruby style.
> >
> >
> >
> https://builds.apache.org/job/PreCommit-HBASE-Build/587/artifact/patchprocess/diff-patch-rubocop.txt
> >
> > A lot of warnings exist already,  i think it is unreasonable.
> >
> > for example:
> >
> > 1.  /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:32:3: C:
> Class
> > has too many lines. [777/100]
> > The class length has exceeded 100 already.
> >
> > 2.  /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:135:81: C:
> > Line is too long. [99/80]
> > I think length 100 is more reasonable.
> >
> >
> > 3. /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:145:30: C:
> > Prefer single-quoted strings when you don't need string interpolation or
> > special symbols.
> > I see many strings in script use double-quoted.
> >
> > wdyt?
> >
>


Ruby checkstyle seems somewhat unreasonable

2016-02-18 Thread Heng Chen
During HBASE-15128, I made some changes to the Ruby scripts, and they caused a
lot of warnings about Ruby style.

https://builds.apache.org/job/PreCommit-HBASE-Build/587/artifact/patchprocess/diff-patch-rubocop.txt

A lot of the warnings are about pre-existing code, which I think is unreasonable.

For example:

1.  /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:32:3: C: Class
has too many lines. [777/100]
The class was already well over 100 lines.

2.  /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:135:81: C:
Line is too long. [99/80]
I think a limit of 100 is more reasonable.


3. /testptch/hbase/hbase-shell/src/main/ruby/hbase/admin.rb:145:30: C:
Prefer single-quoted strings when you don't need string interpolation or
special symbols.
Many strings in the existing scripts already use double quotes.

wdyt?


[jira] [Created] (HBASE-15288) Flakey TestMasterMetrics.testClusterRequests on branch-1.1

2016-02-18 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15288:
-

 Summary: Flakey TestMasterMetrics.testClusterRequests on branch-1.1
 Key: HBASE-15288
 URL: https://issues.apache.org/jira/browse/HBASE-15288
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


I found it during HBASE-15169. Let me see what's happening.

https://builds.apache.org/job/PreCommit-HBASE-Build/586/testReport/org.apache.hadoop.hbase.master/TestMasterMetrics/testClusterRequests/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15182) unify normalizer switch

2016-02-17 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15182.
---
Resolution: Invalid

> unify normalizer switch
> ---
>
> Key: HBASE-15182
> URL: https://issues.apache.org/jira/browse/HBASE-15182
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Heng Chen
> Fix For: 2.0.0, 1.3.0
>
>
> After HBASE-15128,  we will have an uniform way to do switch. Let's unify 
> normalizer into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15280) Scan metrics should not ignore empty rows.

2016-02-17 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15280:
-

 Summary: Scan metrics should not ignore empty rows.
 Key: HBASE-15280
 URL: https://issues.apache.org/jira/browse/HBASE-15280
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


This comes from HBASE-15267.

We found that for an empty result, Get increments the readRequestCount but Scan
does not. For details, please see the HBASE-15267 comments.

This JIRA is to fix it. We should record every request we serve; at the very
least, we should make the Scan metrics behavior consistent with Get.
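
For illustration only, here is a minimal, self-contained sketch of the intended
behavior (this is not the real region server metrics code; the class and method
names are made up): count a read request when it is served, whether or not the
result is empty.
{code}
// Illustrative only, not the real region server code: the read counter is bumped
// for every served request, even when the result is empty.
import java.util.Collections;
import java.util.List;

public class ReadRequestCountSketch {
  private long readRequestCount = 0;

  public List<String> serveScan(List<String> rowsFound) {
    readRequestCount++;                     // count the request, empty or not
    return rowsFound;
  }

  public static void main(String[] args) {
    ReadRequestCountSketch metrics = new ReadRequestCountSketch();
    metrics.serveScan(Collections.<String>emptyList());  // empty result still counted
    System.out.println("readRequestCount = " + metrics.readRequestCount);  // 1
  }
}
{code}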



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15182) unify normalizer switch

2016-01-27 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15182:
-

 Summary: unify normalizer switch
 Key: HBASE-15182
 URL: https://issues.apache.org/jira/browse/HBASE-15182
 Project: HBase
  Issue Type: Sub-task
Reporter: Heng Chen


After HBASE-15128, we will have a uniform way to handle switches. Let's unify the
normalizer switch into it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-15178) metrics on web UI sometimes flakey

2016-01-27 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15178.
---
Resolution: Duplicate

> metrics on web UI sometimes flakey
> --
>
> Key: HBASE-15178
> URL: https://issues.apache.org/jira/browse/HBASE-15178
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.6
>Reporter: Heng Chen
> Attachments: 70DD704D-7E05-49BA-AC84-0646671F67AA.png
>
>
> see attachment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15178) metrics on web UI sometimes flakey

2016-01-26 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15178:
-

 Summary: metrics on web UI sometimes flakey
 Key: HBASE-15178
 URL: https://issues.apache.org/jira/browse/HBASE-15178
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Data loss after compaction when a row has more than Integer.MAX_VALUE columns

2016-01-17 Thread Heng Chen
The incoming-follow nodes and outgoing-follow nodes of a single node exceed
Integer.MAX_VALUE, unbelievable!

Is the performance OK if I request the number of incoming-follow nodes?


2016-01-18 13:26 GMT+08:00 Toshihiro Suzuki :

> Thank you for your reply.
>
> We are using hbase to store social graph data on the SNS we provide.
>
> Our use case was presented in HBasecon 2015.
>
> http://www.slideshare.net/HBaseCon/use-cases-session-6a
>
> Schema Design is the below,
>
> http://www.slideshare.net/HBaseCon/use-cases-session-6a/44
>
> Thanks,
> Toshihiro Suzuki.
>
>
> 2016-01-18 13:48 GMT+09:00 Ted Yu :
>
> > Interesting.
> >
> > Can you share your use case where more than Integer.MAX_VALUE columns are
> > needed.
> >
> > Consider filing a JIRA for the proposed change.
> >
> > On Sun, Jan 17, 2016 at 8:05 PM, Toshihiro Suzuki 
> > wrote:
> >
> > > Hi,
> > >
> > > We have lost the data in our development environment when a row has
> more
> > > than Integer.MAX_VALUE columns after compaction.
> > >
> > > I think the reason is type of StoreScanner's countPerRow is int.
> > >
> > >
> > >
> > >
> >
> https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L67
> > >
> > >
> > > After changing the type to long, it seems to be fixed.
> > >
> > > What do you think about that?
> > >
> > >
> > > Thanks,
> > >
> > > Toshihiro Suzuki.
> > >
> >
>
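
For illustration only, a minimal, self-contained sketch of the overflow described
above (not the actual StoreScanner code): once an int counter passes
Integer.MAX_VALUE it wraps to a negative value, while a long keeps counting.
{code}
// Illustrative only, not StoreScanner itself: an int column counter that passes
// Integer.MAX_VALUE wraps to a negative number, while a long keeps counting.
public class IntCounterOverflowSketch {
  public static void main(String[] args) {
    int countPerRowAsInt = Integer.MAX_VALUE;
    long countPerRowAsLong = Integer.MAX_VALUE;

    countPerRowAsInt++;   // wraps to Integer.MIN_VALUE (-2147483648)
    countPerRowAsLong++;  // 2147483648, still correct

    System.out.println("int counter after one more column:  " + countPerRowAsInt);
    System.out.println("long counter after one more column: " + countPerRowAsLong);
  }
}
{code}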


Re: Data loss after compaction when a row has more than Integer.MAX_VALUE columns

2016-01-17 Thread Heng Chen
I am interested in which situations a row would have more than Integer.MAX_VALUE
columns. If so, how large is such a row, and does it still satisfy the row-size limit?


[jira] [Created] (HBASE-15108) TestReplicationAdmin failed on branch-1.0

2016-01-13 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15108:
-

 Summary: TestReplicationAdmin failed on branch-1.0 
 Key: HBASE-15108
 URL: https://issues.apache.org/jira/browse/HBASE-15108
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


I notice it on HBASE-15095.

See
https://builds.apache.org/job/PreCommit-HBASE-Build/95/artifact/patchprocess/patch-unit-hbase-server-jdk1.8.0.txt



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: identifying source of region split request

2016-01-06 Thread Heng Chen
Could we use the Procedure V2 framework to do splits?
That seems like it would be a great piece of work...

2016-01-07 4:35 GMT+08:00 Ted Yu :

> bq. is splitting and subsequent merging a necessarily bad thing?
>
> Please note that merging may happen before the size of R1 (R2) drops to
> zero. Meaning, there may be churn in normalization activity.
> I would say that the normalization in the scenario below is premature.
>
> Cheers
>
> On Wed, Jan 6, 2016 at 11:41 AM, Mikhail Antonov 
> wrote:
>
> > Yeah, I see. Btw, in scenario you described, is splitting and subsequent
> > merging a necessarily bad thing? If data expiration happens not that
> often,
> > having more evenly sized regions in between major collections isn't worth
> > it?
> >
> > On Wed, Jan 6, 2016 at 10:23 AM, Ted Yu  wrote:
> >
> > > bq. collect statistics to blacklist some RSs with high failure rate
> > >
> > > The metadata would help pinpoint the regions which consistently fail
> > split
> > > (merge) in the recent past. The failure could be due to corrupt
> HFile(s)
> > or
> > > other reason.
> > > Having the statistics would also help normalizer avoid the following
> > > scenario:
> > >
> > > region R gets split into R1 and R2
> > > size of R1 and R2 decreases due to expiration of data
> > > R1 and R2 get merged into R'
> > > more data comes into R', resulting in split, again
> > >
> > > Cheers
> > >
> > > On Wed, Jan 6, 2016 at 10:14 AM, Mikhail Antonov  >
> > > wrote:
> > >
> > > > Adding this tracing information to know who initiated split is in
> > general
> > > > useful thing. Right now though I'm not sure I see how that would help
> > to
> > > > make better normalization decisions? Region split failure implies
> > > > underlying FS issue? Any examples/ideas?
> > > >
> > > > Kind of..collect statistics to blacklist some RSs with high failure
> > rate
> > > > and don't attempt to split regions hosted there in future?
> > > >
> > > > On Tue, Jan 5, 2016 at 2:55 PM, Ted Yu  wrote:
> > > >
> > > > > Hi,
> > > > > I recently worked on improving region normalization feature.
> > > > >
> > > > > If region split request triggered by the execution of
> > > > > SplitNormalizationPlan fails, there is no way of knowing whether
> the
> > > > failed
> > > > > split originated from region normalization.
> > > > > Such association would give RegionNormalizer information so that it
> > can
> > > > > make better normalization decisions in the subsequent invocations.
> > > > >
> > > > > One enhancement I can think of is to embed metadata in SplitRequest
> > > which
> > > > > gets passed through RegionStateTransitionContext when
> > > > > RegionServerServices#reportRegionStateTransition() is called.
> > > > > This way, RegionStateListener can be notified with the metadata (id
> > of
> > > > the
> > > > > requester).
> > > > >
> > > > > Comment is welcome.
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Michael Antonov
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Michael Antonov
> >
>


[jira] [Resolved] (HBASE-15049) AuthTypes.NONE cause exception after HS2 start

2015-12-29 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-15049.
---
Resolution: Invalid

> AuthTypes.NONE cause exception after HS2 start
> --
>
> Key: HBASE-15049
> URL: https://issues.apache.org/jira/browse/HBASE-15049
> Project: HBase
>  Issue Type: Bug
>    Reporter: Heng Chen
>
> I set {{hive.server2.authentication}} to be {{NONE}}
> After HS2 start, i see exception in log below:
> {code}
> 2015-12-29 16:58:42,339 ERROR [HiveServer2-Handler-Pool: Thread-31]: 
> server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred 
> during processing of message.
> java.lang.RuntimeException: 
> org.apache.thrift.transport.TSaslTransportException: No data or no sasl data 
> in the stream
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no 
> sasl data in the stream
> at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:328)
> at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
> at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
> ... 4 more
> {code}
> IMO the problem is we use Sasl transport when authType is NONE, 
> {code:title=HiveAuthFactory.java}
>   public TTransportFactory getAuthTransFactory() throws LoginException {
> TTransportFactory transportFactory;
> if (authTypeStr.equalsIgnoreCase(AuthTypes.KERBEROS.getAuthName())) {
>   try {
> transportFactory = 
> saslServer.createTransportFactory(getSaslProperties());
>   } catch (TTransportException e) {
> throw new LoginException(e.getMessage());
>   }
> } else if (authTypeStr.equalsIgnoreCase(AuthTypes.NONE.getAuthName())) {
>   transportFactory = 
> PlainSaslHelper.getPlainTransportFactory(authTypeStr);
> } else if (authTypeStr.equalsIgnoreCase(AuthTypes.LDAP.getAuthName())) {
>   transportFactory = 
> PlainSaslHelper.getPlainTransportFactory(authTypeStr);
> } else if (authTypeStr.equalsIgnoreCase(AuthTypes.PAM.getAuthName())) {
>   transportFactory = 
> PlainSaslHelper.getPlainTransportFactory(authTypeStr);
> } else if (authTypeStr.equalsIgnoreCase(AuthTypes.NOSASL.getAuthName())) {
>   transportFactory = new TTransportFactory();
> } else if (authTypeStr.equalsIgnoreCase(AuthTypes.CUSTOM.getAuthName())) {
>   transportFactory = 
> PlainSaslHelper.getPlainTransportFactory(authTypeStr);
> } else {
>   throw new LoginException("Unsupported authentication type " + 
> authTypeStr);
> }
> return transportFactory;
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-15049) AuthTypes.NONE cause exception after HS2 start

2015-12-29 Thread Heng Chen (JIRA)
Heng Chen created HBASE-15049:
-

 Summary: AuthTypes.NONE cause exception after HS2 start
 Key: HBASE-15049
 URL: https://issues.apache.org/jira/browse/HBASE-15049
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


I set {{hive.server2.authentication}} to {{NONE}}.

After HS2 starts, I see the exception in the log below:
{code}
2015-12-29 16:58:42,339 ERROR [HiveServer2-Handler-Pool: Thread-31]: 
server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred 
during processing of message.
java.lang.RuntimeException: 
org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in 
the stream
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at 
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no 
sasl data in the stream
at 
org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:328)
at 
org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at 
org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
{code}

IMO the problem is that we use a SASL transport when authType is NONE:
{code:title=HiveAuthFactory.java}
  public TTransportFactory getAuthTransFactory() throws LoginException {
    TTransportFactory transportFactory;
    if (authTypeStr.equalsIgnoreCase(AuthTypes.KERBEROS.getAuthName())) {
      try {
        transportFactory = saslServer.createTransportFactory(getSaslProperties());
      } catch (TTransportException e) {
        throw new LoginException(e.getMessage());
      }
    } else if (authTypeStr.equalsIgnoreCase(AuthTypes.NONE.getAuthName())) {
      transportFactory = PlainSaslHelper.getPlainTransportFactory(authTypeStr);
    } else if (authTypeStr.equalsIgnoreCase(AuthTypes.LDAP.getAuthName())) {
      transportFactory = PlainSaslHelper.getPlainTransportFactory(authTypeStr);
    } else if (authTypeStr.equalsIgnoreCase(AuthTypes.PAM.getAuthName())) {
      transportFactory = PlainSaslHelper.getPlainTransportFactory(authTypeStr);
    } else if (authTypeStr.equalsIgnoreCase(AuthTypes.NOSASL.getAuthName())) {
      transportFactory = new TTransportFactory();
    } else if (authTypeStr.equalsIgnoreCase(AuthTypes.CUSTOM.getAuthName())) {
      transportFactory = PlainSaslHelper.getPlainTransportFactory(authTypeStr);
    } else {
      throw new LoginException("Unsupported authentication type " + authTypeStr);
    }
    return transportFactory;
  }
{code}
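
For illustration only, a minimal sketch of the direction this description seems to
point at: letting NONE take the same pass-through path as NOSASL instead of a SASL
transport, so the server does not expect a SASL handshake. The class and method
names below are hypothetical, it assumes libthrift on the classpath, and it is not
necessarily the right fix (the issue was later resolved as Invalid); it is only
meant to make the distinction concrete.
{code}
// Hypothetical sketch only: NONE takes the same pass-through path as NOSASL.
import org.apache.thrift.transport.TTransportFactory;

public class PlainTransportSketch {
  public static TTransportFactory transportFactoryFor(String authTypeStr) {
    if ("NONE".equalsIgnoreCase(authTypeStr) || "NOSASL".equalsIgnoreCase(authTypeStr)) {
      // A plain TTransportFactory does no SASL wrapping at all.
      return new TTransportFactory();
    }
    throw new IllegalArgumentException("Auth type not covered by this sketch: " + authTypeStr);
  }
}
{code}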




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-14684) Try to remove all MiniMapReduceCluster in unit tests

2015-12-22 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen reopened HBASE-14684:
---

Reverted from branch-1. It seems to cause problems with some tests.

> Try to remove all MiniMapReduceCluster in unit tests
> 
>
> Key: HBASE-14684
> URL: https://issues.apache.org/jira/browse/HBASE-14684
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Reporter: Heng Chen
>    Assignee: Heng Chen
>Priority: Critical
> Fix For: 2.0.0, 1.2.0, 1.3.0
>
> Attachments: 14684.branch-1.txt, 14684.branch-1.txt, 
> 14684.branch-1.txt, HBASE-14684-branch-1.2.patch, HBASE-14684-branch-1.patch, 
> HBASE-14684-branch-1.patch, HBASE-14684-branch-1.patch, 
> HBASE-14684-branch-1_v1.patch, HBASE-14684.patch, HBASE-14684_v1.patch
>
>
> As discussed on the dev list, we will try to run the MR jobs without a 
> MiniMapReduceCluster.
> Test cases will run faster and be more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HBASE-14949) Skip duplicate entries when replay WAL.

2015-12-09 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen reopened HBASE-14949:
---

> Skip duplicate entries when replay WAL.
> ---
>
> Key: HBASE-14949
> URL: https://issues.apache.org/jira/browse/HBASE-14949
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Heng Chen
> Attachments: HBASE-14949.patch
>
>
> As HBASE-14004 design,  there will be duplicate entries in different WAL.  It 
> happens when one hflush failed, we will close old WAL with 'acked hflushed' 
> length,  then open a new WAL and write the unacked hlushed entries into it.
> So there maybe some overlap between old WAL and new WAL.
> We should skip the duplicate entries when replay.  I think it has no harm to 
> current logic, maybe we do it first. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14949) Skip duplicate entries when replay WAL.

2015-12-09 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-14949.
---
Resolution: Invalid

> Skip duplicate entries when replay WAL.
> ---
>
> Key: HBASE-14949
> URL: https://issues.apache.org/jira/browse/HBASE-14949
> Project: HBase
>  Issue Type: Sub-task
>    Reporter: Heng Chen
> Attachments: HBASE-14949.patch
>
>
> As HBASE-14004 design,  there will be duplicate entries in different WAL.  It 
> happens when one hflush failed, we will close old WAL with 'acked hflushed' 
> length,  then open a new WAL and write the unacked hlushed entries into it.
> So there maybe some overlap between old WAL and new WAL.
> We should skip the duplicate entries when replay.  I think it has no harm to 
> current logic, maybe we do it first. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14949) Skip duplicate entries when replay WAL.

2015-12-08 Thread Heng Chen (JIRA)
Heng Chen created HBASE-14949:
-

 Summary: Skip duplicate entries when replay WAL.
 Key: HBASE-14949
 URL: https://issues.apache.org/jira/browse/HBASE-14949
 Project: HBase
  Issue Type: Improvement
Reporter: Heng Chen


Per the HBASE-14004 design, there can be duplicate entries across different WALs.
It happens when an hflush fails: we close the old WAL at the 'acked hflushed'
length, then open a new WAL and write the unacked hflushed entries into it.
So there may be some overlap between the old WAL and the new WAL.

We should skip the duplicate entries when replaying. I think it does no harm to
the current logic, so maybe we do this part first.
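
For illustration only, a minimal sketch of the skip-on-replay idea (not the actual
WAL replay code), assuming entries carry monotonically increasing sequence ids:
remember the highest sequence id already applied and ignore anything at or below it.
{code}
// Illustrative only, not the actual WAL replay code. Assumes each edit carries a
// monotonically increasing sequence id.
public class ReplaySkipSketch {
  private long highestAppliedSeqId = -1;

  /** Returns true if the entry should be applied, false if it is a duplicate. */
  public boolean shouldApply(long entrySeqId) {
    if (entrySeqId <= highestAppliedSeqId) {
      return false;                         // overlap with the previous WAL, skip it
    }
    highestAppliedSeqId = entrySeqId;
    return true;
  }

  public static void main(String[] args) {
    ReplaySkipSketch replay = new ReplaySkipSketch();
    long[] oldWal = {1, 2, 3, 4};
    long[] newWal = {3, 4, 5};              // 3 and 4 were re-written into the new WAL
    for (long seq : oldWal) {
      System.out.println("old WAL seq " + seq + " apply=" + replay.shouldApply(seq));
    }
    for (long seq : newWal) {
      System.out.println("new WAL seq " + seq + " apply=" + replay.shouldApply(seq));
    }
  }
}
{code}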



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14897) TestTableLockManager.testReapAllTableLocks is flakey

2015-11-30 Thread Heng Chen (JIRA)
Heng Chen created HBASE-14897:
-

 Summary: TestTableLockManager.testReapAllTableLocks is flakey
 Key: HBASE-14897
 URL: https://issues.apache.org/jira/browse/HBASE-14897
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


It comes from the mailing-list thread that [~stack] posted.

Here is the related QA information:

https://builds.apache.org/view/H-L/view/HBase/job/HBase-Trunk_matrix/512/jdk=latest1.8,label=Hadoop/testReport/org.apache.hadoop.hbase.master/TestTableLockManager/testReapAllTableLocks/

The reason is here.
{code}
writeLocksObtained.await();
writeLocksAttempted.await();
{code}

writeLocksAttempted may be counted down to 0 before the lock node is created on
ZK, so the main thread goes on to run lockManager.reapWriteLocks(); the node is
only created on ZK after that, so the corresponding lock acquisition times out.

I uploaded a patch which can reproduce this issue.
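
For illustration only, a minimal, self-contained sketch of that timing issue (not
the actual TestTableLockManager code): the latch is counted down before the "lock
node" really exists, so the waiting thread can proceed too early.
{code}
// Illustrative only: the latch is released before the "lock node" exists,
// mirroring the race described above.
import java.util.concurrent.CountDownLatch;

public class LatchRaceSketch {
  public static void main(String[] args) throws InterruptedException {
    final CountDownLatch writeLockAttempted = new CountDownLatch(1);

    Thread locker = new Thread(new Runnable() {
      @Override
      public void run() {
        writeLockAttempted.countDown();          // signalled first ...
        try {
          Thread.sleep(100);                     // ... but the node appears later
        } catch (InterruptedException ignored) {
        }
        System.out.println("lock node created");
      }
    });
    locker.start();

    writeLockAttempted.await();
    // The main thread can now run cleanup (reapWriteLocks() in the real test)
    // even though the lock node does not exist yet, so a later acquire times out.
    System.out.println("main thread proceeded before the node existed");
    locker.join();
  }
}
{code}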






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14843) TestWALProcedureStore.testLoad is flakey

2015-11-19 Thread Heng Chen (JIRA)
Heng Chen created HBASE-14843:
-

 Summary: TestWALProcedureStore.testLoad is flakey
 Key: HBASE-14843
 URL: https://issues.apache.org/jira/browse/HBASE-14843
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Heng Chen


I have seen it twice recently; see:
https://builds.apache.org/job/PreCommit-HBASE-Build/16589//testReport/org.apache.hadoop.hbase.procedure2.store.wal/TestWALProcedureStore/testLoad/

https://builds.apache.org/job/PreCommit-HBASE-Build/16532/testReport/org.apache.hadoop.hbase.procedure2.store.wal/TestWALProcedureStore/testLoad/

Let's see what's happening.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Any plan to upgrade log4j to 2.x

2015-11-17 Thread Heng Chen
Some projects in my company have begun to use Log4j 2.x.
The API for Log4j 2 is not compatible with Log4j 1.x,
so sometimes the HBase client's logging conflicts with our projects.


[jira] [Created] (HBASE-14815) TestMobExportSnapshot.testExportFailure timeout occasionally

2015-11-14 Thread Heng Chen (JIRA)
Heng Chen created HBASE-14815:
-

 Summary: TestMobExportSnapshot.testExportFailure timeout 
occasionally
 Key: HBASE-14815
 URL: https://issues.apache.org/jira/browse/HBASE-14815
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


On master, TestMobExportSnapshot.testExportFailure times out occasionally.

See
https://builds.apache.org/job/PreCommit-HBASE-Build/16514//testReport/org.apache.hadoop.hbase.snapshot/TestMobExportSnapshot/testExportFailure/



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14659) Fix flakey TestHFileOutputFormat

2015-11-05 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-14659.
---
Resolution: Won't Fix

> Fix flakey TestHFileOutputFormat
> 
>
> Key: HBASE-14659
> URL: https://issues.apache.org/jira/browse/HBASE-14659
> Project: HBase
>  Issue Type: Bug
>    Reporter: Heng Chen
>Assignee: Heng Chen
>
> As I said in HBASE-14654,
> this test case hung twice recently.
> The two QA runs:
> https://builds.apache.org/job/PreCommit-HBASE-Build/16118/console
> https://builds.apache.org/job/PreCommit-HBASE-Build/16117/console
> I noticed that the two QA runs were executing simultaneously on the same machine, H0.
> I suspect there are shared resources over which the two QA runs conflict.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Balancer not running for a long time.

2015-10-30 Thread Heng Chen
Any plan to backport HBASE-14309 to 0.98?

2015-10-31 10:26 GMT+08:00 Heng Chen :

> bq. Not running balancer because 3 region(s) in transition
>
> Yeah, but balancer runs every 5 minutes, 3 region(s) in transition log
> only show up only in a few places
>
> 2015-10-31 10:23 GMT+08:00 Ted Yu :
>
>> bq. there are no logs to record why we not running balancer
>>
>> Here was the reason:
>>
>> bq. Not running balancer because 3 region(s) in transition:
>>
>> bq. Could we just balance regions not in transition?
>>
>> Yes. Please take a look at HBASE-14309
>>
>> Cheers
>>
>> On Fri, Oct 30, 2015 at 7:19 PM, Heng Chen 
>> wrote:
>>
>> > My hbase cluster version is 0.98.6
>> >
>> > There are lots of regions on it,  about 1+
>> >
>> > Load is heavy,  almost every time there are regions in split
>> >
>> > So i found that the balancer not run for a long time.
>> >
>> > grep -i 'balancer' master.log, there are only logs like below
>> >
>> > 2015-09-30 11:29:13,994 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {30971a1ae707b9f5bbcd7b8802f32059={30971a1ae707b9f5bbcd7b8802f32059
>> > state=SPLITTING_NEW, ts=1443583753692,
>> > server=dx-ape-regionserver30-online,60020,1440183710528},
>> > 13eaacf6df912d0cb598067610c5a85f={13eaacf6df912d0cb598067610c5a85f
>> > state=SPLITTING_NEW, ...
>> > 2015-10-01 17:44:14,032 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {55fc1c408832233ee1dd01c70c61ae14={55fc1c408832233ee1dd01c70c61ae14
>> > state=SPLITTING, ts=1443692653425,
>> > server=dx-ape-regionserver27-online,60020,1440183264316},
>> > 07439db0ff1319d20b43aa4d2e43a4ae={07439db0ff1319d20b43aa4d2e43a4ae
>> > state=SPLITTING_NEW, ts=1...
>> > 2015-10-04 14:04:14,126 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {2bd0891dc9ca5fb15ea8b661127193b7={2bd0891dc9ca5fb15ea8b661127193b7
>> > state=SPLITTING, ts=1443938653837,
>> > server=dx-ape-regionserver9-online,60020,1440182448264},
>> > 76bbb47201c3958e3a9c1086bfb351c5={76bbb47201c3958e3a9c1086bfb351c5
>> > state=SPLITTING_NEW, ts=14...
>> > 2015-10-05 14:14:14,161 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {669719254f132476c6df0e0e9b1fc93f={669719254f132476c6df0e0e9b1fc93f
>> > state=SPLITTING_NEW, ts=1444025653911,
>> > server=dx-ape-regionserver1-online,60020,1440178926883},
>> > ec612addaabb22c8f46b2c903bd1158b={ec612addaabb22c8f46b2c903bd1158b
>> > state=SPLITTING_NEW, t...
>> > 2015-10-15 21:19:14,512 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {2b7a5c3ddc7ee919199c68611e6f6c96={2b7a5c3ddc7ee919199c68611e6f6c96
>> > state=SPLITTING, ts=1444915153714,
>> > server=dx-ape-regionserver12-online,60020,1440181883146},
>> > cda06b9ebd651c616361f73a469a1a52={cda06b9ebd651c616361f73a469a1a52
>> > state=SPLITTING_NEW, ts=1...
>> > 2015-10-15 23:39:14,513 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {b1d3429606407280e442d8ce3de873c4={b1d3429606407280e442d8ce3de873c4
>> > state=SPLITTING, ts=1444923553844,
>> > server=dx-ape-regionserver25-online,60020,1440183200463},
>> > ae7ba7ee139c7ba84ba707671b7959c4={ae7ba7ee139c7ba84ba707671b7959c4
>> > state=SPLITTING_NEW, ts=1...
>> > 2015-10-21 19:29:14,692 DEBUG
>> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
>> master.HMaster:
>> > Not running balancer because 3 region(s) in transition:
>> > {e677e41a383eb20429c9906bafc252bb={e677e41a383eb20429c9906bafc252bb
>> > state=SPLITTING_NEW, ts=1445426954437,
>> > server=dx-ape-regionserver11-online,60020,1440181972615},
>> > 0028b035271bdd6d30e7fb6f1ffb406d={0028b035271bdd6d30e7fb6f1ffb406d
>> > state=SPLITTING, ts=1...
>> > 2015-10-25 10:24:14,790 DEBUG
>>

Re: Balancer not running for a long time.

2015-10-30 Thread Heng Chen
bq. Not running balancer because 3 region(s) in transition

Yeah, but the balancer runs every 5 minutes, and the '3 region(s) in transition'
log shows up in only a few places.

2015-10-31 10:23 GMT+08:00 Ted Yu :

> bq. there are no logs to record why we not running balancer
>
> Here was the reason:
>
> bq. Not running balancer because 3 region(s) in transition:
>
> bq. Could we just balance regions not in transition?
>
> Yes. Please take a look at HBASE-14309
>
> Cheers
>
> On Fri, Oct 30, 2015 at 7:19 PM, Heng Chen 
> wrote:
>
> > My hbase cluster version is 0.98.6
> >
> > There are lots of regions on it,  about 1+
> >
> > Load is heavy,  almost every time there are regions in split
> >
> > So i found that the balancer not run for a long time.
> >
> > grep -i 'balancer' master.log, there are only logs like below
> >
> > 2015-09-30 11:29:13,994 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {30971a1ae707b9f5bbcd7b8802f32059={30971a1ae707b9f5bbcd7b8802f32059
> > state=SPLITTING_NEW, ts=1443583753692,
> > server=dx-ape-regionserver30-online,60020,1440183710528},
> > 13eaacf6df912d0cb598067610c5a85f={13eaacf6df912d0cb598067610c5a85f
> > state=SPLITTING_NEW, ...
> > 2015-10-01 17:44:14,032 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {55fc1c408832233ee1dd01c70c61ae14={55fc1c408832233ee1dd01c70c61ae14
> > state=SPLITTING, ts=1443692653425,
> > server=dx-ape-regionserver27-online,60020,1440183264316},
> > 07439db0ff1319d20b43aa4d2e43a4ae={07439db0ff1319d20b43aa4d2e43a4ae
> > state=SPLITTING_NEW, ts=1...
> > 2015-10-04 14:04:14,126 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {2bd0891dc9ca5fb15ea8b661127193b7={2bd0891dc9ca5fb15ea8b661127193b7
> > state=SPLITTING, ts=1443938653837,
> > server=dx-ape-regionserver9-online,60020,1440182448264},
> > 76bbb47201c3958e3a9c1086bfb351c5={76bbb47201c3958e3a9c1086bfb351c5
> > state=SPLITTING_NEW, ts=14...
> > 2015-10-05 14:14:14,161 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {669719254f132476c6df0e0e9b1fc93f={669719254f132476c6df0e0e9b1fc93f
> > state=SPLITTING_NEW, ts=1444025653911,
> > server=dx-ape-regionserver1-online,60020,1440178926883},
> > ec612addaabb22c8f46b2c903bd1158b={ec612addaabb22c8f46b2c903bd1158b
> > state=SPLITTING_NEW, t...
> > 2015-10-15 21:19:14,512 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {2b7a5c3ddc7ee919199c68611e6f6c96={2b7a5c3ddc7ee919199c68611e6f6c96
> > state=SPLITTING, ts=1444915153714,
> > server=dx-ape-regionserver12-online,60020,1440181883146},
> > cda06b9ebd651c616361f73a469a1a52={cda06b9ebd651c616361f73a469a1a52
> > state=SPLITTING_NEW, ts=1...
> > 2015-10-15 23:39:14,513 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {b1d3429606407280e442d8ce3de873c4={b1d3429606407280e442d8ce3de873c4
> > state=SPLITTING, ts=1444923553844,
> > server=dx-ape-regionserver25-online,60020,1440183200463},
> > ae7ba7ee139c7ba84ba707671b7959c4={ae7ba7ee139c7ba84ba707671b7959c4
> > state=SPLITTING_NEW, ts=1...
> > 2015-10-21 19:29:14,692 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {e677e41a383eb20429c9906bafc252bb={e677e41a383eb20429c9906bafc252bb
> > state=SPLITTING_NEW, ts=1445426954437,
> > server=dx-ape-regionserver11-online,60020,1440181972615},
> > 0028b035271bdd6d30e7fb6f1ffb406d={0028b035271bdd6d30e7fb6f1ffb406d
> > state=SPLITTING, ts=1...
> > 2015-10-25 10:24:14,790 DEBUG
> > [dx-ape-hmaster1-online,6,1438752227040-BalancerChore]
> master.HMaster:
> > Not running balancer because 3 region(s) in transition:
> > {694912c058fcd0e6bff7b3eaed1b051b={694912c058fcd0e6bff7b3eaed1b051b
> > state=SPLITTING_NEW, ts=1445739851757,
> > server=dx-ape-regionserver27-online,60020,1440183264316},
> > 7859193f7ca5ee2c98636cb812b549a7={7859193f7ca5ee2c98636cb812b549a7
> > state=SPLITTING, ts=1...
> >
> >
>

Balancer not running for a long time.

2015-10-30 Thread Heng Chen
My HBase cluster version is 0.98.6.

There are lots of regions on it,  about 1+

Load is heavy; almost all the time there are regions in split.

So I found that the balancer has not run for a long time.

grep -i 'balancer' master.log shows only logs like the ones below:

2015-09-30 11:29:13,994 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{30971a1ae707b9f5bbcd7b8802f32059={30971a1ae707b9f5bbcd7b8802f32059
state=SPLITTING_NEW, ts=1443583753692,
server=dx-ape-regionserver30-online,60020,1440183710528},
13eaacf6df912d0cb598067610c5a85f={13eaacf6df912d0cb598067610c5a85f
state=SPLITTING_NEW, ...
2015-10-01 17:44:14,032 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{55fc1c408832233ee1dd01c70c61ae14={55fc1c408832233ee1dd01c70c61ae14
state=SPLITTING, ts=1443692653425,
server=dx-ape-regionserver27-online,60020,1440183264316},
07439db0ff1319d20b43aa4d2e43a4ae={07439db0ff1319d20b43aa4d2e43a4ae
state=SPLITTING_NEW, ts=1...
2015-10-04 14:04:14,126 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{2bd0891dc9ca5fb15ea8b661127193b7={2bd0891dc9ca5fb15ea8b661127193b7
state=SPLITTING, ts=1443938653837,
server=dx-ape-regionserver9-online,60020,1440182448264},
76bbb47201c3958e3a9c1086bfb351c5={76bbb47201c3958e3a9c1086bfb351c5
state=SPLITTING_NEW, ts=14...
2015-10-05 14:14:14,161 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{669719254f132476c6df0e0e9b1fc93f={669719254f132476c6df0e0e9b1fc93f
state=SPLITTING_NEW, ts=1444025653911,
server=dx-ape-regionserver1-online,60020,1440178926883},
ec612addaabb22c8f46b2c903bd1158b={ec612addaabb22c8f46b2c903bd1158b
state=SPLITTING_NEW, t...
2015-10-15 21:19:14,512 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{2b7a5c3ddc7ee919199c68611e6f6c96={2b7a5c3ddc7ee919199c68611e6f6c96
state=SPLITTING, ts=1444915153714,
server=dx-ape-regionserver12-online,60020,1440181883146},
cda06b9ebd651c616361f73a469a1a52={cda06b9ebd651c616361f73a469a1a52
state=SPLITTING_NEW, ts=1...
2015-10-15 23:39:14,513 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{b1d3429606407280e442d8ce3de873c4={b1d3429606407280e442d8ce3de873c4
state=SPLITTING, ts=1444923553844,
server=dx-ape-regionserver25-online,60020,1440183200463},
ae7ba7ee139c7ba84ba707671b7959c4={ae7ba7ee139c7ba84ba707671b7959c4
state=SPLITTING_NEW, ts=1...
2015-10-21 19:29:14,692 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{e677e41a383eb20429c9906bafc252bb={e677e41a383eb20429c9906bafc252bb
state=SPLITTING_NEW, ts=1445426954437,
server=dx-ape-regionserver11-online,60020,1440181972615},
0028b035271bdd6d30e7fb6f1ffb406d={0028b035271bdd6d30e7fb6f1ffb406d
state=SPLITTING, ts=1...
2015-10-25 10:24:14,790 DEBUG
[dx-ape-hmaster1-online,6,1438752227040-BalancerChore] master.HMaster:
Not running balancer because 3 region(s) in transition:
{694912c058fcd0e6bff7b3eaed1b051b={694912c058fcd0e6bff7b3eaed1b051b
state=SPLITTING_NEW, ts=1445739851757,
server=dx-ape-regionserver27-online,60020,1440183264316},
7859193f7ca5ee2c98636cb812b549a7={7859193f7ca5ee2c98636cb812b549a7
state=SPLITTING, ts=1...


The balancer runs every 5 minutes, but there are no logs recording why we are not
running the balancer. Should we at least add some logging?

As the above logs show, it seems we skip running the balancer when regions are in
transition.

This is the related code:

// Only allow one balance run at at time.
if (this.assignmentManager.getRegionStates().isRegionsInTransition()) {
  Map<String, RegionState> regionsInTransition =
      this.assignmentManager.getRegionStates().getRegionsInTransition();
  LOG.debug("Not running balancer because " + regionsInTransition.size() +
      " region(s) in transition: " + org.apache.commons.lang.StringUtils.
      abbreviate(regionsInTransition.toString(), 256));
  return false;
}

And I have a question: why do we use region states to block the balancer from
running at all?

Could we just balance the regions that are not in transition?
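
For illustration only, a minimal sketch of that alternative (not actual HMaster
code; the class and method names are made up): filter out the regions in
transition and balance the rest.
{code}
// Illustrative only; class and method names are made up, not HMaster code.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartialBalanceSketch {
  // Keep only the regions that are not currently splitting/merging/moving.
  static List<String> regionsEligibleForBalance(List<String> allRegions,
                                                Set<String> regionsInTransition) {
    List<String> eligible = new ArrayList<String>();
    for (String region : allRegions) {
      if (!regionsInTransition.contains(region)) {
        eligible.add(region);
      }
    }
    return eligible;
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("region-a", "region-b", "region-c");
    Set<String> inTransition = new HashSet<String>(Arrays.asList("region-b"));
    // Prints [region-a, region-c]: only the stable regions get balanced.
    System.out.println(regionsEligibleForBalance(all, inTransition));
  }
}
{code}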


Thanks!


[jira] [Resolved] (HBASE-14265) we should forbid creating table using 'hbase' namespace except by superuser

2015-10-27 Thread Heng Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Heng Chen resolved HBASE-14265.
---
Resolution: Invalid

> we should forbid creating table using 'hbase' namespace except by superuser
> ---
>
> Key: HBASE-14265
> URL: https://issues.apache.org/jira/browse/HBASE-14265
> Project: HBase
>  Issue Type: Bug
>Reporter: Heng Chen
>Assignee: Heng Chen
> Attachments: HBASE-14265.patch, HBASE-14265_v2.patch, 
> HBASE-14265_v3.patch, HBASE-14265_v4.patch
>
>
> Now, there is no restriction on which users can create tables under the 'hbase'
> namespace. I think this carries some risk,
> because we use {{TableName.systemTable}} to decide whether a table is a system
> table or not.
> As the code shows, {{TableName.systemTable}} will be true if the namespace equals 'hbase':
> {code}
>  if (Bytes.equals(NamespaceDescriptor.SYSTEM_NAMESPACE_NAME, namespace)) {
> this.namespace = NamespaceDescriptor.SYSTEM_NAMESPACE_NAME;
> this.namespaceAsString = 
> NamespaceDescriptor.SYSTEM_NAMESPACE_NAME_STR;
> this.systemTable = true;
>   } 
> {code}
>  
> And we treat system tables and normal tables differently.
> For example, https://issues.apache.org/jira/browse/HBASE-14257 will flush
> faster if the table is a system table.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14703) update the per-region stats twice for the call on return

2015-10-26 Thread Heng Chen (JIRA)
Heng Chen created HBASE-14703:
-

 Summary: update the per-region stats twice for the call on return
 Key: HBASE-14703
 URL: https://issues.apache.org/jira/browse/HBASE-14703
 Project: HBase
  Issue Type: Bug
Reporter: Heng Chen


In {{AsyncProcess.SingleServerRequestRunnable}}, it seems we update
serverStatistics twice.

The first update happens because we wrap the {{RetryingCallable}} with
{{StatsTrackingRpcRetryingCaller}}, which updates serverStatistics when we call
{{callWithRetries}} and {{callWithoutRetries}}. The related code is below:
{code}
  @Override
  public T callWithRetries(RetryingCallable<T> callable, int callTimeout)
      throws IOException, RuntimeException {
    T result = delegate.callWithRetries(callable, callTimeout);
    return updateStatsAndUnwrap(result, callable);
  }

  @Override
  public T callWithoutRetries(RetryingCallable<T> callable, int callTimeout)
      throws IOException, RuntimeException {
    T result = delegate.callWithRetries(callable, callTimeout);
    return updateStatsAndUnwrap(result, callable);
  }
{code}

The second update happens after we get the response, in {{receiveMultiAction}},
where we update again:
{code}
// update the stats about the region, if its a user table. We don't want to slow down
// updates to meta tables, especially from internal updates (master, etc).
if (AsyncProcess.this.connection.getStatisticsTracker() != null) {
  result = ResultStatsUtil.updateStats(result,
      AsyncProcess.this.connection.getStatisticsTracker(), server, regionName);
}
{code}

It seems that {{StatsTrackingRpcRetryingCaller}} is NOT necessary; should we remove it?
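
For illustration only, a minimal, self-contained sketch of the double-counting
pattern described above (not the real AsyncProcess classes; the names are made
up): the same counter is bumped once by a wrapper around the call and once again
when the response is handled, so each request counts twice.
{code}
// Illustrative only, not the real AsyncProcess code: one request, two stat updates.
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class DoubleStatsSketch {
  static final AtomicLong statUpdatesForServer = new AtomicLong();

  // Update #1: a stats-tracking wrapper around the call itself.
  static String callWithStats(Supplier<String> call) {
    String result = call.get();
    statUpdatesForServer.incrementAndGet();
    return result;
  }

  // Update #2: the response-handling path updates the same stats again.
  static void receiveResponse(String result) {
    statUpdatesForServer.incrementAndGet();
  }

  public static void main(String[] args) {
    receiveResponse(callWithStats(() -> "ok"));
    // Prints 2: the single request was counted twice.
    System.out.println("stat updates for one request: " + statUpdatesForServer.get());
  }
}
{code}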




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

