[jira] [Created] (HBASE-19397) Design procedures for ReplicationManager to notify peer change event from master

2017-11-30 Thread Zheng Hu (JIRA)
Zheng Hu created HBASE-19397:


 Summary: Design  procedures for ReplicationManager to notify peer 
change event from master
 Key: HBASE-19397
 URL: https://issues.apache.org/jira/browse/HBASE-19397
 Project: HBase
  Issue Type: Sub-task
Reporter: Zheng Hu
Assignee: Zheng Hu


After we store peer state and peer queue information in an HBase table, the RS 
can no longer track peer config changes by adding a watcher on a znode.

So we need to design procedures for ReplicationManager to notify peer change 
events from the master. The replication RPC interfaces that may be implemented 
by procedures are the following:

{code}
1. addReplicationPeer
2. removeReplicationPeer
3. enableReplicationPeer
4. disableReplicationPeer
5. updateReplicationPeerConfig
{code}

BTW, our RS states will still be stored in ZooKeeper, so when an RS crashes, the 
tracker that triggers the transfer of the crashed RS's queues will still be a 
ZooKeeper tracker. We do NOT need to implement that with procedures.
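
A minimal sketch of what master-driven notification could look like, in place of a znode watcher. All names here (`PeerOperation`, `PeerChangeListener`, `PeerChangeNotifier`) are hypothetical and are not taken from the HBase code base; the enum simply mirrors the five RPCs listed above.

```java
import java.util.ArrayList;
import java.util.List;

class PeerChangeSketch {
    /** The five replication RPCs listed above, modelled as operation types. */
    public enum PeerOperation { ADD, REMOVE, ENABLE, DISABLE, UPDATE_CONFIG }

    /** Hypothetical callback that a region server would expose to the master. */
    public interface PeerChangeListener {
        void onPeerChange(String peerId, PeerOperation op);
    }

    /** Hypothetical master-side step: push the change to every registered RS. */
    public static final class PeerChangeNotifier {
        private final List<PeerChangeListener> regionServers = new ArrayList<>();

        public void register(PeerChangeListener rs) {
            regionServers.add(rs);
        }

        /** Notifies every registered region server and returns how many were called. */
        public int broadcast(String peerId, PeerOperation op) {
            for (PeerChangeListener rs : regionServers) {
                rs.onPeerChange(peerId, op);
            }
            return regionServers.size();
        }
    }

    public static void main(String[] args) {
        PeerChangeNotifier master = new PeerChangeNotifier();
        master.register((peerId, op) -> System.out.println("rs1 saw " + op + " for " + peerId));
        master.register((peerId, op) -> System.out.println("rs2 saw " + op + " for " + peerId));
        master.broadcast("peer_1", PeerOperation.UPDATE_CONFIG);
    }
}
```

A real procedure would also have to persist its progress and retry region servers that fail to acknowledge; the sketch leaves that out.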

As we will release 2.0 in the coming weeks, and HBASE-15867 cannot be resolved 
before the release, I'd prefer to create a new feature branch for 
HBASE-15867.







--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (HBASE-19396) Fix flaky test TestHTableMultiplexerFlushCache

2017-11-30 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-19396:
--

 Summary: Fix flaky test TestHTableMultiplexerFlushCache
 Key: HBASE-19396
 URL: https://issues.apache.org/jira/browse/HBASE-19396
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 1.5.0
Reporter: Guanghao Zhang
Assignee: Guanghao Zhang
Priority: Minor


[INFO] Running org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache
[ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 36.67 s <<< FAILURE! - in org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache
[ERROR] testOnRegionMove(org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache)  Time elapsed: 4.644 s  <<< FAILURE!
java.lang.AssertionError: Did not find a new RegionServer to use
    at org.apache.hadoop.hbase.client.TestHTableMultiplexerFlushCache.testOnRegionMove(TestHTableMultiplexerFlushCache.java:160)





[jira] [Created] (HBASE-19395) [branch-1] TestEndToEndSplitTransaction.testMasterOpsWhileSplitting fails with NPE

2017-11-30 Thread Guanghao Zhang (JIRA)
Guanghao Zhang created HBASE-19395:
--

 Summary: [branch-1] 
TestEndToEndSplitTransaction.testMasterOpsWhileSplitting fails with NPE
 Key: HBASE-19395
 URL: https://issues.apache.org/jira/browse/HBASE-19395
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: Guanghao Zhang


[INFO] Running org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
[ERROR] Tests run: 3, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 50.388 s <<< FAILURE! - in org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction
[ERROR] testMasterOpsWhileSplitting(org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction)  Time elapsed: 8.903 s  <<< ERROR!
java.lang.NullPointerException
    at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.test(TestEndToEndSplitTransaction.java:239)
    at org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction.testMasterOpsWhileSplitting(TestEndToEndSplitTransaction.java:148)





[jira] [Created] (HBASE-19394) Issue on the publication feature of RS status with multicast (hbase.status.published) in multi-homeing env

2017-11-30 Thread Toshihiro Suzuki (JIRA)
Toshihiro Suzuki created HBASE-19394:


 Summary: Issue on the publication feature of RS status with 
multicast (hbase.status.published) in multi-homeing env
 Key: HBASE-19394
 URL: https://issues.apache.org/jira/browse/HBASE-19394
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Toshihiro Suzuki


Currently, when the publication feature is enabled 
(hbase.status.published=true), it uses the interface which is found first:
https://github.com/apache/hbase/blob/2e8bd0036dbdf3a99786e5531495d8d4cb51b86c/hbase-server/src/main/java/org/apache/hadoop/hbase/master/ClusterStatusPublisher.java#L268-L275

This won't work when the host has multiple network interfaces and the one 
selected is unreachable from the other nodes. The interface used for 
communication between cluster nodes should be configurable.
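
A rough sketch of the requested behaviour, using only `java.net.NetworkInterface` from the JDK. The class and method names are hypothetical, and the configured interface name is taken from the command line rather than from any real HBase configuration key. If the configured interface exists it is used; otherwise the code falls back to the first candidate, as in the first-found behaviour described above.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

class InterfaceChoiceSketch {
    /**
     * Returns configuredName if it is among the candidates, otherwise the
     * first candidate; returns null when there are no candidates at all.
     */
    public static String choose(List<String> candidates, String configuredName) {
        if (configuredName != null && candidates.contains(configuredName)) {
            return configuredName;
        }
        return candidates.isEmpty() ? null : candidates.get(0);
    }

    /** Names of local interfaces that are up and support multicast. */
    public static List<String> multicastCapableInterfaces() throws SocketException {
        List<String> names = new ArrayList<>();
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return names; // some JDK versions return null when nothing is found
        }
        while (all.hasMoreElements()) {
            NetworkInterface ni = all.nextElement();
            if (ni.isUp() && ni.supportsMulticast()) {
                names.add(ni.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws SocketException {
        // Hypothetical: the operator passes the wanted interface name as argv[0].
        String configured = args.length > 0 ? args[0] : null;
        System.out.println(choose(multicastCapableInterfaces(), configured));
    }
}
```

Silently falling back when the configured name is missing is a design choice made for the sketch; failing fast with a clear error may be preferable in a multi-homed deployment.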






[jira] [Resolved] (HBASE-19385) [1.3] TestReplicator failed 1.3 nightly

2017-11-30 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-19385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-19385.
---
  Resolution: Fixed
Hadoop Flags: Reviewed

Pushed to branch-1.4 too at [~apurtell]'s request. Left the test disabled. 
Resolving.

> [1.3] TestReplicator failed 1.3 nightly
> ---
>
> Key: HBASE-19385
> URL: https://issues.apache.org/jira/browse/HBASE-19385
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Reporter: stack
>Assignee: stack
> Fix For: 2.0.0, 1.4.0, 1.3.2
>
> Attachments: HBASE-19385.branch-1.3.001.patch
>
>
> TestReplicator failed the 1.3 nightly. Running it locally, it fails sometimes. 
> The complaint is IllegalMonitorStateException and indeed, locking around the latch is unsafe. 
> After fixing this, I can't get it to fail locally anymore.
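
For readers unfamiliar with this failure mode: `IllegalMonitorStateException` is thrown when `wait()` or `notify()` is called on an object whose monitor the calling thread does not hold. The sketch below is illustrative only and is not the HBASE-19385 patch; the issue text does not say exactly how the test misused the latch. It contrasts one such misuse with `CountDownLatch.await`, which needs no external lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchSketch {
    /** Misuse: Object.wait() on the latch without owning its monitor. */
    public static boolean waitWithoutMonitorThrows() {
        CountDownLatch latch = new CountDownLatch(1);
        try {
            latch.wait(10); // Object.wait, not CountDownLatch.await
            return false;
        } catch (IllegalMonitorStateException expected) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** Intended use: a worker counts down, the caller awaits with a timeout. */
    public static boolean awaitCompletes() {
        CountDownLatch latch = new CountDownLatch(1);
        Thread worker = new Thread(latch::countDown);
        worker.start();
        try {
            return latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("wait() without monitor throws: " + waitWithoutMonitorThrows());
        System.out.println("await() completes: " + awaitCompletes());
    }
}
```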





[VOTE] First release candidate for HBase 1.1.13 (RC0) is available

2017-11-30 Thread Nick Dimiduk
I'm happy to announce the first release candidate of HBase 1.1.13
(HBase-1.1.13RC0) is available for download at
https://dist.apache.org/repos/dist/dev/hbase/hbase-1.1.13RC0/

This is to be the final release from branch-1.1.

Maven artifacts are available in the staging repository
https://repository.apache.org/content/repositories/orgapachehbase-1182

Artifacts are signed with my code signing subkey 0xAD9039071C3489BD,
available in the Apache keys directory
https://people.apache.org/keys/committer/ndimiduk.asc and in our KEYS file
http://www-us.apache.org/dist/hbase/KEYS.

There's also a signed tag for this release at
https://git-wip-us.apache.org/repos/asf?p=hbase.git;a=tag;h=16a04e6629e614c7900c443f3a29cdba92dd7b7e

The detailed source and binary compatibility report vs 1.1.12 has been
published for your review, at
https://home.apache.org/~ndimiduk/1.1.12_1.1.13RC0_compat_report.html

HBase 1.1.13 is the thirteenth and final patch release in the HBase 1.1
line, continuing on the theme of bringing a stable, reliable database to
the Hadoop and NoSQL communities. This release includes over 40 resolved
issues since the 1.1.12 release; the majority of these changes are to build
tooling rather than the product itself. Notable product correctness fixes
include HBASE-18665 and HBASE-19052.

The full list of fixes included in this release is available at
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12310753=12341346
and in the CHANGES.txt file included in the distribution.

Please try out this candidate and vote +/-1 by 23:59 Pacific time on
Friday, 2017-12-08 as to whether we should release these artifacts as HBase
1.1.13.

Thanks,
Nick


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

2017-11-30 Thread Stack
On Tue, Nov 7, 2017 at 8:30 PM, Josh Elser  wrote:

> Folks,
>
> I've been working with Vlad and Ted offline to make sure we have a plan
> that addresses the implementation gaps Vlad sees and the barriers-for-entry
> previously stated to keep the feature in HBase 2.0. My hope is that this
> can be an honest discussion given 2.0-beta timelines, with a concrete
> action plan. I'm trying my best to not re-hash the logic/reasoning/caveats
> behind previous concerns; anything folks feel is a blocker that I haven't
> covered below is unintentional.
>
> The list:
>
> 1. Documentation. It must be updated and committed, ensuring it covers the
> details operators/architects need to know to use it effectively
> (HBASE-16574). Vlad will help with content, myself and/or Frank will get it
> updated to asciidoc.
>
> 2. Distributed testing missing. Vlad has taken my previous document on
> goals and translated that into an implementation outline[1]. Ted and I have
> already weighed in -- I believe it hits the salient points for the quality
> of testing we're looking for. I'll get started on this while Vlad does #4
> (after consensus on approach, of course). Needs JIRA issue (maybe?).
>
> 3. Operator utility to verify backups. In abstract, this should just be
> the same guts of a tool like VerifyReplication. In practice, this should be
> the same code that #3 uses (if not _actually_ the same guts as
> VerifyReplication). The hope is that this will be encapsulated (time-wise)
> by #3. Needs JIRA issue (maybe?).
>
> 4. Polish DistCP for bulk-loaded files/fault-tolerance (HBASE-17852). I
> don't have specifics here -- will rely on Vlad to correct me if there's a
> better JIRA issue to track than the aforementioned. Will rely on details to
> show up the JIRA issue to track it.
>
> Current due dates:
>
>
Checking in on the plan.


> 1. End of week (2017/11/10)
>

I believe this is done.


> 2. Before US Thanksgiving (2017/11/22)
> 3. Same as #2
> 4. Same as #1
>
>
These were not done in time for Thanksgiving? Correct me if I'm wrong.

Thanks,
St.Ack



> My current thought is that this is reasonable for implementation times,
> and would not derail the rest of the beta-1 train. I appreciate the
> patience from all parties, and I hope that those trying to make this better
> can find a little more time to give some feedback. Thanks for the long read
> if nothing else.
>
> - Josh
>
> [1] https://docs.google.com/document/d/1xbPlLKjOcPq2LDqjbSkF6uNDAG0mzgOxek6P3POLeMc/edit?usp=sharing
>


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

2017-11-30 Thread Vladimir Rodionov
Nope, Mike. Fortunately, 99% of the FT code will remain after introducing
concurrent session support.

Just two lines will change: TakeSnapshot -> BeginTX, RestoreSnapshot ->
RollbackTx
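
To make the claim concrete, here is a toy sketch. It assumes a hypothetical in-memory metadata table and hypothetical method names (`takeSnapshot`, `restoreSnapshot`, `runBackup`), and it is not the backup code itself. The fault-tolerance logic calls only two hooks, so replacing snapshot/restore with begin/rollback would touch only those two call sites.

```java
import java.util.HashMap;
import java.util.Map;

class BackupFaultToleranceSketch {
    // Stand-in for the backup metadata table.
    private final Map<String, String> metaTable = new HashMap<>();
    private Map<String, String> snapshot = new HashMap<>();

    // Hook 1: the call that would become BeginTX.
    private void takeSnapshot() {
        snapshot = new HashMap<>(metaTable);
    }

    // Hook 2: the call that would become RollbackTx.
    private void restoreSnapshot() {
        metaTable.clear();
        metaTable.putAll(snapshot);
    }

    /** Runs a fake backup; on failure the metadata is rolled back. */
    public boolean runBackup(String backupId, boolean failMidway) {
        takeSnapshot();
        try {
            metaTable.put(backupId, "RUNNING");
            if (failMidway) {
                throw new IllegalStateException("simulated failure");
            }
            metaTable.put(backupId, "COMPLETE");
            return true;
        } catch (RuntimeException e) {
            restoreSnapshot();
            return false;
        }
    }

    public String state(String backupId) {
        return metaTable.get(backupId);
    }

    public static void main(String[] args) {
        BackupFaultToleranceSketch sketch = new BackupFaultToleranceSketch();
        System.out.println("b1 ok: " + sketch.runBackup("b1", false));
        System.out.println("b2 ok: " + sketch.runBackup("b2", true));
        System.out.println("b2 state after rollback: " + sketch.state("b2"));
    }
}
```

A single shared snapshot like this one cannot serve two sessions at once, which is the concurrency limitation raised in the quoted message below.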

-Vlad

On Thu, Nov 30, 2017 at 7:20 PM, Mike Drob  wrote:

> Bringing this thread up again, because I don't really know where else to
> ask...
>
> The current backup solution snapshots the backup metadata table and
> will restore-via-snapshot in case something goes wrong (or this is still in
> a patch? unclear if this has been committed or not, since there's a ton of
> code to dig through)
>
> AFAICT this is the major reason that we do not support concurrent backup or
> restore operations. (Are there others? Also couldn't find this.)
>
> The fault tolerance that we're working on now will need to be gutted and
> completely rewritten for the future improvements. I get that this is all
> internal and as long as we make it seamless for the operators then we have
> wide latitude to make our own changes. But an important question is just
> because we can, does it mean we should do this? I'm concerned that we're
> writing code that we know will get thrown away and replaced, except we will
> have to continue to support it for as long as 2.0 is an active branch.
>
> Mike
>
>
> On Wed, Nov 15, 2017 at 3:05 PM, Josh Elser  wrote:
>
> > On 11/14/17 4:54 PM, Mike Drob wrote:
> >
> >> I can see a small section on the documentation update I've already been
> >>> hacking on to include details on the issue "We can't help you secure
> >>> where
> >>> you put the data". Given how many instances of "globally readable S3
> >>> bucket" I've seen recently, this strikes me as prudent.
> >>>
> >>> I would prefer this to be a giant, hard to miss, red letters, all caps
> >> warning; not a small section. I do think it is our responsibility for
> >> telling users how to configure the backup/restore process for
> >> communicating
> >> with secure systems. Or, at a minimum, documenting how we pass arbitrary
> >> configuration options that can then be used to communicate with said
> >> systems.
> >>
> >
> > :D
> >
> > For example, if we support writing backups to S3, then we should have a
> way
> >> to specify an Auth string and maybe even some of the custom headers like
> >> x-amz-acl. We don't have to explicitly enumerate best practices, but if
> >> the
> >> only option is to write to a globally open bucket, then I don't think we
> >> should advertise writing to S3 as an available option.
> >>
> >> Similarly, if we tell people that they can send backups to HDFS, then we
> >> should give them the hooks to correctly interface with a kerberized
> HDFS.
> >>
> >> Maybe this is already in the proposed patch, I haven't gone looking yet.
> >>
> >
> > Nope. I actually meant to include this in the patch I re-rolled today but
> > forgot. Let me update once more.
> >
> > Thanks again, Mike. Good questions/feedback!
> >
>


Re: [DISCUSS] Plan to avoid backup/restore removal from 2.0

2017-11-30 Thread Mike Drob
Bringing this thread up again, because I don't really know where else to
ask...

The current backup solution snapshots the backup metadata table and
will restore-via-snapshot in case something goes wrong (or this is still in
a patch? unclear if this has been committed or not, since there's a ton of
code to dig through)

AFAICT this is the major reason that we do not support concurrent backup or
restore operations. (Are there others? Also couldn't find this.)

The fault tolerance that we're working on now will need to be gutted and
completely rewritten for the future improvements. I get that this is all
internal and as long as we make it seamless for the operators then we have
wide latitude to make our own changes. But an important question is just
because we can, does it mean we should do this? I'm concerned that we're
writing code that we know will get thrown away and replaced, except we will
have to continue to support it for as long as 2.0 is an active branch.

Mike


On Wed, Nov 15, 2017 at 3:05 PM, Josh Elser  wrote:

> On 11/14/17 4:54 PM, Mike Drob wrote:
>
>> I can see a small section on the documentation update I've already been
>>> hacking on to include details on the issue "We can't help you secure
>>> where
>>> you put the data". Given how many instances of "globally readable S3
>>> bucket" I've seen recently, this strikes me as prudent.
>>>
>>> I would prefer this to be a giant, hard to miss, red letters, all caps
>> warning; not a small section. I do think it is our responsibility for
>> telling users how to configure the backup/restore process for
>> communicating
>> with secure systems. Or, at a minimum, documenting how we pass arbitrary
>> configuration options that can then be used to communicate with said
>> systems.
>>
>
> :D
>
> For example, if we support writing backups to S3, then we should have a way
>> to specify an Auth string and maybe even some of the custom headers like
>> x-amz-acl. We don't have to explicitly enumerate best practices, but if
>> the
>> only option is to write to a globally open bucket, then I don't think we
>> should advertise writing to S3 as an available option.
>>
>> Similarly, if we tell people that they can send backups to HDFS, then we
>> should give them the hooks to correctly interface with a kerberized HDFS.
>>
>> Maybe this is already in the proposed patch, I haven't gone looking yet.
>>
>
> Nope. I actually meant to include this in the patch I re-rolled today but
> forgot. Let me update once more.
>
> Thanks again, Mike. Good questions/feedback!
>


Re: Release 1.4.0 update

2017-11-30 Thread Guanghao Zhang
Andrew, HBASE-18626 is a documentation fix for the incompatible change to the
replication TableCFs config. Can we include it in 1.4? Thanks.

2017-12-01 9:19 GMT+08:00 Stack :

> I pushed HBASE-18233. Thanks for finding the issue and patience waiting on
> fix Andrew.
> St.Ack
>
> On Thu, Nov 30, 2017 at 5:04 PM, Andrew Purtell 
> wrote:
>
> > No problem, committing it now
> >
> > On Thu, Nov 30, 2017 at 4:54 PM, Sergey Soldatov <
> sergeysolda...@gmail.com
> > >
> > wrote:
> >
> > > Andrew,
> > >
> > > Can we include HBASE-19393 as well? Quite annoying issue and very
> simple
> > > fix.
> > >
> > > Thanks,
> > > Sergey
> > >
> > > On Thu, Nov 30, 2017 at 3:47 PM, Andrew Purtell 
> > > wrote:
> > >
> > > > Not too late, no
> > > >
> > > > On Thu, Nov 30, 2017 at 3:31 PM, Stack  wrote:
> > > >
> > > > > Fix is up if it is not too late Andrew.
> > > > > St.Ack
> > > > >
> > > > > On Thu, Nov 30, 2017 at 1:58 PM, Stack  wrote:
> > > > >
> > > > > > Andrew, your testing has turned up an issue in HBASE-18233. It is
> > > > present
> > > > > > in the 1.4 candidate patch and in 1.3. The failure is
> > intermittent. I
> > > > am
> > > > > > working on a fix but want to make sure I have it right. So, I
> > > withdraw
> > > > my
> > > > > > request that 1.4 include it.
> > > > > >
> > > > > > Thanks,
> > > > > > S
> > > > > >
> > > > > > On Wed, Nov 29, 2017 at 5:14 PM, Andrew Purtell <
> > apurt...@apache.org
> > > >
> > > > > > wrote:
> > > > > >
> > > > > >> TestGlobalThrottler is a problem stemming from the revert of
> > > > HBASE-9465
> > > > > >> ​ on branch-1.4. The test came in on HBASE-17314 so I'll also
> > revert
> > > > > that
> > > > > >> from branch-1.4. For more on this see HBASE-19381
> > > > > >> ​
> > > > > >>
> > > > > >> On Wed, Nov 29, 2017 at 5:00 PM, Andrew Purtell <
> > > apurt...@apache.org>
> > > > > >> wrote:
> > > > > >>
> > > > > >> > The TestEndToEndSplitTransaction failure will be fixed by
> > > > HBASE-19379.
> > > > > >> >
> > > > > >> > The TestGlobalThrottler issue is a hang, which is probably why
> > it
> > > > > >> slipped
> > > > > >> > through the cracks. I went back 32 commits from head and it
> was
> > > > still
> > > > > >> > stuck. 64 commits back it's good. Somewhere in between. Will
> get
> > > to
> > > > > the
> > > > > >> > offending commit shortly.
> > > > > >> >
> > > > > >> >
> > > > > >> > On Wed, Nov 29, 2017 at 3:56 PM, Andrew Purtell <
> > > > apurt...@apache.org>
> > > > > >> > wrote:
> > > > > >> >
> > > > > >> >> Thanks. I'll take a look. They were passing for me before I
> > went
> > > > out
> > > > > on
> > > > > >> >> vacation.
> > > > > >> >>
> > > > > >> >>
> > > > > >> >> On Wed, Nov 29, 2017 at 3:52 PM, Stack 
> > wrote:
> > > > > >> >>
> > > > > >> >>> Thanks.
> > > > > >> >>>
> > > > > >> >>> BTW, I noticed this morning that TestGlobalThrottler and
> > > > > >> >>> TestEndToEndSplitTransaction
> > > > > >> >>> fail locally for me and up on jenkins as part of hadoopqa
> runs
> > > and
> > > > > on
> > > > > >> >>> recent 1.4 runs.
> > > > > >> >>>
> > > > > >> >>> I tried to poke at why. They seem fine in 1.2, 1.3, and 2.0.
> > Got
> > > > > >> >>> distracted
> > > > > >> >>> and got no further than this
> > > > > >> >>>
> > > > > >> >>> S
> > > > > >> >>>
> > > > > >> >>> On Wed, Nov 29, 2017 at 3:00 PM, Andrew Purtell <
> > > > > apurt...@apache.org>
> > > > > >> >>> wrote:
> > > > > >> >>>
> > > > > >> >>> > Ok, no problem.
> > > > > >> >>> >
> > > > > >> >>> > On Wed, Nov 29, 2017 at 2:59 PM, Stack 
> > > > wrote:
> > > > > >> >>> >
> > > > > >> >>> > > May I get HBASE-18233 into 1.4.0 Andrew? It is in 1.2
> and
> > > 1.3.
> > > > > >> >>> Waiting on
> > > > > >> >>> > > hadoopqa run. Would be good to have it all up and down
> > > > branch-1.
> > > > > >> >>> > > Thanks Sir,
> > > > > >> >>> > > St.Ack
> > > > > >> >>> > >
> > > > > >> >>> > > On Wed, Nov 29, 2017 at 12:38 PM, Peter Somogyi <
> > > > > >> >>> psomo...@cloudera.com>
> > > > > >> >>> > > wrote:
> > > > > >> >>> > >
> > > > > >> >>> > > > HBASE-19188 was just resolved. :)
> > > > > >> >>> > > >
> > > > > >> >>> > > > On Wed, Nov 29, 2017 at 8:12 PM, Andrew Purtell <
> > > > > >> >>> apurt...@apache.org>
> > > > > >> >>> > > > wrote:
> > > > > >> >>> > > >
> > > > > >> >>> > > > > I come back to find HBASE-19188 is a blocker. :-/
> > > > > >> >>> > > > > Need to resolve it
> > > > > >> >>> > > > >
> > > > > >> >>> > > > > On Sat, Nov 18, 2017 at 10:30 AM, Sean Busbey <
> > > > > >> bus...@apache.org
> > > > > >> >>> >
> > > > > >> >>> > > wrote:
> > > > > >> >>> > > > >
> > > > > >> >>> > > > > > thanks for all the work as RM on this Andrew!
> > > > > >> >>> > > > > >
> > > > > >> >>> > > > > > On Sat, Nov 18, 2017 at 12:19 PM, Andrew Purtell
> > > > > >> >>> > > > > >  wrote:
> > > > > >> >>> > > > > > > Everything is in and 

Re: Testing and CI -- Apache Jenkins Builds (WAS -> Re: Testing)

2017-11-30 Thread Stack
On the move over to nightly test runs:

1.2 nightly had a successful build last night after the branch-1
stabilization effort (HBASE-19204) and fixing a few unit test failures. See
build 150
https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-1.2/
Build 151 then failed because of a timed-out test. Need to dig in. Clean up a
few more unit tests and branch-1.2 is probably ready for cutting a release.

1.3 has a few flakies. The last build failed because of:

  Test Result (1 failure / ±0)
org.apache.hadoop.hbase.regionserver.TestEncryptionKeyRotation.testCFKeyRotation

Just a little effort should turn 1.3 green.

I was going to disable the 1.4 job,
https://builds.apache.org/view/H-L/view/HBase/job/HBase-1.4/,  in favor of
the 1.4 nightly,
https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-1.4/,
if ok w/ you Andrew Purtell... And move over the branch-1, branch-2, and
master too.

Thanks,
S



On Wed, Nov 29, 2017 at 8:06 AM, Stack  wrote:

> Example of the new nice reporting:
> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/job/branch-1.2/
> S
>
> On Wed, Nov 29, 2017 at 8:06 AM, Stack  wrote:
>
>> Note that I have disabled the HBase-1.2-JDK7, HBase-1.2-JDK8,
>> HBase-1.3-JDK7, and HBase-1.3-JDK8 jobs. They have been broken for a good
>> while now. In their place, refer to an ongoing Sean "Nightly" project, an
>> effort he has been at for a while. It does more checking with pretty
>> reports that will help figuring general stability over time. See under
>> https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/
>> See the nightly builds for 1.2 and 1.3. They have some teething issues
>> still but are almost there. See the 1.2 build from last night. In recent
>> days, the 1.2 branch went from trash-can fire to stable. See how all tests
>> passed in the last build but then we failed generating the src bundle on
>> the end (this is what I mean by 'teething' issue). Will work on fixing this
>> last step and moving over 1.4, etc., in the next few days.
>>
>> FYI,
>> St.Ack
>>
>>
>> On Tue, Nov 7, 2017 at 7:45 AM, Stack  wrote:
>>
>>> On Tue, Nov 7, 2017 at 6:10 AM, Sean Busbey  wrote:
>>>
>>>> > Should I be able to see the machine dir when I look at nightlies
>>>> > output?
>>>> > (Was trying to see what else is running).
>>>>
>>>> Ah. we don't have the same machine sampling on nightly as we do in
>>>> precommit. I am 80% on a patch for HBASE-19189 (run test ad-hoc
>>>> repeatedly) that includes pulling that information gathering into a
>>>> place where we could also use it in nightly.


>>> Sweet.
>>>
>>>
>>>
>>>> Did we ever figure out how many cores we expect our tests to need? It
>>>> looks like the Hadoop nodes have 8 cores. (with 2 executors that means
>>>> 4 is our fair share)


>>> At the end of the thread inquiry I suggested that we don't use enough
>>> cores, that we could up our fork counts and tests would complete in less
>>> time. I wanted to experiment some w/ high fork counts -- 16 or so -- to see
>>> if concurrent running brought on  more failure.
>>>
>>> St.Ack
>>>
>>>
>>>
>>>
>>>> On Tue, Nov 7, 2017 at 8:05 AM, Sean Busbey  wrote:
>>>> > surefire results get zipped up (we were filling the jenkins hosts with
>>>> > old test logs previously) and stored in a file called "test_logs.zip"
>>>> > for each jvm run. So if that happened in the jdk7 run for branch-1.2,
>>>> > it'd be in artifacts -> output-jdk7 -> test_logs.zip.
>>>> >
>>>> > I don't know if the archival process grabs things from surefire that
>>>> > aren't the surefire XML files, but we can update it to do so if it
>>>> > doesn't.
>>>> >
>>>> > On Mon, Nov 6, 2017 at 11:39 PM, Stack  wrote:
>>>> >> I see this in the 1.2 nightly just when it gives up the ghost
>>>> >>
>>>> >> [WARNING] Corrupted STDOUT by directly writing to native stream in
>>>> >> forked JVM 2. See FAQ web page and the dump file
>>>> >> /testptch/hbase/hbase-server/target/surefire-reports/2017-11-06T20-11-30_219-jvmRun2.dumpstream
>>>> >>
>>>> >> .. but the pointed to dumpstream doesn't seem to be around post build.
>>>> >> I am looking in wrong place?
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> S
>>>> >>
>>>> >>
>>>> >> On Mon, Nov 6, 2017 at 8:20 PM, Stack  wrote:
>>>> >>
>>>> >>> On Mon, Nov 6, 2017 at 8:35 AM, Sean Busbey wrote:
>>>> >>>
>>>> >>>> Given that all of the old post-commit tests have been posting that
>>>> >>>> they're failing to JIRAs for what looks like a month, is there any
>>>> >>>> reason not to switch to the new tests that also say they're
>>>> >>>> failing?
>>>> >>>>
>>>> >>>>
>>>> >>> No reason.
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>> The reason HBASE-18467 has been sitting on hold this whole time has
>>>> >>>> been because the new nightly branch tests keep complaining about
  

Re: Release 1.4.0 update

2017-11-30 Thread Stack
I pushed HBASE-18233. Thanks for finding the issue and patience waiting on
fix Andrew.
St.Ack

On Thu, Nov 30, 2017 at 5:04 PM, Andrew Purtell  wrote:

> No problem, committing it now
>
> On Thu, Nov 30, 2017 at 4:54 PM, Sergey Soldatov  >
> wrote:
>
> > Andrew,
> >
> > Can we include HBASE-19393 as well? Quite annoying issue and very simple
> > fix.
> >
> > Thanks,
> > Sergey
> >
> > On Thu, Nov 30, 2017 at 3:47 PM, Andrew Purtell 
> > wrote:
> >
> > > Not too late, no
> > >
> > > On Thu, Nov 30, 2017 at 3:31 PM, Stack  wrote:
> > >
> > > > Fix is up if it is not too late Andrew.
> > > > St.Ack
> > > >
> > > > On Thu, Nov 30, 2017 at 1:58 PM, Stack  wrote:
> > > >
> > > > > Andrew, your testing has turned up an issue in HBASE-18233. It is
> > > present
> > > > > in the 1.4 candidate patch and in 1.3. The failure is
> intermittent. I
> > > am
> > > > > working on a fix but want to make sure I have it right. So, I
> > withdraw
> > > my
> > > > > request that 1.4 include it.
> > > > >
> > > > > Thanks,
> > > > > S
> > > > >
> > > > > On Wed, Nov 29, 2017 at 5:14 PM, Andrew Purtell <
> apurt...@apache.org
> > >
> > > > > wrote:
> > > > >
> > > > >> TestGlobalThrottler is a problem stemming from the revert of
> > > HBASE-9465
> > > > >> ​ on branch-1.4. The test came in on HBASE-17314 so I'll also
> revert
> > > > that
> > > > >> from branch-1.4. For more on this see HBASE-19381
> > > > >> ​
> > > > >>
> > > > >> On Wed, Nov 29, 2017 at 5:00 PM, Andrew Purtell <
> > apurt...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >> > The TestEndToEndSplitTransaction failure will be fixed by
> > > HBASE-19379.
> > > > >> >
> > > > >> > The TestGlobalThrottler issue is a hang, which is probably why
> it
> > > > >> slipped
> > > > >> > through the cracks. I went back 32 commits from head and it was
> > > still
> > > > >> > stuck. 64 commits back it's good. Somewhere in between. Will get
> > to
> > > > the
> > > > >> > offending commit shortly.
> > > > >> >
> > > > >> >
> > > > >> > On Wed, Nov 29, 2017 at 3:56 PM, Andrew Purtell <
> > > apurt...@apache.org>
> > > > >> > wrote:
> > > > >> >
> > > > >> >> Thanks. I'll take a look. They were passing for me before I
> went
> > > out
> > > > on
> > > > >> >> vacation.
> > > > >> >>
> > > > >> >>
> > > > >> >> On Wed, Nov 29, 2017 at 3:52 PM, Stack 
> wrote:
> > > > >> >>
> > > > >> >>> Thanks.
> > > > >> >>>
> > > > >> >>> BTW, I noticed this morning that TestGlobalThrottler and
> > > > >> >>> TestEndToEndSplitTransaction
> > > > >> >>> fail locally for me and up on jenkins as part of hadoopqa runs
> > and
> > > > on
> > > > >> >>> recent 1.4 runs.
> > > > >> >>>
> > > > >> >>> I tried to poke at why. They seem fine in 1.2, 1.3, and 2.0.
> Got
> > > > >> >>> distracted
> > > > >> >>> and got no further than this
> > > > >> >>>
> > > > >> >>> S
> > > > >> >>>
> > > > >> >>> On Wed, Nov 29, 2017 at 3:00 PM, Andrew Purtell <
> > > > apurt...@apache.org>
> > > > >> >>> wrote:
> > > > >> >>>
> > > > >> >>> > Ok, no problem.
> > > > >> >>> >
> > > > >> >>> > On Wed, Nov 29, 2017 at 2:59 PM, Stack 
> > > wrote:
> > > > >> >>> >
> > > > >> >>> > > May I get HBASE-18233 into 1.4.0 Andrew? It is in 1.2 and
> > 1.3.
> > > > >> >>> Waiting on
> > > > >> >>> > > hadoopqa run. Would be good to have it all up and down
> > > branch-1.
> > > > >> >>> > > Thanks Sir,
> > > > >> >>> > > St.Ack
> > > > >> >>> > >
> > > > >> >>> > > On Wed, Nov 29, 2017 at 12:38 PM, Peter Somogyi <
> > > > >> >>> psomo...@cloudera.com>
> > > > >> >>> > > wrote:
> > > > >> >>> > >
> > > > >> >>> > > > HBASE-19188 was just resolved. :)
> > > > >> >>> > > >
> > > > >> >>> > > > On Wed, Nov 29, 2017 at 8:12 PM, Andrew Purtell <
> > > > >> >>> apurt...@apache.org>
> > > > >> >>> > > > wrote:
> > > > >> >>> > > >
> > > > >> >>> > > > > I come back to find HBASE-19188 is a blocker. :-/
> > > > >> >>> > > > > Need to resolve it
> > > > >> >>> > > > >
> > > > >> >>> > > > > On Sat, Nov 18, 2017 at 10:30 AM, Sean Busbey <
> > > > >> bus...@apache.org
> > > > >> >>> >
> > > > >> >>> > > wrote:
> > > > >> >>> > > > >
> > > > >> >>> > > > > > thanks for all the work as RM on this Andrew!
> > > > >> >>> > > > > >
> > > > >> >>> > > > > > On Sat, Nov 18, 2017 at 12:19 PM, Andrew Purtell
> > > > >> >>> > > > > >  wrote:
> > > > >> >>> > > > > > > Everything is in and ready to go. I'm out next
> week
> > > for
> > > > >> the
> > > > >> >>> > > > > Thanksgiving
> > > > >> >>> > > > > > > holiday, but will be back first week in December.
> > > > >> >>> > > > > > >
> > > > >> >>> > > > > > > Here is what I anticipate:
> > > > >> >>> > > > > > >
> > > > >> >>> > > > > > >- December 4
> > > > >> >>> > > > > > >   - 1.4.0 RC0 binaries will be available.
> > > > >> >>> > > > > > >   - Voting begins.
> > > > >> >>> > > > > > >   - Preflight checks 

Re: Release 1.4.0 update

2017-11-30 Thread Andrew Purtell
No problem, committing it now

On Thu, Nov 30, 2017 at 4:54 PM, Sergey Soldatov 
wrote:

> Andrew,
>
> Can we include HBASE-19393 as well? Quite annoying issue and very simple
> fix.
>
> Thanks,
> Sergey
>
> On Thu, Nov 30, 2017 at 3:47 PM, Andrew Purtell 
> wrote:
>
> > Not too late, no
> >
> > On Thu, Nov 30, 2017 at 3:31 PM, Stack  wrote:
> >
> > > Fix is up if it is not too late Andrew.
> > > St.Ack
> > >
> > > On Thu, Nov 30, 2017 at 1:58 PM, Stack  wrote:
> > >
> > > > Andrew, your testing has turned up an issue in HBASE-18233. It is
> > present
> > > > in the 1.4 candidate patch and in 1.3. The failure is intermittent. I
> > am
> > > > working on a fix but want to make sure I have it right. So, I
> withdraw
> > my
> > > > request that 1.4 include it.
> > > >
> > > > Thanks,
> > > > S
> > > >
> > > > On Wed, Nov 29, 2017 at 5:14 PM, Andrew Purtell  >
> > > > wrote:
> > > >
> > > >> TestGlobalThrottler is a problem stemming from the revert of
> > HBASE-9465
> > > >> ​ on branch-1.4. The test came in on HBASE-17314 so I'll also revert
> > > that
> > > >> from branch-1.4. For more on this see HBASE-19381
> > > >> ​
> > > >>
> > > >> On Wed, Nov 29, 2017 at 5:00 PM, Andrew Purtell <
> apurt...@apache.org>
> > > >> wrote:
> > > >>
> > > >> > The TestEndToEndSplitTransaction failure will be fixed by
> > HBASE-19379.
> > > >> >
> > > >> > The TestGlobalThrottler issue is a hang, which is probably why it
> > > >> slipped
> > > >> > through the cracks. I went back 32 commits from head and it was
> > still
> > > >> > stuck. 64 commits back it's good. Somewhere in between. Will get
> to
> > > the
> > > >> > offending commit shortly.
> > > >> >
> > > >> >
> > > >> > On Wed, Nov 29, 2017 at 3:56 PM, Andrew Purtell <
> > apurt...@apache.org>
> > > >> > wrote:
> > > >> >
> > > >> >> Thanks. I'll take a look. They were passing for me before I went
> > out
> > > on
> > > >> >> vacation.
> > > >> >>
> > > >> >>
> > > >> >> On Wed, Nov 29, 2017 at 3:52 PM, Stack  wrote:
> > > >> >>
> > > >> >>> Thanks.
> > > >> >>>
> > > >> >>> BTW, I noticed this morning that TestGlobalThrottler and
> > > >> >>> TestEndToEndSplitTransaction
> > > >> >>> fail locally for me and up on jenkins as part of hadoopqa runs
> and
> > > on
> > > >> >>> recent 1.4 runs.
> > > >> >>>
> > > >> >>> I tried to poke at why. They seem fine in 1.2, 1.3, and 2.0. Got
> > > >> >>> distracted
> > > >> >>> and got no further than this
> > > >> >>>
> > > >> >>> S
> > > >> >>>
> > > >> >>> On Wed, Nov 29, 2017 at 3:00 PM, Andrew Purtell <
> > > apurt...@apache.org>
> > > >> >>> wrote:
> > > >> >>>
> > > >> >>> > Ok, no problem.
> > > >> >>> >
> > > >> >>> > On Wed, Nov 29, 2017 at 2:59 PM, Stack 
> > wrote:
> > > >> >>> >
> > > >> >>> > > May I get HBASE-18233 into 1.4.0 Andrew? It is in 1.2 and
> 1.3.
> > > >> >>> Waiting on
> > > >> >>> > > hadoopqa run. Would be good to have it all up and down
> > branch-1.
> > > >> >>> > > Thanks Sir,
> > > >> >>> > > St.Ack
> > > >> >>> > >
> > > >> >>> > > On Wed, Nov 29, 2017 at 12:38 PM, Peter Somogyi <
> > > >> >>> psomo...@cloudera.com>
> > > >> >>> > > wrote:
> > > >> >>> > >
> > > >> >>> > > > HBASE-19188 was just resolved. :)
> > > >> >>> > > >
> > > >> >>> > > > On Wed, Nov 29, 2017 at 8:12 PM, Andrew Purtell <
> > > >> >>> apurt...@apache.org>
> > > >> >>> > > > wrote:
> > > >> >>> > > >
> > > >> >>> > > > > I come back to find HBASE-19188 is a blocker. :-/
> > > >> >>> > > > > Need to resolve it
> > > >> >>> > > > >
> > > >> >>> > > > > On Sat, Nov 18, 2017 at 10:30 AM, Sean Busbey <
> > > >> bus...@apache.org
> > > >> >>> >
> > > >> >>> > > wrote:
> > > >> >>> > > > >
> > > >> >>> > > > > > thanks for all the work as RM on this Andrew!
> > > >> >>> > > > > >
> > > >> >>> > > > > > On Sat, Nov 18, 2017 at 12:19 PM, Andrew Purtell
> > > >> >>> > > > > >  wrote:
> > > >> >>> > > > > > > Everything is in and ready to go. I'm out next week
> > for
> > > >> the
> > > >> >>> > > > > Thanksgiving
> > > >> >>> > > > > > > holiday, but will be back first week in December.
> > > >> >>> > > > > > >
> > > >> >>> > > > > > > Here is what I anticipate:
> > > >> >>> > > > > > >
> > > >> >>> > > > > > >- December 4
> > > >> >>> > > > > > >   - 1.4.0 RC0 binaries will be available.
> > > >> >>> > > > > > >   - Voting begins.
> > > >> >>> > > > > > >   - Preflight checks will include RAT check,
> > release
> > > >> >>> audits,
> > > >> >>> > > and
> > > >> >>> > > > 25
> > > >> >>> > > > > > >   iterations of the unit test suite.
> > > >> >>> > > > > > >- December 5 - 8
> > > >> >>> > > > > > >   - 24 hours ITBLL
> > > >> >>> > > > > > >   - PE and YCSB on cluster perf comparison with
> > 1.2
> > > >> >>> > > > > > >   - PE and YCSB single server profiling with
> JFR,
> > > >> >>> comparison
> > > >> >>> > > with
> > > >> 

[jira] [Created] (HBASE-19393) HTTP 413 FULL head while accessing HBase UI using SSL.

2017-11-30 Thread Sergey Soldatov (JIRA)
Sergey Soldatov created HBASE-19393:
---

 Summary: HTTP 413 FULL head while accessing HBase UI using SSL. 
 Key: HBASE-19393
 URL: https://issues.apache.org/jira/browse/HBASE-19393
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 1.4.0
 Environment: SSL enabled for UI/REST. 
Reporter: Sergey Soldatov
Assignee: Sergey Soldatov
 Fix For: 1.4.0


For REST/UI we use a 64 KB header buffer size instead of the Jetty default 
(6 KB?). However, we set it only for the _http_ protocol and not for 
_https_, so with SSL enabled it is quite easy to hit an HTTP 413 error. This 
does not affect branch-2 or master, where it was fixed by HBASE-12894.
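
To illustrate the failure mode described above, here is a minimal Python sketch. It is not HBase or Jetty code: the `Connector` class and `handle_request` function are hypothetical, and the 6 KB default is an assumption (the report itself marks that value with a question mark). It models two connectors with separate header buffer limits and returns 413 when the request headers do not fit.

```python
from dataclasses import dataclass

KB = 1024


@dataclass
class Connector:
    """Hypothetical stand-in for a web server connector."""
    scheme: str
    header_buffer_size: int = 6 * KB  # assumed default, not confirmed by the report


def handle_request(connector: Connector, headers: dict) -> int:
    """Return 413 if the serialized headers overflow the buffer, else 200."""
    # Each header is counted as "name: value\r\n".
    size = sum(len(name) + len(value) + 4 for name, value in headers.items())
    return 413 if size > connector.header_buffer_size else 200


# The bug: only the http connector gets the larger buffer.
http = Connector("http", header_buffer_size=64 * KB)
https = Connector("https")  # left at the assumed default

# A large header, e.g. a long authentication token (illustrative size).
big_headers = {"Authorization": "Negotiate " + "x" * (16 * KB)}

print(handle_request(http, big_headers))   # 200
print(handle_request(https, big_headers))  # 413

# The fix: apply the same buffer size to the https connector.
https_fixed = Connector("https", header_buffer_size=64 * KB)
print(handle_request(https_fixed, big_headers))  # 200
```

The point of the sketch is only that the buffer size is a per-connector setting, so configuring one connector leaves the other at its default.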





Re: Release 1.4.0 update

2017-11-30 Thread Stack
Fix is up if it is not too late Andrew.
St.Ack

On Thu, Nov 30, 2017 at 1:58 PM, Stack  wrote:

> Andrew, your testing has turned up an issue in HBASE-18233. It is present
> in the 1.4 candidate patch and in 1.3. The failure is intermittent. I am
> working on a fix but want to make sure I have it right. So, I withdraw my
> request that 1.4 include it.
>
> Thanks,
> S
>
> On Wed, Nov 29, 2017 at 5:14 PM, Andrew Purtell 
> wrote:
>
>> TestGlobalThrottler is a problem stemming from the revert of HBASE-9465
>> on branch-1.4. The test came in on HBASE-17314 so I'll also revert that
>> from branch-1.4. For more on this see HBASE-19381
>>
>>
>> On Wed, Nov 29, 2017 at 5:00 PM, Andrew Purtell 
>> wrote:
>>
>> > The TestEndToEndSplitTransaction failure will be fixed by HBASE-19379.
>> >
>> > The TestGlobalThrottler issue is a hang, which is probably why it
>> slipped
>> > through the cracks. I went back 32 commits from head and it was still
>> > stuck. 64 commits back it's good. Somewhere in between. Will get to the
>> > offending commit shortly.
>> >
>> >
>> > On Wed, Nov 29, 2017 at 3:56 PM, Andrew Purtell 
>> > wrote:
>> >
>> >> Thanks. I'll take a look. They were passing for me before I went out on
>> >> vacation.
>> >>
>> >>
>> >> On Wed, Nov 29, 2017 at 3:52 PM, Stack  wrote:
>> >>
>> >>> Thanks.
>> >>>
>> >>> BTW, I noticed this morning that TestGlobalThrottler and
>> >>> TestEndToEndSplitTransaction
>> >>> fail locally for me and up on jenkins as part of hadoopqa runs and on
>> >>> recent 1.4 runs.
>> >>>
>> >>> I tried to poke at why. They seem fine in 1.2, 1.3, and 2.0. Got
>> >>> distracted
>> >>> and got no further than this
>> >>>
>> >>> S
>> >>>
>> >>> On Wed, Nov 29, 2017 at 3:00 PM, Andrew Purtell 
>> >>> wrote:
>> >>>
>> >>> > Ok, no problem.
>> >>> >
>> >>> > On Wed, Nov 29, 2017 at 2:59 PM, Stack  wrote:
>> >>> >
>> >>> > > May I get HBASE-18233 into 1.4.0 Andrew? It is in 1.2 and 1.3.
>> >>> Waiting on
>> >>> > > hadoopqa run. Would be good to have it all up and down branch-1.
>> >>> > > Thanks Sir,
>> >>> > > St.Ack
>> >>> > >
>> >>> > > On Wed, Nov 29, 2017 at 12:38 PM, Peter Somogyi <
>> >>> psomo...@cloudera.com>
>> >>> > > wrote:
>> >>> > >
>> >>> > > > HBASE-19188 was just resolved. :)
>> >>> > > >
>> >>> > > > On Wed, Nov 29, 2017 at 8:12 PM, Andrew Purtell <
>> >>> apurt...@apache.org>
>> >>> > > > wrote:
>> >>> > > >
>> >>> > > > > I come back to find HBASE-19188 is a blocker. :-/
>> >>> > > > > Need to resolve it
>> >>> > > > >
>> >>> > > > > On Sat, Nov 18, 2017 at 10:30 AM, Sean Busbey <
>> bus...@apache.org
>> >>> >
>> >>> > > wrote:
>> >>> > > > >
>> >>> > > > > > thanks for all the work as RM on this Andrew!
>> >>> > > > > >
>> >>> > > > > > On Sat, Nov 18, 2017 at 12:19 PM, Andrew Purtell
>> >>> > > > > >  wrote:
>> >>> > > > > > > Everything is in and ready to go. I'm out next week for
>> the
>> >>> > > > > Thanksgiving
>> >>> > > > > > > holiday, but will be back first week in December.
>> >>> > > > > > >
>> >>> > > > > > > Here is what I anticipate:
>> >>> > > > > > >
>> >>> > > > > > >- December 4
>> >>> > > > > > >   - 1.4.0 RC0 binaries will be available.
>> >>> > > > > > >   - Voting begins.
>> >>> > > > > > >   - Preflight checks will include RAT check, release
>> >>> audits,
>> >>> > > and
>> >>> > > > 25
>> >>> > > > > > >   iterations of the unit test suite.
>> >>> > > > > > >- December 5 - 8
>> >>> > > > > > >   - 24 hours ITBLL
>> >>> > > > > > >   - PE and YCSB on cluster perf comparison with 1.2
>> >>> > > > > > >   - PE and YCSB single server profiling with JFR,
>> >>> comparison
>> >>> > > with
>> >>> > > > > 1.2
>> >>> > > > > > >- December 11
>> >>> > > > > > >   - Voting concludes
>> >>> > > > > > >   - Release, or RC1 depending on testing outcome
>> >>> > > > > > >   - December 18
>> >>> > > > > > >   - RC1 voting concludes and release, if we need a RC1
>> >>> > > > > > >
>> >>> > > > > > >
>> >>> > > > > > > From now until the 1.4.0 release, please refrain from
>> >>> committing
>> >>> > > > > > > potentially destabilizing changes or changes to public
>> APIs
>> >>> to
>> >>> > > > > > branch-1.4.
>> >>> > > > > > >
>> >>> > > > > > > On Mon, Nov 13, 2017 at 10:33 AM, Andrew Purtell <
>> >>> > > > > > andrew.purt...@gmail.com>
>> >>> > > > > > > wrote:
>> >>> > > > > > >
>> >>> > > > > > >> On HBASE-19232 we discuss testing the shaded client using
>> >>> YCSB,
>> >>> > so
>> >>> > > > > I'll
>> >>> > > > > > >> use it to sanity check the shaded client as well as
>> >>> complete a
>> >>> > > perf
>> >>> > > > > > >> comparison with 1.2.
>> >>> > > > > > >>
>> >>> > > > > > >>
>> >>> > > > > > >>
>> >>> > > > > > >> On Sat, Nov 11, 2017 at 9:53 AM, Andrew Purtell <
>> >>> > > > > > andrew.purt...@gmail.com>
>> >>> > > > > > >> wrote:

[jira] [Created] (HBASE-19392) TestReplicaWithCluster#testReplicaGetWithPrimaryAndMetaDown failure in master

2017-11-30 Thread huaxiang sun (JIRA)
huaxiang sun created HBASE-19392:


 Summary: 
TestReplicaWithCluster#testReplicaGetWithPrimaryAndMetaDown failure in master
 Key: HBASE-19392
 URL: https://issues.apache.org/jira/browse/HBASE-19392
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 2.0.0-alpha-4, 3.0.0
Reporter: huaxiang sun
Assignee: huaxiang sun
Priority: Minor


Please see the flaky test list.

https://builds.apache.org/job/HBASE-Find-Flaky-Tests/lastSuccessfulBuild/artifact/dashboard.html

client.TestReplicaWithCluster   96.7% (29 / 30) 29 / 0 / 0  





[jira] [Created] (HBASE-19391) Calling HRegion#initializeRegionInternals from a region replica can still re-create a region directory

2017-11-30 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19391:
-

 Summary: Calling HRegion#initializeRegionInternals from a region 
replica can still re-create a region directory
 Key: HBASE-19391
 URL: https://issues.apache.org/jira/browse/HBASE-19391
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez


This is a follow-up to HBASE-18024. There is still a chance that attempting to 
open a region that is not the default region replica can re-create a region 
directory that the CatalogJanitor has already GC'd, causing inconsistencies 
with hbck.





[jira] [Created] (HBASE-19390) Revert to older version of Jetty 9.3

2017-11-30 Thread Esteban Gutierrez (JIRA)
Esteban Gutierrez created HBASE-19390:
-

 Summary: Revert to older version of Jetty 9.3 
 Key: HBASE-19390
 URL: https://issues.apache.org/jira/browse/HBASE-19390
 Project: HBase
  Issue Type: Bug
Reporter: Esteban Gutierrez


As discussed in HBASE-19256, we will have to temporarily revert to Jetty 9.3 
due to existing issues with 9.4 and Hadoop 3. Once HBASE-19256 is resolved we 
can return to 9.4.





[jira] [Created] (HBASE-19389) RS's handlers are all busy when writing many columns (more than 1000 columns)

2017-11-30 Thread Chance Li (JIRA)
Chance Li created HBASE-19389:
-

 Summary: RS's handlers are all busy when writing many columns 
(more than 1000 columns) 
 Key: HBASE-19389
 URL: https://issues.apache.org/jira/browse/HBASE-19389
 Project: HBase
  Issue Type: Improvement
  Components: hbase
Affects Versions: 2.0.0
 Environment: 2000+ Region Servers
PCI-E ssd
Reporter: Chance Li
Assignee: Chance Li
Priority: Minor
 Fix For: 2.0.0, 3.0.0


In a large cluster with a large number of clients, we found that the RS's 
handlers are sometimes all busy. After investigation we found that the root 
cause is related to the CSLM (ConcurrentSkipListMap), such as heavy load in 
the compare function. We reviewed the related WALs and found that many columns 
(more than 1000) were being written at that time.
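
As a rough illustration of why wide rows can stress a sorted in-memory structure such as the CSLM, the following Python sketch counts comparator calls while inserting the cells of one row. It is not HBase code: the `Cell` class and `write_row` helper are hypothetical, and a sorted list with binary search stands in for the skip list, on the assumption that both need roughly log2(n) comparisons per insert.

```python
import bisect


class Cell:
    """Hypothetical cell key ordered by (row, qualifier); counts comparator calls."""
    comparisons = 0

    def __init__(self, row, qualifier):
        self.key = (row, qualifier)

    def __lt__(self, other):
        Cell.comparisons += 1
        return self.key < other.key


def write_row(store, row, n_columns):
    """Insert n_columns cells for one row; return how many comparisons it took."""
    before = Cell.comparisons
    for i in range(n_columns):
        # bisect.insort keeps the list sorted, calling Cell.__lt__ as it searches.
        bisect.insort(store, Cell(row, f"q{i:05d}"))
    return Cell.comparisons - before


narrow = write_row([], "row-1", 10)
wide = write_row([], "row-1", 1000)
print(f"10 columns: {narrow} comparisons")
print(f"1000 columns: {wide} comparisons")
```

Under these assumptions a single 1000-column write performs thousands of comparisons, and each one has to get past the shared row key before the qualifiers differ. That is the kind of per-request cost that could keep handlers busy, though the sketch does not reproduce the reported cluster behavior.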


