date:20180925

[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18451:

Attachment: HBASE-18451.branch-1.002.patch

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, 
> HBASE-18451.master.003.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours that impacts many tables (with 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and after one hour we want to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread the flushes out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check whether a request is already in the queue, 
> 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also defeats the 
> purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion) r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         // Throttle the flushes by putting a delay. If we don't throttle, and there
>         // is a balanced write-load on the regions in a table, we might end up
>         // overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 35390ms
> 2017-07-24 18:4
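
The effect described in the issue (each chore run adds another request, so the flush fires at the earliest of all the random deadlines) can be checked with a small simulation. This is a minimal sketch, not HBase code: RepeatedDelaySimulation is a hypothetical name, the 10-second chore period and 5-minute RANGE_OF_DELAY come from the description above, and MIN_DELAY_TIME is assumed to be 0 because its value is not given in this thread.

```java
import java.util.Random;

class RepeatedDelaySimulation {
  // From the description: RANGE_OF_DELAY is 5 minutes and the chore runs every 10 seconds.
  static final long RANGE_OF_DELAY_MS = 5 * 60 * 1000;
  static final long CHORE_PERIOD_MS = 10 * 1000;
  // Assumption: MIN_DELAY_TIME is not stated in the thread, so 0 is used here.
  static final long MIN_DELAY_TIME_MS = 0;

  /**
   * Without deduplication, every chore run that happens before the flush fires
   * enqueues one more request, so the flush happens at the earliest deadline.
   */
  static long effectiveFlushTimeMs(Random rnd) {
    long earliest = Long.MAX_VALUE;
    for (long now = 0; now < earliest; now += CHORE_PERIOD_MS) {
      long delay = rnd.nextInt((int) RANGE_OF_DELAY_MS) + MIN_DELAY_TIME_MS;
      earliest = Math.min(earliest, now + delay);
    }
    return earliest;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    int trials = 10_000;
    long total = 0;
    for (int i = 0; i < trials; i++) {
      total += effectiveFlushTimeMs(rnd);
    }
    System.out.printf("mean flush time, repeated requests: %.1f s%n",
        total / (double) trials / 1000);
    System.out.printf("mean flush time, single request (approx.): %.1f s%n",
        (RANGE_OF_DELAY_MS / 2.0 + MIN_DELAY_TIME_MS) / 1000);
  }
}
```

Under these assumptions the mean effective delay with repeated requests lands well below the roughly 150 s mean of a single uniform delay, which is consistent with the "within the first minute" observation in the description.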

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626938#comment-16626938
 ] 

Xu Cang commented on HBASE-18451:
-

Uploaded another set of patches to address Mike's review comments.

 

Also ran the previously failing (branch-1) unit test locally, and it now passes:

[INFO] Running org.apache.hadoop.hbase.util.TestHBaseFsck
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 331.094 s - in org.apache.hadoop.hbase.util.TestHBaseFsck
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1
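
The patches themselves are not reproduced in this thread. As a rough illustration of the idea in the issue title (check whether a region already has a pending delayed flush before queueing another one), here is a minimal sketch built on java.util.concurrent.DelayQueue. DedupingFlushQueue, FlushEntry, requestDelayedFlush and pollDueFlush are hypothetical names with simplified signatures; this is not HBase's MemStoreFlusher or FlushRequester code, and regions are identified by plain strings.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DedupingFlushQueue {
  /** One delayed flush request for one region (hypothetical). */
  static final class FlushEntry implements Delayed {
    final String regionName;
    final long deadlineNanos;

    FlushEntry(String regionName, long delayMs) {
      this.regionName = regionName;
      this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMs);
    }

    @Override
    public long getDelay(TimeUnit unit) {
      return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
      return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
  }

  private final DelayQueue<FlushEntry> queue = new DelayQueue<>();
  // Regions that currently have a request sitting in the queue.
  private final Set<String> pending = ConcurrentHashMap.newKeySet();

  /** Queues a flush unless one is already pending; returns false if skipped. */
  boolean requestDelayedFlush(String regionName, long delayMs) {
    if (!pending.add(regionName)) {
      return false; // keep the existing request and its original random delay
    }
    queue.add(new FlushEntry(regionName, delayMs));
    return true;
  }

  /** Non-blocking: returns the next region whose delay has expired, or null. */
  String pollDueFlush() {
    FlushEntry e = queue.poll();
    if (e == null) {
      return null;
    }
    pending.remove(e.regionName);
    return e.regionName;
  }

  int size() {
    return queue.size();
  }

  public static void main(String[] args) {
    DedupingFlushQueue q = new DedupingFlushQueue();
    System.out.println(q.requestDelayedFlush("region-a", 270_785)); // true
    System.out.println(q.requestDelayedFlush("region-a", 35_390));  // false
    System.out.println(q.size());                                   // 1
  }
}
```

Keeping the first request's delay, rather than adding a second one, is what preserves the random spread: a new request for the same region can only be queued once the earlier one has been taken off the queue.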


[jira] [Comment Edited] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16626938#comment-16626938
 ] 

Xu Cang edited comment on HBASE-18451 at 9/25/18 7:47 AM:
--

Uploaded another set of patches to address Mike's review comments.

 

Also ran the previously failing (branch-1) unit test locally, and it now passes:

[INFO] Running org.apache.hadoop.hbase.util.TestHBaseFsck
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 331.094 s - in org.apache.hadoop.hbase.util.TestHBaseFsck
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1


was (Author: xucang):
Uploaded another set of pathces to address Mike's review comments.

 

Also ran previous (branch-1) failed unit test locally and it is passing as 
below:

[INFO] Running org.apache.hadoop.hbase.util.TestHBaseFsck
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
331.094 s - in org.apache.hadoop.hbase.util.TestHBaseFsck
[INFO]
[INFO] Results:
[INFO]
[WARNING] Tests run: 59, Failures: 0, Errors: 0, Skipped: 1


[jira] [Commented] (HBASE-21224) Handle compaction queue duplication

2018-09-25 Thread Xu Cang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627008#comment-16627008
 ] 

Xu Cang commented on HBASE-21224:
-

[~allan163] Thanks for the suggestion and tips. Sounds like a good starting 
point. I will try some experimental changes and see how much they help.

> Handle compaction queue duplication
> ---
>
> Key: HBASE-21224
> URL: https://issues.apache.org/jira/browse/HBASE-21224
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Xu Cang
>Priority: Minor
>
> [~allan163] mentioned in https://issues.apache.org/jira/browse/HBASE-18451 
> that we may want to handle compaction queue duplication. 
> Creating this item for further assessment and discussion.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Assigned] (HBASE-21224) Handle compaction queue duplication

2018-09-25 Thread Xu Cang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang reassigned HBASE-21224:
---

Assignee: Xu Cang

> Handle compaction queue duplication
> ---
>
> Key: HBASE-21224
> URL: https://issues.apache.org/jira/browse/HBASE-21224
> Project: HBase
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Xu Cang
>Assignee: Xu Cang
>Priority: Minor
>
> [~allan163] mentioned in https://issues.apache.org/jira/browse/HBASE-18451 
> that we may want to handle compaction queue duplication. 
> Creating this item for further assessment and discussion.




[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627095#comment-16627095
 ] 

Duo Zhang commented on HBASE-21217:
---

Let me commit.

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, in order not to fail all the open region requests when 
> only one of them fails, we catch the exception and set a flag in the 
> return value. But for the executeProcedures call, the return value is 
> ignored, and we expect the openRegion method to always call 
> reportRegionStateTransition to report the failure, but in fact it does not.
> And after HBASE-20881, we can confirm that a race can happen, where we 
> send a close request to a region that is opening (HBASE-21199), and vice 
> versa. So I think we need to revisit the implementation of 
> executeProcedures to make it more stable.
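
The failure mode described above (a per-region failure recorded only in a return value that the executeProcedures path ignores) can be illustrated with a minimal sketch. Everything in it is hypothetical: RegionOpener and TransitionReporter stand in for the region-opening logic and for reporting a state transition to the master. The point is only that each failure is reported explicitly instead of being flagged in a return value; this is not the change that was committed for HBASE-21217.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReportOnFailureSketch {
  /** Hypothetical stand-in for opening a single region on the region server. */
  interface RegionOpener {
    void open(String region) throws Exception;
  }

  /** Hypothetical stand-in for telling the master that an open failed. */
  interface TransitionReporter {
    void reportFailedOpen(String region, Exception cause);
  }

  /**
   * Opens each region independently so one failure does not fail the rest,
   * and reports every failure through the reporter, so the master hears about
   * it even if the caller ignores the returned list.
   */
  static List<String> openAll(List<String> regions, RegionOpener opener,
      TransitionReporter reporter) {
    List<String> failed = new ArrayList<>();
    for (String region : regions) {
      try {
        opener.open(region);
      } catch (Exception e) {
        failed.add(region);
        reporter.reportFailedOpen(region, e);
      }
    }
    return failed;
  }

  public static void main(String[] args) {
    List<String> reported = new ArrayList<>();
    List<String> failed = openAll(
        Arrays.asList("r1", "r2", "r3"),
        region -> {
          if (region.equals("r2")) {
            throw new IllegalStateException("simulated open failure");
          }
        },
        (region, cause) -> reported.add(region));
    System.out.println(failed);   // [r2]
    System.out.println(reported); // [r2]
  }
}
```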




[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627117#comment-16627117
 ] 

Hadoop QA commented on HBASE-18451:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 59s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 31s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 31s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 53s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 16s{color} | {color:red} hbase-server: The patch generated 1 new + 145 unchanged - 0 fixed = 146 total (was 145) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 24s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 11m 12s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 15s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}134m 44s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 35s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}178m 48s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestDisableTableProcedure |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941177/HBASE-18451.master.003.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux dedff5b5a286 4.4.0-134-generic #160~14.04.1-Ubuntu SMP Fri Aug 17 11:07:07 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / c686b535c2 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14492/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14492/artifact/patchprocess/patch-unit-hbase

[jira] [Updated] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Duo Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21217:
--
  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Pushed to master and branch-2. Thanks [~stack] and [~allan163] for reviewing.


[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627130#comment-16627130
 ] 

Hadoop QA commented on HBASE-18451:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 14m 51s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 49s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 34s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 39s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 27s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 46s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 38s{color} | {color:red} hbase-server-jdk1.7.0_191 with JDK v1.7.0_191 generated 2 new + 4 unchanged - 2 fixed = 6 total (was 6) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 23s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 42s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  1m 39s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}109m 19s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 28s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}149m 11s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941182/HBASE-18451.branch-1.002.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux d14dd42fe65e 3.13.0-153-generic #203-Ubuntu SMP Thu Jun 14 08:52:

[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627167#comment-16627167
 ] 

Allan Yang commented on HBASE-21217:


{quote}
There is no FAILED_CLOSE state, so we cannot tell the master that the closing 
failed. And I think the only possible way to hit the null region is this: we 
have already sent the request to the RS, and the RS has finished the closing, 
but a retried RPC call has already been sent, and it is finally scheduled after 
we finish everything; then we will hit a null region here. For this case I think 
it is fine to just ignore it? Not sure if there are other possible ways to 
enter here; if so, I think there will be bugs...
{quote}
Quoted from the review board.
Yes, I have encountered this in ITBLL; that's why I want to switch back to 
CompatRemoteProcedureResolver in branch-2.0 and branch-2.1. But it is 
definitely another bug that needs to be tracked down and fixed.
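
The case in the quote, where a retried close RPC arrives after the region has already been closed, amounts to treating a close of a region that is no longer online as a no-op. A minimal sketch under that assumption: IdempotentCloseSketch, markOnline and closeRegion are hypothetical names, and a plain map stands in for the region server's online regions. It logs a warning instead of failing silently, so that any other code path reaching this branch (the concern raised above) stays visible.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Logger;

class IdempotentCloseSketch {
  private static final Logger LOG = Logger.getLogger(IdempotentCloseSketch.class.getName());

  enum CloseResult { CLOSED, NOT_ONLINE }

  // Hypothetical stand-in for the region server's map of online regions.
  private final Map<String, Object> onlineRegions = new ConcurrentHashMap<>();

  void markOnline(String regionName) {
    onlineRegions.put(regionName, new Object());
  }

  /**
   * A late retry of a close that already completed finds no region; treat that
   * as "already closed" rather than as an error, but log it.
   */
  CloseResult closeRegion(String regionName) {
    Object region = onlineRegions.remove(regionName);
    if (region == null) {
      LOG.warning("Close requested for " + regionName + ", which is not online; ignoring");
      return CloseResult.NOT_ONLINE;
    }
    // A real implementation would flush and release the region's resources here.
    return CloseResult.CLOSED;
  }

  public static void main(String[] args) {
    IdempotentCloseSketch rs = new IdempotentCloseSketch();
    rs.markOnline("region-a");
    System.out.println(rs.closeRegion("region-a")); // CLOSED
    System.out.println(rs.closeRegion("region-a")); // NOT_ONLINE, e.g. a retried RPC
  }
}
```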


[jira] [Commented] (HBASE-21208) Bytes#toShort doesn't work without unsafe

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627174#comment-16627174
 ] 

Hudson commented on HBASE-21208:


Results for branch branch-2.0
[build #861 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/861/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/861//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/861//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.0/861//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


> Bytes#toShort doesn't work without unsafe
> -
>
> Key: HBASE-21208
> URL: https://issues.apache.org/jira/browse/HBASE-21208
> Project: HBase
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Critical
> Fix For: 2.2.0, 2.1.1, 2.0.3
>
> Attachments: HBASE-21208.v0.patch, HBASE-21208.v1.patch, 
> HBASE-21208.v2.patch
>
>
> It seems we put the brackets in the wrong place.
> {code}
>   short n = 0;
>   n = (short) ((n ^ bytes[offset]) & 0xFF);
>   n = (short) (n << 8);
>   n = (short) ((n ^ bytes[offset+1]) & 0xFF);   // this one
>   return n;
> {code}
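To see why the marked line is wrong: `(n ^ bytes[offset+1]) & 0xFF` masks the whole 16-bit value, discarding the high byte that was just shifted into place. The sketch below puts the buggy and a corrected version side by side; masking only the second byte (to undo sign extension) before XOR-ing gives the expected big-endian result, which can be checked against `java.nio.ByteBuffer#getShort`. The class name is illustrative, and this is not necessarily the exact committed patch.

```java
import java.nio.ByteBuffer;

final class ToShortSketch {
  private ToShortSketch() {}

  // Reproduces the reported code: the final "& 0xFF" also clears the high byte.
  static short toShortBuggy(byte[] bytes, int offset) {
    short n = 0;
    n = (short) ((n ^ bytes[offset]) & 0xFF);
    n = (short) (n << 8);
    n = (short) ((n ^ bytes[offset + 1]) & 0xFF); // high byte is lost here
    return n;
  }

  // Corrected placement: mask only the second byte, then XOR it into the low 8 bits.
  static short toShortFixed(byte[] bytes, int offset) {
    short n = 0;
    n = (short) ((n ^ bytes[offset]) & 0xFF);
    n = (short) (n << 8);
    n = (short) (n ^ (bytes[offset + 1] & 0xFF));
    return n;
  }

  // Reference implementation: ByteBuffer reads big-endian by default.
  static short toShortReference(byte[] bytes, int offset) {
    return ByteBuffer.wrap(bytes, offset, 2).getShort();
  }
}
```

For the input {0x12, 0x34}, the buggy version returns 0x34 while the corrected one returns 0x1234.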




[jira] [Commented] (HBASE-21208) Bytes#toShort doesn't work without unsafe

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627198#comment-16627198
 ] 

Hudson commented on HBASE-21208:


Results for branch branch-2
[build #1298 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1298/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1298//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1298//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1298//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}



[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627269#comment-16627269
 ] 

Duo Zhang commented on HBASE-21217:
---

Yes, when calling closeRegion we could throw a NotServingRegionException, so 
the master would know that the close is useless. But we need to find out why the 
master issues the useless close region request in the first place...
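A minimal sketch of the idea in the comment above, with hypothetical names: the RS-side close throws when the region is not online, so the caller (the master) learns that the close was a no-op instead of receiving a silent success. `NotServingRegionStub` stands in for HBase's `NotServingRegionException` only to keep the example self-contained; `CloseRegionSketch` is illustrative, not the real RS code.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for org.apache.hadoop.hbase.NotServingRegionException (hypothetical, self-contained).
final class NotServingRegionStub extends Exception {
  NotServingRegionStub(String region) {
    super("Region is not online: " + region);
  }
}

final class CloseRegionSketch {
  private final Set<String> onlineRegions = ConcurrentHashMap.newKeySet();

  void markOnline(String region) {
    onlineRegions.add(region);
  }

  // Closing a region this server does not hold is reported back as an error
  // rather than silently succeeding, so the master can tell the close was useless.
  void closeRegion(String region) throws NotServingRegionStub {
    if (!onlineRegions.remove(region)) {
      throw new NotServingRegionStub(region);
    }
    // Flushing and actually closing the region are omitted in this sketch.
  }
}
```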


[jira] [Created] (HBASE-21226) Revisit the close region related code at RS side

2018-09-25 Thread Duo Zhang (JIRA)

Duo Zhang created HBASE-21226:
-

 Summary: Revisit the close region related code at RS side
 Key: HBASE-21226
 URL: https://issues.apache.org/jira/browse/HBASE-21226
 Project: HBase
  Issue Type: Sub-task
Reporter: Duo Zhang


We use the closeRegion method to close a region, and (before HBASE-21217) it 
schedules a CloseRegionHandler. The problem is that the CloseRegionHandler and 
the closeRegion method are mainly designed to be called by the master, but when 
shutting down an RS we also call the closeRegion method to close all the regions 
on that RS.

In HBASE-21217, we changed to using UnassignRegionHandler to close a region when 
the request comes from the master, so here we need to handle the close region 
requests issued when shutting down the RS.




[jira] [Created] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread Duo Zhang (JIRA)

Duo Zhang created HBASE-21227:
-

 Summary: Implement exponential retrying backoff for 
Assign/UnassignRegionHandler introduced in HBASE-21217
 Key: HBASE-21227
 URL: https://issues.apache.org/jira/browse/HBASE-21227
 Project: HBase
  Issue Type: Sub-task
  Components: amv2, regionserver
Reporter: Duo Zhang
 Fix For: 3.0.0, 2.2.0
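The issue has no description. As a rough illustration of what "exponential retrying backoff" usually means, here is a sketch in which the delay doubles on each failed attempt and is capped at a maximum. `ExponentialBackoff`, `baseMs` and `maxMs` are hypothetical names, not taken from the patch, and a real implementation would likely also add random jitter.

```java
final class ExponentialBackoff {
  private final long baseMs;
  private final long maxMs;

  ExponentialBackoff(long baseMs, long maxMs) {
    // The upper bound on maxMs guarantees the doubling below cannot overflow.
    if (baseMs <= 0 || maxMs < baseMs || maxMs > Long.MAX_VALUE / 2) {
      throw new IllegalArgumentException("invalid backoff bounds");
    }
    this.baseMs = baseMs;
    this.maxMs = maxMs;
  }

  // Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
  long delayMs(int attempt) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    long delay = baseMs;
    // Stop doubling once the cap is reached, so large attempt counts stay cheap and safe.
    for (int i = 0; i < attempt && delay < maxMs; i++) {
      delay *= 2;
    }
    return Math.min(delay, maxMs);
  }
}
```

With a 100 ms base and a 10 s cap, the delays run 100, 200, 400, ... ms and stay at 10000 ms from the eighth retry onward.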







[jira] [Commented] (HBASE-21208) Bytes#toShort doesn't work without unsafe

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627281#comment-16627281
 ] 

Hudson commented on HBASE-21208:


Results for branch master
[build #509 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/509/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/509//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/509//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/509//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}



[jira] [Updated] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread Duo Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21227:
--
Assignee: Duo Zhang
  Status: Patch Available  (was: Open)

> Implement exponential retrying backoff for Assign/UnassignRegionHandler 
> introduced in HBASE-21217
> -
>
> Key: HBASE-21227
> URL: https://issues.apache.org/jira/browse/HBASE-21227
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, regionserver
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21227.patch
>
>





[jira] [Updated] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread Duo Zhang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Duo Zhang updated HBASE-21227:
--
Attachment: HBASE-21227.patch

> Implement exponential retrying backoff for Assign/UnassignRegionHandler 
> introduced in HBASE-21217
> -
>
> Key: HBASE-21227
> URL: https://issues.apache.org/jira/browse/HBASE-21227
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, regionserver
>Reporter: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21227.patch
>
>





[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627437#comment-16627437
 ] 

stack commented on HBASE-21213:
---

[~allan163] Thanks for bringing this up.

I suppose we need to declare that hbck2 only works for 2.1.1 onward. 2.1.1 is 
when the HbckService shows up (I should verify that this stuff showing up on a 
minor version does not break upgrades). I need to add a version check to hbck2, 
one we can run to check that the remote cluster has the new facilities as we add 
them to hbck.

How does this sound [~allan163]/[~Apache9]?

(Can't wait till 2.2.x because 2.2.x has the awkward upgrade... that is my 
thinking at least.)
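The version check mentioned above could be as simple as comparing the remote cluster's version string against the minimum version that provides the HbckService. A sketch, assuming plain dotted versions with an optional suffix such as `-SNAPSHOT`; `VersionCheck.isAtLeast` is a hypothetical helper, not an existing hbck2 API.

```java
final class VersionCheck {
  private VersionCheck() {}

  // Parses "2.1.1" or "2.2.0-SNAPSHOT" into numeric components, dropping any "-suffix".
  static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    String[] parts = core.split("\\.");
    int[] nums = new int[parts.length];
    for (int i = 0; i < parts.length; i++) {
      nums[i] = Integer.parseInt(parts[i]);
    }
    return nums;
  }

  // True if `actual` >= `minimum`, compared component by component;
  // missing trailing components count as 0 (so "2.1" == "2.1.0").
  static boolean isAtLeast(String actual, String minimum) {
    int[] a = parse(actual);
    int[] m = parse(minimum);
    int n = Math.max(a.length, m.length);
    for (int i = 0; i < n; i++) {
      int x = i < a.length ? a[i] : 0;
      int y = i < m.length ? m[i] : 0;
      if (x != y) {
        return x > y;
      }
    }
    return true;
  }
}
```

Note that this sketch treats "2.1.1-SNAPSHOT" as equal to "2.1.1"; whether a snapshot should pass the check is a policy choice.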

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch
>
>
> This is a follow-on from HBASE-21083, which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the UnassignProcedure still exists 
> in RegionStateNodes.
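The log above shows the AssignProcedure (pid=100450) being rejected because the region's state node still names the bypassed UnassignProcedure (owner=pid=100449) as its owner. A minimal sketch of the missing cleanup, with hypothetical names (`RegionStateNodeSketch`, `tryAttach`, `detach`), not the real RegionStateNode API: whatever finishes a procedure, including a bypass, must clear the owner so that a later procedure can take the region.

```java
final class RegionStateNodeSketch {
  private static final long NO_OWNER = -1L;

  // pid of the procedure currently owning this region, or NO_OWNER when free.
  private long ownerPid = NO_OWNER;

  // Returns false (like the "already another procedure running" rejection)
  // when a different procedure still owns the node.
  synchronized boolean tryAttach(long pid) {
    if (ownerPid != NO_OWNER && ownerPid != pid) {
      return false;
    }
    ownerPid = pid;
    return true;
  }

  // Must run when the owning procedure finishes, including when it is bypassed;
  // skipping it leaves a stale owner behind.
  synchronized void detach(long pid) {
    if (ownerPid == pid) {
      ownerPid = NO_OWNER;
    }
  }

  synchronized long owner() {
    return ownerPid;
  }
}
```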




[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627438#comment-16627438
 ] 

stack commented on HBASE-19121:
---

TODO: hbck2 is showing up on a point release -- 2.1.1 -- rather than on a minor 
(2.2.x) because I'm thinking it's ok adding in this new stuff because it is on a 
new Service and it won't break what was there previously (To be confirmed). Also 
avoiding waiting on 2.2.0 because it has an awkward upgrade. That's how I'm 
thinking. Happy to hear opinions otherwise.

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.




[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread Josh Elser (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627457#comment-16627457
 ] 

Josh Elser commented on HBASE-19121:


{quote}hbck2 is showing up on a point release – 2.1.1 – rather than on a minor 
(2.2.x) because I'm thinking its ok adding in this new stuff because it is on a 
new Service and it won't break what was there previous (To be confirmed).
{quote}
Seems OK. A little wonky for it to work on 2.1.1 and not 2.1.0, but that's not 
the end of the world.
{quote}Also avoiding waiting on 2.2.0 because it has an awkward upgrade
{quote}
Do you have more info I can read up on regarding this? Sounds like something we'd 
want to try to make better so we don't get stuck on the 2.0 and 2.1 releases 
(not like that ever happened in HBase 1.x releases ;))


[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627473#comment-16627473
 ] 

stack commented on HBASE-19121:
---

Or crazy-pants stuff like cutting branch-2.2 from branch-2.1?


[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread Josh Elser (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627513#comment-16627513
 ] 

Josh Elser commented on HBASE-19121:


{quote}Or crazy-pants stuff like cutting branch-2.2 from branch-2.1?
{quote}
IMO, that's fine for me too. Until we've released a new version, I see those 
versions as something we fully control. branch-2.1 becoming "2.2.0" and 
branch-2.2's HEAD being pushed to 2.3 would give the user-facing semantics we 
want.


[jira] [Created] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)

Allan Yang created HBASE-21228:
--

 Summary: Memory leak since AbstractFSWAL caches Thread object and 
never clean later
 Key: HBASE-21228
 URL: https://issues.apache.org/jira/browse/HBASE-21228
 Project: HBase
  Issue Type: Bug
Affects Versions: 1.4.7, 2.0.2, 2.1.0
Reporter: Allan Yang
Assignee: Allan Yang


In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
their SyncFutures.
{code}
  /**
   * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
   * 
   * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
   * 
   * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
   * them from this Map?
   */
  private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
{code}

A colleague of mine found a memory leak caused by this map.

Every thread that writes to the WAL is cached in this map, and nothing removes 
a thread from the map even after the thread is dead.

In one of our customers' clusters, we noticed that even though there were no 
requests, the heap of the RS was almost full and CMS GC was triggered every 
second.
We dumped the heap and found more than 30 thousand threads in the TERMINATED 
state, all of them cached in the map above. Everything referenced by these 
threads was leaked. Most of the threads were:
1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
2. hconnection-0x1f838e31-shared--pool, which is used for Phoenix short-circuit 
index writes; WAL entries are written and synced in these threads.
3. Index writer threads (Phoenix), which are referenced by RegionEnvironment, 
then by HRegion, and finally by PostOpenDeployTasksThread.

We should turn this map into a ThreadLocal, and let the JVM GC the terminated 
threads for us.
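The difference between the two caching strategies can be sketched in a few lines. This is an illustration under stated assumptions, not the actual patch: `SyncFuture` here is an empty hypothetical stand-in, and `SyncFutureCacheDemo` and its helpers are invented names. The map keeps one entry per writer thread forever, while the ThreadLocal stores the value in the thread itself, so it is no longer reachable through that thread once the thread terminates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicReference;

final class SyncFutureCacheDemo {
  private SyncFutureCacheDemo() {}

  // Hypothetical stand-in for HBase's SyncFuture.
  static final class SyncFuture {}

  // Leaky pattern: one entry per writer thread, never removed. A terminated
  // Thread (and everything it references) stays reachable through the key.
  static final ConcurrentMap<Thread, SyncFuture> BY_HANDLER = new ConcurrentHashMap<>();

  static SyncFuture fromMap() {
    return BY_HANDLER.computeIfAbsent(Thread.currentThread(), t -> new SyncFuture());
  }

  // ThreadLocal pattern: each thread still reuses its own SyncFuture, but the
  // value lives in the thread and becomes GC-eligible after the thread exits.
  static final ThreadLocal<SyncFuture> CACHED = ThreadLocal.withInitial(SyncFuture::new);

  static SyncFuture fromThreadLocal() {
    return CACHED.get();
  }

  // Runs the task on a fresh short-lived thread and waits for it to finish.
  static void runOnNewThread(Runnable task) {
    Thread t = new Thread(task);
    t.start();
    try {
      t.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  // Simulates n short-lived writer threads; returns how many map entries remain.
  static int mapEntriesAfter(int n) {
    for (int i = 0; i < n; i++) {
      runOnNewThread(SyncFutureCacheDemo::fromMap);
    }
    return BY_HANDLER.size();
  }

  // Returns the SyncFuture that a brand-new thread obtains from the ThreadLocal.
  static SyncFuture threadLocalValueOnNewThread() {
    AtomicReference<SyncFuture> ref = new AtomicReference<>();
    runOnNewThread(() -> ref.set(fromThreadLocal()));
    return ref.get();
  }
}
```

After five short-lived writers, `mapEntriesAfter(5)` still reports five entries even though every one of those threads has finished, which is the leak pattern described in the issue.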





[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Mike Drob (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627523#comment-16627523
 ] 

Mike Drob commented on HBASE-18451:
---

+1 after checkstyle fixed
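As a rough illustration of what the issue title asks for (inspect the queue before adding a delayed flush request), here is a sketch in which a region already waiting for a flush keeps its first random delay instead of having it re-rolled on every chore run. `DelayedFlushQueueSketch` and its methods are hypothetical, keyed by region name for simplicity; this is not the actual MemStoreFlusher code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class DelayedFlushQueueSketch {
  // Region name -> absolute time (ms) at which its queued flush becomes due.
  private final ConcurrentMap<String, Long> pending = new ConcurrentHashMap<>();

  // Queues a delayed flush only if none is pending for this region.
  // Returns true if a new request was added, false if one was already queued.
  boolean requestDelayedFlush(String region, long nowMs, long delayMs) {
    return pending.putIfAbsent(region, nowMs + delayMs) == null;
  }

  // Called when the flush actually runs, so the region can be requested again later.
  void onFlushed(String region) {
    pending.remove(region);
  }

  // Due time of the pending flush, or null if none is queued.
  Long dueTimeMs(String region) {
    return pending.get(region);
  }
}
```

With this check, a request 10 seconds later with a smaller random delay returns false and the original due time stands, so the flushes stay spread over the whole RANGE_OF_DELAY window.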

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.master.002.patch, 
> HBASE-18451.master.003.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours, impacting many tables (they have 150 
> regions per server), at the end all the regions might have some data to be 
> flushed, and we want, after one hour, to trigger a periodic flush. That's 
> totally fine.
> Now, to avoid a flush storm, when we detect a region to be flushed, we add a 
> "randomDelay" to the delayed flush; that way we spread them out.
> RANGE_OF_DELAY is 5 minutes. So we spread the flushes over the next 5 minutes, 
> which is very good.
> However, because we don't check if there is already a request in the queue, 
> 10 seconds later we create a new request, with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end 
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes, 
> you end up getting them all much more quickly, like within the first minute. 
> This not only floods the queue with too many flush requests, but also defeats 
> the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion)r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         //Throttle the flushes by putting a delay. If we don't throttle, and there
>         //is a balanced write-load on the regions in a table, we might end up
>         //overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: 
> hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting 
> flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f 
> has an old edit so flush to free WALs aft

[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21228:
---
Description: 
In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
their SyncFutures.
{code}
  /**
   * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
   * 
   * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
   * 
   * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
   * them from this Map?
   */
  private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
{code}

A colleague of mine found a memory leak caused by this map.

Every thread that writes to the WAL is cached in this map, and nothing removes 
a thread from the map even after the thread is dead.

In one of our customers' clusters, we noticed that even though there were no 
requests, the heap of the RS was almost full and CMS GC was triggered every 
second.
We dumped the heap and found more than 30 thousand threads in the TERMINATED 
state, all of them cached in the map above. Everything referenced by these 
threads was leaked. Most of the threads were:
1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
2. hconnection-0x1f838e31-shared--pool, which is used for Phoenix short-circuit 
index writes; WAL entries are written and synced in these threads.
3. Index writer threads (Phoenix), which are referenced by 
RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
PostOpenDeployTasksThread.

We should turn this map into a ThreadLocal, and let the JVM GC the terminated 
threads for us.



> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>    * 
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>    * 
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>    * them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map even after the thread is dead.
> In one of our customers' clusters, we noticed that even though there were no 
> requests, the heap of the RS was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the TERMINATED 
> state, all of them cached in the map above. Everything referenced by these 
> threads was leaked. Most of the threads were:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which are used to write inde

[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627528#comment-16627528
 ] 

stack commented on HBASE-19121:
---

Thanks [~elserj] for input.

HBASE-20881 adds a new Procedure type. HBASE-21075 is about how we have to drain 
the old ones before we can start the new Master. We need to make that 'smooth'.


[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627531#comment-16627531
 ] 

stack commented on HBASE-19121:
---

Ok. Let me let this stew a bit to see what the fellows from China have to say, 
[~elserj]. Will raise it on the dev list tomorrow to get more input.

Branching 2.2 from 2.1 might be the 'safest'.

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch




[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Mike Drob (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627533#comment-16627533
 ] 

Mike Drob commented on HBASE-21228:
---

You're suggesting replacing the map with a {{ThreadLocal}}? Yeah, that makes
sense. Are you working on a patch for this?

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and
> their SyncFutures.
> {code}
> /**
>  * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>  *
>  * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>  *
>  * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>  * them from this Map?
>  */
> private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing
> removes it from the map, even after the thread has died.
> In one of our customers' clusters, we noticed that even though there were no
> requests, the RS heap was almost full and CMS GC was triggered every
> second.
> We dumped the heap and found more than 30,000 threads in the TERMINATED
> state, all of them cached in the map above.
> Everything referenced by these threads was leaked. Most of the threads were:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool threads, which are used for Phoenix
> short-circuit index writes; the WAL is written and synced in these threads.
> 3. Phoenix index writer threads, which are referenced by
> RegionCoprocessorHost$RegionEnvironment, which in turn is referenced by
> HRegion, which is finally referenced by PostOpenDeployTasksThread.
> We should turn this map into a thread local one and let the JVM GC the
> terminated threads for us.
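To make the leak concrete, here is a minimal, self-contained Java sketch; it is not the attached patch. `SyncFuture` below is a hypothetical stand-in for HBase's class, and the helper names (`getFromMap`, `getFromThreadLocal`, `runShortLivedWriters`, `terminatedKeys`) are invented for illustration. It contrasts a `ConcurrentMap<Thread, SyncFuture>` cache, which keeps a strong reference to every thread that ever wrote, with a `ThreadLocal` cache, whose value is stored in the calling thread's own `ThreadLocalMap` and so becomes collectable once that thread dies.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SyncFutureCacheSketch {

  /** Hypothetical stand-in for HBase's SyncFuture: just a reusable holder. */
  static final class SyncFuture {
    long txid;

    SyncFuture reset(long txid) {
      this.txid = txid;
      return this;
    }
  }

  // Leaky pattern: the map holds a strong reference to every Thread that ever
  // wrote, so dead threads (and everything they reference) stay reachable.
  static final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler =
      new ConcurrentHashMap<>();

  static SyncFuture getFromMap(long txid) {
    return syncFuturesByHandler
        .computeIfAbsent(Thread.currentThread(), t -> new SyncFuture())
        .reset(txid);
  }

  // ThreadLocal pattern: the value lives in the calling thread's own
  // ThreadLocalMap, so it becomes unreachable together with the dead thread.
  static final ThreadLocal<SyncFuture> cachedSyncFutures =
      ThreadLocal.withInitial(SyncFuture::new);

  static SyncFuture getFromThreadLocal(long txid) {
    return cachedSyncFutures.get().reset(txid);
  }

  /**
   * Starts n short-lived "writer" threads that each fetch a SyncFuture once,
   * waits for all of them to terminate, and returns how many map entries were added.
   */
  static int runShortLivedWriters(int n, boolean useThreadLocal) {
    int before = syncFuturesByHandler.size();
    Thread[] writers = new Thread[n];
    for (int i = 0; i < n; i++) {
      final long txid = i;
      writers[i] = new Thread(() -> {
        if (useThreadLocal) {
          getFromThreadLocal(txid);
        } else {
          getFromMap(txid);
        }
      });
      writers[i].start();
    }
    for (Thread w : writers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return syncFuturesByHandler.size() - before;
  }

  /** Counts map keys whose thread has already terminated (what the heap dump showed). */
  static long terminatedKeys() {
    return syncFuturesByHandler.keySet().stream()
        .filter(t -> t.getState() == Thread.State.TERMINATED)
        .count();
  }

  public static void main(String[] args) {
    System.out.println("map cache, new entries: " + runShortLivedWriters(100, false));
    System.out.println("ThreadLocal cache, new entries: " + runShortLivedWriters(100, true));
    System.out.println("terminated threads still used as keys: " + terminatedKeys());
  }
}
```

On a fresh JVM, `main` should report 100 new map entries for the map-based cache, all keyed by terminated threads, and 0 for the `ThreadLocal` path. One trade-off to keep in mind: a `ThreadLocal` value lives as long as its thread, so each long-lived handler thread still keeps one `SyncFuture`, which is exactly the reuse the original map was after.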




[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-25 Thread Josh Elser (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627546#comment-16627546
 ] 

Josh Elser commented on HBASE-20952:


[~stack], [~Apache9], ping, just in case this got lost over the weekend. I
think this is in a good state; I'm itching to start iterating on it.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt
>
>
> Take a step back from the current WAL implementations and think about what an 
> HBase WAL API should look like. What are the primitive calls that we require 
> to guarantee durability of writes with a high degree of performance?
> The API needs to take the current implementations into consideration. We
> should also keep in mind what is happening in the Ratis LogService
> (RATIS-272), but the LogService should not dictate what HBase's WAL API
> looks like.
> Other "systems" inside of HBase that use WALs are replication and
> backup&restore. Replication has the use-case of "tail"-ing the WAL, which we
> should provide via our new API. B&R doesn't do anything fancy (IIRC). We
> should make sure all consumers are generally going to be OK with the API we
> create.
> The current API may be "OK" (or at least partly OK). We also need to consider
> other methods that were "bolted on", such as {{AbstractFSWAL}} and
> {{WALFileLengthProvider}}. Other corners of "WAL use" (like the
> {{WALSplitter}}) should also be looked at, so that they use only WAL APIs.
> We also need to make sure that adequate interface audience and stability
> annotations are chosen.
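As a purely hypothetical illustration of the "primitive calls" question, the Java sketch below assumes a minimal set of three operations: `append` (buffer an entry and get a sequence id), `sync` (make everything up to a sequence id durable) and `readFrom` (let a consumer such as replication tail durable entries). The `Wal` interface and `InMemoryWal` class are invented for this sketch; they are not HBase's `WAL` interface or the approach in 20952.v1.txt, and "durable" here only means that `sync` has covered the entry.

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalWalSketch {

  /** Hypothetical primitive operations; NOT HBase's WAL interface. */
  interface Wal {
    /** Buffers an entry and returns its sequence id; the entry is not durable yet. */
    long append(byte[] entry);

    /** Makes every entry up to and including seqId durable. */
    void sync(long seqId);

    /** Highest sequence id known to be durable (0 if none). */
    long syncedUpTo();

    /** Returns durable entries with sequence id >= fromSeqId, e.g. for a replication tailer. */
    List<byte[]> readFrom(long fromSeqId);
  }

  /** In-memory stand-in: "durable" simply means sync() has covered the entry. */
  static final class InMemoryWal implements Wal {
    private final List<byte[]> entries = new ArrayList<>();
    private long synced = 0;

    @Override
    public synchronized long append(byte[] entry) {
      entries.add(entry.clone());
      return entries.size(); // sequence ids are 1-based positions
    }

    @Override
    public synchronized void sync(long seqId) {
      if (seqId < 1 || seqId > entries.size()) {
        throw new IllegalArgumentException("unknown seqId " + seqId);
      }
      synced = Math.max(synced, seqId);
    }

    @Override
    public synchronized long syncedUpTo() {
      return synced;
    }

    @Override
    public synchronized List<byte[]> readFrom(long fromSeqId) {
      List<byte[]> out = new ArrayList<>();
      // A tailer only sees synced entries, so it never ships unsynced edits.
      for (long id = Math.max(1, fromSeqId); id <= synced; id++) {
        out.add(entries.get((int) (id - 1)).clone());
      }
      return out;
    }
  }
}
```

Keeping `append` and `sync` separate lets a caller batch several appends behind a single sync, and restricting `readFrom` to synced entries means a tailer never replicates edits that have not yet been made durable.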




[jira] [Commented] (HBASE-20952) Re-visit the WAL API

2018-09-25 Thread Josh Elser (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627549#comment-16627549
 ] 

Josh Elser commented on HBASE-20952:


[~reidchan], [~zyork], FYI, in case you'd like to see the latest on the
approach and give some feedback.

> Re-visit the WAL API
> 
>
> Key: HBASE-20952
> URL: https://issues.apache.org/jira/browse/HBASE-20952
> Project: HBase
>  Issue Type: Sub-task
>  Components: wal
>Reporter: Josh Elser
>Priority: Major
> Attachments: 20952.v1.txt




[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21228:
---
Attachment: HBASE-21228.branch-2.0.001.patch

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch




[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Yang updated HBASE-21228:
---
Status: Patch Available  (was: Open)

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 1.4.7, 2.0.2, 2.1.0
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch




[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627553#comment-16627553
 ] 

Allan Yang commented on HBASE-21228:


{quote}
You're suggesting to replace the map with ThreadLocal?
{quote}
Yeah, a patch is already uploaded.

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch




[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18451:

Attachment: HBASE-18451.master.004.patch

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.patch
>
>
> If you run a big job every 4 hours that touches many tables (they have 150
> regions per server), at the end all the regions might have some data to be
> flushed, and after one hour we want to trigger a periodic flush. That's
> totally fine.
> Now, to avoid a flush storm, when we detect a region that needs flushing, we
> add a "randomDelay" to the delayed flush so that the flushes are spread out.
> RANGE_OF_DELAY is 5 minutes, so we spread the flushes over the next 5
> minutes, which is very good.
> However, because we don't check whether there is already a request in the
> queue, 10 seconds later we create a new request with a new randomDelay.
> If you generate a randomDelay every 10 seconds, at some point you will end
> up with a small one, and the flush will be triggered almost immediately.
> As a result, instead of spreading all the flushes over the next 5 minutes,
> you end up getting them all much more quickly, like within the first minute.
> This not only floods the queue with too many flush requests, but also
> defeats the purpose of the randomDelay.
> {code}
> @Override
> protected void chore() {
>   final StringBuffer whyFlush = new StringBuffer();
>   for (Region r : this.server.onlineRegions.values()) {
>     if (r == null) continue;
>     if (((HRegion)r).shouldFlush(whyFlush)) {
>       FlushRequester requester = server.getFlushRequester();
>       if (requester != null) {
>         long randomDelay = RandomUtils.nextInt(RANGE_OF_DELAY) + MIN_DELAY_TIME;
>         LOG.info(getName() + " requesting flush of " +
>           r.getRegionInfo().getRegionNameAsString() + " because " +
>           whyFlush.toString() +
>           " after random delay " + randomDelay + "ms");
>         //Throttle the flushes by putting a delay. If we don't throttle, and there
>         //is a balanced write-load on the regions in a table, we might end up
>         //overwhelming the filesystem with too many flushes at once.
>         requester.requestDelayedFlush(r, randomDelay, false);
>       }
>     }
>   }
> }
> {code}
> {code}
> 2017-07-24 18:44:33,338 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 270785ms
> 2017-07-24 18:44:43,328 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 200143ms
> 2017-07-24 18:44:53,954 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 191082ms
> 2017-07-24 18:45:03,528 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 92532ms
> 2017-07-24 18:45:14,201 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so flush to free WALs after random delay 238780ms
> 2017-07-24 18:45:24,195 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: hbasetest2.domainname.com,60020,1500916375517-MemstoreFlusherChore requesting flush of testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae. because f has an old edit so
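The effect the issue title asks to prevent can be replayed with a small Java sketch. It is not the HBase patch: `DelayedFlushDedupSketch`, its `requestDelayedFlush(String, long, long)` method and the `checkQueueFirst` flag are hypothetical, and the sketch models the unchecked behaviour as "the region flushes at the earliest deadline among all queued requests". The request times and random delays are the five complete entries from the log excerpt above, expressed as milliseconds after the first request.

```java
import java.util.HashMap;
import java.util.Map;

public class DelayedFlushDedupSketch {

  // Pending flush deadlines in ms, keyed by region name (hypothetical model).
  private final Map<String, Long> pendingDeadlines = new HashMap<>();
  private final boolean checkQueueFirst;

  DelayedFlushDedupSketch(boolean checkQueueFirst) {
    this.checkQueueFirst = checkQueueFirst;
  }

  /** Returns false if the request was skipped because one is already pending. */
  synchronized boolean requestDelayedFlush(String region, long nowMs, long delayMs) {
    if (checkQueueFirst && pendingDeadlines.containsKey(region)) {
      return false; // keep the original random delay instead of re-rolling it
    }
    // Without the check, every chore run queues another request and the region
    // ends up flushing at the earliest deadline among them, so keep the minimum.
    pendingDeadlines.merge(region, nowMs + delayMs, Long::min);
    return true;
  }

  synchronized long effectiveFlushTime(String region) {
    Long deadline = pendingDeadlines.get(region);
    if (deadline == null) {
      throw new IllegalStateException("no flush pending for " + region);
    }
    return deadline;
  }

  // Request times (ms after the first request) and random delays from the log excerpt.
  static final long[] REQUEST_OFFSETS_MS = {0, 9_990, 20_616, 30_190, 40_863};
  static final long[] RANDOM_DELAYS_MS = {270_785, 200_143, 191_082, 92_532, 238_780};

  /** Replays the logged chore runs; returns when the region would actually flush. */
  static long replayLog(boolean checkQueueFirst) {
    DelayedFlushDedupSketch queue = new DelayedFlushDedupSketch(checkQueueFirst);
    String region = "testflush,,1500932649126.578c27d2eb7ef0ad437bf2ff38c053ae.";
    for (int i = 0; i < REQUEST_OFFSETS_MS.length; i++) {
      queue.requestDelayedFlush(region, REQUEST_OFFSETS_MS[i], RANDOM_DELAYS_MS[i]);
    }
    return queue.effectiveFlushTime(region);
  }

  public static void main(String[] args) {
    System.out.println("without queue check: flush at +" + replayLog(false) + " ms");
    System.out.println("with queue check:    flush at +" + replayLog(true) + " ms");
  }
}
```

With these inputs, the unchecked version flushes at +122,722 ms (the 92,532 ms delay drawn at +30,190 ms wins), roughly two minutes after the first request, while checking for a pending request first keeps the original 270,785 ms delay.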

[jira] [Updated] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-18451:

Attachment: HBASE-18451.branch-1.002.patch

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.patch

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Ted Yu (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627554#comment-16627554
 ] 

Ted Yu commented on HBASE-21228:


+1, pending QA

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch




[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Xu Cang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627555#comment-16627555
 ] 

Xu Cang commented on HBASE-18451:
-

Fixed code style for the master branch.

Re-uploaded the branch-1 patch to trigger another hadoop-qa run. The javac
error was strange; let's try it again.

> PeriodicMemstoreFlusher should inspect the queue before adding a delayed 
> flush request
> --
>
> Key: HBASE-18451
> URL: https://issues.apache.org/jira/browse/HBASE-18451
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.0.0-alpha-1
>Reporter: Jean-Marc Spaggiari
>Assignee: Xu Cang
>Priority: Major
> Attachments: HBASE-18451.branch-1.001.patch, 
> HBASE-18451.branch-1.002.patch, HBASE-18451.branch-1.002.patch, 
> HBASE-18451.master.002.patch, HBASE-18451.master.003.patch, 
> HBASE-18451.master.004.patch, HBASE-18451.master.patch

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Mike Drob (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627558#comment-16627558
 ] 

Mike Drob commented on HBASE-21228:
---

If we override the {{initialValue}} method on the ThreadLocal, we can simplify
the logic in {{getSyncFuture}}.
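A minimal Java sketch of this suggestion, assuming a hypothetical `SyncFuture` stand-in with a `reset(long)` method (the real `getSyncFuture` signature and body may differ): with `initialValue()` overridden, `get()` never returns `null`, so the first-use null check and `set()` call disappear from the lookup. `ThreadLocal.withInitial(SyncFuture::new)` is the Java 8 shorthand for the same thing.

```java
public class GetSyncFutureSketch {

  /** Hypothetical stand-in for HBase's SyncFuture. */
  static final class SyncFuture {
    long txid;

    SyncFuture reset(long txid) {
      this.txid = txid;
      return this;
    }
  }

  // Without initialValue(): every caller must handle the first-use null.
  private static final ThreadLocal<SyncFuture> plainCache = new ThreadLocal<>();

  static SyncFuture getSyncFutureWithNullCheck(long txid) {
    SyncFuture f = plainCache.get();
    if (f == null) {
      f = new SyncFuture();
      plainCache.set(f);
    }
    return f.reset(txid);
  }

  // With initialValue() overridden, get() lazily creates the per-thread value.
  private static final ThreadLocal<SyncFuture> cachedSyncFutures =
      new ThreadLocal<SyncFuture>() {
        @Override
        protected SyncFuture initialValue() {
          return new SyncFuture();
        }
      };

  static SyncFuture getSyncFuture(long txid) {
    return cachedSyncFutures.get().reset(txid);
  }

  /** Returns the SyncFuture that a separate, short-lived thread obtains. */
  static SyncFuture fromOtherThread(long txid) {
    SyncFuture[] holder = new SyncFuture[1];
    Thread t = new Thread(() -> holder[0] = getSyncFuture(txid));
    t.start();
    try {
      t.join(); // join() also makes holder[0] visible to this thread
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return holder[0];
  }
}
```

Both variants reuse one `SyncFuture` per thread; the overridden `initialValue()` just moves the lazy creation into the `ThreadLocal` itself, and `fromOtherThread` shows that a different thread still gets its own instance.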

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch
>
>
> In AbstractFSWAL(FSHLog in branch-1), we have a map caches thread and 
> SyncFutures.
> {code}
> /**
>* Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse 
> SyncFutures.
>* 
>* TODO: Reuse FSWALEntry's rather than create them anew each time as we do 
> SyncFutures here.
>* 
>* TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers 
> rather than have them get
>* them from this Map?
>*/
>   private final ConcurrentMap syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map, even after the thread is dead.
> In one of our customers' clusters, we noticed that even though there were no 
> requests, the RS heap was almost full and CMS GC was triggered every second.
> We dumped the heap and found more than 30 thousand threads in the TERMINATED 
> state, all cached in the map above. Everything referenced by these threads was 
> leaked. Most of the threads were:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used for Phoenix short-circuit 
> index writes; WAL entries are written and synced in these threads.
> 3. Index writer thread (Phoenix), which is referenced by 
> RegionCoprocessorHost$RegionEnvironment, then by HRegion, and finally by 
> PostOpenDeployTasksThread.
> We should turn this map into a ThreadLocal, so that the JVM can GC the 
> terminated threads for us.
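The leak pattern can be reproduced in isolation with plain JDK classes. The sketch below is a standalone illustration, not HBase code: Object values stand in for SyncFutures, and computeIfAbsent stands in for whatever get-then-put logic populates the real map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Standalone JDK illustration (not HBase code): a cache keyed by Thread keeps every
// writer thread, and whatever it references, reachable after the thread terminates.
class ThreadKeyedMapLeak {
  // Stand-in for syncFuturesByHandler; Object stands in for SyncFuture.
  static final ConcurrentMap<Thread, Object> syncFuturesByHandler = new ConcurrentHashMap<>();

  /** Starts n short-lived writer threads that each cache a value, then waits for them. */
  static List<Thread> runWriters(int n) throws InterruptedException {
    List<Thread> writers = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      Thread t = new Thread(() -> syncFuturesByHandler.computeIfAbsent(
          Thread.currentThread(), k -> new Object())); // cache keyed by the current thread
      writers.add(t);
      t.start();
    }
    for (Thread t : writers) {
      t.join(); // after join() the thread is TERMINATED
    }
    return writers;
  }

  public static void main(String[] args) throws InterruptedException {
    List<Thread> writers = runWriters(1000);
    long dead = writers.stream().filter(t -> !t.isAlive()).count();
    // Nothing ever removes the entries, so every dead thread is still cached.
    System.out.println(dead + " dead threads, " + syncFuturesByHandler.size() + " cached");
  }
}
```

With a ThreadLocal instead, each value lives in the owning thread's own thread-local map, which is cleared when the thread exits, so a terminated thread and its SyncFuture become collectable.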



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627566#comment-16627566
 ] 

Allan Yang commented on HBASE-21213:


I want HBCK2 to work on branch-2.0. Maybe we can focus on branch-2.1 for now, 
and I can do the backport later, [~stack].

> [hbck2] bypass leaves behind state in RegionStates when assign/unassign
> ---
>
> Key: HBASE-21213
> URL: https://issues.apache.org/jira/browse/HBASE-21213
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21213.branch-2.1.001.patch, 
> HBASE-21213.branch-2.1.002.patch, HBASE-21213.branch-2.1.003.patch, 
> HBASE-21213.branch-2.1.004.patch, HBASE-21213.branch-2.1.005.patch, 
> HBASE-21213.branch-2.1.006.patch
>
>
> This is a follow-on from HBASE-21083, which added the 'bypass' functionality. 
> On bypass, there is more state to be cleared if we are to allow new Procedures 
> to be scheduled.
> For example, here is a bypass:
> {code}
> 2018-09-20 05:45:43,722 INFO org.apache.hadoop.hbase.procedure2.Procedure: 
> pid=100449, state=RUNNABLE:REGION_TRANSITION_DISPATCH, locked=true, 
> bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 bypassed, returning null 
> to finish it
> 2018-09-20 05:45:44,022 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Finished pid=100449, 
> state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure table=hbase:namespace, 
> region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 in 2mins, 7.618sec
> {code}
> ... but then when I try to assign the bypassed region later, I get this:
> {code}
> 2018-09-20 05:46:31,435 WARN 
> org.apache.hadoop.hbase.master.assignment.RegionTransitionProcedure: There is 
> already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664 pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16; rit=OPENING, 
> location=ve1233.halxg.cloudera.com,22101,1537397961664
> 2018-09-20 05:46:31,510 INFO 
> org.apache.hadoop.hbase.procedure2.ProcedureExecutor: Rolled back pid=100450, 
> state=ROLLEDBACK, 
> exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via 
> AssignProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: 
> There is already another procedure running on this region this=pid=100450, 
> state=RUNNABLE:REGION_TRANSITION_QUEUE, locked=true; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> owner=pid=100449, state=SUCCESS, bypass=LOG-REDACTED UnassignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16, 
> server=ve1233.halxg.cloudera.com,22101,1537397961664; AssignProcedure 
> table=hbase:namespace, region=37cc206fe9c4bc1c0a46a34c5f523d16 
> exec-time=473msec
> {code}
> ... which is a long-winded way of saying the Unassign Procedure still exists 
> in RegionStateNodes.
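A hypothetical model of the symptom in the logs above. The class and method names are illustrative only, not HBase's RegionStateNode API: the region still records the bypassed UnassignProcedure (pid=100449) as its owner, so the later AssignProcedure (pid=100450) is refused. Releasing the owner during bypass is the kind of cleanup this issue asks for; the actual patch may clear more state than this.

```java
// Hypothetical model, not HBase's RegionStateNode API: a region remembers the pid of the
// procedure that owns it and rejects a new owner until that pid is released.
class RegionOwnerSketch {
  static final long NO_OWNER = -1L;
  private long ownerPid = NO_OWNER;

  /** Claims the region for pid; false means "another procedure running on this region". */
  synchronized boolean claim(long pid) {
    if (ownerPid != NO_OWNER) {
      return false;
    }
    ownerPid = pid;
    return true;
  }

  /** Releases the region, but only if pid is the current owner. */
  synchronized void release(long pid) {
    if (ownerPid == pid) {
      ownerPid = NO_OWNER;
    }
  }

  /** Models bypass: the procedure finishes; the region is released only if clearState. */
  void bypass(long pid, boolean clearState) {
    if (clearState) {
      release(pid);
    }
  }

  public static void main(String[] args) {
    RegionOwnerSketch leftBehind = new RegionOwnerSketch();
    leftBehind.claim(100449);                      // UnassignProcedure owns the region
    leftBehind.bypass(100449, false);              // bypassed, owner state left behind
    System.out.println(leftBehind.claim(100450));  // false: AssignProcedure is rejected

    RegionOwnerSketch cleared = new RegionOwnerSketch();
    cleared.claim(100449);
    cleared.bypass(100449, true);                  // bypass also clears the owner
    System.out.println(cleared.claim(100450));     // true: AssignProcedure can proceed
  }
}
```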




[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627575#comment-16627575
 ] 

Allan Yang commented on HBASE-21228:


{quote}
If we override initialValue method on the ThreadLocal, then we can simplify the 
logic in getSyncFuture.
{quote}
I prefer the straightforward way, so that anyone who reads the code later won't 
first be surprised by what looks like a possible NPE and only then discover the 
initialValue magic :). What do you think, [~mdrob], sir?
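For comparison with the initialValue style, a sketch of the explicit get/null-check/set style. SyncFuture is again a hypothetical stand-in. Both styles give each thread exactly one cached instance; they differ only in whether the creation logic sits in getSyncFuture or in the ThreadLocal's initializer.

```java
// Sketch only: SyncFuture is a hypothetical stand-in, not HBase's class.
class ExplicitThreadLocalSketch {
  /** Hypothetical stand-in for HBase's SyncFuture. */
  static final class SyncFuture {}

  // Plain ThreadLocal: get() returns null until this thread has called set().
  private final ThreadLocal<SyncFuture> cachedSyncFutures = new ThreadLocal<>();

  SyncFuture getSyncFuture() {
    SyncFuture syncFuture = cachedSyncFutures.get();
    if (syncFuture == null) {
      // First WAL write on this thread: create its SyncFuture and cache it.
      syncFuture = new SyncFuture();
      cachedSyncFutures.set(syncFuture);
    }
    return syncFuture;
  }

  public static void main(String[] args) {
    ExplicitThreadLocalSketch wal = new ExplicitThreadLocalSketch();
    System.out.println(wal.getSyncFuture() == wal.getSyncFuture()); // true
  }
}
```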


[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-09-25 Thread Reid Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627578#comment-16627578
 ] 

Reid Chan commented on HBASE-20734:
---

Will commit it later today if there are no other comments.

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> During investigation of HBASE-20723, I realized that we wouldn't get the best 
> performance when hbase.wal.dir is configured to be on different (fast) media 
> than hbase rootdir w.r.t. recovered edits, since the recovered edits directory 
> is currently under rootdir.
> Such a setup may not result in fast recovery when there is region server 
> failover.
> This issue is to find a proper (hopefully backward compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir.
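A minimal sketch of what colocating means for the path. It assumes recovered edits keep their usual per-region layout (<base>/data/<namespace>/<table>/<encoded region>/recovered.edits) and that the change resolves that same relative path against hbase.wal.dir instead of hbase.rootdir. The helper name and directory values are hypothetical, and the layout the committed patch uses may differ.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not an HBase API): builds a region's recovered.edits directory
// under a given base directory, assuming the usual data/<ns>/<table>/<region> layout.
class RecoveredEditsDirSketch {
  static Path recoveredEditsDir(Path baseDir, String namespace, String table,
      String encodedRegionName) {
    return baseDir.resolve("data").resolve(namespace).resolve(table)
        .resolve(encodedRegionName).resolve("recovered.edits");
  }

  public static void main(String[] args) {
    Path rootDir = Paths.get("/hbase");          // hypothetical hbase.rootdir (slower media)
    Path walDir = Paths.get("/fast-wal/hbase");  // hypothetical hbase.wal.dir (fast media)
    String region = "37cc206fe9c4bc1c0a46a34c5f523d16";
    // Today: recovered edits live under rootdir, away from the fast WAL media.
    System.out.println(recoveredEditsDir(rootDir, "default", "t1", region));
    // Proposed: the same relative layout, resolved against hbase.wal.dir.
    System.out.println(recoveredEditsDir(walDir, "default", "t1", region));
  }
}
```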




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627581#comment-16627581
 ] 

stack commented on HBASE-21223:
---

Looks like most procedures just ignore this call anyways...

2018-09-25 09:08:25,935 TRACE org.apache.hadoop.hbase.ipc.RpcServer: callId: 11 
service: MasterService methodName: AbortProcedure size: 32 connection: 
10.17.208.21:34862 deadline: 1537891878430 param: TODO: class 
org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$AbortProcedureRequest
 connection: 10.17.208.21:34862, response is_procedure_aborted: false 
queueTime: 7502 processingTime: 1 totalTime: 7503

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist at all, it should be out in hbck2, not here in user-space.




[jira] [Commented] (HBASE-21213) [hbck2] bypass leaves behind state in RegionStates when assign/unassign

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627582#comment-16627582
 ] 

stack commented on HBASE-21213:
---

Ok [~allan163]. Shout if there is anything that might disrupt your being able 
to backport.


[jira] [Updated] (HBASE-21221) Ineffective assertion in TestFromClientSide3#testMultiRowMutations

2018-09-25 Thread Ted Yu (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-21221:
---
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Thanks for the review, Mingliang.

> Ineffective assertion in TestFromClientSide3#testMultiRowMutations
> --
>
> Key: HBASE-21221
> URL: https://issues.apache.org/jira/browse/HBASE-21221
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Ted Yu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: 21221.v10.txt, 21221.v11.txt, 21221.v12.txt, 
> 21221.v7.txt, 21221.v8.txt, 21221.v9.txt
>
>
> Observed the following in 
> org.apache.hadoop.hbase.util.TestFromClientSide3WoUnsafe-output.txt:
> {code}
> Caused by: 
> org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(java.io.IOException): 
> java.io.IOException: Timed out waiting for lock for row: ROW-1 in region 
> 089bdfa75f44d88e596479038a6da18b
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5816)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion$4.lockRowsAndBuildMiniBatch(HRegion.java:7432)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutate(HRegion.java:4008)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:3982)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.mutateRowsWithLocks(HRegion.java:7424)
>   at 
> org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint.mutateRows(MultiRowMutationEndpoint.java:116)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MultiRowMutationProtos$MultiRowMutationService.callMethod(MultiRowMutationProtos.java:2266)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:8182)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:2481)
>   at 
> org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:2463)
> ...
> Exception in thread "pool-678-thread-1" java.lang.AssertionError: This cp 
> should fail because the target lock is blocked by previous put
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hadoop.hbase.client.TestFromClientSide3.lambda$testMultiRowMutations$7(TestFromClientSide3.java:861)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> {code}
> Here is related code:
> {code}
>   cpService.execute(() -> {
> ...
> if (!threw) {
>   // Can't call fail() earlier because the catch would eat it.
>   fail("This cp should fail because the target lock is blocked by previous put");
> }
> {code}
> Since the fail() call is executed by the cpService, the assertion had no 
> bearing on the outcome of the test.
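The JDK behavior behind this can be shown without HBase. When a task passed to ExecutorService.execute() throws, the Throwable only reaches the pool thread's uncaught-exception handler (consistent with the "Exception in thread "pool-678-thread-1"" line above), so the test thread never sees it. Submitting the check and calling get() on the returned Future rethrows it on the caller, wrapped in an ExecutionException. This is a sketch of one way to make such an assertion count, not necessarily what the committed patch does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK illustration, not the HBase test or its patch.
class ExecutorAssertionSketch {
  /** Returns true if an AssertionError thrown in the pool reaches the calling thread. */
  static boolean failureObserved() throws InterruptedException {
    ExecutorService cpService = Executors.newSingleThreadExecutor();
    Callable<Void> check = () -> {
      // Stands in for the fail(...) call made inside the cpService lambda.
      throw new AssertionError("This cp should fail because the target lock is blocked");
    };
    try {
      Future<Void> result = cpService.submit(check);
      result.get(); // rethrows anything the task threw, wrapped in ExecutionException
      return false;
    } catch (ExecutionException e) {
      return e.getCause() instanceof AssertionError;
    } finally {
      cpService.shutdown();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    // With execute() the error would only reach the pool thread's uncaught-exception
    // handler; with submit() + get() the caller (e.g. the test method) sees it.
    System.out.println(failureObserved()); // true
  }
}
```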




[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627591#comment-16627591
 ] 

stack commented on HBASE-21217:
---

Is there an issue for re-enabling CompatRemoteProcedureResolver in branch-2.0 
and branch-2.1? Thanks.

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, so as not to fail all the open region requests when only 
> one of them fails, we catch the exception and set a flag in the return value. 
> But the executeProcedures call ignores the return value, and we expect the 
> openRegion method to always call reportRegionStateTransition to report the 
> failure, but in fact it does not...
> And after HBASE-20881, we can confirm that the race can happen, where we send 
> a close request to a region which is opening (HBASE-21199), and vice versa. So 
> I think we need to revisit the implementation of executeProcedures to make it 
> more stable.
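A hypothetical sketch of the reporting problem described above, in plain Java. openRegion, openAllReturningFlag, openAllReporting and the report callback are illustrative names, not HBase's RSRpcServices or region-handler APIs. A per-region failure folded into a return flag is lost once the caller ignores the return value, whereas reporting each region's outcome keeps one failure from being silently dropped. It does not model the open/close race from HBASE-21199.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical illustration (names are not HBase APIs): a batch open whose per-region
// failures become a flag the caller ignores, versus one that reports every outcome.
class ExecuteProceduresSketch {
  /** Pretend open: regions whose name starts with "bad" fail. */
  static void openRegion(String region) {
    if (region.startsWith("bad")) {
      throw new IllegalStateException("failed to open " + region);
    }
  }

  /** Catch-and-flag shape: one failure doesn't abort the batch, but details are lost. */
  static boolean openAllReturningFlag(List<String> regions) {
    boolean allOpened = true;
    for (String region : regions) {
      try {
        openRegion(region);
      } catch (IllegalStateException e) {
        allOpened = false; // only a boolean survives; which region failed is not reported
      }
    }
    return allOpened;
  }

  /** Report-each shape: every region's outcome goes to the callback. */
  static void openAllReporting(List<String> regions, BiConsumer<String, Boolean> report) {
    for (String region : regions) {
      try {
        openRegion(region);
        report.accept(region, true);
      } catch (IllegalStateException e) {
        report.accept(region, false); // stands in for reporting a failed transition
      }
    }
  }

  public static void main(String[] args) {
    List<String> regions = Arrays.asList("r1", "bad-r2", "r3");
    openAllReturningFlag(regions); // return value dropped, as described for executeProcedures
    List<String> failed = new ArrayList<>();
    openAllReporting(regions, (region, ok) -> {
      if (!ok) {
        failed.add(region);
      }
    });
    System.out.println(failed); // [bad-r2]
  }
}
```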




[jira] [Commented] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627608#comment-16627608
 ] 

stack commented on HBASE-21227:
---

Use the RetryCounter utility to calculate the backoff? Otherwise, LGTM.
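HBase already has an org.apache.hadoop.hbase.util.RetryCounter utility; the sketch below does not use it, and backoffMs is a hypothetical helper. It only shows the capped exponential backoff calculation that such a utility performs, under the assumption of a fixed base delay and a fixed cap.

```java
// Hypothetical helper, not HBase's RetryCounter API: capped exponential backoff.
class BackoffSketch {
  /** Delay before retry {@code attempt} (0-based): baseMs * 2^attempt, capped at maxMs. */
  static long backoffMs(long baseMs, int attempt, long maxMs) {
    if (baseMs <= 0 || maxMs <= 0 || attempt < 0) {
      throw new IllegalArgumentException("baseMs and maxMs must be > 0, attempt >= 0");
    }
    long delay = Math.min(baseMs, maxMs);
    for (int i = 0; i < attempt && delay < maxMs; i++) {
      // Doubling past maxMs / 2 would exceed the cap (and could overflow), so clamp instead.
      delay = delay > maxMs / 2 ? maxMs : delay * 2;
    }
    return delay;
  }

  public static void main(String[] args) {
    StringBuilder delays = new StringBuilder();
    for (int attempt = 0; attempt < 8; attempt++) {
      delays.append(backoffMs(100, attempt, 10_000)).append(' ');
    }
    System.out.println(delays.toString().trim()); // 100 200 400 800 1600 3200 6400 10000
  }
}
```

The loop stops as soon as the cap is reached, so even a very large attempt count costs at most a few dozen iterations.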

> Implement exponential retrying backoff for Assign/UnassignRegionHandler 
> introduced in HBASE-21217
> -
>
> Key: HBASE-21227
> URL: https://issues.apache.org/jira/browse/HBASE-21227
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, regionserver
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21227.patch
>
>





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Mike Drob (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627611#comment-16627611
 ] 

Mike Drob commented on HBASE-21228:
---

I was under the impression that initialValue was well known in ThreadLocal 
usage, and there's even a hint to it in the javadoc of {{get()}}. Maybe I'm 
mistaken about how widespread this knowledge is, though.

I still prefer the initialValue approach, but won't argue about the current 
implementation.


[jira] [Commented] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627609#comment-16627609
 ] 

Hadoop QA commented on HBASE-21227:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 14s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 44s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 10s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 15s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 58s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 45s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 11s{color} | {color:red} hbase-server: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 13s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 10m 21s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}123m 24s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}163m 59s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21227 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941218/HBASE-21227.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 588aae4d8926 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 8eaaa63114 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/14494/artifact/patchprocess/diff-checkstyle-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/14494/testReport/ |
|

[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-25 Thread Sean Busbey (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627620#comment-16627620
 ] 

Sean Busbey commented on HBASE-20993:
-

Let me get a JIRA filed so we can move the discussion there.

The short answer is that this would only run in nightly, so there's no worrying 
about patched states. We'd be running against a cluster running the HEAD of 
whatever branch we're testing as of the nightly run. As a first pass for the 
wire compatibility check we'd just anchor the client to a particular version 
for the branch. I like Andrew's suggestion of using 1.2.0, since it was a stable 
line and isn't yet EOM. If that doesn't work for some release line (e.g. 2.y) 
and we've documented the break already, then we'd update the pinned version for 
that branch (for 2.y, perhaps to 2.0.0).

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> Client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true, with keytab and 
> principal set correctly.
> A simple-auth HBase cluster and a kerberized HBase client application: an 
> application trying to r/w/c/d a table gets the following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by 
> GSSException: No valid credentials provided (Mechanism level: Failed to find 
> any Kerberos tgt)]
>   at 
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at 
> org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at 
> org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738)
>   at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>   at 
> org.ap

[jira] [Created] (HBASE-21229) Add a nightly check that client-server wire compatibility works

2018-09-25 Thread Sean Busbey (JIRA)

Sean Busbey created HBASE-21229:
---

 Summary: Add a nightly check that client-server wire compatibility 
works
 Key: HBASE-21229
 URL: https://issues.apache.org/jira/browse/HBASE-21229
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3
Reporter: Sean Busbey


From HBASE-20993:

{quote}
bq. Good reminder that we lack a unit test for wire compatibility. I wonder how 
hard it would be to grab the 1.2 shaded client artifact and use it to talk with 
the server code at head of branch.

We could add a nightly test that did this pretty easily. Essentially we could 
just add it as an additional step in [the test that starts up a 1-node cluster 
and runs an example 
program|https://github.com/apache/hbase/blob/master/dev-support/hbase_nightly_pseudo-distributed-test.sh].
{quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
Attachment: HBASE-21223.branch-2.1.001.patch

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Attachments: HBASE-21223.branch-2.1.001.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist, it should be out in hbck2, not here in user-space.




[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
Fix Version/s: 2.1.1
   Status: Patch Available  (was: Open)

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist, it should be out in hbck2, not here in user-space.




[jira] [Commented] (HBASE-21229) Add a nightly check that client-server wire compatibility works

2018-09-25 Thread Sean Busbey (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627638#comment-16627638
 ] 

Sean Busbey commented on HBASE-21229:
-

I'd say the script that runs the nightly test should probably take a parameter 
for an HBase shaded client jar to use, then use that jar to build and run the 
example application again (presuming the example covers the breakage seen in 
HBASE-20993). If the Hadoop bits are a problem, we'll need another simple 
example program.

Then, in our actual use in the nightly tests, we can pass in whichever client 
jar version we want, probably 1.2.0.

> Add a nightly check that client-server wire compatibility works
> ---
>
> Key: HBASE-21229
> URL: https://issues.apache.org/jira/browse/HBASE-21229
> Project: HBase
>  Issue Type: Improvement
>  Components: test
>Affects Versions: 3.0.0, 1.5.0, 1.3.3, 1.2.8, 2.2.0, 1.4.8, 2.1.1, 2.0.3
>Reporter: Sean Busbey
>Priority: Major
>
> From HBASE-20993:
> {quote}
> bq. Good reminder that we lack a unit test for wire compatibility. I wonder 
> how hard it would be to grab the 1.2 shaded client artifact and use it to 
> talk with the server code at head of branch.
> We could add a nightly test that did this pretty easily. Essentially we could 
> just add it as an additional step in [the test that starts up a 1-node 
> cluster and runs an example 
> program|https://github.com/apache/hbase/blob/master/dev-support/hbase_nightly_pseudo-distributed-test.sh].
> {quote}




[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-25 Thread Sean Busbey (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627639#comment-16627639
 ] 

Sean Busbey commented on HBASE-20993:
-

Filed HBASE-21229. I haven't assigned it to myself yet because I don't know 
when I'll have time to implement it. Maybe in 1-2 weeks? If someone wants to try 
implementing it before then, I'd be happy to review.

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> Client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true; the keytab and 
> principal are set correctly.
> Setup: an HBase cluster using simple authentication and a Kerberized HBase 
> client application. An application trying to r/w/c/d a table gets the 
> following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738)
>   at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753)
>   at org.apache.hadoop.hbase.client.HBase

[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
Attachment: HBASE-21223.branch-2.1.002.patch

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist, it should be out in hbck2, not here in user-space.




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627655#comment-16627655
 ] 

stack commented on HBASE-21223:
---

.002 addresses [~balazs.meszaros]'s review.

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist, it should be out in hbck2, not here in user-space.




[jira] [Commented] (HBASE-20993) [Auth] IPC client fallback to simple auth allowed doesn't work

2018-09-25 Thread Reid Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627659#comment-16627659
 ] 

Reid Chan commented on HBASE-20993:
---

Thanks Sean!

> [Auth] IPC client fallback to simple auth allowed doesn't work
> --
>
> Key: HBASE-20993
> URL: https://issues.apache.org/jira/browse/HBASE-20993
> Project: HBase
>  Issue Type: Bug
>  Components: Client, IPC/RPC, security
>Affects Versions: 1.2.6, 1.3.2, 1.2.7, 1.4.7
>Reporter: Reid Chan
>Assignee: Jack Bearden
>Priority: Critical
> Fix For: 1.5.0, 1.4.8
>
> Attachments: HBASE-20993.001.patch, 
> HBASE-20993.003.branch-1.flowchart.png, HBASE-20993.branch-1.002.patch, 
> HBASE-20993.branch-1.003.patch, HBASE-20993.branch-1.004.patch, 
> HBASE-20993.branch-1.005.patch, HBASE-20993.branch-1.006.patch, 
> HBASE-20993.branch-1.007.patch, HBASE-20993.branch-1.008.patch, 
> HBASE-20993.branch-1.009.patch, HBASE-20993.branch-1.009.patch, 
> HBASE-20993.branch-1.2.001.patch, HBASE-20993.branch-1.wip.002.patch, 
> HBASE-20993.branch-1.wip.patch, yetus-local-testpatch-output-009.txt
>
>
> It is easily reproducible.
> Client's hbase-site.xml: hadoop.security.authentication:kerberos, 
> hbase.security.authentication:kerberos, 
> hbase.ipc.client.fallback-to-simple-auth-allowed:true; the keytab and 
> principal are set correctly.
> Setup: an HBase cluster using simple authentication and a Kerberized HBase 
> client application. An application trying to r/w/c/d a table gets the 
> following exception:
> {code}
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
>   at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
>   at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:179)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
>   at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1241)
>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
>   at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58383)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1592)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1530)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1552)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1581)
>   at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1738)
>   at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:134)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4297)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:4289)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.createTableAsyncV2(HBaseAdmin.java:753)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:674)
>   at org.apache.hadoop.hbase.client.HBaseAdmin.createTable(HBaseAdmin.java:607)
>   at org.playground.hbase.KerberizedClientFallback.main(KerberizedCl

[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627665#comment-16627665
 ] 

Hudson commented on HBASE-21217:


Results for branch branch-2
[build #1300 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1300/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1300//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1300//JDK8_Nightly_Build_Report_(Hadoop2)/]


(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1300//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, so that a single failure does not fail all the open region 
> requests, we catch the exception and set a flag in the return value. But for 
> the executeProcedures call, the return value is ignored, and we expect the 
> openRegion method to always call reportRegionStateTransition to report the 
> failure, but in fact it does not.
> And after HBASE-20881, we can confirm that a race can happen where we send a 
> close request to a region that is still opening (HBASE-21199), and vice versa. 
> So I think we need to revisit the implementation of executeProcedures to make 
> it more stable.




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627702#comment-16627702
 ] 

Hadoop QA commented on HBASE-21223:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 17s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 23s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 52s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  5s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 46s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 53s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  5s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 34s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  1s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 43s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} rubocop {color} | {color:green}  0m 15s{color} | {color:green} The patch generated 0 new + 412 unchanged - 3 fixed = 412 total (was 415) {color} |
| {color:green}+1{color} | {color:green} ruby-lint {color} | {color:green}  0m  6s{color} | {color:green} The patch generated 0 new + 747 unchanged - 3 fixed = 747 total (was 750) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 56s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 39s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m 55s{color} | {color:red} hbase-client in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 46s{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 18s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 45m 36s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.client.TestInterfaceAlign |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21223 |
| JIRA Patch URL | 
https://issues.ap

[jira] [Updated] (HBASE-14950) Create table with AC fails when quota is enabled

2018-09-25 Thread Mike Drob (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob updated HBASE-14950:
--
Component/s: proc-v2

> Create table with AC fails when quota is enabled
> 
>
> Key: HBASE-14950
> URL: https://issues.apache.org/jira/browse/HBASE-14950
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 1.1.2
>Reporter: Ashish Singhi
>Priority: Critical
>
> Scenario:
> 1. Set hbase.quota.enabled to true
> 2. As per the [ACL matrix | 
> http://hbase.apache.org/book.html#appendix_acl_matrix] for create table, 
> grant '@group1', 'C', '@ns1'
> 3. From a user of group1, create 't1', 'd'  -- *Failed*
> {noformat}
> ERROR: java.io.IOException: Namespace Descriptor found null for ns1 This is unexpected.
>   at org.apache.hadoop.hbase.namespace.NamespaceStateManager.checkAndUpdateNamespaceTableCount(NamespaceStateManager.java:170)
>   at org.apache.hadoop.hbase.namespace.NamespaceAuditor.checkQuotaToCreateTable(NamespaceAuditor.java:76)
>   at org.apache.hadoop.hbase.quotas.MasterQuotaManager.checkNamespaceTableAndRegionQuota(MasterQuotaManager.java:312)
>   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1445)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:428)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:49404)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When quota is enabled, createTable internally also calls 
> getNamespaceDescriptor, which needs the 'A' privilege.
> So when quota is enabled, both the C and A permissions are needed to create a 
> table. The ACL matrix needs to be updated.
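The permission consequence reported above can be expressed as a tiny model of what a createTable caller needs on the namespace. This is a hypothetical illustration, not HBase's AccessController: the `Action` enum is a stand-in for the ACL letters C and A, and `requiredActions` / `canCreateTable` are invented names. The only facts taken from the report are that createTable needs C and that, with hbase.quota.enabled set to true, the internal getNamespaceDescriptor call additionally needs A.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of the permission check described in HBASE-14950. */
class CreateTablePermissions {

  /** Stand-in for the ACL letters relevant here: C (create) and A (admin). */
  enum Action { CREATE, ADMIN }

  /** Actions a user needs on the namespace to create a table. */
  static Set<Action> requiredActions(boolean quotaEnabled) {
    Set<Action> required = EnumSet.of(Action.CREATE);
    if (quotaEnabled) {
      // With quotas on, createTable also calls getNamespaceDescriptor, which needs A.
      required.add(Action.ADMIN);
    }
    return required;
  }

  static boolean canCreateTable(Set<Action> granted, boolean quotaEnabled) {
    return granted.containsAll(requiredActions(quotaEnabled));
  }

  public static void main(String[] args) {
    // Models grant '@group1', 'C', '@ns1' from the scenario above.
    Set<Action> group1 = EnumSet.of(Action.CREATE);
    System.out.println(canCreateTable(group1, false)); // true
    System.out.println(canCreateTable(group1, true));  // false: A is also required
  }
}
```

In this model, updating the ACL matrix amounts to documenting that `requiredActions(true)` contains both C and A.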




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627724#comment-16627724
 ] 

Hadoop QA commented on HBASE-21223:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 12s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.1 Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 20s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 39s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  0s{color} | {color:green} branch-2.1 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} branch-2.1 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 15s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m  0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} rubocop {color} | {color:green}  0m 12s{color} | {color:green} The patch generated 0 new + 412 unchanged - 3 fixed = 412 total (was 415) {color} |
| {color:green}+1{color} | {color:green} ruby-lint {color} | {color:green}  0m  5s{color} | {color:green} The patch generated 0 new + 747 unchanged - 3 fixed = 747 total (was 750) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 39s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  8m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m  2s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 30s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m  2s{color} | {color:green} hbase-client in the patch passed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  7m 27s{color} | {color:green} hbase-shell in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m 44s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:42ca976 |
| JIRA Issue | HBASE-21223 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941244/HBASE-21223.branch-2.1.002.patch |
| Opt

[jira] [Updated] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Xu Cang (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated HBASE-21228:

Description: 
In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and their SyncFutures.
{code:java}
  /**
   * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
   * 
   * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
   * 
   * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
   * them from this Map?
   */
  private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
{code}
A colleague of mine found a memory leak caused by this map.

Every thread that writes to the WAL is cached in this map, and nothing removes a thread from the map, even after the thread has died.

In one of our customers' clusters, we noticed that even though there were no requests, the RegionServer heap was almost full and CMS GC was triggered every second.
We dumped the heap and found more than 30,000 threads in the TERMINATED state, all cached in the map above. Everything referenced by these threads was leaked. Most of the threads were:
1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
2. hconnection-0x1f838e31-shared--pool threads, which are used for Phoenix short-circuit index writes; these threads write and sync the WAL.
3. Phoenix index writer threads, which are referenced by RegionCoprocessorHost$RegionEnvironment, which is referenced by HRegion, which in turn is referenced by PostOpenDeployTasksThread.

We should turn this map into a ThreadLocal so that the JVM can garbage-collect terminated threads for us.
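The leak pattern above can be reproduced in miniature. This is a minimal, self-contained sketch, not HBase code: `SyncFutureCacheDemo`, `SyncFutureStub`, and the helper methods are hypothetical names, and the stub only stands in for `SyncFuture`. It assumes nothing about how the actual fix was written; it only shows that a `ConcurrentMap` keyed by `Thread` keeps every writer thread reachable after it exits, while a `ThreadLocal` still lets each thread reuse its own instance without any shared structure holding on to the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class; SyncFutureStub stands in for HBase's SyncFuture.
class SyncFutureCacheDemo {

  static final class SyncFutureStub {}

  // Pattern from the report: one entry per writer thread, never evicted.
  static final ConcurrentMap<Thread, SyncFutureStub> BY_HANDLER = new ConcurrentHashMap<>();

  static SyncFutureStub fromMap() {
    return BY_HANDLER.computeIfAbsent(Thread.currentThread(), t -> new SyncFutureStub());
  }

  // Proposed alternative: the value lives in the thread's own ThreadLocalMap,
  // so nothing shared keeps a terminated thread reachable.
  static final ThreadLocal<SyncFutureStub> PER_THREAD =
      ThreadLocal.withInitial(SyncFutureStub::new);

  static SyncFutureStub fromThreadLocal() {
    return PER_THREAD.get();
  }

  /** Starts n short-lived writer threads running body and waits for all of them to exit. */
  static void runWriters(int n, Runnable body) {
    List<Thread> threads = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      Thread t = new Thread(body, "writer-" + i);
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for writers", e);
      }
    }
  }

  /** Number of map keys whose thread has already terminated. */
  static long deadEntries() {
    return BY_HANDLER.keySet().stream().filter(t -> !t.isAlive()).count();
  }

  public static void main(String[] args) {
    runWriters(100, SyncFutureCacheDemo::fromMap);
    // Every writer has exited, yet each one is still a key in the map.
    System.out.println("map size=" + BY_HANDLER.size() + ", dead keys=" + deadEntries());

    // ThreadLocal variant: a thread still reuses its own instance.
    boolean[] reused = new boolean[1];
    runWriters(1, () -> reused[0] = fromThreadLocal() == fromThreadLocal());
    System.out.println("reused within thread=" + reused[0]);
  }
}
```

Running `main` prints `map size=100, dead keys=100` and then `reused within thread=true`: the map variant retains all 100 terminated threads, while the ThreadLocal variant keeps per-thread reuse without a shared reference to the thread.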




> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch
>
>

[jira] [Commented] (HBASE-18451) PeriodicMemstoreFlusher should inspect the queue before adding a delayed flush request

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-18451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627733#comment-16627733
 ] 

Hadoop QA commented on HBASE-18451:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  2s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m  0s{color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 54s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 35s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 37s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 21s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 35s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} branch-1 passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} branch-1 passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 34s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 38s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 38s{color} | {color:red} hbase-server-jdk1.7.0_191 with JDK v1.7.0_191 generated 2 new + 4 unchanged - 2 fixed = 6 total (was 6) {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 20s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 35s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  1m 32s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 26s{color} | {color:green} the patch passed with JDK v1.8.0_181 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_191 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}106m  9s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}124m 41s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:61288f8 |
| JIRA Issue | HBASE-18451 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941237/HBASE-18451.branch-1.002.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 20367d59c081 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:0

[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Xu Cang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627742#comment-16627742
 ] 

Xu Cang commented on HBASE-21228:
-

nit: Also remove the line after the TODO you removed.

[~allan163]

 




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
  Resolution: Fixed
Hadoop Flags: Incompatible change,Reviewed
Release Note: Removed the dangerous abort_procedure command from the shell and 
deprecated abortProcedure in the Admin API.
  Status: Resolved  (was: Patch Available)

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more problems than it could ever solve. If it 
> should exist at all, it belongs in hbck2, not here in user space.




[jira] [Updated] (HBASE-21164) reportForDuty to spew less log if master is initializing

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21164:
--
   Resolution: Fixed
Fix Version/s: 2.1.1
   2.2.0
   3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.1+. Thank you for the nice improvement and for sticking with 
the issue, [~liuml07].

> reportForDuty to spew less log if master is initializing
> 
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: stack
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, 
> HBASE-21164.007.patch, HBASE-21164.008.patch, HBASE-21164.009.patch, 
> HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, 
> HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers call reportForDuty on startup to tell the Master they are available. If the Master is still initializing, which on a big cluster can take a while, particularly if something is amiss, logging every three seconds is annoying and accomplishes nothing. We should log these messages less often. Here is an example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; sleeping and then retrying.
> {code}
> For example, I am looking at a large cluster that had a backlog of procedure WALs. It is taking a couple of hours to recreate the procedure state because there are millions of outstanding procedures. Meanwhile, the Master log is full of the above message, every three seconds...
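One way to log less often can be sketched as follows. This is a hypothetical helper, not the change committed for HBASE-21164, and it writes to a plain `List<String>` instead of a real logger so that it stays self-contained; `ThrottledRetryLogger` and `onFailure` are illustrative names. It emits the first failure immediately, then at most one line per interval, and reports how many messages were suppressed in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

// Hypothetical helper, not the HBASE-21164 patch: rate-limits a repeated message.
final class ThrottledRetryLogger {
  private final long intervalMillis;
  private final LongSupplier clock;  // injectable time source (ms) so it can be tested
  private final List<String> sink;   // stands in for a real logger
  private boolean loggedOnce = false;
  private long lastLoggedAt;
  private int suppressed;

  ThrottledRetryLogger(long intervalMillis, LongSupplier clock, List<String> sink) {
    this.intervalMillis = intervalMillis;
    this.clock = clock;
    this.sink = sink;
  }

  /** Call on every failed attempt; returns true if a line was emitted. */
  boolean onFailure(String message) {
    long now = clock.getAsLong();
    if (loggedOnce && now - lastLoggedAt < intervalMillis) {
      suppressed++;  // swallow the message, but remember that we did
      return false;
    }
    String suffix = suppressed > 0 ? " (" + suppressed + " similar messages suppressed)" : "";
    sink.add(message + suffix);
    loggedOnce = true;
    lastLoggedAt = now;
    suppressed = 0;
    return true;
  }

  public static void main(String[] args) {
    long[] now = {0};
    List<String> lines = new ArrayList<>();
    ThrottledRetryLogger log = new ThrottledRetryLogger(30_000, () -> now[0], lines);
    // Simulate 20 failed attempts, one every three seconds (t = 0s .. 57s).
    for (int i = 0; i < 20; i++) {
      log.onFailure("reportForDuty failed; sleeping and then retrying.");
      now[0] += 3_000;
    }
    lines.forEach(System.out::println);
  }
}
```

With a 30-second interval and a retry every three seconds, the simulation in `main` prints two lines over 57 seconds instead of twenty; the second notes that 9 similar messages were suppressed. Keeping a suppressed count preserves the signal that retries are still happening while cutting the volume.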




[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
Fix Version/s: 2.2.0
   3.0.0

> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2, shell
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more problems than it could ever solve. If it 
> should exist at all, it belongs in hbck2, not here in user space.




[jira] [Updated] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21223:
--
Component/s: shell





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627788#comment-16627788
 ] 

Hadoop QA commented on HBASE-21228:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 28s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:orange}-0{color} | {color:orange} test4tests {color} | {color:orange}  0m  0s{color} | {color:orange} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-2.0 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m 55s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  2m  7s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 25s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 27s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 46s{color} | {color:green} branch-2.0 passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 36s{color} | {color:green} branch-2.0 passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  3m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 25s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  9m  0s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m 17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 28s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}113m 30s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}152m 40s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.master.procedure.TestMasterFailoverWithProcedures |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:6f01af0 |
| JIRA Issue | HBASE-21228 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941235/HBASE-21228.branch-2.0.001.patch |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c003d3bc0efe 3.13.0-143-generic #192-Ubuntu SMP Tue Feb 27 10:45:36 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | branch-2.0 / cf915f9c7c |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| Default Java | 1.8.0_181 |
| findbugs | v3.1.0-RC3 |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/14495/artifact/patchprocess/patch-unit-hbase-server.txt |

[jira] [Commented] (HBASE-14950) Create table with AC fails when quota is enabled

2018-09-25 Thread Mike Drob (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627841#comment-16627841
 ] 

Mike Drob commented on HBASE-14950:
---

I tested this on branch 2.1 and was not able to reproduce this issue.

> Create table with AC fails when quota is enabled
> 
>
> Key: HBASE-14950
> URL: https://issues.apache.org/jira/browse/HBASE-14950
> Project: HBase
>  Issue Type: Bug
>  Components: proc-v2
>Affects Versions: 1.1.2
>Reporter: Ashish Singhi
>Priority: Critical
>
> Scenario:
> 1. Set hbase.quota.enabled to true
> 2. As per the [ACL matrix|http://hbase.apache.org/book.html#appendix_acl_matrix] for create table, grant '@group1', 'C', '@ns1'
> 3. As a user in group1, run create 't1', 'd'  -- *Failed*
> {noformat}
> ERROR: java.io.IOException: Namespace Descriptor found null for ns1 This is unexpected.
>   at org.apache.hadoop.hbase.namespace.NamespaceStateManager.checkAndUpdateNamespaceTableCount(NamespaceStateManager.java:170)
>   at org.apache.hadoop.hbase.namespace.NamespaceAuditor.checkQuotaToCreateTable(NamespaceAuditor.java:76)
>   at org.apache.hadoop.hbase.quotas.MasterQuotaManager.checkNamespaceTableAndRegionQuota(MasterQuotaManager.java:312)
>   at org.apache.hadoop.hbase.master.HMaster.createTable(HMaster.java:1445)
>   at org.apache.hadoop.hbase.master.MasterRpcServices.createTable(MasterRpcServices.java:428)
>   at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:49404)
>   at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2136)
>   at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
>   at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> When quota is enabled, createTable also internally calls getNamespaceDescriptor, which requires the 'A' (Admin) privilege.
> So when quota is enabled, creating a table requires both C and A permissions. The ACL matrix needs to be updated.




[jira] [Resolved] (HBASE-14950) Create table with AC fails when quota is enabled

2018-09-25 Thread Mike Drob (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved HBASE-14950.
---
Resolution: Cannot Reproduce





[jira] [Resolved] (HBASE-14707) NPE spew getting metrics via jmx

2018-09-25 Thread Mike Drob (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-14707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved HBASE-14707.
---
Resolution: Cannot Reproduce

Not seen in a while, and there is not enough information to reproduce it. The 
JVM used for the original report appears to have had stack trace optimization 
turned on, so it dropped the NPE's stack trace after printing it the first few 
times. If this comes up again, we'll try to address it.

> NPE spew getting metrics via jmx
> 
>
> Key: HBASE-14707
> URL: https://issues.apache.org/jira/browse/HBASE-14707
> Project: HBase
>  Issue Type: Bug
>  Components: metrics
>Reporter: stack
>Priority: Major
>
> See this in branch-1 tip:
> {code}
> 2015-10-27 08:01:08,954 INFO  [main-EventThread] replication.ReplicationTrackerZKImpl: /hbase/rs/e1101.halxg.cloudera.com,16020,1445958006576 znode expired, triggering replicatorRemoved event
> 2015-10-27 08:01:20,645 ERROR [685943200@qtp-893835279-134] util.JSONBean: getting attribute Value of "org.apache.hadoop.hbase.client":type="MetricsConnection",scope="hconnection-0x33abd9d3",name="executorPoolActiveThreads" threw an exception
> javax.management.RuntimeMBeanException: java.lang.NullPointerException
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
>   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
>   at com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
>   at org.apache.hadoop.hbase.util.JSONBean.writeAttribute(JSONBean.java:235)
>   at org.apache.hadoop.hbase.util.JSONBean.write(JSONBean.java:209)
>   at org.apache.hadoop.hbase.util.JSONBean.access$000(JSONBean.java:53)
>   at org.apache.hadoop.hbase.util.JSONBean$1.write(JSONBean.java:96)
>   at org.apache.hadoop.hbase.http.jmx.JMXJsonServlet.doGet(JMXJsonServlet.java:202)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>   at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>   at org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:113)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.ClickjackingPreventionFilter.doFilter(ClickjackingPreventionFilter.java:48)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1354)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.apache.hadoop.hbase.http.NoCacheFilter.doFilter(NoCacheFilter.java:49)
>   at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>   at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>   at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>   at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>   at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>   at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>   at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>   at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>   at org.mortbay.jetty.Server.handle(Server.java:326)
>   at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>   at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>   at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>   at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused

[jira] [Commented] (HBASE-20225) [RPC] Server does not say what version it is

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627857#comment-16627857
 ] 

stack commented on HBASE-20225:
---

Back here again. HBCK2 needs to know which server version it is talking to so it 
can check whether the remote side supports particular operations.
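If the server did report its version, a tool such as HBCK2 could gate operations on it. The sketch below is hypothetical: `VersionGate` and `atLeast` are illustrative names, not HBase or HBCK2 APIs, and it simply assumes a version string such as `2.1.1` has already been obtained somehow. It compares the numeric major.minor.patch components and ignores any suffix such as `-SNAPSHOT`.

```java
// Hypothetical sketch: gate an operation on a reported server version string.
final class VersionGate {

  /** Parses the leading "major.minor.patch" numbers; any "-suffix" is ignored. */
  static int[] parse(String version) {
    String core = version.split("-", 2)[0];
    String[] parts = core.split("\\.");
    int[] out = new int[3];  // missing components default to 0
    for (int i = 0; i < out.length && i < parts.length; i++) {
      out[i] = Integer.parseInt(parts[i]);
    }
    return out;
  }

  /** Returns true if serverVersion >= minVersion, comparing component by component. */
  static boolean atLeast(String serverVersion, String minVersion) {
    int[] s = parse(serverVersion);
    int[] m = parse(minVersion);
    for (int i = 0; i < 3; i++) {
      if (s[i] != m[i]) {
        return s[i] > m[i];
      }
    }
    return true;  // all components equal
  }

  public static void main(String[] args) {
    System.out.println(atLeast("2.1.1", "2.0.3"));
    System.out.println(atLeast("1.4.7", "2.0.0"));
    System.out.println(atLeast("2.2.0-SNAPSHOT", "2.2.0"));
  }
}
```

Because suffixes are dropped, a pre-release build such as `2.2.0-SNAPSHOT` is treated as equal to `2.2.0`; a real capability check might want to be stricter, or better still ask the server for explicit capabilities rather than inferring them from a version number.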

> [RPC] Server does not say what version it is
> 
>
> Key: HBASE-20225
> URL: https://issues.apache.org/jira/browse/HBASE-20225
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: stack
>Priority: Major
>
> Strange: the server does not tell clients what version it is. This is by design. See [1] from the refguide appendix on the RPC protocol, where the client says what it wants and the server is silent unless it is unable to satisfy the client request. I suppose it made sense at the time, when we were trying to squeeze protobuf handling into a pre-existing RPC, but in hindsight it seems a little silly that we don't answer the Connection setup with a Connection setup response that carries things like server version and capabilities.
> It's not so much a problem for our clients currently, but I'm in here because asynchbase is broken against hbase2 [2]; hbase2 removes support for getClosestRowOrBefore for meta lookups, and clients are supposed to do a reverse scan instead.
> [~manolamancha] has just made a fix, but you have to specify that you are connecting to hbase2, which is not how asynchbase does it; in the past, asynchbase would just figure out what to do from hints and exceptions thrown by our server.
> Not sure there is anything to do here. I tried retrofitting a connection response, but it would break hbase1 clients, which we need to avoid.
> Here is yet another reason to throw away this home-grown RPC. New project: put up an alternate port on which we'd provide a modern RPC, one that does streaming, etc., and basics like returning the server version.
> 1. http://hbase.apache.org/book.html#_server
> 2. https://github.com/OpenTSDB/asynchbase/issues/150#issuecomment-373949082



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HBASE-20225) [RPC] Server does not tell clients what version it is

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20225:
--
Summary: [RPC] Server does not tell clients what version it is  (was: [RPC] 
Server does not say what version it is)

> [RPC] Server does not tell clients what version it is
> -
>
> Key: HBASE-20225
> URL: https://issues.apache.org/jira/browse/HBASE-20225
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: stack
>Priority: Critical
>
> Strange. Server does not tell clients what version it is. It is explicitly 
> this way. See [1] from refguide appendix on rpc protocol where client says 
> what it wants and server is silent unless it is unable to satisfy the client 
> request. I suppose it made sense at the time trying to squeeze in protobuf 
> handling into a pre-existing RPC but in hindsight, it seems a little silly we 
> don't answer the Connection setup with a Connection setup response that has 
> stuff like server version and capabilities.
> It's not so much a problem for our clients currently, but I'm in here because 
> asynchbase is broken against hbase2 [2]; hbase2 removes support for 
> getClosestRowOrBefore doing meta lookups; clients are supposed to do a 
> reverse scan instead.
> [~manolamancha] has just made a fix but you have to specify you are 
> connecting to hbase2 which is not how asynchbase does it; in the past, 
> asynchbase would just figure out what to do going off hints and exceptions 
> thrown by our server.
> Not sure there is anything to do here. I tried retro-fitting a connection 
> response but it will break hbase1 clients which we need to avoid.
> Here is yet another reason for our throwing away this home-grown RPC. New 
> Project: put up an alternate port on which we'd provide a modern RPC, one 
> that does streaming, etc., and basics like return server version.
> 1. http://hbase.apache.org/book.html#_server
> 2. https://github.com/OpenTSDB/asynchbase/issues/150#issuecomment-373949082




[jira] [Updated] (HBASE-20225) [RPC] Server does not say what version it is

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-20225:
--
Priority: Critical  (was: Major)

> [RPC] Server does not say what version it is
> 
>
> Key: HBASE-20225
> URL: https://issues.apache.org/jira/browse/HBASE-20225
> Project: HBase
>  Issue Type: Bug
>  Components: rpc
>Affects Versions: 2.0.0
>Reporter: stack
>Priority: Critical
>
> Strange. Server does not tell clients what version it is. It is explicitly 
> this way. See [1] from refguide appendix on rpc protocol where client says 
> what it wants and server is silent unless it is unable to satisfy the client 
> request. I suppose it made sense at the time trying to squeeze in protobuf 
> handling into a pre-existing RPC but in hindsight, it seems a little silly we 
> don't answer the Connection setup with a Connection setup response that has 
> stuff like server version and capabilities.
> It's not so much a problem for our clients currently, but I'm in here because 
> asynchbase is broken against hbase2 [2]; hbase2 removes support for 
> getClosestRowOrBefore doing meta lookups; clients are supposed to do a 
> reverse scan instead.
> [~manolamancha] has just made a fix but you have to specify you are 
> connecting to hbase2 which is not how asynchbase does it; in the past, 
> asynchbase would just figure out what to do going off hints and exceptions 
> thrown by our server.
> Not sure there is anything to do here. I tried retro-fitting a connection 
> response but it will break hbase1 clients which we need to avoid.
> Here is yet another reason for our throwing away this home-grown RPC. New 
> Project: put up an alternate port on which we'd provide a modern RPC, one 
> that does streaming, etc., and basics like return server version.
> 1. http://hbase.apache.org/book.html#_server
> 2. https://github.com/OpenTSDB/asynchbase/issues/150#issuecomment-373949082




[jira] [Created] (HBASE-21230) BackupUtils#checkTargetDir doesn't compose error message correctly

2018-09-25 Thread Ted Yu (JIRA)

Ted Yu created HBASE-21230:
--

 Summary: BackupUtils#checkTargetDir doesn't compose error message 
correctly
 Key: HBASE-21230
 URL: https://issues.apache.org/jira/browse/HBASE-21230
 Project: HBase
  Issue Type: Bug
  Components: backup&restore
Reporter: Ted Yu


Here is related code:
{code}
  String expMsg = e.getMessage();
  String newMsg = null;
  if (expMsg.contains("No FileSystem for scheme")) {
    newMsg =
        "Unsupported filesystem scheme found in the backup target url. Error Message: "
            + newMsg;
{code}
I think the intention was to concatenate expMsg at the end of newMsg.
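In the snippet as written, newMsg is still null when it is concatenated, and Java renders a null reference as the text "null", so the resulting message ends in "Error Message: null". The sketch below shows the composition Ted suggests was intended, with expMsg appended. It is pulled out into a hypothetical static helper so it runs on its own and is not the actual BackupUtils code. It also guards against Throwable#getMessage() returning null, which the original does not do. Passing other messages through unchanged is an assumption.

```java
/**
 * Hypothetical helper, not the actual BackupUtils code: composes the error
 * message by appending the original exception text (expMsg).
 */
public final class BackupTargetErrorMessage {

  private BackupTargetErrorMessage() {}

  static String compose(String expMsg) {
    // Throwable#getMessage() may return null; guard before calling contains().
    if (expMsg != null && expMsg.contains("No FileSystem for scheme")) {
      return "Unsupported filesystem scheme found in the backup target url. Error Message: "
          + expMsg;
    }
    // Assumption: any other message is passed through unchanged.
    return expMsg;
  }

  public static void main(String[] args) {
    // Prints: Unsupported filesystem scheme found in the backup target url.
    //         Error Message: No FileSystem for scheme: foo   (on one line)
    System.out.println(compose("No FileSystem for scheme: foo"));
  }
}
```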




[jira] [Created] (HBASE-21231) Add documentation for MajorCompactor

2018-09-25 Thread Balazs Meszaros (JIRA)

Balazs Meszaros created HBASE-21231:
---

 Summary: Add documentation for MajorCompactor
 Key: HBASE-21231
 URL: https://issues.apache.org/jira/browse/HBASE-21231
 Project: HBase
  Issue Type: Task
  Components: documentation
Affects Versions: 3.0.0
Reporter: Balazs Meszaros
Assignee: Balazs Meszaros


HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
Let's document it.




[jira] [Updated] (HBASE-21231) Add documentation for MajorCompactor

2018-09-25 Thread Balazs Meszaros (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros updated HBASE-21231:

Attachment: HBASE-21231.master.001.patch

> Add documentation for MajorCompactor
> 
>
> Key: HBASE-21231
> URL: https://issues.apache.org/jira/browse/HBASE-21231
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-21231.master.001.patch
>
>
> HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
> Let's document it.




[jira] [Updated] (HBASE-21231) Add documentation for MajorCompactor

2018-09-25 Thread Balazs Meszaros (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros updated HBASE-21231:

Status: Patch Available  (was: Open)

> Add documentation for MajorCompactor
> 
>
> Key: HBASE-21231
> URL: https://issues.apache.org/jira/browse/HBASE-21231
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Attachments: HBASE-21231.master.001.patch
>
>
> HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
> Let's document it.




[jira] [Updated] (HBASE-21231) Add documentation for MajorCompactor

2018-09-25 Thread Balazs Meszaros (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Meszaros updated HBASE-21231:

Fix Version/s: 3.0.0

> Add documentation for MajorCompactor
> 
>
> Key: HBASE-21231
> URL: https://issues.apache.org/jira/browse/HBASE-21231
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21231.master.001.patch
>
>
> HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
> Let's document it.




[jira] [Commented] (HBASE-21231) Add documentation for MajorCompactor

2018-09-25 Thread Hadoop QA (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628022#comment-16628022
 ] 

Hadoop QA commented on HBASE-21231:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} shelldocs {color} | {color:blue}  0m  3s{color} | {color:blue} Shelldocs was not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 54s{color} | {color:green} master passed {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  5m 30s{color} | {color:blue} branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m  1s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:blue}0{color} | {color:blue} refguide {color} | {color:blue}  4m 58s{color} | {color:blue} patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 11s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m  3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-21231 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12941299/HBASE-21231.master.001.patch |
| Optional Tests |  asflicense  shellcheck  shelldocs  refguide  |
| uname | Linux 37909b5bdcee 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 08c4d70aaf |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| shellcheck | v0.4.4 |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14499/artifact/patchprocess/branch-site/book.html |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14499/artifact/patchprocess/patch-site/book.html |
| Max. process+thread count | 83 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14499/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Add documentation for MajorCompactor
> 
>
> Key: HBASE-21231
> URL: https://issues.apache.org/jira/browse/HBASE-21231
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 3.0.0
>Reporter: Balazs Meszaros
>Assignee: Balazs Meszaros
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21231.master.001.patch
>
>
> HBASE-19528 added a new MajorCompactor tool, but it lacks documentation. 
> Let's document it.




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628052#comment-16628052
 ] 

Hudson commented on HBASE-21223:


Results for branch branch-2.1
[build #376 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2, shell
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist at all, it should be in hbck2, not here in user-space.




[jira] [Commented] (HBASE-21164) reportForDuty to spew less log if master is initializing

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628053#comment-16628053
 ] 

Hudson commented on HBASE-21164:


Results for branch branch-2.1
[build #376 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/376//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> reportForDuty to spew less log if master is initializing
> 
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: stack
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, 
> HBASE-21164.007.patch, HBASE-21164.008.patch, HBASE-21164.009.patch, 
> HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, 
> HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers do reportForDuty on startup to tell Master they are available. 
> If Master is initializing, especially on a big cluster where that can take a 
> while (particularly if something is amiss), the log line every three seconds is 
> annoying and doesn't do anything useful. We should spew fewer of those logs. Here 
> is an example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to 
> master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, 
> startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; 
> sleeping and then retrying.
> 
> {code}
> For example, I am looking at a large cluster now that had a backlog of 
> procedure WALs. It is taking a couple of hours recreating the procedure-state 
> because there are millions of procedures outstanding. Meantime, the Master 
> log is just full of the above message – every three seconds...
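The sketch below shows one way to "spew less", under stated assumptions. It logs the first failed reportForDuty attempt, then only every Nth consecutive failure, and emits one summary line when the master finally accepts the report. This is a hypothetical helper, not the change committed for HBASE-21164. `ThrottledRetryLogger`, `logEvery` and the message text are made up for illustration. It returns strings instead of calling a logging framework so it can run standalone.

```java
/**
 * Hypothetical helper, not the HBASE-21164 patch: decides which failed
 * reportForDuty attempts are worth logging.
 */
public final class ThrottledRetryLogger {
  private final int logEvery;
  private long consecutiveFailures;

  ThrottledRetryLogger(int logEvery) {
    if (logEvery < 1) {
      throw new IllegalArgumentException("logEvery must be >= 1");
    }
    this.logEvery = logEvery;
  }

  /** Records a failed attempt; returns a message to log, or null to stay quiet. */
  String onFailure() {
    consecutiveFailures++;
    if (consecutiveFailures == 1 || consecutiveFailures % logEvery == 0) {
      return "reportForDuty failed; sleeping and then retrying (failed attempts so far: "
          + consecutiveFailures + ")";
    }
    return null;
  }

  /** Records success; returns one summary line if there were failures, else null. */
  String onSuccess() {
    String msg = consecutiveFailures == 0
        ? null
        : "reportForDuty succeeded after " + consecutiveFailures + " failed attempt(s)";
    consecutiveFailures = 0;
    return msg;
  }

  public static void main(String[] args) {
    ThrottledRetryLogger logger = new ThrottledRetryLogger(20);
    for (int attempt = 0; attempt < 100; attempt++) {
      String msg = logger.onFailure();
      if (msg != null) {
        System.out.println(msg);  // printed for failures 1, 20, 40, 60, 80 and 100
      }
    }
    System.out.println(logger.onSuccess());
  }
}
```

With the three-second retry interval from the description and `logEvery = 20`, a Master that stays in initialization for an hour would produce about 61 lines instead of roughly 1,200.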




[jira] [Commented] (HBASE-21223) [amv2] Remove abort_procedure from shell

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628069#comment-16628069
 ] 

Hudson commented on HBASE-21223:


Results for branch branch-2
[build #1301 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> [amv2] Remove abort_procedure from shell
> 
>
> Key: HBASE-21223
> URL: https://issues.apache.org/jira/browse/HBASE-21223
> Project: HBase
>  Issue Type: Bug
>  Components: amv2, hbck2, shell
>Reporter: stack
>Assignee: stack
>Priority: Critical
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21223.branch-2.1.001.patch, 
> HBASE-21223.branch-2.1.002.patch
>
>
> Remove this command. It will cause more damage than it could ever solve. If it 
> should exist at all, it should be in hbck2, not here in user-space.




[jira] [Commented] (HBASE-21164) reportForDuty to spew less log if master is initializing

2018-09-25 Thread Hudson (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628070#comment-16628070
 ] 

Hudson commented on HBASE-21164:


Results for branch branch-2
[build #1301 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1301//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.


(/) {color:green}+1 client integration test{color}


> reportForDuty to spew less log if master is initializing
> 
>
> Key: HBASE-21164
> URL: https://issues.apache.org/jira/browse/HBASE-21164
> Project: HBase
>  Issue Type: Improvement
>  Components: regionserver
>Reporter: stack
>Assignee: Mingliang Liu
>Priority: Minor
> Fix For: 3.0.0, 2.2.0, 2.1.1
>
> Attachments: HBASE-21164.005.patch, HBASE-21164.006.patch, 
> HBASE-21164.007.patch, HBASE-21164.008.patch, HBASE-21164.009.patch, 
> HBASE-21164.branch-2.1.001.patch, HBASE-21164.branch-2.1.002.patch, 
> HBASE-21164.branch-2.1.003.patch, HBASE-21164.branch-2.1.004.patch
>
>
> RegionServers do reportForDuty on startup to tell Master they are available. 
> If Master is initializing, especially on a big cluster where that can take a 
> while (particularly if something is amiss), the log line every three seconds is 
> annoying and doesn't do anything useful. We should spew fewer of those logs. Here 
> is an example:
> {code:java}
> 2018-09-06 14:01:39,312 INFO 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty to 
> master=vc0207.halxg.cloudera.com,22001,1536266763109 with port=22001, 
> startcode=1536266763109
> 2018-09-06 14:01:39,312 WARN 
> org.apache.hadoop.hbase.regionserver.HRegionServer: reportForDuty failed; 
> sleeping and then retrying.
> 
> {code}
> For example, I am looking at a large cluster now that had a backlog of 
> procedure WALs. It is taking a couple of hours recreating the procedure-state 
> because there are millions of procedures outstanding. Meantime, the Master 
> log is just full of the above message – every three seconds...




[jira] [Created] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-25 Thread stack (JIRA)

stack created HBASE-21232:
-

 Summary: Show table state in Tables view on Master home page
 Key: HBASE-21232
 URL: https://issues.apache.org/jira/browse/HBASE-21232
 Project: HBase
  Issue Type: Bug
  Components: UI
Affects Versions: 2.1.0
Reporter: stack
Assignee: stack
 Fix For: 2.1.1
 Attachments: table.pdf

Add a column to the Tables panel on the Master home page. Useful when trying to 
figure out whether a table is enabled/disabled/disabling/enabling...




[jira] [Updated] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21232:
--
Attachment: table.pdf

> Show table state in Tables view on Master home page
> ---
>
> Key: HBASE-21232
> URL: https://issues.apache.org/jira/browse/HBASE-21232
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: table.pdf
>
>
> Add a column to the Tables panel on the Master home page. Useful when trying 
> to figure out whether a table is enabled/disabled/disabling/enabling...




[jira] [Commented] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628103#comment-16628103
 ] 

stack commented on HBASE-21232:
---

Above is a picture of what it looks like.

> Show table state in Tables view on Master home page
> ---
>
> Key: HBASE-21232
> URL: https://issues.apache.org/jira/browse/HBASE-21232
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: table.pdf
>
>
> Add a column to the Tables panel on the Master home page. Useful when trying 
> to figure out whether a table is enabled/disabled/disabling/enabling...




[jira] [Updated] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21232:
--
Attachment: HBASE-21232.branch-2.1.001.patch

> Show table state in Tables view on Master home page
> ---
>
> Key: HBASE-21232
> URL: https://issues.apache.org/jira/browse/HBASE-21232
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21232.branch-2.1.001.patch, table.pdf
>
>
> Add a column to the Tables panel on the Master home page. Useful when trying 
> to figure out whether a table is enabled/disabled/disabling/enabling...




[jira] [Updated] (HBASE-21232) Show table state in Tables view on Master home page

2018-09-25 Thread stack (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-21232:
--
Release Note: Add table state column to the tables panel
  Status: Patch Available  (was: Open)

Small patch.

> Show table state in Tables view on Master home page
> ---
>
> Key: HBASE-21232
> URL: https://issues.apache.org/jira/browse/HBASE-21232
> Project: HBase
>  Issue Type: Bug
>  Components: UI
>Affects Versions: 2.1.0
>Reporter: stack
>Assignee: stack
>Priority: Major
> Fix For: 2.1.1
>
> Attachments: HBASE-21232.branch-2.1.001.patch, table.pdf
>
>
> Add a column to the Tables panel on the Master home page. Useful when trying 
> to figure out whether a table is enabled/disabled/disabling/enabling...




[jira] [Assigned] (HBASE-21230) BackupUtils#checkTargetDir doesn't compose error message correctly

2018-09-25 Thread liubangchen (JIRA)



 [ 
https://issues.apache.org/jira/browse/HBASE-21230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liubangchen reassigned HBASE-21230:
---

Assignee: liubangchen

> BackupUtils#checkTargetDir doesn't compose error message correctly
> --
>
> Key: HBASE-21230
> URL: https://issues.apache.org/jira/browse/HBASE-21230
> Project: HBase
>  Issue Type: Bug
>  Components: backup&restore
>Reporter: Ted Yu
>Assignee: liubangchen
>Priority: Minor
>
> Here is related code:
> {code}
>   String expMsg = e.getMessage();
>   String newMsg = null;
>   if (expMsg.contains("No FileSystem for scheme")) {
>     newMsg =
>         "Unsupported filesystem scheme found in the backup target url. Error Message: "
>             + newMsg;
> {code}
> I think the intention was to concatenate expMsg at the end of newMsg.




[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628157#comment-16628157
 ] 

Duo Zhang commented on HBASE-21217:
---

Ping [~allan163], mind opening the issue to upload your patch for branch-2.0 & 
branch-2.1? Thanks.

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, in order not to fail all the open region requests when 
> only one of them fails, we catch the exception and set a flag in the 
> return value. But for the executeProcedures call, the return value is 
> ignored, and we expect the openRegion method to always call 
> reportRegionStateTransition to report the failure, but in fact it does not...
> And after HBASE-20881, we can confirm that the race can happen, where we 
> send a close request to a region which is opening (HBASE-21199), and vice 
> versa. So I think we need to revisit the implementation of 
> executeProcedures to make it more stable.
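The failure mode described here is a general one: when a per-item failure is encoded only in a return value, it is lost if the caller ignores that value. The sketch below shows the alternative, assuming a hypothetical reporter callback that plays the role reportRegionStateTransition plays in the description. Every region's outcome, success or failure, is reported from inside the loop, and one failed open does not stop the rest. All names are illustrative; this is not how HBASE-21217 was actually implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical sketch: every open attempt reports its outcome through a callback,
 * so a failure is not lost when the caller ignores return values.
 * None of these names are HBase APIs.
 */
public final class ReportingRegionOpener {

  enum Outcome { OPENED, FAILED_OPEN }

  /** Stand-in for the work of opening one region. */
  interface RegionOpenAction {
    void open(String regionName) throws Exception;
  }

  private final BiConsumer<String, Outcome> reporter;

  ReportingRegionOpener(BiConsumer<String, Outcome> reporter) {
    this.reporter = reporter;
  }

  /** Opens each region independently; a failure is reported, not swallowed. */
  void openAll(List<String> regions, RegionOpenAction action) {
    for (String region : regions) {
      Outcome outcome;
      try {
        action.open(region);
        outcome = Outcome.OPENED;
      } catch (Exception e) {
        outcome = Outcome.FAILED_OPEN;  // keep going with the remaining regions
      }
      reporter.accept(region, outcome);  // reported in both cases
    }
  }

  public static void main(String[] args) {
    List<String> reports = new ArrayList<>();
    ReportingRegionOpener opener =
        new ReportingRegionOpener((region, outcome) -> reports.add(region + "=" + outcome));
    opener.openAll(Arrays.asList("r1", "r2", "r3"), region -> {
      if (region.equals("r2")) {
        throw new IllegalStateException("simulated open failure");
      }
    });
    System.out.println(reports);  // [r1=OPENED, r2=FAILED_OPEN, r3=OPENED]
  }
}
```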




[jira] [Commented] (HBASE-21227) Implement exponential retrying backoff for Assign/UnassignRegionHandler introduced in HBASE-21217

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628160#comment-16628160
 ] 

Duo Zhang commented on HBASE-21227:
---

Let me check RetryCounter. Thanks.
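HBase already has a RetryCounter utility, which is what the comment refers to; its API is not reproduced here. The sketch below is a standalone illustration of what "exponential retrying backoff" could mean for the Assign/UnassignRegionHandler classes introduced in HBASE-21217. It doubles a base delay on each attempt, caps it, and applies "full jitter", meaning a random delay between 0 and the cap. The class name, parameters and defaults are assumptions.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical sketch of capped exponential backoff with full jitter.
 * It does not use or mirror HBase's RetryCounter.
 */
public final class ExponentialBackoff {
  private final long baseMillis;
  private final long maxMillis;

  ExponentialBackoff(long baseMillis, long maxMillis) {
    if (baseMillis <= 0 || maxMillis < baseMillis) {
      throw new IllegalArgumentException("need 0 < baseMillis <= maxMillis");
    }
    this.baseMillis = baseMillis;
    this.maxMillis = maxMillis;
  }

  /** Upper bound for retry number {@code attempt} (0-based): base * 2^attempt, capped at max. */
  long capFor(int attempt) {
    if (attempt < 0) {
      throw new IllegalArgumentException("attempt must be >= 0");
    }
    long delay = baseMillis;
    for (int i = 0; i < attempt && delay < maxMillis; i++) {
      // Clamp to maxMillis instead of doubling past it; this also avoids long overflow.
      delay = delay > maxMillis / 2 ? maxMillis : delay * 2;
    }
    return delay;
  }

  /** Full jitter: a uniformly random delay in [0, capFor(attempt)]. */
  long delayFor(int attempt) {
    return ThreadLocalRandom.current().nextLong(capFor(attempt) + 1);
  }

  public static void main(String[] args) {
    ExponentialBackoff backoff = new ExponentialBackoff(100, 10_000);
    for (int attempt = 0; attempt < 9; attempt++) {
      System.out.println("attempt " + attempt + ": cap " + backoff.capFor(attempt)
          + " ms, sleep " + backoff.delayFor(attempt) + " ms");
    }
  }
}
```

With a 100 ms base and a 10 s cap, the caps run 100, 200, 400, ..., 6400 ms and stay at 10000 ms from attempt 7 onward. The jitter spreads out retries from many handlers so they do not hit the master in lockstep. Whether the real handlers want jitter at all is a design choice this sketch does not settle.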

> Implement exponential retrying backoff for Assign/UnassignRegionHandler 
> introduced in HBASE-21217
> -
>
> Key: HBASE-21227
> URL: https://issues.apache.org/jira/browse/HBASE-21227
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, regionserver
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21227.patch
>
>





[jira] [Commented] (HBASE-21228) Memory leak since AbstractFSWAL caches Thread object and never clean later

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628163#comment-16628163
 ] 

Duo Zhang commented on HBASE-21228:
---

IIRC ThreadLocal has an initialValue method to create the first value, so you do 
not need to add the null check?

Anyway, good catch. I believe the intention here is that only rpc handlers will 
call this method, so the map is bounded, as the number of our rpc handlers is 
fixed. And this is wrong, obviously, as you described in the description 
section.

And the comment for syncFuturesByHandler is broken? And the name 
'syncFuturesByHandler' is not suitable any more?
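The issue proposes replacing the map with a ThreadLocal, and the comment above notes that ThreadLocal can supply the first value itself. The sketch below combines both ideas, with a stub standing in for SyncFuture. All names are hypothetical, and this is not the committed patch. The ThreadLocal javadoc states that after a thread goes away, its copies of thread-local instances are subject to garbage collection unless other references to them exist. That is the property the description relies on, and a `ConcurrentMap<Thread, ...>` field lacks it, because the map keeps every Thread key strongly reachable.

```java
/**
 * Hypothetical sketch of replacing a ConcurrentMap<Thread, SyncFuture> cache with a
 * ThreadLocal. SyncFutureStub is a stand-in, not HBase's SyncFuture.
 */
public final class PerThreadSyncFutures {

  /** Minimal stand-in for a reusable per-thread object such as a SyncFuture. */
  static final class SyncFutureStub {
    private long txid;

    SyncFutureStub reset(long txid) {
      this.txid = txid;
      return this;
    }

    long txid() {
      return txid;
    }
  }

  // withInitial() supplies the first value for each thread, so get() never returns
  // null and no explicit null check is needed. The value lives in the calling
  // thread's own thread-local storage rather than in a map field of this object,
  // so it becomes eligible for GC after the thread terminates.
  private final ThreadLocal<SyncFutureStub> cachedSyncFutures =
      ThreadLocal.withInitial(SyncFutureStub::new);

  /** Returns this thread's cached instance, reset for the given transaction id. */
  SyncFutureStub getSyncFuture(long txid) {
    return cachedSyncFutures.get().reset(txid);
  }

  public static void main(String[] args) throws InterruptedException {
    PerThreadSyncFutures wal = new PerThreadSyncFutures();
    SyncFutureStub first = wal.getSyncFuture(1);
    SyncFutureStub second = wal.getSyncFuture(2);
    System.out.println(first == second);  // true: reused within one thread

    SyncFutureStub[] fromOtherThread = new SyncFutureStub[1];
    Thread t = new Thread(() -> fromOtherThread[0] = wal.getSyncFuture(3));
    t.start();
    t.join();  // join() also makes the other thread's write visible here
    System.out.println(first == fromOtherThread[0]);  // false: one instance per thread
  }
}
```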

> Memory leak since AbstractFSWAL caches Thread object and never clean later
> --
>
> Key: HBASE-21228
> URL: https://issues.apache.org/jira/browse/HBASE-21228
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.1.0, 2.0.2, 1.4.7
>Reporter: Allan Yang
>Assignee: Allan Yang
>Priority: Major
> Attachments: HBASE-21228.branch-2.0.001.patch
>
>
> In AbstractFSWAL (FSHLog in branch-1), we have a map that caches threads and 
> their SyncFutures.
> {code:java}
>   /**
>    * Map of {@link SyncFuture}s keyed by Handler objects. Used so we reuse SyncFutures.
>    * 
>    * TODO: Reuse FSWALEntry's rather than create them anew each time as we do SyncFutures here.
>    * 
>    * TODO: Add a FSWalEntry and SyncFuture as thread locals on handlers rather than have them get
>    * them from this Map?
>    */
>   private final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler;
> {code}
> A colleague of mine found a memory leak caused by this map.
> Every thread that writes to the WAL is cached in this map, and nothing removes 
> a thread from the map, even after the thread is dead.
> In one of our customers' clusters, we noticed that even though there were no 
> requests, the RS heap was almost full and CMS GC was triggered every 
> second.
> We dumped the heap and found more than 30 thousand threads in the TERMINATED 
> state, all of them cached in the map above. Everything referenced by these 
> threads was leaked. Most of the threads were:
> 1. PostOpenDeployTasksThread, which writes the Open Region marker to the WAL.
> 2. hconnection-0x1f838e31-shared--pool, which is used to write index short 
> circuit (Phoenix); WAL entries are written and synced in these threads.
> 3. Index writer threads (Phoenix), which are referenced by 
> RegionCoprocessorHost$RegionEnvironment, which is in turn referenced by HRegion, 
> which is finally referenced by PostOpenDeployTasksThread.
> We should turn this map into a thread-local one and let the JVM GC the terminated 
> threads for us.
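A minimal, self-contained sketch of the leak described above and of the thread-local alternative the issue proposes. SyncFuture here is a hypothetical empty stand-in (not HBase's class), and the helpers getFromMap, getFromThreadLocal, runWriters and deadThreadsInMap are illustrative names, not AbstractFSWAL methods. The map keeps a strong reference to every Thread key, while a ThreadLocal value lives inside the Thread object and becomes collectable together with it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class SyncFutureCacheDemo {

  /** Hypothetical stand-in for HBase's SyncFuture; only object identity matters here. */
  static final class SyncFuture {}

  /** The pattern from the issue: one cached SyncFuture per writer thread, keyed by Thread. */
  static final ConcurrentMap<Thread, SyncFuture> syncFuturesByHandler = new ConcurrentHashMap<>();

  static SyncFuture getFromMap() {
    // The map holds a strong reference to the Thread key, so a dead Thread and
    // everything it references stay reachable until someone removes the entry.
    return syncFuturesByHandler.computeIfAbsent(Thread.currentThread(), t -> new SyncFuture());
  }

  /** The suggested alternative: the value is stored in the Thread's own ThreadLocalMap. */
  static final ThreadLocal<SyncFuture> cachedSyncFuture = ThreadLocal.withInitial(SyncFuture::new);

  static SyncFuture getFromThreadLocal() {
    // Once the Thread itself is unreachable, its ThreadLocal values can be collected with it.
    return cachedSyncFuture.get();
  }

  /** Starts and joins n short-lived writer threads, each fetching its SyncFuture once. */
  static void runWriters(int n, boolean useThreadLocal) {
    for (int i = 0; i < n; i++) {
      Thread t = new Thread(() -> {
        if (useThreadLocal) {
          getFromThreadLocal();
        } else {
          getFromMap();
        }
      });
      t.start();
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Counts map entries whose key thread has already terminated. */
  static long deadThreadsInMap() {
    return syncFuturesByHandler.keySet().stream()
        .filter(t -> t.getState() == Thread.State.TERMINATED)
        .count();
  }

  public static void main(String[] args) {
    runWriters(1000, false);
    System.out.println("terminated threads still cached in the map: " + deadThreadsInMap());
    runWriters(1000, true);
    System.out.println("map size after the ThreadLocal run: " + syncFuturesByHandler.size());
  }
}
```

Running main should report 1000 on both lines: every terminated writer from the first run is still pinned by the map, and the ThreadLocal run adds nothing to it.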



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread Duo Zhang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628181#comment-16628181
 ] 

Duo Zhang commented on HBASE-19121:
---

I think we should have hbck2 for all branches? IIRC, on another issue 
[~allan163] said that he will backport the hbck2 stuff to branch-2.0.

And I do not think branching 2.2 from 2.1 can solve the problem permanently. 
Hbck2 is in a separate repo, but much of the recovery code is in the hbase 
repo, and if we rely on the new recovery code then hbck2 will not be compatible 
with older versions of hbase...

One possible way to deal with this problem is to align the versions of hbase 
and hbck2, i.e., release a new version of hbck2 every time we release a new 
hbase version.

Thanks.

> HBCK for AMv2 (A.K.A HBCK2)
> ---
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: stack
>Assignee: Umesh Agashe
>Priority: Major
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage if run 
> against AMv2.
> Fix.




[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread stack (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628200#comment-16628200
 ] 

stack commented on HBASE-19121:
---

hbck2 for all branches is tough, given that we would then be adding a new 
HbckService in a point release. We have been trying to ship only bug fixes in 
point releases.

2.1.1 already has 126 fixes in it too... enough to make a minor release?

If folks think hbck2 is an exception and that we should allow it into a point 
release, that's fine. I can take it to the dev list for discussion.


[jira] [Commented] (HBASE-21217) Revisit the executeProcedure method for open/close region

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-21217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628205#comment-16628205
 ] 

Allan Yang commented on HBASE-21217:


{quote}
Ping Allan Yang, mind opening the issue to upload your patch for branch-2.0 & 
branch-2.1? Thanks.
{quote}
Sure, will open an issue later today.

> Revisit the executeProcedure method for open/close region
> -
>
> Key: HBASE-21217
> URL: https://issues.apache.org/jira/browse/HBASE-21217
> Project: HBase
>  Issue Type: Sub-task
>  Components: amv2, proc-v2
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-21217-v1.patch, HBASE-21217-v2.patch, 
> HBASE-21217.patch
>
>
> Currently we just call openRegion and closeRegion directly, which is a bit 
> buggy. For example, so that a single failure does not fail all the open region 
> requests, we catch the exception and set a flag in the return value. But for 
> the executeProcedures call the return value is ignored, and we expect 
> openRegion to always call reportRegionStateTransition to report the failure, 
> but in fact it does not...
> And after HBASE-20881, we can confirm that a race can happen where we send a 
> close request to a region that is still opening (HBASE-21199), and vice 
> versa. So I think we need to revisit the implementation of 
> executeProcedures to make it more robust.
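A minimal sketch of the failure-swallowing pattern described above, and of one way a caller could avoid losing failures. Everything here is a hypothetical simplification, not HBase's RegionServer code: the OpeningState enum, doOpen, openRegions and the two executeProcedures variants are illustrative names, the "fixed" variant is only one possible shape (not the patch attached to this issue), and the reported list stands in for reportRegionStateTransition calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExecuteProceduresDemo {

  /** Hypothetical per-region outcome recorded in the batch return value. */
  enum OpeningState { OPENED, FAILED_OPENING }

  /** Hypothetical single-region open; fails for any region whose name starts with "bad". */
  static void doOpen(String region) {
    if (region.startsWith("bad")) {
      throw new IllegalStateException("cannot open " + region);
    }
  }

  /** Batch open: one failure must not fail the whole batch, so it is recorded, not rethrown. */
  static Map<String, OpeningState> openRegions(List<String> regions) {
    Map<String, OpeningState> result = new LinkedHashMap<>();
    for (String region : regions) {
      try {
        doOpen(region);
        result.put(region, OpeningState.OPENED);
      } catch (RuntimeException e) {
        result.put(region, OpeningState.FAILED_OPENING);
      }
    }
    return result;
  }

  /** The problematic shape: the return value is dropped, so the failure is never reported. */
  static void executeProceduresBuggy(List<String> regions, List<String> reported) {
    openRegions(regions);
  }

  /** One safer shape: inspect the result and report every failed region. */
  static void executeProceduresFixed(List<String> regions, List<String> reported) {
    for (Map.Entry<String, OpeningState> e : openRegions(regions).entrySet()) {
      if (e.getValue() == OpeningState.FAILED_OPENING) {
        // Stands in for a reportRegionStateTransition call reporting the failed open.
        reported.add(e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    List<String> regions = Arrays.asList("r1", "bad-r2", "r3");
    List<String> buggy = new ArrayList<>();
    List<String> fixed = new ArrayList<>();
    executeProceduresBuggy(regions, buggy);
    executeProceduresFixed(regions, fixed);
    System.out.println("reported by buggy caller: " + buggy);
    System.out.println("reported by fixed caller: " + fixed);
  }
}
```

The buggy caller reports nothing, so the master would keep waiting on bad-r2; the fixed caller reports [bad-r2] while r1 and r3 still open normally.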




[jira] [Commented] (HBASE-19121) HBCK for AMv2 (A.K.A HBCK2)

2018-09-25 Thread Allan Yang (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628219#comment-16628219
 ] 

Allan Yang commented on HBASE-19121:


{quote}
One possible way to deal with this problem is to align the versions of hbase 
and hbck2, i.e., release a new version of hbck2 every time we release a new 
hbase version.
{quote}
IMHO, HBCK2 should be able to work against every hbase2 version. We don't want 
users to download different versions for different hbase clusters. At least for 
some basic operations, a newer hbase version should be compatible with an older 
HBCK2 version. 
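Allan's position implies a single HBCK2 build that decides at runtime what the connected cluster can support. A minimal sketch of such a per-command version gate, assuming HBCK2 can obtain the server's version string; the command names and minimum versions in the table are hypothetical placeholders, not real HBCK2 commands or release requirements.

```java
import java.util.HashMap;
import java.util.Map;

public class Hbck2VersionGate {

  /** Hypothetical table: the minimum hbase server version each HBCK2 command needs. */
  static final Map<String, String> MIN_SERVER_VERSION = new HashMap<>();

  static {
    MIN_SERVER_VERSION.put("basicCommand", "2.0.0");       // hypothetical: any hbase2 server
    MIN_SERVER_VERSION.put("newRecoveryCommand", "2.2.0"); // hypothetical: needs newer server code
  }

  /** Compares numeric versions like "2.1.1"; a suffix such as "-SNAPSHOT" is ignored. */
  static int compareVersions(String a, String b) {
    String[] pa = a.split("-")[0].split("\\.");
    String[] pb = b.split("-")[0].split("\\.");
    int n = Math.max(pa.length, pb.length);
    for (int i = 0; i < n; i++) {
      int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
      int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
      if (x != y) {
        return Integer.compare(x, y);
      }
    }
    return 0;
  }

  /** A command is allowed only if the server is at least as new as the command requires. */
  static boolean isSupported(String command, String serverVersion) {
    String min = MIN_SERVER_VERSION.get(command);
    return min != null && compareVersions(serverVersion, min) >= 0;
  }

  public static void main(String[] args) {
    for (String server : new String[] {"2.0.2", "2.1.1", "2.2.0-SNAPSHOT"}) {
      System.out.println(server
          + ": basicCommand=" + isSupported("basicCommand", server)
          + ", newRecoveryCommand=" + isSupported("newRecoveryCommand", server));
    }
  }
}
```

The design trade-off versus Duo's lockstep releases: one artifact serves every cluster, at the cost of maintaining a capability table and refusing (rather than half-running) commands an older server cannot back.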


[jira] [Commented] (HBASE-20734) Colocate recovered edits directory with hbase.wal.dir

2018-09-25 Thread Reid Chan (JIRA)



[ 
https://issues.apache.org/jira/browse/HBASE-20734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16628222#comment-16628222
 ] 

Reid Chan commented on HBASE-20734:
---

Conflicts in branch-2:
{code}
<<<<<<< HEAD
  51 * ClassSize.REFERENCE + 3 * Bytes.SIZEOF_INT +
  (14 * Bytes.SIZEOF_LONG) +
=======
  53 * ClassSize.REFERENCE + 3 * Bytes.SIZEOF_INT +
  (15 * Bytes.SIZEOF_LONG) +
>>>>>>> 0e173d38b0... HBASE-20734 Colocate recovered edits directory with hbase.wal.dir
{code}
I kept {{(14 * Bytes.SIZEOF_LONG)}} because this patch doesn't add any new long 
field.
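The multipliers in that overhead expression are meant to mirror the instance fields the class declares, which is why a patch that adds no long field should leave the SIZEOF_LONG count alone. A minimal reflection-based sketch of that tally, run on a hypothetical Region class rather than HRegion; HBase's own ClassSize-based accounting may classify fields differently, so treat this only as an illustration of the counting idea.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FieldCountDemo {

  /** Hypothetical class standing in for HRegion; only the types of its fields matter. */
  static class Region {
    private Object conf;           // reference
    private String regionName;     // reference
    private Object walFileSystem;  // reference
    private long memstoreDataSize; // long
    private long lastFlushTime;    // long
    private int writeRequests;     // int
    private static long instances; // static: not part of per-instance overhead
  }

  /** Tallies declared instance fields as {references, ints, longs}. */
  static int[] countFields(Class<?> cls) {
    int refs = 0;
    int ints = 0;
    int longs = 0;
    for (Field f : cls.getDeclaredFields()) {
      if (Modifier.isStatic(f.getModifiers())) {
        continue;
      }
      Class<?> type = f.getType();
      if (!type.isPrimitive()) {
        refs++;
      } else if (type == int.class) {
        ints++;
      } else if (type == long.class) {
        longs++;
      }
    }
    return new int[] {refs, ints, longs};
  }

  public static void main(String[] args) {
    int[] c = countFields(Region.class);
    // Adding a reference field bumps only the REFERENCE count; the SIZEOF_LONG
    // multiplier changes only when a new long field is declared.
    System.out.println(c[0] + " * REFERENCE + " + c[1] + " * SIZEOF_INT + ("
        + c[2] + " * SIZEOF_LONG)");
  }
}
```

For the hypothetical Region above this prints "3 * REFERENCE + 1 * SIZEOF_INT + (2 * SIZEOF_LONG)"; adding, say, a new Path field would raise only the first number.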

> Colocate recovered edits directory with hbase.wal.dir
> -
>
> Key: HBASE-20734
> URL: https://issues.apache.org/jira/browse/HBASE-20734
> Project: HBase
>  Issue Type: Improvement
>  Components: MTTR, Recovery, wal
>Reporter: Ted Yu
>Assignee: Zach York
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-20734.branch-1.001.patch, 
> HBASE-20734.branch-1.002.patch, HBASE-20734.branch-1.003.patch, 
> HBASE-20734.branch-1.004.patch, HBASE-20734.branch-1.005.patch, 
> HBASE-20734.master.001.patch, HBASE-20734.master.002.patch, 
> HBASE-20734.master.003.patch, HBASE-20734.master.004.patch, 
> HBASE-20734.master.005.patch, HBASE-20734.master.006.patch, 
> HBASE-20734.master.007.patch, HBASE-20734.master.008.patch, 
> HBASE-20734.master.009.patch, HBASE-20734.master.010.patch, 
> HBASE-20734.master.011.patch, HBASE-20734.master.012.patch
>
>
> While investigating HBASE-20723, I realized that when hbase.wal.dir is 
> configured on different (fast) media than the hbase rootdir, we don't get the 
> best performance for recovered edits, since the recovered edits directory is 
> currently under rootdir.
> Such a setup may not give fast recovery on region server failover.
> This issue is to find a proper (hopefully backward-compatible) way of 
> colocating the recovered edits directory with hbase.wal.dir.


